pytyche.calibrate.scorecard
v2 scorecard — decision summary and per-scenario aggregation.
Functions
| compute_scorecard | Group CalibrationRecords by scenario_id and compute per-group metrics. |
Classes
| CellRegretStats | Per-cell regret statistics for a single oracle × actual decision pair. |
| DecisionSummary | Summary of oracle-vs-actual decision accuracy. |
| ScenarioScorecard | Per-scenario aggregated calibration metrics. |
- class pytyche.calibrate.scorecard.CellRegretStats(mean, median)
Bases: object
Per-cell regret statistics for a single oracle × actual decision pair.
- mean
Mean regret across all records in this cell; None if no records.
- median
Median regret across all records in this cell; None if no records.
- Parameters:
mean (float | None)
median (float | None)
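The None-on-empty behaviour described for mean and median can be sketched as follows. This is only an illustration under stated assumptions: CellRegretStatsSketch and summarize_cell are hypothetical stand-ins, not part of pytyche, and the sketch assumes the regret values for a cell arrive as a plain sequence of floats.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional, Sequence


@dataclass(frozen=True)
class CellRegretStatsSketch:
    """Hypothetical stand-in that mirrors the two documented fields."""

    mean: Optional[float]
    median: Optional[float]


def summarize_cell(regrets: Sequence[float]) -> CellRegretStatsSketch:
    """Summarize the regret values of one oracle x actual cell (hypothetical helper)."""
    # A cell with no records has no statistics, so both fields are None.
    if not regrets:
        return CellRegretStatsSketch(mean=None, median=None)
    return CellRegretStatsSketch(mean=mean(regrets), median=median(regrets))


empty = summarize_cell([])
filled = summarize_cell([0.0, 0.5, 4.0])
```

With the sample values above, the mean (1.5) and the median (0.5) differ noticeably, which is one reason to report both for a skewed regret distribution.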
- class pytyche.calibrate.scorecard.DecisionSummary(n_correct, n_false_ship, n_missed_win, decision_matrix, cell_regret)
Bases: object
Summary of oracle-vs-actual decision accuracy.
Convenience fields (n_correct, n_false_ship, n_missed_win) are derived views of decision_matrix for the current 2-arm phase. When multi-arm support lands, the matrix naturally extends without API breakage.
- n_correct
Number of decisions that matched the oracle.
- n_false_ship
Shipped when oracle says don’t (oracle != SHIP, actual == SHIP).
- n_missed_win
Didn’t ship when oracle says ship (oracle == SHIP, actual != SHIP).
- decision_matrix
{oracle_decision_value: {actual_decision_value: count}}. Keys are Decision enum values (strings).
- Parameters:
n_correct (int)
n_false_ship (int)
n_missed_win (int)
decision_matrix (dict[str, dict[str, int]])
cell_regret (dict[str, dict[str, CellRegretStats]])
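How the three convenience counts relate to decision_matrix can be sketched roughly as below. The string values "ship" and "no_ship" are assumptions made for illustration (the actual Decision enum values are not listed here), and derive_counts is a hypothetical helper, not a pytyche function.

```python
from typing import Dict

# Assumed string value of the SHIP decision; the real enum value may differ.
SHIP = "ship"


def derive_counts(matrix: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Derive the convenience counts from {oracle: {actual: count}} (hypothetical helper)."""
    n_correct = 0
    n_false_ship = 0
    n_missed_win = 0
    for oracle, row in matrix.items():
        for actual, count in row.items():
            if oracle == actual:
                # Diagonal cells: the actual decision matched the oracle.
                n_correct += count
            if oracle != SHIP and actual == SHIP:
                # Shipped although the oracle said not to.
                n_false_ship += count
            if oracle == SHIP and actual != SHIP:
                # Did not ship although the oracle said to.
                n_missed_win += count
    return {
        "n_correct": n_correct,
        "n_false_ship": n_false_ship,
        "n_missed_win": n_missed_win,
    }


# Example 2-arm matrix with made-up counts.
matrix = {
    "ship": {"ship": 6, "no_ship": 2},
    "no_ship": {"ship": 1, "no_ship": 11},
}
counts = derive_counts(matrix)
```

Because the loops never assume a fixed set of keys, the same derivation would keep working if the matrix gained further decision values.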
- class pytyche.calibrate.scorecard.ScenarioScorecard(scenario_id, n_records_total, n_records_used, decision_summary, coverage_rate, bias, rmse, false_ship_rate, missed_win_rate, mean_regret, mean_regret_cpm)
Bases: object
Per-scenario aggregated calibration metrics.
All 7 metric fields are float | None. When n_records_used == 0, all metrics are None.
- scenario_id
Scenario identifier (from CalibrationRecord).
- n_records_total
Count of ALL records with this scenario_id (pre-filter).
- n_records_used
Count of HONEST_ESTIMATE records (post-filter).
- decision_summary
Decision accuracy summary (counts only, no rates).
- coverage_rate
Fraction of CIs containing the true effect [0, 1].
- bias
Mean(estimated_lift - effect) in metric-native units.
- rmse
sqrt(mean((estimated_lift - effect)^2)) in metric-native units.
- false_ship_rate
n_false_ship / n_records_used (total-denominator).
- missed_win_rate
n_missed_win / n_records_used (total-denominator).
- mean_regret
Mean of non-None regret values; None if ALL are None.
- mean_regret_cpm
mean_regret * 1000 if mean_regret is not None, else None.
- Parameters:
scenario_id (str)
n_records_total (int)
n_records_used (int)
decision_summary (DecisionSummary)
coverage_rate (float | None)
bias (float | None)
rmse (float | None)
false_ship_rate (float | None)
missed_win_rate (float | None)
mean_regret (float | None)
mean_regret_cpm (float | None)
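The seven metric formulas listed above can be worked through in a short sketch. RecordSketch and its ci_low / ci_high fields are hypothetical stand-ins (the real CalibrationRecord layout is not shown here), scenario_metrics is an illustrative helper, and treating the CI bounds as inclusive is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecordSketch:
    """Hypothetical stand-in for an already-filtered calibration record."""

    estimated_lift: float
    effect: float
    ci_low: float   # assumed field name
    ci_high: float  # assumed field name
    regret: Optional[float]


METRIC_NAMES = (
    "coverage_rate", "bias", "rmse", "false_ship_rate",
    "missed_win_rate", "mean_regret", "mean_regret_cpm",
)


def scenario_metrics(
    records: List[RecordSketch], n_false_ship: int, n_missed_win: int
) -> Dict[str, Optional[float]]:
    n_used = len(records)
    if n_used == 0:
        # With no usable records, every metric is None.
        return {name: None for name in METRIC_NAMES}
    errors = [r.estimated_lift - r.effect for r in records]
    covered = sum(r.ci_low <= r.effect <= r.ci_high for r in records)
    # Only non-None regret values enter the mean; None if there are none.
    regrets = [r.regret for r in records if r.regret is not None]
    mean_regret = sum(regrets) / len(regrets) if regrets else None
    return {
        "coverage_rate": covered / n_used,
        "bias": sum(errors) / n_used,
        "rmse": math.sqrt(sum(e * e for e in errors) / n_used),
        # Both rates use the total number of used records as denominator.
        "false_ship_rate": n_false_ship / n_used,
        "missed_win_rate": n_missed_win / n_used,
        "mean_regret": mean_regret,
        "mean_regret_cpm": mean_regret * 1000 if mean_regret is not None else None,
    }


sample = [
    RecordSketch(1.5, 1.0, 0.5, 2.5, 0.0),
    RecordSketch(0.5, 1.0, 0.0, 0.75, None),
    RecordSketch(2.0, 1.0, 1.5, 2.5, 0.25),
    RecordSketch(1.0, 1.0, 0.5, 1.5, 0.5),
]
result = scenario_metrics(sample, n_false_ship=1, n_missed_win=0)
```

For these made-up records, two of the four intervals contain the true effect, so coverage_rate is 0.5; the record whose regret is None is excluded from mean_regret but still counts toward every other denominator.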
- pytyche.calibrate.scorecard.compute_scorecard(records)
Group CalibrationRecords by scenario_id and compute per-group metrics.
Filters records to analysis_mode == ClaimLevel.HONEST_ESTIMATE before computing metrics. Both n_records_total (pre-filter) and n_records_used (post-filter) are surfaced on each ScenarioScorecard.
- Parameters:
records (list[CalibrationRecord]) – Flat list of CalibrationRecords from one or more scenarios.
- Return type:
list[ScenarioScorecard]
- Returns:
List of ScenarioScorecards, one per unique scenario_id, sorted by scenario_id for consistent ordering.
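The group, filter, and sort flow described for compute_scorecard can be sketched as below. This is not the library's implementation: RecordSketch, the string constant standing in for ClaimLevel.HONEST_ESTIMATE, and count_records are all hypothetical, and the sketch only reproduces the pre-filter and post-filter counts rather than the full scorecard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Stand-in for ClaimLevel.HONEST_ESTIMATE; the real enum value is assumed.
HONEST_ESTIMATE = "honest_estimate"


@dataclass(frozen=True)
class RecordSketch:
    """Hypothetical record carrying only the fields used for grouping."""

    scenario_id: str
    analysis_mode: str


def count_records(records: List[RecordSketch]) -> List[Tuple[str, int, int]]:
    """Return (scenario_id, n_records_total, n_records_used) per scenario."""
    groups: Dict[str, List[RecordSketch]] = defaultdict(list)
    for record in records:
        groups[record.scenario_id].append(record)

    rows = []
    # Sorting by scenario_id gives a consistent output order.
    for scenario_id in sorted(groups):
        group = groups[scenario_id]
        used = [r for r in group if r.analysis_mode == HONEST_ESTIMATE]
        rows.append((scenario_id, len(group), len(used)))
    return rows


records = [
    RecordSketch("b", HONEST_ESTIMATE),
    RecordSketch("a", "other_mode"),
    RecordSketch("a", HONEST_ESTIMATE),
    RecordSketch("b", HONEST_ESTIMATE),
    RecordSketch("c", "other_mode"),
]
rows = count_records(records)
```

In this sketch, scenario "c" still produces a row even though none of its records survive the filter; that is the n_records_used == 0 case in which the scorecard's metric fields are None.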