Decision-theoretic inputs

Note

All four inputs below are available today. Worked examples of custom GraduationRule implementations and guidance on calibrating loss thresholds are planned additions to this page (see “Planned additions to this page” below).

Pytyche’s stance: the library surfaces decision-theoretic inputs; the operator (or the operator’s policy) makes the decision. Most experimentation platforms ship a verdict (ship / don’t ship) without exposing how the model arrived at it. Pytyche exposes the inputs and lets the operator wire them into whatever rule fits their business.

The inputs

The public surface carries four decision-theoretic quantities across two types:

On RecommendationSummary:

  • expected_loss_baseline: float — per-visitor regret of choosing the baseline arm when the comparison would have been better. Computed as E[max(τ, 0)] over the posterior on the lift τ = E[Y | comparison] − E[Y | baseline]. Outcome units (dollars per visitor for revenue outcomes).

  • expected_loss_comparison: float — per-visitor regret of choosing the comparison arm when the baseline would have been better. E[max(-τ, 0)] over the posterior. Same units as the baseline variant.

  • expected_value_of_one_more_round: float — the expected reduction in decision loss from being able to re-decide after one more round of data at the same per-round n. Same units as the expected losses (loss-reduction per visitor).

    The computation is the closed-form preposterior expected value of sample information (EVSI; Raiffa–Schlaifer two-action form) on a normal approximation of the lift posterior. Let μ and σ be the mean and standard deviation of the scope’s per-draw mean contrast samples. One more round at the same per-round n doubles the data behind the posterior, so the posterior variance halves from σ² to σ²/2, and the preposterior standard deviation of the future posterior mean is s = √(σ² − σ²/2) = σ/√2. Then

    EVOR = s · (φ(z) − z · Φ(−z)),    z = |μ| / s
    

    where φ/Φ are the standard normal pdf/cdf (the unit normal loss integral). Degenerate σ = 0 gives exactly 0.0.

    When to trust it: the normal approximation is CLT-justified because the lift is a mean over visitors; it is least reliable at very small n or for segment scopes with few members. Properties to lean on: it is strictly below the chosen side’s expected loss mid-experiment (the next round reduces risk, never erases it), and it falls to ~0 once the decision is unambiguous — “more data will not change the call” is readable directly from this number. Monte-Carlo simulation over posterior-predictive next rounds was considered and rejected: it requires refits per simulated round, is noisy at practical budgets, and the closed form already has the correct limits.
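The three RecommendationSummary quantities above can be sketched from a vector of posterior draws of τ. This is a minimal illustration, not Pytyche's implementation: the function names, the use of NumPy, and the choice of the population standard deviation (ddof=0) are assumptions made for the example.

```python
import math

import numpy as np


def expected_losses(tau_draws):
    """Monte-Carlo estimates of E[max(tau, 0)] and E[max(-tau, 0)].

    Hypothetical helper: tau_draws are posterior draws of the lift
    tau = E[Y | comparison] - E[Y | baseline].
    """
    tau = np.asarray(tau_draws, dtype=float)
    loss_baseline = float(np.maximum(tau, 0.0).mean())     # regret of keeping the baseline
    loss_comparison = float(np.maximum(-tau, 0.0).mean())  # regret of choosing the comparison
    return loss_baseline, loss_comparison


def expected_value_of_one_more_round(tau_draws):
    """Closed-form EVSI on a normal approximation: s * (pdf(z) - z * cdf(-z))."""
    tau = np.asarray(tau_draws, dtype=float)
    mu = float(tau.mean())
    sigma = float(tau.std())  # assumption: ddof=0; the page only says "sd"
    if sigma == 0.0:
        return 0.0  # degenerate posterior: more data cannot change the call
    s = sigma / math.sqrt(2.0)  # preposterior sd of the future posterior mean
    z = abs(mu) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf at z
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))         # standard normal cdf at -z
    return s * (pdf - z * upper_tail)
```

On draws from a roughly normal posterior with μ > 0, the result should sit below the comparison side's expected loss, and it should shrink toward 0 as |μ|/σ grows.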

On DiscoveredSegment:

  • arm_best_probabilities: dict[str, float] — per-arm posterior probability that this arm is best in this segment, under the shared best-arm rule (control wins a draw exactly when every contrast is non-positive). Keyed by ALL variant names including the control; values sum to 1.0. Computed as per-draw win frequencies of the segment-mean contrast vector — the same computation thompson_allocation floors and returns as allocation weights. Per-segment companion to the global expected-loss contrast.
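The per-draw win-frequency computation described above can be sketched as follows. The function name, the array layout (one row per posterior draw, one column per non-control variant), and the default control name are assumptions for illustration; the flooring step that thompson_allocation applies is not shown.

```python
import numpy as np


def best_arm_frequencies(contrast_draws, treatment_names, control_name="control"):
    """Per-arm win frequencies under the shared best-arm rule.

    contrast_draws: array of shape (n_draws, n_treatments); each entry is a
    treatment-minus-control contrast for one posterior draw (hypothetical layout).
    """
    draws = np.asarray(contrast_draws, dtype=float)
    # Control wins a draw exactly when every contrast is non-positive.
    control_wins = (draws <= 0.0).all(axis=1)
    # Otherwise the treatment with the largest contrast wins the draw.
    best_treatment = draws.argmax(axis=1)
    probs = {control_name: float(control_wins.mean())}
    for j, name in enumerate(treatment_names):
        probs[name] = float(((best_treatment == j) & ~control_wins).mean())
    return probs
```

Because each draw is assigned to exactly one arm, the returned values sum to 1.0 up to floating-point rounding.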

The default graduation rule

Pytyche ships ExpectedLossRule as the default graduation rule. The rule fires for a (treatment, segment) pair when ALL of:

  • expected_loss_comparison < expected_loss_max (operator-set tolerance, outcome units)

  • probability_positive > p_positive_threshold (operator-set per-round threshold)

  • probability_better > p_better_threshold (operator-set per-round threshold)

AND the per-round condition has held across sustained_rounds consecutive rounds in the experiment’s history.
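The firing logic above can be sketched as a standalone predicate. This is not the ExpectedLossRule source: the RoundInputs container and both function names are hypothetical, and the sketch assumes that "consecutive rounds" means the most recent sustained_rounds entries of the history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundInputs:
    """Hypothetical per-round snapshot for one (treatment, segment) pair."""
    expected_loss_comparison: float
    probability_positive: float
    probability_better: float


def per_round_ok(r, expected_loss_max, p_positive_threshold, p_better_threshold):
    """All three per-round conditions must hold (strict inequalities)."""
    return (
        r.expected_loss_comparison < expected_loss_max
        and r.probability_positive > p_positive_threshold
        and r.probability_better > p_better_threshold
    )


def rule_fires(history, *, expected_loss_max, p_positive_threshold,
               p_better_threshold, sustained_rounds):
    """True when the per-round condition held in each of the last sustained_rounds rounds."""
    if sustained_rounds < 1 or len(history) < sustained_rounds:
        return False
    recent = history[-sustained_rounds:]  # assumption: streak ends at the latest round
    return all(
        per_round_ok(r, expected_loss_max, p_positive_threshold, p_better_threshold)
        for r in recent
    )
```

A single failing round inside the window resets the streak, so a noisy early result cannot trigger graduation on its own.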

expected_value_of_one_more_round and arm_best_probabilities are NOT inputs to the default rule. They are surfaced as fields for operator-defined rules that want to consume them: pass a custom GraduationRule implementation to pt.sequential_experiment(graduation_rule=...) to act on the extended inputs.
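As one illustration of how an operator-defined rule might consume expected_value_of_one_more_round, the check below compares the expected loss reduction against the cost of running another round. The GraduationRule interface is not shown on this page, so this is a plain function rather than an implementation of it; the function name and the visitors_affected and round_cost parameters are hypothetical.

```python
def worth_another_round(expected_value_of_one_more_round, visitors_affected, round_cost):
    """Cost-aware continue/stop check (illustrative only).

    expected_value_of_one_more_round: per-visitor loss reduction, outcome units.
    visitors_affected: hypothetical count of visitors the final decision applies to.
    round_cost: hypothetical cost of one more round, in the same outcome units.
    """
    # Scale the per-visitor value to the exposed population before comparing.
    return expected_value_of_one_more_round * visitors_affected > round_cost
```

With a per-visitor value of 0.006 dollars and 100,000 affected visitors, another round is worth up to 600 dollars under this rule of thumb.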

Planned additions to this page

  • Worked examples of custom GraduationRule implementations that consume the extended inputs (cost-aware stopping, per-segment ship-with-caveat, drop-treatment policies)

  • Calibration of expected-loss thresholds (ε_loss, set via expected_loss_max) against domain context

  • Cross-references to pytyche.diagnostics for sample-count and convergence sanity checks to run before trusting any of these inputs

Cross-references

  • Sequential targeting — why Thompson allocation + per-round refits is the substrate.

  • Statistical honesty — why surfacing decision-theoretic inputs is the honest alternative to shipping a verdict.

  • API reference: pytyche.contracts.RecommendationSummary, pytyche.contracts.DiscoveredSegment, pytyche.ExpectedLossRule.