pytyche.contracts

v2 typed contracts for the pytyche analysis pipeline.

This module IS the API reference. Each frozen dataclass defines a contract between pipeline stages, with docstrings documenting fields, invariants, and the containment level it operates at.

Containment chain:

Visitor → Variant → Experiment → Program

Orthogonal axis — Segment:

Visitor → Variant → Experiment → Program
   ↑
Segment ─┘  (cross-cutting rule/group lens)

Key boundaries enforced by types:

  • Observed ↔ truth: ObservedExperimentData has NO truth field. Analysis code structurally cannot peek at ground truth.

  • Analysis ↔ diagnostics: AnalysisResult carries core results. DiagnosticsBundle carries PyMC internals, returned separately.

  • Discovery ↔ internals: DiscoveredSegment exposes segment outputs. The fitted estimator object is never exposed downstream.

Module Attributes

VISITOR_SCHEMA

Required columns and their numpy dtypes for visitor DataFrames.

RESERVED_PROPENSITY_COLUMN

Reserved per-visitor column name for binary-arm propensity scores.

RESERVED_PROPENSITY_PREFIX

Prefix for multi-arm propensity columns (propensity_0, propensity_1, …).

RESERVED_CELL_COLUMN

Reserved per-visitor column name for sequential-experiment cell membership.

RuleClause

Union of all rule clause types.

Functions

is_reserved_propensity_column(name)

Return True if name is a reserved propensity column name.

Classes

AlignedVisitorArray(values, n_visitors)

Array aligned 1:1 with concatenated visitor rows.

AnalysisResult(experiment_id, metric, ...)

Summary analysis surface returned by posterior.analyze().

BetweenRule(feature, low, high)

Numeric range: low <= feature <= high.

CalibrationBundle(observed, truth)

Transparent container pairing observed data with ground truth.

CalibrationRecord(scenario_id, seed, ...)

Per-seed evaluation record produced by the calibration pipeline.

CalibrationTruth(effect, metric_id, ...[, ...])

Ground truth for a single calibration/simulation run.

CartRevenueConfig(categories[, ...])

Cart-based revenue model configuration.

ChannelLift(point_estimate, ci)

Point estimate + interval for a single hurdle channel's lift.

ClaimLevel(*values)

What the operator can claim from the analysis.

Comparison(treatment, probability_positive, ...)

Lean per-treatment global contrast vs the reference arm.

ComparisonResult(baseline, comparison, ...)

Posterior comparison between two variants.

ComparisonRule(feature, operator, threshold)

Numeric threshold: feature <op> threshold.

Decision(*values)

Recommendation decision outcome.

DecisionThresholds([...])

Decision thresholds for recommendation summaries.

Decomposition(conversion_lift, severity_lift)

Conversion/severity decomposition of a hurdle-metric lift.

DecompositionSamples(frequency_lift_samples, ...)

Posterior samples for frequency and severity lift components.

DiagnosticsBundle(inference_data)

Layer 3: PyMC internals.

DiscoveredSegment(id, rule, gate_estimate, ...)

HTE discovery output.

DiscoveryProvenance(gate_estimate, ...)

Compact snapshot of HTE discovery origin.

EqRule(feature, value)

Categorical equality: feature == value.

InRule(feature, values)

Categorical set membership: feature in values.

MetricFamily(*values)

Abstract metric family taxonomy.

ObservedExperimentData(experiment_id, ...)

Input data for a single experiment analysis run.

ProductCategory(name, base_price, price_std, ...)

A single product category in a cart-based revenue model.

RecommendationSummary(treatment, decision, ...)

Recommended decision with its decision-theoretic evidence.

RegisteredSegment(key, rule, provenance, ...)

Operator-reviewed, registry-registered segment.

SegmentRule(description, clauses)

Rule defining a group of visitors.

VariantData(name, visitors, n_visitors, ...)

Per-visitor observations for a single experiment variant.

class pytyche.contracts.MetricFamily(*values)[source]

Bases: StrEnum

Abstract metric family taxonomy.

Determines model structure and decomposition availability.

class pytyche.contracts.Decision(*values)[source]

Bases: StrEnum

Recommendation decision outcome.

  • SHIP: deploy the treatment.

  • CONTINUE: keep collecting data.

  • STOP: abandon the treatment (harmful or futile).

class pytyche.contracts.ClaimLevel(*values)[source]

Bases: StrEnum

What the operator can claim from the analysis.

Describes the evidentiary strength, not the splitting mechanism. Stable across estimator changes (e.g. BCF makes splitting optional).

  • EXPLORATORY: data-driven discovery, not pre-registered.

  • HONEST_ESTIMATE: sample-split or honest-forest estimates.

  • CONFIRMED: replicated in a hold-out experiment.

pytyche.contracts.VISITOR_SCHEMA: dict[str, str] = {'converted': 'bool', 'experiment_id': 'object', 'orders_count': 'int64', 'revenue': 'float64', 'sessions_count': 'int64', 'variant': 'object', 'visitor_id': 'object'}

Required columns and their numpy dtypes for visitor DataFrames.

Both generators and production loaders MUST produce DataFrames with at least these columns. Additional feature columns (segment assignments, device, country, etc.) are permitted and used by HTE discovery.

Invariants:
  • One row per visitor (unique visitor_id).

  • revenue >= 0.

  • Generator expectation: converted implies revenue > 0. Production data may have converted=True, revenue=0 (free trials, coupons) — analysis handles both.
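The invariants above can be checked mechanically. The following is a minimal sketch, assuming rows are plain dicts standing in for DataFrame rows; REQUIRED_COLUMNS and check_visitor_rows are hypothetical names for illustration and are not part of pytyche. dtype checks are omitted.

```python
# Illustrative only: plain-dict rows stand in for a visitor DataFrame.
REQUIRED_COLUMNS = {
    "converted", "experiment_id", "orders_count", "revenue",
    "sessions_count", "variant", "visitor_id",
}


def check_visitor_rows(rows: list) -> list:
    """Return a list of invariant violations (empty when all rows are valid)."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if row["visitor_id"] in seen_ids:
            problems.append(f"row {i}: duplicate visitor_id")
        seen_ids.add(row["visitor_id"])
        if row["revenue"] < 0:
            problems.append(f"row {i}: negative revenue")
    return problems
```

Note that a row with converted=True and revenue=0 passes, matching the production-data allowance described above.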

pytyche.contracts.RESERVED_PROPENSITY_COLUMN: str = 'propensity'

Reserved per-visitor column name for binary-arm propensity scores.

At K=2 this column carries P(Z=1 | x), the probability of assignment to the treatment variant given visitor covariates. At K≥3 the multi-arm equivalents are propensity_1, …, propensity_{K-1} (P(Z=k | x)), following the RESERVED_PROPENSITY_PREFIX pattern.

These columns are NEVER features — the fit-boundary adapter excludes them from the feature matrix X at every K. Use is_reserved_propensity_column to test any column name against the full reserved set.

pytyche.contracts.RESERVED_PROPENSITY_PREFIX: str = 'propensity_'

Prefix for multi-arm propensity columns (propensity_0, propensity_1, …).

Any column whose name is exactly RESERVED_PROPENSITY_COLUMN or matches RESERVED_PROPENSITY_PREFIX + <digits> is reserved and excluded from the feature matrix. The propensity_0 form is included as a deliberate fail-safe — it is not a standard K≥3 propensity column, but admitting it prevents accidental leakage of propensity-like columns into HTE discovery.

pytyche.contracts.RESERVED_CELL_COLUMN: str = 'cell'

Reserved per-visitor column name for sequential-experiment cell membership.

Carries the id of the cell (Control / Explore / Optimized / operator hypothesis cell) that allocated the visitor. Recorded at data-generation time — membership is not derivable from the treatment received, since e.g. an Explore-cell visitor can draw control. Never a feature: the fit-boundary adapter excludes it from X; the single-shot fit path otherwise ignores it. Consumed by pt.sequential_experiment to compute per-cell observations.

pytyche.contracts.is_reserved_propensity_column(name)[source]

Return True if name is a reserved propensity column name.

Reserved names:

  • exactly "propensity" (K=2: P(Z=1 | x))

  • "propensity_<digits>" (K≥3: P(Z=k | x); also matches propensity_0 as a deliberate fail-safe superset)

Any column matching this predicate is excluded from the feature matrix by the fit-boundary extraction adapter.

Parameters:

name (str) – Column name to test.

Return type:

bool

Returns:

True when the column is reserved; False otherwise.
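One way to express the reserved-name rule is a single anchored regular expression. This is a sketch of the documented behavior, not the library's source; is_reserved_sketch is a hypothetical name, and restricting "digits" to ASCII 0-9 is an assumption.

```python
import re

# Assumed pattern: the bare name, or the prefix followed by one or more ASCII digits.
_RESERVED = re.compile(r"propensity(_[0-9]+)?")


def is_reserved_sketch(name: str) -> bool:
    """Hypothetical re-implementation of the documented predicate."""
    return _RESERVED.fullmatch(name) is not None
```

Under this reading, "propensity_0" is reserved (the fail-safe case), while "propensity_" and "propensity_score" are ordinary feature names.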

class pytyche.contracts.EqRule(feature, value)[source]

Bases: object

Categorical equality: feature == value.

Example: EqRule("lifecycle_stage", "new_visitor").

Parameters:
  • feature (str)

  • value (str)

class pytyche.contracts.InRule(feature, values)[source]

Bases: object

Categorical set membership: feature in values.

Example: InRule("device", ("mobile", "tablet")).

Parameters:
  • feature (str)

  • values (tuple[str, ...])

class pytyche.contracts.ComparisonRule(feature, operator, threshold)[source]

Bases: object

Numeric threshold: feature <op> threshold.

Example: ComparisonRule("age", "gt", 35.0) means age > 35.

Parameters:
  • feature (str)

  • operator (Literal['gt', 'gte', 'lt', 'lte'])

  • threshold (float)

class pytyche.contracts.BetweenRule(feature, low, high)[source]

Bases: object

Numeric range: low <= feature <= high.

Inclusive on both ends.

Example: BetweenRule("spend", 10.0, 100.0).

Parameters:
  • feature (str)

  • low (float)

  • high (float)

pytyche.contracts.RuleClause = pytyche.contracts.EqRule | pytyche.contracts.InRule | pytyche.contracts.ComparisonRule | pytyche.contracts.BetweenRule

Union of all rule clause types. Clauses within a SegmentRule are AND-combined.

class pytyche.contracts.SegmentRule(description, clauses)[source]

Bases: object

Rule defining a group of visitors.

Shared across all segment contexts (manual, discovered, registered). Clauses are AND-combined. Canonical sort by feature name ensures deterministic equality, hashing, and serialization regardless of input order.

clauses=() is the catch-all rule matching every visitor (apply_rule’s AND-fold over zero clauses is vacuously all-True), produced by fit_policy_tree for a root-only (single-leaf) tree.

Level: cross-cutting (applied over visitor sets).
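The AND-combination, the catch-all behavior of clauses=(), and the canonical sort can be sketched with stand-in clause types that mirror the documented constructor fields. Everything below (the stand-in dataclasses, clause_matches, rule_matches, canonical) is illustrative and assumes a visitor is a plain dict; it is not the apply_rule implementation.

```python
import operator as op
from dataclasses import dataclass


@dataclass(frozen=True)
class EqRule:
    feature: str
    value: str


@dataclass(frozen=True)
class InRule:
    feature: str
    values: tuple


@dataclass(frozen=True)
class ComparisonRule:
    feature: str
    operator: str  # 'gt' | 'gte' | 'lt' | 'lte'
    threshold: float


@dataclass(frozen=True)
class BetweenRule:
    feature: str
    low: float
    high: float


_OPS = {"gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le}


def clause_matches(clause, visitor: dict) -> bool:
    value = visitor[clause.feature]
    if isinstance(clause, EqRule):
        return value == clause.value
    if isinstance(clause, InRule):
        return value in clause.values
    if isinstance(clause, ComparisonRule):
        return _OPS[clause.operator](value, clause.threshold)
    if isinstance(clause, BetweenRule):
        return clause.low <= value <= clause.high  # inclusive on both ends
    raise TypeError(f"unknown clause type: {type(clause).__name__}")


def rule_matches(clauses, visitor: dict) -> bool:
    # AND-fold: all() over zero clauses is True, so clauses=() matches everyone.
    return all(clause_matches(c, visitor) for c in clauses)


def canonical(clauses) -> tuple:
    # Sort by feature name so input order does not affect equality or hashing.
    # Clauses sharing a feature keep their input order in this sketch.
    return tuple(sorted(clauses, key=lambda c: c.feature))
```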

class pytyche.contracts.DiscoveredSegment(id, rule, gate_estimate, gate_ci, population_share, stability_score, arm_best_probabilities)[source]

Bases: object

HTE discovery output. Tight — no optional fields.

Produced by the HTE estimation pipeline (the embedded policy-tree fit over posterior CATEs). The fitted estimator is NOT exposed — downstream sees only this output.

Level: cross-cutting (segment × experiment).

id

Leaf id within the parent policy tree. Identifies the segment’s tree position so PolicyTreeResult.allocation_map[id] lookups work.

rule

The segment-defining rule.

gate_estimate

Estimated treatment effect for this segment (metric-native units).

gate_ci

80% credible/confidence interval for the gate estimate.

population_share

Fraction of the population assigned to this segment, in [0, 1].

stability_score

Bootstrap-replicability score in [0, 1] — the fraction of bootstrap tree refits in which some leaf has Jaccard overlap >= 0.5 with this segment’s member set. 0.80 is the documented default “credible enough to act on” cutoff. NaN is the documented “not computed” sentinel (e.g. fit_policy_tree(n_bootstrap=0)).

arm_best_probabilities

Per-arm posterior probability that the arm is best in this segment under the shared best-arm rule. Keyed by ALL variant names INCLUDING control (control wins a draw when every contrast is non-positive); values sum to 1.0 within 1e-6.

Parameters:
  • id (int)

  • rule (SegmentRule)

  • gate_estimate (float)

  • gate_ci (tuple[float, float])

  • population_share (float)

  • stability_score (float)

  • arm_best_probabilities (dict[str, float])
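The stability_score definition above reduces to a Jaccard-overlap count over bootstrap refits. A minimal sketch, assuming each refit is represented as a list of leaf member sets; jaccard and stability_score are hypothetical helper names, not pytyche functions:

```python
import math


def jaccard(a: set, b: set) -> float:
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability_score(segment: set, bootstrap_leaves: list, min_overlap: float = 0.5) -> float:
    """Fraction of refits in which at least one leaf overlaps the segment enough.

    bootstrap_leaves holds one list of leaf member sets per bootstrap refit.
    """
    if not bootstrap_leaves:
        return math.nan  # "not computed" sentinel, as with n_bootstrap=0
    hits = sum(
        any(jaccard(segment, leaf) >= min_overlap for leaf in leaves)
        for leaves in bootstrap_leaves
    )
    return hits / len(bootstrap_leaves)
```

Because NaN compares false against any cutoff, a check such as score >= 0.80 rejects a segment whose score was never computed; callers that want to distinguish the two cases should test math.isnan first.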

class pytyche.contracts.DiscoveryProvenance(gate_estimate, stability_score, population_share, discovered_at)[source]

Bases: object

Compact snapshot of HTE discovery origin.

Avoids bloating/duplicating DiscoveredSegment when carried into a RegisteredSegment.

Parameters:
  • gate_estimate (float)

  • stability_score (float)

  • population_share (float)

  • discovered_at (datetime)

class pytyche.contracts.RegisteredSegment(key, rule, provenance, lifecycle)[source]

Bases: object

Operator-reviewed, registry-registered segment.

Level: cross-cutting (segment in registry).

Lifecycle:

HTE discovery → DiscoveredSegment
    → operator review → RegisteredSegment(lifecycle="registered")
    → dbt SQL → Redis → RegisteredSegment(lifecycle="deployed")
    → experiment targeting

key

Snake_case registry identifier (e.g. "high_value_mobile").

rule

The segment-defining rule.

provenance

Discovery origin, or None for manually-defined segments.

lifecycle

Current lifecycle stage.

class pytyche.contracts.VariantData(name, visitors, n_visitors, n_conversions, total_revenue)[source]

Bases: object

Per-visitor observations for a single experiment variant.

Level: variant.

The visitors DataFrame MUST conform to VISITOR_SCHEMA (at minimum). Additional feature columns are permitted.

name

Variant name (e.g. "control", "treatment_a").

visitors

DataFrame with one row per visitor.

n_visitors

Row count (redundant with len(visitors); fail-closed validation).

n_conversions

Count of converted == True rows.

total_revenue

Sum of revenue column.

Parameters:
  • name (str)

  • visitors (DataFrame)

  • n_visitors (int)

  • n_conversions (int)

  • total_revenue (float)

class pytyche.contracts.ObservedExperimentData(experiment_id, metric, variants)[source]

Bases: object

Input data for a single experiment analysis run.

Level: experiment.

This type has NO truth field. Ground truth is structurally excluded so that analysis code cannot peek at it. Generators produce a CalibrationBundle that pairs observed data with truth; the calibration runner unpacks the bundle and passes only the observed data to analyze().

Production path: load_experiment() returns this directly. Simulation path: generate() → CalibrationBundle → runner unpacks.

experiment_id

Unique experiment identifier.

metric

Canonical metric name (e.g. "revenue_per_visitor").

variants

List of variant data, minimum 2. The first variant is conventionally the control/baseline.

Derived accessors (read-only properties, not dataclass fields): control_name — the first variant’s name (the control/reference variant); treatment_names — names of all non-control variants, in variant-list order.

Parameters:
  • experiment_id (str)

  • metric (str)

  • variants (list[VariantData])

property control_name: str

Name of the reference/control variant (variants[0].name).

property treatment_names: tuple[str, ...]

Names of the non-control variants in variant-list order.

Returns a tuple containing the name of each variant in variants[1:]. At K=2 this is a single-element tuple; at K≥3 it carries all treatment variant names.
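The two derived accessors follow directly from the "first variant is the control" convention. The sketch below uses stand-in types (VariantStub, ObservedSketch) that carry only what the accessors need; enforcing the two-variant minimum in __post_init__ is an assumption about where that check lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantStub:
    name: str  # only the field the accessors need


@dataclass(frozen=True)
class ObservedSketch:
    experiment_id: str
    metric: str
    variants: tuple  # VariantStub entries; the first is the control by convention

    def __post_init__(self) -> None:
        if len(self.variants) < 2:
            raise ValueError("an experiment needs at least 2 variants")

    @property
    def control_name(self) -> str:
        return self.variants[0].name

    @property
    def treatment_names(self) -> tuple:
        return tuple(v.name for v in self.variants[1:])
```

The stand-in also has no truth attribute, which is the structural guarantee the class description relies on.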

class pytyche.contracts.AlignedVisitorArray(values, n_visitors)[source]

Bases: object

Array aligned 1:1 with concatenated visitor rows.

Any per-visitor array (e.g. CATE predictions) MUST be wrapped in this type to enforce explicit alignment with the concatenated visitor rows:

visitors = pd.concat([v.visitors for v in data.variants],
                     ignore_index=True)
assert len(array.values) == len(visitors)
# array.values[i] corresponds to visitors.iloc[i]

The type name IS the documentation — when an agent sees cate_per_visitor: AlignedVisitorArray, the alignment contract is self-evident.

values

The per-visitor array.

n_visitors

Expected length (redundant — fail-closed validation).

Parameters:
  • values (ndarray)

  • n_visitors (int)
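The redundant n_visitors field makes misalignment fail at construction time rather than surface later as a silent off-by-N. A minimal sketch of that fail-closed check, assuming it runs in __post_init__ and using a tuple in place of a numpy array; AlignedArraySketch is a hypothetical stand-in:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedArraySketch:
    values: tuple    # stands in for a numpy ndarray
    n_visitors: int

    def __post_init__(self) -> None:
        # Fail closed: refuse to construct a misaligned array.
        if len(self.values) != self.n_visitors:
            raise ValueError(
                f"expected {self.n_visitors} values, got {len(self.values)}"
            )
```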

class pytyche.contracts.DecompositionSamples(frequency_lift_samples, severity_lift_samples)[source]

Bases: object

Posterior samples for frequency and severity lift components.

Only meaningful for hurdle metrics (MetricFamily.HURDLE_REAL). Frequency = conversion probability lift. Severity = AOV lift given conversion.

frequency_lift_samples

Per-sample frequency component lift.

severity_lift_samples

Per-sample severity component lift.

Parameters:
  • frequency_lift_samples (ndarray)

  • severity_lift_samples (ndarray)

class pytyche.contracts.ComparisonResult(baseline, comparison, method, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, expected_loss_samples_baseline, expected_loss_samples_comparison, lift_samples, lift_unit, lift_ci, lift_ci_level=0.8, decomposition=None)[source]

Bases: object

Posterior comparison between two variants.

Level: experiment.

Uses role-based naming: baseline and comparison are roles within THIS comparison, not properties of the variants themselves. The same variant can play different roles in different comparisons.

Lift semantics: lift_samples always contains absolute lift (comparison - baseline) in metric-native units. lift_unit indicates the metric’s natural presentation unit ("pct" for binary metrics, "dollar" for revenue metrics) so display layers can derive percentage lift when rendering.

baseline

Variant name serving as the baseline in this comparison.

comparison

Variant name being compared to the baseline.

method

"compare_to_control" or "best_of_rest".

probability_positive

P(comparison > baseline).

probability_better

P(comparison > baseline + threshold).

probability_harmful

P(baseline > comparison + threshold).

expected_loss_baseline

E[max(comparison - baseline, 0)] — cost of choosing baseline when comparison is better.

expected_loss_comparison

E[max(baseline - comparison, 0)] — cost of choosing comparison when baseline is better.

expected_loss_samples_baseline

Per-sample loss array for baseline.

expected_loss_samples_comparison

Per-sample loss array for comparison.

lift_samples

Absolute lift samples (comparison - baseline) in metric-native units. Always absolute regardless of metric family.

lift_unit

Metric’s natural presentation unit ("pct" for binary, "dollar" for revenue). Display hint only — lift_samples is always absolute.

lift_ci

(low, high) credible interval for lift.

lift_ci_level

CI level (default 0.80).

decomposition

Frequency/severity decomposition (hurdle metrics only).

Parameters:
  • baseline (str)

  • comparison (str)

  • method (Literal['compare_to_control', 'best_of_rest'])

  • probability_positive (float)

  • probability_better (float)

  • probability_harmful (float)

  • expected_loss_baseline (float)

  • expected_loss_comparison (float)

  • expected_loss_samples_baseline (ndarray)

  • expected_loss_samples_comparison (ndarray)

  • lift_samples (ndarray)

  • lift_unit (str)

  • lift_ci (tuple[float, float])

  • lift_ci_level (float)

  • decomposition (DecompositionSamples | None)
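The probability and expected-loss fields are all simple functionals of lift_samples. The sketch below computes them from a plain list of absolute lift draws; summarize_lift and its threshold argument are hypothetical, and the quantile interpolation method is an assumption (the library's choice is not documented here).

```python
import statistics


def summarize_lift(lift_samples: list, threshold: float = 0.0) -> dict:
    """Derive the documented summary quantities from absolute lift samples.

    Each sample is (comparison - baseline) in metric-native units.
    """
    n = len(lift_samples)
    deciles = statistics.quantiles(lift_samples, n=10)  # 9 cut points: 10% ... 90%
    return {
        "probability_positive": sum(x > 0 for x in lift_samples) / n,
        "probability_better": sum(x > threshold for x in lift_samples) / n,
        "probability_harmful": sum(-x > threshold for x in lift_samples) / n,
        # Cost of keeping baseline when comparison is better.
        "expected_loss_baseline": sum(max(x, 0.0) for x in lift_samples) / n,
        # Cost of choosing comparison when baseline is better.
        "expected_loss_comparison": sum(max(-x, 0.0) for x in lift_samples) / n,
        # 80% interval: 10th and 90th percentiles.
        "lift_ci": (deciles[0], deciles[-1]),
    }
```

With threshold=0.0, probability_better coincides with probability_positive; a positive threshold separates "positive" from "meaningfully better".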

class pytyche.contracts.ChannelLift(point_estimate, ci)[source]

Bases: object

Point estimate + interval for a single hurdle channel’s lift.

Level: experiment.

point_estimate

Posterior mean of the channel-specific lift.

ci

80% credible interval (low, high) on the channel-specific lift.

Parameters:
  • point_estimate (float)

  • ci (tuple[float, float])

class pytyche.contracts.Decomposition(conversion_lift, severity_lift)[source]

Bases: object

Conversion/severity decomposition of a hurdle-metric lift.

Lean summary counterpart of DecompositionSamples — point estimates and intervals only, no posterior samples. Populated on Comparison for hurdle posteriors (posterior.has_decomposition() == True).

Level: experiment.

conversion_lift

Change in conversion probability attributable to the treatment.

severity_lift

Change in basket size given conversion attributable to the treatment.

class pytyche.contracts.Comparison(treatment, probability_positive, lift_estimate, lift_ci, decomposition=None)[source]

Bases: object

Lean per-treatment global contrast vs the reference arm.

The v0.2 summary surface carried by AnalysisResult.comparisons — point estimates and probabilities only. The rich sample-carrying ComparisonResult stays the compare.variants output; anything needing posterior samples goes through AnalysisResult.posterior.

Level: experiment.

treatment

Treatment variant name being compared (matches a name in posterior.observed.treatment_names).

probability_positive

P(lift > 0) at the global level.

lift_estimate

Posterior mean of the CATE for this contrast.

lift_ci

80% credible interval on the lift (10th/90th percentile of rpv_cate_samples for this contrast).

decomposition

Conversion/severity decomposition (hurdle posteriors only; None otherwise).

Parameters:
  • treatment (str)

  • probability_positive (float)

  • lift_estimate (float)

  • lift_ci (tuple[float, float])

  • decomposition (Decomposition | None)

class pytyche.contracts.DecisionThresholds(expected_loss_tolerance=0.01, p_positive_threshold=0.95, p_better_threshold=0.8, futility_threshold=0.05, harm_threshold=0.9)[source]

Bases: object

Decision thresholds for recommendation summaries.

All values are probabilities in (0, 1) except expected_loss_tolerance, which is a positive metric-native value.

Parameters:
  • expected_loss_tolerance (float)

  • p_positive_threshold (float)

  • p_better_threshold (float)

  • futility_threshold (float)

  • harm_threshold (float)
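The range constraints stated above lend themselves to a construction-time check. This is a sketch under the assumption that validation happens in __post_init__; the source does not say where, or whether, the real class enforces it. ThresholdsSketch is a hypothetical stand-in using the documented defaults.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ThresholdsSketch:
    expected_loss_tolerance: float = 0.01
    p_positive_threshold: float = 0.95
    p_better_threshold: float = 0.8
    futility_threshold: float = 0.05
    harm_threshold: float = 0.9

    def __post_init__(self) -> None:
        if self.expected_loss_tolerance <= 0:
            raise ValueError("expected_loss_tolerance must be positive")
        for f in fields(self):
            if f.name == "expected_loss_tolerance":
                continue  # metric-native value, not a probability
            value = getattr(self, f.name)
            if not 0.0 < value < 1.0:
                raise ValueError(f"{f.name} must be in (0, 1), got {value}")
```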

class pytyche.contracts.RecommendationSummary(treatment, decision, expected_loss_baseline, expected_loss_comparison, probability_positive, probability_better, probability_harmful, thresholds, *, expected_value_of_one_more_round=nan)[source]

Bases: object

Recommended decision with its decision-theoretic evidence.

The act-now risk assessment for one treatment-vs-control contrast: what committing to either side costs in expectation, how confident the posterior is, what one more round of data is worth — and the default rule’s resulting SHIP / CONTINUE / STOP call. A pure summary of the posterior (no sample arrays); recomputable from any posterior, globally or per-segment.

Level: experiment.

treatment

The treatment variant this summary is for (the contrast’s non-control side).

decision

Ship, continue, or stop.

expected_loss_baseline

Expected loss of choosing baseline.

expected_loss_comparison

Expected loss of choosing comparison.

probability_positive

P(comparison > baseline).

probability_better

P(comparison meaningfully better).

probability_harmful

P(comparison meaningfully harmful).

thresholds

Decision thresholds used (e.g. {"expected_loss_tolerance": 0.001, ...}).

expected_value_of_one_more_round

Information-theoretic value of running one more round of data at the same per-round n, in expected-loss-reduction units (loss/visitor). NaN means the producer did not compute it (the legacy compare.variants path cannot — a ComparisonResult carries no sample-size information). Formula documented in docs/concepts/decision-theoretic-inputs.md.

Parameters:
  • treatment (str)

  • decision (Decision)

  • expected_loss_baseline (float)

  • expected_loss_comparison (float)

  • probability_positive (float)

  • probability_better (float)

  • probability_harmful (float)

  • thresholds (dict[str, float])

  • expected_value_of_one_more_round (float)

class pytyche.contracts.AnalysisResult(experiment_id, metric, comparisons, segments, recommendation, cate_per_visitor, analyzed_at, posterior)[source]

Bases: object

Summary analysis surface returned by posterior.analyze().

Level: experiment.

This is the SUMMARY surface — lean point estimates and probabilities (Comparison entries, discovered segments, the global RecommendationSummary). Anything needing posterior samples goes through posterior (e.g. analysis.posterior.rpv_cate_samples); observed data is reachable as analysis.posterior.observed.

experiment_id

Experiment identifier.

metric

Metric analyzed.

comparisons

One lean Comparison per non-reference treatment.

segments

Segments discovered by the embedded policy-tree fit. Non-optional — an empty list when no segment cleared the min_segment_share threshold, never None.

recommendation

Global RecommendationSummary (the extended shape with expected_value_of_one_more_round). At K ≥ 3 it is computed for the best challenger (largest global posterior-mean contrast).

cate_per_visitor

Posterior-mean CATE per visitor, aligned with concatenated visitor rows. Shape (n,) at K = 2; (n, K−1) per-arm contrasts vs the reference at K ≥ 3.

analyzed_at

Timestamp of analysis completion.

posterior

The fitted posterior the analysis derives from (repr=False — large sample arrays).

property is_calibrated: bool

Whether the underlying posterior has a calibration applied.

Delegates to posterior.is_calibrated.

class pytyche.contracts.DiagnosticsBundle(inference_data)[source]

Bases: NamedTuple

Layer 3: PyMC internals. Not part of the analysis contract.

Transparent container — callers that don’t need diagnostics simply ignore the second element:

result, _ = analyze(data)

Following the ArviZ opinionated Bayes workflow, diagnostics are not optional — every analysis produces traces. analyze() always returns tuple[AnalysisResult, DiagnosticsBundle].

Parameters:

inference_data (DataTree)

inference_data: DataTree

Alias for field number 0

class pytyche.contracts.CalibrationTruth(effect, metric_id, metric_family, effect_components, cate_per_visitor, conv_cate_per_visitor=None, aov_cate_per_visitor=None, p0_per_visitor=None, p1_per_visitor=None, m0_per_visitor=None, m1_per_visitor=None, *, contrast_cate_per_visitor=None, p_per_visitor=None, m_per_visitor=None)[source]

Bases: object

Ground truth for a single calibration/simulation run.

This type exists ONLY in the simulation/calibration path. Production analysis never sees it. The type boundary enforces this:

analyze(ObservedExperimentData) -> AnalysisResult  # no truth
calibrate(AnalysisResult, CalibrationTruth) -> CalibrationRecord

K=2 dispatch: the legacy 1-D fields (cate_per_visitor, conv_cate_per_visitor, aov_cate_per_visitor, p0_per_visitor, p1_per_visitor, m0_per_visitor, m1_per_visitor) are populated and the three new list fields (contrast_cate_per_visitor, p_per_visitor, m_per_visitor) are None.

K≥3 dispatch: cate_per_visitor is None; the legacy paired fields (p0/p1/m0/m1_per_visitor) are None. contrast_cate_per_visitor (length K−1) carries the per-treatment effects (each treatment level vs. control, the heterogeneous CATEs); p_per_visitor and m_per_visitor (each length K) carry the per-visitor potential-outcome truth under each treatment level (index 0 = control).

effect

Absolute metric-native treatment effect (e.g. +$0.12 RPV).

metric_id

Canonical metric name.

metric_family

Abstract family taxonomy value.

effect_components

Decomposition by named component (e.g. {"conv_effect": 0.02, "aov_effect": 0.10}).

cate_per_visitor

Per-visitor true CATEs, aligned with concatenated visitor rows. Populated at K=2; None at K≥3.

conv_cate_per_visitor

Per-visitor conversion CATE (p1 - p0) * m0. Hurdle K=2 only; None for binary or K≥3.

aov_cate_per_visitor

Per-visitor AOV CATE p1 * (m1 - m0). Hurdle K=2 only; None for binary or K≥3.

p0_per_visitor

Per-visitor control conversion probabilities. Hurdle K=2 only; None for binary or K≥3.

p1_per_visitor

Per-visitor treatment conversion probabilities. Hurdle K=2 only; None for binary or K≥3.

m0_per_visitor

Per-visitor control severity means. Hurdle K=2 only; None for binary or K≥3.

m1_per_visitor

Per-visitor treatment severity means. Hurdle K=2 only; None for binary or K≥3.

contrast_cate_per_visitor

Per-treatment-effect per-visitor CATEs (K≥3). Length K−1 list (one entry per treatment level vs. control); each is the heterogeneous treatment effect realized on the visitor rows. None at K=2.

p_per_visitor

Per-visitor conversion potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.

m_per_visitor

Per-visitor severity potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.
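The two hurdle channel formulas, (p1 - p0) * m0 and p1 * (m1 - m0), add up exactly to p1 * m1 - p0 * m0. The sketch below checks this for scalar inputs, assuming the total per-visitor effect on a hurdle metric is expected revenue under treatment minus expected revenue under control; hurdle_cates is a hypothetical helper.

```python
def hurdle_cates(p0: float, p1: float, m0: float, m1: float) -> tuple:
    """Split p1*m1 - p0*m0 into a conversion channel and a severity channel."""
    conv_cate = (p1 - p0) * m0  # conversion change, valued at control severity
    aov_cate = p1 * (m1 - m0)   # severity change, weighted by treatment conversion
    return conv_cate, aov_cate


# Worked example: conversion 10% -> 12%, severity mean $50 -> $55.
conv, aov = hurdle_cates(0.10, 0.12, 50.0, 55.0)
total = 0.12 * 55.0 - 0.10 * 50.0
```

Here the conversion channel contributes about 1.0 and the severity channel about 0.6, summing to the total of about 1.6. The split is not symmetric: valuing the conversion change at m1 and the severity change at p0 would give a different but equally exact decomposition.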

class pytyche.contracts.CalibrationBundle(observed, truth)[source]

Bases: NamedTuple

Transparent container pairing observed data with ground truth.

Unpackable:

observed, truth = bundle

Generators produce this. The calibration runner unpacks it, passes observed to analyze() (which cannot see truth), then evaluates the result against truth separately.

observed: ObservedExperimentData

Alias for field number 0

truth: CalibrationTruth

Alias for field number 1
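The unpack-then-analyze flow described above can be sketched with stand-in types. All names here (ObservedHalf, TruthHalf, BundleSketch, analyze_stub, run_calibration) are hypothetical, and the constant estimate is a placeholder rather than a real analysis.

```python
from typing import NamedTuple


class ObservedHalf(NamedTuple):
    experiment_id: str


class TruthHalf(NamedTuple):
    effect: float


class BundleSketch(NamedTuple):
    observed: ObservedHalf
    truth: TruthHalf


def analyze_stub(observed: ObservedHalf) -> float:
    # Only the observed half is in scope here; truth is unreachable.
    return 0.10  # placeholder estimate for illustration


def run_calibration(bundle: BundleSketch) -> float:
    observed, truth = bundle            # positional unpacking, as documented
    estimate = analyze_stub(observed)   # analysis never receives truth
    return estimate - truth.effect      # evaluation happens outside analysis
```

Because the bundle is a NamedTuple, bundle[0] and bundle.observed refer to the same object, which is what makes the positional unpacking safe.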

class pytyche.contracts.CalibrationRecord(scenario_id, seed, analysis_mode, effect, metric_id, metric_family, effect_components, estimator_id, estimated_lift, ci_low, ci_high, ci_level, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, decision, oracle_decision, decision_correct, regret)[source]

Bases: object

Per-seed evaluation record produced by the calibration pipeline.

Level: program.

Output of calibrate(AnalysisResult, CalibrationTruth, oracle_config). All fields are JSON-serializable (no numpy arrays, no callables).

Uses agent-proof naming:

  • analysis_mode is a ClaimLevel (not bare string).

  • metric_family is a MetricFamily (not bare string).

  • decision is a Decision (not bare string).

  • est_lift_mean → estimated_lift (clearer).

scenario_id

Identifier for the simulation scenario.

seed

Random seed for this run.

analysis_mode

Evidentiary claim level of the analysis.

effect

True planted treatment effect (from truth).

metric_id

Canonical metric name.

metric_family

Abstract family taxonomy.

effect_components

True effect decomposition (from truth).

estimator_id

Model/estimator used (e.g. "hurdle_lognormal").

estimated_lift

Posterior mean of the absolute lift estimate (metric-native units, matching effect).

ci_low

Lower bound of the credible interval.

ci_high

Upper bound of the credible interval.

ci_level

CI level (e.g. 0.80).

probability_positive

P(comparison > baseline).

probability_better

P(comparison > baseline + threshold).

probability_harmful

P(baseline > comparison + threshold).

expected_loss_baseline

Expected loss of choosing baseline.

expected_loss_comparison

Expected loss of choosing comparison.

decision

Recommended decision made.

oracle_decision

The decision the oracle would have made given the true effect. Always a concrete Decision value — never None. Persisted directly from _oracle_decision() so downstream consumers (scorecard, notebooks) never need to re-infer it from decision + decision_correct.

decision_correct

Whether the decision was correct given truth (None if correctness is ambiguous, e.g. true effect near zero and decision is CONTINUE).

regret

Magnitude of decision error in metric-native units (None if not applicable).

Parameters:
  • scenario_id (str)

  • seed (int)

  • analysis_mode (ClaimLevel)

  • effect (float)

  • metric_id (str)

  • metric_family (MetricFamily)

  • effect_components (dict[str, float])

  • estimator_id (str)

  • estimated_lift (float)

  • ci_low (float)

  • ci_high (float)

  • ci_level (float)

  • probability_positive (float)

  • probability_better (float)

  • probability_harmful (float)

  • expected_loss_baseline (float)

  • expected_loss_comparison (float)

  • decision (Decision)

  • oracle_decision (Decision)

  • decision_correct (bool | None)

  • regret (float | None)

class pytyche.contracts.ProductCategory(name, base_price, price_std, base_purchase_prob)[source]

Bases: object

A single product category in a cart-based revenue model.

name

Category identifier (e.g. "budget", "mid", "premium").

base_price

Mean price for this category (in dollars). Must be > 0.

price_std

Standard deviation of within-category price variation. Actual price is drawn from Normal(base_price, price_std), clipped to base_price / 2 minimum (no near-zero prices). Use 0.0 for deterministic prices. Must be >= 0.

base_purchase_prob

Baseline Bernoulli purchase probability for this category (before visitor-level affinity and treatment adjustments). Must be in (0, 1].

Parameters:
  • name (str)

  • base_price (float)

  • price_std (float)

  • base_purchase_prob (float)

class pytyche.contracts.CartRevenueConfig(categories, base_quantity_mu=1.0, base_quantity_sigma=0.0)[source]

Bases: object

Cart-based revenue model configuration.

Revenue for a converter is computed as the sum of prices for categories where a per-visitor Bernoulli event fires. The purchase probability for category j and visitor i is:

purchase_prob_j(i) = sigmoid(
    logit(base_purchase_prob_j)
    + visitor_affinity_j(i)
    + effect_scale * treatment_delta_j(i)
)

The cart sampler distributes the severity surface scalar shift across categories proportionally to each category’s base_purchase_prob (see design doc D9).

When all Bernoulli events fail (empty cart), a minimum-purchase fallback forces the cheapest category.

categories

Ordered list of product categories. Must be non-empty.

base_quantity_mu

Mean of per-converter quantity distribution. Must be > 0.

base_quantity_sigma

Std of per-converter quantity distribution. Must be >= 0.

Parameters:
  • categories (list[ProductCategory])

  • base_quantity_mu (float)

  • base_quantity_sigma (float)
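The purchase-probability formula, the price clipping from ProductCategory, and the empty-cart fallback can be combined into a small sampler sketch. This is an illustration only: categories are plain tuples, visitor affinity and treatment delta default to zero, quantity is ignored, and "cheapest" is taken to mean lowest base_price, which is an assumption. None of the function names below come from pytyche.

```python
import math
import random


def purchase_prob(base_prob, affinity=0.0, effect_scale=1.0, treatment_delta=0.0):
    """sigmoid(logit(base_prob) + affinity + effect_scale * treatment_delta)."""
    if base_prob >= 1.0:
        return 1.0  # logit(1) is +inf; base_purchase_prob may equal 1
    log_odds = math.log(base_prob / (1.0 - base_prob))
    return 1.0 / (1.0 + math.exp(-(log_odds + affinity + effect_scale * treatment_delta)))


def clipped_price(base_price, price_std, rng):
    # Normal draw floored at half the base price (no near-zero prices).
    return max(rng.gauss(base_price, price_std), base_price / 2.0)


def sample_cart_revenue(categories, rng):
    """categories: list of (name, base_price, price_std, base_purchase_prob)."""
    cart = [c for c in categories if rng.random() < purchase_prob(c[3])]
    if not cart:
        # Empty-cart fallback: force the cheapest category.
        cart = [min(categories, key=lambda c: c[1])]
    return sum(clipped_price(c[1], c[2], rng) for c in cart)


rng = random.Random(0)  # seeded for reproducibility
revenue = sample_cart_revenue(
    [("budget", 5.0, 1.0, 0.6), ("premium", 50.0, 10.0, 0.1)], rng
)
```

With zero affinity and zero treatment delta, purchase_prob returns base_purchase_prob unchanged, so the baseline configuration is recovered exactly.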