pytyche.contracts¶

v2 typed contracts for the pytyche analysis pipeline.

This module IS the API reference. Each frozen dataclass defines a contract between pipeline stages, with docstrings documenting fields, invariants, and the containment level it operates at.

Containment chain:

Visitor → Variant → Experiment → Program

Orthogonal axis — Segment:

Visitor → Variant → Experiment → Program
   ↑
Segment ─┘  (cross-cutting rule/group lens)

Key boundaries enforced by types:

Observed ↔ truth: ObservedExperimentData has NO truth field. Analysis code structurally cannot peek at ground truth.
Analysis ↔ diagnostics: AnalysisResult carries core results. DiagnosticsBundle carries PyMC internals, returned separately.
Discovery ↔ internals: DiscoveredSegment exposes segment outputs. The fitted estimator object is never exposed downstream.

Module Attributes

`VISITOR_SCHEMA`	Required columns and their numpy dtypes for visitor DataFrames.
`RESERVED_PROPENSITY_COLUMN`	Reserved per-visitor column name for binary-arm propensity scores.
`RESERVED_PROPENSITY_PREFIX`	Prefix for multi-arm propensity columns (`propensity_0`, `propensity_1`, …).
`RESERVED_CELL_COLUMN`	Reserved per-visitor column name for sequential-experiment cell membership.
`RuleClause`	Union of all rule clause types.

Functions

is_reserved_propensity_column(name)

Return True if name is a reserved propensity column name.

Classes

`AlignedVisitorArray`(values, n_visitors)	Array aligned 1:1 with concatenated visitor rows.
`AnalysisResult`(experiment_id, metric, ...)	Summary analysis surface returned by `posterior.analyze()`.
`BetweenRule`(feature, low, high)	Numeric range: `low <= feature <= high`.
`CalibrationBundle`(observed, truth)	Transparent container pairing observed data with ground truth.
`CalibrationRecord`(scenario_id, seed, ...)	Per-seed evaluation record produced by the calibration pipeline.
`CalibrationTruth`(effect, metric_id, ...[, ...])	Ground truth for a single calibration/simulation run.
`CartRevenueConfig`(categories[, ...])	Cart-based revenue model configuration.
`ChannelLift`(point_estimate, ci)	Point estimate + interval for a single hurdle channel's lift.
`ClaimLevel`(*values)	What the operator can claim from the analysis.
`Comparison`(treatment, probability_positive, ...)	Lean per-treatment global contrast vs the reference arm.
`ComparisonResult`(baseline, comparison, ...)	Posterior comparison between two variants.
`ComparisonRule`(feature, operator, threshold)	Numeric threshold: `feature <op> threshold`.
`Decision`(*values)	Recommendation decision outcome.
`DecisionThresholds`([...])	Decision thresholds for recommendation summaries.
`Decomposition`(conversion_lift, severity_lift)	Conversion/severity decomposition of a hurdle-metric lift.
`DecompositionSamples`(frequency_lift_samples, ...)	Posterior samples for frequency and severity lift components.
`DiagnosticsBundle`(inference_data)	Layer 3: PyMC internals.
`DiscoveredSegment`(id, rule, gate_estimate, ...)	HTE discovery output.
`DiscoveryProvenance`(gate_estimate, ...)	Compact snapshot of HTE discovery origin.
`EqRule`(feature, value)	Categorical equality: `feature == value`.
`InRule`(feature, values)	Categorical set membership: `feature in values`.
`MetricFamily`(*values)	Abstract metric family taxonomy.
`ObservedExperimentData`(experiment_id, ...)	Input data for a single experiment analysis run.
`ProductCategory`(name, base_price, price_std, ...)	A single product category in a cart-based revenue model.
`RecommendationSummary`(treatment, decision, ...)	Recommended decision with its decision-theoretic evidence.
`RegisteredSegment`(key, rule, provenance, ...)	Operator-reviewed, registry-registered segment.
`SegmentRule`(description, clauses)	Rule defining a group of visitors.
`VariantData`(name, visitors, n_visitors, ...)	Per-visitor observations for a single experiment variant.

class pytyche.contracts.MetricFamily(*values)[source]¶

Bases: StrEnum

Abstract metric family taxonomy.

Determines model structure and decomposition availability.

class pytyche.contracts.Decision(*values)[source]¶

Bases: StrEnum

Recommendation decision outcome.

SHIP: deploy the treatment.
CONTINUE: keep collecting data.
STOP: abandon the treatment (harmful or futile).

class pytyche.contracts.ClaimLevel(*values)[source]¶

Bases: StrEnum

What the operator can claim from the analysis.

Describes the evidentiary strength, not the splitting mechanism. Stable across estimator changes (e.g. BCF makes splitting optional).

EXPLORATORY: data-driven discovery, not pre-registered.
HONEST_ESTIMATE: sample-split or honest-forest estimates.
CONFIRMED: replicated in a hold-out experiment.

pytyche.contracts.VISITOR_SCHEMA: dict[str, str] = {'converted': 'bool', 'experiment_id': 'object', 'orders_count': 'int64', 'revenue': 'float64', 'sessions_count': 'int64', 'variant': 'object', 'visitor_id': 'object'}¶

Required columns and their numpy dtypes for visitor DataFrames.

Both generators and production loaders MUST produce DataFrames with at least these columns. Additional feature columns (segment assignments, device, country, etc.) are permitted and used by HTE discovery.

Invariants:

One row per visitor (unique visitor_id).
revenue >= 0.
Generator expectation: converted implies revenue > 0. Production data may have converted=True, revenue=0 (free trials, coupons) — analysis handles both.
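The invariants above can be checked mechanically. The following is a minimal sketch, assuming visitor rows are represented as plain dicts rather than a DataFrame; `check_visitor_rows` and `REQUIRED_COLUMNS` are hypothetical names for illustration, not part of pytyche.

```python
# Hypothetical helper: plain dicts stand in for DataFrame rows.
REQUIRED_COLUMNS = {
    "converted", "experiment_id", "orders_count", "revenue",
    "sessions_count", "variant", "visitor_id",
}


def check_visitor_rows(rows):
    """Return a list of invariant violations; an empty list means the rows pass."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        # One row per visitor: visitor_id must be unique.
        if row["visitor_id"] in seen_ids:
            problems.append(f"row {i}: duplicate visitor_id {row['visitor_id']!r}")
        seen_ids.add(row["visitor_id"])
        # revenue >= 0; converted=True with revenue == 0 is deliberately allowed.
        if row["revenue"] < 0:
            problems.append(f"row {i}: negative revenue")
    return problems


def make_row(visitor_id, revenue, converted, **extra):
    """Build a schema-conforming row; extra feature columns are permitted."""
    row = {
        "converted": converted, "experiment_id": "exp_1", "orders_count": 0,
        "revenue": revenue, "sessions_count": 1, "variant": "control",
        "visitor_id": visitor_id,
    }
    row.update(extra)
    return row


good_rows = [make_row("v1", 0.0, True, device="mobile"), make_row("v2", 12.5, True)]
bad_rows = [make_row("v1", 5.0, True), make_row("v1", -1.0, False)]
```

Here `check_visitor_rows(good_rows)` is empty, while `bad_rows` trips both the uniqueness and the non-negativity check.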

pytyche.contracts.RESERVED_PROPENSITY_COLUMN: str = 'propensity'¶

Reserved per-visitor column name for binary-arm propensity scores.

At K=2 this column carries P(Z=1 | x) — the probability of assignment to the treatment variant given visitor covariates. At K≥3 the multi-arm equivalents are propensity_1 … propensity_{K-1} (P(Z=k | x)), following the RESERVED_PROPENSITY_PREFIX pattern.

These columns are NEVER features — the fit-boundary adapter excludes them from the feature matrix X at every K. Use is_reserved_propensity_column to test any column name against the full reserved set.

pytyche.contracts.RESERVED_PROPENSITY_PREFIX: str = 'propensity_'¶

Prefix for multi-arm propensity columns (propensity_0, propensity_1, …).

Any column whose name is exactly RESERVED_PROPENSITY_COLUMN or matches RESERVED_PROPENSITY_PREFIX + <digits> is reserved and excluded from the feature matrix. The propensity_0 form is reserved as a deliberate fail-safe: it is not a standard K≥3 propensity column, but reserving it prevents accidental leakage of propensity-like columns into HTE discovery.

pytyche.contracts.RESERVED_CELL_COLUMN: str = 'cell'¶

Reserved per-visitor column name for sequential-experiment cell membership.

Carries the id of the cell (Control / Explore / Optimized / operator hypothesis cell) that allocated the visitor. Recorded at data-generation time — membership is not derivable from the treatment received, since e.g. an Explore-cell visitor can draw control. Never a feature: the fit-boundary adapter excludes it from X; the single-shot fit path otherwise ignores it. Consumed by pt.sequential_experiment to compute per-cell observations.

pytyche.contracts.is_reserved_propensity_column(name)[source]¶

Return True if name is a reserved propensity column name.

Reserved names:

exactly "propensity" (K=2: P(Z=1 | x))
"propensity_<digits>" (K≥3: P(Z=k | x); also matches propensity_0 as a deliberate fail-safe superset)

Any column matching this predicate is excluded from the feature matrix by the fit-boundary extraction adapter.

Parameters: name (str) – Column name to test.
Return type: bool
Returns: True when the column is reserved; False otherwise.
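The reserved-name rule can be sketched as follows. This is an illustrative reimplementation, not the library's source: the `_sketch` suffix, the regular expression, and the `feature_columns` helper are assumptions, and ASCII digits are assumed for the `<digits>` part.

```python
import re

RESERVED_PROPENSITY_COLUMN = "propensity"
RESERVED_PROPENSITY_PREFIX = "propensity_"
RESERVED_CELL_COLUMN = "cell"

# The prefix followed by one or more ASCII digits and nothing else.
_MULTI_ARM = re.compile(re.escape(RESERVED_PROPENSITY_PREFIX) + r"[0-9]+")


def is_reserved_propensity_column_sketch(name: str) -> bool:
    """Assumed equivalent of the documented predicate."""
    return name == RESERVED_PROPENSITY_COLUMN or _MULTI_ARM.fullmatch(name) is not None


def feature_columns(columns):
    """Hypothetical adapter step: drop reserved propensity and cell columns."""
    return [
        c for c in columns
        if not is_reserved_propensity_column_sketch(c) and c != RESERVED_CELL_COLUMN
    ]


kept = feature_columns(["device", "propensity", "propensity_2", "cell", "propensity_score"])
```

Note that a name such as `propensity_score` is not reserved under this rule, because the suffix is not purely digits.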

class pytyche.contracts.EqRule(feature, value)[source]¶

Bases: object

Categorical equality: feature == value.

Example: EqRule("lifecycle_stage", "new_visitor").

Parameters:

feature (str)
value (str)

class pytyche.contracts.InRule(feature, values)[source]¶

Bases: object

Categorical set membership: feature in values.

Example: InRule("device", ("mobile", "tablet")).

Parameters:

feature (str)
values (tuple[str, ...])

class pytyche.contracts.ComparisonRule(feature, operator, threshold)[source]¶

Bases: object

Numeric threshold: feature <op> threshold.

Example: ComparisonRule("age", "gt", 35.0) means age > 35.

Parameters:

feature (str)
operator (Literal['gt', 'gte', 'lt', 'lte'])
threshold (float)
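One way to read the four operator literals is as a lookup into Python's `operator` module. The sketch below assumes a visitor is a plain dict of feature values; `ComparisonRuleSketch` and its `matches` method are hypothetical and only mirror the documented fields.

```python
import operator as op
from dataclasses import dataclass

# Map the documented literals to comparison functions.
_OPS = {"gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le}


@dataclass(frozen=True)
class ComparisonRuleSketch:
    feature: str
    operator: str  # one of "gt", "gte", "lt", "lte"
    threshold: float

    def matches(self, visitor: dict) -> bool:
        """Evaluate `feature <op> threshold` for one visitor record."""
        return _OPS[self.operator](visitor[self.feature], self.threshold)


older_than_35 = ComparisonRuleSketch("age", "gt", 35.0)
at_least_35 = ComparisonRuleSketch("age", "gte", 35.0)
```

With these definitions, a visitor aged exactly 35 fails `older_than_35` but passes `at_least_35`.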

class pytyche.contracts.BetweenRule(feature, low, high)[source]¶

Bases: object

Numeric range: low <= feature <= high.

Inclusive on both ends.

Example: BetweenRule("spend", 10.0, 100.0).

Parameters:

feature (str)
low (float)
high (float)

pytyche.contracts.RuleClause = pytyche.contracts.EqRule | pytyche.contracts.InRule | pytyche.contracts.ComparisonRule | pytyche.contracts.BetweenRule¶: Union of all rule clause types. Clauses within a SegmentRule are AND-combined.

class pytyche.contracts.SegmentRule(description, clauses)[source]¶

Bases: object

Rule defining a group of visitors.

Shared across all segment contexts (manual, discovered, registered). Clauses are AND-combined. Canonical sort by feature name ensures deterministic equality, hashing, and serialization regardless of input order.

clauses=() is the catch-all rule matching every visitor (apply_rule’s AND-fold over zero clauses is vacuously all-True), produced by fit_policy_tree for a root-only (single-leaf) tree.

Level: cross-cutting (applied over visitor sets).

Parameters:

description (str)
clauses (tuple[EqRule | InRule | ComparisonRule | BetweenRule, ...])
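The AND-combination, the canonical sort, and the catch-all behaviour of an empty clause tuple can be illustrated with stand-in classes. Everything below is a sketch under assumptions: the `*Sketch` classes and `matches` methods are hypothetical, visitors are plain dicts, and sorting inside `__post_init__` is only one plausible way to get order-independent equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqRuleSketch:
    feature: str
    value: str

    def matches(self, visitor):
        return visitor[self.feature] == self.value


@dataclass(frozen=True)
class BetweenRuleSketch:
    feature: str
    low: float
    high: float

    def matches(self, visitor):
        # Inclusive on both ends.
        return self.low <= visitor[self.feature] <= self.high


@dataclass(frozen=True)
class SegmentRuleSketch:
    description: str
    clauses: tuple

    def __post_init__(self):
        # Canonical order: sort by feature name so input order does not
        # affect equality or hashing. (sorted() is stable for ties.)
        ordered = tuple(sorted(self.clauses, key=lambda c: c.feature))
        object.__setattr__(self, "clauses", ordered)

    def matches(self, visitor):
        # all() over zero clauses is True, which gives the catch-all rule.
        return all(c.matches(visitor) for c in self.clauses)


mobile = EqRuleSketch("device", "mobile")
mid_spend = BetweenRuleSketch("spend", 10.0, 100.0)
rule_a = SegmentRuleSketch("mobile mid spenders", (mobile, mid_spend))
rule_b = SegmentRuleSketch("mobile mid spenders", (mid_spend, mobile))
catch_all = SegmentRuleSketch("everyone", ())
```

`rule_a` and `rule_b` compare equal and hash identically even though their clauses were supplied in different orders.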

class pytyche.contracts.DiscoveredSegment(id, rule, gate_estimate, gate_ci, population_share, stability_score, arm_best_probabilities)[source]¶

Bases: object

HTE discovery output. Tight — no optional fields.

Produced by the HTE estimation pipeline (the embedded policy-tree fit over posterior CATEs). The fitted estimator is NOT exposed — downstream sees only this output.

Level: cross-cutting (segment × experiment).

id¶: Leaf id within the parent policy tree. Identifies the segment’s tree position so PolicyTreeResult.allocation_map[id] lookups work.

rule¶: The segment-defining rule.

gate_estimate¶: Estimated treatment effect for this segment (metric-native units).

gate_ci¶: 80% credible/confidence interval for the gate estimate.

population_share¶: Fraction of the population assigned to this segment, in [0, 1].

stability_score¶: Bootstrap-replicability score in [0, 1] — the fraction of bootstrap tree refits in which some leaf has Jaccard overlap >= 0.5 with this segment’s member set. 0.80 is the documented default “credible enough to act on” cutoff. NaN is the documented “not computed” sentinel (e.g. fit_policy_tree(n_bootstrap=0)).

arm_best_probabilities¶: Per-arm posterior probability that the arm is best in this segment under the shared best-arm rule. Keyed by ALL variant names INCLUDING control (control wins a draw when every contrast is non-positive); values sum to 1.0 within 1e-6.
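The best-arm rule described for arm_best_probabilities can be sketched per posterior draw. This is an assumption-laden illustration: the function name, the input layout (one list of contrast draws per treatment), and the tie-breaking among treatments (first name wins) are not taken from pytyche.

```python
def arm_best_probabilities_sketch(control, contrast_draws):
    """Estimate P(arm is best) from per-draw contrasts versus control.

    contrast_draws maps each treatment name to a list of posterior draws of
    (treatment - control). Control wins a draw when every contrast is <= 0.
    """
    names = list(contrast_draws)
    n_draws = len(contrast_draws[names[0]])
    wins = {control: 0, **{name: 0 for name in names}}
    for d in range(n_draws):
        best = max(names, key=lambda name: contrast_draws[name][d])
        if contrast_draws[best][d] <= 0:
            best = control
        wins[best] += 1
    return {name: count / n_draws for name, count in wins.items()}


probs = arm_best_probabilities_sketch(
    "control",
    {
        "treatment_a": [1.0, -1.0, 0.5, -0.2],
        "treatment_b": [0.5, -0.5, 2.0, 0.0],
    },
)
```

The result is keyed by every variant name, control included, and the values sum to 1 because each draw is awarded to exactly one arm.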

Parameters:

id (int)
rule (SegmentRule)
gate_estimate (float)
gate_ci (tuple[float, float])
population_share (float)
stability_score (float)
arm_best_probabilities (dict[str, float])
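The stability_score definition (fraction of bootstrap refits in which some leaf overlaps the segment with Jaccard >= 0.5) can be sketched with plain sets of visitor ids. The function names and the representation of a refit as a list of member-id sets are assumptions; how pytyche maps bootstrap leaves back to original visitors is not covered here.

```python
import math


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability_score_sketch(segment_members, bootstrap_leaves, min_overlap=0.5):
    """Fraction of refits with at least one leaf overlapping the segment.

    bootstrap_leaves is a list of refits; each refit is a list of sets of
    visitor ids. No refits returns NaN, the "not computed" sentinel.
    """
    if not bootstrap_leaves:
        return float("nan")
    hits = sum(
        any(jaccard(segment_members, leaf) >= min_overlap for leaf in leaves)
        for leaves in bootstrap_leaves
    )
    return hits / len(bootstrap_leaves)


segment = {1, 2, 3, 4}
refits = [
    [{1, 2, 3}, {4, 5}],        # overlap 3/4 with the first leaf: hit
    [{1, 7, 8, 9}, {2, 6}],     # best overlap 1/5: miss
    [{1, 2, 3, 4, 5, 6}],       # overlap 4/6: hit
    [{5, 6}],                   # overlap 0: miss
]
score = stability_score_sketch(segment, refits)
```

In this toy case two of four refits reproduce the segment, which would fall below the documented 0.80 cutoff.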

class pytyche.contracts.DiscoveryProvenance(gate_estimate, stability_score, population_share, discovered_at)[source]¶

Bases: object

Compact snapshot of HTE discovery origin.

Avoids bloating/duplicating DiscoveredSegment when carried into a RegisteredSegment.

Parameters:

gate_estimate (float)
stability_score (float)
population_share (float)
discovered_at (datetime)

class pytyche.contracts.RegisteredSegment(key, rule, provenance, lifecycle)[source]¶

Bases: object

Operator-reviewed, registry-registered segment.

Level: cross-cutting (segment in registry).

Lifecycle:

HTE discovery → DiscoveredSegment
    → operator review → RegisteredSegment(lifecycle="registered")
    → dbt SQL → Redis → RegisteredSegment(lifecycle="deployed")
    → experiment targeting

key¶: Snake_case registry identifier (e.g. "high_value_mobile").

rule¶: The segment-defining rule.

provenance¶: Discovery origin, or None for manually-defined segments.

lifecycle¶: Current lifecycle stage.

Parameters:

key (str)
rule (SegmentRule)
provenance (DiscoveryProvenance | None)
lifecycle (Literal['registered', 'deployed'])

class pytyche.contracts.VariantData(name, visitors, n_visitors, n_conversions, total_revenue)[source]¶

Bases: object

Per-visitor observations for a single experiment variant.

Level: variant.

The visitors DataFrame MUST conform to VISITOR_SCHEMA (at minimum). Additional feature columns are permitted.

name¶: Variant name (e.g. "control", "treatment_a").

visitors¶: DataFrame with one row per visitor.

n_visitors¶: Row count (redundant with len(visitors); fail-closed validation).

n_conversions¶: Count of converted == True rows.

total_revenue¶: Sum of revenue column.

Parameters:

name (str)
visitors (DataFrame)
n_visitors (int)
n_conversions (int)
total_revenue (float)
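The redundant summary fields make fail-closed construction possible: the object refuses to exist if a summary disagrees with the rows. A minimal sketch, assuming a tuple of row dicts in place of the DataFrame and assuming the check happens in `__post_init__`; the source states the fail-closed intent only for n_visitors, so validating the other two fields here is an extrapolation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantDataSketch:
    name: str
    visitors: tuple  # tuple of row dicts; a stand-in for the visitors DataFrame
    n_visitors: int
    n_conversions: int
    total_revenue: float

    def __post_init__(self):
        # Fail closed: raise instead of silently trusting either source.
        if self.n_visitors != len(self.visitors):
            raise ValueError("n_visitors does not match the row count")
        if self.n_conversions != sum(1 for r in self.visitors if r["converted"]):
            raise ValueError("n_conversions does not match converted rows")
        if not math.isclose(self.total_revenue, sum(r["revenue"] for r in self.visitors)):
            raise ValueError("total_revenue does not match the revenue column")


rows = (
    {"visitor_id": "v1", "converted": True, "revenue": 20.0},
    {"visitor_id": "v2", "converted": False, "revenue": 0.0},
)
control = VariantDataSketch("control", rows, 2, 1, 20.0)
```

Passing `n_visitors=3` with the same two rows raises `ValueError` rather than producing an inconsistent object.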

class pytyche.contracts.ObservedExperimentData(experiment_id, metric, variants)[source]¶

Bases: object

Input data for a single experiment analysis run.

Level: experiment.

This type has NO truth field. Ground truth is structurally excluded so that analysis code cannot peek at it. Generators produce a CalibrationBundle that pairs observed data with truth; the calibration runner unpacks the bundle and passes only the observed data to analyze().

Production path: load_experiment() returns this directly. Simulation path: generate() → CalibrationBundle → runner unpacks.

experiment_id¶: Unique experiment identifier.

metric¶: Canonical metric name (e.g. "revenue_per_visitor").

variants¶: List of variant data, minimum 2. The first variant is conventionally the control/baseline.

Derived accessors (read-only properties, not dataclass fields): control_name — the first variant’s name (the control/reference variant); treatment_names — names of all non-control variants, in variant-list order.

Parameters:

experiment_id (str)
metric (str)
variants (list[VariantData])

property control_name: str¶: Name of the reference/control variant (variants[0].name).

property treatment_names: tuple[str, ...]¶

Names of the non-control variants in variant-list order.

Returns a tuple containing the name of each variant in variants[1:]. At K=2 this is a single-element tuple; at K≥3 it carries all treatment variant names.
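The two derived accessors follow directly from the first-variant-is-control convention. A minimal sketch, simplified to carry variant names only; `ObservedSketch` and its `variant_names` field are hypothetical, and enforcing the two-variant minimum at construction is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedSketch:
    experiment_id: str
    metric: str
    variant_names: tuple  # simplification: names only, in variant-list order

    def __post_init__(self):
        if len(self.variant_names) < 2:
            raise ValueError("an experiment needs at least two variants")

    @property
    def control_name(self) -> str:
        # The first variant is the control/reference by convention.
        return self.variant_names[0]

    @property
    def treatment_names(self) -> tuple:
        return tuple(self.variant_names[1:])


two_arm = ObservedSketch("exp_1", "revenue_per_visitor", ("control", "treatment_a"))
three_arm = ObservedSketch(
    "exp_2", "revenue_per_visitor", ("control", "treatment_a", "treatment_b")
)
```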

class pytyche.contracts.AlignedVisitorArray(values, n_visitors)[source]¶

Bases: object

Array aligned 1:1 with concatenated visitor rows.

Any per-visitor array (e.g. CATE predictions) MUST be wrapped in this type to enforce explicit alignment with the concatenated visitor rows:

visitors = pd.concat([v.visitors for v in data.variants],
                     ignore_index=True)
assert len(array.values) == len(visitors)
# array.values[i] corresponds to visitors.iloc[i]

The type name IS the documentation — when an agent sees cate_per_visitor: AlignedVisitorArray, the alignment contract is self-evident.

values¶: The per-visitor array.

n_visitors¶: Expected length (redundant — fail-closed validation).

Parameters:

values (ndarray)
n_visitors (int)

class pytyche.contracts.DecompositionSamples(frequency_lift_samples, severity_lift_samples)[source]¶

Bases: object

Posterior samples for frequency and severity lift components.

Only meaningful for hurdle metrics (MetricFamily.HURDLE_REAL). Frequency = conversion probability lift. Severity = AOV lift given conversion.

frequency_lift_samples¶: Per-sample frequency component lift.

severity_lift_samples¶: Per-sample severity component lift.

Parameters:

frequency_lift_samples (ndarray)
severity_lift_samples (ndarray)

class pytyche.contracts.ComparisonResult(baseline, comparison, method, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, expected_loss_samples_baseline, expected_loss_samples_comparison, lift_samples, lift_unit, lift_ci, lift_ci_level=0.8, decomposition=None)[source]¶

Bases: object

Posterior comparison between two variants.

Level: experiment.

Uses role-based naming: baseline and comparison are roles within THIS comparison, not properties of the variants themselves. The same variant can play different roles in different comparisons.

Lift semantics: lift_samples always contains absolute lift (comparison - baseline) in metric-native units. lift_unit indicates the metric’s natural presentation unit ("pct" for binary metrics, "dollar" for revenue metrics) so display layers can derive percentage lift when rendering.

baseline¶: Variant name serving as the baseline in this comparison.

comparison¶: Variant name being compared to the baseline.

method¶: "compare_to_control" or "best_of_rest".

probability_positive¶: P(comparison > baseline).

probability_better¶: P(comparison > baseline + threshold).

probability_harmful¶: P(baseline > comparison + threshold).

expected_loss_baseline¶: E[max(comparison - baseline, 0)] — cost of choosing baseline when comparison is better.

expected_loss_comparison¶: E[max(baseline - comparison, 0)] — cost of choosing comparison when baseline is better.

expected_loss_samples_baseline¶: Per-sample loss array for baseline.

expected_loss_samples_comparison¶: Per-sample loss array for comparison.

lift_samples¶: Absolute lift samples (comparison - baseline) in metric-native units. Always absolute regardless of metric family.

lift_unit¶: Metric’s natural presentation unit ("pct" for binary, "dollar" for revenue). Display hint only — lift_samples is always absolute.

lift_ci¶: (low, high) credible interval for lift.

lift_ci_level¶: CI level (default 0.80).

decomposition¶: Frequency/severity decomposition (hurdle metrics only).

Parameters:

baseline (str)
comparison (str)
method (Literal['compare_to_control', 'best_of_rest'])
probability_positive (float)
probability_better (float)
probability_harmful (float)
expected_loss_baseline (float)
expected_loss_comparison (float)
expected_loss_samples_baseline (ndarray)
expected_loss_samples_comparison (ndarray)
lift_samples (ndarray)
lift_unit (str)
lift_ci (tuple[float, float])
lift_ci_level (float)
decomposition (DecompositionSamples | None)
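The probability and expected-loss fields above are all functions of the absolute lift samples (comparison - baseline). The sketch below computes them with the standard library; `summarize_lift` and its `threshold` argument are hypothetical names, and a symmetric threshold for "better" and "harmful" is an assumption.

```python
from statistics import fmean


def summarize_lift(lift_samples, threshold):
    """Monte Carlo estimates of the documented summaries from lift samples."""
    n = len(lift_samples)
    return {
        # P(comparison > baseline)
        "probability_positive": sum(x > 0 for x in lift_samples) / n,
        # P(comparison > baseline + threshold)
        "probability_better": sum(x > threshold for x in lift_samples) / n,
        # P(baseline > comparison + threshold), i.e. lift < -threshold
        "probability_harmful": sum(x < -threshold for x in lift_samples) / n,
        # E[max(comparison - baseline, 0)]: cost of keeping the baseline
        "expected_loss_baseline": fmean(max(x, 0.0) for x in lift_samples),
        # E[max(baseline - comparison, 0)]: cost of picking the comparison
        "expected_loss_comparison": fmean(max(-x, 0.0) for x in lift_samples),
    }


summary = summarize_lift([-1.0, 0.5, 1.0, 2.0], threshold=0.75)
```

For these four samples, three are positive, two exceed the threshold, and one falls below its negative, so the three probabilities are 0.75, 0.5, and 0.25.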

class pytyche.contracts.ChannelLift(point_estimate, ci)[source]¶

Bases: object

Point estimate + interval for a single hurdle channel’s lift.

Level: experiment.

point_estimate¶: Posterior mean of the channel-specific lift.

ci¶: 80% credible interval (low, high) on the channel-specific lift.

Parameters:

point_estimate (float)
ci (tuple[float, float])

class pytyche.contracts.Decomposition(conversion_lift, severity_lift)[source]¶

Bases: object

Conversion/severity decomposition of a hurdle-metric lift.

Lean summary counterpart of DecompositionSamples — point estimates and intervals only, no posterior samples. Populated on Comparison for hurdle posteriors (posterior.has_decomposition() == True).

Level: experiment.

conversion_lift¶: Change in conversion probability attributable to the treatment.

severity_lift¶: Change in basket size given conversion attributable to the treatment.

Parameters:

conversion_lift (ChannelLift)
severity_lift (ChannelLift)

class pytyche.contracts.Comparison(treatment, probability_positive, lift_estimate, lift_ci, decomposition=None)[source]¶

Bases: object

Lean per-treatment global contrast vs the reference arm.

The v0.2 summary surface carried by AnalysisResult.comparisons — point estimates and probabilities only. The rich sample-carrying ComparisonResult stays the compare.variants output; anything needing posterior samples goes through AnalysisResult.posterior.

Level: experiment.

treatment¶: Treatment variant name being compared (matches a name in posterior.observed.treatment_names).

probability_positive¶: P(lift > 0) at the global level.

lift_estimate¶: Posterior mean of the CATE for this contrast.

lift_ci¶: 80% credible interval on the lift (10th/90th percentile of rpv_cate_samples for this contrast).

decomposition¶: Conversion/severity decomposition (hurdle posteriors only; None otherwise).

Parameters:

treatment (str)
probability_positive (float)
lift_estimate (float)
lift_ci (tuple[float, float])
decomposition (Decomposition | None)
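A lean summary of this shape can be derived from one contrast's posterior samples. The following is a sketch only: `lean_comparison` is a hypothetical helper, and the percentile interpolation method (`statistics.quantiles` with `method="inclusive"`) is an assumption that may differ from what pytyche uses.

```python
from statistics import fmean, quantiles


def lean_comparison(treatment, cate_samples):
    """Point estimates only: posterior mean, P(lift > 0), 10th/90th percentiles."""
    deciles = quantiles(cate_samples, n=10, method="inclusive")
    return {
        "treatment": treatment,
        "probability_positive": sum(x > 0 for x in cate_samples) / len(cate_samples),
        "lift_estimate": fmean(cate_samples),
        # First and last decile cut points bound the central 80% interval.
        "lift_ci": (deciles[0], deciles[-1]),
    }


samples = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
lean = lean_comparison("treatment_a", samples)
```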

class pytyche.contracts.DecisionThresholds(expected_loss_tolerance=0.01, p_positive_threshold=0.95, p_better_threshold=0.8, futility_threshold=0.05, harm_threshold=0.9)[source]¶

Bases: object

Decision thresholds for recommendation summaries.

All values are probabilities in (0, 1) except expected_loss_tolerance which is a positive metric-native value.

Parameters:

expected_loss_tolerance (float)
p_positive_threshold (float)
p_better_threshold (float)
futility_threshold (float)
harm_threshold (float)
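The stated value ranges lend themselves to a construction-time check. This sketch copies the documented defaults; the class name, the `__post_init__` validation, and the use of `asdict` to obtain a `dict[str, float]` are assumptions about how such a contract could be enforced, not a description of pytyche's code.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class DecisionThresholdsSketch:
    expected_loss_tolerance: float = 0.01
    p_positive_threshold: float = 0.95
    p_better_threshold: float = 0.8
    futility_threshold: float = 0.05
    harm_threshold: float = 0.9

    def __post_init__(self):
        # expected_loss_tolerance is a positive metric-native value.
        if not self.expected_loss_tolerance > 0:
            raise ValueError("expected_loss_tolerance must be > 0")
        # Every other field is a probability strictly inside (0, 1).
        for f in fields(self):
            if f.name == "expected_loss_tolerance":
                continue
            value = getattr(self, f.name)
            if not 0.0 < value < 1.0:
                raise ValueError(f"{f.name} must be in (0, 1), got {value}")


defaults = DecisionThresholdsSketch()
as_mapping = asdict(defaults)  # plain dict of threshold name to value
```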

class pytyche.contracts.RecommendationSummary(treatment, decision, expected_loss_baseline, expected_loss_comparison, probability_positive, probability_better, probability_harmful, thresholds, *, expected_value_of_one_more_round=nan)[source]¶

Bases: object

Recommended decision with its decision-theoretic evidence.

The act-now risk assessment for one treatment-vs-control contrast: what committing to either side costs in expectation, how confident the posterior is, what one more round of data is worth — and the default rule’s resulting SHIP / CONTINUE / STOP call. A pure summary of the posterior (no sample arrays); recomputable from any posterior, globally or per-segment.

Level: experiment.

treatment¶: The treatment variant this summary is for (the contrast’s non-control side).

decision¶: Ship, continue, or stop.

expected_loss_baseline¶: Expected loss of choosing baseline.

expected_loss_comparison¶: Expected loss of choosing comparison.

probability_positive¶: P(comparison > baseline).

probability_better¶: P(comparison meaningfully better).

probability_harmful¶: P(comparison meaningfully harmful).

thresholds¶: Decision thresholds used (e.g. {"expected_loss_tolerance": 0.001, ...}).

expected_value_of_one_more_round¶: Information-theoretic value of running one more round of data at the same per-round n, in expected-loss-reduction units (loss/visitor). NaN means the producer did not compute it (the legacy compare.variants path cannot — a ComparisonResult carries no sample-size information). Formula documented in docs/concepts/decision-theoretic-inputs.md.

Parameters:

treatment (str)
decision (Decision)
expected_loss_baseline (float)
expected_loss_comparison (float)
probability_positive (float)
probability_better (float)
probability_harmful (float)
thresholds (dict[str, float])
expected_value_of_one_more_round (float)

class pytyche.contracts.AnalysisResult(experiment_id, metric, comparisons, segments, recommendation, cate_per_visitor, analyzed_at, posterior)[source]¶

Bases: object

Summary analysis surface returned by posterior.analyze().

Level: experiment.

This is the SUMMARY surface — lean point estimates and probabilities (Comparison entries, discovered segments, the global RecommendationSummary). Anything needing posterior samples goes through posterior (e.g. analysis.posterior.rpv_cate_samples); observed data is reachable as analysis.posterior.observed.

experiment_id¶: Experiment identifier.

metric¶: Metric analyzed.

comparisons¶: One lean Comparison per non-reference treatment.

segments¶: Segments discovered by the embedded policy-tree fit. Non-optional — an empty list when no segment cleared the min_segment_share threshold, never None.

recommendation¶: Global RecommendationSummary (the extended shape with expected_value_of_one_more_round). At K ≥ 3 it is computed for the best challenger (largest global posterior-mean contrast).

cate_per_visitor¶: Posterior-mean CATE per visitor, aligned with concatenated visitor rows. Shape (n,) at K = 2; (n, K − 1) per-arm contrasts vs the reference at K ≥ 3.

analyzed_at¶: Timestamp of analysis completion.

posterior¶: The fitted posterior the analysis derives from (repr=False — large sample arrays).

Parameters:

experiment_id (str)
metric (str)
comparisons (list[Comparison])
segments (list[DiscoveredSegment])
recommendation (RecommendationSummary)
cate_per_visitor (ndarray)
analyzed_at (datetime)
posterior (HurdleBCFResult | ContinuousBCFResult | BinaryBCFResult)

property is_calibrated: bool¶

Whether the underlying posterior has a calibration applied.

Delegates to posterior.is_calibrated.

class pytyche.contracts.DiagnosticsBundle(inference_data)[source]¶

Bases: NamedTuple

Layer 3: PyMC internals. Not part of the analysis contract.

Transparent container — callers that don’t need diagnostics simply ignore the second element:

result, _ = analyze(data)

Following the ArviZ opinionated Bayes workflow, diagnostics are not optional — every analysis produces traces. analyze() always returns tuple[AnalysisResult, DiagnosticsBundle].

Parameters: inference_data (DataTree)

inference_data: DataTree¶: Alias for field number 0

class pytyche.contracts.CalibrationTruth(effect, metric_id, metric_family, effect_components, cate_per_visitor, conv_cate_per_visitor=None, aov_cate_per_visitor=None, p0_per_visitor=None, p1_per_visitor=None, m0_per_visitor=None, m1_per_visitor=None, *, contrast_cate_per_visitor=None, p_per_visitor=None, m_per_visitor=None)[source]¶

Bases: object

Ground truth for a single calibration/simulation run.

This type exists ONLY in the simulation/calibration path. Production analysis never sees it. The type boundary enforces this:

analyze(ObservedExperimentData) -> AnalysisResult  # no truth
calibrate(AnalysisResult, CalibrationTruth) -> CalibrationRecord

K=2 dispatch: the legacy 1-D fields (cate_per_visitor, conv_cate_per_visitor, aov_cate_per_visitor, p0_per_visitor, p1_per_visitor, m0_per_visitor, m1_per_visitor) are populated and the three new list fields (contrast_cate_per_visitor, p_per_visitor, m_per_visitor) are None.

K≥3 dispatch: cate_per_visitor is None; the legacy paired fields (p0/p1/m0/m1_per_visitor) are None. contrast_cate_per_visitor (length K−1) carries the per-treatment effects (each treatment level vs. control, the heterogeneous CATEs); p_per_visitor and m_per_visitor (each length K) carry the per-visitor potential-outcome truth under each treatment level (index 0 = control).

effect¶: Absolute metric-native treatment effect (e.g. +$0.12 RPV).

metric_id¶: Canonical metric name.

metric_family¶: Abstract family taxonomy value.

effect_components¶: Decomposition by named component (e.g. {"conv_effect": 0.02, "aov_effect": 0.10}).

cate_per_visitor¶: Per-visitor true CATEs, aligned with concatenated visitor rows. Populated at K=2; None at K≥3.

conv_cate_per_visitor¶: Per-visitor conversion CATE (p1 - p0) * m0. Hurdle K=2 only; None for binary or K≥3.

aov_cate_per_visitor¶: Per-visitor AOV CATE p1 * (m1 - m0). Hurdle K=2 only; None for binary or K≥3.

p0_per_visitor¶: Per-visitor control conversion probabilities. Hurdle K=2 only; None for binary or K≥3.

p1_per_visitor¶: Per-visitor treatment conversion probabilities. Hurdle K=2 only; None for binary or K≥3.

m0_per_visitor¶: Per-visitor control severity means. Hurdle K=2 only; None for binary or K≥3.

m1_per_visitor¶: Per-visitor treatment severity means. Hurdle K=2 only; None for binary or K≥3.

contrast_cate_per_visitor¶: Per-treatment-effect per-visitor CATEs (K≥3). Length K−1 list (one entry per treatment level vs. control); each is the heterogeneous treatment effect realized on the visitor rows. None at K=2.

p_per_visitor¶: Per-visitor conversion potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.

m_per_visitor¶: Per-visitor severity potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.

Parameters:

effect (float)
metric_id (str)
metric_family (MetricFamily)
effect_components (dict[str, float])
cate_per_visitor (AlignedVisitorArray | None)
conv_cate_per_visitor (AlignedVisitorArray | None)
aov_cate_per_visitor (AlignedVisitorArray | None)
p0_per_visitor (AlignedVisitorArray | None)
p1_per_visitor (AlignedVisitorArray | None)
m0_per_visitor (AlignedVisitorArray | None)
m1_per_visitor (AlignedVisitorArray | None)
contrast_cate_per_visitor (list[AlignedVisitorArray] | None)
p_per_visitor (list[AlignedVisitorArray] | None)
m_per_visitor (list[AlignedVisitorArray] | None)
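The two hurdle components defined above, (p1 - p0) * m0 and p1 * (m1 - m0), add up to p1 * m1 - p0 * m0. Assuming the per-visitor expected outcome is conversion probability times severity mean (an assumption, not stated in this reference), that sum is the total per-visitor CATE. A small numeric check, with a hypothetical helper name:

```python
import math


def hurdle_cate_components(p0, p1, m0, m1):
    """Split a hurdle CATE into its conversion and AOV parts for one visitor."""
    conv_cate = (p1 - p0) * m0   # conversion channel, valued at control severity
    aov_cate = p1 * (m1 - m0)    # severity channel, weighted by treated conversion
    return conv_cate, aov_cate


p0, p1, m0, m1 = 0.10, 0.12, 50.0, 55.0
conv_cate, aov_cate = hurdle_cate_components(p0, p1, m0, m1)
total_cate = p1 * m1 - p0 * m0  # assumed outcome mean: p * m
```

Algebraically, (p1 - p0) * m0 + p1 * (m1 - m0) = p1 * m1 - p0 * m0, since the p1 * m0 terms cancel.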

class pytyche.contracts.CalibrationBundle(observed, truth)[source]¶

Bases: NamedTuple

Transparent container pairing observed data with ground truth.

Unpackable:

observed, truth = bundle

Generators produce this. The calibration runner unpacks it, passes observed to analyze() (which cannot see truth), then evaluates the result against truth separately.

Parameters:

observed (ObservedExperimentData)
truth (CalibrationTruth)

observed: ObservedExperimentData¶: Alias for field number 0

truth: CalibrationTruth¶: Alias for field number 1
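The unpack-then-analyze flow can be sketched with toy NamedTuples. All names below are hypothetical stand-ins, and the difference-in-means estimator merely takes the place of the real analysis; the point is that the analysis function's signature accepts only the observed half of the bundle.

```python
from typing import NamedTuple


class ObservedToy(NamedTuple):
    control_revenue: tuple
    treatment_revenue: tuple


class TruthToy(NamedTuple):
    effect: float


class BundleToy(NamedTuple):
    observed: ObservedToy
    truth: TruthToy


def analyze_toy(observed: ObservedToy) -> float:
    """Stand-in analysis: sees observed data only, so it cannot read truth."""
    def mean(xs):
        return sum(xs) / len(xs)
    return mean(observed.treatment_revenue) - mean(observed.control_revenue)


def run_one(bundle: BundleToy) -> float:
    """Runner: unpack, analyze the observed half, score against truth."""
    observed, truth = bundle
    estimate = analyze_toy(observed)
    return estimate - truth.effect


bundle = BundleToy(ObservedToy((1.0, 3.0), (2.0, 4.0)), TruthToy(0.5))
estimation_error = run_one(bundle)
```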

class pytyche.contracts.CalibrationRecord(scenario_id, seed, analysis_mode, effect, metric_id, metric_family, effect_components, estimator_id, estimated_lift, ci_low, ci_high, ci_level, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, decision, oracle_decision, decision_correct, regret)[source]¶

Bases: object

Per-seed evaluation record produced by the calibration pipeline.

Level: program.

Output of calibrate(AnalysisResult, CalibrationTruth, oracle_config). All fields are JSON-serializable (no numpy arrays, no callables).

Uses agent-proof naming:

analysis_mode is a ClaimLevel (not bare string).
metric_family is a MetricFamily (not bare string).
decision is a Decision (not bare string).
est_lift_mean → estimated_lift (clearer).

scenario_id¶: Identifier for the simulation scenario.

seed¶: Random seed for this run.

analysis_mode¶: Evidentiary claim level of the analysis.

effect¶: True planted treatment effect (from truth).

metric_id¶: Canonical metric name.

metric_family¶: Abstract family taxonomy.

effect_components¶: True effect decomposition (from truth).

estimator_id¶: Model/estimator used (e.g. "hurdle_lognormal").

estimated_lift¶: Posterior mean of the absolute lift estimate (metric-native units, matching effect).

ci_low¶: Lower bound of the credible interval.

ci_high¶: Upper bound of the credible interval.

ci_level¶: CI level (e.g. 0.80).

probability_positive¶: P(comparison > baseline).

probability_better¶: P(comparison > baseline + threshold).

probability_harmful¶: P(baseline > comparison + threshold).

expected_loss_baseline¶: Expected loss of choosing baseline.

expected_loss_comparison¶: Expected loss of choosing comparison.

decision¶: Recommended decision made.

oracle_decision¶: The decision the oracle would have made given the true effect. Always a concrete Decision value — never None. Persisted directly from _oracle_decision() so downstream consumers (scorecard, notebooks) never need to re-infer it from decision + decision_correct.

decision_correct¶: Whether the decision was correct given truth (None if correctness is ambiguous, e.g. true effect near zero and decision is CONTINUE).

regret¶: Magnitude of decision error in metric-native units (None if not applicable).

Parameters:

scenario_id (str)
seed (int)
analysis_mode (ClaimLevel)
effect (float)
metric_id (str)
metric_family (MetricFamily)
effect_components (dict[str, float])
estimator_id (str)
estimated_lift (float)
ci_low (float)
ci_high (float)
ci_level (float)
probability_positive (float)
probability_better (float)
probability_harmful (float)
expected_loss_baseline (float)
expected_loss_comparison (float)
decision (Decision)
oracle_decision (Decision)
decision_correct (bool | None)
regret (float | None)
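Because the enum-typed fields are string enums and the optional fields are None rather than NaN, a record of this kind can round-trip through JSON directly. The sketch uses a trimmed stand-in with a handful of fields; the class names and the lowercase enum values are assumptions, and `(str, Enum)` is used instead of StrEnum only to keep the example runnable on older Python versions.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class DecisionSketch(str, Enum):
    SHIP = "ship"          # values are assumed for illustration
    CONTINUE = "continue"
    STOP = "stop"


@dataclass(frozen=True)
class RecordSketch:
    scenario_id: str
    seed: int
    decision: DecisionSketch
    oracle_decision: DecisionSketch
    decision_correct: Optional[bool]
    regret: Optional[float]


record = RecordSketch("null_effect", 7, DecisionSketch.CONTINUE,
                      DecisionSketch.STOP, None, None)

# String-valued enum members serialize as their string value.
payload = json.loads(json.dumps(asdict(record)))
restored_decision = DecisionSketch(payload["decision"])
```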

class pytyche.contracts.ProductCategory(name, base_price, price_std, base_purchase_prob)[source]¶

Bases: object

A single product category in a cart-based revenue model.

name¶: Category identifier (e.g. "budget", "mid", "premium").

base_price¶: Mean price for this category (in dollars). Must be > 0.

price_std¶: Standard deviation of within-category price variation. Actual price is drawn from Normal(base_price, price_std), clipped to base_price / 2 minimum (no near-zero prices). Use 0.0 for deterministic prices. Must be >= 0.

base_purchase_prob¶: Baseline Bernoulli purchase probability for this category (before visitor-level affinity and treatment adjustments). Must be in (0, 1].

Parameters:

name (str)
base_price (float)
price_std (float)
base_purchase_prob (float)
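The price rule for price_std (a Normal draw, floored at half the base price) can be sketched with the standard library. `draw_price` is a hypothetical helper, and the use of `random.Random.gauss` is an assumption about the sampler.

```python
import random


def draw_price(base_price, price_std, rng):
    """Draw Normal(base_price, price_std), clipped below at base_price / 2."""
    if base_price <= 0:
        raise ValueError("base_price must be > 0")
    if price_std < 0:
        raise ValueError("price_std must be >= 0")
    return max(rng.gauss(base_price, price_std), base_price / 2)


rng = random.Random(0)
deterministic = draw_price(20.0, 0.0, rng)                    # price_std = 0.0
noisy = [draw_price(20.0, 50.0, rng) for _ in range(1000)]    # heavy clipping
```

With `price_std=0.0` every draw equals the base price, and with a large standard deviation no draw falls below half the base price.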

class pytyche.contracts.CartRevenueConfig(categories, base_quantity_mu=1.0, base_quantity_sigma=0.0)[source]¶

Bases: object

Cart-based revenue model configuration.

Revenue for a converter is computed as the sum of prices for categories where a per-visitor Bernoulli event fires. The purchase probability for category j and visitor i is:

purchase_prob_j(i) = sigmoid(
    logit(base_purchase_prob_j)
    + visitor_affinity_j(i)
    + effect_scale * treatment_delta_j(i)
)

The cart sampler distributes the severity surface scalar shift across categories proportionally to each category’s base_purchase_prob (see design doc D9).

When all Bernoulli events fail (empty cart), a minimum-purchase fallback forces the cheapest category.
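The purchase-probability formula and the empty-cart fallback can be sketched as below. The function names are hypothetical, `effect_scale` defaulting to 1.0 is an assumption, and treating `base_purchase_prob == 1.0` as a certain purchase (to avoid an infinite logit) is also an assumption of this sketch.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def purchase_prob(base_purchase_prob, visitor_affinity, treatment_delta,
                  effect_scale=1.0):
    """Shift the base probability on the logit scale, then map back."""
    if base_purchase_prob >= 1.0:
        return 1.0  # assumption: logit(1) is infinite, so purchase is certain
    return sigmoid(
        logit(base_purchase_prob)
        + visitor_affinity
        + effect_scale * treatment_delta
    )


def cart_categories(fired, prices):
    """Categories whose Bernoulli event fired; cheapest one if none did."""
    chosen = [name for name, hit in fired.items() if hit]
    return chosen or [min(prices, key=prices.get)]


prices = {"budget": 10.0, "mid": 40.0, "premium": 120.0}
unchanged = purchase_prob(0.3, 0.0, 0.0)
boosted = purchase_prob(0.3, 0.0, 0.5)
fallback = cart_categories({"budget": False, "mid": False, "premium": False}, prices)
```

With zero affinity and zero treatment delta the formula returns the base probability, because sigmoid is the inverse of logit.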

categories¶: Ordered list of product categories. Must be non-empty.

base_quantity_mu¶: Mean of per-converter quantity distribution. Must be > 0.

base_quantity_sigma¶: Std of per-converter quantity distribution. Must be >= 0.

Parameters:

categories (list[ProductCategory])
base_quantity_mu (float)
base_quantity_sigma (float)