pytyche.contracts
v2 typed contracts for the pytyche analysis pipeline.
This module IS the API reference. Each frozen dataclass defines a contract between pipeline stages, with docstrings documenting fields, invariants, and the containment level it operates at.
Containment chain:
Visitor → Variant → Experiment → Program
Orthogonal axis — Segment:
Visitor → Variant → Experiment → Program
↑
Segment ─┘ (cross-cutting rule/group lens)
Key boundaries enforced by types:
- Observed ↔ truth: ObservedExperimentData has NO truth field. Analysis code structurally cannot peek at ground truth.
- Analysis ↔ diagnostics: AnalysisResult carries core results. DiagnosticsBundle carries PyMC internals, returned separately.
- Discovery ↔ internals: DiscoveredSegment exposes segment outputs. The fitted estimator object is never exposed downstream.
Module Attributes
VISITOR_SCHEMA | Required columns and their numpy dtypes for visitor DataFrames.
RESERVED_PROPENSITY_COLUMN | Reserved per-visitor column name for binary-arm propensity scores.
RESERVED_PROPENSITY_PREFIX | Prefix for multi-arm propensity columns (propensity_0, propensity_1, …).
RESERVED_CELL_COLUMN | Reserved per-visitor column name for sequential-experiment cell membership.
RuleClause | Union of all rule clause types.
Functions
is_reserved_propensity_column(name) | Return True if name is a reserved propensity column name.
Classes
AlignedVisitorArray | Array aligned 1:1 with concatenated visitor rows.
AnalysisResult | Summary analysis surface returned by posterior.analyze().
BetweenRule | Numeric range: low <= feature <= high.
CalibrationBundle | Transparent container pairing observed data with ground truth.
CalibrationRecord | Per-seed evaluation record produced by the calibration pipeline.
CalibrationTruth | Ground truth for a single calibration/simulation run.
CartRevenueConfig | Cart-based revenue model configuration.
ChannelLift | Point estimate + interval for a single hurdle channel's lift.
ClaimLevel | What the operator can claim from the analysis.
Comparison | Lean per-treatment global contrast vs the reference arm.
ComparisonResult | Posterior comparison between two variants.
ComparisonRule | Numeric threshold: feature <op> threshold.
Decision | Recommendation decision outcome.
DecisionThresholds | Decision thresholds for recommendation summaries.
Decomposition | Conversion/severity decomposition of a hurdle-metric lift.
DecompositionSamples | Posterior samples for frequency and severity lift components.
DiagnosticsBundle | Layer 3: PyMC internals.
DiscoveredSegment | HTE discovery output.
DiscoveryProvenance | Compact snapshot of HTE discovery origin.
EqRule | Categorical equality: feature == value.
InRule | Categorical set membership: feature in values.
MetricFamily | Abstract metric family taxonomy.
ObservedExperimentData | Input data for a single experiment analysis run.
ProductCategory | A single product category in a cart-based revenue model.
RecommendationSummary | Recommended decision with its decision-theoretic evidence.
RegisteredSegment | Operator-reviewed, registry-registered segment.
SegmentRule | Rule defining a group of visitors.
VariantData | Per-visitor observations for a single experiment variant.
- class pytyche.contracts.MetricFamily(*values)
Bases: StrEnum
Abstract metric family taxonomy.
Determines model structure and decomposition availability.
- class pytyche.contracts.Decision(*values)
Bases: StrEnum
Recommendation decision outcome.
  - SHIP: deploy the treatment.
  - CONTINUE: keep collecting data.
  - STOP: abandon the treatment (harmful or futile).
- class pytyche.contracts.ClaimLevel(*values)
Bases: StrEnum
What the operator can claim from the analysis.
Describes the evidentiary strength, not the splitting mechanism. Stable across estimator changes (e.g. BCF makes splitting optional).
  - EXPLORATORY: data-driven discovery, not pre-registered.
  - HONEST_ESTIMATE: sample-split or honest-forest estimates.
  - CONFIRMED: replicated in a hold-out experiment.
- pytyche.contracts.VISITOR_SCHEMA: dict[str, str] = {'converted': 'bool', 'experiment_id': 'object', 'orders_count': 'int64', 'revenue': 'float64', 'sessions_count': 'int64', 'variant': 'object', 'visitor_id': 'object'}
Required columns and their numpy dtypes for visitor DataFrames.
Both generators and production loaders MUST produce DataFrames with at least these columns. Additional feature columns (segment assignments, device, country, etc.) are permitted and used by HTE discovery.
- Invariants:
  - One row per visitor (unique visitor_id).
  - revenue >= 0.
  - Generator expectation: converted implies revenue > 0. Production data may have converted=True, revenue=0 (free trials, coupons); analysis handles both.
- pytyche.contracts.RESERVED_PROPENSITY_COLUMN: str = 'propensity'
Reserved per-visitor column name for binary-arm propensity scores.
At K=2 this column carries P(Z=1 | x), the probability of assignment to the treatment variant given visitor covariates. At K≥3 the multi-arm equivalents are propensity_1 … propensity_{K-1} (P(Z=k | x)), following the RESERVED_PROPENSITY_PREFIX pattern.
These columns are NEVER features: the fit-boundary adapter excludes them from the feature matrix X at every K. Use is_reserved_propensity_column to test any column name against the full reserved set.
- pytyche.contracts.RESERVED_PROPENSITY_PREFIX: str = 'propensity_'
Prefix for multi-arm propensity columns (propensity_0, propensity_1, …).
Any column whose name is exactly RESERVED_PROPENSITY_COLUMN or matches RESERVED_PROPENSITY_PREFIX + <digits> is reserved and excluded from the feature matrix. The propensity_0 form is included as a deliberate fail-safe: it is not a standard K≥3 propensity column, but reserving it prevents accidental leakage of propensity-like columns into HTE discovery.
- pytyche.contracts.RESERVED_CELL_COLUMN: str = 'cell'
Reserved per-visitor column name for sequential-experiment cell membership.
Carries the id of the cell (Control / Explore / Optimized / operator hypothesis cell) that allocated the visitor. Recorded at data-generation time, because membership is not derivable from the treatment received (for example, an Explore-cell visitor can draw control). Never a feature: the fit-boundary adapter excludes it from X; the single-shot fit path otherwise ignores it. Consumed by pt.sequential_experiment to compute per-cell observations.
- pytyche.contracts.is_reserved_propensity_column(name)
Return True if name is a reserved propensity column name.
Reserved names:
  - exactly "propensity" (K=2: P(Z=1 | x))
  - "propensity_<digits>" (K≥3: P(Z=k | x); also matches propensity_0 as a deliberate fail-safe superset)
Any column matching this predicate is excluded from the feature matrix by the fit-boundary extraction adapter.
- Parameters: name (str): Column name to test.
- Return type: bool
- Returns: True when the column is reserved; False otherwise.
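The reserved-name predicate described above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: the function name `is_reserved_sketch`, the exact regular expression, and the example column list are assumptions; only the two constants and the matching rule ("propensity" exactly, or the prefix followed by digits) come from the documentation.

```python
import re

# Values copied from the documented module attributes.
RESERVED_PROPENSITY_COLUMN = "propensity"
RESERVED_PROPENSITY_PREFIX = "propensity_"

# Assumed pattern: the prefix followed by one or more ASCII digits, nothing else.
_PREFIXED = re.compile(re.escape(RESERVED_PROPENSITY_PREFIX) + r"[0-9]+")


def is_reserved_sketch(name: str) -> bool:
    """Hypothetical stand-in for is_reserved_propensity_column."""
    return name == RESERVED_PROPENSITY_COLUMN or _PREFIXED.fullmatch(name) is not None


# Hypothetical column list; reserved names are dropped before building X.
columns = ["device", "propensity", "propensity_0", "propensity_2", "propensity_score"]
features = [c for c in columns if not is_reserved_sketch(c)]
```

Under this reading, a name such as propensity_score is not reserved because its suffix is not purely numeric; whether the real predicate agrees on that edge case is not stated in the source.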
- class pytyche.contracts.EqRule(feature, value)
Bases: object
Categorical equality: feature == value.
Example: EqRule("lifecycle_stage", "new_visitor").
- Parameters:
  - feature (str)
  - value (str)
- class pytyche.contracts.InRule(feature, values)
Bases: object
Categorical set membership: feature in values.
Example: InRule("device", ("mobile", "tablet")).
- Parameters:
  - feature (str)
  - values (tuple[str, ...])
- class pytyche.contracts.ComparisonRule(feature, operator, threshold)
Bases: object
Numeric threshold: feature <op> threshold.
Example: ComparisonRule("age", "gt", 35.0) means age > 35.
- Parameters:
  - feature (str)
  - operator (Literal['gt', 'gte', 'lt', 'lte'])
  - threshold (float)
- class pytyche.contracts.BetweenRule(feature, low, high)
Bases: object
Numeric range: low <= feature <= high.
Inclusive on both ends.
Example: BetweenRule("spend", 10.0, 100.0).
- Parameters:
  - feature (str)
  - low (float)
  - high (float)
- pytyche.contracts.RuleClause = pytyche.contracts.EqRule | pytyche.contracts.InRule | pytyche.contracts.ComparisonRule | pytyche.contracts.BetweenRule
Union of all rule clause types. Clauses within a SegmentRule are AND-combined.
- class pytyche.contracts.SegmentRule(description, clauses)
Bases: object
Rule defining a group of visitors.
Shared across all segment contexts (manual, discovered, registered). Clauses are AND-combined. Canonical sort by feature name ensures deterministic equality, hashing, and serialization regardless of input order.
clauses=() is the catch-all rule matching every visitor (apply_rule’s AND-fold over zero clauses is vacuously all-True), produced by fit_policy_tree for a root-only (single-leaf) tree.
Level: cross-cutting (applied over visitor sets).
- Parameters:
  - description (str)
  - clauses (tuple[EqRule | InRule | ComparisonRule | BetweenRule, ...])
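As a rough illustration of how the four clause types and their AND-combination fit together, here is a minimal sketch. It evaluates rules against plain dict rows rather than DataFrames, and the helpers `clause_matches`, `rule_matches`, and `canonical` are hypothetical names, not part of pytyche; the tie-break on `repr` inside the canonical sort is likewise an assumption added so that two clauses on the same feature still sort deterministically.

```python
import operator as op
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class EqRule:
    feature: str
    value: str


@dataclass(frozen=True)
class InRule:
    feature: str
    values: tuple


@dataclass(frozen=True)
class ComparisonRule:
    feature: str
    operator: Literal["gt", "gte", "lt", "lte"]
    threshold: float


@dataclass(frozen=True)
class BetweenRule:
    feature: str
    low: float
    high: float


RuleClause = Union[EqRule, InRule, ComparisonRule, BetweenRule]
_OPS = {"gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le}


def clause_matches(clause: RuleClause, row: dict) -> bool:
    value = row[clause.feature]
    if isinstance(clause, EqRule):
        return value == clause.value
    if isinstance(clause, InRule):
        return value in clause.values
    if isinstance(clause, ComparisonRule):
        return _OPS[clause.operator](value, clause.threshold)
    return clause.low <= value <= clause.high  # BetweenRule, inclusive on both ends


def rule_matches(clauses: tuple, row: dict) -> bool:
    # AND-fold: all() over zero clauses is True, which gives the catch-all rule.
    return all(clause_matches(c, row) for c in clauses)


def canonical(clauses) -> tuple:
    # Sort by feature name so equality and hashing ignore input order.
    return tuple(sorted(clauses, key=lambda c: (c.feature, repr(c))))


mobile_adult = {"device": "mobile", "age": 40.0}
desktop_adult = {"device": "desktop", "age": 40.0}
clauses = (ComparisonRule("age", "gt", 35.0), InRule("device", ("mobile", "tablet")))
```

Because the dataclasses are frozen, the canonical tuple is hashable and can serve as a dictionary key or set member.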
- class pytyche.contracts.DiscoveredSegment(id, rule, gate_estimate, gate_ci, population_share, stability_score, arm_best_probabilities)
Bases: object
HTE discovery output. Tight: no optional fields.
Produced by the HTE estimation pipeline (the embedded policy-tree fit over posterior CATEs). The fitted estimator is NOT exposed; downstream sees only this output.
Level: cross-cutting (segment × experiment).
- id
Leaf id within the parent policy tree. Identifies the segment’s tree position so PolicyTreeResult.allocation_map[id] lookups work.
- rule
The segment-defining rule.
- gate_estimate
Estimated treatment effect for this segment (metric-native units).
- gate_ci
80% credible/confidence interval for the gate estimate.
- population_share
Fraction of the population assigned to this segment, in [0, 1].
- stability_score
Bootstrap-replicability score in [0, 1]: the fraction of bootstrap tree refits in which some leaf has Jaccard overlap >= 0.5 with this segment’s member set. 0.80 is the documented default “credible enough to act on” cutoff. NaN is the documented “not computed” sentinel (e.g. fit_policy_tree(n_bootstrap=0)).
- arm_best_probabilities
Per-arm posterior probability that the arm is best in this segment under the shared best-arm rule. Keyed by ALL variant names INCLUDING control (control wins a draw when every contrast is non-positive); values sum to 1.0 within 1e-6.
- Parameters:
  - id (int)
  - rule (SegmentRule)
  - gate_estimate (float)
  - gate_ci (tuple[float, float])
  - population_share (float)
  - stability_score (float)
  - arm_best_probabilities (dict[str, float])
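The stability_score and arm_best_probabilities definitions can be made concrete with a small sketch. It assumes member sets are sets of visitor ids and that each posterior draw is a list of K−1 contrasts against control; the function names, and the choice of argmax among positive contrasts as the "best-arm rule", are assumptions for illustration rather than the library's code.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a ∩ b| / |a ∪ b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability_score(members: set, refit_leaves: list, cutoff: float = 0.5) -> float:
    """Fraction of bootstrap refits with some leaf overlapping `members` at >= cutoff."""
    if not refit_leaves:
        return float("nan")  # "not computed" sentinel, e.g. zero bootstrap refits
    hits = sum(
        any(jaccard(members, leaf) >= cutoff for leaf in leaves)
        for leaves in refit_leaves
    )
    return hits / len(refit_leaves)


def arm_best_probabilities(control: str, treatments: list, contrast_draws: list) -> dict:
    """Share of draws each arm wins; control wins when every contrast is <= 0."""
    counts = {name: 0 for name in [control, *treatments]}
    for draw in contrast_draws:
        best = max(range(len(draw)), key=draw.__getitem__)
        winner = treatments[best] if draw[best] > 0 else control
        counts[winner] += 1
    return {name: n / len(contrast_draws) for name, n in counts.items()}


# Hypothetical inputs: one segment, four bootstrap refits, four posterior draws.
segment = {1, 2, 3, 4}
refits = [[{1, 2, 3}, {5, 6}], [{1, 9}, {2, 8}], [{1, 2, 3, 4, 5}, {6}], [{7, 8}, {9}]]
score = stability_score(segment, refits)

draws = [[0.1, 0.3], [-0.2, -0.1], [0.4, 0.2], [0.0, -0.5]]
probs = arm_best_probabilities("control", ["a", "b"], draws)
```

Since every draw is assigned to exactly one arm, the probabilities sum to 1.0 by construction, which matches the documented invariant.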
- class pytyche.contracts.DiscoveryProvenance(gate_estimate, stability_score, population_share, discovered_at)
Bases: object
Compact snapshot of HTE discovery origin.
Avoids bloating/duplicating DiscoveredSegment when carried into a RegisteredSegment.
- Parameters:
  - gate_estimate (float)
  - stability_score (float)
  - population_share (float)
  - discovered_at (datetime)
- class pytyche.contracts.RegisteredSegment(key, rule, provenance, lifecycle)
Bases: object
Operator-reviewed, registry-registered segment.
Level: cross-cutting (segment in registry).
Lifecycle:
    HTE discovery → DiscoveredSegment → operator review
    → RegisteredSegment(lifecycle="registered") → dbt SQL → Redis
    → RegisteredSegment(lifecycle="deployed") → experiment targeting
- key
Snake_case registry identifier (e.g. "high_value_mobile").
- rule
The segment-defining rule.
- provenance
Discovery origin, or None for manually-defined segments.
- lifecycle
Current lifecycle stage.
- Parameters:
  - key (str)
  - rule (SegmentRule)
  - provenance (DiscoveryProvenance | None)
  - lifecycle (Literal['registered', 'deployed'])
- class pytyche.contracts.VariantData(name, visitors, n_visitors, n_conversions, total_revenue)
Bases: object
Per-visitor observations for a single experiment variant.
Level: variant.
The visitors DataFrame MUST conform to VISITOR_SCHEMA (at minimum). Additional feature columns are permitted.
- name
Variant name (e.g. "control", "treatment_a").
- visitors
DataFrame with one row per visitor.
- n_visitors
Row count (redundant with len(visitors); fail-closed validation).
- n_conversions
Count of converted == True rows.
- total_revenue
Sum of revenue column.
- Parameters:
  - name (str)
  - visitors (DataFrame)
  - n_visitors (int)
  - n_conversions (int)
  - total_revenue (float)
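One way to read "redundant, fail-closed validation" is that the summary fields are re-derived from the rows at construction time and any mismatch raises. The sketch below illustrates that idea only: `VariantSketch` is a hypothetical stand-in, it uses a tuple of dict rows instead of a DataFrame to stay dependency-free, and which checks the real VariantData performs (and with what tolerance) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSketch:
    """Hypothetical stand-in for VariantData; rows are dicts, not a DataFrame."""

    name: str
    visitors: tuple
    n_visitors: int
    n_conversions: int
    total_revenue: float

    def __post_init__(self) -> None:
        ids = [row["visitor_id"] for row in self.visitors]
        if len(set(ids)) != len(ids):
            raise ValueError("visitor_id must be unique (one row per visitor)")
        if any(row["revenue"] < 0 for row in self.visitors):
            raise ValueError("revenue must be >= 0")
        # Redundant fields are checked against the rows; mismatches fail closed.
        if self.n_visitors != len(self.visitors):
            raise ValueError("n_visitors does not match the row count")
        if self.n_conversions != sum(bool(row["converted"]) for row in self.visitors):
            raise ValueError("n_conversions does not match converted rows")
        revenue_sum = sum(row["revenue"] for row in self.visitors)
        if not math.isclose(self.total_revenue, revenue_sum, abs_tol=1e-9):
            raise ValueError("total_revenue does not match the revenue column")


rows = (
    {"visitor_id": "v1", "converted": True, "revenue": 25.0},
    {"visitor_id": "v2", "converted": False, "revenue": 0.0},
)
control = VariantSketch("control", rows, n_visitors=2, n_conversions=1, total_revenue=25.0)

try:
    VariantSketch("control", rows, n_visitors=3, n_conversions=1, total_revenue=25.0)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```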
- class pytyche.contracts.ObservedExperimentData(experiment_id, metric, variants)
Bases: object
Input data for a single experiment analysis run.
Level: experiment.
This type has NO truth field. Ground truth is structurally excluded so that analysis code cannot peek at it. Generators produce a CalibrationBundle that pairs observed data with truth; the calibration runner unpacks the bundle and passes only the observed data to analyze().
Production path: load_experiment() returns this directly.
Simulation path: generate() → CalibrationBundle → runner unpacks.
- experiment_id
Unique experiment identifier.
- metric
Canonical metric name (e.g. "revenue_per_visitor").
- variants
List of variant data, minimum 2. The first variant is conventionally the control/baseline.
Derived accessors (read-only properties, not dataclass fields): control_name is the first variant’s name (the control/reference variant); treatment_names holds the names of all non-control variants, in variant-list order.
- Parameters:
  - experiment_id (str)
  - metric (str)
  - variants (list[VariantData])
- property control_name: str
Name of the reference/control variant (variants[0].name).
- property treatment_names: tuple[str, ...]
Names of the non-control variants in variant-list order.
Returns a tuple holding the name of each variant in variants[1:]. At K=2 this is a single-element tuple; at K≥3 it carries all treatment variant names.
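The two derived accessors follow directly from the "first variant is the control" convention. A minimal sketch, assuming variants are reduced to their names for brevity; `ObservedSketch` is a hypothetical class, and enforcing the two-variant minimum in `__post_init__` is an assumption about where that check lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedSketch:
    """Hypothetical stand-in; variant_names replaces list[VariantData]."""

    experiment_id: str
    metric: str
    variant_names: tuple

    def __post_init__(self) -> None:
        if len(self.variant_names) < 2:
            raise ValueError("an experiment needs at least 2 variants")

    @property
    def control_name(self) -> str:
        # By convention the first variant is the control/reference arm.
        return self.variant_names[0]

    @property
    def treatment_names(self) -> tuple:
        # Everything after the control, in variant-list order.
        return tuple(self.variant_names[1:])


observed = ObservedSketch(
    "exp_1", "revenue_per_visitor", ("control", "treatment_a", "treatment_b")
)
```

Note that the sketch, like the documented type, has no field through which ground truth could be attached.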
- class pytyche.contracts.AlignedVisitorArray(values, n_visitors)
Bases: object
Array aligned 1:1 with concatenated visitor rows.
Any per-visitor array (e.g. CATE predictions) MUST be wrapped in this type to enforce explicit alignment with the concatenated visitor rows:
    visitors = pd.concat([v.visitors for v in data.variants], ignore_index=True)
    assert len(array.values) == len(visitors)
    # array.values[i] corresponds to visitors.iloc[i]
The type name IS the documentation: when an agent sees cate_per_visitor: AlignedVisitorArray, the alignment contract is self-evident.
- values
The per-visitor array.
- n_visitors
Expected length (redundant; fail-closed validation).
- Parameters:
  - values (ndarray)
  - n_visitors (int)
- class pytyche.contracts.DecompositionSamples(frequency_lift_samples, severity_lift_samples)
Bases: object
Posterior samples for frequency and severity lift components.
Only meaningful for hurdle metrics (MetricFamily.HURDLE_REAL). Frequency = conversion probability lift. Severity = AOV lift given conversion.
- frequency_lift_samples
Per-sample frequency component lift.
- severity_lift_samples
Per-sample severity component lift.
- Parameters:
  - frequency_lift_samples (ndarray)
  - severity_lift_samples (ndarray)
- class pytyche.contracts.ComparisonResult(baseline, comparison, method, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, expected_loss_samples_baseline, expected_loss_samples_comparison, lift_samples, lift_unit, lift_ci, lift_ci_level=0.8, decomposition=None)
Bases: object
Posterior comparison between two variants.
Level: experiment.
Uses role-based naming: baseline and comparison are roles within THIS comparison, not properties of the variants themselves. The same variant can play different roles in different comparisons.
Lift semantics: lift_samples always contains absolute lift (comparison - baseline) in metric-native units. lift_unit indicates the metric’s natural presentation unit ("pct" for binary metrics, "dollar" for revenue metrics) so display layers can derive percentage lift when rendering.
- baseline
Variant name serving as the baseline in this comparison.
- comparison
Variant name being compared to the baseline.
- method
"compare_to_control" or "best_of_rest".
- probability_positive
P(comparison > baseline).
- probability_better
P(comparison > baseline + threshold).
- probability_harmful
P(baseline > comparison + threshold).
- expected_loss_baseline
E[max(comparison - baseline, 0)]: cost of choosing baseline when comparison is better.
- expected_loss_comparison
E[max(baseline - comparison, 0)]: cost of choosing comparison when baseline is better.
- expected_loss_samples_baseline
Per-sample loss array for baseline.
- expected_loss_samples_comparison
Per-sample loss array for comparison.
- lift_samples
Absolute lift samples (comparison - baseline) in metric-native units. Always absolute regardless of metric family.
- lift_unit
Metric’s natural presentation unit ("pct" for binary, "dollar" for revenue). Display hint only; lift_samples is always absolute.
- lift_ci
(low, high) credible interval for lift.
- lift_ci_level
CI level (default 0.80).
- decomposition
Frequency/severity decomposition (hurdle metrics only).
- Parameters:
  - baseline (str)
  - comparison (str)
  - method (Literal['compare_to_control', 'best_of_rest'])
  - probability_positive (float)
  - probability_better (float)
  - probability_harmful (float)
  - expected_loss_baseline (float)
  - expected_loss_comparison (float)
  - expected_loss_samples_baseline (ndarray)
  - expected_loss_samples_comparison (ndarray)
  - lift_samples (ndarray)
  - lift_unit (str)
  - lift_ci (tuple[float, float])
  - lift_ci_level (float)
  - decomposition (DecompositionSamples | None)
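The probability and expected-loss fields are all simple functionals of the absolute lift draws. The sketch below shows one way they could be computed from a list of samples; `summarize_lift` is a hypothetical helper, the samples are made up, and the equal-tailed 10th/90th percentile interval is an assumption about how the default 0.80 level is realised for this type.

```python
from statistics import fmean, quantiles


def summarize_lift(lift_samples: list, threshold: float) -> dict:
    """Summaries of absolute lift draws, where each draw is comparison - baseline."""
    n = len(lift_samples)
    # Inclusive deciles use linear interpolation between order statistics.
    deciles = quantiles(lift_samples, n=10, method="inclusive")
    return {
        # P(comparison > baseline)
        "probability_positive": sum(s > 0 for s in lift_samples) / n,
        # P(comparison > baseline + threshold)
        "probability_better": sum(s > threshold for s in lift_samples) / n,
        # P(baseline > comparison + threshold), i.e. lift < -threshold
        "probability_harmful": sum(s < -threshold for s in lift_samples) / n,
        # E[max(comparison - baseline, 0)]: regret of keeping the baseline
        "expected_loss_baseline": fmean(max(s, 0.0) for s in lift_samples),
        # E[max(baseline - comparison, 0)]: regret of choosing the comparison
        "expected_loss_comparison": fmean(max(-s, 0.0) for s in lift_samples),
        # Equal-tailed 80% interval: 10th and 90th percentiles
        "lift_ci": (deciles[0], deciles[-1]),
    }


# Hypothetical draws, in metric-native units.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
summary = summarize_lift(samples, threshold=0.15)
```

The two expected losses are asymmetric by design: a posterior concentrated on positive lift makes choosing the baseline costly and choosing the comparison cheap.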
- class pytyche.contracts.ChannelLift(point_estimate, ci)
Bases: object
Point estimate + interval for a single hurdle channel’s lift.
Level: experiment.
- point_estimate
Posterior mean of the channel-specific lift.
- ci
80% credible interval (low, high) on the channel-specific lift.
- Parameters:
  - point_estimate (float)
  - ci (tuple[float, float])
- class pytyche.contracts.Decomposition(conversion_lift, severity_lift)
Bases: object
Conversion/severity decomposition of a hurdle-metric lift.
Lean summary counterpart of DecompositionSamples: point estimates and intervals only, no posterior samples. Populated on Comparison for hurdle posteriors (posterior.has_decomposition() == True).
Level: experiment.
- conversion_lift
Change in conversion probability attributable to the treatment.
- severity_lift
Change in basket size given conversion attributable to the treatment.
- Parameters:
  - conversion_lift (ChannelLift)
  - severity_lift (ChannelLift)
- class pytyche.contracts.Comparison(treatment, probability_positive, lift_estimate, lift_ci, decomposition=None)
Bases: object
Lean per-treatment global contrast vs the reference arm.
The v0.2 summary surface carried by AnalysisResult.comparisons: point estimates and probabilities only. The rich sample-carrying ComparisonResult stays the compare.variants output; anything needing posterior samples goes through AnalysisResult.posterior.
Level: experiment.
- treatment
Treatment variant name being compared (matches a name in posterior.observed.treatment_names).
- probability_positive
P(lift > 0) at the global level.
- lift_estimate
Posterior mean of the CATE for this contrast.
- lift_ci
80% credible interval on the lift (10th/90th percentile of rpv_cate_samples for this contrast).
- decomposition
Conversion/severity decomposition (hurdle posteriors only; None otherwise).
- Parameters:
  - treatment (str)
  - probability_positive (float)
  - lift_estimate (float)
  - lift_ci (tuple[float, float])
  - decomposition (Decomposition | None)
- class pytyche.contracts.DecisionThresholds(expected_loss_tolerance=0.01, p_positive_threshold=0.95, p_better_threshold=0.8, futility_threshold=0.05, harm_threshold=0.9)
Bases: object
Decision thresholds for recommendation summaries.
All values are probabilities in (0, 1) except expected_loss_tolerance, which is a positive metric-native value.
- Parameters:
  - expected_loss_tolerance (float)
  - p_positive_threshold (float)
  - p_better_threshold (float)
  - futility_threshold (float)
  - harm_threshold (float)
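The stated value ranges lend themselves to a construction-time check. The following sketch mirrors the documented defaults and range rules; `ThresholdsSketch` is a hypothetical class, and whether the real DecisionThresholds validates in `__post_init__` (or at all) is an assumption. It also shows how such an object could be flattened into the dict[str, float] shape that RecommendationSummary.thresholds carries.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ThresholdsSketch:
    """Hypothetical stand-in using the documented default values."""

    expected_loss_tolerance: float = 0.01
    p_positive_threshold: float = 0.95
    p_better_threshold: float = 0.8
    futility_threshold: float = 0.05
    harm_threshold: float = 0.9

    def __post_init__(self) -> None:
        # The tolerance is a positive metric-native value, not a probability.
        if not self.expected_loss_tolerance > 0:
            raise ValueError("expected_loss_tolerance must be > 0")
        for field_name, value in asdict(self).items():
            if field_name == "expected_loss_tolerance":
                continue
            # Every other field is a probability strictly inside (0, 1).
            if not 0.0 < value < 1.0:
                raise ValueError(f"{field_name} must be in (0, 1), got {value}")


# Override one default, then flatten to a plain dict of floats.
thresholds = asdict(ThresholdsSketch(expected_loss_tolerance=0.001))
```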
- class pytyche.contracts.RecommendationSummary(treatment, decision, expected_loss_baseline, expected_loss_comparison, probability_positive, probability_better, probability_harmful, thresholds, *, expected_value_of_one_more_round=nan)
Bases: object
Recommended decision with its decision-theoretic evidence.
The act-now risk assessment for one treatment-vs-control contrast: what committing to either side costs in expectation, how confident the posterior is, what one more round of data is worth, and the default rule’s resulting SHIP / CONTINUE / STOP call. A pure summary of the posterior (no sample arrays); recomputable from any posterior, globally or per-segment.
Level: experiment.
- treatment
The treatment variant this summary is for (the contrast’s non-control side).
- decision
Ship, continue, or stop.
- expected_loss_baseline
Expected loss of choosing baseline.
- expected_loss_comparison
Expected loss of choosing comparison.
- probability_positive
P(comparison > baseline).
- probability_better
P(comparison meaningfully better).
- probability_harmful
P(comparison meaningfully harmful).
- thresholds
Decision thresholds used (e.g. {"expected_loss_tolerance": 0.001, ...}).
- expected_value_of_one_more_round
Information-theoretic value of running one more round of data at the same per-round n, in expected-loss-reduction units (loss/visitor). NaN means the producer did not compute it (the legacy compare.variants path cannot, because a ComparisonResult carries no sample-size information). Formula documented in docs/concepts/decision-theoretic-inputs.md.
- Parameters:
  - treatment (str)
  - decision (Decision)
  - expected_loss_baseline (float)
  - expected_loss_comparison (float)
  - probability_positive (float)
  - probability_better (float)
  - probability_harmful (float)
  - thresholds (dict[str, float])
  - expected_value_of_one_more_round (float)
- class pytyche.contracts.AnalysisResult(experiment_id, metric, comparisons, segments, recommendation, cate_per_visitor, analyzed_at, posterior)
Bases: object
Summary analysis surface returned by posterior.analyze().
Level: experiment.
This is the SUMMARY surface: lean point estimates and probabilities (Comparison entries, discovered segments, the global RecommendationSummary). Anything needing posterior samples goes through posterior (e.g. analysis.posterior.rpv_cate_samples); observed data is reachable as analysis.posterior.observed.
- experiment_id
Experiment identifier.
- metric
Metric analyzed.
- comparisons
One lean Comparison per non-reference treatment.
- segments
Segments discovered by the embedded policy-tree fit. Non-optional: an empty list when no segment cleared the min_segment_share threshold, never None.
- recommendation
Global RecommendationSummary (the extended shape with expected_value_of_one_more_round). At K ≥ 3 it is computed for the best challenger (largest global posterior-mean contrast).
- cate_per_visitor
Posterior-mean CATE per visitor, aligned with concatenated visitor rows. Shape (n,) at K = 2; (n, K − 1) per-arm contrasts vs the reference at K ≥ 3.
- analyzed_at
Timestamp of analysis completion.
- posterior
The fitted posterior the analysis derives from (repr=False, because it holds large sample arrays).
- Parameters:
  - experiment_id (str)
  - metric (str)
  - comparisons (list[Comparison])
  - segments (list[DiscoveredSegment])
  - recommendation (RecommendationSummary)
  - cate_per_visitor (ndarray)
  - analyzed_at (datetime)
  - posterior (HurdleBCFResult | ContinuousBCFResult | BinaryBCFResult)
- property is_calibrated: bool
Whether the underlying posterior has a calibration applied.
Delegates to posterior.is_calibrated.
- class pytyche.contracts.DiagnosticsBundle(inference_data)
Bases: NamedTuple
Layer 3: PyMC internals. Not part of the analysis contract.
Transparent container. Callers that don’t need diagnostics simply ignore the second element:
    result, _ = analyze(data)
Following the ArviZ opinionated Bayes workflow, diagnostics are not optional: every analysis produces traces. analyze() always returns tuple[AnalysisResult, DiagnosticsBundle].
- Parameters:
  - inference_data (DataTree)
- inference_data: DataTree
Alias for field number 0
- class pytyche.contracts.CalibrationTruth(effect, metric_id, metric_family, effect_components, cate_per_visitor, conv_cate_per_visitor=None, aov_cate_per_visitor=None, p0_per_visitor=None, p1_per_visitor=None, m0_per_visitor=None, m1_per_visitor=None, *, contrast_cate_per_visitor=None, p_per_visitor=None, m_per_visitor=None)
Bases: object
Ground truth for a single calibration/simulation run.
This type exists ONLY in the simulation/calibration path. Production analysis never sees it. The type boundary enforces this:
    analyze(ObservedExperimentData) -> AnalysisResult  # no truth
    calibrate(AnalysisResult, CalibrationTruth) -> CalibrationRecord
K=2 dispatch: the legacy 1-D fields (cate_per_visitor, conv_cate_per_visitor, aov_cate_per_visitor, p0_per_visitor, p1_per_visitor, m0_per_visitor, m1_per_visitor) are populated and the three new list fields (contrast_cate_per_visitor, p_per_visitor, m_per_visitor) are None.
K≥3 dispatch: cate_per_visitor is None; the legacy paired fields (p0/p1/m0/m1_per_visitor) are None. contrast_cate_per_visitor (length K−1) carries the per-treatment effects (each treatment level vs. control, the heterogeneous CATEs); p_per_visitor and m_per_visitor (each length K) carry the per-visitor potential-outcome truth under each treatment level (index 0 = control).
- effect
Absolute metric-native treatment effect (e.g. +$0.12 RPV).
- metric_id
Canonical metric name.
- metric_family
Abstract family taxonomy value.
- effect_components
Decomposition by named component (e.g. {"conv_effect": 0.02, "aov_effect": 0.10}).
- cate_per_visitor
Per-visitor true CATEs, aligned with concatenated visitor rows. Populated at K=2; None at K≥3.
- conv_cate_per_visitor
Per-visitor conversion CATE (p1 - p0) * m0. Hurdle K=2 only; None for binary or K≥3.
- aov_cate_per_visitor
Per-visitor AOV CATE p1 * (m1 - m0). Hurdle K=2 only; None for binary or K≥3.
- p0_per_visitor
Per-visitor control conversion probabilities. Hurdle K=2 only; None for binary or K≥3.
- p1_per_visitor
Per-visitor treatment conversion probabilities. Hurdle K=2 only; None for binary or K≥3.
- m0_per_visitor
Per-visitor control severity means. Hurdle K=2 only; None for binary or K≥3.
- m1_per_visitor
Per-visitor treatment severity means. Hurdle K=2 only; None for binary or K≥3.
- contrast_cate_per_visitor
Per-treatment-effect per-visitor CATEs (K≥3). Length K−1 list (one entry per treatment level vs. control); each is the heterogeneous treatment effect realized on the visitor rows. None at K=2.
- p_per_visitor
Per-visitor conversion potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.
- m_per_visitor
Per-visitor severity potential outcomes under each treatment level (K≥3). Length K, index 0 = control. None at K=2.
- Parameters:
  - effect (float)
  - metric_id (str)
  - metric_family (MetricFamily)
  - effect_components (dict[str, float])
  - cate_per_visitor (AlignedVisitorArray | None)
  - conv_cate_per_visitor (AlignedVisitorArray | None)
  - aov_cate_per_visitor (AlignedVisitorArray | None)
  - p0_per_visitor (AlignedVisitorArray | None)
  - p1_per_visitor (AlignedVisitorArray | None)
  - m0_per_visitor (AlignedVisitorArray | None)
  - m1_per_visitor (AlignedVisitorArray | None)
  - contrast_cate_per_visitor (list[AlignedVisitorArray] | None)
  - p_per_visitor (list[AlignedVisitorArray] | None)
  - m_per_visitor (list[AlignedVisitorArray] | None)
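The two hurdle components, (p1 - p0) * m0 and p1 * (m1 - m0), add up algebraically to p1 * m1 - p0 * m0. If expected revenue per visitor under a hurdle model is taken to be p * m (an assumption here; the source does not state how cate_per_visitor relates to the components), that sum is the total per-visitor effect. The sketch below checks the identity numerically on made-up values.

```python
import math

# Hypothetical per-visitor truth for three visitors.
p0 = [0.10, 0.20, 0.05]  # control conversion probabilities
p1 = [0.12, 0.18, 0.05]  # treatment conversion probabilities
m0 = [40.0, 55.0, 30.0]  # control severity means
m1 = [42.0, 55.0, 33.0]  # treatment severity means

# Conversion channel: change in conversion probability, valued at control severity.
conv_cate = [(b - a) * m for a, b, m in zip(p0, p1, m0)]

# AOV channel: change in severity, weighted by treatment conversion probability.
aov_cate = [b * (n - m) for b, m, n in zip(p1, m0, m1)]

# Total effect under the assumed E[revenue] = p * m.
total_cate = [b * n - a * m for a, b, m, n in zip(p0, p1, m0, m1)]

# The channels sum to the total: (p1 - p0) * m0 + p1 * (m1 - m0) = p1 * m1 - p0 * m0.
identity_holds = all(
    math.isclose(c + a, t, abs_tol=1e-12)
    for c, a, t in zip(conv_cate, aov_cate, total_cate)
)
```

The asymmetry (m0 in one term, p1 in the other) is what makes the two channels sum exactly, with no leftover interaction term.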
- class pytyche.contracts.CalibrationBundle(observed, truth)
Bases: NamedTuple
Transparent container pairing observed data with ground truth.
Unpackable:
    observed, truth = bundle
Generators produce this. The calibration runner unpacks it, passes observed to analyze() (which cannot see truth), then evaluates the result against truth separately.
- Parameters:
  - observed (ObservedExperimentData)
  - truth (CalibrationTruth)
- observed: ObservedExperimentData
Alias for field number 0
- truth: CalibrationTruth
Alias for field number 1
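The unpack-then-analyze flow can be illustrated with toy types. Everything in this sketch is hypothetical (`ToyObserved`, `ToyTruth`, `ToyBundle`, `analyze_sketch`, `run_one`, and the difference-in-means "analysis"); it only demonstrates the structural point that the analysis function's signature accepts the observed half and nothing else.

```python
from statistics import fmean
from typing import NamedTuple


class ToyObserved(NamedTuple):
    control_revenue: tuple
    treatment_revenue: tuple


class ToyTruth(NamedTuple):
    effect: float


class ToyBundle(NamedTuple):
    observed: ToyObserved
    truth: ToyTruth


def analyze_sketch(observed: ToyObserved) -> float:
    """Difference in mean revenue; there is no parameter through which truth could leak."""
    return fmean(observed.treatment_revenue) - fmean(observed.control_revenue)


def run_one(bundle: ToyBundle) -> dict:
    # The runner is the only place where both halves are visible.
    observed, truth = bundle
    estimate = analyze_sketch(observed)
    return {
        "effect": truth.effect,
        "estimated_lift": estimate,
        "error": estimate - truth.effect,
    }


bundle = ToyBundle(
    ToyObserved((0.0, 10.0, 0.0, 14.0), (0.0, 12.0, 16.0, 0.0)),
    ToyTruth(effect=0.5),
)
record = run_one(bundle)
```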
- class pytyche.contracts.CalibrationRecord(scenario_id, seed, analysis_mode, effect, metric_id, metric_family, effect_components, estimator_id, estimated_lift, ci_low, ci_high, ci_level, probability_positive, probability_better, probability_harmful, expected_loss_baseline, expected_loss_comparison, decision, oracle_decision, decision_correct, regret)
Bases: object
Per-seed evaluation record produced by the calibration pipeline.
Level: program.
Output of calibrate(AnalysisResult, CalibrationTruth, oracle_config). All fields are JSON-serializable (no numpy arrays, no callables).
Uses agent-proof naming:
  - analysis_mode is a ClaimLevel (not bare string).
  - metric_family is a MetricFamily (not bare string).
  - decision is a Decision (not bare string).
  - est_lift_mean → estimated_lift (clearer).
- scenario_id
Identifier for the simulation scenario.
- seed
Random seed for this run.
- analysis_mode
Evidentiary claim level of the analysis.
- effect
True planted treatment effect (from truth).
- metric_id
Canonical metric name.
- metric_family
Abstract family taxonomy.
- effect_components
True effect decomposition (from truth).
- estimator_id
Model/estimator used (e.g. "hurdle_lognormal").
- estimated_lift
Posterior mean of the absolute lift estimate (metric-native units, matching effect).
- ci_low
Lower bound of the credible interval.
- ci_high
Upper bound of the credible interval.
- ci_level
CI level (e.g. 0.80).
- probability_positive
P(treatment > baseline).
- probability_better
P(comparison > baseline + threshold).
- probability_harmful
P(baseline > comparison + threshold).
- expected_loss_baseline
Expected loss of choosing baseline.
- expected_loss_comparison
Expected loss of choosing treatment.
- decision
Recommended decision made.
- oracle_decision
The decision the oracle would have made given the true effect. Always a concrete Decision value, never None. Persisted directly from _oracle_decision() so downstream consumers (scorecard, notebooks) never need to re-infer it from decision + decision_correct.
- decision_correct
Whether the decision was correct given truth (None if correctness is ambiguous, e.g. true effect near zero and decision is CONTINUE).
- regret
Magnitude of decision error in metric-native units (None if not applicable).
- Parameters:
  - scenario_id (str)
  - seed (int)
  - analysis_mode (ClaimLevel)
  - effect (float)
  - metric_id (str)
  - metric_family (MetricFamily)
  - effect_components (dict[str, float])
  - estimator_id (str)
  - estimated_lift (float)
  - ci_low (float)
  - ci_high (float)
  - ci_level (float)
  - probability_positive (float)
  - probability_better (float)
  - probability_harmful (float)
  - expected_loss_baseline (float)
  - expected_loss_comparison (float)
  - decision (Decision)
  - oracle_decision (Decision)
  - decision_correct (bool | None)
  - regret (float | None)
- class pytyche.contracts.ProductCategory(name, base_price, price_std, base_purchase_prob)
Bases: object
A single product category in a cart-based revenue model.
- name
Category identifier (e.g. "budget", "mid", "premium").
- base_price
Mean price for this category (in dollars). Must be > 0.
- price_std
Standard deviation of within-category price variation. Actual price is drawn from Normal(base_price, price_std), clipped to base_price / 2 minimum (no near-zero prices). Use 0.0 for deterministic prices. Must be >= 0.
- base_purchase_prob
Baseline Bernoulli purchase probability for this category (before visitor-level affinity and treatment adjustments). Must be in (0, 1].
- Parameters:
  - name (str)
  - base_price (float)
  - price_std (float)
  - base_purchase_prob (float)
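The price rule (a normal draw floored at half the base price, with 0.0 meaning a deterministic price) can be sketched as follows. `draw_price` is a hypothetical helper, not a pytyche function, and the use of Python's `random.Random` as the generator is an assumption; the real sampler may use a different random source.

```python
import random


def draw_price(base_price: float, price_std: float, rng: random.Random) -> float:
    """Draw one price from Normal(base_price, price_std), floored at base_price / 2."""
    if base_price <= 0:
        raise ValueError("base_price must be > 0")
    if price_std < 0:
        raise ValueError("price_std must be >= 0")
    if price_std == 0.0:
        return base_price  # deterministic price
    # The floor prevents near-zero (or negative) prices from the left tail.
    return max(rng.gauss(base_price, price_std), base_price / 2)


rng = random.Random(0)
# Hypothetical category: mean price 20.0 with a deliberately wide spread.
prices = [draw_price(20.0, 15.0, rng) for _ in range(1000)]
fixed = draw_price(20.0, 0.0, rng)
```

Because the floor only acts on the lower tail, the mean of the clipped draws sits slightly above base_price when price_std is large relative to it.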
- class pytyche.contracts.CartRevenueConfig(categories, base_quantity_mu=1.0, base_quantity_sigma=0.0)
Bases: object
Cart-based revenue model configuration.
Revenue for a converter is computed as the sum of prices for categories where a per-visitor Bernoulli event fires. The purchase probability for category j and visitor i is:
    purchase_prob_j(i) = sigmoid(
        logit(base_purchase_prob_j)
        + visitor_affinity_j(i)
        + effect_scale * treatment_delta_j(i)
    )
The cart sampler distributes the severity surface scalar shift across categories proportionally to each category’s base_purchase_prob (see design doc D9).
When all Bernoulli events fail (empty cart), a minimum-purchase fallback forces the cheapest category.
- categories
Ordered list of product categories. Must be non-empty.
- base_quantity_mu
Mean of per-converter quantity distribution. Must be > 0.
- base_quantity_sigma
Std of per-converter quantity distribution. Must be >= 0.
- Parameters:
  - categories (list[ProductCategory])
  - base_quantity_mu (float)
  - base_quantity_sigma (float)
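The purchase-probability formula and the empty-cart fallback can be sketched together. The helper names (`purchase_prob`, `sample_cart`), the (name, price) tuple representation, and the example prices are assumptions; quantities and per-category price noise are left out. One further assumption is flagged in the code: base_purchase_prob may be exactly 1 per the documented range, where the logit is undefined, so the sketch treats that case as a certain purchase.

```python
import math
import random


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def purchase_prob(base: float, affinity: float, effect_scale: float, delta: float) -> float:
    """sigmoid(logit(base) + affinity + effect_scale * delta) for one category and visitor."""
    if base >= 1.0:
        return 1.0  # assumption: logit(1) is undefined, so treat as a certain purchase
    return sigmoid(logit(base) + affinity + effect_scale * delta)


def sample_cart(categories: list, probs: list, rng: random.Random) -> list:
    """One Bernoulli event per category; an empty cart falls back to the cheapest one."""
    cart = [c for c, p in zip(categories, probs) if rng.random() < p]
    if not cart:
        cart = [min(categories, key=lambda c: c[1])]  # minimum-purchase fallback
    return cart


# Hypothetical categories as (name, price) pairs.
categories = [("budget", 10.0), ("mid", 40.0), ("premium", 120.0)]
base_probs = [0.6, 0.3, 0.1]

# Treatment shifts every category by +0.5 on the logit scale (made-up delta).
probs = [purchase_prob(b, affinity=0.0, effect_scale=1.0, delta=0.5) for b in base_probs]
cart = sample_cart(categories, probs, random.Random(7))
revenue = sum(price for _, price in cart)
```

Working on the logit scale keeps every shifted probability inside (0, 1) no matter how large the affinity or treatment term becomes.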