Glossary¶
Definitions for the load-bearing pytyche concepts. Several terms in this space collide easily (treatment / arm; cell / segment / cluster; CATE / HTE). Each entry below gives a short definition, the closest neighbors a reader might confuse it with, and a pointer to the code where it lives.
Sequential experiment¶
- sequential experiment¶
The full adaptive experiment: one campaign across N rounds sharing treatments, schedule, and cumulative posterior. Constructed via `pt.sequential_experiment(...)` and iterated round by round.

    exp = pt.sequential_experiment(
        generator=my_dgp,
        schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
        treatments=['control', 'low_promo', 'free_ship'],
        calibration=pt.Calibration.from_sweep('clustered_realistic_v1'),
    )
    for r in exp:
        inspect(r)

Each round of a sequential experiment is an experiment. The temporal slot housing it is a round. The same sequential machinery powers both sim mode (generator-driven) and real-data mode (operator-driven), distinguished only by what the `generator` callable returns.

Defined as `pytyche.experiment.SequentialExperiment`.
- experiment¶
A single discrete experiment: observed data, analysis, cells shipped, and the recommendation for the next experiment. The shape of a traditional single-shot A/B/N test, composed of an `ObservedExperimentData` plus an `AnalysisResult`.

In a sequential experiment, each round produces one experiment. In single-shot use, `pt.analyze(observed)` returns an `AnalysisResult` directly with no sequential machinery.

Defined as `pytyche.experiment.Experiment`.
- round¶
One iteration of a sequential experiment: the temporal slot housing one experiment. “Round 1, 2, 3” are positional indices into `SequentialExperiment.history`.

Defined as `pytyche.experiment.Round`.
- schedule¶
The protocol that decides each round’s visitor count. Three shipped implementations:

  - `GeometricSchedule(initial, growth, n_rounds=None)`: geometrically growing batches, doubling at `growth=2.0` (matches Perchet 2016, Esfandiari 2021, Che & Namkoong 2023)
  - `FixedSchedule(per_round, n_rounds)`: flat batches
  - `ExplicitSchedule([n_round_1, n_round_2, ...])`: user-supplied per-round visitor counts

A schedule’s `n_rounds` is optional. When `None`, the schedule is open-ended and the operator decides when to stop.

Defined as `pytyche.experiment.Schedule`.
- generator¶
The callable supplied at construction that provides observed data when the sequential experiment advances. It is the single entry point through which both sim mode and real-data mode deliver data.

    Generator = Callable[
        [int, NextRoundPlan],
        tuple[ObservedExperimentData, CalibrationTruth | None],
    ]

Sim mode supplies a DGP that returns synthesized observations alongside truth. Real-data mode supplies a callable that fetches the round’s data from the operator’s platform, database, or other source. The library treats both identically.

Defined as the `pytyche.experiment.Generator` type alias.
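The batch arithmetic behind the `GeometricSchedule` entry in this section can be illustrated in plain Python. This is a minimal sketch, assuming that zero-based round `r` receives `initial * growth**r` visitors rounded to the nearest integer; the helper name `geometric_round_sizes` is hypothetical and not part of pytyche, and the library's own rounding may differ.

```python
def geometric_round_sizes(initial: int, growth: float, n_rounds: int) -> list[int]:
    """Hypothetical helper: per-round visitor counts for a geometric schedule."""
    # Assumption: round r (zero-based) gets initial * growth**r visitors.
    return [int(round(initial * growth ** r)) for r in range(n_rounds)]


sizes = geometric_round_sizes(initial=10_000, growth=2.0, n_rounds=3)
print(sizes)       # [10000, 20000, 40000]
print(sum(sizes))  # 70000
```

With `growth=2.0` the final round alone is larger than all earlier rounds combined, which is why late rounds dominate the total traffic budget.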
Treatments, arms, policies, cells¶
Four terms with sharp distinctions.
- treatment¶
A named candidate intervention delivered to a single visitor (for example `'free_ship'`, `'low_promo'`, `'control'`). The thing the BCF model estimates causal effects for. Declared via the `treatments` parameter on `pt.sequential_experiment()`.

Related terms:
- arm¶
The internal-math term for a treatment: the integer index that encodes it within BCF kernels. The joint hurdle BCF estimator carries a per-arm axis on its sample arrays (`p_samples` / `sev_samples` of shape `(n, S_total, K)`; the `(K − 1)` contrast vector on `rpv_cate_samples`).

    Z = np.array([0, 1, 2, 0])   # control, arm 1, arm 2, control
    basis = _compute_basis(Z)    # bcf/preprocess.py: (n, K-1) contrast coding

Pytyche’s public API always uses “treatment.” “Arm” appears in BCF kernel code and per-arm array dimensions.

Defined as integer indices in `Z` arrays. See `src/pytyche/bcf/preprocess.py` (`_compute_basis`) for the per-arm contrast coding.
- policy¶
The routing rule a cell uses to decide which treatment to deliver per visitor. Four shipped variants:

  - `BaselinePolicy()`: always delivers the control treatment
  - `UniformPolicy(over=[...])`: uniform-random over a subset of treatments. The default Explore-cell policy.
  - `TreePolicy(tree, allocation_map)`: sklearn `DecisionTreeClassifier` plus per-leaf Thompson allocation over treatments
  - Operator-defined `Policy` subclasses for hypothesis injection

A cell houses one policy as its routing rule. A `TreePolicy` wraps a decision tree and the per-leaf Thompson allocation. The tree alone does not make a policy.

Defined as the `pytyche.experiment.Policy` protocol.
- cell¶
An assignment cohort within a single round of a sequential experiment. Cells span the visitor population by weight. Each cell ships a policy that decides what treatment to deliver per visitor within that cell.

    cells = [
        Cell('control', BaselinePolicy(), weight=0.3),
        Cell('explore', UniformPolicy(over=treatments), weight=0.4),
        Cell('optimized_v1', TreePolicy(tree, allocation_map), weight=0.3),
    ]

The default round-1 structure has a Control cell and an Explore cell at 50/50. The recommendation engine may add Optimized cells in subsequent rounds. Running multiple Optimized cells in one round is a first-class capability for organizations running head-to-head policy variants. Operational reasons to do this include different stakeholder ownership, vendor relationships, and channel-specific creative.

A cell is an assignment-time cohort spanning the population. A segment is a region of feature space that a policy tree within an Optimized cell partitions. A cluster is a DGP-generated mixture component (truth-side, sim-only) and is neither.

Defined as `pytyche.experiment.Cell`.
- Thompson allocation¶
A Bayesian allocation rule where each treatment receives a share of traffic proportional to its posterior probability of being best. Per segment, per arm: `allocation[arm] = P(arm best | posterior)`. Each segment’s allocation sums to 1.

In pytyche, applied per segment with an ε-clip floor so that no active treatment can be allocated below a minimum share. Magnitude-aware: a segment with `P(best) = 0.91` gets a markedly different allocation than `P(best) = 0.55`, without discrete regime thresholds.

Defined as `pytyche.experiment.ThompsonPolicy` (the default allocation policy behind `TreePolicy` and `UniformPolicy`).
- ε-clip¶
An internal safety net inside Thompson allocation. Within each segment (leaf of an Optimized cell’s policy tree), every active treatment receives at least `ε/K` of that segment’s Optimized-cell share, where K is the active treatment count. Prevents Thompson allocation from collapsing to a single treatment per segment when one treatment dominates the posterior.

The operator-facing controls-retention story is the cell-level Control and Explore weights (see min_control_weight and min_explore_weight), not ε. The ε-clip becomes mostly redundant when both Control and Explore cells have non-zero weight, since Explore already samples every treatment uniformly across all segments at the cell level.

Not exposed at the L1 API. Lives as a Thompson-allocation implementation detail with a hard-coded default (ε = 0.02).
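Thompson allocation with an ε floor can be sketched with NumPy alone. The block below estimates `P(arm best)` by Monte Carlo over posterior draws for one segment, then mixes in a uniform component so that every arm receives at least `ε/K`. The function name, the mixing form `(1 - ε) * P(best) + ε/K`, and the synthetic draws are all assumptions for illustration; pytyche's internal clipping may be implemented differently.

```python
import numpy as np


def thompson_allocation(samples: np.ndarray, eps: float = 0.02) -> np.ndarray:
    """Hypothetical sketch. samples has shape (S, K): S posterior draws of
    each of K treatments' value within one segment."""
    n_draws, k = samples.shape
    best = np.argmax(samples, axis=1)                   # winning arm per draw
    p_best = np.bincount(best, minlength=k) / n_draws   # Monte Carlo P(arm best)
    # Assumed floor: mix with a uniform share so each arm gets >= eps / K
    # while the allocation still sums to 1.
    return (1.0 - eps) * p_best + eps / k


rng = np.random.default_rng(0)
# Synthetic posterior draws: arm 2 clearly dominates arms 0 and 1.
draws = rng.normal(loc=[1.0, 1.1, 2.0], scale=0.1, size=(4000, 3))
alloc = thompson_allocation(draws)
print(alloc)
```

Even when one arm wins essentially every draw, the other two arms keep a small positive share, which is the collapse-prevention behavior the ε-clip entry describes.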
- min_control_weight¶
The guaranteed minimum share of traffic the recommendation engine will allocate to the Control cell when proposing the next round’s structure. With `min_control_weight=0.05`, the Control cell never falls below 5% of the round’s traffic regardless of how confident the model becomes.

The baseline-measurement controls-retention floor. Even as the experiment matures and Optimized cells absorb more traffic, the Control cell continues to receive a guaranteed share so drift in the baseline outcome surface remains detectable.

Set via the `min_control_weight` parameter on `pt.sequential_experiment()`. The operator may override the recommended weight in their own next-round plan; the floor applies only to engine-proposed allocations.
- min_explore_weight¶
The guaranteed minimum share of traffic the recommendation engine will allocate to the Explore cell when proposing the next round’s structure. With `min_explore_weight=0.05`, the Explore cell never falls below 5% of the round’s traffic regardless of segment confidence.

The every-treatment-observed controls-retention floor. The Explore cell samples uniformly across all active treatments, so a non-zero floor guarantees every treatment receives some traffic in every segment regardless of what the Optimized cells are doing.

Set via the `min_explore_weight` parameter on `pt.sequential_experiment()`. The operator may override in their own next-round plan; the floor applies only to engine-proposed allocations.
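How the two cell-level floors might interact with a proposed allocation can be sketched as follows. The source does not describe how the engine redistributes traffic, so this is only one plausible scheme: cells below their floor are pinned to it and the remaining cells are rescaled proportionally. The function name `apply_weight_floors` is hypothetical, and the sketch assumes the unpinned cells have positive total weight.

```python
def apply_weight_floors(weights: dict[str, float],
                        floors: dict[str, float]) -> dict[str, float]:
    """Hypothetical sketch: enforce per-cell minimum weights, keeping sum == 1."""
    pinned: dict[str, float] = {}
    while True:
        free = {c: w for c, w in weights.items() if c not in pinned}
        remaining = 1.0 - sum(pinned.values())
        total = sum(free.values())
        # Rescale unpinned cells so all weights together sum to 1.
        scaled = {c: w * remaining / total for c, w in free.items()}
        low = [c for c, w in scaled.items() if w < floors.get(c, 0.0)]
        if not low:
            return {**pinned, **scaled}
        for c in low:          # pin violators to their floor and retry
            pinned[c] = floors[c]


proposed = {"control": 0.02, "explore": 0.03, "optimized_v1": 0.95}
floors = {"control": 0.05, "explore": 0.05}
result = apply_weight_floors(proposed, floors)
print(result)
```

The loop matters because rescaling after pinning one cell can push another floored cell below its own minimum; repeating until no violations remain handles that case.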
Segments, clusters, HTE¶
- segment¶
A region of feature space, typically a leaf of a policy tree. “Mobile-returning visitors” or “desktop-new visitors” are segments discovered by the segmentation pipeline (or declared by the caller via the rule algebra). The unit of Thompson allocation: each segment receives a per-treatment allocation derived from the joint posterior.
A cell is the routing cohort. A segment is a feature-space region, typically inside an Optimized cell’s tree. The same segment may appear across multiple Optimized cells’ trees with different policies attached.
The collection of all discovered segments for one posterior is returned as part of a PolicyTreeResult (one `DiscoveredSegment` per leaf, ordered by leaf id).

Defined as `contracts.DiscoveredSegment` for the discovered surface and `contracts.SegmentRule` (a discriminated union of `EqRule`, `InRule`, `ComparisonRule`, `BetweenRule`) for the predicate that defines one.
- cluster¶
A DGP-generated mixture component. Latent and truth-side. Clusters exist only in sim mode, populated by the generator. The `clustered_realistic` template has 4 clusters representing customer archetypes. Used for evaluation (“did our discovered segments correlate with the DGP’s clusters?”), not for assignment.

A segment is an observed-side feature-space region. A cluster is truth-side mixture-component identity. They may correlate. They are not the same.

Available in sim-mode `RoundData` as `cluster_ids: np.ndarray`.
- stability score¶
Bootstrap-replicability score for a discovered segment: the fraction of bootstrap policy trees (B = 50 by default) in which some leaf has Jaccard overlap ≥ 0.5 with the original segment’s member set. The bootstrap resamples per-visitor CATEs (not the BCF posterior itself), refits the same-depth tree on each resample, and reports the overlap fraction. Range `[0, 1]`; segments with `stability_score >= 0.80` are considered credible enough to act on.

Answers the boundary-replicability question: “would this tree boundary have appeared on a slightly different sample?” Credible interval width does NOT answer this question: a tight CI says the effect estimate is stable given the tree, not that the tree boundaries themselves would survive resampling.
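The scoring step of the stability score (not the bootstrap refit itself) can be sketched in a few lines. The example assumes each bootstrap tree has already been reduced to a list of leaf member sets over the original visitor ids; the function names and the toy leaves are hypothetical.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard overlap |a & b| / |a | b| between two member sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability_score(segment: set[int],
                    bootstrap_leaves: list[list[set[int]]],
                    min_overlap: float = 0.5) -> float:
    """Fraction of bootstrap trees with some leaf overlapping the segment."""
    hits = sum(
        any(jaccard(segment, leaf) >= min_overlap for leaf in leaves)
        for leaves in bootstrap_leaves
    )
    return hits / len(bootstrap_leaves)


original = {0, 1, 2, 3}
trees = [
    [{0, 1, 2}, {3, 4, 5}],        # best overlap 3/4: hit
    [{0, 1}, {2, 3, 4, 5}],        # best overlap 2/4 = 0.5: hit
    [{0, 5}, {1, 4}, {2, 3, 6}],   # best overlap 2/5: miss
    [{0, 1, 2, 3, 4}, {5}],        # best overlap 4/5: hit
]
print(stability_score(original, trees))  # 0.75
```

A score of 0.75 would fall below the 0.80 credibility cutoff described above, so this toy segment would not be treated as credible.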
Controlled via `posterior.fit_policy_tree(n_bootstrap=50, bootstrap_seed=...)`. Passing `n_bootstrap=0` suppresses computation and sets scores to `float("nan")`. Carried on both `DiscoveredSegment.stability_score` and PolicyTreeResult’s `stability_scores` dict keyed by leaf id. Threshold-checked by capability methods (`has_credible_segments(threshold=0.80)`).
- HTE¶
Heterogeneous Treatment Effect: the phenomenon that the causal effect of a treatment varies across customer segments. The joint multivariate hurdle BCF estimates a per-visitor CATE surface (via the BART forest); the policy tree partitions that surface into segments where CATE is approximately constant.
CATE is the technical statistical term for the per-visitor effect. HTE names the broader phenomenon. Segments are how the discovery is partitioned for interpretability.

Per-visitor CATEs live on `AnalysisResult.cate_per_visitor`. Segment-level summaries live on `AnalysisResult.segments`.
- CATE¶
Conditional Average Treatment Effect: the expected difference in outcome between treatment and control for a specific covariate combination. Formally `E[Y(1) - Y(0) | X = x]`. Read per-visitor: how much would this specific visitor’s outcome change under treatment vs control, given their features.

The quantity BCF estimates. The collection of CATEs across feature space is the HTE surface. Pytyche exposes per-visitor CATEs on `AnalysisResult.cate_per_visitor` and segment-level summary CATEs on `AnalysisResult.segments`.
Models and inference¶
- BART forest¶
The tree ensemble inside the BCF: a sum of many weak trees that together approximate the underlying function. “BART” = Bayesian Additive Regression Trees. The BCF carries two such forests: a prognostic μ-forest for the baseline outcome surface and a treatment τ-forest for the conditional treatment effect. The MCMC samples a posterior over forests (each posterior sample is a different forest configuration), and what pytyche surfaces is the per-visitor posterior on `τ(x)` marginalized over that posterior, never the individual trees.

Forest sizes are set by `GPUBCFConfig.num_trees_mu` and `GPUBCFConfig.num_trees_tau`; see `src/pytyche/bcf/config.py` for current defaults. `compute_num_trees_tau` is the formula-driven helper for picking the τ-forest size at a target CI coverage.

This is NOT the policy tree. They share the word “tree” but do different jobs: the BART forest estimates the CATE surface inside the BCF MCMC; the policy tree segments that CATE surface for downstream allocation and operator interpretability. Users don’t inspect BART trees directly; they inspect the policy tree.

Lives inside `pytyche.bcf.hurdle.*`. Implemented on top of `bartz` (the GPU BART primitive library).
- policy tree¶
The single `sklearn.tree.DecisionTreeClassifier` fit on the per-visitor CATEs from the BCF posterior, used to discover segments of feature space where the treatment effect is approximately constant. One tree (not an ensemble), deterministic given the posterior + hyperparameters, user-inspectable as `PolicyTreeResult.tree`. Its leaves are the segments downstream allocation, recommendation, and graduation decisions operate on.

This is NOT the BART forest. The BART forest produces the CATEs; the policy tree partitions them. The policy tree is what shows up in cell recommendations (Thompson allocation’s `allocation_map[leaf_id]`) and what operators see as the segmentation of “where the lift comes from.”

Depth controlled by `max_segment_depth` at the L1 surface (`pt.sequential_experiment(max_segment_depth=3)`) or `max_depth` at the L2 method (`posterior.fit_policy_tree(max_depth=3)`). Minimum segment size controlled by `min_segment_share` (default 0.10). The result is a PolicyTreeResult: a frozen dataclass carrying the tree, segments, allocation map, and bootstrap stability scores.
- BCF¶
Bayesian Causal Forests. The class of model pytyche builds on for HTE estimation, introduced by Hahn, Murray, and Carvalho (2020). Combines a “prognostic” forest (estimating the baseline outcome surface) with a “treatment” forest (estimating the conditional treatment effect) to give unbiased per-visitor CATE estimates that don’t confound effect modification with prognostic signal. Both forests are BART forests.
For zero-inflated outcomes like e-commerce revenue per visitor, pytyche uses a joint hurdle BCF that shares tree topology between the conversion (probit) and severity (log-normal) channels.
Defined as `pytyche.bcf.fit_continuous_bcf`, `pytyche.bcf.fit_binary_bcf`, and friends. The high-level sequential surface (`pt.sequential_experiment`) calls these under the covers.
- joint hurdle BCF¶
The model pytyche actually fits for e-commerce revenue and similar zero-inflated outcomes. Two channels — a conversion probit channel and a severity log-normal channel — share tree topology, so the per-visitor CATE on revenue decomposes cleanly into “did the treatment change conversion” and “did it change basket size given conversion.”
For multi-arm experiments, the joint hurdle BCF estimates per-treatment effects jointly via shared prognostic structure rather than fitting K-1 independent contrasts (which leaks power on max-of-K selection).
Defined as `pytyche.bcf.fit_hurdle_bcf` (called by `pt.sequential_experiment` and `pt.analyze`). The pooling kwarg selects between the canonical shared-tree fit and the independent two-stage literature baseline.
- hurdle outcomes¶
Outcomes with two distinct components: a binary “did anything happen” gate and a positive-valued severity conditional on the gate firing. E-commerce revenue per visitor is the canonical example: most visitors do not convert and contribute $0, while the converting tail has continuously distributed positive revenue.
Standard regression on a hurdle outcome confounds “treatment changes conversion probability” with “treatment changes basket size.” Hurdle BCF models the two channels separately, then combines them for the per-visitor revenue effect.
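The two-channel view can be made concrete with simple arithmetic. Writing `p` for conversion probability and `m` for mean revenue given conversion (mirroring the `p0, p1, m0, m1` naming used for calibration truth), expected revenue per visitor is `p * m`, and the treatment effect splits exactly into a conversion part and a severity part. The numbers below are invented for illustration, and this particular split (conversion change valued at control severity, severity change weighted by treated conversion) is one of several equivalent orderings, not necessarily the one pytyche reports.

```python
# Invented control (0) and treatment (1) channel values for one segment.
p0, m0 = 0.040, 50.0   # conversion probability, mean basket given conversion
p1, m1 = 0.050, 48.0

rpv0 = p0 * m0         # expected revenue per visitor under control
rpv1 = p1 * m1         # expected revenue per visitor under treatment
total_effect = rpv1 - rpv0

# Exact identity: p1*m1 - p0*m0 = (p1 - p0)*m0 + p1*(m1 - m0)
conversion_part = (p1 - p0) * m0
severity_part = p1 * (m1 - m0)

print(f"total effect:    {total_effect:.2f}")     # 0.40
print(f"conversion part: {conversion_part:.2f}")  # 0.50
print(f"severity part:   {severity_part:.2f}")    # -0.10
```

Here the treatment lifts conversion but shrinks baskets; a single regression on revenue would report only the net 0.40 and hide the opposing channel movements.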
- pooling¶
The mode in which `fit_hurdle_bcf` (and pt.fit when it dispatches to the hurdle path) couples the two channels of the joint hurdle BCF. Two values:

  - `"joint"` (default): canonical shared-tree fit. Conversion (probit) and severity (log-normal) share tree topology, so the per-visitor CATE decomposes cleanly and the model borrows strength across the two channels. This is the v0.2+ recommended path for typical e-commerce revenue data.
  - `"independent"`: independent two-stage fit. Runs `fit_binary_bcf` for conversion, then `fit_continuous_bcf` for log-severity on converters, and composes the posteriors. Opt in when the two channels are driven by different feature subsets, when one channel has dominant HTE and shared topology distorts the other, or when a researcher wants per-channel HTE structure without the regularization-induced coupling.

Exposed as `fit_hurdle_bcf(..., pooling="joint")` at the fit boundary and carried on `HurdleBCFResult.pooling`. Passed through verbatim when calling `pt.fit(observed, pooling="independent")`. Stored on `Calibration` regime metadata: a calibration artifact fitted on joint-pooling data is not applied to an independent-pooling posterior.

Defined as a `Literal["joint", "independent"]` kwarg on `pytyche.bcf.fit_hurdle_bcf`. Private dispatch helpers `_fit_joint_hurdle_bcf` (in `pytyche.bcf.hurdle.model`) and `_fit_independent_hurdle_bcf` (in `pytyche.bcf.hurdle.compose`) implement the two paths.
- joint posterior¶
The Bayesian posterior distribution over all model parameters considered jointly. In pytyche’s joint hurdle BCF, the joint posterior covers per-treatment conversion probabilities, per-treatment severity means, and the per-visitor treatment effects, all conditioned on the observed data.
“Joint” because the parameters are estimated together with their full correlation structure preserved, rather than fitted marginally and assumed independent. This is what lets Thompson allocation respect cross-treatment dependence.
Direct access on `AnalysisResult.posterior` for follow-up analysis (custom decompositions, alternative ship rules, sensitivity checks).
Results, recommendations, graduation¶
- PolicyTreeResult¶
The frozen dataclass returned by `posterior.fit_policy_tree(...)`. Bundles the policy tree and all downstream-usable derived data:

  - `tree`: the fitted `sklearn.tree.DecisionTreeClassifier` partitioning feature space into segments
  - `segments`: one `DiscoveredSegment` per leaf, ordered by leaf id; carries `rule`, `gate_estimate`, `gate_ci`, `stability_score`, `population_share`, `id`, and `arm_best_probabilities`
  - `allocation_map`: `dict[leaf_id, dict[treatment_name, weight]]`; each leaf’s weight dict sums to 1.0; produced by Thompson allocation under the shared best-arm rule
  - `stability_scores`: `dict[leaf_id, float]` in `[0, 1]`; bootstrap-replicability scores computed by resampling visitor CATEs and Jaccard-overlap matching (see stability score)
  - `observed`: reference to the `ObservedExperimentData` the underlying posterior was fit on; shared by identity from the posterior (no re-clone; see observed data stashing)

The dataclass is frozen; assignment to any field raises `dataclasses.FrozenInstanceError`. `tree` is a `sklearn.tree.DecisionTreeClassifier` for v0.2; a future change may introduce a pytyche wrapper with the same predict / decision path methods.

`PolicyTreeResult` is NOT the BART forest. The BART forest estimates the CATE surface inside the MCMC; the policy tree in `PolicyTreeResult` partitions that surface for downstream allocation and operator interpretability.

Defined as `pytyche.contracts.PolicyTreeResult`.
- decision¶
The recommended ship-or-continue-or-stop call for a treatment versus baseline. A 3-value enum: `SHIP`, `CONTINUE`, `STOP`.

Defined as `contracts.Decision`.
- recommendation summary¶
A structured decision (`SHIP`, `CONTINUE`, or `STOP`) with supporting evidence: expected losses, probability of positive lift, probability of meaningful improvement, probability of harm. The decision applies five thresholds across three branches.

  - SHIP gate: `expected_loss < tolerance` AND `p_positive > 0.95` AND `p_better > 0.80`
  - STOP (harm): `p_harmful > 0.90`
  - STOP (futility): `p_better < 0.05`
  - CONTINUE: default when no gate fires
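The three gates and the default branch listed above can be expressed as a small pure function. This is a sketch under stated assumptions: the `Decision` enum here is a local stand-in rather than `contracts.Decision`, the function name `decide` is hypothetical, `tolerance` is supplied by the caller because the source gives no default, and the order in which the gates are checked is assumed.

```python
from enum import Enum


class Decision(Enum):          # local stand-in, not contracts.Decision
    SHIP = "ship"
    CONTINUE = "continue"
    STOP = "stop"


def decide(expected_loss: float, p_positive: float, p_better: float,
           p_harmful: float, tolerance: float) -> Decision:
    """Hypothetical sketch of the thresholds quoted in this entry."""
    if expected_loss < tolerance and p_positive > 0.95 and p_better > 0.80:
        return Decision.SHIP       # SHIP gate
    if p_harmful > 0.90:
        return Decision.STOP       # STOP (harm)
    if p_better < 0.05:
        return Decision.STOP       # STOP (futility)
    return Decision.CONTINUE       # default when no gate fires


print(decide(0.001, 0.98, 0.90, 0.01, tolerance=0.01))  # Decision.SHIP
print(decide(0.200, 0.03, 0.01, 0.95, tolerance=0.01))  # Decision.STOP
print(decide(0.050, 0.70, 0.40, 0.10, tolerance=0.01))  # Decision.CONTINUE
```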
Produced by `recommendation_summary()` in `compare.variants`. A graduation candidate is a (treatment, segment) pair whose recommendation summary has fired SHIP for N consecutive rounds.

Defined as `contracts.RecommendationSummary`.
- graduation candidate¶
A (treatment, segment) pair where the recommendation has fired SHIP for ≥ `sustained_rounds` consecutive rounds. The default rule fires when `expected_loss < tolerance` AND `p_positive > 0.95` AND `p_better > 0.80`, sustained over at least 2 rounds. Segments with `stability_score < 0.80` are excluded from graduation-candidate consideration by default (see stability score). The capability method `has_credible_segments(threshold=0.80)` provides a quick check before running the full analysis.

Pytyche surfaces graduation candidates as structured data. The operator (or an agentic caller) decides whether to promote one to broader rollout. The library does not auto-graduate.
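The sustained-rounds rule can be illustrated with a short check over a per-round decision history. The sketch assumes "consecutive" means an unbroken SHIP streak ending at the most recent round, which the source does not spell out; the function name and the string-valued decisions are hypothetical.

```python
def is_graduation_candidate(decisions: list[str],
                            stability_score: float,
                            sustained_rounds: int = 2,
                            min_stability: float = 0.80) -> bool:
    """Hypothetical sketch: SHIP streak plus the stability exclusion."""
    if stability_score < min_stability:
        return False                      # unstable segments are excluded
    streak = 0
    for decision in reversed(decisions):  # walk back from the latest round
        if decision != "SHIP":
            break
        streak += 1
    return streak >= sustained_rounds


print(is_graduation_candidate(["CONTINUE", "SHIP", "SHIP"], 0.90))  # True
print(is_graduation_candidate(["SHIP", "SHIP", "CONTINUE"], 0.90))  # False
print(is_graduation_candidate(["SHIP", "SHIP"], 0.70))              # False
```

The function only flags a candidate; consistent with the entry above, the decision to promote it stays with the operator.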
Defined as `pytyche.experiment.GraduationCandidate`.
- next round plan¶
The recommended cell structure, treatments list, and prose summary for the next round of a sequential experiment. The handoff between the recommendation engine and the operator’s next-round decision.
Carries:

  - Recommended cells (typically Control + Explore + an Optimized cell with the recommended tree)
  - Active treatments
  - Dropped treatments, if any
  - Prose rationale

The operator may accept the plan, partially override it (for example, add a hypothesis cell), or fully replace it before shipping.

Defined as `pytyche.experiment.NextRoundPlan`.
L2 analysis surface¶
- pt.fit¶
The auto-selecting fit entry point at the top-level `pytyche` namespace. Inspects the extracted outcome array `Y` and treatment cardinality `K = len(observed.variants)`, then dispatches deterministically to one of three underlying fit functions:

  - `Y` all `{0, 1}` → `fit_binary_bcf` → returns `BinaryBCFResult`
  - `Y` float dtype, < 30% zero entries → `fit_continuous_bcf` → returns `ContinuousBCFResult`
  - `Y` float dtype, ≥ 30% zero entries with a positive non-zero tail → `fit_hurdle_bcf(..., pooling="joint")` → returns `HurdleBCFResult` (at any K, including K ≥ 3)

    posterior = pt.fit(observed)                         # auto-selects fit
    posterior = pt.fit(observed, pooling="independent")  # kwarg forwarded verbatim

The same `observed` always selects the same fit function (deterministic). `**kwargs` forward verbatim to the dispatched fit. Users who need explicit control call `pt.fit_binary_bcf`, `pt.fit_continuous_bcf`, or `pt.fit_hurdle_bcf` directly.

Edge cases: all-zero `Y` raises `ValueError` naming both binary and hurdle interpretations; multi-arm `Z` with binary or continuous `Y` raises `NotImplementedError` (multi-arm binary / continuous BCF is not yet shipped).

The 30% zero-density threshold is a semi-empirical starting point for e-commerce revenue distributions: generous enough to catch typical revenue data, conservative enough to keep non-hurdle continuous data on the continuous path.
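The dispatch rule described in this entry can be approximated by a standalone function over the outcome array. This is a sketch, not the library's dispatcher: the name `select_fit` is hypothetical, it returns the target function's name as a string instead of calling it, and it omits the multi-arm `NotImplementedError` branch and the positive-tail check.

```python
import numpy as np


def select_fit(y: np.ndarray, zero_threshold: float = 0.30) -> str:
    """Hypothetical sketch of the Y-based dispatch rule (single-arm case)."""
    y = np.asarray(y)
    if not np.any(y != 0):
        # All-zero Y is ambiguous between the binary and hurdle readings.
        raise ValueError("all-zero Y: could be binary or hurdle")
    if np.all((y == 0) | (y == 1)):
        return "fit_binary_bcf"
    zero_share = float(np.mean(y == 0))
    if zero_share < zero_threshold:
        return "fit_continuous_bcf"
    return "fit_hurdle_bcf"


print(select_fit(np.array([0, 1, 1, 0])))                   # fit_binary_bcf
print(select_fit(np.array([12.5, 0.0, 30.0, 8.0, 19.9])))   # fit_continuous_bcf
print(select_fit(np.array([0.0, 0.0, 0.0, 25.0, 40.0])))    # fit_hurdle_bcf
```

Because the rule depends only on the contents of `Y`, repeated calls with the same data always pick the same path, matching the determinism guarantee stated above.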
Defined at `pytyche.fit`. Internal dispatch helper at `pytyche.bcf.dispatch._dispatch_fit` (or similar).
- observed data stashing¶
The contract that every posterior result type (`HurdleBCFResult`, `ContinuousBCFResult`, `BinaryBCFResult`) carries the `ObservedExperimentData` it was fit on. Analysis methods reach their inputs through `posterior.observed`, not through a separately-passed handle, so fit-time and analysis-time data encodings cannot drift.

    posterior = pt.fit(observed)
    # downstream methods reach the data through the posterior
    tree = posterior.fit_policy_tree()
    # derived results share the same observed by identity
    assert tree.observed is posterior.observed

Derived results (`PolicyTreeResult`, calibrated posteriors) hold the same reference by identity: the cost of stashing is paid once at fit time, not per-derivation. The observed_copy parameter controls what kind of stash is created.

Follows the sklearn idiom (`.X_train_` and similar): downstream operations on a fitted result reach into the input data through the result.
- observed_copy parameter¶
The kwarg on every fit entry point (`pt.fit`, `pt.fit_hurdle_bcf`, etc.) that controls how the input `ObservedExperimentData` is stashed on the resulting posterior. Three modes:

  - `"view"` (default): shallow clone of the dataclass with each visitors DataFrame rebuilt over read-only numpy views of the original columns. Zero data-buffer copy; in-place mutation through the stash raises `ValueError` (“assignment destination is read-only”). Buffers are still shared with the caller’s original handle, so mutation through the original is reflected in the stash.
  - `"deep"`: `copy.deepcopy(observed)` at fit time. Doubles memory for input data; provides a bit-stable stash independent of any subsequent mutation to the original handle.
  - `"ref"`: `posterior.observed is observed` directly. No view wrappers; no protection. For the power-user case that wants the cheapest possible path and accepts mutation risk on both sides.
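The difference between the `"view"` and `"deep"` modes comes down to standard NumPy buffer semantics, which can be demonstrated without pytyche. The snippet below works on a bare array rather than an `ObservedExperimentData`, so it shows only the underlying mechanics, not the library's stashing code.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])

view = original.view()          # shares the buffer: no data copy
view.flags.writeable = False    # block in-place writes through the view

try:
    view[0] = 99.0
except ValueError as err:
    print(err)                  # assignment destination is read-only

original[0] = 99.0              # the caller's handle is still writable
print(view[0])                  # 99.0: the change is visible through the view

deep = original.copy()          # independent buffer, analogous to "deep"
original[1] = -1.0
print(deep[1])                  # 20.0: unaffected by later mutation
```

This is why `"view"` protects against accidental writes through the stash but not against the caller mutating their own handle, while `"deep"` trades memory for full isolation.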
Any other value raises `ValueError` naming the three valid modes. See observed data stashing for the propagation contract through derived results.
- capability methods¶
A pair of pure getters on every posterior result type that enable conditional downstream logic without triggering heavyweight computation:

  - `has_credible_segments(threshold=0.80) -> bool`: `True` iff at least one segment in `posterior.analyze().segments` has `stability_score >= threshold`. The default threshold matches the `ExpectedLossRule` SHIP-gate stability floor.
  - `has_decomposition() -> bool`: `True` for `HurdleBCFResult` (the two-channel hurdle decomposition into conversion + severity); `False` for `ContinuousBCFResult` and `BinaryBCFResult`.

Both are pure: no state mutation, no side effects, deterministic given the posterior. The canonical branch pattern:

    if posterior.has_credible_segments():
        tree = posterior.fit_policy_tree()
        # ship tree-based policy
    elif posterior.has_decomposition():
        # hurdle posterior with no credible segments yet: inspect
        # channel decomposition to diagnose
        ...

Defined on each result type in `pytyche.bcf`. The threshold default (0.80) is shared with stability score’s credibility cutoff.
- pt.viz namespace¶
The `pytyche.viz` submodule exposing five matplotlib-backed visualization primitives:

  - `pt.viz.plot_cells(cells, ax=None)`: horizontal bar chart of cell weights for one round
  - `pt.viz.plot_policy_tree(tree_policy, ax=None)`: tree diagram from a PolicyTreeResult
  - `pt.viz.plot_segment_intervals(segments, ax=None)`: forest plot of per-segment gate estimates + 80% credible intervals
  - `pt.viz.plot_calibration(calibrated_posterior, reference=None, ax=None)`: R(p) calibration curve, optionally overlaid with a reference (uncalibrated) curve
  - `pt.viz.experiment_evolution_gif(history, output_path, fps=1)`: animated GIF rendering round-by-round cell structure and policy tree evolution

Each static primitive accepts an `ax` parameter for matplotlib subplot composition. When `ax=None`, a new figure and axes are created and returned. The GIF helper renders to disk and returns the path.

`matplotlib` is imported lazily: `import pytyche` does NOT trigger the matplotlib import. The cost is paid only when a `pt.viz.*` function is first called. `matplotlib >= 3.7` and either `imageio >= 2.31` or `Pillow >= 10.0` are base dependencies in `pyproject.toml`.

Importable as `import pytyche.viz as ptviz` or `from pytyche import viz`. The five names are also importable directly: `from pytyche.viz import plot_cells, ...`.

Defined in `pytyche.viz`.
Calibration, truth, sim mode¶
- calibration¶
On-path recalibration that corrects BCF posterior coverage at scale. Three construction paths:

    pt.Calibration.from_sweep('clustered_realistic_v1')  # shipped artifact
    pt.Calibration.from_sweep('/path/to/sweep.json')     # user-fitted
    pt.Calibration.skip()                                # uncorrected

When calibration is specified, the library applies the correction automatically. When `skip()` is used, uncalibrated posteriors are explicitly labeled in result objects and the library emits a warning on first fit.

A `Calibration` instance is a frozen dataclass carrying:

  - `correction`: the `LayeredCalibrationCorrection` (layered `R(p)` scale-family correction payload)
  - Regime metadata: `metric` (the outcome metric the sweep was fitted on, e.g. `"revenue_per_visitor"`), `n_treatments` (the K of the fitted sweep), `pooling` (the pooling mode of the sweep, `"joint"` or `"independent"`)
  - `applies_to(observed: ObservedExperimentData) -> bool`: `True` iff `observed.metric`, `len(observed.variants)`, and the pooling mode all match the artifact’s regime metadata. `posterior.apply_calibration(calibration)` raises `ValueError` naming the mismatched dimension(s) when `applies_to` returns `False`. This prevents silently applying a K=2 revenue correction to a K=3 conversion-rate posterior.

The `from_sweep` / `skip` constructors and the shipped artifact registry land with `sequential-experiment-api`. The type’s minimal contract (frozen dataclass + regime metadata + `applies_to`) is owned by this L2 surface.

Canonical home: `pytyche.calibrate.Calibration`, re-exported as `pt.Calibration`. Calibration machinery lives in `src/pytyche/calibrate/`.
- calibration truth¶
Ground truth for a single calibration or simulation run: per-visitor CATE, hurdle decomposition (p0, p1, m0, m1), and effect components. Lives only in the sim and calibration paths; analysis code cannot peek at it because the type is segregated via `CalibrationBundle`.

Defined as `contracts.CalibrationTruth`.
- truth comparison¶
Per-round truth-vs-estimate metrics, populated only in sim mode. `None` when the experiment runs in real-data mode.

Six fields:

  - `cate_rmse`: root mean square error of estimated CATE against truth
  - `policy_accuracy`: fraction of visitors for whom the recommended treatment matches the truth-optimal treatment
  - `oracle_gap_rpv`: RPV regret of the recommended policy vs the oracle policy
  - `rpv_policy`, `rpv_uniform`, `rpv_oracle`: RPV under the recommended policy, uniform random allocation, and the oracle policy respectively
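The six fields can be computed from truth arrays with a few NumPy reductions. The example below follows the field descriptions above literally on invented toy data (5 visitors, 3 treatments); the array names and layouts are assumptions, and pytyche's exact definitions (for example, how multi-arm CATEs enter the RMSE) may differ.

```python
import numpy as np

# Invented sim-mode truth and estimates for 5 visitors.
cate_true = np.array([0.10, 0.30, -0.20, 0.00, 0.50])
cate_est = np.array([0.20, 0.30, -0.10, 0.10, 0.40])

# rpv_true[i, k]: true expected revenue of visitor i under treatment k.
rpv_true = np.array([
    [1.0, 1.2, 0.9],
    [2.0, 1.5, 2.5],
    [0.5, 0.4, 0.3],
    [1.0, 1.0, 1.4],
    [3.0, 3.5, 3.2],
])
recommended = np.array([1, 2, 0, 0, 1])   # treatment index chosen per visitor

oracle = rpv_true.argmax(axis=1)          # truth-optimal treatment per visitor
rows = np.arange(len(recommended))

cate_rmse = float(np.sqrt(np.mean((cate_est - cate_true) ** 2)))
policy_accuracy = float(np.mean(recommended == oracle))
rpv_policy = float(rpv_true[rows, recommended].mean())
rpv_uniform = float(rpv_true.mean())      # equal chance of every treatment
rpv_oracle = float(rpv_true.max(axis=1).mean())
oracle_gap_rpv = rpv_oracle - rpv_policy

print(round(cate_rmse, 4), policy_accuracy)
print(round(rpv_uniform, 2), round(rpv_policy, 2), round(rpv_oracle, 2))
print(round(oracle_gap_rpv, 2))
```

In this toy case the recommended policy sits between uniform allocation and the oracle, and `oracle_gap_rpv` measures how much revenue per visitor is still left on the table.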
Defined as `pytyche.experiment.TruthComparison`.
- SBC (in pytyche)¶
In this codebase “SBC” is used loosely for simulation-based coverage evaluation and correction — generate data from known ground truth, measure how the posterior’s credible intervals (and decisions) actually perform, and fit a correction. It is not the classical rank-statistic Simulation-Based Calibration of Talts et al. (2018), which checks posterior-rank uniformity. Two modules carry the “SBC” label and neither implements that rank procedure:
  - `pytyche.calibrate.sbc`: oracle-decision and regret evaluation against planted truth (does the recommended decision match the oracle; what regret does it incur).
  - `scripts/fit_sbc_correction.py`: fits the isotonic R(p) coverage correction (nominal → empirical coverage mapping).
Read “SBC” here as the umbrella for that simulate-then-correct workflow, not as the Talts diagnostic.
Setup and environment¶
- setup report¶
The structured output of `pt.check_setup()`. Carries the pytyche version, JAX device list, CUDA availability, bartz version, calibration registry state, and a recommended install command when GPU is absent.

    report = pt.check_setup()
    if not report.cuda_available:
        print(report.recommended_install)

Defined as `pytyche.SetupReport`.
Contract types — quick reference¶
The contract types this glossary references and where they live:
| Term | Type | Module |
|---|---|---|
| Decision (enum) | | |
| Observed experiment data | | |
| Variant data | | |
| Visitor schema | | |
| Segment rule | | |
| Discovered segment | | |
| Aligned visitor array | | |
| Decomposition samples | | |
| Comparison result | | |
| Recommendation summary (type) | | |
| Recommendation summary (function) | | |
| Decision thresholds | | |
| Analysis result | | |
| Policy tree result | | |
| Calibration | | |
| Layered calibration correction | | |
| Calibration truth | | |
| Calibration bundle | | |
| Calibration record | | |
| Compare variants | | |
| Claim level | | |
| Metric family | | |