Glossary

Definitions for the load-bearing pytyche concepts. Several terms in this space collide easily (treatment / arm; cell / segment / cluster; CATE / HTE). Each entry below gives a short definition, the closest neighbors a reader might confuse it with, and a pointer to the code where it lives.

Sequential experiment

sequential experiment

The full adaptive experiment: one campaign across N rounds sharing treatments, schedule, and cumulative posterior. Constructed via pt.sequential_experiment(...) and iterated round by round.

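import pytyche as pt

exp = pt.sequential_experiment(
    generator=my_dgp,  # your Generator callable (see generator)
    schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
    treatments=['control', 'low_promo', 'free_ship'],
    calibration=pt.Calibration.from_sweep('clustered_realistic_v1'),
)
for r in exp:          # one round per iteration
    inspect(r)         # placeholder for your own per-round inspection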

Each round of a sequential experiment produces one experiment; the round is the temporal slot that houses it. The same sequential machinery powers both sim mode (generator-driven) and real-data mode (operator-driven), distinguished only by what the generator callable returns.

Defined as pytyche.experiment.SequentialExperiment.

experiment

A single discrete experiment: observed data, analysis, cells shipped, and the recommendation for the next experiment. The shape of a traditional single-shot A/B/N test, composed of an ObservedExperimentData plus an AnalysisResult.

In a sequential experiment, each round produces one experiment. In single-shot use, pt.analyze(observed) returns an AnalysisResult directly with no sequential machinery.

Defined as pytyche.experiment.Experiment.

round

One iteration of a sequential experiment — the temporal slot housing one experiment. “Round 1, 2, 3” are positional indices into SequentialExperiment.history.

Defined as pytyche.experiment.Round.

schedule

The protocol that decides each round’s visitor count. Three shipped implementations:

  • GeometricSchedule(initial, growth, n_rounds=None) — batches that grow by a factor of growth each round, doubling at growth=2.0 (matches Perchet 2016, Esfandiari 2021, Che & Namkoong 2023)

  • FixedSchedule(per_round, n_rounds) — flat batches

  • ExplicitSchedule([n_round_1, n_round_2, ...]) — user-supplied per-round visitor counts

A schedule’s n_rounds is optional. When None, the schedule is open-ended and the operator decides when to stop.

Defined as pytyche.experiment.Schedule.

generator

The callable supplied at construction that provides observed data each time the sequential experiment advances. It is the single entry point through which both sim mode and real-data mode deliver data.

Generator = Callable[
    [int, NextRoundPlan],
    tuple[ObservedExperimentData, CalibrationTruth | None],
]

Sim mode supplies a DGP that returns synthesized observations alongside truth. Real-data mode supplies a callable that fetches the round’s data from the operator’s platform, database, or other source. The library treats both identically.

Defined as pytyche.experiment.Generator type alias.

Treatments, arms, policies, cells

Four terms with sharp distinctions.

treatment

A named candidate intervention delivered to a single visitor (for example 'free_ship', 'low_promo', 'control'). The thing the BCF model estimates causal effects for. Declared via the treatments parameter on pt.sequential_experiment().

Related terms:

  • arm — internal-math term for the same concept. Pytyche’s public API uses “treatment” canonically; “arm” still appears in BCF kernel code and per-arm (K, S) array dimensions.

  • policy — the rule that picks which treatment to deliver per visitor. A treatment is the delivered intervention itself.

arm

The internal-math term for a treatment — the integer index that encodes it within BCF kernels. The joint hurdle BCF estimator carries a per-arm axis on its sample arrays (p_samples/sev_samples of shape (n, S_total, K); the (K-1) contrast vector on rpv_cate_samples).

import numpy as np
from pytyche.bcf.preprocess import _compute_basis  # private helper

Z = np.array([0, 1, 2, 0])    # control, arm 1, arm 2, control
basis = _compute_basis(Z)     # (n, K-1) contrast coding: (4, 2) here

Pytyche’s public API always uses “treatment.” “Arm” appears in BCF kernel code and per-arm array dimensions.

Defined as integer indices in Z arrays. See src/pytyche/bcf/preprocess.py (_compute_basis) for the per-arm contrast coding.
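If contrast coding is unfamiliar, one common choice is treatment (dummy) coding against control: column k-1 is 1 when a visitor received arm k, and control rows are all zeros. Using this coding here is an assumption for illustration. The exact coding pytyche uses is whatever _compute_basis implements, and the function name below is hypothetical.

```python
import numpy as np


def dummy_contrast_basis(Z: np.ndarray, K: int) -> np.ndarray:
    """Hypothetical sketch of (n, K-1) treatment coding against arm 0 (control).

    Column k-1 is 1.0 where Z == k, for k = 1..K-1; control rows are all zeros.
    """
    Z = np.asarray(Z)
    arms = np.arange(1, K)  # non-control arm indices 1..K-1
    return (Z[:, None] == arms[None, :]).astype(float)


Z = np.array([0, 1, 2, 0])  # control, arm 1, arm 2, control
basis = dummy_contrast_basis(Z, K=3)
print(basis.shape)  # (4, 2)
print(basis)
```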

policy

The routing rule a cell uses to decide which treatment to deliver per visitor. Four shipped variants:

  • BaselinePolicy() — always delivers the control treatment

  • UniformPolicy(over=[...]) — uniform-random over a subset of treatments. The default Explore-cell policy.

  • TreePolicy(tree, allocation_map) — sklearn DecisionTreeClassifier plus per-leaf Thompson allocation over treatments

  • Operator-defined Policy subclasses for hypothesis injection

A cell houses one policy as its routing rule. A TreePolicy wraps a decision tree and the per-leaf Thompson allocation. The tree alone does not make a policy.

Defined as pytyche.experiment.Policy protocol.

cell

An assignment cohort within a single round of a sequential experiment. Cells span the visitor population by weight. Each cell ships a policy that decides what treatment to deliver per visitor within that cell.

cells = [
    Cell('control', BaselinePolicy(), weight=0.3),
    Cell('explore', UniformPolicy(over=treatments), weight=0.4),
    Cell('optimized_v1', TreePolicy(tree, allocation_map), weight=0.3),
]

The default round-1 structure has a Control cell and an Explore cell at 50/50. The recommendation engine may add Optimized cells in subsequent rounds. Running multiple Optimized cells in one round is a first-class capability for organizations that pit policy variants head to head; operational reasons to do so include different stakeholder ownership, vendor relationships, and channel-specific creative.

A cell is an assignment-time cohort spanning the population. A segment is a region of feature space that a policy tree within an Optimized cell partitions. A cluster is a DGP-generated mixture component (truth-side, sim-only) and is neither.

Defined as pytyche.experiment.Cell.

Thompson allocation

A Bayesian allocation rule where each treatment receives a share of traffic proportional to its posterior probability of being best. Per segment, per arm: allocation[arm] = P(arm best | posterior). Each segment’s allocation sums to 1.

In pytyche, applied per segment with an ε-clip floor so that no active treatment can be allocated below a minimum share. Magnitude-aware: a segment with P(best) = 0.91 gets a markedly different allocation than P(best) = 0.55, without discrete regime thresholds.
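The following Monte Carlo sketch computes the allocation rule for a single segment. It assumes the posterior is available as an (S, K) array of draws of each treatment's expected outcome in that segment (for example, revenue per visitor). P(arm best) is then the fraction of draws in which that arm has the highest value. The array layout and the function name are assumptions. The ε-clip floor is left out here and covered separately under ε-clip.

```python
import numpy as np


def thompson_allocation(draws: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: allocation[k] = P(arm k best | posterior).

    draws: (S, K) posterior samples of each arm's expected outcome in one segment.
    Returns a length-K vector that sums to 1.
    """
    S, K = draws.shape
    best = np.argmax(draws, axis=1)  # winning arm in each posterior draw
    return np.bincount(best, minlength=K) / S


rng = np.random.default_rng(0)
# Three arms; arm 2 has the highest posterior mean, but uncertainty overlaps.
draws = rng.normal(loc=[1.00, 1.05, 1.10], scale=0.05, size=(4_000, 3))
alloc = thompson_allocation(draws)
print(alloc.round(3), alloc.sum())
```

Shares come directly from posterior probabilities rather than from threshold bands, which is what makes the rule magnitude-aware.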

Defined as pytyche.experiment.ThompsonPolicy (the default allocation policy behind TreePolicy and UniformPolicy).

ε-clip

An internal safety net inside Thompson allocation. Within each segment (leaf of an Optimized cell’s policy tree), every active treatment receives at least ε/K of that segment’s Optimized-cell share, where K is the active treatment count. Prevents Thompson allocation from collapsing to a single treatment per segment when one treatment dominates the posterior.

The operator-facing controls-retention story is the cell-level Control and Explore weights (see min_control_weight and min_explore_weight), not ε. The ε-clip becomes mostly redundant when both Control and Explore cells have non-zero weight, since Explore already samples every treatment uniformly across all segments at the cell level.

Not exposed at the L1 API. Lives as a Thompson-allocation implementation detail with a hard-coded default (ε = 0.02).
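One way to enforce a floor of ε/K per active treatment is as follows. Pin any share that falls below the floor at exactly ε/K, then rescale the unpinned shares proportionally to fill what remains. Repeat until no share is below the floor. This sketch assumes that procedure, which may differ from pytyche's internal clipping, and uses the documented default ε = 0.02. The function name is hypothetical.

```python
import numpy as np


def epsilon_clip(alloc: np.ndarray, eps: float = 0.02) -> np.ndarray:
    """Hypothetical sketch: give every active arm at least eps/K while keeping the sum at 1.

    Assumes 0 <= eps <= 1, so at least one arm always stays above the floor.
    """
    alloc = np.asarray(alloc, dtype=float)
    K = alloc.size
    floor = eps / K
    pinned = np.zeros(K, dtype=bool)  # arms held at exactly the floor
    while True:
        free = ~pinned
        budget = 1.0 - floor * pinned.sum()  # share left for unpinned arms
        out = np.where(pinned, floor, 0.0)
        mass = alloc[free].sum()
        # Rescale unpinned arms proportionally; split evenly if they carry no mass.
        out[free] = alloc[free] / mass * budget if mass > 0 else budget / free.sum()
        newly_low = free & (out < floor)
        if not newly_low.any():
            return out
        pinned |= newly_low


collapsed = np.array([1.0, 0.0, 0.0])  # Thompson output dominated by one arm
print(epsilon_clip(collapsed))          # each other arm now gets eps/K
```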

min_control_weight

The guaranteed minimum share of traffic the recommendation engine will allocate to the Control cell when proposing the next round’s structure. With min_control_weight=0.05, the Control cell never falls below 5% of the round’s traffic regardless of how confident the model becomes.

The baseline-measurement controls-retention floor. Even as the experiment matures and Optimized cells absorb more traffic, the Control cell continues to receive a guaranteed share so drift in the baseline outcome surface remains detectable.

Set via the min_control_weight parameter on pt.sequential_experiment(). The operator may override the recommended weight in their own next-round plan; the floor applies only to engine-proposed allocations.

min_explore_weight

The guaranteed minimum share of traffic the recommendation engine will allocate to the Explore cell when proposing the next round’s structure. With min_explore_weight=0.05, the Explore cell never falls below 5% of the round’s traffic regardless of segment confidence.

The every-treatment-observed controls-retention floor. The Explore cell samples uniformly across all active treatments, so a non-zero floor guarantees every treatment receives some traffic in every segment regardless of what the Optimized cells are doing.

Set via the min_explore_weight parameter on pt.sequential_experiment(). The operator may override in their own next-round plan; the floor applies only to engine-proposed allocations.
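The following sketch shows how an engine might enforce both floors on a proposed cell-weight vector. It lifts Control and Explore to their minimums where needed, then shrinks the Optimized cells proportionally so the weights still sum to 1. The function, the dict layout, and the proportional-shrink rule are all assumptions for illustration. The glossary guarantees only the floors themselves.

```python
def apply_cell_floors(
    proposed: dict[str, float],
    min_control_weight: float = 0.05,
    min_explore_weight: float = 0.05,
) -> dict[str, float]:
    """Hypothetical sketch: enforce Control/Explore floors on engine-proposed weights.

    `proposed` maps cell name -> weight (summing to 1) and must contain "control",
    "explore", and at least one other key; every other key is an Optimized cell.
    """
    optimized = [name for name in proposed if name not in ("control", "explore")]
    if not optimized:
        raise ValueError("this sketch assumes at least one Optimized cell")
    out = dict(proposed)
    out["control"] = max(proposed["control"], min_control_weight)
    out["explore"] = max(proposed["explore"], min_explore_weight)
    remaining = 1.0 - out["control"] - out["explore"]  # share left for Optimized cells
    total_opt = sum(proposed[name] for name in optimized)
    for name in optimized:
        if total_opt > 0:
            out[name] = proposed[name] / total_opt * remaining  # keep relative sizes
        else:
            out[name] = remaining / len(optimized)
    return out


proposed = {"control": 0.02, "explore": 0.03, "optimized_v1": 0.57, "optimized_v2": 0.38}
print(apply_cell_floors(proposed))
```

The floors apply only to engine-proposed weights, so a plan the operator edits by hand would not pass through a step like this.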

Segments, clusters, HTE

segment

A region of feature space, typically a leaf of a policy tree. “Mobile-returning visitors” or “desktop-new visitors” are segments discovered by the segmentation pipeline (or declared by the caller via the rule algebra). The unit of Thompson allocation: each segment receives a per-treatment allocation derived from the joint posterior.

A cell is the routing cohort. A segment is a feature-space region, typically inside an Optimized cell’s tree. The same segment may appear across multiple Optimized cells’ trees with different policies attached.

The collection of all discovered segments for one posterior is returned as part of a PolicyTreeResult (one DiscoveredSegment per leaf, ordered by leaf id).

Defined as contracts.DiscoveredSegment for the discovered surface and contracts.SegmentRule (a discriminated union of EqRule, InRule, ComparisonRule, BetweenRule) for the predicate that defines one.

cluster

A DGP-generated mixture component. Latent and truth-side. Clusters exist only in sim mode, populated by the generator. The clustered_realistic template has 4 clusters representing customer archetypes. Used for evaluation (“did our discovered segments correlate with the DGP’s clusters?”), not for assignment.

A segment is an observed-side feature-space region. A cluster is truth-side mixture-component identity. They may correlate. They are not the same.

Available in sim-mode RoundData as cluster_ids: np.ndarray.

stability score

Bootstrap-replicability score for a discovered segment: the fraction of bootstrap policy trees (B = 50 by default) in which some leaf has Jaccard overlap ≥ 0.5 with the original segment’s member set. The bootstrap resamples per-visitor CATEs (not the BCF posterior itself), refits the same-depth tree on each resample, and reports the overlap fraction. Range [0, 1]; segments with stability_score >= 0.80 are considered credible enough to act on.

Answers the boundary-replicability question: “would this tree boundary have appeared on a slightly different sample?” Credible interval width does NOT answer this question — a tight CI says the effect estimate is stable given the tree, not that the tree boundaries themselves would survive resampling.
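The overlap test behind the score can be sketched as follows. The sketch assumes each bootstrap tree has already been refit on resampled CATEs and applied back to the full visitor population, yielding one leaf id per visitor. The function names are hypothetical. Only the Jaccard ≥ 0.5 criterion and the fraction-of-trees definition come from the description above.

```python
import numpy as np


def jaccard(a: set[int], b: set[int]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability_score(segment_members: set[int], bootstrap_leaves: list[np.ndarray],
                    threshold: float = 0.5) -> float:
    """Hypothetical sketch: fraction of bootstrap trees with a leaf matching the segment.

    segment_members: visitor indices in the original segment.
    bootstrap_leaves: one array per bootstrap tree giving each visitor's leaf id.
    """
    hits = 0
    for leaves in bootstrap_leaves:
        leaf_sets = [set(np.flatnonzero(leaves == leaf).tolist()) for leaf in np.unique(leaves)]
        if any(jaccard(segment_members, s) >= threshold for s in leaf_sets):
            hits += 1
    return hits / len(bootstrap_leaves)


segment = {0, 1, 2, 3}
trees = [
    np.array([0, 0, 0, 0, 1, 1, 1, 1]),  # exact match: Jaccard 1.0
    np.array([0, 0, 1, 1, 1, 1, 1, 1]),  # leaf {0, 1}: Jaccard 0.5, still counts
    np.array([0, 1, 0, 1, 0, 1, 0, 1]),  # best leaf Jaccard 1/3: no match
]
print(stability_score(segment, trees))
```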

Controlled via posterior.fit_policy_tree(n_bootstrap=50, bootstrap_seed=...). Passing n_bootstrap=0 skips the bootstrap and sets scores to float("nan"). Carried on both DiscoveredSegment.stability_score and PolicyTreeResult’s stability_scores dict, keyed by leaf id. Threshold-checked by the capability method has_credible_segments(threshold=0.80).

HTE

Heterogeneous Treatment Effect: the phenomenon that the causal effect of a treatment varies across customer segments. The joint multivariate hurdle BCF estimates a per-visitor CATE surface (via the BART forest); the policy tree partitions that surface into segments where CATE is approximately constant.

CATE is the technical statistical term for the per-visitor effect. HTE names the broader phenomenon. Segments are how the discovered heterogeneity is partitioned for interpretability.

Per-visitor CATEs live on AnalysisResult.cate_per_visitor. Segment-level summaries live on AnalysisResult.segments.

CATE

Conditional Average Treatment Effect: the expected difference in outcome between treatment and control for a specific covariate combination. Formally E[Y(1) - Y(0) | X = x]. Read per-visitor: how much would this specific visitor’s outcome change under treatment vs control, given their features.

The quantity BCF estimates. The collection of CATEs across feature space is the HTE surface. Pytyche exposes per-visitor CATEs on AnalysisResult.cate_per_visitor and segment-level summary CATEs on AnalysisResult.segments.

Models and inference

BART forest

The tree ensemble inside the BCF — a sum of many weak trees that together approximate the underlying function. “BART” = Bayesian Additive Regression Trees. The BCF carries two such forests: a prognostic μ-forest for the baseline outcome surface and a treatment τ-forest for the conditional treatment effect. The MCMC samples a posterior over forests (each posterior sample is a different forest configuration), and what pytyche surfaces is the per-visitor posterior on τ(x) marginalized over that posterior — never the individual trees.

Forest sizes are set by GPUBCFConfig.num_trees_mu and GPUBCFConfig.num_trees_tau; see src/pytyche/bcf/config.py for current defaults. compute_num_trees_tau is the formula-driven helper for picking the τ-forest size at a target CI coverage.

This is NOT the policy tree. They share the word “tree” but do different jobs: the BART forest estimates the CATE surface inside the BCF MCMC; the policy tree segments that CATE surface for downstream allocation and operator interpretability. Users don’t inspect BART trees directly; they inspect the policy tree.

Lives inside pytyche.bcf.hurdle.*. Implemented on top of bartz (the GPU BART primitive library).

policy tree

The single sklearn.tree.DecisionTreeClassifier fit on the per-visitor CATEs from the BCF posterior, used to discover segments of feature space where the treatment effect is approximately constant. One tree (not an ensemble), deterministic given the posterior + hyperparameters, user-inspectable as PolicyTreeResult.tree. Its leaves are the segments downstream allocation, recommendation, and graduation decisions operate on.

This is NOT the BART forest. The BART forest produces the CATEs; the policy tree partitions them. The policy tree is what shows up in cell recommendations (Thompson allocation’s allocation_map[leaf_id]) and what operators see as the segmentation of “where the lift comes from.”

Depth controlled by max_segment_depth at the L1 surface (pt.sequential_experiment(max_segment_depth=3)) or max_depth at the L2 method (posterior.fit_policy_tree(max_depth=3)). Minimum segment size controlled by min_segment_share (default 0.10). The result is a PolicyTreeResult — a frozen dataclass carrying the tree, segments, allocation map, and bootstrap stability scores.
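The following is a minimal scikit-learn sketch of fitting such a tree. It rests on two assumptions. First, the classification label is each visitor's posterior-mean best treatment: the argmax over per-treatment CATE means, with control's effect fixed at 0. Second, min_segment_share maps onto scikit-learn's fractional min_samples_leaf. The glossary states only that a single DecisionTreeClassifier is fit on the per-visitor CATEs. The data below are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2_000
# Two binary features, e.g. is_mobile and is_returning (synthetic).
X = np.column_stack([rng.integers(0, 2, n), rng.integers(0, 2, n)])
# Synthetic posterior-mean CATEs vs control for two non-control treatments.
cate = np.column_stack([
    0.5 * X[:, 0] - 0.2 + rng.normal(0, 0.02, n),  # treatment 1 helps mobile visitors
    0.3 * X[:, 1] - 0.1 + rng.normal(0, 0.02, n),  # treatment 2 helps returning visitors
])
# Label = best treatment index, with control (index 0) at an effect of 0.
labels = np.argmax(np.column_stack([np.zeros(n), cate]), axis=1)

# max_depth ~ max_segment_depth; a float min_samples_leaf is a fraction of n.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=0.10, random_state=0)
tree.fit(X, labels)
leaf_ids = tree.apply(X)  # segment (leaf) id for every visitor
print(tree.predict(np.array([[0, 0], [1, 0], [0, 1], [1, 1]])))
```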

BCF

Bayesian Causal Forests. The class of model pytyche builds on for HTE estimation, introduced by Hahn, Murray, and Carvalho (2020). Combines a “prognostic” forest (estimating the baseline outcome surface) with a “treatment” forest (estimating the conditional treatment effect), so that per-visitor CATE estimates do not confound effect modification with prognostic signal (the regularization-induced confounding that single-forest approaches suffer from). Both forests are BART forests.

For zero-inflated outcomes like e-commerce revenue per visitor, pytyche uses a joint hurdle BCF that shares tree topology between the conversion (probit) and severity (log-normal) channels.

Defined as pytyche.bcf.fit_continuous_bcf, pytyche.bcf.fit_binary_bcf, and friends. The high-level sequential surface (pt.sequential_experiment) calls these under the covers.

joint hurdle BCF

The model pytyche actually fits for e-commerce revenue and similar zero-inflated outcomes. Two channels — a conversion probit channel and a severity log-normal channel — share tree topology, so the per-visitor CATE on revenue decomposes cleanly into “did the treatment change conversion” and “did it change basket size given conversion.”

For multi-arm experiments, the joint hurdle BCF estimates per-treatment effects jointly via shared prognostic structure rather than fitting K-1 independent contrasts (which leaks power on max-of-K selection).

Defined as pytyche.bcf.fit_hurdle_bcf (called by pt.sequential_experiment and pt.analyze). The pooling kwarg selects between the canonical shared-tree fit and the independent two-stage literature baseline.

hurdle outcomes

Outcomes with two distinct components: a binary “did anything happen” gate and a positive-valued severity conditional on the gate firing. E-commerce revenue per visitor is the canonical example: most visitors do not convert and contribute $0, while the converting tail has continuously distributed positive revenue.

Standard regression on a hurdle outcome confounds “treatment changes conversion probability” with “treatment changes basket size.” Hurdle BCF models the two channels separately, then combines them for the per-visitor revenue effect.
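Let p be the conversion probability and m the mean revenue among converters. Expected revenue per visitor under a treatment is then p·m, so the revenue effect of treatment versus control is p1·m1 - p0·m0.

The sketch below computes that effect and splits it into a conversion term and a basket-size term. The split (p1 - p0)·m0 + p1·(m1 - m0) is one standard exact decomposition, chosen here for illustration; it is not necessarily the one pytyche reports. The helper exp(μ + σ²/2) is the standard mean of a log-normal severity. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueEffect:
    total: float        # p1*m1 - p0*m0
    conversion: float   # (p1 - p0) * m0: conversion change at control basket size
    severity: float     # p1 * (m1 - m0): basket-size change among treated converters


def lognormal_mean(mu: float, sigma: float) -> float:
    """Mean of a log-normal severity with log-scale location mu and scale sigma."""
    return math.exp(mu + sigma**2 / 2)


def revenue_effect(p0: float, m0: float, p1: float, m1: float) -> RevenueEffect:
    """Hypothetical sketch: combine the two hurdle channels into one revenue effect."""
    return RevenueEffect(
        total=p1 * m1 - p0 * m0,
        conversion=(p1 - p0) * m0,
        severity=p1 * (m1 - m0),
    )


# Conversion rises from 3.0% to 3.3% while mean order value falls from $50 to $48.
eff = revenue_effect(p0=0.030, m0=50.0, p1=0.033, m1=48.0)
print(eff)
```

In this example the two channels pull in opposite directions. Because of that, a model that collapses them into one regression would report only their small net difference.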

pooling

The mode in which fit_hurdle_bcf (and pt.fit when it dispatches to the hurdle path) couples the two channels of the joint hurdle BCF. Two values:

  • "joint" (default) — canonical shared-tree fit. Conversion (probit) and severity (log-normal) share tree topology, so the per-visitor CATE decomposes cleanly and the model borrows strength across the two channels. This is the v0.2+ recommended path for typical e-commerce revenue data.

  • "independent" — independent two-stage fit. Runs fit_binary_bcf for conversion, then fit_continuous_bcf for log-severity on converters, and composes the posteriors. Opt in when the two channels are driven by different feature subsets, when one channel has dominant HTE and shared topology distorts the other, or when a researcher wants per-channel HTE structure without the regularization-induced coupling.

Exposed as fit_hurdle_bcf(..., pooling="joint") at the fit boundary and carried on HurdleBCFResult.pooling. Passed through verbatim when calling pt.fit(observed, pooling="independent"). Stored on Calibration regime metadata — a calibration artifact fitted on joint-pooling data is not applied to an independent-pooling posterior.

Defined as a Literal["joint", "independent"] kwarg on pytyche.bcf.fit_hurdle_bcf. Private dispatch helpers _fit_joint_hurdle_bcf (in pytyche.bcf.hurdle.model) and _fit_independent_hurdle_bcf (in pytyche.bcf.hurdle.compose) implement the two paths.

joint posterior

The Bayesian posterior distribution over all model parameters considered jointly. In pytyche’s joint hurdle BCF, the joint posterior covers per-treatment conversion probabilities, per-treatment severity means, and the per-visitor treatment effects, all conditioned on the observed data.

“Joint” because the parameters are estimated together with their full correlation structure preserved, rather than fitted marginally and assumed independent. This is what lets Thompson allocation respect cross-treatment dependence.

Directly accessible as AnalysisResult.posterior for follow-up analysis (custom decompositions, alternative ship rules, sensitivity checks).

Results, recommendations, graduation

PolicyTreeResult

The frozen dataclass returned by posterior.fit_policy_tree(...). Bundles the policy tree and all downstream-usable derived data:

  • tree — the fitted sklearn.tree.DecisionTreeClassifier partitioning feature space into segments

  • segments — one DiscoveredSegment per leaf, ordered by leaf id; carries rule, gate_estimate, gate_ci, stability_score, population_share, id, and arm_best_probabilities

  • allocation_map: dict[leaf_id, dict[treatment_name, weight]]; each leaf’s weight dict sums to 1.0; produced by Thompson allocation under the shared best-arm rule

  • stability_scores: dict[leaf_id, float] in [0, 1]; bootstrap-replicability scores computed by resampling visitor CATEs and Jaccard-overlap matching (see stability score)

  • observed — reference to the ObservedExperimentData the underlying posterior was fit on; shared by identity from the posterior (no re-clone; see observed data stashing)

The dataclass is frozen; assignment to any field raises dataclasses.FrozenInstanceError. tree is a sklearn.tree.DecisionTreeClassifier for v0.2; a future change may introduce a pytyche wrapper with the same predict / decision_path methods.

PolicyTreeResult is NOT the BART forest. The BART forest estimates the CATE surface inside the MCMC; the policy tree in PolicyTreeResult partitions that surface for downstream allocation and operator interpretability.

Defined as pytyche.contracts.PolicyTreeResult.

decision

The recommended ship-or-continue-or-stop call for a treatment versus baseline. A 3-value enum: SHIP, CONTINUE, STOP.

Defined as contracts.Decision.

recommendation summary

A structured decision (SHIP, CONTINUE, or STOP) with supporting evidence: expected losses, probability of positive lift, probability of meaningful improvement, probability of harm. The decision applies five thresholds across three branches.

  • SHIP gate: expected_loss < tolerance AND p_positive > 0.95 AND p_better > 0.80

  • STOP (harm): p_harmful > 0.90

  • STOP (futility): p_better < 0.05

  • CONTINUE — default when no gate fires
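The three branches can be read as a small function. This sketch checks them in the listed order (SHIP, then harm, then futility); that order is an assumption, because the glossary does not say which branch wins when several gates fire at once. The argument names mirror the evidence fields named above, but the signature is hypothetical, and the enum is a stand-in for contracts.Decision. The real entry point is recommendation_summary().

```python
from enum import Enum


class Decision(Enum):  # stand-in for contracts.Decision
    SHIP = "ship"
    CONTINUE = "continue"
    STOP = "stop"


def decide(expected_loss: float, tolerance: float, p_positive: float,
           p_better: float, p_harmful: float) -> Decision:
    """Hypothetical sketch of the documented gates, checked in the listed order."""
    if expected_loss < tolerance and p_positive > 0.95 and p_better > 0.80:
        return Decision.SHIP
    if p_harmful > 0.90:        # STOP (harm)
        return Decision.STOP
    if p_better < 0.05:         # STOP (futility)
        return Decision.STOP
    return Decision.CONTINUE    # no gate fired


print(decide(expected_loss=0.001, tolerance=0.01, p_positive=0.99, p_better=0.9, p_harmful=0.0))
```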

Produced by recommendation_summary() in compare.variants. A graduation candidate is a (treatment, segment) pair whose recommendation summary has fired SHIP for N consecutive rounds.

Defined as contracts.RecommendationSummary.

graduation candidate

A (treatment, segment) pair where the recommendation has fired SHIP for ≥ sustained_rounds consecutive rounds. The default rule fires when expected_loss < tolerance AND p_positive > 0.95 AND p_better > 0.80, sustained over at least 2 rounds. Segments with stability_score < 0.80 are excluded from graduation-candidate consideration by default (see stability score). The capability method has_credible_segments(threshold=0.80) provides a quick check before running the full analysis.

Pytyche surfaces graduation candidates as structured data. The operator (or an agentic caller) decides whether to promote one to broader rollout. The library does not auto-graduate.
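Here is a sketch of the default rule. It assumes each (treatment, segment) pair's history is a list of per-round decisions (oldest first) and that the SHIP run must be unbroken up to the latest round. The helper name and the data layout are hypothetical. sustained_rounds=2 and the 0.80 stability cutoff are the defaults described above. Treating a NaN stability score (n_bootstrap=0) as not credible is a further assumption.

```python
def is_graduation_candidate(history: list[str], stability_score: float,
                            sustained_rounds: int = 2,
                            min_stability: float = 0.80) -> bool:
    """Hypothetical sketch: SHIP in each of the last `sustained_rounds` rounds, stable segment.

    history: per-round decisions for one (treatment, segment) pair, oldest first,
    e.g. ["CONTINUE", "SHIP", "SHIP"]. Assumes sustained_rounds >= 1.
    """
    # `not >=` also rejects NaN scores, which compare False against everything.
    if not stability_score >= min_stability:
        return False
    recent = history[-sustained_rounds:]
    return len(recent) >= sustained_rounds and all(d == "SHIP" for d in recent)


print(is_graduation_candidate(["CONTINUE", "SHIP", "SHIP"], stability_score=0.91))
```

The function only flags candidates. Promotion stays an explicit operator decision, consistent with the library not auto-graduating.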

Defined as pytyche.experiment.GraduationCandidate.

next round plan

The recommended cell structure, treatments list, and prose summary for the next round of a sequential experiment. The handoff between the recommendation engine and the operator’s next-round decision.

Carries:

  • Recommended cells (typically Control + Explore + an Optimized cell with the recommended tree)

  • Active treatments

  • Dropped treatments, if any

  • Graduation candidates

  • Prose rationale

The operator may accept, partially override (for example, add a hypothesis cell), or fully replace before shipping.

Defined as pytyche.experiment.NextRoundPlan.

L2 analysis surface

pt.fit

The auto-selecting fit entry point at the top-level pytyche namespace. Inspects the extracted outcome array Y and treatment cardinality K = len(observed.variants), then dispatches deterministically to one of three underlying fit functions:

  • Y all in {0, 1} → fit_binary_bcf → returns BinaryBCFResult

  • Y float dtype, < 30% zero entries → fit_continuous_bcf → returns ContinuousBCFResult

  • Y float dtype, ≥ 30% zero entries with a positive non-zero tail → fit_hurdle_bcf(..., pooling="joint") → returns HurdleBCFResult (at any K, including K ≥ 3)

posterior = pt.fit(observed)               # auto-selects fit
posterior = pt.fit(observed,
    pooling="independent")                 # kwarg forwarded verbatim

The same observed always selects the same fit function (deterministic). **kwargs forward verbatim to the dispatched fit. Users who need explicit control call pt.fit_binary_bcf, pt.fit_continuous_bcf, or pt.fit_hurdle_bcf directly.

Edge cases: all-zero Y raises ValueError naming both binary and hurdle interpretations; multi-arm Z with binary or continuous Y raises NotImplementedError (multi-arm binary / continuous BCF is not yet shipped).

The 30% zero-density threshold is a semi-empirical starting point for e-commerce revenue distributions — generous enough to catch typical revenue data, conservative enough to keep non-hurdle continuous data on the continuous path.
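The dispatch rule can be sketched as a pure function of the outcome array and the treatment count. The name select_fit is hypothetical; the real dispatch helper is internal. For inputs the documented rule does not cover, such as heavy zero mass combined with negative values, this sketch raises an error rather than guess what pytyche does.

```python
import numpy as np

ZERO_SHARE_THRESHOLD = 0.30  # documented semi-empirical cutoff


def select_fit(Y: np.ndarray, K: int) -> str:
    """Hypothetical sketch of pt.fit's documented dispatch; returns the fit function name."""
    Y = np.asarray(Y)
    if Y.size and np.all(Y == 0):
        raise ValueError("all-zero Y: ambiguous between binary and hurdle interpretations")
    if np.isin(Y, (0, 1)).all():
        name = "fit_binary_bcf"
    elif np.issubdtype(Y.dtype, np.floating):
        zero_share = float(np.mean(Y == 0))
        if zero_share < ZERO_SHARE_THRESHOLD:
            name = "fit_continuous_bcf"
        elif np.all(Y[Y != 0] > 0):
            return "fit_hurdle_bcf"  # pooling="joint"; the hurdle path accepts any K
        else:
            raise ValueError("heavy zero mass without a positive tail: not covered here")
    else:
        raise ValueError("unsupported outcome dtype for this sketch")
    if K >= 3:
        raise NotImplementedError(f"multi-arm {name} is not yet shipped")
    return name


print(select_fit(np.array([0.0, 0.0, 0.0, 12.5, 40.0]), K=3))
```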

Defined at pytyche.fit. Internal dispatch helper at pytyche.bcf.dispatch._dispatch_fit (or similar).

observed data stashing

The contract that every posterior result type (HurdleBCFResult, ContinuousBCFResult, BinaryBCFResult) carries the ObservedExperimentData it was fit on. Analysis methods reach their inputs through posterior.observed, not through a separately-passed handle, so fit-time and analysis-time data encodings cannot drift.

posterior = pt.fit(observed)
# downstream methods reach the data through the posterior
tree = posterior.fit_policy_tree()
# derived results share the same observed by identity
assert tree.observed is posterior.observed

Derived results (PolicyTreeResult, calibrated posteriors) hold the same reference by identity — the cost of stashing is paid once at fit time, not per-derivation. The observed_copy parameter controls what kind of stash is created.

Follows the sklearn idiom (.X_train_ and similar) — downstream operations on a fitted result reach into the input data through the result.

observed_copy parameter

The kwarg on every fit entry point (pt.fit, pt.fit_hurdle_bcf, etc.) that controls how the input ObservedExperimentData is stashed on the resulting posterior. Three modes:

  • "view" (default) — shallow clone of the dataclass with each visitors DataFrame rebuilt over read-only numpy views of the original columns. Zero data-buffer copy; in-place mutation through the stash raises ValueError (“assignment destination is read-only”). Buffers are still shared with the caller’s original handle, so mutation through the original is reflected in the stash.

  • "deep": copy.deepcopy(observed) at fit time. Doubles memory for input data; provides a bit-stable stash independent of any subsequent mutation to the original handle.

  • "ref": posterior.observed is observed directly. No view wrappers; no protection. For the power-user case that wants the cheapest possible path and accepts mutation risk on both sides.

Any other value raises ValueError naming the three valid modes. See observed data stashing for the propagation contract through derived results.
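The three stash modes map onto familiar NumPy and standard-library behavior. To keep the sketch self-contained, a plain array stands in for one column of a visitors DataFrame. It shows three things: "view" blocks writes through the stash yet still reflects writes through the original, "deep" is fully independent, and "ref" is the very same object.

```python
import copy

import numpy as np

original = np.array([10.0, 20.0, 30.0])  # stands in for one visitors column

# "view": read-only view over the same buffer, no data copy.
view = original.view()
view.flags.writeable = False
try:
    view[0] = -1.0                       # write through the stash is rejected
except ValueError as err:
    print(err)                           # assignment destination is read-only
original[1] = 99.0                       # write through the caller's handle...
print(view[1])                           # ...is visible through the view

# "deep": independent copy taken at fit time.
deep = copy.deepcopy(original)
original[2] = -5.0
print(deep[2])                           # still the fit-time value

# "ref": the same object, no protection in either direction.
ref = original
print(ref is original, np.shares_memory(view, original))
```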

capability methods

A pair of pure getters on every posterior result type that enable conditional downstream logic without triggering heavyweight computation:

  • has_credible_segments(threshold=0.80) -> bool: True iff at least one segment in posterior.analyze().segments has stability_score >= threshold. The default threshold matches the ExpectedLossRule SHIP-gate stability floor.

  • has_decomposition() -> bool: True for HurdleBCFResult (the two-channel hurdle decomposition into conversion + severity); False for ContinuousBCFResult and BinaryBCFResult.

Both are pure: no state mutation, no side effects, deterministic given the posterior. The canonical branch pattern:

if posterior.has_credible_segments():
    tree = posterior.fit_policy_tree()
    # ship tree-based policy
elif posterior.has_decomposition():
    # hurdle posterior with no credible segments yet — inspect
    # channel decomposition to diagnose
    ...

Defined on each result type in pytyche.bcf. The threshold default (0.80) is shared with stability score’s credibility cutoff.

pt.viz namespace

The pytyche.viz submodule exposing five matplotlib-backed visualization primitives:

  • pt.viz.plot_cells(cells, ax=None) — horizontal bar chart of cell weights for one round

  • pt.viz.plot_policy_tree(tree_policy, ax=None) — tree diagram from a PolicyTreeResult

  • pt.viz.plot_segment_intervals(segments, ax=None) — forest plot of per-segment gate estimates + 80% credible intervals

  • pt.viz.plot_calibration(calibrated_posterior, reference=None, ax=None) — R(p) calibration curve, optionally overlaid with a reference (uncalibrated) curve

  • pt.viz.experiment_evolution_gif(history, output_path, fps=1) — animated GIF rendering round-by-round cell structure and policy tree evolution

Each static primitive accepts an ax parameter for matplotlib subplot composition. When ax=None, a new figure and axes are created and returned. The GIF helper renders to disk and returns the path.

matplotlib is imported lazily — import pytyche does NOT trigger the matplotlib import. The cost is paid only when a pt.viz.* function is first called. matplotlib >= 3.7 and either imageio >= 2.31 or Pillow >= 10.0 are base dependencies in pyproject.toml.

Importable as import pytyche.viz as ptviz or from pytyche import viz. The five names are also importable directly: from pytyche.viz import plot_cells, ....

Defined in pytyche.viz.

Calibration, truth, sim mode

calibration

On-path recalibration that corrects BCF posterior coverage at scale. Three construction paths:

pt.Calibration.from_sweep('clustered_realistic_v1')  # shipped artifact
pt.Calibration.from_sweep('/path/to/sweep.json')      # user-fitted
pt.Calibration.skip()                                  # uncorrected

When calibration is specified, the library applies the correction automatically. When skip() is used, uncalibrated posteriors are explicitly labeled in result objects and the library emits a warning on first fit.

A Calibration instance is a frozen dataclass carrying:

  • correction — the LayeredCalibrationCorrection (layered R(p) + scale-family correction payload)

  • Regime metadata: metric (the outcome metric the sweep was fitted on, e.g. "revenue_per_visitor"), n_treatments (the K of the fitted sweep), pooling (the pooling mode of the sweep — "joint" or "independent")

  • applies_to(observed: ObservedExperimentData) -> bool: True iff observed.metric, len(observed.variants), and the pooling mode all match the artifact’s regime metadata. posterior.apply_calibration(calibration) raises ValueError naming the mismatched dimension(s) when applies_to returns False. This prevents silently applying a K=2 revenue correction to a K=3 conversion-rate posterior.

The from_sweep and skip constructors and the shipped artifact registry land with sequential-experiment-api. The type’s minimal contract (frozen dataclass + regime metadata + applies_to) is owned by this L2 surface.

Canonical home: pytyche.calibrate.Calibration, re-exported as pt.Calibration. Calibration machinery lives in src/pytyche/calibrate/.
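The regime check amounts to comparing three pieces of metadata. The stand-in classes below, ObservedStub and RegimeCalibration, are hypothetical simplifications and not pytyche types; the real Calibration also carries the correction payload. Putting the pooling mode on the observed stub is an assumption of this sketch, because the glossary does not say where applies_to reads it from.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Literal

Pooling = Literal["joint", "independent"]


@dataclass(frozen=True)
class ObservedStub:
    """Hypothetical stand-in for the fields the regime check reads."""
    metric: str
    variants: tuple[str, ...]
    pooling: Pooling  # assumption: source of the pooling mode is unspecified


@dataclass(frozen=True)
class RegimeCalibration:
    """Hypothetical stand-in for Calibration's regime metadata."""
    metric: str
    n_treatments: int
    pooling: Pooling

    def mismatches(self, observed: ObservedStub) -> list[str]:
        out = []
        if observed.metric != self.metric:
            out.append("metric")
        if len(observed.variants) != self.n_treatments:
            out.append("n_treatments")
        if observed.pooling != self.pooling:
            out.append("pooling")
        return out

    def applies_to(self, observed: ObservedStub) -> bool:
        return not self.mismatches(observed)


def apply_calibration(calibration: RegimeCalibration, observed: ObservedStub) -> None:
    """Sketch of the guard: refuse to apply, naming every mismatched dimension."""
    bad = calibration.mismatches(observed)
    if bad:
        raise ValueError(f"calibration regime mismatch on: {', '.join(bad)}")


cal = RegimeCalibration("revenue_per_visitor", n_treatments=2, pooling="joint")
k3_cr = ObservedStub("conversion_rate", ("control", "low_promo", "free_ship"), "joint")
print(cal.applies_to(k3_cr))  # False: metric and n_treatments both differ
try:
    cal.pooling = "independent"  # frozen, like the real Calibration
except FrozenInstanceError:
    print("frozen")
```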

calibration truth

Ground truth for a single calibration or simulation run: per-visitor CATE, hurdle decomposition (p0, p1, m0, m1), and effect components. Lives only in the sim and calibration paths; analysis code cannot peek at it because the type is segregated via CalibrationBundle.

Defined as contracts.CalibrationTruth.

truth comparison

Per-round truth-vs-estimate metrics, populated only in sim mode. None when the experiment runs in real-data mode.

Six fields:

  • cate_rmse — root mean square error of estimated CATE against truth

  • policy_accuracy — fraction of visitors for whom the recommended treatment matches the truth-optimal treatment

  • oracle_gap_rpv — revenue-per-visitor (RPV) regret of the recommended policy vs the oracle policy

  • rpv_policy, rpv_uniform, rpv_oracle — RPV under the recommended policy, uniform random allocation, and the oracle policy respectively
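Given sim-mode truth, the six fields reduce to a few array operations. This sketch rests on assumptions for illustration: truth arrives as an (n, K) matrix of each visitor's true expected revenue under each treatment, alongside estimated and true per-visitor CATEs, and oracle_gap_rpv is defined as rpv_oracle - rpv_policy. Those inputs, that definition, and the function name are not taken from the source.

```python
import numpy as np


def truth_comparison(true_rpv: np.ndarray, recommended: np.ndarray,
                     cate_hat: np.ndarray, cate_true: np.ndarray) -> dict[str, float]:
    """Hypothetical sketch of the TruthComparison fields.

    true_rpv: (n, K) true expected revenue per visitor under each treatment.
    recommended: (n,) treatment index the recommended policy assigns each visitor.
    cate_hat, cate_true: per-visitor estimated and true CATEs (same shape).
    """
    rows = np.arange(true_rpv.shape[0])
    oracle = true_rpv.argmax(axis=1)            # truth-optimal treatment per visitor
    rpv_policy = true_rpv[rows, recommended].mean()
    rpv_uniform = true_rpv.mean()               # uniform random over the K treatments
    rpv_oracle = true_rpv[rows, oracle].mean()
    return {
        "cate_rmse": float(np.sqrt(np.mean((cate_hat - cate_true) ** 2))),
        "policy_accuracy": float(np.mean(recommended == oracle)),
        "oracle_gap_rpv": float(rpv_oracle - rpv_policy),
        "rpv_policy": float(rpv_policy),
        "rpv_uniform": float(rpv_uniform),
        "rpv_oracle": float(rpv_oracle),
    }


metrics = truth_comparison(
    true_rpv=np.array([[1.0, 2.0], [3.0, 1.0]]),
    recommended=np.array([1, 1]),
    cate_hat=np.array([1.0, -1.0]),
    cate_true=np.array([1.0, -2.0]),
)
print(metrics)
```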

Defined as pytyche.experiment.TruthComparison.

SBC (in pytyche)

In this codebase “SBC” is used loosely for simulation-based coverage evaluation and correction — generate data from known ground truth, measure how the posterior’s credible intervals (and decisions) actually perform, and fit a correction. It is not the classical rank-statistic Simulation-Based Calibration of Talts et al. (2018), which checks posterior-rank uniformity. Two modules carry the “SBC” label and neither implements that rank procedure:

  • pytyche.calibrate.sbc — oracle-decision and regret evaluation against planted truth (does the recommended decision match the oracle; what regret does it incur).

  • scripts/fit_sbc_correction.py — fits the isotonic R(p) coverage correction (nominal → empirical coverage mapping).

Read “SBC” here as the umbrella for that simulate-then-correct workflow, not as the Talts diagnostic.
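The isotonic R(p) correction maps each nominal credible level p to the coverage actually observed in simulation, with the constraint that the mapping is non-decreasing. Below is a self-contained pool-adjacent-violators (PAV) sketch of that kind of fit. It is generic isotonic regression, not the code in scripts/fit_sbc_correction.py. The coverage values in the usage lines are illustrative inputs, not measured results.

```python
import numpy as np


def isotonic_fit(y, w=None) -> np.ndarray:
    """Pool-adjacent-violators: the non-decreasing fit minimizing weighted squared error.

    y must already be ordered by the predictor (here: nominal level p, ascending).
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block: [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.repeat([b[0] for b in blocks], [b[2] for b in blocks])


nominal = np.array([0.50, 0.60, 0.70, 0.80, 0.90, 0.95])
empirical = np.array([0.41, 0.52, 0.50, 0.66, 0.81, 0.88])  # illustrative inputs only
R = isotonic_fit(empirical)  # monotone estimate of R(p) at each nominal p
print(dict(zip(nominal.tolist(), R.round(3).tolist())))
```

When simulation replicates differ across levels, passing per-level replicate counts as w would be a natural weighting; that choice is also an assumption here.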

Setup and environment

setup report

The structured output of pt.check_setup(). Carries the pytyche version, JAX device list, CUDA availability, bartz version, calibration registry state, and a recommended install command when GPU is absent.

report = pt.check_setup()
if not report.cuda_available:
    print(report.recommended_install)

Defined as pytyche.SetupReport.

Contract types — quick reference

The contract types this glossary references and where they live:

Term | Type | Module
--- | --- | ---
Decision (enum) | Decision | contracts
Observed experiment data | ObservedExperimentData | contracts
Variant data | VariantData | contracts
Visitor schema | VISITOR_SCHEMA | contracts
Segment rule | SegmentRule (union: EqRule, InRule, ComparisonRule, BetweenRule) | contracts
Discovered segment | DiscoveredSegment | contracts
Aligned visitor array | AlignedVisitorArray | contracts
Decomposition samples | DecompositionSamples | contracts
Comparison result | ComparisonResult | contracts
Recommendation summary (type) | RecommendationSummary | contracts
Recommendation summary (function) | recommendation_summary() | compare.variants
Decision thresholds | DecisionThresholds | compare.variants
Analysis result | AnalysisResult | contracts
Policy tree result | PolicyTreeResult | contracts
Calibration | Calibration | calibrate
Layered calibration correction | LayeredCalibrationCorrection | calibrate.layered
Calibration truth | CalibrationTruth | contracts
Calibration bundle | CalibrationBundle | contracts
Calibration record | CalibrationRecord | contracts
Compare variants | compare_variants() | compare.variants
Claim level | ClaimLevel (enum) | contracts
Metric family | MetricFamily (enum) | contracts