---
title: Glossary
review-state: drafting
last-human-review: "2026-06-11"
depends-on:
  - src/pytyche/contracts.py
  - src/pytyche/compare/variants.py
  - src/pytyche/experiment
  - src/pytyche/bcf
  - src/pytyche/calibrate
owner: tradcliffe
quadrant: concept
---

# Glossary

Definitions for the load-bearing pytyche concepts. Several terms in
this space collide easily (treatment / arm; cell / segment / cluster;
CATE / HTE). Each entry below gives a short definition, the closest
neighbors a reader might confuse it with, and a pointer to the code
where it lives.

## Sequential experiment

:::{glossary}
sequential experiment
   The full adaptive experiment: one campaign across N rounds sharing
   treatments, schedule, and cumulative posterior. Constructed via
   `pt.sequential_experiment(...)` and iterated round by round.

   ```python
   import pytyche as pt

   exp = pt.sequential_experiment(
       generator=my_dgp,  # user-supplied generator callable (sim or real-data)
       schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
       treatments=['control', 'low_promo', 'free_ship'],
       calibration=pt.Calibration.from_sweep('clustered_realistic_v1'),
   )
   for r in exp:   # one iteration per round
       print(r)    # inspect the round as it completes

   Each round of a sequential experiment is an {term}`experiment`.
   The temporal slot housing it is a {term}`round`. The same
   sequential machinery powers both sim mode (generator-driven) and
   real-data mode (operator-driven), distinguished only by what the
   `generator` callable returns.

   Defined as `pytyche.experiment.SequentialExperiment`.

experiment
   A single discrete experiment: observed data, analysis, cells
   shipped, and the recommendation for the next experiment. The
   shape of a traditional single-shot A/B/N test, composed of an
   `ObservedExperimentData` plus an `AnalysisResult`.

   In a {term}`sequential experiment`, each round produces one
   experiment. In single-shot use, `pt.analyze(observed)` returns an
   `AnalysisResult` directly with no sequential machinery.

   Defined as `pytyche.experiment.Experiment`.

round
   One iteration of a sequential experiment — the temporal slot
   housing one {term}`experiment`. "Round 1, 2, 3" are positional
   indices into `SequentialExperiment.history`.

   Defined as `pytyche.experiment.Round`.

schedule
   The protocol that decides each round's visitor count. Three
   shipped implementations:

   * `GeometricSchedule(initial, growth, n_rounds=None)` — geometrically
     growing batches, doubling when `growth=2.0` (matches Perchet 2016,
     Esfandiari 2021, Che & Namkoong 2023)
   * `FixedSchedule(per_round, n_rounds)` — flat batches
   * `ExplicitSchedule([n_round_1, n_round_2, ...])` — user-supplied
     per-round visitor counts

   A schedule's `n_rounds` is optional. When it is `None`, the schedule
   is open-ended and the operator decides when to stop.

   Defined as `pytyche.experiment.Schedule`.
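
   As a rough illustration of how a geometric schedule sizes its
   batches, the sketch below computes per-round visitor counts from
   `initial` and `growth`. The helper `geometric_counts` is
   hypothetical and assumes that round `r` (0-indexed) receives
   `initial * growth**r` visitors; it is not pytyche's implementation.

   ```python
   def geometric_counts(initial: int, growth: float, n_rounds: int) -> list[int]:
       """Visitor count per round, assuming initial * growth**r for r = 0, 1, ..."""
       return [round(initial * growth**r) for r in range(n_rounds)]


   counts = geometric_counts(initial=10_000, growth=2.0, n_rounds=3)
   print(counts)  # [10000, 20000, 40000]
   ```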

generator
   The callable supplied at construction that provides observed data
   when the sequential experiment advances. One entry point through
   which both sim mode and real-data mode deliver data.

   ```python
   Generator = Callable[
       [int, NextRoundPlan],
       tuple[ObservedExperimentData, CalibrationTruth | None],
   ]
   ```

   Sim mode supplies a DGP that returns synthesized observations
   alongside truth. Real-data mode supplies a callable that fetches
   the round's data from the operator's platform, database, or other
   source. The library treats both identically.

   Defined as `pytyche.experiment.Generator` type alias.
:::

## Treatments, arms, policies, cells

Four terms with sharp distinctions.

:::{glossary}
treatment
   A named candidate intervention delivered to a single visitor (for
   example `'free_ship'`, `'low_promo'`, `'control'`). The thing the
   BCF model estimates causal effects for. Declared via the
   `treatments` parameter on `pt.sequential_experiment()`.

   Related terms:

   * {term}`arm` — internal-math term for the same concept. Pytyche's
     public API uses "treatment" canonically; "arm" still appears in
     BCF kernel code and per-arm `(K, S)` array dimensions.
   * {term}`policy` — the rule that picks which treatment to deliver
     per visitor. A treatment is the delivered intervention itself.

arm
   The internal-math term for a {term}`treatment` — the integer
   index that encodes it within BCF kernels. The joint hurdle BCF
   estimator carries a per-arm axis on its sample arrays
   (`p_samples`/`sev_samples` of shape `(n, S_total, K)`; the
   `(K − 1)` contrast vector on `rpv_cate_samples`).

   ```python
   import numpy as np
   from pytyche.bcf.preprocess import _compute_basis  # private helper
   Z = np.array([0, 1, 2, 0])    # control, arm 1, arm 2, control
   basis = _compute_basis(Z)     # (n, K-1) contrast coding
   ```

   Pytyche's public API always uses "treatment." "Arm" appears in
   BCF kernel code and per-arm array dimensions.

   Defined as integer indices in `Z` arrays. See
   `src/pytyche/bcf/preprocess.py` (`_compute_basis`) for the
   per-arm contrast coding.

policy
   The routing rule a cell uses to decide which treatment to deliver
   per visitor. Three variants ship with the library, and operators
   can define their own:

   * `BaselinePolicy()` — always delivers the control treatment
   * `UniformPolicy(over=[...])` — uniform-random over a subset of
     treatments. The default Explore-cell policy.
   * `TreePolicy(tree, allocation_map)` — sklearn
     `DecisionTreeClassifier` plus per-leaf {term}`Thompson allocation`
     over treatments
   * Operator-defined Policy subclasses for hypothesis injection

   A {term}`cell` houses one policy as its routing rule. A
   `TreePolicy` wraps a decision tree and the per-leaf Thompson
   allocation. The tree alone does not make a policy.

   Defined as `pytyche.experiment.Policy` protocol.

cell
   An assignment cohort within a single round of a sequential
   experiment. Cells span the visitor population by weight. Each
   cell ships a {term}`policy` that decides what treatment to
   deliver per visitor within that cell.

   ```python
   cells = [
       Cell('control', BaselinePolicy(), weight=0.3),
       Cell('explore', UniformPolicy(over=treatments), weight=0.4),
       Cell('optimized_v1', TreePolicy(tree, allocation_map), weight=0.3),
   ]
   ```

   The default round-1 structure has a Control cell and an Explore
   cell at 50/50. The recommendation engine may add Optimized cells
   in subsequent rounds. Shipping multiple Optimized cells in one
   round is a first-class capability for organizations running head-to-head
   policy variants. Operational reasons to do this include different
   stakeholder ownership, vendor relationships, and channel-specific
   creative.

   A cell is an assignment-time cohort spanning the population.
   A {term}`segment` is a region of feature space that a policy tree
   within an Optimized cell partitions. A {term}`cluster` is a
   DGP-generated mixture component (truth-side, sim-only) and is
   neither.

   Defined as `pytyche.experiment.Cell`.

Thompson allocation
   A Bayesian allocation rule where each treatment receives a share
   of traffic proportional to its posterior probability of being
   best. Per segment, per arm: `allocation[arm] = P(arm best |
   posterior)`. Each segment's allocation sums to 1.

   In pytyche, applied per segment with an {term}`ε-clip` floor so
   that no active treatment can be allocated below a minimum share.
   Magnitude-aware: a segment with `P(best) = 0.91` gets a markedly
   different allocation than `P(best) = 0.55`, without discrete
   regime thresholds.

   Defined as `pytyche.experiment.ThompsonPolicy` (the default
   allocation policy behind `TreePolicy` and `UniformPolicy`).
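
   The rule `allocation[arm] = P(arm best | posterior)` can be
   approximated by counting how often each treatment wins across
   posterior draws. The sketch below is a plain-Python illustration
   for one segment; `thompson_allocation` and the Gaussian draws are
   hypothetical stand-ins, not pytyche's kernel.

   ```python
   import random


   def thompson_allocation(draws: dict[str, list[float]]) -> dict[str, float]:
       """Share of posterior draws in which each treatment has the highest value."""
       names = list(draws)
       n_draws = len(draws[names[0]])
       wins = {name: 0 for name in names}
       for i in range(n_draws):
           best = max(names, key=lambda name: draws[name][i])
           wins[best] += 1
       return {name: wins[name] / n_draws for name in names}


   rng = random.Random(0)
   # Hypothetical posterior draws of one segment's mean outcome per treatment.
   draws = {
       "control": [rng.gauss(1.00, 0.05) for _ in range(4000)],
       "low_promo": [rng.gauss(1.02, 0.05) for _ in range(4000)],
       "free_ship": [rng.gauss(1.08, 0.05) for _ in range(4000)],
   }
   allocation = thompson_allocation(draws)
   print(allocation)  # shares sum to 1; free_ship gets the largest share
   ```

   Because the shares are win frequencies, a treatment that is only
   slightly ahead keeps a moderate share, while a clearly dominant
   one absorbs most of the traffic.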

ε-clip
   An internal safety net inside {term}`Thompson allocation`. Within
   each segment (leaf of an Optimized {term}`cell`'s policy tree),
   every active treatment receives at least `ε/K` of that segment's
   Optimized-cell share, where K is the active treatment count.
   Prevents Thompson allocation from collapsing to a single
   treatment per segment when one treatment dominates the
   posterior.

   The operator-facing controls-retention story is the cell-level
   Control and Explore weights (see {term}`min_control_weight` and
   {term}`min_explore_weight`), not ε. The ε-clip becomes mostly
   redundant when both Control and Explore cells have non-zero
   weight, since Explore already samples every treatment uniformly
   across all segments at the cell level.

   Not exposed at the L1 API. Lives as a Thompson-allocation
   implementation detail with a hard-coded default (ε = 0.02).
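
   One simple way to guarantee the `ε/K` floor while keeping the
   shares summing to 1 is to mix the raw allocation with a uniform
   distribution. The sketch below assumes that mixing scheme; the
   helper `epsilon_clip` is hypothetical, and pytyche's internal
   mechanism may differ (for example clip-and-renormalize).

   ```python
   def epsilon_clip(allocation: dict[str, float], epsilon: float = 0.02) -> dict[str, float]:
       """Mix with the uniform distribution so every treatment keeps
       at least epsilon / K while the shares still sum to 1."""
       k = len(allocation)
       return {
           name: (1.0 - epsilon) * share + epsilon / k
           for name, share in allocation.items()
       }


   # A segment whose raw allocation has collapsed onto one treatment.
   collapsed = {"control": 0.0, "low_promo": 0.0, "free_ship": 1.0}
   clipped = epsilon_clip(collapsed)
   print(clipped)  # every treatment now holds at least 0.02 / 3 of the segment
   ```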

min_control_weight
   The guaranteed minimum share of traffic the recommendation
   engine will allocate to the Control {term}`cell` when proposing
   the next round's structure. With `min_control_weight=0.05`, the
   Control cell never falls below 5% of the round's traffic
   regardless of how confident the model becomes.

   This is the controls-retention floor for baseline measurement. Even as the
   experiment matures and Optimized cells absorb more traffic, the
   Control cell continues to receive a guaranteed share so drift
   in the baseline outcome surface remains detectable.

   Set via the `min_control_weight` parameter on
   `pt.sequential_experiment()`. The operator may override the
   recommended weight in their own next-round plan; the floor
   applies only to engine-proposed allocations.

min_explore_weight
   The guaranteed minimum share of traffic the recommendation
   engine will allocate to the Explore {term}`cell` when proposing
   the next round's structure. With `min_explore_weight=0.05`, the
   Explore cell never falls below 5% of the round's traffic
   regardless of segment confidence.

   This is the controls-retention floor that keeps every treatment
   observed. The Explore cell samples uniformly across all active
   treatments, so
   a non-zero floor guarantees every treatment receives some
   traffic in every segment regardless of what the Optimized cells
   are doing.

   Set via the `min_explore_weight` parameter on
   `pt.sequential_experiment()`. The operator may override in their
   own next-round plan; the floor applies only to engine-proposed
   allocations.
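
   To make the two floors (`min_control_weight` and
   `min_explore_weight`) concrete, the sketch below raises any cell
   that falls under its floor and rescales the remaining cells
   proportionally. The helper `apply_floors` and the proportional
   rescaling are assumptions for illustration, not the
   recommendation engine's actual procedure.

   ```python
   def apply_floors(weights: dict[str, float], floors: dict[str, float]) -> dict[str, float]:
       """Pin cells that fall below their floor, then rescale the rest so
       the weights sum to 1. Assumes at least one cell stays unpinned with
       positive weight and that the floors sum to less than 1."""
       pinned: dict[str, float] = {}
       while True:
           free = {name: w for name, w in weights.items() if name not in pinned}
           budget = 1.0 - sum(pinned.values())
           total_free = sum(free.values())
           scaled = {name: budget * w / total_free for name, w in free.items()}
           below = [name for name, w in scaled.items() if w < floors.get(name, 0.0)]
           if not below:
               return {**pinned, **scaled}
           for name in below:
               pinned[name] = floors[name]


   proposed = {"control": 0.01, "explore": 0.02, "optimized_v1": 0.97}
   adjusted = apply_floors(proposed, {"control": 0.05, "explore": 0.05})
   print(adjusted)  # control and explore sit at 0.05; optimized_v1 takes the rest
   ```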
:::

## Segments, clusters, HTE

:::{glossary}
segment
   A region of feature space, typically a leaf of a policy tree.
   "Mobile-returning visitors" or "desktop-new visitors" are
   segments discovered by the segmentation pipeline (or declared
   by the caller via the rule algebra). The unit of
   {term}`Thompson allocation`: each segment receives a
   per-treatment allocation derived from the {term}`joint posterior`.

   A {term}`cell` is the routing cohort. A segment is a
   feature-space region, typically inside an Optimized cell's tree.
   The same segment may appear across multiple Optimized cells'
   trees with different policies attached.

   The collection of all discovered segments for one posterior is
   returned as part of a {term}`PolicyTreeResult` (one
   `DiscoveredSegment` per leaf, ordered by leaf id).

   Defined as `contracts.DiscoveredSegment` for the discovered
   surface and `contracts.SegmentRule` (a discriminated union of
   `EqRule`, `InRule`, `ComparisonRule`, `BetweenRule`) for the
   predicate that defines one.

cluster
   A DGP-generated mixture component. Latent and truth-side.
   Clusters exist only in sim mode, populated by the generator. The
   `clustered_realistic` template has 4 clusters representing
   customer archetypes. Used for evaluation ("did our discovered
   segments correlate with the DGP's clusters?"), not for
   assignment.

   A {term}`segment` is an observed-side feature-space region. A
   cluster is truth-side mixture-component identity. They may
   correlate. They are not the same.

   Available in sim-mode `RoundData` as `cluster_ids: np.ndarray`.

stability score
   Bootstrap-replicability score for a discovered segment: the
   fraction of bootstrap policy trees (B = 50 by default) in which
   some leaf has Jaccard overlap ≥ 0.5 with the original segment's
   member set. The bootstrap resamples per-visitor CATEs (not the
   BCF posterior itself), refits the same-depth tree on each
   resample, and reports the overlap fraction. Range `[0, 1]`;
   segments with `stability_score >= 0.80` are considered credible
   enough to act on.

   Answers the boundary-replicability question: "would this tree
   boundary have appeared on a slightly different sample?" Credible
   interval width does NOT answer this question — a tight CI says
   the *effect estimate* is stable given the tree, not that the
   *tree boundaries* themselves would survive resampling.

   Controlled via `posterior.fit_policy_tree(n_bootstrap=50,
   bootstrap_seed=...)`. Passing `n_bootstrap=0` skips the
   computation and sets scores to `float("nan")`. Carried on
   both `DiscoveredSegment.stability_score` and
   {term}`PolicyTreeResult`'s `stability_scores` dict keyed by
   leaf id. Threshold-checked by {term}`capability methods`
   (`has_credible_segments(threshold=0.80)`).
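
   The scoring step can be sketched with plain sets of visitor ids.
   The example below takes the leaf member sets of each bootstrap
   tree as given (it does not refit any trees) and reports the
   fraction of trees with a leaf whose Jaccard overlap with the
   original segment is at least 0.5. The function names and the toy
   data are hypothetical.

   ```python
   def jaccard(a: set[int], b: set[int]) -> float:
       """Jaccard overlap: size of the intersection over size of the union."""
       union = a | b
       return len(a & b) / len(union) if union else 0.0


   def stability_score(
       segment: set[int],
       bootstrap_trees: list[list[set[int]]],
       min_overlap: float = 0.5,
   ) -> float:
       """Fraction of bootstrap trees with at least one sufficiently overlapping leaf."""
       hits = sum(
           any(jaccard(segment, leaf) >= min_overlap for leaf in leaves)
           for leaves in bootstrap_trees
       )
       return hits / len(bootstrap_trees)


   segment = {0, 1, 2, 3}
   # Each inner list holds the leaf member sets of one bootstrap tree.
   bootstrap_trees = [
       [{0, 1, 2, 3}, {4, 5, 6, 7}],        # best leaf overlap 1.0
       [{0, 1, 2}, {3, 4, 5, 6, 7}],        # best leaf overlap 3/4
       [{0, 4}, {1, 5}, {2, 6}, {3, 7}],    # best leaf overlap 1/5
       [{0, 1, 2, 3, 4}, {5, 6, 7}],        # best leaf overlap 4/5
   ]
   print(stability_score(segment, bootstrap_trees))  # 0.75
   ```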

HTE
   Heterogeneous Treatment Effect: the phenomenon that the causal
   effect of a treatment varies across customer segments. The
   joint multivariate hurdle BCF estimates a per-visitor
   {term}`CATE` surface (via the {term}`BART forest`); the
   {term}`policy tree` partitions that surface into segments where
   CATE is approximately constant.

   {term}`CATE` is the technical statistical term for the per-visitor
   effect. HTE names the broader phenomenon. {term}`segment`s are
   how the discovery is partitioned for interpretability.

   Per-visitor CATEs live on `AnalysisResult.cate_per_visitor`.
   Segment-level summaries live on `AnalysisResult.segments`.

CATE
   Conditional Average Treatment Effect: the expected difference
   in outcome between treatment and control for a specific
   covariate combination. Formally `E[Y(1) - Y(0) | X = x]`. Read
   per-visitor: how much would this specific visitor's outcome
   change under treatment vs control, given their features.

   The quantity {term}`BCF` estimates. The collection of CATEs
   across feature space is the {term}`HTE` surface. Pytyche
   exposes per-visitor CATEs on `AnalysisResult.cate_per_visitor`
   and segment-level summary CATEs on `AnalysisResult.segments`.
:::

## Models and inference

:::{glossary}
BART forest
   The tree ensemble inside the BCF — a sum of many weak trees that
   together approximate the underlying function. "BART" = Bayesian
   Additive Regression Trees. The BCF carries two such forests: a
   prognostic μ-forest for the baseline outcome surface and a
   treatment τ-forest for the conditional treatment effect. The
   MCMC samples a *posterior over forests* (each posterior sample
   is a different forest configuration), and what pytyche surfaces
   is the per-visitor posterior on `τ(x)` marginalized over that
   posterior — never the individual trees.

   Forest sizes are set by `GPUBCFConfig.num_trees_mu` and
   `GPUBCFConfig.num_trees_tau`; see `src/pytyche/bcf/config.py` for
   current defaults. `compute_num_trees_tau` is the formula-driven
   helper for picking the τ-forest size at a target CI coverage.

   This is NOT the {term}`policy tree`. They share the word "tree"
   but do different jobs: the BART forest *estimates* the CATE
   surface inside the BCF MCMC; the policy tree *segments* that
   CATE surface for downstream allocation and operator
   interpretability. Users don't inspect BART trees directly; they
   inspect the policy tree.

   Lives inside `pytyche.bcf.hurdle.*`. Implemented on top of
   `bartz` (the GPU BART primitive library).

policy tree
   The single `sklearn.tree.DecisionTreeClassifier` fit on the
   per-visitor CATEs from the BCF posterior, used to discover
   {term}`segment`s of feature space where the treatment effect is
   approximately constant. One tree (not an ensemble),
   deterministic given the posterior + hyperparameters,
   user-inspectable as `PolicyTreeResult.tree`. Its leaves are the
   segments downstream allocation, recommendation, and graduation
   decisions operate on.

   This is NOT the {term}`BART forest`. The BART forest produces
   the CATEs; the policy tree partitions them. The policy tree is
   what shows up in cell recommendations
   ({term}`Thompson allocation`'s `allocation_map[leaf_id]`) and
   what operators see as the segmentation of "where the lift comes
   from."

   Depth controlled by `max_segment_depth` at the L1 surface
   (`pt.sequential_experiment(max_segment_depth=3)`) or `max_depth`
   at the L2 method (`posterior.fit_policy_tree(max_depth=3)`).
   Minimum segment size controlled by `min_segment_share` (default
   0.10). The result is a {term}`PolicyTreeResult` — a frozen
   dataclass carrying the tree, segments, allocation map, and
   bootstrap stability scores.

BCF
   Bayesian Causal Forests. The class of model pytyche builds on
   for HTE estimation, introduced by Hahn, Murray, and Carvalho
   (2020). Combines a "prognostic" forest (estimating the baseline
   outcome surface) with a "treatment" forest (estimating the
   conditional treatment effect). Separating the two is meant to
   limit regularization-induced confounding, so per-visitor CATE
   estimates do not mistake prognostic signal for effect
   modification. Both forests are {term}`BART forest`s.

   For zero-inflated outcomes like e-commerce revenue per visitor,
   pytyche uses a {term}`joint hurdle BCF` that shares tree
   topology between the conversion (probit) and severity
   (log-normal) channels.

   Defined as `pytyche.bcf.fit_continuous_bcf`,
   `pytyche.bcf.fit_binary_bcf`, and friends. The high-level
   sequential surface (`pt.sequential_experiment`) calls these
   under the covers.

joint hurdle BCF
   The model pytyche actually fits for e-commerce revenue and
   similar zero-inflated outcomes. Two channels — a conversion
   probit channel and a severity log-normal channel — share tree
   topology, so the per-visitor CATE on revenue decomposes cleanly
   into "did the treatment change conversion" and "did it change
   basket size given conversion."

   For multi-arm experiments, the joint hurdle BCF estimates
   per-treatment effects jointly via shared prognostic structure
   rather than fitting K-1 independent contrasts (which loses
   power on max-of-K selection).

   Defined as `pytyche.bcf.fit_hurdle_bcf` (called by
   `pt.sequential_experiment` and `pt.analyze`). The {term}`pooling`
   kwarg selects between the canonical shared-tree fit and the
   independent two-stage literature baseline.

hurdle outcomes
   Outcomes with two distinct components: a binary "did anything
   happen" gate and a positive-valued severity conditional on the
   gate firing. E-commerce revenue per visitor is the canonical
   example: most visitors do not convert and contribute $0, while
   the converting tail has continuously distributed positive revenue.

   Standard regression on a hurdle outcome confounds "treatment
   changes conversion probability" with "treatment changes basket
   size." Hurdle BCF models the two channels separately, then
   combines them for the per-visitor revenue effect.
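
   A small worked example of how the two channels combine, using
   conversion probabilities `p0`, `p1` and mean converter revenue
   `m0`, `m1` under control and treatment. Expected revenue per
   visitor is `p * m`, so the effect is `p1*m1 - p0*m0`. The split
   below attributes the conversion change at the control basket size
   and the severity change at the treated conversion rate; that
   attribution order is one convention among several, and the helper
   name is hypothetical.

   ```python
   def rpv_effect_decomposition(p0: float, p1: float, m0: float, m1: float) -> dict[str, float]:
       """Split the revenue-per-visitor effect p1*m1 - p0*m0 into two channels."""
       conversion_part = (p1 - p0) * m0   # change in converters, at control basket size
       severity_part = p1 * (m1 - m0)     # change in basket size, among treated converters
       return {
           "total": p1 * m1 - p0 * m0,
           "conversion": conversion_part,
           "severity": severity_part,
       }


   parts = rpv_effect_decomposition(p0=0.02, p1=0.03, m0=50.0, m1=55.0)
   print(parts)  # total is about 0.65: about 0.50 from conversion, 0.15 from severity
   ```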

pooling
   The mode in which `fit_hurdle_bcf` (and {term}`pt.fit` when it
   dispatches to the hurdle path) couples the two channels of the
   {term}`joint hurdle BCF`. Two values:

   * `"joint"` (default) — canonical shared-tree fit. Conversion
     (probit) and severity (log-normal) share tree topology, so the
     per-visitor CATE decomposes cleanly and the model borrows
     strength across the two channels. This is the v0.2+
     recommended path for typical e-commerce revenue data.
   * `"independent"` — independent two-stage fit. Runs
     `fit_binary_bcf` for conversion, then `fit_continuous_bcf` for
     log-severity on converters, and composes the posteriors. Opt
     in when the two channels are driven by different feature
     subsets, when one channel has dominant HTE and shared topology
     distorts the other, or when a researcher wants per-channel HTE
     structure without the regularization-induced coupling.

   Exposed as `fit_hurdle_bcf(..., pooling="joint")` at the fit
   boundary and carried on `HurdleBCFResult.pooling`. Passed
   through verbatim when calling `pt.fit(observed,
   pooling="independent")`. Stored on `Calibration` regime
   metadata — a calibration artifact fitted on joint-pooling data
   is not applied to an independent-pooling posterior.

   Defined as a `Literal["joint", "independent"]` kwarg on
   `pytyche.bcf.fit_hurdle_bcf`. Private dispatch helpers
   `_fit_joint_hurdle_bcf` (in `pytyche.bcf.hurdle.model`) and
   `_fit_independent_hurdle_bcf` (in `pytyche.bcf.hurdle.compose`)
   implement the two paths.

joint posterior
   The Bayesian posterior distribution over all model parameters
   considered jointly. In pytyche's joint hurdle BCF, the joint
   posterior covers per-treatment conversion probabilities,
   per-treatment severity means, and the per-visitor treatment
   effects, all conditioned on the observed data.

   "Joint" because the parameters are estimated together with
   their full correlation structure preserved, rather than fitted
   marginally and assumed independent. This is what lets
   Thompson allocation respect cross-treatment dependence.

   Direct access on `AnalysisResult.posterior` for follow-up
   analysis (custom decompositions, alternative ship rules,
   sensitivity checks).
:::

## Results, recommendations, graduation

:::{glossary}
PolicyTreeResult
   The frozen dataclass returned by `posterior.fit_policy_tree(...)`.
   Bundles the policy tree and all downstream-usable derived data:

   * `tree` — the fitted `sklearn.tree.DecisionTreeClassifier`
     partitioning feature space into {term}`segment`s
   * `segments` — one `DiscoveredSegment` per leaf, ordered by leaf
     id; carries `rule`, `gate_estimate`, `gate_ci`,
     `stability_score`, `population_share`, `id`, and
     `arm_best_probabilities`
   * `allocation_map` — `dict[leaf_id, dict[treatment_name, weight]]`;
     each leaf's weight dict sums to 1.0; produced by
     {term}`Thompson allocation` under the shared best-arm rule
   * `stability_scores` — `dict[leaf_id, float]` in `[0, 1]`;
     bootstrap-replicability scores computed by resampling visitor
     CATEs and Jaccard-overlap matching (see {term}`stability score`)
   * `observed` — reference to the `ObservedExperimentData` the
     underlying posterior was fit on; shared by identity from the
     posterior (no re-clone; see {term}`observed data stashing`)

   The dataclass is frozen; assignment to any field raises
   `dataclasses.FrozenInstanceError`. `tree` is a
   `sklearn.tree.DecisionTreeClassifier` for v0.2; a future change
   may introduce a pytyche wrapper with the same predict / decision
   path methods.

   `PolicyTreeResult` is NOT the {term}`BART forest`. The BART
   forest estimates the CATE surface inside the MCMC; the policy
   tree in `PolicyTreeResult` partitions that surface for
   downstream allocation and operator interpretability.

   Defined as `pytyche.contracts.PolicyTreeResult`.

decision
   The recommended ship-or-continue-or-stop call for a treatment
   versus baseline. A 3-value enum: `SHIP`, `CONTINUE`, `STOP`.

   Defined as `contracts.Decision`.

recommendation summary
   A structured decision (`SHIP`, `CONTINUE`, or `STOP`) with
   supporting evidence: expected losses, probability of positive
   lift, probability of meaningful improvement, probability of
   harm. The decision applies five thresholds across three
   branches.

   * **SHIP gate** — `expected_loss < tolerance` AND
     `p_positive > 0.95` AND `p_better > 0.80`
   * **STOP (harm)** — `p_harmful > 0.90`
   * **STOP (futility)** — `p_better < 0.05`
   * **CONTINUE** — default when no gate fires

   Produced by `recommendation_summary()` in `compare.variants`.
   A {term}`graduation candidate` is a (treatment, segment) pair
   whose recommendation summary has fired SHIP for at least
   `sustained_rounds` consecutive rounds.

   Defined as `contracts.RecommendationSummary`.
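
   The gates listed above can be read as a small decision function.
   The sketch below is a minimal illustration: the `Decision` enum is
   a local stand-in, `decide` is a hypothetical name, and checking
   the STOP gates before the SHIP gate is an assumption, since the
   branch precedence is not stated here.

   ```python
   from enum import Enum


   class Decision(Enum):
       SHIP = "ship"
       CONTINUE = "continue"
       STOP = "stop"


   def decide(expected_loss: float, tolerance: float, p_positive: float,
              p_better: float, p_harmful: float) -> Decision:
       """Apply the five thresholds; the branch order is an assumption."""
       if p_harmful > 0.90:      # STOP (harm)
           return Decision.STOP
       if p_better < 0.05:       # STOP (futility)
           return Decision.STOP
       if expected_loss < tolerance and p_positive > 0.95 and p_better > 0.80:
           return Decision.SHIP  # SHIP gate
       return Decision.CONTINUE  # default when no gate fires


   print(decide(expected_loss=0.001, tolerance=0.01,
                p_positive=0.97, p_better=0.85, p_harmful=0.01))  # Decision.SHIP
   ```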

graduation candidate
   A (treatment, segment) pair where the recommendation has
   fired SHIP for ≥ `sustained_rounds` consecutive rounds.
   The default rule fires when `expected_loss < tolerance` AND
   `p_positive > 0.95` AND `p_better > 0.80`, sustained over at
   least 2 rounds. Segments with `stability_score < 0.80` are
   excluded from graduation-candidate consideration by default
   (see {term}`stability score`). The {term}`capability methods`
   getter `has_credible_segments(threshold=0.80)` provides a quick
   check before running the full analysis.

   Pytyche surfaces graduation candidates as structured data. The
   operator (or an agentic caller) decides whether to promote one
   to broader rollout. The library does not auto-graduate.

   Defined as `pytyche.experiment.GraduationCandidate`.
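
   The sustained-SHIP rule can be sketched as a check over a
   per-round decision history for one (treatment, segment) pair. The
   helper below is hypothetical; it assumes the streak must cover the
   most recent rounds and uses plain strings for decisions.

   ```python
   def is_graduation_candidate(
       decisions: list[str],
       stability_score: float,
       sustained_rounds: int = 2,
       min_stability: float = 0.80,
   ) -> bool:
       """True when the latest `sustained_rounds` decisions are all SHIP
       and the segment clears the stability floor."""
       if stability_score < min_stability or len(decisions) < sustained_rounds:
           return False
       return all(d == "SHIP" for d in decisions[-sustained_rounds:])


   history = ["CONTINUE", "SHIP", "SHIP"]
   print(is_graduation_candidate(history, stability_score=0.86))  # True
   print(is_graduation_candidate(history, stability_score=0.60))  # False
   ```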

next round plan
   The recommended cell structure, treatments list, and prose
   summary for the next round of a sequential experiment. The
   handoff between the recommendation engine and the operator's
   next-round decision.

   Carries:

   * Recommended {term}`cell`s (typically Control + Explore + an
     Optimized cell with the recommended tree)
   * Active treatments
   * Dropped treatments, if any
   * {term}`graduation candidate`s
   * Prose rationale

   The operator may accept, partially override (for example, add a
   hypothesis cell), or fully replace before shipping.

   Defined as `pytyche.experiment.NextRoundPlan`.
:::

## L2 analysis surface

:::{glossary}
pt.fit
   The auto-selecting fit entry point at the top-level `pytyche`
   namespace. Inspects the extracted outcome array `Y` and treatment
   cardinality `K = len(observed.variants)`, then dispatches
   deterministically to one of three underlying fit functions:

   * `Y` all `{0, 1}` → `fit_binary_bcf` → returns `BinaryBCFResult`
   * `Y` float dtype, < 30% zero entries → `fit_continuous_bcf` →
     returns `ContinuousBCFResult`
   * `Y` float dtype, ≥ 30% zero entries with a positive non-zero
     tail → `fit_hurdle_bcf(..., pooling="joint")` → returns
     `HurdleBCFResult` (at any K, including K ≥ 3)

   ```python
   posterior = pt.fit(observed)               # auto-selects fit
   posterior = pt.fit(observed,
       pooling="independent")                 # kwarg forwarded verbatim
   ```

   The same `observed` always selects the same fit function
   (deterministic). `**kwargs` forward verbatim to the dispatched
   fit. Users who need explicit control call `pt.fit_binary_bcf`,
   `pt.fit_continuous_bcf`, or `pt.fit_hurdle_bcf` directly.

   Edge cases: all-zero `Y` raises `ValueError` naming both binary
   and hurdle interpretations; multi-arm `Z` with binary or
   continuous `Y` raises `NotImplementedError` (multi-arm binary /
   continuous BCF is not yet shipped).

   The 30% zero-density threshold is a semi-empirical starting
   point for e-commerce revenue distributions — generous enough to
   catch typical revenue data, conservative enough to keep
   non-hurdle continuous data on the continuous path.

   Defined at `pytyche.fit`. Internal dispatch helper at
   `pytyche.bcf.dispatch._dispatch_fit` (or similar).
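
   The dispatch rules above can be sketched as a pure function over
   a plain list of outcomes. `select_fit` is a hypothetical helper
   that returns the name of the fit function rather than calling it.
   Raising `ValueError` for zero-heavy data with negative values is
   an assumption, since that case is not covered by the rules above.

   ```python
   def select_fit(y: list[float], n_treatments: int, zero_threshold: float = 0.30) -> str:
       """Name of the fit function the dispatch rules would select."""
       if all(v == 0 for v in y):
           raise ValueError("all-zero Y: ambiguous between binary and hurdle outcomes")
       if set(y) <= {0, 1}:
           name = "fit_binary_bcf"
       else:
           zero_share = sum(v == 0 for v in y) / len(y)
           if zero_share < zero_threshold:
               name = "fit_continuous_bcf"
           elif min(y) >= 0:
               return "fit_hurdle_bcf"  # hurdle path is available at any K
           else:
               # Assumption: not specified by the documented rules.
               raise ValueError("zero-heavy Y with negative values")
       if n_treatments > 2:
           raise NotImplementedError(f"multi-arm {name} is not yet shipped")
       return name


   print(select_fit([0, 1, 1, 0], n_treatments=2))                 # fit_binary_bcf
   print(select_fit([12.5, 0.0, 30.0, 8.0, 19.9], n_treatments=2))  # fit_continuous_bcf
   print(select_fit([0.0, 0.0, 0.0, 42.0, 17.5], n_treatments=3))   # fit_hurdle_bcf
   ```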

observed data stashing
   The contract that every posterior result type
   (`HurdleBCFResult`, `ContinuousBCFResult`, `BinaryBCFResult`)
   carries the `ObservedExperimentData` it was fit on. Analysis
   methods reach their inputs through `posterior.observed`, not
   through a separately-passed handle, so fit-time and
   analysis-time data encodings cannot drift.

   ```python
   posterior = pt.fit(observed)
   # downstream methods reach the data through the posterior
   tree = posterior.fit_policy_tree()
   # derived results share the same observed by identity
   assert tree.observed is posterior.observed
   ```

   Derived results (`PolicyTreeResult`, calibrated posteriors) hold
   the same reference by identity — the cost of stashing is paid
   once at fit time, not per-derivation. The {term}`observed_copy parameter`
   controls what kind of stash is created.

   Follows the sklearn idiom (`.X_train_` and similar) — downstream
   operations on a fitted result reach into the input data through
   the result.

observed_copy parameter
   The kwarg on every fit entry point (`pt.fit`, `pt.fit_hurdle_bcf`,
   etc.) that controls how the input `ObservedExperimentData` is
   stashed on the resulting posterior. Three modes:

   * `"view"` (default) — shallow clone of the dataclass with each
     visitors DataFrame rebuilt over read-only numpy views of the
     original columns. Zero data-buffer copy; in-place mutation
     through the stash raises `ValueError` ("assignment destination
     is read-only"). Buffers are still shared with the caller's
     original handle, so mutation through the original is
     reflected in the stash.
   * `"deep"` — `copy.deepcopy(observed)` at fit time. Doubles
     memory for input data; provides a bit-stable stash
     independent of any subsequent mutation to the original handle.
   * `"ref"` — `posterior.observed is observed` directly. No view
     wrappers; no protection. For the power-user case that wants
     the cheapest possible path and accepts mutation risk on both
     sides.

   Any other value raises `ValueError` naming the three valid
   modes. See {term}`observed data stashing` for the propagation
   contract through derived results.

capability methods
   A pair of pure getters on every posterior result type that
   enable conditional downstream logic without triggering
   heavyweight computation:

   * `has_credible_segments(threshold=0.80) -> bool` — `True` iff
     at least one segment in `posterior.analyze().segments` has
     `stability_score >= threshold`. The default threshold matches
     the `ExpectedLossRule` SHIP-gate stability floor.
   * `has_decomposition() -> bool` — `True` for `HurdleBCFResult`
     (the two-channel hurdle decomposition into conversion +
     severity); `False` for `ContinuousBCFResult` and
     `BinaryBCFResult`.

   Both are pure: no state mutation, no side effects, deterministic
   given the posterior. The canonical branch pattern:

   ```python
   if posterior.has_credible_segments():
       tree = posterior.fit_policy_tree()
       # ship tree-based policy
   elif posterior.has_decomposition():
       # hurdle posterior with no credible segments yet — inspect
       # channel decomposition to diagnose
       ...
   ```

   Defined on each result type in `pytyche.bcf`. The threshold
   default (0.80) is shared with {term}`stability score`'s
   credibility cutoff.

pt.viz namespace
   The `pytyche.viz` submodule exposing five matplotlib-backed
   visualization primitives:

   * `pt.viz.plot_cells(cells, ax=None)` — horizontal bar chart of
     cell weights for one round
   * `pt.viz.plot_policy_tree(tree_policy, ax=None)` — tree diagram
     from a {term}`PolicyTreeResult`
   * `pt.viz.plot_segment_intervals(segments, ax=None)` — forest
     plot of per-segment gate estimates + 80% credible intervals
   * `pt.viz.plot_calibration(calibrated_posterior, reference=None,
     ax=None)` — R(p) calibration curve, optionally overlaid with
     a reference (uncalibrated) curve
   * `pt.viz.experiment_evolution_gif(history, output_path, fps=1)`
     — animated GIF rendering round-by-round cell structure and
     policy tree evolution

   Each static primitive accepts an `ax` parameter for matplotlib
   subplot composition. When `ax=None`, a new figure and axes are
   created and returned. The GIF helper renders to disk and returns
   the path.

   `matplotlib` is imported lazily — `import pytyche` does NOT
   trigger the matplotlib import. The cost is paid only when a
   `pt.viz.*` function is first called. `matplotlib >= 3.7` and
   either `imageio >= 2.31` or `Pillow >= 10.0` are base
   dependencies in `pyproject.toml`.

   Importable as `import pytyche.viz as ptviz` or
   `from pytyche import viz`. The five names are also importable
   directly: `from pytyche.viz import plot_cells, ...`.

   Defined in `pytyche.viz`.
:::

## Calibration, truth, sim mode

:::{glossary}
calibration
   On-path recalibration that corrects BCF posterior coverage at
   scale. Three construction paths:

   ```python
   pt.Calibration.from_sweep('clustered_realistic_v1')  # shipped artifact
   pt.Calibration.from_sweep('/path/to/sweep.json')      # user-fitted
   pt.Calibration.skip()                                  # uncorrected
   ```

   When calibration is specified, the library applies the correction
   automatically. When `skip()` is used, uncalibrated posteriors are
   explicitly labeled in result objects and the library emits a
   warning on first fit.

   A `Calibration` instance is a frozen dataclass carrying:

   * `correction` — the `LayeredCalibrationCorrection` (layered R(p)
     + scale-family correction payload)
   * Regime metadata: `metric` (the outcome metric the sweep was
     fitted on, e.g. `"revenue_per_visitor"`), `n_treatments` (the
     K of the fitted sweep), `pooling` (the {term}`pooling` mode
     of the sweep — `"joint"` or `"independent"`)
   * `applies_to(observed: ObservedExperimentData) -> bool` — `True`
     iff `observed.metric`, `len(observed.variants)`, and the
     {term}`pooling` mode all match the artifact's regime metadata.
     `posterior.apply_calibration(calibration)` raises `ValueError`
     naming the mismatched dimension(s) when `applies_to` returns
     `False`. This prevents silently applying a K=2 revenue
     correction to a K=3 conversion-rate posterior.
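
   The regime check can be sketched as follows. This is an
   illustration under stated assumptions, not the library's code:
   `CalibrationSketch`, `ObservedStub`, `mismatches`, and
   `apply_calibration_sketch` are hypothetical names, the
   `"conversion_rate"` metric string is made up, and reading the
   pooling mode from the observed data is an assumption.

   ```python
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class ObservedStub:
       """Hypothetical stand-in for ObservedExperimentData."""
       metric: str
       variants: tuple
       pooling: str  # assumption: where the pooling mode is read from


   @dataclass(frozen=True)
   class CalibrationSketch:
       metric: str
       n_treatments: int
       pooling: str

       def mismatches(self, observed):
           """Names of the regime dimensions that differ from the artifact."""
           checks = {
               "metric": observed.metric == self.metric,
               "n_treatments": len(observed.variants) == self.n_treatments,
               "pooling": observed.pooling == self.pooling,
           }
           return [name for name, ok in checks.items() if not ok]

       def applies_to(self, observed):
           return not self.mismatches(observed)


   def apply_calibration_sketch(calibration, observed):
       bad = calibration.mismatches(observed)
       if bad:
           # Name every mismatched dimension instead of failing silently.
           raise ValueError("calibration regime mismatch: " + ", ".join(bad))
       return "calibrated"  # placeholder for the corrected posterior


   cal = CalibrationSketch("revenue_per_visitor", n_treatments=2, pooling="joint")
   same_regime = ObservedStub("revenue_per_visitor", ("control", "low_promo"), "joint")
   other_regime = ObservedStub(
       "conversion_rate", ("control", "low_promo", "free_ship"), "joint"
   )
   ```

   With these stubs, `cal.applies_to(same_regime)` holds, while
   `apply_calibration_sketch(cal, other_regime)` raises and names
   both `metric` and `n_treatments`.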

   `from_sweep` / `skip` constructors and the shipped artifact
   registry land with `sequential-experiment-api`. The type's
   minimal contract (frozen dataclass + regime metadata +
   `applies_to`) is owned by this L2 surface.

   Canonical home: `pytyche.calibrate.Calibration`, re-exported as
   `pt.Calibration`. Calibration machinery lives in
   `src/pytyche/calibrate/`.

calibration truth
   Ground truth for a single calibration or simulation run:
   per-visitor CATE, hurdle decomposition (p0, p1, m0, m1), and
   effect components. Lives only in the sim and calibration paths;
   analysis code cannot peek at it because the type is segregated
   via `CalibrationBundle`.

   Defined as `contracts.CalibrationTruth`.

truth comparison
   Per-round truth-vs-estimate metrics, populated only in sim mode.
   `None` when the experiment runs in real-data mode.

   Six fields:

   * `cate_rmse` — root mean square error of estimated CATE against
     truth
   * `policy_accuracy` — fraction of visitors for whom the
     recommended treatment matches the truth-optimal treatment
   * `oracle_gap_rpv` — RPV regret of the recommended policy vs the
     oracle policy
   * `rpv_policy`, `rpv_uniform`, `rpv_oracle` — RPV under the
     recommended policy, uniform random allocation, and the oracle
     policy respectively
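
   One plausible reading of the six fields, as a minimal sketch. The
   names `TruthComparisonSketch` and `compare_to_truth`, the array
   layout, and the choice to score RPV as the mean true expected
   revenue under the assigned treatment are assumptions made for
   illustration; the definitions in `pytyche.experiment` may differ.

   ```python
   import math
   from dataclasses import dataclass


   @dataclass(frozen=True)
   class TruthComparisonSketch:
       cate_rmse: float
       policy_accuracy: float
       oracle_gap_rpv: float
       rpv_policy: float
       rpv_uniform: float
       rpv_oracle: float


   def compare_to_truth(true_rpv, est_cate, recommended):
       """true_rpv[i][k]: true expected revenue of visitor i under treatment k
       (k = 0 is control); est_cate[i][k - 1]: estimated CATE of treatment k
       versus control; recommended[i]: treatment index the policy assigns."""
       n, k = len(true_rpv), len(true_rpv[0])
       sq_err = [
           (est_cate[i][j - 1] - (true_rpv[i][j] - true_rpv[i][0])) ** 2
           for i in range(n) for j in range(1, k)
       ]
       # Truth-optimal treatment per visitor.
       oracle = [max(range(k), key=row.__getitem__) for row in true_rpv]
       rpv_policy = sum(row[a] for row, a in zip(true_rpv, recommended)) / n
       rpv_uniform = sum(sum(row) / k for row in true_rpv) / n
       rpv_oracle = sum(row[a] for row, a in zip(true_rpv, oracle)) / n
       return TruthComparisonSketch(
           cate_rmse=math.sqrt(sum(sq_err) / len(sq_err)),
           policy_accuracy=sum(a == b for a, b in zip(recommended, oracle)) / n,
           oracle_gap_rpv=rpv_oracle - rpv_policy,
           rpv_policy=rpv_policy,
           rpv_uniform=rpv_uniform,
           rpv_oracle=rpv_oracle,
       )


   true_rpv = [[1.0, 2.0], [3.0, 1.0]]  # two visitors, control plus one treatment
   est_cate = [[1.0], [-1.0]]           # the true CATEs are +1.0 and -2.0
   result = compare_to_truth(true_rpv, est_cate, recommended=[1, 1])
   ```

   Under this reading `oracle_gap_rpv` is `rpv_oracle - rpv_policy`,
   so it is zero only when the recommended policy earns as much as
   the oracle.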

   Defined as `pytyche.experiment.TruthComparison`.

SBC (in pytyche)
   In this codebase "SBC" is used loosely for **simulation-based
   coverage evaluation and correction** — generate data from known
   ground truth, measure how the posterior's credible intervals (and
   decisions) actually perform, and fit a correction. It is **not** the
   classical rank-statistic Simulation-Based Calibration of Talts et
   al. (2018), which checks posterior-rank uniformity. Two modules carry
   the "SBC" label and neither implements that rank procedure:

   * `pytyche.calibrate.sbc` — oracle-decision and regret evaluation
     against planted truth (does the recommended decision match the
     oracle; what regret does it incur).
   * `scripts/fit_sbc_correction.py` — fits the isotonic R(p) coverage
     correction (nominal → empirical coverage mapping).

   Read "SBC" here as the umbrella for that simulate-then-correct
   workflow, not as the Talts diagnostic.
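
   The simulate-then-correct idea can be sketched in a few lines of
   plain Python. This is a toy under stated assumptions, not what
   `scripts/fit_sbc_correction.py` does: the Gaussian setup, the
   function names `empirical_coverage` and `isotonic_fit`, and the
   deliberately over-confident `claimed_sd` are all invented for
   illustration. The isotonic step uses the standard
   pool-adjacent-violators algorithm.

   ```python
   import random
   from statistics import NormalDist


   def empirical_coverage(nominal_levels, claimed_sd, n_sims=2000, seed=0):
       """Fraction of simulated runs whose central interval covers the truth."""
       rng = random.Random(seed)
       z = [NormalDist().inv_cdf(0.5 + p / 2) for p in nominal_levels]
       hits = [0] * len(nominal_levels)
       for _ in range(n_sims):
           truth = rng.gauss(0.0, 1.0)             # planted ground truth
           estimate = truth + rng.gauss(0.0, 1.0)  # the real error sd is 1.0
           for j, z_j in enumerate(z):
               # Interval built from the (possibly wrong) claimed posterior sd.
               if abs(estimate - truth) <= z_j * claimed_sd:
                   hits[j] += 1
       return [h / n_sims for h in hits]


   def isotonic_fit(values):
       """Pool-adjacent-violators: least-squares non-decreasing fit."""
       blocks = []  # each block is [sum, count]
       for v in values:
           blocks.append([v, 1])
           # Merge backwards while the previous block's mean exceeds the last one.
           while (len(blocks) > 1
                  and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
               s, c = blocks.pop()
               blocks[-1][0] += s
               blocks[-1][1] += c
       fitted = []
       for s, c in blocks:
           fitted.extend([s / c] * c)
       return fitted


   nominal = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
   empirical = empirical_coverage(nominal, claimed_sd=0.5)  # over-confident posterior
   r_of_p = isotonic_fit(empirical)                         # monotone R(p) on the grid
   for p, r in zip(nominal, r_of_p):
       print(f"nominal {p:.2f} -> empirical {r:.3f}")
   ```

   Because the toy intervals are nested and share the same draws,
   the empirical curve is already monotone and the isotonic step
   leaves it unchanged. The step matters when each nominal level is
   estimated from separate, noisy runs.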
:::

## Setup and environment

:::{glossary}
setup report
   The structured output of `pt.check_setup()`. Carries the pytyche
   version, JAX device list, CUDA availability, bartz version,
   calibration registry state, and a recommended install command
   when no GPU is available.

   ```python
   report = pt.check_setup()
   if not report.cuda_available:
       print(report.recommended_install)
   ```

   Defined as `pytyche.SetupReport`.
:::

## Contract types — quick reference

The contract types this glossary references and where they live:

| Term | Type | Module |
| --- | --- | --- |
| Decision | `Decision` (enum) | `contracts` |
| Observed experiment data | `ObservedExperimentData` | `contracts` |
| Variant data | `VariantData` | `contracts` |
| Visitor schema | `VISITOR_SCHEMA` | `contracts` |
| Segment rule | `SegmentRule` (union: `EqRule`, `InRule`, `ComparisonRule`, `BetweenRule`) | `contracts` |
| Discovered segment | `DiscoveredSegment` | `contracts` |
| Aligned visitor array | `AlignedVisitorArray` | `contracts` |
| Decomposition samples | `DecompositionSamples` | `contracts` |
| Comparison result | `ComparisonResult` | `contracts` |
| Recommendation summary (type) | `RecommendationSummary` | `contracts` |
| Recommendation summary (function) | `recommendation_summary()` | `compare.variants` |
| Decision thresholds | `DecisionThresholds` | `compare.variants` |
| Analysis result | `AnalysisResult` | `contracts` |
| Policy tree result | `PolicyTreeResult` | `contracts` |
| Calibration | `Calibration` | `calibrate` |
| Layered calibration correction | `LayeredCalibrationCorrection` | `calibrate.layered` |
| Calibration truth | `CalibrationTruth` | `contracts` |
| Calibration bundle | `CalibrationBundle` | `contracts` |
| Calibration record | `CalibrationRecord` | `contracts` |
| Compare variants | `compare_variants()` | `compare.variants` |
| Claim level | `ClaimLevel` (enum) | `contracts` |
| Metric family | `MetricFamily` (enum) | `contracts` |