Result objects¶

Pytyche analyses return three result types, each with a distinct role:

HurdleBCFResult — the raw joint posterior from the BCF fit; the root primitive everything else is derived from.
AnalysisResult — the derived analysis output: comparisons, recommendation summary, discovered segments, per-visitor CATEs. What most analysis code consumes directly.
Experiment — the per-round snapshot from a sequential experiment. Composes the posterior and the analysis with round-specific context (cells shipped, recommendation, truth comparison).

Each entry point returns a different result type:

pt.fit_hurdle_bcf(...) returns the raw posterior only. You derive the analysis yourself, or pass the posterior to the analysis primitives the rest of this doc introduces.
posterior.analyze(...) returns an AnalysisResult derived from the posterior. The posterior already carries the observed data it was fit on (see Observed data stashing below), so analyze takes only optional knobs. The canonical pattern is two lines:
```
posterior = pt.fit_hurdle_bcf(observed)
analysis = posterior.analyze()
```
A function-form wrapper pt.analyze(posterior, ...) exists as a thin delegate for callers who prefer functional style. Both forms produce identical AnalysisResult output; the method is the documented canonical form because behavior dispatches naturally through the result type (ContinuousBCFResult.analyze, BinaryBCFResult.analyze, HurdleBCFResult.analyze each do the right thing for their posterior shape).
pt.sequential_experiment(...) iterates per-round Experiment snapshots. Each snapshot exposes the posterior, the derived analysis, the cells shipped that round, and the recommendation for the next round.
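
The method-plus-thin-delegate arrangement described above can be sketched with stand-in classes. Everything here (`AnalysisStub`, `ContinuousStub`, `HurdleStub`, and this local `analyze`) is hypothetical and only illustrates the dispatch pattern; it is not pytyche's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisStub:
    """Hypothetical stand-in for AnalysisResult."""
    kind: str


class ContinuousStub:
    """Hypothetical stand-in for ContinuousBCFResult."""

    def analyze(self, **knobs) -> AnalysisStub:
        return AnalysisStub(kind="continuous")


class HurdleStub:
    """Hypothetical stand-in for HurdleBCFResult."""

    def analyze(self, **knobs) -> AnalysisStub:
        return AnalysisStub(kind="hurdle")


def analyze(posterior, **knobs) -> AnalysisStub:
    """Function-form wrapper: forwards to the method and adds no logic."""
    return posterior.analyze(**knobs)


via_method = HurdleStub().analyze()
via_function = analyze(HurdleStub())
```

Because the wrapper only forwards, the posterior type decides the behavior in both calling styles.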

The rest of this doc walks each type, then shows the three entry points side by side.

`HurdleBCFResult`: the raw posterior¶

The joint posterior over all model parameters. Root primitive — every other result field in the library is derived from it.

from pytyche import fit_hurdle_bcf

posterior = fit_hurdle_bcf(observed)

# Sample arrays are visitor-major with chains concatenated into the
# sample axis: S_total = (num_mcmc / thin_factor) * num_chains.

# Per-arm conversion + severity samples (retained only when
# retain_channel_samples=True)
posterior.p_samples          # (n, S_total, K) at K >= 3
                             # legacy p0_samples / p1_samples at K = 2
posterior.sev_samples        # (n, S_total, K) at K >= 3

# Per-visitor revenue-per-visitor contrasts (always populated)
posterior.rpv_cate_samples   # (n, S_total, K - 1) at K >= 3 — the
                             # jointly-sampled contrast vector
                             # (n, S_total) at K = 2

# Precision / variance trace — scalar severity precision, shape-invariant in K
posterior.tau0_samples       # (S_total,) at every K
posterior.sigma2_samples     # property: 1 / tau0_samples (scalar) at every K

# Pooling mode that produced this result
posterior.pooling            # "joint" or "independent"
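
To make the shape conventions above concrete, here is a small NumPy sketch that builds arrays with the same layout from synthetic draws. The per-chain layout and the chain ordering along the sample axis are assumptions made for illustration; only the resulting shapes and the `1 / tau0_samples` relation come from the comments above.

```python
import numpy as np

num_chains, num_mcmc, thin_factor = 4, 1000, 5
n, K = 3, 3                                   # visitors, arms (control + 2 treatments)
draws_per_chain = num_mcmc // thin_factor
S_total = draws_per_chain * num_chains

rng = np.random.default_rng(0)
# Assumed per-chain layout: (num_chains, n, draws_per_chain, K - 1)
per_chain = rng.normal(size=(num_chains, n, draws_per_chain, K - 1))

# Concatenate the chains along the sample axis -> (n, S_total, K - 1)
rpv_cate_samples = np.concatenate(list(per_chain), axis=1)

# Scalar severity precision trace and the variance derived from it
tau0_samples = rng.gamma(shape=2.0, scale=1.0, size=S_total)
sigma2_samples = 1.0 / tau0_samples

# Posterior-mean contrast per visitor: average over the sample axis
cate_mean = rpv_cate_samples.mean(axis=1)     # (n, K - 1)
```

Reductions over draws always run along axis 1, which is what keeps per-visitor summaries visitor-major.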

The posterior carries the full uncertainty structure from the joint multi-arm hurdle BCF fit. Custom derivations build on it:

# A different policy tree depth → different segmentation,
# same posterior
shallow_tree = posterior.fit_policy_tree(max_depth=2)
deeper_tree = posterior.fit_policy_tree(max_depth=4)

# A different Thompson ε floor → different allocation,
# same posterior
tight_allocation = posterior.thompson_allocation(segments=..., epsilon=0.01)
loose_allocation = posterior.thompson_allocation(segments=..., epsilon=0.05)

# Bring your own decision rule — the GraduationRule protocol
# (shipping with the sequential surface) lets you plug in custom
# graduation logic; see docs/how-to/build-your-own-decision-rule.md

This is the “posterior is the root, helpers are derived” story made concrete. The analysis primitives (posterior.fit_policy_tree, posterior.thompson_allocation, posterior.apply_calibration, posterior.recommendation_summary) live as methods on the posterior; function-form wrappers (pt.fit_policy_tree(posterior, ...) etc.) are thin delegates for callers who prefer functional style. The sequential experiment recommendation engine composes the methods into the default workflow; this section’s examples show the same methods called directly.
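
As an illustration of how an ε floor changes an allocation without touching the posterior, the sketch below computes best-arm probabilities from contrast draws for a single segment and mixes them with a uniform floor. The function name, its signature, and the mixing formula are assumptions; the way posterior.thompson_allocation applies its floor is not specified here.

```python
import numpy as np


def thompson_allocation_sketch(contrast_draws: np.ndarray, epsilon: float) -> np.ndarray:
    """Allocation over K arms from (S, K - 1) treatment-minus-control draws.

    Illustrative only: P(best) per arm, mixed with a uniform floor so
    every arm receives at least epsilon of the traffic.
    """
    S = contrast_draws.shape[0]
    # Control's contrast with itself is zero, so prepend a zero column.
    values = np.hstack([np.zeros((S, 1)), contrast_draws])
    K = values.shape[1]
    if epsilon * K > 1.0:
        raise ValueError("epsilon is too large for the number of arms")
    p_best = np.bincount(values.argmax(axis=1), minlength=K) / S
    return epsilon + (1.0 - K * epsilon) * p_best


rng = np.random.default_rng(1)
draws = rng.normal(loc=[0.5, -0.5], scale=1.0, size=(4000, 2))
tight = thompson_allocation_sketch(draws, epsilon=0.01)
loose = thompson_allocation_sketch(draws, epsilon=0.05)
```

Both allocations are computed from the same draws; only the floor differs, so the weaker arms receive more exploration traffic under the looser setting.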

The raw posterior carries the full Bayesian workflow primitives (traces, per-draw arrays). PyMC’s InferenceData integration lives separately in DiagnosticsBundle (per contracts.py:553); the HurdleBCFResult is the BCF-specific frozen-dataclass form of the same information.

Observed data stashing¶

The posterior carries the observed data it was fit on as posterior.observed, much as some scikit-learn estimators keep their training inputs on the fitted object (GaussianProcessRegressor stores X_train_, for example). Downstream analysis methods (posterior.analyze, posterior.fit_policy_tree, etc.) reach into posterior.observed for whichever input arrays they need. The caller passes observed once at fit time and never again.

The default semantics are shallow with read-only views:

posterior = pt.fit_hurdle_bcf(observed)

# Each variant's visitors DataFrame is rebuilt over read-only column
# arrays — the backing numpy buffers are shared but write-protected.
col = posterior.observed.variants[0].visitors["revenue"].to_numpy(copy=False)
col.flags.writeable
# False

# Mutation attempt through the stashed clone raises:
posterior.observed.variants[0].visitors.iloc[0, 0] = 999
# ValueError: assignment destination is read-only

The observed wrapper on the posterior is a new object (a shallow dataclass clone), but each DataFrame column is rebuilt over the original numpy buffer with the write flag off. This is cheap, since no data is copied, and mutation attempts through the posterior side fail loudly. It does not prevent mutation through the original observed handle: if you mutate observed.variants[0].visitors after fitting, the corresponding cells in posterior.observed.variants[0].visitors reflect the change, because the buffer is shared.
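
The shared-buffer behavior can be reproduced with plain NumPy, independent of pytyche. This sketch uses a bare array rather than a DataFrame column, which is a simplification; the variable names are illustrative.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])

# A view shares the buffer; clearing the write flag protects this handle only.
view = original.view()
view.flags.writeable = False

blocked = None
try:
    view[0] = 999.0
except ValueError as exc:
    blocked = str(exc)

# The original handle is still writable, and the view sees the change.
original[0] = -1.0
seen_through_view = float(view[0])
```

The write flag belongs to the array object, not to the underlying memory, which is why the protection is one-sided.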

For a defensive snapshot independent of the original handle, pass observed_copy="deep":

posterior = pt.fit_hurdle_bcf(observed, observed_copy="deep")
observed.variants[0].visitors.iloc[0, 0] = 999
# posterior.observed.variants[0].visitors.iloc[0, 0] unchanged —
# independent buffer

Three modes are available:

observed_copy="view" (default): shallow dataclass clone with read-only array views; cheap; protects against pytyche-side mutation
observed_copy="deep": full deep copy; doubles memory for the input data; protects against caller-side mutation too
observed_copy="ref": stores the caller's object directly, so posterior.observed is observed evaluates to True; no view wrappers, no protection; rarely worth using
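
A minimal sketch of how the three modes could differ, using a hypothetical `ObservedStub` container with one array field and a hypothetical `stash` helper. It shows the semantics listed above, not pytyche's actual code path.

```python
import copy
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class ObservedStub:
    """Hypothetical stand-in for the observed-data container."""
    revenue: np.ndarray


def stash(observed: ObservedStub, observed_copy: str = "view") -> ObservedStub:
    if observed_copy == "ref":
        return observed                      # same object, no protection
    if observed_copy == "deep":
        return copy.deepcopy(observed)       # independent buffer
    if observed_copy == "view":
        read_only = observed.revenue.view()  # shared buffer
        read_only.flags.writeable = False
        return replace(observed, revenue=read_only)
    raise ValueError(f"unknown observed_copy mode: {observed_copy!r}")


observed = ObservedStub(revenue=np.array([1.0, 2.0, 3.0]))
by_ref, by_view, by_deep = (stash(observed, m) for m in ("ref", "view", "deep"))

observed.revenue[0] = 999.0                  # caller-side mutation after stashing
```

After the caller-side write, only the deep copy still holds the original value, which is the trade against its extra memory.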

JAX arrays pass through every mode unchanged — they are immutable by construction.

`AnalysisResult`: derived analysis¶

The derived analysis output. Combines comparisons, the recommendation summary, discovered segments, and per-visitor CATEs into a single presentation-ready result. Returned by posterior.analyze() — the method derives the analysis fields from the posterior and the observed data it was fit on.

posterior = pt.fit_hurdle_bcf(observed)

analysis = posterior.analyze()

analysis.comparisons         # list[Comparison] — per-pair effects
analysis.recommendation      # RecommendationSummary — SHIP/STOP/CONTINUE
                             # 5-threshold compound rule
analysis.segments            # list[DiscoveredSegment] — HTE segments
                             # with gate_estimate, gate_ci, stability_score
analysis.cate_per_visitor    # np.ndarray — (n,) at K=2;
                             #              (n, K-1) at K>=3
analysis.is_calibrated       # bool — property delegating to
                             # posterior.is_calibrated

Each DiscoveredSegment in analysis.segments carries:

id: int — leaf id within the parent policy tree
rule: SegmentRule — predicate describing the segment membership
gate_estimate: float — posterior mean of the segment-level GATE
gate_ci: tuple[float, float] — 80% credible interval on the GATE
population_share: float — fraction of visitors in this segment
stability_score: float — bootstrap-replicability score in [0, 1]
arm_best_probabilities: dict[str, float] — probability each treatment (or control) is best in this segment under the shared best-arm rule; keyed by ALL variant names including control, and sums to 1.0. Control wins a draw when no treatment contrast is positive.
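
The segment-level fields can be illustrated from raw contrast draws. The sketch below assumes an equal-tailed 80% interval and resolves exact ties toward control; both are assumptions, and the helper names are hypothetical. The best-arm rule follows the description above: control wins a draw when no treatment contrast is positive.

```python
import numpy as np


def gate_summary(draws: np.ndarray) -> tuple[float, tuple[float, float]]:
    """Posterior mean and equal-tailed 80% interval from (S,) GATE draws."""
    lo, hi = np.quantile(draws, [0.10, 0.90])
    return float(draws.mean()), (float(lo), float(hi))


def arm_best_probabilities(contrasts: np.ndarray, names: list[str]) -> dict[str, float]:
    """contrasts: (S, K - 1) treatment-minus-control draws.
    names: all K variant names, control first."""
    S = contrasts.shape[0]
    # Control's contrast with itself is 0, so it wins when no treatment is positive.
    values = np.hstack([np.zeros((S, 1)), contrasts])
    winners = values.argmax(axis=1)          # ties resolve to the lowest index
    counts = np.bincount(winners, minlength=len(names))
    return {name: float(c) / S for name, c in zip(names, counts)}


names = ["control", "low_promo", "free_ship"]
rng = np.random.default_rng(7)
contrasts = rng.normal(loc=[0.3, -0.2], scale=0.5, size=(2000, 2))
probs = arm_best_probabilities(contrasts, names)
estimate, (ci_lo, ci_hi) = gate_summary(contrasts[:, 0])
```

Each draw has exactly one winner, so the probabilities sum to 1 across all variants, control included.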

AnalysisResult is the canonical analysis output. Most operator code consumes it directly:

# Decide what to ship
for seg in analysis.segments:
    if seg.stability_score >= 0.80:
        print(f"  {seg.rule}: lift {seg.gate_estimate:+.3f}")

# Apply the canonical 5-threshold decision rule
match analysis.recommendation.decision:
    case Decision.SHIP:
        print("Ready to roll out broadly")
    case Decision.CONTINUE:
        print("Keep running")
    case Decision.STOP:
        print("Pull the treatment")

RecommendationSummary carries the full decision-theoretic snapshot for a treatment contrast. Its field set:

decision: Decision — SHIP / CONTINUE / STOP from the 5-threshold compound rule
expected_loss_baseline: float — expected loss of choosing baseline
expected_loss_comparison: float — expected loss of choosing the treatment
probability_positive: float — P(comparison > baseline)
probability_better: float — P(comparison meaningfully better)
probability_harmful: float — P(comparison meaningfully harmful)
thresholds: dict[str, float] — decision thresholds applied
expected_value_of_one_more_round: float — information-theoretic value of running one additional round at the same per-round n, in expected-loss-reduction units. Near-zero means the experiment has effectively converged and additional data is unlikely to change the decision.
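
For intuition, several of these fields can be computed from posterior draws of the contrast using the common Bayesian A/B definitions: expected loss as expected regret, and "meaningfully" as a symmetric margin around zero. These definitions and the `rope` parameter are assumptions; the page does not state the formulas pytyche uses.

```python
import numpy as np


def decision_quantities(delta: np.ndarray, rope: float) -> dict[str, float]:
    """delta: (S,) posterior draws of comparison minus baseline.
    rope: assumed margin that separates 'meaningful' from negligible."""
    return {
        # Regret from staying on baseline when the comparison is better
        "expected_loss_baseline": float(np.maximum(delta, 0.0).mean()),
        # Regret from choosing the comparison when it is worse
        "expected_loss_comparison": float(np.maximum(-delta, 0.0).mean()),
        "probability_positive": float((delta > 0.0).mean()),
        "probability_better": float((delta > rope).mean()),
        "probability_harmful": float((delta < -rope).mean()),
    }


q = decision_quantities(np.array([-1.0, 0.5, 1.5, 2.0]), rope=1.0)
```

Under these definitions the two expected losses differ by exactly the posterior mean of the contrast, which is a quick sanity check on any implementation.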

AnalysisResult does not carry the raw posterior. The split matches the existing separation in pytyche.contracts between the analysis output and the raw inference data (AnalysisResult / DiagnosticsBundle). Callers who need the posterior keep the handle returned by fit_hurdle_bcf, or read it from the sibling r.posterior field on an Experiment.

`Experiment`: the per-round snapshot¶

Per-round snapshot of a sequential experiment. Composes the posterior and the analysis with round-specific context (cells shipped, per-cell observations, recommendation for next round, sim-only truth comparison).

exp = pt.sequential_experiment(
    generator=generator,
    schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
    treatments=['control', 'low_promo', 'free_ship'],
)

r = next(exp)  # round 1

# raw posterior — escape hatch
r.posterior                  # HurdleBCFResult

# derived analysis
r.analysis                   # AnalysisResult
r.analysis.segments
r.analysis.recommendation

# round context
r.round_idx                  # int
r.observed                   # ObservedExperimentData
r.cells_shipped              # list[Cell]
r.cell_observations          # list[CellObservation] — per-cell scoreboard
r.next_recommendation        # NextRoundPlan — what to ship next round
r.truth_comparison           # TruthComparison | None (sim-only)

r.posterior and r.analysis are sibling fields. Both are populated; the posterior is the root and the analysis is its derivation, but the Experiment exposes both at the same level so reaching the posterior is one field access deep rather than nested.
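
The iteration pattern can be sketched without pytyche. `GeometricScheduleSketch` and `run_rounds` are hypothetical stand-ins, and the rule that round r (counting from zero) receives initial × growth^r visitors is an assumption about what a geometric schedule means here.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class GeometricScheduleSketch:
    """Hypothetical stand-in for pt.GeometricSchedule."""
    initial: int
    growth: float
    n_rounds: int

    def sizes(self) -> list[int]:
        # Assumed rule: each round multiplies the previous size by `growth`.
        return [round(self.initial * self.growth ** r) for r in range(self.n_rounds)]


def run_rounds(schedule: GeometricScheduleSketch) -> Iterator[dict]:
    """Yield one snapshot-like dict per round, lazily, like an experiment iterator."""
    for position, n_visitors in enumerate(schedule.sizes()):
        yield {"position": position, "n_visitors": n_visitors}


schedule = GeometricScheduleSketch(initial=10_000, growth=2.0, n_rounds=3)
rounds = run_rounds(schedule)
first = next(rounds)        # consume the first round, as with next(exp)
remaining = list(rounds)    # the iterator is exhausted after the last round
```

Because rounds are produced lazily, the caller can stop after any snapshot instead of running the whole schedule.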

The three entry points side by side¶

   entry point                        returns
   ────────────────────────────────── ──────────────────────────────────
   pt.fit_hurdle_bcf(...)             HurdleBCFResult only
                                      (the raw posterior; you derive
                                      the rest)

   posterior.analyze(...)             AnalysisResult — single round of
   (after fit_hurdle_bcf)             analysis without sequential
                                      machinery. pt.analyze(posterior)
                                      is the function-form wrapper.

   pt.sequential_experiment(...)      iterates → Experiment per round
                                      (each round's Experiment
                                      composes the analysis and the
                                      posterior with round context)

Pick by how much machinery you want:

Just the posterior, you’ll derive the rest yourself: use pt.fit_hurdle_bcf.
One round’s analysis without sequential context (e.g., analyzing observed data from an experiment that already ran): fit, then call posterior.analyze(). The AnalysisResult is the same shape the sequential machinery uses internally.
The sequential adaptive-experiment loop, with recommendation per round and graduation signals: use pt.sequential_experiment. Each round’s Experiment exposes the posterior as an escape hatch when you want to dig past the standard analysis surfaces.

Calibration and the result¶

Calibration is a frozen dataclass at pytyche.calibrate.Calibration (re-exported as pt.Calibration). It wraps the layered R(p)/scale-family correction with regime metadata — the metric name, treatment count K, and pooling mode it was fitted on. The applies_to(observed) method returns True only when the observed data’s regime (metric and K) matches the artifact’s; posterior.apply_calibration(calibration) raises ValueError naming the mismatched dimension(s) when the check fails.

apply_calibration is K = 2 only in v0.2. Calling it on a K ≥ 3 posterior raises NotImplementedError — per-contrast R(p) recalibration requires per-contrast SBC machinery that ships with the sequential-surface calibration work.
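
The regime check and its two failure modes can be sketched as follows. `CalibrationSketch` and `apply_calibration_sketch` are hypothetical; the real applies_to takes the observed data rather than bare metric and K values, and the order in which the two errors are raised is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationSketch:
    """Hypothetical stand-in for pt.Calibration's regime metadata."""
    metric: str
    n_treatments: int   # K
    pooling: str

    def mismatches(self, metric: str, n_treatments: int) -> list[str]:
        found = []
        if metric != self.metric:
            found.append(f"metric ({metric!r} vs {self.metric!r})")
        if n_treatments != self.n_treatments:
            found.append(f"K ({n_treatments} vs {self.n_treatments})")
        return found

    def applies_to(self, metric: str, n_treatments: int) -> bool:
        return not self.mismatches(metric, n_treatments)


def apply_calibration_sketch(cal: CalibrationSketch, metric: str, n_treatments: int) -> str:
    if n_treatments >= 3:
        raise NotImplementedError("per-contrast recalibration is not available for K >= 3")
    found = cal.mismatches(metric, n_treatments)
    if found:
        raise ValueError("calibration regime mismatch: " + ", ".join(found))
    return "calibrated"


cal = CalibrationSketch(metric="rpv", n_treatments=2, pooling="joint")
ok = apply_calibration_sketch(cal, "rpv", 2)
```

Naming the mismatched dimension in the error message lets the caller see which artifact to swap without inspecting the calibration object.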

When a Calibration artifact is passed to fit_hurdle_bcf (or consumed by posterior.apply_calibration), the recalibration is applied to the posterior before the AnalysisResult is constructed. The analysis.is_calibrated property labels whether the result is calibrated.

Calibration.from_sweep(...) and Calibration.skip() registry constructors arrive with the sequential-surface calibration work — any mention of them in examples should be read as forward-looking.

Tree data model (pending)¶

Two trees, one library. Pytyche has two distinct tree concepts that share the word. The BART forest lives inside the BCF — many weak trees whose posterior pytyche samples via MCMC, producing the per-visitor CATE estimates. The policy tree lives on top of those CATEs — a single sklearn DecisionTreeClassifier fit on the CATE surface to discover the segments downstream allocation and graduation operate on. Users never inspect BART trees directly; what PolicyTreeResult.tree exposes is the policy tree. See the glossary entries for both terms.

The policy returned in cell recommendations carries the policy tree + per-leaf Thompson allocation. The concrete tree type and serialization format are being worked out in the analysis-surface design pass:

direct sklearn DecisionTreeClassifier (current sketch in type stubs) — simple but couples the public API to sklearn versions
pytyche wrapper preserving sklearn ergonomics — version isolation, cleaner spec
framework-agnostic rule-list export — needed for handoff to serving platforms
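
To show what the third option could look like, the sketch below flattens a small hand-built tree into a list of plain-dict rules. Since the tree contract is still pending, every type, field, and feature name here (`Leaf`, `Split`, `to_rules`, `recency_days`, `basket`) is hypothetical, and the output format is one possible shape rather than a planned one.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    arm: str                      # arm to serve in this segment


@dataclass(frozen=True)
class Split:
    feature: str
    threshold: float
    left: "Node"                  # branch where feature <= threshold
    right: "Node"                 # branch where feature > threshold


Node = Union[Leaf, Split]


def to_rules(node: Node, path: tuple[str, ...] = ()) -> list[dict]:
    """Depth-first walk that emits one rule per leaf."""
    if isinstance(node, Leaf):
        return [{"when": list(path), "arm": node.arm}]
    below = path + (f"{node.feature} <= {node.threshold}",)
    above = path + (f"{node.feature} > {node.threshold}",)
    return to_rules(node.left, below) + to_rules(node.right, above)


tree = Split(
    "recency_days", 30.0,
    Leaf("free_ship"),
    Split("basket", 50.0, Leaf("control"), Leaf("low_promo")),
)
rules = to_rules(tree)
payload = json.dumps(rules)       # plain data, no sklearn objects involved
```

A list of conditions per leaf serializes to JSON directly, so a serving platform needs no sklearn dependency to evaluate it.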

This page updates when the tree contract lands.

Open questions¶

The following result-surface questions remain open:

Tree data model: see Tree data model (pending) above.
