---
title: Result objects
review-state: drafting
last-human-review: "2026-06-11"
depends-on:
  - src/pytyche/contracts.py
  - src/pytyche/bcf/config.py
owner: tradcliffe
quadrant: concept
---

# Result objects

Pytyche analyses return three result types, each with a distinct role:

- `HurdleBCFResult` — the raw joint posterior from the BCF fit; the root primitive everything else is derived from.
- `AnalysisResult` — the derived analysis output: comparisons, recommendation summary, discovered segments, per-visitor CATEs. What most analysis code consumes directly.
- `Experiment` — the per-round snapshot from a {term}`sequential experiment`. Composes the posterior and the analysis with round-specific context (cells shipped, recommendation, truth comparison).

Each entry point returns a different result type:

- `pt.fit_hurdle_bcf(...)` returns the raw posterior only. You derive the analysis yourself, or pass the posterior to the analysis primitives the rest of this doc introduces.
- `posterior.analyze(...)` returns an `AnalysisResult` derived from the posterior. The posterior already carries the observed data it was fit on (see [Observed data stashing](#observed-data-stashing) below), so `analyze` takes only optional knobs. The canonical pattern is two lines:

  ```python
  posterior = pt.fit_hurdle_bcf(observed)
  analysis = posterior.analyze()
  ```

  A function-form wrapper `pt.analyze(posterior, ...)` exists as a thin delegate for callers who prefer functional style. Both forms produce identical `AnalysisResult` output; the method is the documented canonical form because behavior dispatches naturally through the result type (`ContinuousBCFResult.analyze`, `BinaryBCFResult.analyze`, `HurdleBCFResult.analyze` each do the right thing for their posterior shape).
- `pt.sequential_experiment(...)` iterates per-round `Experiment` snapshots. Each snapshot exposes the posterior, the derived analysis, the cells shipped that round, and the recommendation for the next round.

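The method-plus-delegate arrangement described for `posterior.analyze` can be sketched with toy classes. Everything below is hypothetical: `ToyContinuousResult`, `ToyBinaryResult`, and the free function `analyze` are illustration names, not pytyche's types. The sketch only shows why the functional form can stay a one-line delegate when each result type owns its own `analyze`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyContinuousResult:
    """Hypothetical stand-in for a continuous-outcome posterior."""

    draws: tuple[float, ...]

    def analyze(self, threshold: float = 0.0) -> dict:
        # Continuous shape: summarize by the mean of the draws.
        mean = sum(self.draws) / len(self.draws)
        return {"kind": "continuous", "estimate": mean, "positive": mean > threshold}


@dataclass(frozen=True)
class ToyBinaryResult:
    """Hypothetical stand-in for a binary-outcome posterior."""

    successes: int
    trials: int

    def analyze(self, threshold: float = 0.0) -> dict:
        # Binary shape: summarize by the observed success rate.
        rate = self.successes / self.trials
        return {"kind": "binary", "estimate": rate, "positive": rate > threshold}


def analyze(posterior, **knobs) -> dict:
    """Function-form wrapper: a thin delegate to the method."""
    return posterior.analyze(**knobs)


continuous = ToyContinuousResult(draws=(0.5, 1.5, 1.0))
binary = ToyBinaryResult(successes=3, trials=12)

method_form = continuous.analyze(threshold=0.5)
function_form = analyze(continuous, threshold=0.5)
binary_summary = analyze(binary)
```

Because the wrapper forwards its keyword arguments untouched, the two call styles cannot drift apart: any knob added to a result type's method is available through the function form automatically.
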
The rest of this doc walks each type, then shows the three entry points side by side.

## `HurdleBCFResult`: the raw posterior

The joint posterior over all model parameters. Root primitive — every other result field in the library is derived from it.

```python
from pytyche import fit_hurdle_bcf

posterior = fit_hurdle_bcf(observed)

# Sample arrays are visitor-major with chains concatenated into the
# sample axis: S_total = (num_mcmc / thin_factor) * num_chains.

# Per-arm conversion + severity samples (retained only when
# retain_channel_samples=True)
posterior.p_samples         # (n, S_total, K) at K >= 3
                            # legacy p0_samples / p1_samples at K = 2
posterior.sev_samples       # (n, S_total, K) at K >= 3

# Per-visitor revenue-per-visitor contrasts (always populated)
posterior.rpv_cate_samples  # (n, S_total, K - 1) at K >= 3 — the
                            # jointly-sampled contrast vector
                            # (n, S_total) at K = 2

# Precision / variance trace — scalar severity precision, shape-invariant in K
posterior.tau0_samples      # (S_total,) at every K
posterior.sigma2_samples    # property: 1 / tau0_samples (scalar) at every K

# Pooling mode that produced this result
posterior.pooling           # "joint" or "independent"
```

The posterior carries the full uncertainty structure from the joint multi-arm hurdle BCF fit. Custom derivations build on it:

```python
# A different policy tree depth → different segmentation,
# same posterior
shallow_tree = posterior.fit_policy_tree(max_depth=2)
deeper_tree = posterior.fit_policy_tree(max_depth=4)

# A different Thompson ε floor → different allocation,
# same posterior
tight_allocation = posterior.thompson_allocation(segments=..., epsilon=0.01)
loose_allocation = posterior.thompson_allocation(segments=..., epsilon=0.05)

# Bring your own decision rule — the GraduationRule protocol
# (shipping with the sequential surface) lets you plug in custom
# graduation logic; see docs/how-to/build-your-own-decision-rule.md
```

This is the "posterior is the root, helpers are derived" story made concrete.
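
A custom derivation of this kind can be sketched with plain NumPy on a synthetic array shaped like `rpv_cate_samples` at K = 2. The MCMC settings and the draws below are made up for illustration, and nothing here calls pytyche. The sketch only shows the sample-axis arithmetic from the comments above and a per-visitor summary computed over that axis.

```python
import numpy as np

# Assumed settings, chosen only for illustration.
num_mcmc, thin_factor, num_chains = 200, 2, 4
s_total = (num_mcmc // thin_factor) * num_chains  # chains concatenated on the sample axis

rng = np.random.default_rng(0)
n = 5  # visitors

# Synthetic stand-in for rpv_cate_samples at K = 2: shape (n, S_total).
true_effect = np.linspace(-0.5, 0.5, n)
rpv_cate = true_effect[:, None] + rng.normal(scale=0.2, size=(n, s_total))

# Synthetic stand-in for the precision trace: shape (S_total,).
tau0 = rng.gamma(shape=2.0, scale=1.0, size=s_total)
sigma2 = 1.0 / tau0  # variance as the reciprocal of precision

# Per-visitor summaries reduce over the sample axis (axis=1).
cate_mean = rpv_cate.mean(axis=1)
prob_positive = (rpv_cate > 0).mean(axis=1)
ci_low, ci_high = np.quantile(rpv_cate, [0.10, 0.90], axis=1)  # 80% interval
```

Keeping visitors on the first axis means every per-visitor summary is a reduction over `axis=1`, whatever statistic is chosen.
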
The analysis primitives (`posterior.fit_policy_tree`, `posterior.thompson_allocation`, `posterior.apply_calibration`, `posterior.recommendation_summary`) live as methods on the posterior; function-form wrappers (`pt.fit_policy_tree(posterior, ...)` etc.) are thin delegates for callers who prefer functional style. The {term}`sequential experiment` recommendation engine composes the methods into the default workflow; this section's examples show the same methods called directly.

The raw posterior carries the full Bayesian workflow primitives (traces, per-draw arrays). PyMC's `InferenceData` integration lives separately in `DiagnosticsBundle` (per `contracts.py:553`); the `HurdleBCFResult` is the BCF-specific frozen-dataclass form of the same information.

### Observed data stashing

The posterior carries the observed data it was fit on as `posterior.observed`, mirroring the sklearn convention of attaching training inputs to a fitted estimator. Downstream analysis methods (`posterior.analyze`, `posterior.fit_policy_tree`, etc.) reach into `posterior.observed` for whichever input arrays they need. The caller passes observed once at fit time and never again.

The default semantics are **shallow with read-only views**:

```python
posterior = pt.fit_hurdle_bcf(observed)

# Each variant's visitors DataFrame is rebuilt over read-only column
# arrays — the backing numpy buffers are shared but write-protected.
col = posterior.observed.variants[0].visitors["revenue"].to_numpy(copy=False)
col.flags.writeable  # False

# Mutation attempt through the stashed clone raises:
posterior.observed.variants[0].visitors.iloc[0, 0] = 999
# ValueError: assignment destination is read-only
```

The `observed` wrapper on the posterior is a new object (a shallow dataclass clone), but each DataFrame column is rebuilt over the original numpy buffer with the write flag off. This is cheap — no data copy — and catches mutation attempts on the posterior side loudly.
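
The shared-but-write-protected behavior can be reproduced with plain NumPy. This is a minimal sketch of the general technique, not pytyche's implementation: `original` stands in for one column buffer of the caller's data and `stashed` for the view held on the posterior side.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])  # caller-side buffer

stashed = original.view()        # new array object, same memory
stashed.flags.writeable = False  # write-protect this view only

shares_memory = np.shares_memory(original, stashed)

# Writing through the protected view fails loudly.
try:
    stashed[0] = 999.0
    write_error = None
except ValueError as exc:
    write_error = str(exc)

# Writing through the original handle still works and shows up in the view.
original[0] = -1.0
seen_through_view = stashed[0]
```

The write flag belongs to the view, not to the underlying buffer.
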
It does NOT prevent mutation through the original `observed` handle; if you mutate `observed.variants[0].visitors` after fitting, the corresponding cells in `posterior.observed.variants[0].visitors` reflect that (shared buffer).

For a defensive snapshot independent of the original handle, pass `observed_copy="deep"`:

```python
posterior = pt.fit_hurdle_bcf(observed, observed_copy="deep")
observed.variants[0].visitors.iloc[0, 0] = 999
# posterior.observed.variants[0].visitors.iloc[0, 0] unchanged —
# independent buffer
```

Three modes are available:

- `observed_copy="view"` (default): shallow dataclass clone with read-only array views; cheap; protects against pytyche-side mutation
- `observed_copy="deep"`: full deep copy; doubles memory for the input data; protects against caller-side mutation too
- `observed_copy="ref"`: stores the original object directly, so `posterior.observed is observed`; no view wrappers, no protection; rarely worth using

JAX arrays pass through every mode unchanged — they are immutable by construction.

## `AnalysisResult`: derived analysis

The derived analysis output. Combines comparisons, the recommendation summary, discovered segments, and per-visitor CATEs into a single presentation-ready result.

Returned by `posterior.analyze()` — the method derives the analysis fields from the posterior and the observed data it was fit on.

```python
posterior = pt.fit_hurdle_bcf(observed)
analysis = posterior.analyze()

analysis.comparisons       # list[Comparison] — per-pair effects
analysis.recommendation    # RecommendationSummary — SHIP/STOP/CONTINUE
                           # 5-threshold compound rule
analysis.segments          # list[DiscoveredSegment] — HTE segments
                           # with gate_estimate, gate_ci, stability_score
analysis.cate_per_visitor  # np.ndarray — (n,) at K=2;
                           # (n, K-1) at K>=3
analysis.is_calibrated     # bool — property delegating to
                           # posterior.is_calibrated
```

Each `DiscoveredSegment` in `analysis.segments` carries:

- `id: int` — leaf id within the parent policy tree
- `rule: SegmentRule` — predicate describing the segment membership
- `gate_estimate: float` — posterior mean of the segment-level GATE
- `gate_ci: tuple[float, float]` — 80% credible interval on the GATE
- `population_share: float` — fraction of visitors in this segment
- `stability_score: float` — bootstrap-replicability score in `[0, 1]`
- `arm_best_probabilities: dict[str, float]` — probability each treatment (or control) is best in this segment under the shared best-arm rule; keyed by ALL variant names including control, and sums to `1.0`. Control wins a draw when no treatment contrast is positive.

`AnalysisResult` is the canonical analysis output. Most operator code consumes it directly:

```python
# Decide what to ship
for seg in analysis.segments:
    if seg.stability_score >= 0.80:
        print(f" {seg.rule}: lift {seg.gate_estimate:+.3f}")

# Apply the canonical 5-threshold decision rule
match analysis.recommendation.decision:
    case Decision.SHIP:
        print("Ready to roll out broadly")
    case Decision.CONTINUE:
        print("Keep running")
    case Decision.STOP:
        print("Pull the treatment")
```

`RecommendationSummary` carries the full decision-theoretic snapshot for a treatment contrast.
Its field set:

- `decision: Decision` — SHIP / CONTINUE / STOP from the 5-threshold compound rule
- `expected_loss_baseline: float` — expected loss of choosing baseline
- `expected_loss_comparison: float` — expected loss of choosing the treatment
- `probability_positive: float` — `P(comparison > baseline)`
- `probability_better: float` — `P(comparison meaningfully better)`
- `probability_harmful: float` — `P(comparison meaningfully harmful)`
- `thresholds: dict[str, float]` — decision thresholds applied
- `expected_value_of_one_more_round: float` — information-theoretic value of running one additional round at the same per-round n, in expected-loss-reduction units. Near-zero means the experiment has effectively converged and additional data is unlikely to change the decision.

`AnalysisResult` does not carry the raw posterior. The split matches `pytyche.contracts`'s existing separation between the analysis output and the raw inference data (`AnalysisResult` / `DiagnosticsBundle`). Callers who need the posterior get it as a sibling return value alongside `AnalysisResult`.

## `Experiment`: the per-round snapshot

Per-round snapshot of a {term}`sequential experiment`. Composes the posterior and the analysis with round-specific context (cells shipped, per-cell observations, recommendation for next round, sim-only truth comparison).

```python
exp = pt.sequential_experiment(
    generator=generator,
    schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
    treatments=['control', 'low_promo', 'free_ship'],
)
r = next(exp)  # round 1

# raw posterior — escape hatch
r.posterior  # HurdleBCFResult

# derived analysis
r.analysis  # AnalysisResult
r.analysis.segments
r.analysis.recommendation

# round context
r.round_idx            # int
r.observed             # ObservedExperimentData
r.cells_shipped        # list[Cell]
r.cell_observations    # list[CellObservation] — per-cell scoreboard
r.next_recommendation  # NextRoundPlan — what to ship next round
r.truth_comparison     # TruthComparison | None (sim-only)
```

`r.posterior` and `r.analysis` are sibling fields. Both are populated; the posterior is the root and the analysis is its derivation, but the `Experiment` exposes both at the same level so reaching the posterior is one field access deep rather than nested.

## The three entry points side by side

```
entry point                         returns
──────────────────────────────────  ──────────────────────────────────
pt.fit_hurdle_bcf(...)              HurdleBCFResult only (the raw
                                    posterior; you derive the rest)

posterior.analyze(...)              AnalysisResult — single round of
  (after fit_hurdle_bcf)            analysis without sequential
                                    machinery. pt.analyze(posterior)
                                    is the function-form wrapper.

pt.sequential_experiment(...)       iterates → Experiment per round
                                    (each round's Experiment composes
                                    the analysis and the posterior
                                    with round context)
```

Pick by how much machinery you want:

- **Just the posterior, you'll derive the rest yourself**: use `pt.fit_hurdle_bcf`.
- **One round's analysis without sequential context** (e.g., analyzing observed data from an experiment that already ran): fit, then call `posterior.analyze()`. The `AnalysisResult` is the same shape the sequential machinery uses internally.
- **The sequential adaptive-experiment loop**, with recommendation per round and graduation signals: use `pt.sequential_experiment`.
  Each round's `Experiment` exposes the posterior as an escape hatch when you want to dig past the standard analysis surfaces.

## Calibration and the result

`Calibration` is a frozen dataclass at `pytyche.calibrate.Calibration` (re-exported as `pt.Calibration`). It wraps the layered R(p)/scale-family correction with regime metadata — the metric name, treatment count K, and pooling mode it was fitted on. The `applies_to(observed)` method returns `True` only when the observed data's regime (metric and K) matches the artifact's; `posterior.apply_calibration(calibration)` raises `ValueError` naming the mismatched dimension(s) when the check fails.

`apply_calibration` is **K = 2 only in v0.2**. Calling it on a K ≥ 3 posterior raises `NotImplementedError` — per-contrast R(p) recalibration requires per-contrast SBC machinery that ships with the sequential-surface calibration work.

When a `Calibration` artifact is passed to `fit_hurdle_bcf` (or consumed by `posterior.apply_calibration`), the recalibration is applied to the posterior **before** the `AnalysisResult` is constructed. The `analysis.is_calibrated` property labels whether the result is calibrated.

`Calibration.from_sweep(...)` and `Calibration.skip()` registry constructors arrive with the sequential-surface calibration work — any mention of them in examples should be read as forward-looking.

## Tree data model (pending)

> **Two trees, one library.** Pytyche has two distinct tree concepts
> that share the word. The {term}`BART forest` lives *inside* the
> BCF — many weak trees whose posterior pytyche samples via MCMC,
> producing the per-visitor CATE estimates. The {term}`policy tree`
> lives *on top* of those CATEs — a single sklearn
> `DecisionTreeClassifier` fit on the CATE surface to discover the
> segments downstream allocation and graduation operate on. Users
> never inspect BART trees directly; what `PolicyTreeResult.tree`
> exposes is the policy tree. See the glossary entries for both
> terms.

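
The policy-tree half of that distinction can be sketched with scikit-learn on synthetic data. The labeling step is an assumption made for illustration: each visitor is labeled by whether their estimated CATE is positive, since the text above does not specify how pytyche turns the CATE surface into classifier targets. The features and effect sizes are likewise hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical visitor features: column 0 drives the effect, column 1 is noise.
X = rng.uniform(size=(n, 2))

# Synthetic per-visitor CATE estimates: positive only when feature 0 exceeds 0.5.
cate = np.where(X[:, 0] > 0.5, 0.30, -0.10)

# Assumed labeling: 1 where treatment looks better, 0 where control does.
labels = (cate > 0).astype(int)

policy_tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, labels)

# Each leaf is a candidate segment; apply() maps visitors to leaf ids.
leaf_ids = policy_tree.apply(X)
segment_share = {
    int(leaf): float(np.mean(leaf_ids == leaf)) for leaf in np.unique(leaf_ids)
}
```

A single shallow tree yields a small number of leaves, each readable as a rule over visitor features, which is what makes its leaves usable as segments.
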
The {term}`policy` returned in cell recommendations carries the policy tree + per-leaf {term}`Thompson allocation`. The concrete tree type and serialization format are being worked out in the analysis-surface design pass:

- direct sklearn `DecisionTreeClassifier` (current sketch in type stubs) — simple but couples the public API to sklearn versions
- pytyche wrapper preserving sklearn ergonomics — version isolation, cleaner spec
- framework-agnostic rule-list export — needed for handoff to serving platforms

This page updates when the tree contract lands.

## Open questions

The following result-surface questions remain open:

- **Tree data model**: see [Tree data model (pending)](#tree-data-model-pending) above.

## Related concepts

- {doc}`overview` — what pytyche does and who it's for
- {doc}`sequential-targeting` — methodology behind the sequential loop
- {doc}`statistical-honesty` — the two-gardens framing the calibrated result protects against
- {doc}`bcf-calibration-at-scale` — what calibration actually corrects
- {doc}`glossary` — definitions of every type and term referenced here