---
title: Result objects
review-state: drafting
last-human-review: "2026-06-11"
depends-on:
  - src/pytyche/contracts.py
  - src/pytyche/bcf/config.py
owner: tradcliffe
quadrant: concept
---

# Result objects

Pytyche analyses return three result types, each with a distinct role:

- `HurdleBCFResult` — the raw joint posterior from the BCF fit; the
  root primitive everything else is derived from.
- `AnalysisResult` — the derived analysis output: comparisons,
  recommendation summary, discovered segments, per-visitor CATEs. What
  most analysis code consumes directly.
- `Experiment` — the per-round snapshot from a {term}`sequential
  experiment`. Composes the posterior and the analysis with
  round-specific context (cells shipped, recommendation, truth
  comparison).

Each entry point returns a different result type:

- `pt.fit_hurdle_bcf(...)` returns the raw posterior only. You
  derive the analysis yourself, or pass the posterior to the analysis
  primitives the rest of this doc introduces.
- `posterior.analyze(...)` returns an `AnalysisResult` derived from
  the posterior. The posterior already carries the observed data it
  was fit on (see [Observed data stashing](#observed-data-stashing)
  below), so `analyze` takes only optional knobs. The canonical
  pattern is two lines:

  ```python
  posterior = pt.fit_hurdle_bcf(observed)
  analysis = posterior.analyze()
  ```

  A function-form wrapper `pt.analyze(posterior, ...)` exists as a
  thin delegate for callers who prefer functional style. Both forms
  produce identical `AnalysisResult` output; the method is the
  documented canonical form because behavior dispatches naturally
  through the result type (`ContinuousBCFResult.analyze`,
  `BinaryBCFResult.analyze`, `HurdleBCFResult.analyze` each do
  the right thing for their posterior shape).
- `pt.sequential_experiment(...)` iterates per-round `Experiment`
  snapshots. Each snapshot exposes the posterior, the derived
  analysis, the cells shipped that round, and the recommendation for
  the next round.
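
The relationship between the method form and the function-form wrapper can be illustrated with stand-in classes. This is a minimal sketch, not pytyche's code: `ToyContinuousResult`, `ToyBinaryResult`, and the module-level `analyze` below are hypothetical names invented for the example. It shows why a wrapper that only delegates cannot diverge from the method, and how each result type supplies its own behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyContinuousResult:
    """Stand-in for a continuous-outcome posterior (hypothetical)."""
    draws: tuple[float, ...]

    def analyze(self) -> dict:
        # Continuous outcomes: summarize the draws by their mean.
        return {"kind": "continuous", "effect": sum(self.draws) / len(self.draws)}


@dataclass(frozen=True)
class ToyBinaryResult:
    """Stand-in for a binary-outcome posterior (hypothetical)."""
    draws: tuple[float, ...]

    def analyze(self) -> dict:
        # Binary outcomes: report the share of draws above zero instead.
        share = sum(d > 0 for d in self.draws) / len(self.draws)
        return {"kind": "binary", "effect": share}


def analyze(posterior, **knobs) -> dict:
    """Function-form wrapper: delegates to the method and adds no logic."""
    return posterior.analyze(**knobs)


continuous = ToyContinuousResult(draws=(0.1, 0.3, 0.2))
binary = ToyBinaryResult(draws=(-0.1, 0.2, 0.4, 0.3))

# Both call styles give the same output for the same posterior.
assert analyze(continuous) == continuous.analyze()
assert analyze(binary) == binary.analyze()
```

Because the wrapper holds no logic of its own, any type-specific behavior stays in one place: the result class.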

The rest of this doc walks each type, then shows the three entry
points side by side.

## `HurdleBCFResult`: the raw posterior

The joint posterior over all model parameters. Root primitive — every
other result field in the library is derived from it.

```python
from pytyche import fit_hurdle_bcf

posterior = fit_hurdle_bcf(observed)

# Sample arrays are visitor-major with chains concatenated into the
# sample axis: S_total = (num_mcmc / thin_factor) * num_chains.

# Per-arm conversion + severity samples (retained only when
# retain_channel_samples=True)
posterior.p_samples          # (n, S_total, K) at K >= 3
                             # legacy p0_samples / p1_samples at K = 2
posterior.sev_samples        # (n, S_total, K) at K >= 3

# Per-visitor revenue-per-visitor contrasts (always populated)
posterior.rpv_cate_samples   # (n, S_total, K - 1) at K >= 3 — the
                             # jointly-sampled contrast vector
                             # (n, S_total) at K = 2

# Precision / variance trace — scalar severity precision, shape-invariant in K
posterior.tau0_samples       # (S_total,) at every K
posterior.sigma2_samples     # property: 1 / tau0_samples (scalar) at every K

# Pooling mode that produced this result
posterior.pooling            # "joint" or "independent"
```
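
The shape conventions in the comments above can be checked with plain NumPy. The sketch below uses synthetic arrays only; the sampler settings are made up, and the assumption that chains are concatenated one after another along the sample axis is an illustration, not a statement about pytyche's internals.

```python
import numpy as np

# Hypothetical sampler settings, chosen only for illustration.
num_mcmc, thin_factor, num_chains = 200, 2, 4
n, K = 5, 3

kept_per_chain = num_mcmc // thin_factor   # draws kept after thinning
s_total = kept_per_chain * num_chains      # length of the sample axis

rng = np.random.default_rng(0)

# One (n, kept_per_chain, K - 1) contrast array per chain ...
per_chain = [rng.normal(size=(n, kept_per_chain, K - 1)) for _ in range(num_chains)]
# ... concatenated along the sample axis, keeping visitors on axis 0.
rpv_cate_samples = np.concatenate(per_chain, axis=1)

# A scalar precision trace has one value per retained draw.
tau0_samples = rng.gamma(shape=2.0, scale=1.0, size=s_total)
sigma2_samples = 1.0 / tau0_samples        # variance = 1 / precision

# Per-visitor posterior mean for each contrast: average over the sample axis.
cate_mean = rpv_cate_samples.mean(axis=1)

print(rpv_cate_samples.shape, sigma2_samples.shape, cate_mean.shape)
```

With a visitor-major layout, every per-visitor summary is a reduction over axis 1, whatever the value of K.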

The posterior carries the full uncertainty structure from the joint
multi-arm hurdle BCF fit. Custom derivations build on it:

```python
# A different policy tree depth → different segmentation,
# same posterior
shallow_tree = posterior.fit_policy_tree(max_depth=2)
deeper_tree = posterior.fit_policy_tree(max_depth=4)

# A different Thompson ε floor → different allocation,
# same posterior
tight_allocation = posterior.thompson_allocation(segments=..., epsilon=0.01)
loose_allocation = posterior.thompson_allocation(segments=..., epsilon=0.05)

# Bring your own decision rule — the GraduationRule protocol
# (shipping with the sequential surface) lets you plug in custom
# graduation logic; see docs/how-to/build-your-own-decision-rule.md
```

This is the "posterior is the root, helpers are derived" story made
concrete. The analysis primitives (`posterior.fit_policy_tree`,
`posterior.thompson_allocation`, `posterior.apply_calibration`,
`posterior.recommendation_summary`) live as methods on the
posterior; function-form wrappers (`pt.fit_policy_tree(posterior, ...)`
etc.) are thin delegates for callers who prefer functional style. The
{term}`sequential experiment` recommendation engine composes the
methods into the default workflow; this section's examples show the
same methods called directly.
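
To make the effect of the ε floor concrete, here is a minimal sketch of Thompson-style allocation computed from posterior contrast draws. It is not pytyche's implementation: `thompson_allocation_sketch` is a hypothetical helper, and flooring by mixing with the uniform allocation is one common choice assumed here for illustration.

```python
import numpy as np


def thompson_allocation_sketch(contrast_draws: np.ndarray, epsilon: float) -> np.ndarray:
    """Allocation over K arms (control first) from (S, K - 1) contrast draws."""
    n_draws, n_contrasts = contrast_draws.shape
    n_arms = n_contrasts + 1
    if not 0.0 <= epsilon <= 1.0 / n_arms:
        raise ValueError("epsilon must lie in [0, 1 / n_arms]")

    # Control's contrast against itself is zero, so prepend a zero column.
    values = np.hstack([np.zeros((n_draws, 1)), contrast_draws])
    # Thompson probabilities: share of draws in which each arm is the best.
    winners = values.argmax(axis=1)
    p_best = np.bincount(winners, minlength=n_arms) / n_draws
    # Floor by mixing with the uniform allocation: every arm gets >= epsilon.
    return epsilon + (1.0 - n_arms * epsilon) * p_best


rng = np.random.default_rng(1)
draws = rng.normal(loc=[0.5, -0.5], scale=1.0, size=(4000, 2))  # synthetic posterior

tight = thompson_allocation_sketch(draws, epsilon=0.01)
loose = thompson_allocation_sketch(draws, epsilon=0.05)
print(tight, loose)
```

Both calls read the same draws; only the floor changes, which is the sense in which the allocation is derived from the posterior rather than stored in it.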

The raw posterior carries the primitives a full Bayesian workflow
needs (traces, per-draw arrays). The `InferenceData` integration (the
ArviZ container that PyMC returns) lives separately in
`DiagnosticsBundle` (per `contracts.py:553`); the `HurdleBCFResult` is
the BCF-specific frozen-dataclass form of the same information.

### Observed data stashing

The posterior carries the observed data it was fit on as
`posterior.observed`, much as a statsmodels results object keeps a
reference to the data its model was fit on. Downstream analysis methods
(`posterior.analyze`, `posterior.fit_policy_tree`, etc.) reach into
`posterior.observed` for whichever input arrays they need. The caller
passes observed once at fit time and never again.

The default semantics are **shallow with read-only views**:

```python
posterior = pt.fit_hurdle_bcf(observed)

# Each variant's visitors DataFrame is rebuilt over read-only column
# arrays — the backing numpy buffers are shared but write-protected.
col = posterior.observed.variants[0].visitors["revenue"].to_numpy(copy=False)
col.flags.writeable
# False

# Mutation attempt through the stashed clone raises:
posterior.observed.variants[0].visitors.iloc[0, 0] = 999
# ValueError: assignment destination is read-only
```

The `observed` wrapper on the posterior is a new object (a shallow
dataclass clone), but each DataFrame column is rebuilt over the original numpy
buffer with the write flag off. This is cheap — no data copy — and
catches mutation attempts on the posterior side loudly. It does NOT
prevent mutation through the original `observed` handle; if you mutate
`observed.variants[0].visitors` after fitting, the corresponding cells
in `posterior.observed.variants[0].visitors` reflect that (shared
buffer).

For a defensive snapshot independent of the original handle, pass
`observed_copy="deep"`:

```python
posterior = pt.fit_hurdle_bcf(observed, observed_copy="deep")
observed.variants[0].visitors.iloc[0, 0] = 999
# posterior.observed.variants[0].visitors.iloc[0, 0] unchanged —
# independent buffer
```

Three modes are available:

- `observed_copy="view"` (default): shallow dataclass clone with
  read-only array views; cheap; protects against pytyche-side mutation
- `observed_copy="deep"`: full deep copy; doubles memory for the input
  data; protects against caller-side mutation too
- `observed_copy="ref"`: stores the caller's object itself, so
  `posterior.observed is observed`; no view wrappers, no protection;
  rarely worth using

JAX arrays pass through every mode unchanged — they are immutable by
construction.
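
The three modes can be compared in a small NumPy sketch. This illustrates the semantics described above and is not pytyche's implementation: `ToyObserved` and `stash_observed` are hypothetical names, and a single array field stands in for the per-variant DataFrames.

```python
import copy
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class ToyObserved:
    """Hypothetical stand-in for the observed-data container."""
    revenue: np.ndarray


def stash_observed(observed: ToyObserved, observed_copy: str = "view") -> ToyObserved:
    if observed_copy == "ref":
        return observed                      # same object, no protection
    if observed_copy == "deep":
        return copy.deepcopy(observed)       # independent buffers
    if observed_copy == "view":
        locked = observed.revenue.view()     # shared buffer ...
        locked.flags.writeable = False       # ... behind a write-protected handle
        return replace(observed, revenue=locked)
    raise ValueError(f"unknown observed_copy mode: {observed_copy!r}")


observed = ToyObserved(revenue=np.array([1.0, 2.0, 3.0]))
by_view = stash_observed(observed, "view")
by_deep = stash_observed(observed, "deep")
by_ref = stash_observed(observed, "ref")

try:
    by_view.revenue[0] = -1.0                # write through the stashed view
except ValueError as err:
    print(err)                               # assignment destination is read-only

observed.revenue[0] = 999.0                  # caller-side mutation after stashing
print(by_view.revenue[0], by_deep.revenue[0], by_ref is observed)
```

After the caller-side write, the view reflects the new value, the deep copy keeps the old one, and the `"ref"` result is the caller's object itself.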

## `AnalysisResult`: derived analysis

The derived analysis output. Combines comparisons, the recommendation summary,
discovered segments, and per-visitor CATEs into a single
presentation-ready result. Returned by `posterior.analyze()` — the
method derives the analysis fields from the posterior and the
observed data it was fit on.

```python
posterior = pt.fit_hurdle_bcf(observed)

analysis = posterior.analyze()

analysis.comparisons         # list[Comparison] — per-pair effects
analysis.recommendation      # RecommendationSummary — SHIP/STOP/CONTINUE
                             # 5-threshold compound rule
analysis.segments            # list[DiscoveredSegment] — HTE segments
                             # with gate_estimate, gate_ci, stability_score
analysis.cate_per_visitor    # np.ndarray — (n,) at K=2;
                             #              (n, K-1) at K>=3
analysis.is_calibrated       # bool — property delegating to
                             # posterior.is_calibrated
```

Each `DiscoveredSegment` in `analysis.segments` carries:

- `id: int` — leaf id within the parent policy tree
- `rule: SegmentRule` — predicate describing the segment membership
- `gate_estimate: float` — posterior mean of the segment-level GATE
- `gate_ci: tuple[float, float]` — 80% credible interval on the GATE
- `population_share: float` — fraction of visitors in this segment
- `stability_score: float` — bootstrap-replicability score in `[0, 1]`
- `arm_best_probabilities: dict[str, float]` — probability each treatment
  (or control) is best in this segment under the shared best-arm rule;
  keyed by ALL variant names including control, and sums to `1.0`.
  Control wins a draw when no treatment contrast is positive.
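
The segment-level fields can be reproduced from per-visitor contrast draws with a few NumPy reductions. The sketch below runs on synthetic data and rests on assumptions that are not taken from pytyche: the membership mask is random, the 80% interval is assumed to be equal-tailed, and at K = 3 the sketch yields one GATE per contrast rather than a single float.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_draws = 200, 1000
variants = ["control", "low_promo", "free_ship"]    # control first

# Synthetic per-visitor contrast draws, shape (n, S, K - 1).
cate_draws = rng.normal(loc=[0.3, -0.1], scale=1.0, size=(n, n_draws, 2))
in_segment = rng.random(n) < 0.4                     # hypothetical membership mask

# Segment-level draws: average the member visitors within each draw.
gate_draws = cate_draws[in_segment].mean(axis=0)     # (S, K - 1)

gate_estimate = gate_draws.mean(axis=0)              # posterior mean per contrast
gate_ci = np.quantile(gate_draws, [0.10, 0.90], axis=0)  # assumed equal-tailed 80%
population_share = in_segment.mean()

# Best arm per draw; control (index 0) wins when no contrast is positive.
values = np.hstack([np.zeros((n_draws, 1)), gate_draws])
wins = np.bincount(values.argmax(axis=1), minlength=len(variants))
arm_best_probabilities = dict(zip(variants, (wins / n_draws).tolist()))

print(gate_estimate, population_share, arm_best_probabilities)
```

Prepending a zero column for control is what makes the probabilities sum to one across all variants, control included.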

`AnalysisResult` is the canonical analysis output. Most operator code
consumes it directly:

```python
# Decide what to ship
for seg in analysis.segments:
    if seg.stability_score >= 0.80:
        print(f"  {seg.rule}: lift {seg.gate_estimate:+.3f}")

# Apply the canonical 5-threshold decision rule.
# `match` needs Python 3.10+; `Decision` is the enum behind `.decision`.
match analysis.recommendation.decision:
    case Decision.SHIP:
        print("Ready to roll out broadly")
    case Decision.CONTINUE:
        print("Keep running")
    case Decision.STOP:
        print("Pull the treatment")
```

`RecommendationSummary` carries the full decision-theoretic snapshot
for a treatment contrast. Its field set:

- `decision: Decision` — SHIP / CONTINUE / STOP from the 5-threshold compound rule
- `expected_loss_baseline: float` — expected loss of choosing baseline
- `expected_loss_comparison: float` — expected loss of choosing the treatment
- `probability_positive: float` — `P(comparison > baseline)`
- `probability_better: float` — `P(comparison meaningfully better)`
- `probability_harmful: float` — `P(comparison meaningfully harmful)`
- `thresholds: dict[str, float]` — decision thresholds applied
- `expected_value_of_one_more_round: float` — information-theoretic value
  of running one additional round at the same per-round n, in
  expected-loss-reduction units. Near-zero means the experiment has
  effectively converged and additional data is unlikely to change the
  decision.
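
Several of these fields have standard definitions in terms of posterior draws of the contrast. The sketch below uses the textbook Bayesian A/B formulas on synthetic draws; `decision_metrics_sketch` and its `min_effect` threshold are hypothetical, and pytyche's exact definitions and thresholds may differ.

```python
import numpy as np


def decision_metrics_sketch(delta_draws: np.ndarray, min_effect: float) -> dict[str, float]:
    """Summaries of posterior draws of delta = comparison - baseline."""
    return {
        # Loss from keeping baseline: the uplift forgone when delta > 0.
        "expected_loss_baseline": float(np.maximum(delta_draws, 0.0).mean()),
        # Loss from shipping the comparison: the harm done when delta < 0.
        "expected_loss_comparison": float(np.maximum(-delta_draws, 0.0).mean()),
        "probability_positive": float((delta_draws > 0.0).mean()),
        # "Meaningfully" is modeled as clearing a minimum effect size.
        "probability_better": float((delta_draws > min_effect).mean()),
        "probability_harmful": float((delta_draws < -min_effect).mean()),
    }


rng = np.random.default_rng(3)
delta = rng.normal(loc=0.2, scale=0.1, size=10_000)   # synthetic contrast draws
metrics = decision_metrics_sketch(delta, min_effect=0.05)
print(metrics)
```

Under these definitions the two expected losses differ by exactly the posterior mean of the contrast, which is a quick sanity check on any implementation.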

`AnalysisResult` does not carry the raw posterior. The split matches
`pytyche.contracts`'s existing separation between the analysis output
and the raw inference data (`AnalysisResult` / `DiagnosticsBundle`).
Callers who need the posterior keep the handle they called `analyze()`
on; in the sequential loop, `Experiment` exposes it as a sibling field
next to the `AnalysisResult`.

## `Experiment`: the per-round snapshot

Per-round snapshot of a {term}`sequential experiment`. Composes the
posterior and the analysis with round-specific context (cells shipped,
per-cell observations, recommendation for next round, sim-only truth
comparison).

```python
exp = pt.sequential_experiment(
    generator=generator,
    schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
    treatments=['control', 'low_promo', 'free_ship'],
)

r = next(exp)  # round 1

# raw posterior — escape hatch
r.posterior                  # HurdleBCFResult

# derived analysis
r.analysis                   # AnalysisResult
r.analysis.segments
r.analysis.recommendation

# round context
r.round_idx                  # int
r.observed                   # ObservedExperimentData
r.cells_shipped              # list[Cell]
r.cell_observations          # list[CellObservation] — per-cell scoreboard
r.next_recommendation        # NextRoundPlan — what to ship next round
r.truth_comparison           # TruthComparison | None (sim-only)
```

`r.posterior` and `r.analysis` are sibling fields. Both are populated;
the posterior is the root and the analysis is its derivation, but the
`Experiment` exposes both at the same level so reaching the posterior
is one field access deep rather than nested.
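
The sibling-field layout and the per-round iteration can be sketched with stand-in types. Everything below is hypothetical (`ToyPosterior`, `ToyAnalysis`, `ToyRound`, `toy_sequential`), and the 1-based `round_idx` is an assumption made for the example; it only illustrates a snapshot that stores the root and its derivation side by side.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPosterior:
    draws: tuple[float, ...]


@dataclass(frozen=True)
class ToyAnalysis:
    mean_effect: float


@dataclass(frozen=True)
class ToyRound:
    round_idx: int
    posterior: ToyPosterior   # the root
    analysis: ToyAnalysis     # derived from the posterior, stored beside it


def toy_sequential(draws_per_round: list[tuple[float, ...]]) -> Iterator[ToyRound]:
    """Yield one frozen snapshot per round."""
    for idx, draws in enumerate(draws_per_round, start=1):
        posterior = ToyPosterior(draws)
        analysis = ToyAnalysis(mean_effect=sum(draws) / len(draws))
        yield ToyRound(idx, posterior, analysis)


rounds = toy_sequential([(0.0, 0.5), (0.25, 0.75)])
first = next(rounds)
print(first.round_idx, first.posterior.draws, first.analysis.mean_effect)
```

Storing both at the same level means a caller who wants a custom derivation reads `first.posterior` directly instead of reaching through the analysis.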

## The three entry points side by side

```
   entry point                        returns
   ────────────────────────────────── ──────────────────────────────────
   pt.fit_hurdle_bcf(...)             HurdleBCFResult only
                                      (the raw posterior; you derive
                                      the rest)

   posterior.analyze(...)             AnalysisResult — single round of
   (after fit_hurdle_bcf)             analysis without sequential
                                      machinery. pt.analyze(posterior)
                                      is the function-form wrapper.

   pt.sequential_experiment(...)      iterates → Experiment per round
                                      (each round's Experiment
                                      composes the analysis and the
                                      posterior with round context)
```

Pick by how much machinery you want:

- **Just the posterior, you'll derive the rest yourself**: use
  `pt.fit_hurdle_bcf`.
- **One round's analysis without sequential context** (e.g., analyzing
  observed data from an experiment that already ran): fit, then call
  `posterior.analyze()`. The `AnalysisResult` is the same shape the
  sequential machinery uses internally.
- **The sequential adaptive-experiment loop**, with recommendation
  per round and graduation signals: use `pt.sequential_experiment`.
  Each round's `Experiment` exposes the posterior as an escape hatch
  when you want to dig past the standard analysis surfaces.

## Calibration and the result

`Calibration` is a frozen dataclass at `pytyche.calibrate.Calibration`
(re-exported as `pt.Calibration`). It wraps the layered R(p)/scale-family
correction with regime metadata — the metric name, treatment count K, and
pooling mode it was fitted on. The `applies_to(observed)` method returns
`True` only when the observed data's regime (metric and K) matches the
artifact's; `posterior.apply_calibration(calibration)` raises `ValueError`
naming the mismatched dimension(s) when the check fails.

`apply_calibration` is **K = 2 only in v0.2**. Calling it on a K ≥ 3
posterior raises `NotImplementedError` — per-contrast R(p) recalibration
requires per-contrast SBC machinery that ships with the sequential-surface
calibration work.
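
The regime check and the two failure modes can be sketched with a toy artifact. This is an illustration under stated assumptions, not the `pytyche.calibrate` code: `ToyRegime`, `ToyCalibration`, and `apply_calibration_sketch` are hypothetical, the metric name `"rpv"` is made up, and the order in which the two errors are checked is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRegime:
    """Hypothetical summary of the regime a posterior was fit in."""
    metric: str
    k: int


@dataclass(frozen=True)
class ToyCalibration:
    metric: str
    k: int
    pooling: str

    def mismatches(self, regime: ToyRegime) -> list[str]:
        found = []
        if regime.metric != self.metric:
            found.append(f"metric ({regime.metric!r} != {self.metric!r})")
        if regime.k != self.k:
            found.append(f"K ({regime.k} != {self.k})")
        return found

    def applies_to(self, regime: ToyRegime) -> bool:
        return not self.mismatches(regime)


def apply_calibration_sketch(regime: ToyRegime, calibration: ToyCalibration) -> str:
    if regime.k >= 3:
        raise NotImplementedError("recalibration is only sketched for K = 2")
    problems = calibration.mismatches(regime)
    if problems:
        # Name every mismatched dimension so the caller can fix the input.
        raise ValueError("calibration regime mismatch: " + ", ".join(problems))
    return "calibrated"


calibration = ToyCalibration(metric="rpv", k=2, pooling="joint")
print(apply_calibration_sketch(ToyRegime("rpv", 2), calibration))
```

Collecting all mismatches before raising lets a single error message name both dimensions when metric and K are wrong at once.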

When a `Calibration` artifact is passed to `fit_hurdle_bcf` (or consumed
by `posterior.apply_calibration`), the recalibration is applied to the
posterior **before** the `AnalysisResult` is constructed. The
`analysis.is_calibrated` property reports whether the result is calibrated.

`Calibration.from_sweep(...)` and `Calibration.skip()` registry
constructors arrive with the sequential-surface calibration work — any
mention of them in examples should be read as forward-looking.

## Tree data model (pending)

> **Two trees, one library.** Pytyche has two distinct tree concepts
> that share the word. The {term}`BART forest` lives *inside* the
> BCF — many weak trees whose posterior pytyche samples via MCMC,
> producing the per-visitor CATE estimates. The {term}`policy tree`
> lives *on top* of those CATEs — a single sklearn
> `DecisionTreeClassifier` fit on the CATE surface to discover the
> segments downstream allocation and graduation operate on. Users
> never inspect BART trees directly; what `PolicyTreeResult.tree`
> exposes is the policy tree. See the glossary entries for both
> terms.

The {term}`policy` returned in cell recommendations carries the
policy tree and the per-leaf {term}`Thompson allocation`. The concrete tree
type and serialization format are being worked out in the
analysis-surface design pass:

- direct sklearn `DecisionTreeClassifier` (current sketch in type
  stubs) — simple but couples the public API to sklearn versions
- pytyche wrapper preserving sklearn ergonomics — version isolation,
  cleaner spec
- framework-agnostic rule-list export — needed for handoff to serving
  platforms

This page updates when the tree contract lands.
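
Until the contract lands, the first and third options can be pictured with plain scikit-learn. The sketch below is an assumption-laden illustration, not pytyche's policy-tree code: the covariates and CATE values are synthetic, the feature names are made up, and labeling each visitor by the sign of the CATE is just one way to turn a CATE surface into classifier targets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
n = 500
X = rng.uniform(0.0, 1.0, size=(n, 2))           # two synthetic covariates

# Synthetic per-visitor CATE: positive only where feature 0 exceeds 0.5.
cate = np.where(X[:, 0] > 0.5, 0.4, -0.2) + rng.normal(0.0, 0.05, size=n)

# Label each visitor by the arm the CATE favours (1 = treat, 0 = control) ...
best_arm = (cate > 0.0).astype(int)
# ... and fit one shallow tree on those labels to recover the segments.
policy_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, best_arm)

leaf_ids = policy_tree.apply(X)                   # leaf index per visitor
rules = export_text(policy_tree, feature_names=["recency", "basket_size"])
print(rules)
```

`export_text` gives a human-readable rule list, which hints at what a framework-agnostic export would need to carry: the split features, thresholds, and leaf assignments.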

## Open questions

The following result-surface questions remain open:

- **Tree data model**: see [Tree data model (pending)](#tree-data-model-pending)
  above.

## Related concepts

- {doc}`overview` — what pytyche does and who it's for
- {doc}`sequential-targeting` — methodology behind the sequential loop
- {doc}`statistical-honesty` — the two-gardens framing the calibrated
  result protects against
- {doc}`bcf-calibration-at-scale` — what calibration actually corrects
- {doc}`glossary` — definitions of every type and term referenced here