Result objects¶
Pytyche analyses return three result types, each with a distinct role:
- HurdleBCFResult: the raw joint posterior from the BCF fit; the root primitive everything else is derived from.
- AnalysisResult: the derived analysis output (comparisons, recommendation summary, discovered segments, per-visitor CATEs). What most analysis code consumes directly.
- Experiment: the per-round snapshot from a sequential experiment. Composes the posterior and the analysis with round-specific context (cells shipped, recommendation, truth comparison).
Each entry point returns a different result type:
- pt.fit_hurdle_bcf(...) returns the raw posterior only. You derive the analysis yourself, or pass the posterior to the analysis primitives the rest of this doc introduces.
- posterior.analyze(...) returns an AnalysisResult derived from the posterior. The posterior already carries the observed data it was fit on (see Observed data stashing below), so analyze takes only optional knobs. The canonical pattern is two lines:

      posterior = pt.fit_hurdle_bcf(observed)
      analysis = posterior.analyze()

  A function-form wrapper pt.analyze(posterior, ...) exists as a thin delegate for callers who prefer functional style. Both forms produce identical AnalysisResult output; the method is the documented canonical form because behavior dispatches naturally through the result type (ContinuousBCFResult.analyze, BinaryBCFResult.analyze, HurdleBCFResult.analyze each do the right thing for their posterior shape).
- pt.sequential_experiment(...) iterates per-round Experiment snapshots. Each snapshot exposes the posterior, the derived analysis, the cells shipped that round, and the recommendation for the next round.
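The method-versus-wrapper relationship described above can be illustrated with a small, self-contained sketch. The classes here (ToyAnalysis, ToyBinaryPosterior, ToyHurdlePosterior) are hypothetical stand-ins and not pytyche types; the only point is that a module-level function can delegate to whichever analyze method the result type provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAnalysis:
    """Stand-in for an analysis output; not pytyche's AnalysisResult."""
    kind: str
    n_contrasts: int


@dataclass(frozen=True)
class ToyBinaryPosterior:
    """Hypothetical K = 2 posterior: a single treatment contrast."""

    def analyze(self, **knobs) -> ToyAnalysis:
        return ToyAnalysis(kind="binary", n_contrasts=1)


@dataclass(frozen=True)
class ToyHurdlePosterior:
    """Hypothetical multi-arm posterior: K - 1 contrasts."""
    n_arms: int

    def analyze(self, **knobs) -> ToyAnalysis:
        return ToyAnalysis(kind="hurdle", n_contrasts=self.n_arms - 1)


def analyze(posterior, **knobs) -> ToyAnalysis:
    """Function-form wrapper: a thin delegate to the method."""
    return posterior.analyze(**knobs)
```

Because the wrapper only forwards the call, both call styles return equal results for any posterior type that defines analyze.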
The rest of this doc walks each type, then shows the three entry points side by side.
HurdleBCFResult: the raw posterior¶
The joint posterior over all model parameters. Root primitive — every other result field in the library is derived from it.
from pytyche import fit_hurdle_bcf
posterior = fit_hurdle_bcf(observed)
# Sample arrays are visitor-major with chains concatenated into the
# sample axis: S_total = (num_mcmc / thin_factor) * num_chains.
# Per-arm conversion + severity samples (retained only when
# retain_channel_samples=True)
posterior.p_samples # (n, S_total, K) at K >= 3
# legacy p0_samples / p1_samples at K = 2
posterior.sev_samples # (n, S_total, K) at K >= 3
# Per-visitor revenue-per-visitor contrasts (always populated)
posterior.rpv_cate_samples # (n, S_total, K - 1) at K >= 3 — the
# jointly-sampled contrast vector
# (n, S_total) at K = 2
# Precision / variance trace — scalar severity precision, shape-invariant in K
posterior.tau0_samples # (S_total,) at every K
posterior.sigma2_samples # property: 1 / tau0_samples (scalar) at every K
# Pooling mode that produced this result
posterior.pooling # "joint" or "independent"
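To make the shape conventions concrete, here is a minimal NumPy sketch that uses a synthetic array in place of posterior.rpv_cate_samples. The sampler settings and the random draws are made up for illustration, and reducing over the sample axis with a mean is an assumption about how a per-visitor summary might be formed, not a statement of what pytyche does internally.

```python
import numpy as np

# Hypothetical sampler settings, chosen only to make the arithmetic visible.
num_mcmc, thin_factor, num_chains = 200, 4, 3
s_total = (num_mcmc // thin_factor) * num_chains  # 50 kept draws x 3 chains

n, K = 5, 3
rng = np.random.default_rng(0)

# Synthetic stand-in for posterior.rpv_cate_samples at K >= 3: (n, S_total, K - 1).
rpv_cate_samples = rng.normal(loc=0.1, scale=1.0, size=(n, s_total, K - 1))

# Reducing over the sample axis gives one value per visitor and contrast.
cate_mean = rpv_cate_samples.mean(axis=1)  # (n, K - 1)

# 80% equal-tailed interval per visitor and contrast.
cate_lo, cate_hi = np.quantile(rpv_cate_samples, [0.10, 0.90], axis=1)
```

The reduced array has shape (n, K - 1), which is the shape this page lists for analysis.cate_per_visitor at K >= 3.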
The posterior carries the full uncertainty structure from the joint multi-arm hurdle BCF fit. Custom derivations build on it:
# A different policy tree depth → different segmentation,
# same posterior
shallow_tree = posterior.fit_policy_tree(max_depth=2)
deeper_tree = posterior.fit_policy_tree(max_depth=4)
# A different Thompson ε floor → different allocation,
# same posterior
tight_allocation = posterior.thompson_allocation(segments=..., epsilon=0.01)
loose_allocation = posterior.thompson_allocation(segments=..., epsilon=0.05)
# Bring your own decision rule — the GraduationRule protocol
# (shipping with the sequential surface) lets you plug in custom
# graduation logic; see docs/how-to/build-your-own-decision-rule.md
This is the “posterior is the root, helpers are derived” story made
concrete. The analysis primitives (posterior.fit_policy_tree,
posterior.thompson_allocation, posterior.apply_calibration,
posterior.recommendation_summary) live as methods on the
posterior; function-form wrappers (pt.fit_policy_tree(posterior, ...)
etc.) are thin delegates for callers who prefer functional style. The
sequential experiment recommendation engine composes the
methods into the default workflow; this section’s examples show the
same methods called directly.
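As a rough illustration of what an ε floor does to a Thompson-style allocation, the sketch below works on a plain array of contrast draws. The function name, its signature, and the mix-with-uniform way of enforcing the floor are all assumptions made for this example; posterior.thompson_allocation also takes segments and may implement the floor differently.

```python
import numpy as np


def thompson_allocation(contrast_samples: np.ndarray, epsilon: float) -> np.ndarray:
    """Traffic share per arm (control first), each at least epsilon.

    contrast_samples: (S, K - 1) draws of treatment-minus-control effects.
    """
    n_draws, n_treatments = contrast_samples.shape
    n_arms = n_treatments + 1
    if not 0.0 <= epsilon <= 1.0 / n_arms:
        raise ValueError("epsilon must lie in [0, 1 / K]")
    # Control has contrast 0 by definition; prepend it as arm 0.
    values = np.concatenate([np.zeros((n_draws, 1)), contrast_samples], axis=1)
    # Fraction of draws in which each arm has the largest value.
    wins = np.bincount(values.argmax(axis=1), minlength=n_arms)
    p_best = wins / n_draws
    # Mix with uniform so every arm keeps at least epsilon and shares sum to 1.
    return epsilon + (1.0 - n_arms * epsilon) * p_best


rng = np.random.default_rng(1)
samples = rng.normal([0.5, -0.5], 1.0, size=(4000, 2))
tight = thompson_allocation(samples, epsilon=0.01)
loose = thompson_allocation(samples, epsilon=0.05)
```

Both allocations come from the same draws; only the floor changes, so the weaker arms receive more traffic under the looser setting.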
The raw posterior carries the full Bayesian workflow primitives
(traces, per-draw arrays). PyMC’s
InferenceData integration
lives separately in DiagnosticsBundle (per contracts.py:553); the
HurdleBCFResult is the BCF-specific frozen-dataclass form of the
same information.
Observed data stashing¶
The posterior carries the observed data it was fit on as
posterior.observed, mirroring the sklearn convention of attaching
training inputs to a fitted estimator. Downstream analysis methods
(posterior.analyze, posterior.fit_policy_tree, etc.) reach into
posterior.observed for whichever input arrays they need. The caller
passes observed once at fit time and never again.
The default semantics are shallow with read-only views:
posterior = pt.fit_hurdle_bcf(observed)
# Each variant's visitors DataFrame is rebuilt over read-only column
# arrays — the backing numpy buffers are shared but write-protected.
col = posterior.observed.variants[0].visitors["revenue"].to_numpy(copy=False)
col.flags.writeable
# False
# Mutation attempt through the stashed clone raises:
posterior.observed.variants[0].visitors.iloc[0, 0] = 999
# ValueError: assignment destination is read-only
The observed wrapper on the posterior is a new object (a shallow dataclass
clone), but each DataFrame column is rebuilt over the original numpy
buffer with the write flag off. This is cheap — no data copy — and
catches mutation attempts on the posterior side loudly. It does NOT
prevent mutation through the original observed handle; if you mutate
observed.variants[0].visitors after fitting, the corresponding cells
in posterior.observed.variants[0].visitors reflect that (shared
buffer).
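The shared-buffer behavior described above is ordinary NumPy view semantics, and can be reproduced without pytyche. This sketch uses a bare array as a stand-in for one visitors column; how pytyche rebuilds the DataFrame around such views is not shown here.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])  # stands in for one visitors column

# A view shares the buffer; clearing the flag affects only this view object.
stashed = original.view()
stashed.flags.writeable = False

try:
    stashed[0] = 999.0
    write_blocked = False
except ValueError:
    # NumPy raises "assignment destination is read-only".
    write_blocked = True

# The original handle is still writable, and the view sees the change.
original[0] = -1.0
```

After the last line, stashed[0] reads -1.0 even though writes through stashed are rejected.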
For a defensive snapshot independent of the original handle, pass
observed_copy="deep":
posterior = pt.fit_hurdle_bcf(observed, observed_copy="deep")
observed.variants[0].visitors.iloc[0, 0] = 999
# posterior.observed.variants[0].visitors.iloc[0, 0] unchanged —
# independent buffer
Three modes are available:
- observed_copy="view" (default): shallow dataclass clone with read-only array views; cheap; protects against pytyche-side mutation
- observed_copy="deep": full deep copy; doubles memory for the input data; protects against caller-side mutation too
- observed_copy="ref": holds the caller's object directly, so posterior.observed is observed; no view wrappers, no protection; rarely worth using
JAX arrays pass through every mode unchanged — they are immutable by construction.
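The three modes can be compared side by side on a single array. The helper below, stash_array, is a hypothetical per-array analogue written for this page; the real observed_copy handling operates on the whole observed dataclass and its DataFrames.

```python
import copy

import numpy as np


def stash_array(arr: np.ndarray, mode: str = "view") -> np.ndarray:
    """Hypothetical per-array analogue of the three observed_copy modes."""
    if mode == "view":
        out = arr.view()
        out.flags.writeable = False  # shared buffer, read-only handle
        return out
    if mode == "deep":
        return copy.deepcopy(arr)    # independent buffer
    if mode == "ref":
        return arr                   # the very same object
    raise ValueError(f"unknown mode: {mode!r}")


source = np.arange(4.0)
as_view = stash_array(source, "view")
as_deep = stash_array(source, "deep")
as_ref = stash_array(source, "ref")

source[0] = 7.0  # caller-side mutation after stashing
```

Only the deep copy is isolated from the caller-side write; the view and the reference both observe it.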
AnalysisResult: derived analysis¶
The derived analysis output. Combines comparisons, the recommendation summary,
discovered segments, and per-visitor CATEs into a single
presentation-ready result. Returned by posterior.analyze() — the
method derives the analysis fields from the posterior and the
observed data it was fit on.
posterior = pt.fit_hurdle_bcf(observed)
analysis = posterior.analyze()
analysis.comparisons # list[Comparison] — per-pair effects
analysis.recommendation # RecommendationSummary — SHIP/STOP/CONTINUE
# 5-threshold compound rule
analysis.segments # list[DiscoveredSegment] — HTE segments
# with gate_estimate, gate_ci, stability_score
analysis.cate_per_visitor # np.ndarray — (n,) at K=2;
# (n, K-1) at K>=3
analysis.is_calibrated # bool — property delegating to
# posterior.is_calibrated
Each DiscoveredSegment in analysis.segments carries:
- id (int): leaf id within the parent policy tree
- rule (SegmentRule): predicate describing the segment membership
- gate_estimate (float): posterior mean of the segment-level GATE
- gate_ci (tuple[float, float]): 80% credible interval on the GATE
- population_share (float): fraction of visitors in this segment
- stability_score (float): bootstrap-replicability score in [0, 1]
- arm_best_probabilities (dict[str, float]): probability each treatment (or control) is best in this segment under the shared best-arm rule; keyed by ALL variant names including control, and sums to 1.0. Control wins a draw when no treatment contrast is positive.
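The best-arm rule in the last field can be written out directly. The function below is a sketch under the stated rule (control wins a draw when no treatment contrast is positive); its name, its signature, and the convention that control comes first in variant_names are assumptions for this example.

```python
import numpy as np


def arm_best_probabilities(contrast_samples, variant_names):
    """contrast_samples: (S, K - 1) segment-level treatment-minus-control draws.

    variant_names: K names, control first (assumed ordering).
    """
    samples = np.asarray(contrast_samples, dtype=float)
    n_draws, n_treatments = samples.shape
    if len(variant_names) != n_treatments + 1:
        raise ValueError("expected one name per arm, control first")
    best_treatment = samples.argmax(axis=1)  # index in 0..K-2
    any_positive = samples.max(axis=1) > 0.0
    # Winner index over all arms: 0 = control, j + 1 = treatment j.
    winner = np.where(any_positive, best_treatment + 1, 0)
    counts = np.bincount(winner, minlength=n_treatments + 1)
    return {name: float(c) / n_draws for name, c in zip(variant_names, counts)}


draws = [[0.2, -0.1], [-0.3, -0.2], [0.1, 0.4], [-0.1, 0.05]]
probs = arm_best_probabilities(draws, ["control", "low_promo", "free_ship"])
# probs == {"control": 0.25, "low_promo": 0.25, "free_ship": 0.5}
```

Every draw has exactly one winner, which is why the values sum to 1.0.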
AnalysisResult is the canonical analysis output. Most operator code
consumes it directly:
# Decide what to ship
for seg in analysis.segments:
if seg.stability_score >= 0.80:
print(f" {seg.rule}: lift {seg.gate_estimate:+.3f}")
# Apply the canonical 5-threshold decision rule
match analysis.recommendation.decision:
case Decision.SHIP:
print("Ready to roll out broadly")
case Decision.CONTINUE:
print("Keep running")
case Decision.STOP:
print("Pull the treatment")
RecommendationSummary carries the full decision-theoretic snapshot
for a treatment contrast. Its field set:
- decision (Decision): SHIP / CONTINUE / STOP from the 5-threshold compound rule
- expected_loss_baseline (float): expected loss of choosing baseline
- expected_loss_comparison (float): expected loss of choosing the treatment
- probability_positive (float): P(comparison > baseline)
- probability_better (float): P(comparison meaningfully better)
- probability_harmful (float): P(comparison meaningfully harmful)
- thresholds (dict[str, float]): decision thresholds applied
- expected_value_of_one_more_round (float): information-theoretic value of running one additional round at the same per-round n, in expected-loss-reduction units. Near-zero means the experiment has effectively converged and additional data is unlikely to change the decision.
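For readers who want to see how quantities like these are commonly computed from posterior draws, here is a sketch using the standard Bayesian A/B definitions. The formulas, the margin parameter, and the function name are assumptions for illustration; this page does not specify pytyche's exact definitions or its threshold values.

```python
import numpy as np


def decision_metrics(delta_samples, margin):
    """delta_samples: (S,) draws of comparison minus baseline.

    margin: hypothetical size of a "meaningful" effect, in the same units.
    """
    delta = np.asarray(delta_samples, dtype=float)
    return {
        # Loss from keeping baseline: upside forgone when the comparison is better.
        "expected_loss_baseline": float(np.maximum(delta, 0.0).mean()),
        # Loss from choosing the comparison: downside incurred when it is worse.
        "expected_loss_comparison": float(np.maximum(-delta, 0.0).mean()),
        "probability_positive": float((delta > 0.0).mean()),
        "probability_better": float((delta > margin).mean()),
        "probability_harmful": float((delta < -margin).mean()),
    }


metrics = decision_metrics([-1.0, 0.5, 1.5, 2.0], margin=1.0)
```

Under these definitions the two expected losses differ by exactly the posterior mean of the contrast, which is a quick sanity check on any implementation.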
AnalysisResult does not carry the raw posterior. The split matches
pytyche.contracts’s existing separation between the analysis output
and the raw inference data (AnalysisResult / DiagnosticsBundle).
Callers who need the posterior keep the handle they called analyze()
on; in a sequential experiment it is the sibling posterior field on
each Experiment.
Experiment: the per-round snapshot¶
Per-round snapshot of a sequential experiment. Composes the posterior and the analysis with round-specific context (cells shipped, per-cell observations, recommendation for next round, sim-only truth comparison).
exp = pt.sequential_experiment(
generator=generator,
schedule=pt.GeometricSchedule(initial=10_000, growth=2.0, n_rounds=3),
treatments=['control', 'low_promo', 'free_ship'],
)
r = next(exp) # round 1
# raw posterior — escape hatch
r.posterior # HurdleBCFResult
# derived analysis
r.analysis # AnalysisResult
r.analysis.segments
r.analysis.recommendation
# round context
r.round_idx # int
r.observed # ObservedExperimentData
r.cells_shipped # list[Cell]
r.cell_observations # list[CellObservation] — per-cell scoreboard
r.next_recommendation # NextRoundPlan — what to ship next round
r.truth_comparison # TruthComparison | None (sim-only)
r.posterior and r.analysis are sibling fields. Both are populated;
the posterior is the root and the analysis is its derivation, but the
Experiment exposes both at the same level so reaching the posterior
is one field access deep rather than nested.
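The iteration pattern and the sibling-field layout can be mimicked with a toy generator. Everything here is hypothetical: ToySnapshot and toy_sequential are not pytyche names, the placeholder strings stand in for real result objects, and the assumptions that round sizes follow initial * growth ** r and that rounds are numbered from 1 are guesses about GeometricSchedule rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for one round's Experiment."""
    round_idx: int
    n_visitors: int
    posterior: str  # placeholder for the raw posterior
    analysis: str   # placeholder for the derived analysis


def toy_sequential(initial: int, growth: float, n_rounds: int) -> Iterator[ToySnapshot]:
    """Yield one snapshot per round with geometrically growing sample sizes."""
    for r in range(n_rounds):
        n = int(round(initial * growth ** r))  # assumed geometric growth
        yield ToySnapshot(
            round_idx=r + 1,
            n_visitors=n,
            posterior=f"posterior@{r + 1}",
            analysis=f"analysis@{r + 1}",
        )


rounds = list(toy_sequential(initial=10_000, growth=2.0, n_rounds=3))
first = rounds[0]
```

Both first.posterior and first.analysis are reachable in one attribute access, which mirrors the flat layout described above.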
The three entry points side by side¶
entry point returns
────────────────────────────────── ──────────────────────────────────
pt.fit_hurdle_bcf(...) HurdleBCFResult only
(the raw posterior; you derive
the rest)
posterior.analyze(...) AnalysisResult — single round of
(after fit_hurdle_bcf) analysis without sequential
machinery. pt.analyze(posterior)
is the function-form wrapper.
pt.sequential_experiment(...) iterates → Experiment per round
(each round's Experiment
composes the analysis and the
posterior with round context)
Pick by how much machinery you want:
- Just the posterior, you’ll derive the rest yourself: use pt.fit_hurdle_bcf.
- One round’s analysis without sequential context (e.g., analyzing observed data from an experiment that already ran): fit, then call posterior.analyze(). The AnalysisResult is the same shape the sequential machinery uses internally.
- The sequential adaptive-experiment loop, with recommendation per round and graduation signals: use pt.sequential_experiment. Each round’s Experiment exposes the posterior as an escape hatch when you want to dig past the standard analysis surfaces.
Calibration and the result¶
Calibration is a frozen dataclass at pytyche.calibrate.Calibration
(re-exported as pt.Calibration). It wraps the layered R(p)/scale-family
correction with regime metadata — the metric name, treatment count K, and
pooling mode it was fitted on. The applies_to(observed) method returns
True only when the observed data’s regime (metric and K) matches the
artifact’s; posterior.apply_calibration(calibration) raises ValueError
naming the mismatched dimension(s) when the check fails.
apply_calibration is K = 2 only in v0.2. Calling it on a K ≥ 3
posterior raises NotImplementedError — per-contrast R(p) recalibration
requires per-contrast SBC machinery that ships with the sequential-surface
calibration work.
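The regime check and the two failure modes can be sketched with small frozen dataclasses. ToyRegime, ToyCalibration, the mismatches helper, and the order in which the two checks run are assumptions made for this example; the actual Calibration fields and error messages may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRegime:
    """Hypothetical summary of the observed data's regime."""
    metric: str
    n_treatments: int  # K, counting control


@dataclass(frozen=True)
class ToyCalibration:
    """Hypothetical stand-in for a calibration artifact with regime metadata."""
    metric: str
    n_treatments: int
    pooling: str

    def mismatches(self, regime: ToyRegime) -> list[str]:
        out = []
        if regime.metric != self.metric:
            out.append("metric")
        if regime.n_treatments != self.n_treatments:
            out.append("K")
        return out

    def applies_to(self, regime: ToyRegime) -> bool:
        return not self.mismatches(regime)


def apply_calibration(regime: ToyRegime, calibration: ToyCalibration) -> str:
    if regime.n_treatments >= 3:
        raise NotImplementedError("calibration is K = 2 only in this sketch")
    bad = calibration.mismatches(regime)
    if bad:
        raise ValueError(f"calibration regime mismatch on: {', '.join(bad)}")
    return "calibrated"
```

Naming the mismatched dimensions in the error message lets the caller see at a glance whether the artifact was fitted on a different metric, a different K, or both.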
When a Calibration artifact is passed to fit_hurdle_bcf (or consumed
by posterior.apply_calibration), the recalibration is applied to the
posterior before the AnalysisResult is constructed. The
analysis.is_calibrated property labels whether the result is calibrated.
Calibration.from_sweep(...) and Calibration.skip() registry
constructors arrive with the sequential-surface calibration work — any
mention of them in examples should be read as forward-looking.
Tree data model (pending)¶
Two trees, one library. Pytyche has two distinct tree concepts that share the word. The BART forest lives inside the BCF: many weak trees whose posterior pytyche samples via MCMC, producing the per-visitor CATE estimates. The policy tree lives on top of those CATEs: a single sklearn DecisionTreeClassifier fit on the CATE surface to discover the segments downstream allocation and graduation operate on. Users never inspect BART trees directly; what PolicyTreeResult.tree exposes is the policy tree. See the glossary entries for both terms.
The policy returned in cell recommendations carries the policy tree + per-leaf Thompson allocation. The concrete tree type and serialization format are being worked out in the analysis-surface design pass:
- direct sklearn DecisionTreeClassifier (current sketch in type stubs): simple but couples the public API to sklearn versions
- pytyche wrapper preserving sklearn ergonomics: version isolation, cleaner spec
- framework-agnostic rule-list export: needed for handoff to serving platforms
This page updates when the tree contract lands.
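To give a sense of what a framework-agnostic rule-list export could look like, the sketch below flattens a nested-dict tree into one rule per leaf. The node layout, the field names, and the example features are all invented for illustration, since the tree contract has not landed; nothing here reflects a pytyche format.

```python
def export_rules(node, path=()):
    """Flatten a nested-dict decision tree into one rule per leaf.

    Internal node: {"feature": str, "threshold": float, "left": ..., "right": ...}
    where left means feature <= threshold. Leaf: {"leaf_id": int, "arm": str}.
    """
    if "leaf_id" in node:
        return [{"leaf_id": node["leaf_id"], "arm": node["arm"], "conditions": list(path)}]
    feature, threshold = node["feature"], node["threshold"]
    left = export_rules(node["left"], path + ((feature, "<=", threshold),))
    right = export_rules(node["right"], path + ((feature, ">", threshold),))
    return left + right


# Hypothetical two-split tree over made-up visitor features.
tree = {
    "feature": "prior_orders",
    "threshold": 2.5,
    "left": {"leaf_id": 0, "arm": "control"},
    "right": {
        "feature": "cart_value",
        "threshold": 50.0,
        "left": {"leaf_id": 1, "arm": "low_promo"},
        "right": {"leaf_id": 2, "arm": "free_ship"},
    },
}
rules = export_rules(tree)
```

A list of plain dicts like this serializes to JSON without any sklearn dependency, which is the property a serving-platform handoff would need.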
Open questions¶
The following result-surface questions remain open:
Tree data model: see Tree data model (pending) above.