pytyche.bcf.config

Configuration dataclass, result types, and small utilities for the GPU BCF.

This module holds the user-facing configuration object (GPUBCFConfig), the three result dataclasses returned by the fit_* entry points, the formula-driven compute_num_trees_tau helper, and the leaf-index dtype selector used to size heap-layout tree arrays. Pure types and utilities — no JIT-compiled code, no GPU device handles, no module-level state. Importing this module is cheap and triggers no GPU work.

Import graph

gpu_bcf_config depends on JAX (for the leaf-index dtype helper), numpy (for result-array typing), and scipy.stats (for the inverse-normal quantile in compute_num_trees_tau). It does not import from any sibling gpu_bcf_* module. The orchestrator and downstream modules import FROM here, never the other way around.

Contents

  • _leaf_index_dtype – smallest unsigned int dtype for heap node indices at a given tree depth.

  • GPUBCFConfig – frozen dataclass of MCMC and prior hyperparameters for the GPU BCF.

  • compute_num_trees_tau – formula for the minimum tau-forest tree count at a target CI coverage.

  • ContinuousBCFResult – result container for fit_continuous_bcf.

  • BinaryBCFResult – result container for fit_binary_bcf.

  • HurdleBCFResult – result container for fit_hurdle_bcf.
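The idea behind _leaf_index_dtype can be sketched in a few lines. This is a minimal illustration, not the module's code: the name leaf_index_dtype_sketch is hypothetical, it uses NumPy dtypes (which JAX mirrors) rather than JAX ones, and it assumes a 1-based heap layout (root at index 1, children of node i at 2i and 2i + 1) with 2**max_depth slots, so the largest index is 2**max_depth - 1.

```python
import numpy as np


def leaf_index_dtype_sketch(max_depth: int) -> type:
    """Smallest unsigned integer dtype able to hold every heap node index.

    Hypothetical helper. Assumes a 1-based heap with 2**max_depth slots,
    so the largest index that must be representable is 2**max_depth - 1.
    """
    max_index = 2**max_depth - 1
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if max_index <= np.iinfo(dtype).max:
            return dtype
    raise ValueError(f"max_depth={max_depth} is too deep for a uint64 heap index")


# Under this convention the default max_depth=6 needs indices up to 63.
print(leaf_index_dtype_sketch(6).__name__)
```

Picking the narrowest dtype keeps per-tree index arrays small, which matters when hundreds of trees are stored side by side on the GPU.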

Functions

compute_num_trees_tau(n[, d_tau, sigma_tau, ...])

Formula-driven tau tree count for target CI coverage.

Classes

BinaryBCFResult(mu_samples, tau_samples, ...)

Result from binary (probit) BCF.

ContinuousBCFResult(mu_samples, tau_samples, ...)

Result from continuous BCF.

GPUBCFConfig([num_burnin, num_mcmc, ...])

Sampling configuration for GPU BCF via bartz.

HurdleBCFResult(rpv_cate_samples, p0_mean, ...)

Result from joint shared-tree hurdle BCF.

class pytyche.bcf.config.GPUBCFConfig(num_burnin=200, num_mcmc=200, num_trees_mu=200, num_trees_tau=50, max_depth=6, alpha_mu=0.95, beta_mu=2.0, alpha_tau=0.75, beta_tau=3.0, num_cuts=100, random_seed=42, num_chains=1, diagnostic_interval=50, thin_factor=1, num_gfr_sweeps=5, min_samples_leaf=5, gfr_backend='gpu', trace_path=None, var_tau_sev=0.5, kappa_sev=1.0, tau0_a_prior=1.0, tau0_b_prior=1.0, freeze_gamma=False, retain_channel_samples=True, focal_severity=False, per_leaf_gamma=False, retain_topology_history=False)[source]

Bases: object

Sampling configuration for GPU BCF via bartz.

num_burnin

Number of MCMC burn-in iterations (discarded).

num_mcmc

Number of MCMC samples to retain for posterior inference.

num_trees_mu

Number of trees in the prognostic (mu) forest.

num_trees_tau

Number of trees in the treatment effect (tau) forest.

max_depth

Maximum tree depth (controls p_nonterminal array length).

alpha_mu / beta_mu

Tree prior hyperparameters for mu forest.

alpha_tau / beta_tau

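Tree prior hyperparameters for tau forest. The defaults (0.75, 3.0) are tighter than the mu forest's (0.95, 2.0), so the tau forest is more strongly regularized toward shallow trees.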

num_cuts

Number of quantile-based split cutpoints per covariate.

random_seed

Seed for JAX PRNG.

num_chains

Number of parallel MCMC chains (vmapped). 1 = single-chain (legacy).

diagnostic_interval

Iterations per chunk for between-chunk diagnostics. Must divide both num_burnin and num_mcmc evenly.

thin_factor

Keep every thin_factor-th sample during MCMC (1 = keep all).

retain_topology_history

If True, retain per-iteration, per-tree topology hashes and move metadata on HurdleBCFResult.topology_history for mobility diagnostics. Default False (no retention; byte-identical to pre-feature behaviour).

Parameters:
  • num_burnin (int)

  • num_mcmc (int)

  • num_trees_mu (int)

  • num_trees_tau (int)

  • max_depth (int)

  • alpha_mu (float)

  • beta_mu (float)

  • alpha_tau (float)

  • beta_tau (float)

  • num_cuts (int)

  • random_seed (int)

  • num_chains (int)

  • diagnostic_interval (int)

  • thin_factor (int)

  • num_gfr_sweeps (int)

  • min_samples_leaf (int)

  • gfr_backend (str)

  • trace_path (str | None)

  • var_tau_sev (float)

  • kappa_sev (float)

  • tau0_a_prior (float)

  • tau0_b_prior (float)

  • freeze_gamma (bool)

  • retain_channel_samples (bool)

  • focal_severity (bool)

  • per_leaf_gamma (bool)

  • retain_topology_history (bool)
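GPUBCFConfig is a frozen dataclass, so a modified configuration is built as a copy rather than edited in place. The sketch below uses a hypothetical stand-in (ConfigStandIn) that mirrors five of the documented fields and their defaults. Enforcing the documented diagnostic_interval divisibility rule in __post_init__ is an assumption about where such a check lives, as is the num_mcmc // thin_factor count of retained draws per chain.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ConfigStandIn:
    """Hypothetical stand-in mirroring a few GPUBCFConfig fields and defaults."""

    num_burnin: int = 200
    num_mcmc: int = 200
    diagnostic_interval: int = 50
    thin_factor: int = 1
    num_chains: int = 1

    def __post_init__(self) -> None:
        # Documented constraint: diagnostic_interval must divide both phases evenly.
        for name in ("num_burnin", "num_mcmc"):
            value = getattr(self, name)
            if value % self.diagnostic_interval != 0:
                raise ValueError(
                    f"diagnostic_interval={self.diagnostic_interval} does not divide {name}={value}"
                )

    @property
    def retained_draws_per_chain(self) -> int:
        # Keep every thin_factor-th post-burn-in sample (assumed floor division).
        return self.num_mcmc // self.thin_factor


base = ConfigStandIn()
# Frozen dataclasses are "edited" by building a modified copy.
longer = dataclasses.replace(base, num_mcmc=1000, thin_factor=5)
print(longer.retained_draws_per_chain)  # 200
```

Because dataclasses.replace works on any dataclass, the same copy-with-changes pattern applies to a real GPUBCFConfig instance.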

pytyche.bcf.config.compute_num_trees_tau(n, d_tau=3.0, sigma_tau=0.5, coverage=0.9, floor=50, ceiling=400)[source]

Formula-driven tau tree count for target CI coverage.

T_min = ceil((d_tau * sigma_tau * sqrt(n) / (2 * z))^(2/3)), where z is the standard-normal quantile for the target coverage (computed with scipy.stats).

The tau forest’s piecewise-constant approximation carries a bias that does not shrink with n (it is O(1) in n), while the posterior concentrates at rate O(1/sqrt(n)). At large n the bias therefore dominates the posterior width. This formula computes the minimum T that keeps the bias below the CI half-width at the target coverage level.
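A minimal re-derivation of the formula, for checking tree counts by hand. Two details are assumptions, not confirmed by the docs: z is taken as the two-sided quantile Φ⁻¹((1 + coverage) / 2), and floor and ceiling are read as a clamp on the result. The function name is hypothetical, and it uses the standard library's NormalDist in place of scipy.stats.

```python
import math
from statistics import NormalDist


def num_trees_tau_sketch(n, d_tau=3.0, sigma_tau=0.5, coverage=0.9, floor=50, ceiling=400):
    """Hypothetical re-implementation of the documented T_min formula."""
    # Assumption: two-sided quantile, so the CI half-width is z posterior sds.
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    t_min = math.ceil((d_tau * sigma_tau * math.sqrt(n) / (2.0 * z)) ** (2.0 / 3.0))
    # Assumption: floor and ceiling clamp the formula's output.
    return max(floor, min(ceiling, t_min))


for n in (10_000, 10**8, 10**12):
    print(n, num_trees_tau_sketch(n))
```

T_min grows like n^(1/3), so moderate samples sit at the floor and only very large n reach the ceiling.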

Parameters:
  • n (int)

  • d_tau (float)

  • sigma_tau (float)

  • coverage (float)

  • floor (int)

  • ceiling (int)

Return type:

int

class pytyche.bcf.config.ContinuousBCFResult(mu_samples, tau_samples, sigma2_samples, y_bar, y_std, wall_clock_seconds, *, observed=None, is_calibrated=False, calibration=None)[source]

Bases: object

Result from continuous BCF.

mu_samples

(n, num_mcmc) prognostic predictions (standardized).

tau_samples

(n, num_mcmc) treatment effects (standardized).

sigma2_samples

(num_mcmc,) error variance (standardized).

y_bar

Mean of the outcome used for standardization.

y_std

Standard deviation of the outcome used for standardization.

wall_clock_seconds

Wall-clock time for the fit in seconds.

observed

The ObservedExperimentData the fit consumed, attached to the result so the analysis methods can reach the visitor rows and variant metadata. None when constructed by private raw-array helpers; populated by the public fit wrappers.

is_calibrated

True only after apply_calibration has been called on this result. Defaults to False.

calibration

The Calibration artifact attached by apply_calibration; None on fresh fits. The v0.2 artifact scope is interval corrections only: it is consumed where interval summaries are built and is never used to transform sample arrays.

Parameters:
  • mu_samples (ndarray)

  • tau_samples (ndarray)

  • sigma2_samples (ndarray)

  • y_bar (float)

  • y_std (float)

  • wall_clock_seconds (float)

  • observed (ObservedExperimentData | None)

  • is_calibrated (bool)

  • calibration (Calibration | None)
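Because mu_samples, tau_samples and sigma2_samples are on the standardized scale, reporting in outcome units requires undoing the standardization. A minimal sketch, assuming the fit standardized the outcome as (y - y_bar) / y_std; the helper name to_outcome_scale is hypothetical.

```python
import numpy as np


def to_outcome_scale(mu_samples, tau_samples, sigma2_samples, y_bar, y_std):
    """Map standardized BCF draws back to the outcome's original units.

    Assumes the fit standardized the outcome as (y - y_bar) / y_std.
    """
    mu = mu_samples * y_std + y_bar      # levels: undo both the shift and the scale
    tau = tau_samples * y_std            # contrasts: the mean cancels, only the scale remains
    sigma2 = sigma2_samples * y_std**2   # variances: scale by the squared sd
    return mu, tau, sigma2


rng = np.random.default_rng(0)
n, num_mcmc = 4, 3
mu, tau, sigma2 = to_outcome_scale(
    rng.normal(size=(n, num_mcmc)),
    rng.normal(size=(n, num_mcmc)),
    rng.uniform(0.5, 1.5, size=num_mcmc),
    y_bar=12.0,
    y_std=3.0,
)
print(tau.mean(axis=1))  # posterior-mean per-visitor CATE in outcome units
```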

thompson_allocation(segments, epsilon=0.02)[source]

Per-segment traffic split: each arm’s weight is the posterior probability that it is the segment’s best arm.

Thompson sampling at segment granularity: per segment, each posterior draw votes for its best arm (the largest member-mean contrast, or control when none is positive); an arm’s weight is its win frequency over draws.

Parameters:
  • segments (Sequence[DiscoveredSegment]) – Segments to allocate over (only id and rule are consumed); membership is resolved against self.observed.

  • epsilon (float) – Safety-net exploration floor — arms below epsilon / K are raised to the floor and the rest rescaled, so no arm’s traffic is starved to zero; inert when every arm is already above it. NOT the dial for how much traffic stays on control — that is min_control_weight / min_explore_weight on pt.sequential_experiment; rarely worth overriding.

Return type:

dict[int, dict[str, float]]

Returns:

{segment.id: {variant_name: weight}} — inner dicts in variant order (control first), each summing to 1.

Raises:

ValueError – When self.observed is None.
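The voting rule and the epsilon floor described above can be sketched for a single segment as follows. This is an illustration under stated assumptions, not the method's code: the function name is hypothetical, membership is assumed already resolved to an array of member contrast draws, and the floor is applied in one pass (raise low arms to epsilon / K, rescale the rest so the weights still sum to 1).

```python
import numpy as np


def thompson_weights_sketch(member_contrasts, epsilon=0.02):
    """Win-frequency arm weights for one segment (hypothetical helper).

    member_contrasts: (n_members, S, K - 1) treatment-minus-control draws for
    the segment's visitors. Returns K weights, control first.
    """
    per_draw = member_contrasts.mean(axis=0)        # (S, K - 1) member-mean contrasts
    num_arms = per_draw.shape[1] + 1
    # Each draw votes for its largest contrast, or for control (0) if none is positive.
    best = np.where(per_draw.max(axis=1) > 0, per_draw.argmax(axis=1) + 1, 0)
    weights = np.bincount(best, minlength=num_arms) / per_draw.shape[0]

    floor = epsilon / num_arms
    low = weights < floor
    if not low.any():
        return weights                               # floor is inert
    floored = np.where(low, floor, 0.0)
    # Rescale the above-floor arms to absorb the mass handed to the floored ones.
    floored[~low] = weights[~low] / weights[~low].sum() * (1.0 - floor * low.sum())
    return floored


rng = np.random.default_rng(1)
draws = rng.normal(loc=[0.0, 0.3], scale=1.0, size=(50, 400, 2))  # 50 members, 400 draws, K = 3
print(thompson_weights_sketch(draws))
```

At least one arm always holds weight of at least 1 / K, which exceeds epsilon / K, so the rescaling step never divides by zero.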

fit_policy_tree(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

Discover interpretable segments from the posterior’s per-visitor treatment effects, by fitting a shallow decision tree.

Each visitor is labeled with the arm the posterior expects to be best for them (largest posterior-mean lift, or control when no lift is positive); a multiclass decision tree is fit on the visitors’ features, and each leaf becomes a DiscoveredSegment carrying an exact membership rule, gate estimate/CI, per-arm best probabilities, Thompson allocation, and bootstrap-replicability stability.

Parameters:
  • max_depth (int) – Maximum tree depth.

  • min_segment_share (float) – Minimum fraction of visitors per leaf (sklearn min_weight_fraction_leaf).

  • n_bootstrap (int) – Bootstrap tree refits behind stability_score; 0 skips stability (NaN sentinel plus UserWarning).

  • bootstrap_seed (int) – Seed for the bootstrap resampling RNG.

Return type:

PolicyTreeResult

Returns:

PolicyTreeResult with one segment per leaf, ordered by sklearn leaf id; result.observed is self.observed by identity.

Raises:

ValueError – When self.observed is None.
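The labeling step that feeds the decision tree can be sketched as below. Only the labels are shown; the tree fit itself is omitted. The helper name is hypothetical, and the (n, S, K - 1) input layout is an assumption (a K = 2 contrast array of shape (n, S) would need a trailing axis, e.g. contrasts[..., None]).

```python
import numpy as np


def best_arm_labels_sketch(contrast_samples):
    """Per-visitor best-arm labels for the policy tree (0 = control).

    contrast_samples: (n, S, K - 1) per-visitor treatment-minus-control draws.
    """
    lift = contrast_samples.mean(axis=1)  # (n, K - 1) posterior-mean lift per arm
    # Largest posterior-mean lift wins; control when no lift is positive.
    return np.where(lift.max(axis=1) > 0, lift.argmax(axis=1) + 1, 0)


contrasts = np.array([
    [[-1.0, -1.0], [-1.0, -1.0]],  # no positive lift: control
    [[2.0, 1.0], [2.0, 1.0]],      # first treatment is best
    [[0.5, 3.0], [0.5, 3.0]],      # second treatment is best
])
print(best_arm_labels_sketch(contrasts))
```

Labeling by posterior means (rather than per draw, as in thompson_allocation) gives the tree a single deterministic target per visitor.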

apply_calibration(calibration)[source]

Return a new posterior with calibration attached.

Attach, don’t transform: the artifact is stashed on the returned copy (is_calibrated=True); every sample array is shared with this posterior by identity. The correction currently applies to intervals only — probabilities and expected losses stay raw; corrected CIs appear where interval summaries are built. K = 2 experiments only (per-contrast recalibration for K >= 3 is not yet implemented).

Parameters:

calibration (Calibration) – SBC-fitted Calibration whose regime (metric, n_treatments) must match self.observed.

Return type:

ContinuousBCFResult

Returns:

New ContinuousBCFResult carrying the artifact; the original is untouched.

Raises:
  • ValueError – When self.observed is None, or on a regime mismatch (message names the mismatched dimensions).

  • NotImplementedError – At K >= 3.
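The "attach, don't transform" contract amounts to a shallow copy that changes only the flag and the artifact. A minimal sketch with a hypothetical stand-in dataclass; the real method additionally validates the regime and rejects K >= 3, which is omitted here.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class PosteriorStandIn:
    """Hypothetical stand-in carrying a result's calibration-related fields."""

    tau_samples: np.ndarray
    is_calibrated: bool = False
    calibration: object = None


def attach_calibration_sketch(posterior, calibration):
    # Shallow copy: only the flag and artifact change; arrays are shared, not copied.
    return dataclasses.replace(posterior, is_calibrated=True, calibration=calibration)


raw = PosteriorStandIn(tau_samples=np.zeros((5, 10)))
calibrated = attach_calibration_sketch(raw, calibration="sbc-artifact")
print(calibrated.tau_samples is raw.tau_samples, raw.is_calibrated)  # True False
```

Sharing the arrays by identity keeps calibration cheap even for large posteriors, and leaves the original result untouched.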

recommendation_summary(treatment, segment=None, *, thresholds=None, min_practical_effect=0.02)[source]

Act-now SHIP / CONTINUE / STOP recommendation for one treatment.

The treatment’s metric-native contrast draws are scoped (segment=None is the global all-visitors snapshot; a segment restricts to its rule’s members), reduced to per-draw mean lift, and summarized under the legacy compare.variants decision rule. v0.2 raw scope: probabilities and expected losses come from the raw draws even on a calibrated posterior — interval corrections land where intervals are built.

Parameters:
  • treatment (str) – Treatment variant name (vs control).

  • segment (DiscoveredSegment | None) – None for the global snapshot; a DiscoveredSegment restricts the computation to its members.

  • thresholds (DecisionThresholds | None) – Decision thresholds; DecisionThresholds() defaults when None.

  • min_practical_effect (float) – Minimum meaningful lift for probability_better / probability_harmful.

Return type:

RecommendationSummary

Returns:

RecommendationSummary with the decision, its evidence, and expected_value_of_one_more_round always populated (closed-form preposterior EVSI; formula in docs/concepts/decision-theoretic-inputs.md).

Raises:

ValueError – When self.observed is None, when treatment is not a treatment name, or when the segment’s rule matches zero visitors.
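The scoping and reduction steps can be sketched as follows: restrict to the scope's members, average to a per-draw mean lift, then read off tail probabilities. The helper name is hypothetical, and defining probability_harmful as the mass below -min_practical_effect is an assumption; the full decision rule, expected losses and EVSI are not reproduced.

```python
import numpy as np


def lift_probabilities_sketch(contrast_samples, members, min_practical_effect=0.02):
    """Per-draw mean lift over a scope, then tail probabilities (hypothetical).

    contrast_samples: (n, S) draws for one treatment vs control.
    members: boolean (n,) mask for the scope (all True = global snapshot).
    """
    if not members.any():
        raise ValueError("segment rule matches zero visitors")
    lift = contrast_samples[members].mean(axis=0)  # (S,) per-draw mean lift
    probability_better = float(np.mean(lift > min_practical_effect))
    probability_harmful = float(np.mean(lift < -min_practical_effect))  # assumed symmetric
    return probability_better, probability_harmful


draws = np.tile([0.1, 0.1, -0.1, 0.0], (3, 1))  # 3 visitors, 4 draws
print(lift_probabilities_sketch(draws, np.ones(3, dtype=bool)))
```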

analyze(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

The canonical one-call analysis summary for this posterior.

Composes per-treatment Comparison summaries, the embedded policy-tree segmentation (keyword arguments forward to it), the global RecommendationSummary for the best challenger, and the posterior-mean per-visitor CATEs. Anything needing posterior samples goes through analysis.posterior.

Parameters:
  • max_depth (int) – Embedded policy tree depth.

  • min_segment_share (float) – Minimum per-leaf population share.

  • n_bootstrap (int) – Stability bootstrap count (0 skips stability with a UserWarning).

  • bootstrap_seed (int) – Stability bootstrap seed.

Return type:

AnalysisResult

Returns:

AnalysisResult; analysis.is_calibrated reads through to this posterior’s flag.

Raises:

ValueError – When self.observed is None.

evaluate_against_truth(tree, truth)[source]

Sim-mode evaluation of tree’s policy against ground truth.

Parameters:
  • tree (PolicyTreeResult) – The fitted policy whose assignments are evaluated.

  • truth (CalibrationTruth | None) – Ground truth from the simulation path; None in real-data mode (raises — nothing to evaluate against).

Return type:

TruthComparison

Returns:

TruthComparison (cate_rmse, policy_accuracy, and the realized-RPV trio with the oracle gap).

Raises:
  • RuntimeError – When truth is None (real-data mode).

  • ValueError – When self.observed is None or the truth lacks the K-appropriate contrast / potential-outcome fields.
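Two of the TruthComparison metrics have natural readings that can be sketched directly: RMSE between posterior-mean and true CATEs, and the share of visitors whose assigned arm matches the oracle's best arm. These definitions are assumptions about what cate_rmse and policy_accuracy measure, and the realized-RPV trio is not covered; the helper name is hypothetical.

```python
import numpy as np


def truth_metrics_sketch(cate_hat, cate_true, assigned_arm, oracle_arm):
    """Hypothetical cate_rmse and policy_accuracy against simulated truth."""
    cate_rmse = float(np.sqrt(np.mean((cate_hat - cate_true) ** 2)))
    policy_accuracy = float(np.mean(assigned_arm == oracle_arm))
    return cate_rmse, policy_accuracy


rmse, accuracy = truth_metrics_sketch(
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([1.0, 2.0, 3.0, 6.0]),
    np.array([0, 1, 1, 0]),
    np.array([0, 1, 0, 0]),
)
print(rmse, accuracy)
```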

has_credible_segments(threshold=0.8)[source]

Whether some discovered segment clears threshold stability.

Runs fit_policy_tree at its defaults (deterministic given the default bootstrap_seed) and checks for a segment with stability_score >= threshold. The 0.80 default matches the default graduation rule’s SHIP-gate stability threshold.

Parameters:

threshold (float) – Minimum bootstrap-replicability stability score.

Return type:

bool

Returns:

True iff at least one discovered segment clears it.

has_decomposition()[source]

Whether this posterior carries the conversion/severity split.

Return type:

bool

Returns:

False — only hurdle posteriors carry the conversion/severity decomposition.

class pytyche.bcf.config.BinaryBCFResult(mu_samples, tau_samples, wall_clock_seconds, *, observed=None, is_calibrated=False, calibration=None)[source]

Bases: object

Result from binary (probit) BCF.

mu_samples

(n, num_mcmc) prognostic predictions (probit scale).

tau_samples

(n, num_mcmc) treatment effects (probit scale).

wall_clock_seconds

Wall-clock time for the fit in seconds.

observed

The ObservedExperimentData the fit consumed, attached to the result so the analysis methods can reach the visitor rows and variant metadata. None when constructed by private raw-array helpers; populated by the public fit wrappers.

is_calibrated

True only after apply_calibration has been called on this result. Defaults to False.

calibration

The Calibration artifact attached by apply_calibration; None on fresh fits. The v0.2 artifact scope is interval corrections only: it is consumed where interval summaries are built and is never used to transform sample arrays.

Parameters:
  • mu_samples (ndarray)

  • tau_samples (ndarray)

  • wall_clock_seconds (float)

  • observed (ObservedExperimentData | None)

  • is_calibrated (bool)

  • calibration (Calibration | None)
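Since mu_samples and tau_samples live on the probit scale, a conversion-rate lift needs the normal CDF. A minimal sketch, assuming control has latent mean mu and treatment mu + tau; the actual arm coding used by the fit is not stated here, and the helper name is hypothetical.

```python
from statistics import NormalDist

import numpy as np

# Vectorized standard-normal CDF (scipy.special.ndtr is a faster drop-in).
_phi = np.vectorize(NormalDist().cdf)


def probit_lift_sketch(mu_samples, tau_samples):
    """Per-visitor, per-draw conversion-rate lift on the probability scale.

    Assumes control has latent mean mu and treatment mu + tau (probit link).
    """
    return _phi(mu_samples + tau_samples) - _phi(mu_samples)


lift = probit_lift_sketch(np.zeros((2, 3)), np.full((2, 3), 0.25))
print(lift.mean(axis=1))  # posterior-mean lift per visitor
```

The same probit-scale tau maps to different probability lifts depending on mu, which is why the transform is applied per draw before averaging.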

thompson_allocation(segments, epsilon=0.02)[source]

Per-segment traffic split: each arm’s weight is the posterior probability that it is the segment’s best arm.

Thompson sampling at segment granularity: per segment, each posterior draw votes for its best arm (the largest member-mean contrast, or control when none is positive); an arm’s weight is its win frequency over draws.

Parameters:
  • segments (Sequence[DiscoveredSegment]) – Segments to allocate over (only id and rule are consumed); membership is resolved against self.observed.

  • epsilon (float) – Safety-net exploration floor — arms below epsilon / K are raised to the floor and the rest rescaled, so no arm’s traffic is starved to zero; inert when every arm is already above it. NOT the dial for how much traffic stays on control — that is min_control_weight / min_explore_weight on pt.sequential_experiment; rarely worth overriding.

Return type:

dict[int, dict[str, float]]

Returns:

{segment.id: {variant_name: weight}} — inner dicts in variant order (control first), each summing to 1.

Raises:

ValueError – When self.observed is None.

fit_policy_tree(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

Discover interpretable segments from the posterior’s per-visitor treatment effects, by fitting a shallow decision tree.

Each visitor is labeled with the arm the posterior expects to be best for them (largest posterior-mean lift, or control when no lift is positive); a multiclass decision tree is fit on the visitors’ features, and each leaf becomes a DiscoveredSegment carrying an exact membership rule, gate estimate/CI, per-arm best probabilities, Thompson allocation, and bootstrap-replicability stability.

Parameters:
  • max_depth (int) – Maximum tree depth.

  • min_segment_share (float) – Minimum fraction of visitors per leaf (sklearn min_weight_fraction_leaf).

  • n_bootstrap (int) – Bootstrap tree refits behind stability_score; 0 skips stability (NaN sentinel plus UserWarning).

  • bootstrap_seed (int) – Seed for the bootstrap resampling RNG.

Return type:

PolicyTreeResult

Returns:

PolicyTreeResult with one segment per leaf, ordered by sklearn leaf id; result.observed is self.observed by identity.

Raises:

ValueError – When self.observed is None.

apply_calibration(calibration)[source]

Return a new posterior with calibration attached.

Attach, don’t transform: the artifact is stashed on the returned copy (is_calibrated=True); every sample array is shared with this posterior by identity. The correction currently applies to intervals only — probabilities and expected losses stay raw; corrected CIs appear where interval summaries are built. K = 2 experiments only (per-contrast recalibration for K >= 3 is not yet implemented).

Parameters:

calibration (Calibration) – SBC-fitted Calibration whose regime (metric, n_treatments) must match self.observed.

Return type:

BinaryBCFResult

Returns:

New BinaryBCFResult carrying the artifact; the original is untouched.

Raises:
  • ValueError – When self.observed is None, or on a regime mismatch (message names the mismatched dimensions).

  • NotImplementedError – At K >= 3.

recommendation_summary(treatment, segment=None, *, thresholds=None, min_practical_effect=0.02)[source]

Act-now SHIP / CONTINUE / STOP recommendation for one treatment.

The treatment’s metric-native contrast draws are scoped (segment=None is the global all-visitors snapshot; a segment restricts to its rule’s members), reduced to per-draw mean lift, and summarized under the legacy compare.variants decision rule. v0.2 raw scope: probabilities and expected losses come from the raw draws even on a calibrated posterior — interval corrections land where intervals are built.

Parameters:
  • treatment (str) – Treatment variant name (vs control).

  • segment (DiscoveredSegment | None) – None for the global snapshot; a DiscoveredSegment restricts the computation to its members.

  • thresholds (DecisionThresholds | None) – Decision thresholds; DecisionThresholds() defaults when None.

  • min_practical_effect (float) – Minimum meaningful lift for probability_better / probability_harmful.

Return type:

RecommendationSummary

Returns:

RecommendationSummary with the decision, its evidence, and expected_value_of_one_more_round always populated (closed-form preposterior EVSI; formula in docs/concepts/decision-theoretic-inputs.md).

Raises:

ValueError – When self.observed is None, when treatment is not a treatment name, or when the segment’s rule matches zero visitors.

analyze(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

The canonical one-call analysis summary for this posterior.

Composes per-treatment Comparison summaries, the embedded policy-tree segmentation (keyword arguments forward to it), the global RecommendationSummary for the best challenger, and the posterior-mean per-visitor CATEs. Anything needing posterior samples goes through analysis.posterior.

Parameters:
  • max_depth (int) – Embedded policy tree depth.

  • min_segment_share (float) – Minimum per-leaf population share.

  • n_bootstrap (int) – Stability bootstrap count (0 skips stability with a UserWarning).

  • bootstrap_seed (int) – Stability bootstrap seed.

Return type:

AnalysisResult

Returns:

AnalysisResult; analysis.is_calibrated reads through to this posterior’s flag.

Raises:

ValueError – When self.observed is None.

evaluate_against_truth(tree, truth)[source]

Sim-mode evaluation of tree’s policy against ground truth.

Parameters:
  • tree (PolicyTreeResult) – The fitted policy whose assignments are evaluated.

  • truth (CalibrationTruth | None) – Ground truth from the simulation path; None in real-data mode (raises — nothing to evaluate against).

Return type:

TruthComparison

Returns:

TruthComparison (cate_rmse, policy_accuracy, and the realized-RPV trio with the oracle gap).

Raises:
  • RuntimeError – When truth is None (real-data mode).

  • ValueError – When self.observed is None or the truth lacks the K-appropriate contrast / potential-outcome fields.

has_credible_segments(threshold=0.8)[source]

Whether some discovered segment clears threshold stability.

Runs fit_policy_tree at its defaults (deterministic given the default bootstrap_seed) and checks for a segment with stability_score >= threshold. The 0.80 default matches the default graduation rule’s SHIP-gate stability threshold.

Parameters:

threshold (float) – Minimum bootstrap-replicability stability score.

Return type:

bool

Returns:

True iff at least one discovered segment clears it.

has_decomposition()[source]

Whether this posterior carries the conversion/severity split.

Return type:

bool

Returns:

False — only hurdle posteriors carry the conversion/severity decomposition.

class pytyche.bcf.config.HurdleBCFResult(rpv_cate_samples, p0_mean, p1_mean, sev0_mean, sev1_mean, tau0_samples, tau_hat_quantiles, wall_clock_seconds, num_chains=1, num_gfr_sweeps=0, diagnostics=None, phase_timing=None, p0_samples=None, p1_samples=None, sev0_samples=None, sev1_samples=None, p_samples=None, sev_samples=None, topology_history=None, *, observed=None, is_calibrated=False, calibration=None, pooling)[source]

Bases: object

Result from joint shared-tree hurdle BCF.

Each tree simultaneously estimates conversion (probit) and severity (log-revenue) parameters via shared tree structure. This couples the two channels so splits are jointly informative.

RPV CATEs are composed on-GPU (float32) and transferred to CPU for policy tree fitting. Channel-level per-draw arrays (p0, p1, sev0, sev1) are retained by default (retain_channel_samples=True) — the conversion/severity decomposition is the headline output of the hurdle approach and needs the per-draw channel arrays for its credible intervals. Set retain_channel_samples=False to skip the GPU→CPU transfer when memory matters more than the decomposition (e.g. large-n sweep contexts that only consume the composed RPV contrasts).

When num_chains > 1, samples are concatenated across chains: S_total = (num_mcmc / thin_factor) * num_chains.

Arm-count dispatch (K = int(Z.max()) + 1):

  • K = 2 (binary arm): the legacy paired fields are populated, namely p0_mean/p1_mean/sev0_mean/sev1_mean (n,) and, when retain_channel_samples=True, p0_samples/p1_samples/sev0_samples/sev1_samples (n, S_total). rpv_cate_samples is (n, S_total), and the per-arm fields p_samples/sev_samples are None.

  • K >= 3 (multi-arm): the per-arm fields are populated instead, namely p_samples/sev_samples (n, S_total, K) (when retained) and rpv_cate_samples (n, S_total, K - 1) (the jointly sampled contrast posterior). The legacy paired fields are None.

The two field families are never populated together. tau0_samples (S_total,) and the sigma2_samples = 1 / tau0_samples property are scalar per draw at every K: each visitor sees one outcome, so the severity residual is scalar per visitor and there is no per-arm severity precision.
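The S_total formula and the K-dependent field families can be summarized as a table of expected shapes. This sketch is a reading aid, not library code: the helper name is hypothetical, and it assumes num_mcmc is divisible by thin_factor (it uses floor division where the docs write num_mcmc / thin_factor).

```python
def hurdle_shapes_sketch(n, num_mcmc, thin_factor, num_chains, num_arms, retained=True):
    """Expected HurdleBCFResult array shapes; None marks an unpopulated field."""
    s_total = (num_mcmc // thin_factor) * num_chains  # draws concatenated across chains
    paired = num_arms == 2
    shapes = {
        "rpv_cate_samples": (n, s_total) if paired else (n, s_total, num_arms - 1),
        "tau0_samples": (s_total,),  # scalar per draw at every K
    }
    for name in ("p0_mean", "p1_mean", "sev0_mean", "sev1_mean"):
        shapes[name] = (n,) if paired else None
    # Channel draws exist only when retain_channel_samples=True.
    for name in ("p0_samples", "p1_samples", "sev0_samples", "sev1_samples"):
        shapes[name] = (n, s_total) if paired and retained else None
    for name in ("p_samples", "sev_samples"):
        shapes[name] = (n, s_total, num_arms) if not paired and retained else None
    return shapes


print(hurdle_shapes_sketch(n=1000, num_mcmc=200, thin_factor=2, num_chains=4, num_arms=3))
```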

The topology_history field is populated only when the producing fit set GPUBCFConfig.retain_topology_history=True. When the flag is off (the default), the field is None, and the fit's PRNG state is bitwise-identical, and its wall-clock behaviour unchanged, relative to the code before topology retention was introduced.

rpv_cate_samples

(n, S_total) float32 at K = 2, or (n, S_total, K - 1) at K >= 3; composed on GPU, transferred to CPU.

p0_mean

(n,) float32 — E[Φ(μ_b + b₀·τ_b)]; None at K>=3.

p1_mean

(n,) float32 — E[Φ(μ_b + b₁·τ_b)]; None at K>=3.

sev0_mean

(n,) float32 — E[exp(μ_c + b₀·τ_c + σ²/2)]; None at K>=3.

sev1_mean

(n,) float32 — E[exp(μ_c + b₁·τ_c + σ²/2)]; None at K>=3.

tau0_samples

(S_total,) float32: global severity-residual precision, scalar at every K (sigma2_samples is its reciprocal).

tau_hat_quantiles

(S_total, 5) [q05,q25,q50,q75,q95] or None.

wall_clock_seconds

Wall-clock time for the fit in seconds.

num_chains

Number of parallel MCMC chains used.

num_gfr_sweeps

Number of GFR warm-start sweeps performed.

diagnostics

Dict of diagnostic values (rhat_tau0, per_chain_ess, etc.), or None.

phase_timing

Dict of per-phase wall-clock breakdown, or None.

p0_samples

jax.Array (n, S_total): P(convert|control) per draw; None if not retained or at K >= 3.

p1_samples

jax.Array (n, S_total): P(convert|treated) per draw; None if not retained or at K >= 3.

sev0_samples

jax.Array (n, S_total): E[sev|control,convert] per draw; None if not retained or at K >= 3.

sev1_samples

jax.Array (n, S_total): E[sev|treated,convert] per draw; None if not retained or at K >= 3.

p_samples

jax.Array (n, S_total, K): per-arm P(convert) per draw; None at K = 2 or if not retained.

sev_samples

jax.Array (n, S_total, K): per-arm E[sev|convert] per draw; None at K = 2 or if not retained.

topology_history

Topology retention trace; populated only when the producing fit set GPUBCFConfig.retain_topology_history=True. None otherwise.

observed

The ObservedExperimentData the fit consumed, attached to the result so the analysis methods can reach the visitor rows and variant metadata. None when constructed by private raw-array helpers; populated by the public fit wrappers.

is_calibrated

True only after apply_calibration has been called on this result. Defaults to False.

calibration

The Calibration artifact attached by apply_calibration; None on fresh fits. The v0.2 artifact scope is interval corrections only: it is consumed where interval summaries are built and is never used to transform sample arrays.

pooling

Provenance of the fit: "joint" = shared-tree canonical fit; "independent" = two-stage baseline (binary + continuous fitted separately). Required — caller must always populate.
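The channel formulas in the p0_mean through sev1_mean entries, combined with the standard hurdle identity RPV = P(convert) × E[sev | convert], suggest how per-draw channel values and the RPV contrast fit together at K = 2. This sketch rests on several assumptions: the RPV composition formula is inferred rather than quoted, the arm codings b0 = -0.5 and b1 = 0.5 are hypothetical defaults, inputs are float64 rather than the fit's float32, and the helper name is illustrative.

```python
from statistics import NormalDist

import numpy as np

_phi = np.vectorize(NormalDist().cdf)  # standard-normal CDF, elementwise


def compose_rpv_cate_sketch(mu_b, tau_b, mu_c, tau_c, tau0, b0=-0.5, b1=0.5):
    """Compose per-draw channel values and the RPV contrast for K = 2.

    mu_b, tau_b: (n, S) conversion-channel (probit) forest predictions.
    mu_c, tau_c: (n, S) severity-channel (log-revenue) forest predictions.
    tau0: (S,) severity-residual precision; sigma^2 = 1 / tau0.
    b0, b1: arm codings (hypothetical defaults).
    """
    sigma2 = 1.0 / tau0                               # the sigma2_samples view
    p0 = _phi(mu_b + b0 * tau_b)                      # P(convert | control)
    p1 = _phi(mu_b + b1 * tau_b)                      # P(convert | treated)
    sev0 = np.exp(mu_c + b0 * tau_c + sigma2 / 2.0)   # lognormal mean, control
    sev1 = np.exp(mu_c + b1 * tau_c + sigma2 / 2.0)   # lognormal mean, treated
    rpv_cate = p1 * sev1 - p0 * sev0                  # RPV = P(convert) * E[sev | convert]
    return p0, p1, sev0, sev1, rpv_cate


n, S = 3, 4
rng = np.random.default_rng(0)
p0, p1, sev0, sev1, rpv_cate = compose_rpv_cate_sketch(
    rng.normal(size=(n, S)), rng.normal(size=(n, S)),
    rng.normal(size=(n, S)), rng.normal(size=(n, S)),
    tau0=rng.gamma(2.0, 1.0, size=S),
)
p0_mean = p0.mean(axis=1)  # analogue of HurdleBCFResult.p0_mean, shape (n,)
```

The (S,) precision broadcasts across visitors, matching the documented point that the severity residual is scalar per draw.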

Parameters:
  • rpv_cate_samples (ndarray)

  • p0_mean (ndarray | None)

  • p1_mean (ndarray | None)

  • sev0_mean (ndarray | None)

  • sev1_mean (ndarray | None)

  • tau0_samples (ndarray)

  • tau_hat_quantiles (ndarray | None)

  • wall_clock_seconds (float)

  • num_chains (int)

  • num_gfr_sweeps (int)

  • diagnostics (dict | None)

  • phase_timing (dict | None)

  • p0_samples (Any | None)

  • p1_samples (Any | None)

  • sev0_samples (Any | None)

  • sev1_samples (Any | None)

  • p_samples (Any | None)

  • sev_samples (Any | None)

  • topology_history (TopologyHistory | None)

  • observed (ObservedExperimentData | None)

  • is_calibrated (bool)

  • calibration (Calibration | None)

  • pooling (Literal['joint', 'independent'])

property sigma2_samples: ndarray

Return 1 / tau0_samples as a sigma² view.

Backward-compat shim for downstream code that consumes the variance parameterisation rather than the precision one.

thompson_allocation(segments, epsilon=0.02)[source]

Per-segment traffic split: each arm’s weight is the posterior probability that it is the segment’s best arm.

Thompson sampling at segment granularity: per segment, each posterior draw votes for its best arm (the largest member-mean contrast, or control when none is positive); an arm’s weight is its win frequency over draws.

Parameters:
  • segments (Sequence[DiscoveredSegment]) – Segments to allocate over (only id and rule are consumed); membership is resolved against self.observed.

  • epsilon (float) – Safety-net exploration floor — arms below epsilon / K are raised to the floor and the rest rescaled, so no arm’s traffic is starved to zero; inert when every arm is already above it. NOT the dial for how much traffic stays on control — that is min_control_weight / min_explore_weight on pt.sequential_experiment; rarely worth overriding.

Return type:

dict[int, dict[str, float]]

Returns:

{segment.id: {variant_name: weight}} — inner dicts in variant order (control first), each summing to 1.

Raises:

ValueError – When self.observed is None.

fit_policy_tree(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

Discover interpretable segments from the posterior’s per-visitor treatment effects, by fitting a shallow decision tree.

Each visitor is labeled with the arm the posterior expects to be best for them (largest posterior-mean lift, or control when no lift is positive); a multiclass decision tree is fit on the visitors’ features, and each leaf becomes a DiscoveredSegment carrying an exact membership rule, gate estimate/CI, per-arm best probabilities, Thompson allocation, and bootstrap-replicability stability.

Parameters:
  • max_depth (int) – Maximum tree depth.

  • min_segment_share (float) – Minimum fraction of visitors per leaf (sklearn min_weight_fraction_leaf).

  • n_bootstrap (int) – Bootstrap tree refits behind stability_score; 0 skips stability (NaN sentinel plus UserWarning).

  • bootstrap_seed (int) – Seed for the bootstrap resampling RNG.

Return type:

PolicyTreeResult

Returns:

PolicyTreeResult with one segment per leaf, ordered by sklearn leaf id; result.observed is self.observed by identity.

Raises:

ValueError – When self.observed is None.

apply_calibration(calibration)[source]

Return a new posterior with calibration attached.

Attach, don’t transform: the artifact is stashed on the returned copy (is_calibrated=True); every sample array is shared with this posterior by identity. The correction currently applies to intervals only — probabilities and expected losses stay raw; corrected CIs appear where interval summaries are built. K = 2 experiments only (per-contrast recalibration for K >= 3 is not yet implemented).

Parameters:

calibration (Calibration) – SBC-fitted Calibration whose regime (metric, n_treatments) must match self.observed.

Return type:

HurdleBCFResult

Returns:

New HurdleBCFResult carrying the artifact; the original is untouched.

Raises:
  • ValueError – When self.observed is None, or on a regime mismatch (message names the mismatched dimensions).

  • NotImplementedError – At K >= 3.

recommendation_summary(treatment, segment=None, *, thresholds=None, min_practical_effect=0.02)[source]

Act-now SHIP / CONTINUE / STOP recommendation for one treatment.

The treatment’s metric-native contrast draws are scoped (segment=None is the global all-visitors snapshot; a segment restricts to its rule’s members), reduced to per-draw mean lift, and summarized under the legacy compare.variants decision rule. v0.2 raw scope: probabilities and expected losses come from the raw draws even on a calibrated posterior — interval corrections land where intervals are built.

Parameters:
  • treatment (str) – Treatment variant name (vs control).

  • segment (DiscoveredSegment | None) – None for the global snapshot; a DiscoveredSegment restricts the computation to its members.

  • thresholds (DecisionThresholds | None) – Decision thresholds; DecisionThresholds() defaults when None.

  • min_practical_effect (float) – Minimum meaningful lift for probability_better / probability_harmful.

Return type:

RecommendationSummary

Returns:

RecommendationSummary with the decision, its evidence, and expected_value_of_one_more_round always populated (closed-form preposterior EVSI; formula in docs/concepts/decision-theoretic-inputs.md).

Raises:

ValueError – When self.observed is None, when treatment is not a treatment name, or when the segment’s rule matches zero visitors.

analyze(*, max_depth=3, min_segment_share=0.1, n_bootstrap=50, bootstrap_seed=0)[source]

The canonical one-call analysis summary for this posterior.

Composes per-treatment Comparison summaries, the embedded policy-tree segmentation (keyword arguments forward to it), the global RecommendationSummary for the best challenger, and the posterior-mean per-visitor CATEs. Anything needing posterior samples goes through analysis.posterior.

Parameters:
  • max_depth (int) – Embedded policy tree depth.

  • min_segment_share (float) – Minimum per-leaf population share.

  • n_bootstrap (int) – Stability bootstrap count (0 skips stability with a UserWarning).

  • bootstrap_seed (int) – Stability bootstrap seed.

Return type:

AnalysisResult

Returns:

AnalysisResult; analysis.is_calibrated reads through to this posterior’s flag.

Raises:

ValueError – When self.observed is None.

evaluate_against_truth(tree, truth)[source]

Sim-mode evaluation of tree’s policy against ground truth.

Parameters:
  • tree (PolicyTreeResult) – The fitted policy whose assignments are evaluated.

  • truth (CalibrationTruth | None) – Ground truth from the simulation path; None in real-data mode (raises — nothing to evaluate against).

Return type:

TruthComparison

Returns:

TruthComparison (cate_rmse, policy_accuracy, and the realized-RPV trio with the oracle gap).

Raises:
  • RuntimeError – When truth is None (real-data mode).

  • ValueError – When self.observed is None or the truth lacks the K-appropriate contrast / potential-outcome fields.

has_credible_segments(threshold=0.8)[source]

Whether some discovered segment clears threshold stability.

Runs fit_policy_tree at its defaults (deterministic given the default bootstrap_seed) and checks for a segment with stability_score >= threshold. The 0.80 default matches the default graduation rule’s SHIP-gate stability threshold.

Parameters:

threshold (float) – Minimum bootstrap-replicability stability score.

Return type:

bool

Returns:

True iff at least one discovered segment clears it.

has_decomposition()[source]

Whether this posterior carries the conversion/severity split.

Return type:

bool

Returns:

True — the hurdle posterior decomposes into the conversion and severity channels.