pytyche.bcf.diagnostics

BCF posterior calibration diagnostics.

Structured diagnostic workflow for BCF model development, following Betancourt’s principled Bayesian workflow adapted for BART/BCF:

  1. Computational faithfulness — σ² convergence, ESS, stability.

  2. Retrodictive checks — posterior predictive vs observed.

  3. Calibration — coverage, P(τ>0) calibration (requires ground truth).

  4. Model critique — channel attribution, quintile calibration.

Public API

  • BCFDiagnosticData — structured container for posterior samples and metadata.

  • extract_diagnostics_joint(model, ...) — populate from HurdleBCFModel.

  • extract_diagnostics_proto(result, ...) — populate from HurdleProtoResult.

  • extract_diagnostics_gpu(result, ...) — populate from HurdleBCFResult.

  • compute_*(...) — pure diagnostic computation functions.

  • render_diagnostic_report(diag, run_dir) — terminal report to disk.

Functions

compute_calibration_curve(p_positive, ...[, ...])

Binned P(τ>0) vs actual fraction positive.

compute_chain_diagnostics(diag)

Per-chain convergence diagnostics for multi-chain runs.

compute_channel_attribution(diag)

Which channel drives RPV CATE error?

compute_channel_calibration(diag)

Separate coverage for binary, continuous, and composed channels.

compute_coverage(samples, truth[, levels, ...])

Actual vs nominal coverage at given credible interval levels.

compute_decile_calibration(diag)

Finer-grained decile version of quintile calibration.

compute_miscalibration_area(samples, truth)

Integrated |actual - nominal| coverage across levels.

compute_ppc_binary(diag[, n_bins])

Posterior predicted vs observed conversion by decile of predicted prob.

compute_ppc_continuous(diag)

Posterior predicted vs observed log-revenue for converters.

compute_quintile_calibration(diag[, ...])

Per-quintile calibration: posterior mean vs truth, sign accuracy, coverage.

compute_scorecard(diag, *[, coverage, ...])

Compute the summary scorecard with traffic-light grading.

compute_segment_diagnostics(...[, depth, ...])

Segment-level coverage and P(τ>0) calibration.

compute_selection_bias(diag)

Does treatment change converter composition?

compute_sigma2_trace(diag)

Convergence summary for the global variance trace.

compute_slim_diagnostics(diag)

Fast diagnostic summary: pre-sort once, compute key metrics.

extract_diagnostics_gpu(result, Z, Y_obs, *)

Extract BCFDiagnosticData from a HurdleBCFResult.

extract_diagnostics_joint(model, Z, Y_obs, ...)

Extract BCFDiagnosticData from a fitted HurdleBCFModel.

extract_diagnostics_proto(result, X, Z, ...)

Extract BCFDiagnosticData from a HurdleProtoResult.

presort_samples(samples)

Sort posterior samples along axis=1 for O(1) quantile lookup.

quantiles_from_sorted(sorted_samples, alphas)

Compute all quantiles in one fancy-index op from pre-sorted (n, S).

Classes

BCFDiagnosticData(tau_binary_samples, ...[, ...])

Structured container for all posterior samples from a BCF fit.

pytyche.bcf.diagnostics.presort_samples(samples)

Sort posterior samples along axis=1 for O(1) quantile lookup.

Pre-sorting once allows all subsequent quantile operations to use simple index lookup instead of re-sorting the full (n, S) array.

Parameters:

samples (ndarray)

Return type:

ndarray
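
A minimal sketch of the pre-sorting idea, assuming plain NumPy and a hypothetical local helper named presort (an illustration, not the library's implementation). Each visitor's draws are sorted once, after which a quantile is a single column lookup:

```python
import numpy as np

def presort(samples: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: sort each visitor's S draws in ascending order."""
    return np.sort(samples, axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(size=(4, 101))   # (n, S): 4 visitors, 101 retained draws
sorted_samples = presort(samples)

# With rows sorted, the per-visitor median is one column lookup, no re-sort.
median = sorted_samples[:, 50]
```

The sort costs O(n S log S) once; every later lookup only indexes into the sorted array.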

pytyche.bcf.diagnostics.quantiles_from_sorted(sorted_samples, alphas)

Compute all quantiles in one fancy-index op from pre-sorted (n, S).

Uses floor indexing (equivalent to NumPy's method='lower'). For boolean coverage checks the interpolation method is immaterial.

Parameters:
  • sorted_samples (ndarray)

  • alphas (Sequence[float])

Return type:

dict[float, ndarray]
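
The floor-indexing lookup might look like the sketch below. The helper name quantiles_floor and the index rule floor(alpha * (S - 1)) are assumptions chosen to match the "lower" convention; the library's exact index arithmetic is not shown in this page.

```python
import numpy as np
from typing import Sequence

def quantiles_floor(sorted_samples: np.ndarray, alphas: Sequence[float]):
    """Hypothetical sketch: one fancy-index op over pre-sorted (n, S) rows."""
    S = sorted_samples.shape[1]
    # Floor indexing: take the draw at or just below the fractional position.
    idx = np.floor(np.asarray(alphas) * (S - 1)).astype(int)
    cols = sorted_samples[:, idx]              # shape (n, len(alphas))
    return {a: cols[:, j] for j, a in enumerate(alphas)}

# Two visitors whose sorted draws are 0..100 and 100..200.
sorted_samples = np.vstack([np.arange(101.0), np.arange(100.0, 201.0)])
q = quantiles_floor(sorted_samples, [0.25, 0.5, 0.75])
```

Because no interpolation is done, every returned value is an actual posterior draw.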

class pytyche.bcf.diagnostics.BCFDiagnosticData(tau_binary_samples, tau_cont_samples, rpv_cate_samples, mu_binary_samples, mu_cont_samples, p0_samples, p1_samples, sigma2_samples, Z, Y_obs, converter_mask, n_burn, estimator, num_chains=1, sev0_samples=None, sev1_samples=None, propensity=None, p0_mean=None, p1_mean=None, sev0_mean=None, sev1_mean=None, gpu_diagnostics=None, true_rpv_cate=None, true_p0=None, true_p1=None, true_m0=None, true_m1=None, cluster_id=None, topology_history=None)

Bases: object

Structured container for all posterior samples from a BCF fit.

Normalizes all three estimator APIs (joint, proto, gpu) into a common format for downstream diagnostic functions.

Per-visitor posterior samples have shape (n, S) where n is the number of visitors and S is the number of retained MCMC samples.

Channel convention:
  • Binary channel: probit scale (mu_b, tau_b → Phi(mu_b + b*tau_b))

  • Continuous channel: log-revenue scale (mu_c, tau_c)

  • Composed: RPV CATE in $/visitor

Parameters:
  • tau_binary_samples (ndarray)

  • tau_cont_samples (ndarray)

  • rpv_cate_samples (ndarray)

  • mu_binary_samples (ndarray)

  • mu_cont_samples (ndarray)

  • p0_samples (ndarray | None)

  • p1_samples (ndarray | None)

  • sigma2_samples (ndarray)

  • Z (ndarray)

  • Y_obs (ndarray)

  • converter_mask (ndarray)

  • n_burn (int)

  • estimator (str)

  • num_chains (int)

  • sev0_samples (ndarray | None)

  • sev1_samples (ndarray | None)

  • propensity (ndarray | None)

  • p0_mean (ndarray | None)

  • p1_mean (ndarray | None)

  • sev0_mean (ndarray | None)

  • sev1_mean (ndarray | None)

  • gpu_diagnostics (dict | None)

  • true_rpv_cate (ndarray | None)

  • true_p0 (ndarray | None)

  • true_p1 (ndarray | None)

  • true_m0 (ndarray | None)

  • true_m1 (ndarray | None)

  • cluster_id (ndarray | None)

  • topology_history (TopologyHistory | None)

pytyche.bcf.diagnostics.extract_diagnostics_joint(model, Z, Y_obs, num_gfr, num_burnin, *, true_rpv_cate=None, true_p0=None, true_p1=None, true_m0=None, true_m1=None, cluster_id=None)

Extract BCFDiagnosticData from a fitted HurdleBCFModel.

Parameters:
  • model (Any) – Fitted joint hurdle model with posterior arrays.

  • Z (ndarray) – Treatment assignment.

  • Y_obs (ndarray) – Observed revenue.

  • num_gfr (int) – Number of GFR warmstart iterations to discard.

  • num_burnin (int) – Number of additional burn-in iterations to discard.

  • true_rpv_cate (ndarray | None)

  • true_p0 (ndarray | None)

  • true_p1 (ndarray | None)

  • true_m0 (ndarray | None)

  • true_m1 (ndarray | None)

  • cluster_id (ndarray | None)

Return type:

BCFDiagnosticData

pytyche.bcf.diagnostics.extract_diagnostics_proto(result, X, Z, Y_obs, config, *, true_rpv_cate=None, true_p0=None, true_p1=None, true_m0=None, true_m1=None, cluster_id=None)

Extract BCFDiagnosticData from a HurdleProtoResult.

The proto stores f_conv_samples (composite probit prediction) and f_sev_samples (composite log-severity prediction), where the composite has the form f_conv = mu_alpha + tau_basis * tau_alpha. The tau_alpha / tau_beta arrays are only reachable through the rpv_cate_samples computation path; the result does not expose tau_alpha_mcmc / tau_beta_mcmc directly, so the mu/tau components cannot simply be read off.

This function therefore works with the composite predictions and derives channel-level quantities by finite difference: tau_binary is approximated as f_conv evaluated at Z=1 minus f_conv evaluated at Z=0.

Parameters:
  • result (Any)

  • X (ndarray)

  • Z (ndarray)

  • Y_obs (ndarray)

  • config (Any)

  • true_rpv_cate (ndarray | None)

  • true_p0 (ndarray | None)

  • true_p1 (ndarray | None)

  • true_m0 (ndarray | None)

  • true_m1 (ndarray | None)

  • cluster_id (ndarray | None)

Return type:

BCFDiagnosticData
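
The finite-difference step can be illustrated with a toy composite predictor. This is a sketch under stated assumptions: the local f_conv function, the arrays mu and tau, and a treatment basis equal to the 0/1 arm indicator are all hypothetical stand-ins, not the proto's internals.

```python
import numpy as np

rng = np.random.default_rng(9)
n, S = 6, 50
mu = rng.normal(size=(n, S))          # prognostic part of the probit predictor
tau = 0.2 * rng.normal(size=(n, S))   # treatment part of the probit predictor

def f_conv(z: float) -> np.ndarray:
    """Hypothetical composite probit prediction with the treatment basis set to z."""
    return mu + z * tau

# Finite difference across the two arms recovers the treatment component.
tau_binary = f_conv(1.0) - f_conv(0.0)
```

When the basis is exactly the arm indicator the difference is exact; with any other basis it is the approximation the description refers to.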

pytyche.bcf.diagnostics.extract_diagnostics_gpu(result, Z, Y_obs, *, true_rpv_cate=None, true_p0=None, true_p1=None, true_m0=None, true_m1=None, cluster_id=None)

Extract BCFDiagnosticData from a HurdleBCFResult.

The result stores pre-composed RPV CATE samples and channel-level posterior means (composed on GPU, float32). No CPU-side recomposition of full (n, S) channel matrices is needed.

Parameters:
  • result (Any) – Fitted GPU joint hurdle model.

  • Z (ndarray) – Treatment assignment.

  • Y_obs (ndarray) – Observed revenue.

  • true_rpv_cate (ndarray | None)

  • true_p0 (ndarray | None)

  • true_p1 (ndarray | None)

  • true_m0 (ndarray | None)

  • true_m1 (ndarray | None)

  • cluster_id (ndarray | None)

Return type:

BCFDiagnosticData

pytyche.bcf.diagnostics.compute_sigma2_trace(diag)

Convergence summary for the global variance trace.

Returns mean, final value, split-half ratio (first-half mean / second-half mean), and trend (linear slope normalized by mean).

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, float]
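
The four summaries named above can be sketched on a bare 1-D trace. The helper sigma2_summary and its key names are assumptions for illustration; only the quantities themselves (mean, final value, split-half ratio, slope normalized by mean) come from the description.

```python
import numpy as np

def sigma2_summary(sigma2: np.ndarray):
    """Hypothetical sketch of the four summaries for a 1-D sigma^2 trace."""
    half = len(sigma2) // 2
    mean = float(sigma2.mean())
    # Slope of a least-squares line through the trace, per iteration.
    slope = float(np.polyfit(np.arange(len(sigma2)), sigma2, 1)[0])
    return {
        "mean": mean,
        "final": float(sigma2[-1]),
        "split_half_ratio": float(sigma2[:half].mean() / sigma2[half:].mean()),
        "trend": slope / mean,
    }

rng = np.random.default_rng(1)
trace = 2.0 + 0.05 * rng.normal(size=1000)   # stationary trace around 2.0
summary = sigma2_summary(trace)
```

For a converged trace the split-half ratio sits near 1 and the normalized trend near 0; a ratio far from 1 suggests the chain was still drifting.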

pytyche.bcf.diagnostics.compute_ppc_binary(diag, n_bins=10)

Posterior predicted vs observed conversion by decile of predicted prob.

Bins visitors by posterior mean P(convert), computes observed conversion rate in each bin, and compares to the predicted rate.

Returns None when sample-level p0/p1 are unavailable (GPU summary mode).

Parameters:
  • diag (BCFDiagnosticData)

  • n_bins (int)

Return type:

dict[str, Any] | None
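
A sketch of the binning step, assuming a per-visitor posterior mean conversion probability is already in hand. The helper ppc_by_bin, its equal-count binning, and its key names are illustrative assumptions.

```python
import numpy as np

def ppc_by_bin(p_hat: np.ndarray, converted: np.ndarray, n_bins: int = 10):
    """Hypothetical sketch: predicted vs observed conversion per equal-count bin."""
    order = np.argsort(p_hat)                 # visitors ranked by predicted probability
    bins = np.array_split(order, n_bins)      # roughly equal-count bins (deciles for 10)
    predicted = [float(p_hat[b].mean()) for b in bins]
    observed = [float(converted[b].mean()) for b in bins]
    return {"predicted": predicted, "observed": observed}

rng = np.random.default_rng(2)
p_hat = rng.uniform(0.01, 0.30, size=20_000)  # posterior mean P(convert)
converted = rng.random(20_000) < p_hat        # outcomes drawn from the same probabilities
ppc = ppc_by_bin(p_hat, converted)
gap = max(abs(p - o) for p, o in zip(ppc["predicted"], ppc["observed"]))
```

A well-fitting conversion channel keeps the per-bin gap small; a systematic gap in the top bins points at over- or under-prediction for likely converters.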

pytyche.bcf.diagnostics.compute_ppc_continuous(diag)

Posterior predicted vs observed log-revenue for converters.

Returns None when channel-level samples are unavailable (GPU summary mode).

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, Any] | None

pytyche.bcf.diagnostics.compute_coverage(samples, truth, levels=(0.5, 0.8, 0.9, 0.95), *, sorted_samples=None)

Actual vs nominal coverage at given credible interval levels.

Parameters:
  • samples (ndarray) – Posterior samples per visitor.

  • truth (ndarray) – True values.

  • levels (tuple[float, ...]) – Nominal coverage levels.

  • sorted_samples (ndarray | None) – Pre-sorted samples for fast quantile lookup. When provided, samples is ignored for quantile computation.

Return type:

dict[str, float]
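
As an illustration of actual versus nominal coverage, the sketch below counts how often the truth falls inside an equal-tailed credible interval. The helper name coverage, the equal-tailed interval choice, and the string key format are assumptions; the library may differ on all three.

```python
import numpy as np

def coverage(samples, truth, levels=(0.5, 0.8, 0.9, 0.95)):
    """Hypothetical sketch: fraction of visitors whose truth lies in each interval."""
    out = {}
    for level in levels:
        tail = (1.0 - level) / 2.0
        lo, hi = np.quantile(samples, [tail, 1.0 - tail], axis=1)   # each shape (n,)
        out[f"{level:.2f}"] = float(np.mean((truth >= lo) & (truth <= hi)))
    return out

rng = np.random.default_rng(3)
n, S = 2000, 400
center = rng.normal(size=n)
truth = center + rng.normal(size=n)                  # truth ~ N(center, 1)
samples = center[:, None] + rng.normal(size=(n, S))  # posterior draws ~ N(center, 1)
cov = coverage(samples, truth)
```

Here the simulated posterior has the same spread as the truth, so each actual coverage lands close to its nominal level.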

pytyche.bcf.diagnostics.compute_calibration_curve(p_positive, truly_positive, n_bins=10)

Binned P(τ>0) vs actual fraction positive.

Parameters:
  • p_positive (ndarray) – Posterior P(τ>0) per visitor.

  • truly_positive (ndarray) – Boolean: is true τ > 0?

  • n_bins (int) – Number of bins.

Return type:

dict[str, list[float]]
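
A calibration curve of this kind can be sketched as follows. Equal-width bins on [0, 1], the skipping of empty bins, and the key names are assumptions made for the example.

```python
import numpy as np

def calibration_curve(p_positive, truly_positive, n_bins=10):
    """Hypothetical sketch: mean predicted P(tau>0) vs actual positive rate per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so p = 1.0 lands in the last bin.
    which = np.digitize(p_positive, edges[1:-1])
    predicted, actual = [], []
    for k in range(n_bins):
        mask = which == k
        if mask.any():                      # skip empty bins
            predicted.append(float(p_positive[mask].mean()))
            actual.append(float(truly_positive[mask].mean()))
    return {"predicted": predicted, "actual": actual}

rng = np.random.default_rng(4)
p_positive = rng.random(50_000)
truly_positive = rng.random(50_000) < p_positive   # calibrated by construction
curve = calibration_curve(p_positive, truly_positive)
```

Plotting actual against predicted should trace the diagonal for a calibrated posterior; points below the diagonal indicate overconfidence that τ > 0.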

pytyche.bcf.diagnostics.compute_segment_diagnostics(rpv_cate_samples, true_rpv_cate, X, feature_names, *, depth=3, levels=(0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995), n_ptau_bins=10)

Segment-level coverage and P(τ>0) calibration.

Fits a policy tree on the BCF posterior means, then for each leaf segment computes coverage and P(τ>0) accuracy at the segment-mean level. This produces the calibration data needed for segment-mean SBC corrections.

Parameters:
  • rpv_cate_samples (ndarray) – Per-visitor posterior CATE samples, shape (n, S), where S is the number of MCMC draws.

  • true_rpv_cate (ndarray) – True CATE per visitor.

  • X (ndarray) – Feature matrix (same as used for tree fitting).

  • feature_names (list[str]) – Column names for X.

  • depth (int) – Policy tree max depth.

  • levels (tuple[float, ...]) – Coverage levels to evaluate.

  • n_ptau_bins (int) – Number of bins for P(τ>0) calibration curve.

Returns:

Keys include:

  • seg_coverage_{level} — size-weighted segment-mean coverage at each level.

  • seg_n_segments — number of segments (leaves).

  • seg_ptau_bin{k}_predicted, seg_ptau_bin{k}_actual, seg_ptau_bin{k}_n — per-bin segment-mean P(τ>0) calibration data.

Return type:

dict[str, float]

pytyche.bcf.diagnostics.compute_miscalibration_area(samples, truth, n_levels=20, *, sorted_samples=None)

Integrated |actual - nominal| coverage across levels.

Returns a scalar in [0, 1]: 0 = perfectly calibrated, 1 = maximally off.

Parameters:
  • samples (ndarray)

  • truth (ndarray)

  • n_levels (int)

  • sorted_samples (ndarray | None)

Return type:

float
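
One way to read "integrated |actual - nominal|" is as the average absolute coverage gap over an evenly spaced grid of levels. The sketch below takes that reading; the grid, the equal-tailed intervals, and the helper name miscalibration_area are assumptions.

```python
import numpy as np

def miscalibration_area(samples, truth, n_levels=20):
    """Hypothetical sketch: mean |actual - nominal| coverage over a grid of levels."""
    levels = np.linspace(0.0, 1.0, n_levels + 2)[1:-1]   # open grid, excludes 0 and 1
    gaps = []
    for level in levels:
        tail = (1.0 - level) / 2.0
        lo, hi = np.quantile(samples, [tail, 1.0 - tail], axis=1)
        actual = np.mean((truth >= lo) & (truth <= hi))
        gaps.append(abs(actual - level))
    return float(np.mean(gaps))

rng = np.random.default_rng(5)
n, S = 1000, 400
center = rng.normal(size=n)
truth = center + rng.normal(size=n)
good = center[:, None] + rng.normal(size=(n, S))          # matches the truth's spread
narrow = center[:, None] + 0.3 * rng.normal(size=(n, S))  # overconfident posterior
area_good = miscalibration_area(good, truth)
area_narrow = miscalibration_area(narrow, truth)
```

The overconfident posterior under-covers at every level, so its area is much larger than that of the matched posterior.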

pytyche.bcf.diagnostics.compute_channel_calibration(diag)

Separate coverage for binary, continuous, and composed channels.

Only available when full sample-level p0/p1 are present (joint estimator or GPU with trace mode). Returns None in GPU summary mode.

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, dict[str, float]] | None

pytyche.bcf.diagnostics.compute_channel_attribution(diag)

Which channel drives RPV CATE error?

Decomposes posterior mean RPV CATE error into:
  • conversion channel: (p1-p0)*m0 contribution

  • AOV channel: p1*(m1-m0) contribution

Only meaningful with ground truth and joint/gpu estimator. Uses pre-computed means when available (GPU summary mode), otherwise computes from full posterior samples.

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, float] | None
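
The two contributions listed for compute_channel_attribution add up exactly to the composed effect if RPV is taken to be conversion probability times revenue per converter, so that the RPV CATE is p1*m1 - p0*m0. That reading of m0 and m1 is an assumption; the numerical check below uses made-up values.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
p0 = rng.uniform(0.02, 0.10, size=n)     # P(convert | control)
p1 = rng.uniform(0.02, 0.10, size=n)     # P(convert | treated)
m0 = rng.uniform(40.0, 80.0, size=n)     # assumed revenue per converter, control
m1 = rng.uniform(40.0, 80.0, size=n)     # assumed revenue per converter, treated

conversion_part = (p1 - p0) * m0         # effect of converting more (or fewer) visitors
aov_part = p1 * (m1 - m0)                # effect of converters spending more (or less)
rpv_cate = p1 * m1 - p0 * m0             # composed effect in $/visitor
```

Expanding the sum gives p1*m0 - p0*m0 + p1*m1 - p1*m0, and the p1*m0 terms cancel, which is why the split attributes all of the error to exactly one channel or the other.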

pytyche.bcf.diagnostics.compute_quintile_calibration(diag, n_quantiles=5, *, sorted_samples=None, est_mean=None)

Per-quintile calibration: posterior mean vs truth, sign accuracy, coverage.

This is the primary diagnostic for our success criterion: accurate population-level quintile rank ordering and sign correctness.

Parameters:
  • diag (BCFDiagnosticData)

  • n_quantiles (int)

  • sorted_samples (ndarray | None) – Pre-sorted RPV CATE samples. Row-subsetting preserves sort order so no re-sort is needed per quintile.

  • est_mean (ndarray | None) – Pre-computed posterior mean to avoid recomputing O(nS).

Return type:

list[dict[str, Any]] | None
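
A reduced sketch of the per-quintile table, assuming visitors are grouped by rank of their posterior mean. The helper quintile_table and its keys are hypothetical, and the coverage column is left out for brevity.

```python
import numpy as np

def quintile_table(est_mean, truth, n_quantiles=5):
    """Hypothetical sketch: rank visitors by posterior mean, summarize each group."""
    order = np.argsort(est_mean)
    rows = []
    for q, idx in enumerate(np.array_split(order, n_quantiles), start=1):
        rows.append({
            "quantile": q,
            "est_mean": float(est_mean[idx].mean()),
            "true_mean": float(truth[idx].mean()),
            # Share of visitors whose estimated sign matches the true sign.
            "sign_accuracy": float(np.mean(np.sign(est_mean[idx]) == np.sign(truth[idx]))),
        })
    return rows

rng = np.random.default_rng(7)
truth = rng.normal(size=5000)
est_mean = truth + 0.3 * rng.normal(size=5000)    # noisy but informative estimate
table = quintile_table(est_mean, truth)
```

Correct rank ordering shows up as true_mean increasing from the first row to the last; sign accuracy is typically weakest in the middle group, where effects are close to zero.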

pytyche.bcf.diagnostics.compute_decile_calibration(diag)

Finer-grained decile version of quintile calibration.

Parameters:

diag (BCFDiagnosticData)

Return type:

list[dict[str, Any]] | None

pytyche.bcf.diagnostics.compute_selection_bias(diag)

Does treatment change converter composition?

Compares E[log(Y)|convert, Z=1] vs E[log(Y)|convert, Z=0].

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, float] | None
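
The comparison of E[log(Y)|convert, Z=1] with E[log(Y)|convert, Z=0] can be sketched directly on observed data. Treating any visitor with positive revenue as a converter, the helper name, and the returned keys are all assumptions.

```python
import numpy as np

def converter_log_revenue_gap(Y_obs, Z):
    """Hypothetical sketch: mean log-revenue of converters, treated minus control."""
    converted = Y_obs > 0                  # assumption: revenue > 0 marks a converter
    treated = np.log(Y_obs[converted & (Z == 1)]).mean()
    control = np.log(Y_obs[converted & (Z == 0)]).mean()
    return {"treated": float(treated), "control": float(control),
            "gap": float(treated - control)}

Z = np.array([1, 1, 1, 0, 0, 0])
Y_obs = np.array([0.0, np.exp(3.0), np.exp(5.0), 0.0, np.exp(2.0), np.exp(4.0)])
gap = converter_log_revenue_gap(Y_obs, Z)
```

A nonzero gap on its own does not separate a spending effect from a change in who converts, which is why this is read alongside the conversion-channel diagnostics.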

pytyche.bcf.diagnostics.compute_chain_diagnostics(diag)

Per-chain convergence diagnostics for multi-chain runs.

Returns per-chain ESS, R-hat, autocorrelation at lag-1, and per-chain σ² means for visual comparison.

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, Any]
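
For orientation, two of the quantities named above can be sketched on a (num_chains, draws) array. This uses the classic Gelman-Rubin R-hat and a plain sample autocorrelation; whether the module uses these exact variants (rather than, say, a split or rank-normalized R-hat) is not stated here, so treat the formulas as assumptions.

```python
import numpy as np

def rhat(chains: np.ndarray) -> float:
    """Classic Gelman-Rubin R-hat for chains of shape (num_chains, draws)."""
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()          # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)       # B: variance of chain means, scaled by n
    pooled = (n - 1) / n * within + between / n
    return float(np.sqrt(pooled / within))

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample autocorrelation of a single chain at lag 1."""
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

rng = np.random.default_rng(8)
mixed = rng.normal(size=(4, 1000))                       # four chains, same target
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]  # one chain offset from the rest
```

Chains that agree give an R-hat close to 1, while a single offset chain inflates the between-chain term and pushes R-hat well above 1.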

pytyche.bcf.diagnostics.compute_scorecard(diag, *, coverage=None, miscalibration_area=None, chain_diagnostics=None, est_mean=None, sorted_rpv=None)

Compute the summary scorecard with traffic-light grading.

Returns a dict of metric_name → {value, grade, threshold_info}. Grade is “green”, “yellow”, or “red”.

Accepts pre-computed values to avoid redundant computation when called from render_diagnostic_report().

Parameters:
  • diag (BCFDiagnosticData)

  • coverage (dict[str, float] | None)

  • miscalibration_area (float | None)

  • chain_diagnostics (dict[str, Any] | None)

  • est_mean (ndarray | None)

  • sorted_rpv (ndarray | None)

Return type:

dict[str, dict[str, Any]]

pytyche.bcf.diagnostics.compute_slim_diagnostics(diag)

Fast diagnostic summary: pre-sort once, compute key metrics.

Returns a flat dict suitable for writing to summary.json in sweep mode. Includes coverage at 7 levels, quintile breakdown, PPC scalars, and channel calibration when available.

Parameters:

diag (BCFDiagnosticData)

Return type:

dict[str, Any]

Modules

topology

Per-tree topology hashing and aggregate mobility metrics for hurdle BCF.