pytyche.compare.variants

v2 comparison and recommendation — pure numpy, no v1 dependency.

Provides compare_variants() and recommendation_summary() for 2-arm compare-to-control experiments. All functions are pure (no I/O, no PyMC imports).

Functions

compare_variants(samples_baseline, ...[, ...])

Compare two variants from posterior samples.

recommendation_summary(comparison[, thresholds])

Produce a recommendation summary from a comparison result.

pytyche.compare.variants.compare_variants(samples_baseline, samples_comparison, baseline_name, comparison_name, lift_unit, ci_level=0.8, min_practical_effect=0.02, decomposition=None)[source]

Compare two variants from posterior samples.

Parameters:
  • samples_baseline (ndarray) – 1-D posterior samples for the baseline variant.

  • samples_comparison (ndarray) – 1-D posterior samples for the comparison variant.

  • baseline_name (str) – Variant name serving as baseline.

  • comparison_name (str) – Variant name being compared.

  • lift_unit (str) – Unit of the reported lift: "pct" for relative lift, "dollar" for absolute lift.

  • ci_level (float) – Credible interval level (default 0.80).

  • min_practical_effect (float) – Minimum meaningful effect size for probability_better / probability_harmful (default 0.02).

  • decomposition (DecompositionSamples | None) – Optional frequency/severity decomposition.

Return type:

ComparisonResult

Returns:

Frozen ComparisonResult with all comparison metrics.

Raises:

ValueError – If inputs are invalid.
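The parameters above suggest the kind of summary that can be computed from two sets of posterior draws. The following is a minimal numpy sketch of that idea, not pytyche's implementation. The function name sketch_compare, the returned dictionary keys, the equal-tailed interval, and the requirement that both arrays have equal length and a nonzero baseline are all assumptions made for illustration.

```python
import numpy as np


def sketch_compare(samples_baseline, samples_comparison, lift_unit="pct",
                   ci_level=0.8, min_practical_effect=0.02):
    """Illustrative only: summarize lift between two sets of posterior draws."""
    base = np.asarray(samples_baseline, dtype=float)
    comp = np.asarray(samples_comparison, dtype=float)
    # Assumption of this sketch: draws are 1-D and paired index by index.
    if base.ndim != 1 or comp.ndim != 1 or base.shape != comp.shape:
        raise ValueError("expected two 1-D arrays of equal length")

    # Relative lift for "pct", plain difference for "dollar".
    if lift_unit == "pct":
        lift = (comp - base) / base  # assumes no baseline draw is zero
    elif lift_unit == "dollar":
        lift = comp - base
    else:
        raise ValueError(f"unknown lift_unit: {lift_unit!r}")

    # Equal-tailed credible interval at the requested level.
    tail = (1.0 - ci_level) / 2.0
    ci_low, ci_high = np.quantile(lift, [tail, 1.0 - tail])

    return {
        "lift_mean": float(lift.mean()),
        "ci": (float(ci_low), float(ci_high)),
        # Share of draws with any positive lift.
        "p_positive": float((lift > 0).mean()),
        # Share of draws beyond the practical-effect margin in each direction.
        "p_better": float((lift > min_practical_effect).mean()),
        "p_harmful": float((lift < -min_practical_effect).mean()),
    }


# Usage with synthetic draws (values are made up for the example).
rng = np.random.default_rng(0)
baseline_draws = rng.normal(10.0, 0.1, size=4000)
comparison_draws = rng.normal(10.5, 0.1, size=4000)
summary = sketch_compare(baseline_draws, comparison_draws, lift_unit="pct")
```

In this sketch min_practical_effect is expressed in the same unit as the lift, so 0.02 means a 2% relative change under "pct" and 0.02 currency units under "dollar".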

pytyche.compare.variants.recommendation_summary(comparison, thresholds=None)[source]

Produce a recommendation summary from a comparison result.

Decision logic (priority order):

  1. SHIP: loss_comparison < tolerance AND p_positive > threshold AND p_better > threshold

  2. STOP (harm): p_harmful > harm_threshold

  3. STOP (futility): p_better < futility_threshold

  4. CONTINUE: default
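The priority ordering above can be read as a chain of checks in which the first matching rule wins. The following is a minimal sketch under stated assumptions: SketchThresholds, its field names, and its default values are hypothetical and are not taken from pytyche, and the two "threshold" values in the SHIP rule are modelled as separate fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchThresholds:
    """Hypothetical threshold names and values, chosen for illustration."""
    loss_tolerance: float = 0.01
    p_positive_min: float = 0.9
    p_better_min: float = 0.8
    harm_threshold: float = 0.5
    futility_threshold: float = 0.05


def sketch_decide(loss_comparison, p_positive, p_better, p_harmful,
                  t=SketchThresholds()):
    """Evaluate the rules in priority order; the first match wins."""
    # 1. SHIP requires all three conditions at once.
    if (loss_comparison < t.loss_tolerance
            and p_positive > t.p_positive_min
            and p_better > t.p_better_min):
        return "SHIP"
    # 2. STOP for harm takes precedence over futility.
    if p_harmful > t.harm_threshold:
        return "STOP (harm)"
    # 3. STOP for futility when a meaningful win is very unlikely.
    if p_better < t.futility_threshold:
        return "STOP (futility)"
    # 4. Otherwise keep collecting data.
    return "CONTINUE"
```

Because the checks run in order, a result that satisfies both the harm and the futility rule is reported as STOP (harm).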

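Parameters:
  • comparison – The comparison result to summarize.

  • thresholds – Optional decision thresholds (default None).
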
Return type:

RecommendationSummary

Returns:

RecommendationSummary with decision and supporting evidence.