pytyche.generators.core
V2 generator core: a single potential-outcomes pipeline.
Public API
generate_v2_core(config: V2GeneratorConfig) -> CalibrationBundle
- Pipeline stages (each documented below):
sample_features(config, rng, n) -> X: pd.DataFrame
compute_potential_outcomes(features, metric_mode) -> truth arrays
assign_and_observe(features, truth_result, assignment, metric_id, rng) -> observed outcomes
build_bundle(variants, truth_result, metric_mode, experiment_id) -> CalibrationBundle
Config design
Configs are typed, frozen dataclasses whose __post_init__ validation fails closed. The feature sampler is a discriminated union: CopulaConfig | MixtureConfig. Both strategies produce the same kind of output (X: pd.DataFrame). Surfaces are Callable[[pd.DataFrame], np.ndarray].
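The fail-closed pattern described above can be sketched with the standard library alone. This is a minimal illustration, not pytyche's code: FeatureSpecSketch and rejects are hypothetical names, and the two checks mirror only the FeatureSpec rules stated on this page (non-empty name, known kind).

```python
from dataclasses import dataclass, FrozenInstanceError


# Hypothetical stand-in for illustration; not the pytyche class itself.
@dataclass(frozen=True)
class FeatureSpecSketch:
    name: str
    kind: str

    def __post_init__(self) -> None:
        # Fail closed: anything not explicitly allowed is rejected.
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.kind not in ("continuous", "categorical", "binary"):
            raise ValueError(f"unknown feature kind: {self.kind!r}")


def rejects(**kwargs) -> bool:
    """Return True when construction raises ValueError."""
    try:
        FeatureSpecSketch(**kwargs)
    except ValueError:
        return True
    return False


spec = FeatureSpecSketch(name="age", kind="continuous")
```

Because the dataclass is frozen, an invalid instance can neither be constructed nor produced later by mutation.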
Module Attributes
FeatureSamplerConfig | Discriminated union of feature sampler strategy configs. |
Functions
add_surfaces | Return a surface whose output is the element-wise sum of two surfaces. |
assign_and_observe | Assign visitors to treatment levels and sample observed outcomes. |
build_bundle | Assemble a CalibrationBundle from variant list and truth. |
build_truth | Construct a CalibrationTruth from computed potential outcomes. |
compute_potential_outcomes | Compute per-visitor potential outcomes and derive truth arrays. |
generate_v2_core | Generate a CalibrationBundle via the v2 potential-outcomes pipeline. |
multiply_surfaces | Return a surface whose output is the element-wise product of two surfaces. |
sample_features | Sample a feature matrix from the given sampler config. |
sigmoid_surface | Return a surface applying the sigmoid transform to an inner surface. |
threshold_surface | Return a step-function surface based on a threshold applied to an inner surface. |
Classes
AssignmentConfig | Fixed (forced) treatment-assignment policy. |
CopulaConfig | Gaussian copula feature sampler configuration. |
FeatureSpec | Specification for a single feature column. |
MetricMode | Metric-specific potential outcome surfaces. |
MixtureConfig | Mixture-of-populations feature sampler configuration. |
SurfaceConfig | HTE surface definition as a callable. |
TruthResult | Internal intermediate result from compute_potential_outcomes. |
V2GeneratorConfig | Top-level configuration for the v2 core generator pipeline. |
- class pytyche.generators.core.FeatureSpec(name, kind)
Bases: object
Specification for a single feature column.
- name
Column name. Must be non-empty.
- kind
Feature type: "continuous", "categorical", or "binary".
- Parameters:
name (str)
kind (Literal['continuous', 'categorical', 'binary'])
- class pytyche.generators.core.CopulaConfig(features, correlation)
Bases: object
Gaussian copula feature sampler configuration.
Generates correlated mixed covariates by drawing from a multivariate normal with the configured correlation matrix and then applying marginal transforms.
- features
Ordered feature specifications. Length determines the dimension of the copula.
- correlation
Square correlation matrix of shape (len(features), len(features)). Must be positive semi-definite.
- Parameters:
features (tuple[FeatureSpec, ...])
correlation (ndarray)
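The copula strategy described above (correlated normals, then marginal transforms) can be sketched as follows. This is a minimal illustration under stated assumptions, not pytyche's sampler: gaussian_copula_sketch is a hypothetical helper, and the two marginal transforms (a uniform age range and a 30% binary flag) are invented for the example.

```python
import math

import numpy as np


def gaussian_copula_sketch(correlation, n, rng):
    """Draw n rows of correlated Uniform(0, 1) margins via a Gaussian copula."""
    corr = np.asarray(correlation, dtype=float)
    d = corr.shape[0]
    # Step 1: correlated standard normals.
    z = rng.multivariate_normal(mean=np.zeros(d), cov=corr, size=n)
    # Step 2: the standard normal CDF maps each margin to Uniform(0, 1).
    cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return cdf(z)


rng = np.random.default_rng(0)
u = gaussian_copula_sketch([[1.0, 0.8], [0.8, 1.0]], n=5000, rng=rng)

# Step 3: marginal transforms (illustrative choices, not pytyche's).
age = 18.0 + 60.0 * u[:, 0]              # continuous feature on [18, 78]
is_mobile = (u[:, 1] < 0.3).astype(int)  # binary feature with P(1) = 0.3
```

The dependence between columns comes entirely from the correlation matrix; the marginal transforms only reshape each column.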
- class pytyche.generators.core.MixtureConfig(features, weights, cluster_params)
Bases: object
Mixture-of-populations feature sampler configuration.
Generates covariates by sampling latent cluster membership and then drawing from cluster-conditional feature distributions.
- features
Feature specifications shared across all clusters.
- weights
Per-cluster mixture weights. All must be strictly positive. Length must equal len(cluster_params).
- cluster_params
Per-cluster parameter dicts keyed by feature name (or by arbitrary convention per stage implementation).
- Parameters:
features (tuple[FeatureSpec, ...])
weights (tuple[float, ...])
cluster_params (tuple[dict[str, Any], ...])
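The two-step draw described above (latent cluster, then cluster-conditional features) might look like the sketch below. It is not pytyche's implementation: mixture_sketch is hypothetical, a single unit-variance normal feature stands in for the real cluster_params convention, and normalising the weights is an assumption, since this page only requires them to be strictly positive.

```python
import numpy as np


def mixture_sketch(weights, cluster_means, n, rng):
    """Sample cluster ids, then one cluster-conditional continuous feature."""
    w = np.asarray(weights, dtype=float)
    if len(w) != len(cluster_means) or np.any(w <= 0):
        raise ValueError("weights must be strictly positive, one per cluster")
    # Latent membership; weights are normalised here (an assumption).
    cluster_id = rng.choice(len(w), size=n, p=w / w.sum())
    # Cluster-conditional draw: unit-variance normal around each cluster mean.
    means = np.asarray(cluster_means, dtype=float)
    x = rng.normal(loc=means[cluster_id], scale=1.0)
    return cluster_id, x


rng = np.random.default_rng(1)
cluster_id, spend = mixture_sketch((0.7, 0.3), (10.0, 50.0), n=4000, rng=rng)
```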
- pytyche.generators.core.FeatureSamplerConfig = pytyche.generators.core.CopulaConfig | pytyche.generators.core.MixtureConfig
Discriminated union of feature sampler strategy configs.
- class pytyche.generators.core.SurfaceConfig(fn)
Bases: object
HTE surface definition as a callable.
The callable receives the feature DataFrame X (one row per visitor) and returns a 1-D numpy array of per-visitor values (e.g. conversion probabilities, severity means).
- fn
(X: pd.DataFrame) -> np.ndarray. Must return a 1-D array of length len(X).
- Parameters:
fn (Callable[[DataFrame], ndarray])
- class pytyche.generators.core.AssignmentConfig(treatment_allocation=0.5, *, treatment_probabilities=None)
Bases: object
Fixed (forced) treatment-assignment policy.
Configures the fixed (forced) assignment used for simple randomized designs and SBC: every visitor’s treatment is drawn from a feature-independent distribution (a binary split, 50/50 by default, or a uniform/fixed per-treatment vector). Policy-routed assignment, where a sequential experiment’s cell routes each visitor to a treatment based on their features, is NOT encoded here; it is supplied externally at observation time. This config is just one assignment policy (the forced/randomized one).
Binary (paired) form: set treatment_allocation (fraction in (0, 1)); leave treatment_probabilities as None.
Multi-treatment form: set treatment_probabilities to a length-K tuple of positive fractions summing to 1.0 (within 1e-9). These are the fixed per-treatment assignment probabilities (uniform = balanced forced exploration). treatment_allocation is unused in this form but must still be in (0, 1).
- treatment_allocation
Fraction of visitors assigned to treatment in the binary form. Must be in the open interval (0, 1). Default 0.5.
- treatment_probabilities
Fixed per-treatment assignment probabilities for K≥2 (the forced/randomized policy). None selects the binary paired form. Length must be ≥ 2, every entry strictly in (0, 1), and entries must sum to 1.0 (±1e-9).
- Parameters:
treatment_allocation (float)
treatment_probabilities (tuple[float, ...] | None)
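The three rules stated for treatment_probabilities (length ≥ 2, every entry strictly in (0, 1), sum of 1.0 within 1e-9) can be checked as in the sketch below. The function names are hypothetical and the error messages are invented; this only illustrates the documented constraints, not how AssignmentConfig implements them.

```python
import math


def check_treatment_probabilities(probs, tol=1e-9):
    """Apply the documented rules; raise ValueError on any violation."""
    if len(probs) < 2:
        raise ValueError("need at least two treatment levels")
    if any(not (0.0 < p < 1.0) for p in probs):
        raise ValueError("every probability must lie strictly in (0, 1)")
    if abs(math.fsum(probs) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1.0 within tolerance")
    return tuple(probs)


def is_valid(probs) -> bool:
    """Return True when probs passes every check."""
    try:
        check_treatment_probabilities(probs)
    except ValueError:
        return False
    return True
```

A uniform vector such as (1/3, 1/3, 1/3) passes even though its floating-point sum is not exactly 1.0, which is the reason for the tolerance.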
- class pytyche.generators.core.MetricMode(metric_id, p0_surface=None, p1_surface=None, m0_surface=None, m1_surface=None, sigma0_surface=None, sigma1_surface=None, *, p_surfaces=None, m_surfaces=None, sigma_surfaces=None)
Bases: object
Metric-specific potential outcome surfaces.
K=2 (paired) form: populate the six paired surface fields (p0_surface, p1_surface, and for hurdle metrics m0/m1/sigma0/sigma1); leave p_surfaces, m_surfaces, and sigma_surfaces as None.
Multi-treatment form: leave all six paired surface fields as None and populate p_surfaces (length K) and, for hurdle metrics, m_surfaces and sigma_surfaces (also length K). K must be ≥ 3; the K = 2 case uses the paired form (the truth-computation dispatch routes list-form surfaces through the multi-treatment path only at K ≥ 3).
Every surface is a function of the individual user’s attributes X. A SurfaceConfig carries the potential-outcome surface for ONE treatment level: p_surfaces[k](X) is the conversion potential outcome a user with features X WOULD show under treatment level k (level 0 = control, the prognostic baseline; levels 1…K−1 = treated). The surfaces are not properties of an assignment cohort: assignment (randomization, and in a sequential experiment a cell’s policy) merely selects which of a user’s K potential outcomes is observed.
For binary metrics ("conversion_rate"): only conversion probability surfaces are needed; severity/sigma surfaces must be None.
For hurdle metrics ("revenue_per_visitor"): conversion, severity, and sigma surfaces are all required.
- metric_id
Canonical metric name. Must be a known metric.
- p0_surface
Control (level-0) conversion potential-outcome surface (paired form).
- p1_surface
Treated (level-1) conversion potential-outcome surface (paired form).
- m0_surface
Control severity (AOV) surface. Hurdle paired form only.
- m1_surface
Treated severity (AOV) surface. Hurdle paired form only.
- sigma0_surface
Control LogNormal dispersion surface. Hurdle paired form only.
- sigma1_surface
Treated LogNormal dispersion surface. Hurdle paired form only.
- p_surfaces
Conversion potential-outcome surfaces, one per treatment level (length K, index 0 = control). Each is a function of user attributes X.
- m_surfaces
Severity potential-outcome surfaces, one per treatment level (length K). Hurdle multi-treatment form.
- sigma_surfaces
LogNormal dispersion surfaces, one per treatment level (length K). Hurdle multi-treatment form.
- Parameters:
metric_id (str)
p0_surface (SurfaceConfig | None)
p1_surface (SurfaceConfig | None)
m0_surface (SurfaceConfig | None)
m1_surface (SurfaceConfig | None)
sigma0_surface (SurfaceConfig | None)
sigma1_surface (SurfaceConfig | None)
p_surfaces (list[SurfaceConfig] | None)
m_surfaces (list[SurfaceConfig] | None)
sigma_surfaces (list[SurfaceConfig] | None)
- class pytyche.generators.core.V2GeneratorConfig(n_visitors, feature_sampler, metric_mode, assignment, seed, experiment_id='sim-v2', revenue_model=None)
Bases: object
Top-level configuration for the v2 core generator pipeline.
- n_visitors
Total visitor count across all arms. Must be positive.
- feature_sampler
Feature sampling strategy config (CopulaConfig or MixtureConfig).
- metric_mode
Metric and potential-outcome surface definitions.
- assignment
Treatment assignment parameters.
- seed
Random seed for full pipeline reproducibility.
- experiment_id
Identifier for the generated experiment. Default "sim-v2".
- revenue_model
Revenue sampling strategy for hurdle metrics. None (default) uses LogNormal sampling (existing behavior). A CartRevenueConfig activates cart-based revenue sampling: per-category Bernoulli draws whose prices sum to converter revenue. Ignored for binary (conversion_rate) metrics.
- Parameters:
n_visitors (int)
feature_sampler (CopulaConfig | MixtureConfig)
metric_mode (MetricMode)
assignment (AssignmentConfig)
seed (int)
experiment_id (str)
revenue_model (CartRevenueConfig | None)
- pytyche.generators.core.sample_features(config, rng, n)
Sample a feature matrix from the given sampler config.
- Parameters:
config (CopulaConfig | MixtureConfig) – Either a CopulaConfig (Gaussian copula strategy) or a MixtureConfig (mixture-of-populations strategy).
rng (Generator) – Explicit RNG instance; the caller controls seeding for reproducibility.
n (int) – Number of rows to sample.
- Returns:
Shape (n, len(config.features)). Column names and dtypes follow the feature specs in config.features. For MixtureConfig an additional cluster_id column is included (used internally for per-cluster surface dispatch; stripped before output by generate_v2_core).
- Return type:
DataFrame
- Raises:
TypeError – Raised for unsupported config types.
ValueError – Raised for unknown feature kinds within a config.
- class pytyche.generators.core.TruthResult(cate_per_visitor, effect, effect_components, conv_per_visitor=None, aov_per_visitor=None, p0_values=None, p1_values=None, m0_values=None, m1_values=None, m0_effective=None, m1_effective=None, sigma0_values=None, sigma1_values=None, p_values=None, m_values=None, sigma_values=None)
Bases: object
Internal intermediate result from compute_potential_outcomes.
Holds per-visitor CATE arrays and population summary statistics before assembly into a CalibrationBundle. This is not part of the public contract; build_bundle (task 7.2) constructs CalibrationTruth from this.
K-dispatch:
K=2 (paired form): populates the legacy paired *_values fields (p0_values, p1_values, and for hurdle m0/m1/sigma0/sigma1_values) plus cate_per_visitor; leaves p_values, m_values, sigma_values as None.
K≥3 (multi-treatment form): populates p_values (length K), and for hurdle m_values/sigma_values (also length K); sets cate_per_visitor=None and leaves all six legacy paired *_values fields as None.
- cate_per_visitor
Per-visitor CATE, aligned 1:1 with input rows. Populated at K=2; None at K≥3.
- effect
Population mean CATE (mean of cate_per_visitor.values at K=2; mean of the best-treatment contrast at K≥3, a provisional scalar).
- effect_components
Named decomposition of the population effect.
Binary K=2: {"conv_effect": float}
Hurdle K=2: {"conv_effect": float, "aov_effect": float}
K≥3: {"treatment_1_effect": float, ..., "treatment_{K-1}_effect": float}
- conv_per_visitor
Per-visitor conversion component. Only populated for hurdle K=2 metrics; None for binary or K≥3.
- aov_per_visitor
Per-visitor AOV component. Only populated for hurdle K=2 metrics; None for binary or K≥3.
- p0_values
Per-visitor control conversion probabilities. Populated by both binary and hurdle K=2 computations for use in outcome sampling.
- p1_values
Per-visitor treatment conversion probabilities. K=2 only.
- m0_values
Per-visitor control severity means (raw surface output). Hurdle K=2 only; None for binary or K≥3.
- m1_values
Per-visitor treatment severity means (raw surface output). Hurdle K=2 only; None for binary or K≥3.
- m0_effective
Per-visitor control expected revenue conditional on conversion. For lognormal, equals m0_values. For cart, this is the analytical expected cart revenue. None for binary or K≥3.
- m1_effective
Per-visitor treatment expected revenue conditional on conversion. For lognormal, equals m1_values. For cart, this is the analytical expected cart revenue. None for binary or K≥3.
- sigma0_values
Per-visitor control LogNormal dispersion. Hurdle K=2 only; None for binary or K≥3.
- sigma1_values
Per-visitor treatment LogNormal dispersion. Hurdle K=2 only; None for binary or K≥3.
- p_values
Per-visitor conversion potential outcomes, one array per treatment level (length K, index 0 = control). K≥3 only; None at K=2.
- m_values
Per-visitor severity potential outcomes (length K). K≥3 hurdle only; None for K=2 or conversion_rate.
- sigma_values
Per-visitor LogNormal dispersion (length K). K≥3 hurdle only; None for K=2 or conversion_rate.
- Parameters:
cate_per_visitor (AlignedVisitorArray | None)
effect (float)
effect_components (dict[str, float])
conv_per_visitor (AlignedVisitorArray | None)
aov_per_visitor (AlignedVisitorArray | None)
p0_values (ndarray | None)
p1_values (ndarray | None)
m0_values (ndarray | None)
m1_values (ndarray | None)
m0_effective (ndarray | None)
m1_effective (ndarray | None)
sigma0_values (ndarray | None)
sigma1_values (ndarray | None)
p_values (list[ndarray] | None)
m_values (list[ndarray] | None)
sigma_values (list[ndarray] | None)
- pytyche.generators.core.compute_potential_outcomes(features, metric_mode, revenue_model=None)
Compute per-visitor potential outcomes and derive truth arrays.
Dispatches based on metric_mode.n_treatments:
n_treatments == 2 (paired form): routes to _compute_binary_truth or _compute_hurdle_truth exactly as before.
n_treatments >= 3 (multi-treatment form): routes to _compute_truth_multi.
- Parameters:
features (DataFrame) – Feature matrix, one row per visitor. Produced by sample_features.
metric_mode (MetricMode) – Metric and surface definitions. Determines which computation path to use (binary, hurdle, or multi-treatment).
revenue_model (CartRevenueConfig | None) – Cart revenue configuration. When provided, the hurdle decomposition uses analytical expected cart revenue instead of the raw severity surface values. Ignored for binary metrics. Not supported for K≥3.
- Returns:
Per-visitor CATE, population effect, and decomposition components.
- Return type:
TruthResult
- Raises:
ValueError – If metric_mode.metric_id is not a supported metric.
NotImplementedError – If revenue_model is non-None and n_treatments >= 3.
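For the binary K=2 case, the truth quantities named on this page (p0_values, p1_values, cate_per_visitor, effect, and the conv_effect component) can be sketched as below. The surfaces p0 and p1 are invented, a dict of NumPy arrays stands in for the feature DataFrame to keep the sketch dependency-light, and taking the per-visitor CATE as p1(X) - p0(X) is the standard definition assumed here, not something read from the pytyche source.

```python
import numpy as np


# Hypothetical surfaces over a mapping of column name -> array.
def p0(X):
    # Control conversion probability: a flat 10% baseline.
    return np.full(len(X["age"]), 0.10)


def p1(X):
    # Treatment helps younger visitors more (illustrative choice).
    return 0.10 + 0.05 * (np.asarray(X["age"]) < 30)


X = {"age": np.array([22, 45, 28, 61])}
p0_values = p0(X)
p1_values = p1(X)
cate_per_visitor = p1_values - p0_values   # per-visitor effect, aligned with rows
effect = float(cate_per_visitor.mean())    # population mean CATE
effect_components = {"conv_effect": effect}
```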
- pytyche.generators.core.build_truth(truth_result, metric_mode)
Construct a CalibrationTruth from computed potential outcomes.
Bridges the internal TruthResult (pipeline intermediate) to the public CalibrationTruth contract.
K-dispatch:
K=2 (paired form): populates the legacy 1-D fields (cate_per_visitor, conv/aov_cate_per_visitor, p0/p1/m0/m1_per_visitor); leaves the three new list fields (contrast_cate_per_visitor, p_per_visitor, m_per_visitor) as None.
K≥3 (multi-treatment form): populates contrast_cate_per_visitor (length K−1), p_per_visitor (length K), and m_per_visitor (length K, or None for conversion_rate); leaves all legacy 1-D fields as None.
- Parameters:
truth_result (TruthResult) – Output of compute_potential_outcomes. Holds per-visitor CATE, population effect, and decomposition components.
metric_mode (MetricMode) – The metric configuration used to compute truth. Provides metric_id for family derivation.
- Returns:
Typed, frozen ground truth.
- Return type:
CalibrationTruth
- pytyche.generators.core.add_surfaces(s1, s2)
Return a surface whose output is the element-wise sum of two surfaces.
- Parameters:
s1 (SurfaceConfig) – First input surface. Receives the same feature DataFrame X as s2.
s2 (SurfaceConfig) – Second input surface. Receives the same feature DataFrame X as s1.
- Returns:
Surface computing s1(X) + s2(X).
- Return type:
SurfaceConfig
- pytyche.generators.core.multiply_surfaces(s1, s2)
Return a surface whose output is the element-wise product of two surfaces.
- Parameters:
s1 (SurfaceConfig) – First input surface. Receives the same feature DataFrame X as s2.
s2 (SurfaceConfig) – Second input surface. Receives the same feature DataFrame X as s1.
- Returns:
Surface computing s1(X) * s2(X).
- Return type:
SurfaceConfig
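The sum and product combinators amount to closures over two callables. The sketch below shows the idea on bare functions; add_sketch and multiply_sketch are hypothetical, they skip the SurfaceConfig wrapper, and a dict of NumPy arrays stands in for the feature DataFrame.

```python
from typing import Callable

import numpy as np

Surface = Callable[[dict], np.ndarray]  # stand-in for SurfaceConfig.fn


def add_sketch(s1: Surface, s2: Surface) -> Surface:
    # Both inner surfaces are evaluated on the same X.
    return lambda X: s1(X) + s2(X)


def multiply_sketch(s1: Surface, s2: Surface) -> Surface:
    return lambda X: s1(X) * s2(X)


def base(X):
    # Flat 10% baseline.
    return np.full(len(X["age"]), 0.10)


def lift(X):
    # One-point bump for visitors over 40 (illustrative choice).
    return 0.01 * (np.asarray(X["age"]) > 40)


X = {"age": np.array([25, 50, 65])}
summed = add_sketch(base, lift)(X)
scaled = multiply_sketch(base, lift)(X)
```

Nothing is evaluated until the composed surface is called, so the same composition can be reused on any feature matrix.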
- pytyche.generators.core.sigmoid_surface(s)
Return a surface applying the sigmoid transform to an inner surface.
The sigmoid maps any real value to the open interval (0, 1):
sigmoid(z) = 1 / (1 + exp(-z))
Useful for composing linear or nonlinear terms into a valid probability surface without manual clipping.
- Parameters:
s (SurfaceConfig) – Inner surface whose output is passed through sigmoid.
- Returns:
Surface computing sigmoid(s(X)).
- Return type:
SurfaceConfig
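A sketch of the sigmoid wrapper, using the formula given above. sigmoid_sketch and the linear score are hypothetical, and a dict of NumPy arrays stands in for the feature DataFrame.

```python
import numpy as np


def sigmoid_sketch(s):
    """Wrap surface s so its output is squashed into (0, 1)."""
    return lambda X: 1.0 / (1.0 + np.exp(-s(X)))


def linear(X):
    # Hypothetical linear score on a single column.
    return -2.0 + 0.05 * np.asarray(X["age"], dtype=float)


X = {"age": np.array([20, 40, 60])}
p = sigmoid_sketch(linear)(X)
```

The scores here are -1, 0, and 1, so the middle visitor lands at exactly 0.5 and the outer two sit symmetrically around it.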
- pytyche.generators.core.threshold_surface(s, cutoff, above, below)
Return a step-function surface based on a threshold applied to an inner surface.
Produces a discontinuous surface suitable for stress-testing HTE estimators that assume smooth effect functions:
result[i] = above if s(X)[i] > cutoff, below otherwise
- Parameters:
s (SurfaceConfig) – Inner surface to threshold.
cutoff (float) – Decision boundary. Strict > comparison.
above (float) – Output value where s(X) > cutoff.
below (float) – Output value where s(X) <= cutoff.
- Returns:
Step-function surface.
- Return type:
SurfaceConfig
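The step rule above maps directly onto numpy.where. In this sketch threshold_sketch and the score surface are hypothetical, and a dict of NumPy arrays stands in for the feature DataFrame; the point of the example is the strict comparison at the boundary.

```python
import numpy as np


def threshold_sketch(s, cutoff, above, below):
    # Strict ">" means a value exactly at the cutoff maps to `below`.
    return lambda X: np.where(s(X) > cutoff, above, below)


def score(X):
    return np.asarray(X["visits"], dtype=float)


X = {"visits": np.array([1, 3, 5])}
step = threshold_sketch(score, cutoff=3.0, above=0.2, below=0.1)(X)
```

The visitor with exactly 3 visits receives the below value, because 3 > 3 is false.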
- pytyche.generators.core.assign_and_observe(features, truth_result, assignment, metric_id, rng, revenue_model=None, treatment_assignment=None)
Assign visitors to treatment levels and sample observed outcomes.
Dispatches to one of two paths based on the truth structure:
Paired/binary path (treatment_assignment is None AND truth_result.p_values is None): the existing K=2 binary/hurdle sampling, unchanged. RNG draws are byte-identical to the original implementation. Returns a 2-element list [control, treatment].
Multi-treatment path (truth_result.p_values is not None, i.e. K≥3 truth): generalised to K treatment levels. Assignment is either:
Internal randomisation (treatment_assignment is None): uses assignment.treatment_probabilities to draw per-visitor treatment indices from a categorical distribution.
External hook (treatment_assignment supplied): the caller provides a per-visitor array of treatment indices in [0, K−1] (e.g. a sequential-experiment cell’s policy routing).
When revenue_model is a CartRevenueConfig, cart-based revenue sampling is used (K=2 only). Supplying a revenue_model for K≥3 raises NotImplementedError.
- Parameters:
features (DataFrame) – Feature matrix produced by sample_features, one row per visitor.
truth_result (TruthResult) – Output of compute_potential_outcomes. K=2: carries p0_values and p1_values (and hurdle arrays). K≥3: carries p_values (and m_values/sigma_values for hurdle metrics).
assignment (AssignmentConfig) – Treatment assignment parameters. Paired path uses treatment_allocation; multi-treatment internal path uses treatment_probabilities.
metric_id (str) – Canonical metric identifier; determines the revenue sampling path.
rng (Generator) – Caller-controlled RNG for reproducible assignment and sampling.
revenue_model (CartRevenueConfig | None) – Revenue sampling strategy for hurdle metrics. None (default) uses LogNormal sampling. A CartRevenueConfig activates cart-based sampling (K=2 only). Ignored for binary metrics.
treatment_assignment (ndarray | None) – External per-visitor treatment-index array (ints in [0, K−1]). When supplied, bypasses internal randomisation and drives group membership exactly, which suits the sequential-experiment loop where a cell’s policy routes visitors by features. Requires multi-treatment truth (truth_result.p_values non-None); supplying this with K=2 paired truth raises ValueError.
- Returns:
Length K. Index 0 = control (name="control"); indices 1…K−1 are name="treatment_k" for K≥3, or name="treatment" for K=2. All VariantData contain observed columns and feature columns only (no truth fields).
- Return type:
list[VariantData]
- Raises:
ValueError – If truth_result is missing required potential outcome arrays, if treatment_assignment is supplied with K=2 paired truth, or if treatment_assignment has invalid length or out-of-range indices.
NotImplementedError – If revenue_model is non-None and K≥3 truth is present.
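The multi-treatment internal-randomisation path for a binary metric can be sketched as a categorical draw followed by a Bernoulli draw on the assigned level's potential outcome. This is an illustration under assumptions, not pytyche's sampling code: the constant per-level probabilities, the variable names, and the order of RNG calls are all invented here.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 6000, 3

# Per-level conversion potential outcomes (constant surfaces for brevity).
p_values = [np.full(n, 0.10), np.full(n, 0.15), np.full(n, 0.30)]

# Internal randomisation: categorical draw from fixed probabilities.
treatment = rng.choice(K, size=n, p=(0.5, 0.25, 0.25))

# Each visitor reveals only the potential outcome of the assigned level.
p_assigned = np.stack(p_values, axis=1)[np.arange(n), treatment]
converted = rng.random(n) < p_assigned

# Observed conversion rate per treatment level.
observed_rate = [float(converted[treatment == k].mean()) for k in range(K)]
```

For the external hook, the treatment array would be supplied by the caller instead of drawn with rng.choice; the selection and sampling steps stay the same.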
- pytyche.generators.core.build_bundle(variants, truth_result, metric_mode, experiment_id)
Assemble a CalibrationBundle from variant list and truth.
Accepts a list of K VariantData (K=2 for paired designs, K≥3 for multi-treatment). Stamps the canonical experiment_id onto each variant, validates the observed data schema, and performs an alignment check appropriate for the truth form:
K=2 (truth.cate_per_visitor non-None): standard CATE alignment.
K≥3 (truth.cate_per_visitor is None): aligns truth.p_per_visitor[0] against the total observed visitor count.
Fail-closed: raises on any validation or alignment violation.
- Parameters:
variants (list[VariantData]) – List of VariantData, one per treatment level. Index 0 = control, indices 1…K−1 = treated levels.
truth_result (TruthResult) – Internal truth intermediate from compute_potential_outcomes.
metric_mode (MetricMode) – Metric configuration. Provides metric_id for family derivation.
experiment_id (str) – Canonical experiment identifier stamped onto observed data.
- Returns:
(observed, truth), where observed is validated and truth-free.
- Return type:
CalibrationBundle
- Raises:
SchemaViolation – If observed data violates the visitor schema contract.
AlignmentViolation – If the truth array length doesn’t match total observed visitors.
ValueError – If alignment check fails at the n_visitors level.
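A fail-closed alignment check of the kind described above can be sketched as follows. AlignmentViolationSketch is a local stand-in, not pytyche's AlignmentViolation, and check_alignment is a hypothetical helper that compares only lengths; the real check may verify more than that.

```python
import numpy as np


class AlignmentViolationSketch(Exception):
    """Local stand-in for illustration; not pytyche's AlignmentViolation."""


def check_alignment(variant_sizes, truth_array):
    """Raise unless the truth array has one entry per observed visitor."""
    total = sum(variant_sizes)
    if len(truth_array) != total:
        raise AlignmentViolationSketch(
            f"truth has {len(truth_array)} rows, observed has {total}"
        )
    return total


def misaligned(variant_sizes, truth_array) -> bool:
    """Return True when the check raises."""
    try:
        check_alignment(variant_sizes, truth_array)
    except AlignmentViolationSketch:
        return True
    return False


n_ok = check_alignment([3, 2], np.zeros(5))
```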
- pytyche.generators.core.generate_v2_core(config)
Generate a CalibrationBundle via the v2 potential-outcomes pipeline.
- Pipeline stages:
sample_features: draw correlated mixed covariates X.
compute_potential_outcomes: derive per-visitor truth arrays.
assign_and_observe: assign treatment, sample observed outcomes.
build_bundle: assemble CalibrationBundle(observed, truth).
- Parameters:
config (V2GeneratorConfig) – Fully specified v2 generator configuration.
- Returns:
(observed, truth), where observed is truth-free and truth contains per-visitor CATE aligned with concatenated visitor rows.
- Return type:
CalibrationBundle
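The four stages can be illustrated end to end with a toy pipeline. Everything here is an assumption made for the sketch: tiny_pipeline is hypothetical, the surfaces are invented, plain dicts stand in for the observed and truth halves of a CalibrationBundle, and threading one seeded generator through all stages is a guess at how the seed yields reproducibility.

```python
import numpy as np


def tiny_pipeline(seed: int, n: int = 2000):
    rng = np.random.default_rng(seed)       # one generator for all stages
    age = rng.uniform(18.0, 78.0, size=n)   # 1. sample features
    p0 = np.full(n, 0.10)                   # 2. potential outcomes per visitor
    p1 = 0.10 + 0.05 * (age < 30)
    cate = p1 - p0
    treated = rng.random(n) < 0.5           # 3. assign treatment ...
    converted = rng.random(n) < np.where(treated, p1, p0)  # ... and observe
    # 4. bundle: observed data carries no truth fields.
    observed = {"age": age, "treated": treated, "converted": converted}
    truth = {"cate_per_visitor": cate, "effect": float(cate.mean())}
    return observed, truth


obs_a, truth_a = tiny_pipeline(seed=123)
obs_b, truth_b = tiny_pipeline(seed=123)
```

Calling the function twice with the same seed returns identical observed outcomes, and the truth array stays aligned row for row with the observed visitors.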