pytyche.validation

Runtime validators for v2 contracts.

These validators make the contract types executable — they enforce invariants at construction time rather than discovering violations downstream.

Used by both generators and loaders. Fail-closed: no silent acceptance of malformed data.

Functions:

  • validate_observed_data(data) — checks visitor schema, dtypes, invariants.

  • validate_alignment(array, data) — confirms array length matches visitors.

  • validate_rule(rule, data) — confirms rule features exist and are type-compatible.

Functions

  • validate_alignment(array, data): Confirm that a per-visitor array is aligned with concatenated visitors.

  • validate_observed_data(data, *, strict=True): Validate that all variant DataFrames conform to VISITOR_SCHEMA.

  • validate_rule(rule, data): Confirm that a segment rule's features exist in visitor data and have compatible types.

Exceptions

  • AlignmentViolation: Raised when a per-visitor array is misaligned with visitor rows.

  • RuleViolation: Raised when a segment rule references invalid features or types.

  • SchemaViolation: Raised when observed data violates the visitor schema contract.

exception pytyche.validation.SchemaViolation

Bases: Exception

Raised when observed data violates the visitor schema contract.

exception pytyche.validation.AlignmentViolation

Bases: Exception

Raised when a per-visitor array is misaligned with visitor rows.

exception pytyche.validation.RuleViolation

Bases: Exception

Raised when a segment rule references invalid features or types.

pytyche.validation.validate_observed_data(data, *, strict=True)

Validate that all variant DataFrames conform to VISITOR_SCHEMA.

Per-variant checks:

  1. All required columns are present.

  2. Column dtypes are compatible with the schema.

  3. revenue >= 0 for all rows.

  4. No duplicate visitor_id within a variant.

  5. n_visitors, n_conversions, and total_revenue match the DataFrame contents.

  6. Every row’s variant column matches VariantData.name.

  7. Every row’s experiment_id matches data.experiment_id.
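The per-variant checks above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not pytyche's implementation: `SCHEMA_SKETCH` is a hypothetical stand-in for VISITOR_SCHEMA, the boolean `converted` column used to count conversions is assumed, `per_variant_problems` is a hypothetical helper, and the sketch collects problems into a list where the real validator raises SchemaViolation.

```python
import math

import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype, is_string_dtype

# Hypothetical stand-in for VISITOR_SCHEMA: column -> dtype predicate.
# The real schema's columns and dtypes may differ; `converted` is assumed.
SCHEMA_SKETCH = {
    "visitor_id": is_string_dtype,
    "variant": is_string_dtype,
    "experiment_id": is_string_dtype,
    "converted": is_bool_dtype,
    "revenue": is_numeric_dtype,
}


def per_variant_problems(name, visitors, n_visitors, n_conversions,
                         total_revenue, experiment_id):
    """Return a list of problems for one variant; an empty list means it passed."""
    # 1. Required columns present (stop early: later checks index them).
    missing = sorted(set(SCHEMA_SKETCH) - set(visitors.columns))
    if missing:
        return [f"{name}: missing columns {missing}"]
    problems = []
    # 2. Dtype compatibility, judged by one pandas dtype predicate per column.
    for col, is_ok in SCHEMA_SKETCH.items():
        if not is_ok(visitors[col]):
            problems.append(f"{name}: column {col!r} has dtype {visitors[col].dtype}")
    if problems:
        return problems  # the value checks below assume correct dtypes
    # 3. Revenue is never negative.
    if (visitors["revenue"] < 0).any():
        problems.append(f"{name}: negative revenue")
    # 4. visitor_id is unique within the variant.
    if visitors["visitor_id"].duplicated().any():
        problems.append(f"{name}: duplicate visitor_id")
    # 5. Summary fields agree with the rows.
    if n_visitors != len(visitors):
        problems.append(f"{name}: n_visitors={n_visitors}, rows={len(visitors)}")
    if n_conversions != int(visitors["converted"].sum()):
        problems.append(f"{name}: n_conversions does not match converted rows")
    if not math.isclose(total_revenue, float(visitors["revenue"].sum())):
        problems.append(f"{name}: total_revenue does not match revenue sum")
    # 6. Every row's variant column names this variant.
    if (visitors["variant"] != name).any():
        problems.append(f"{name}: rows labeled with another variant")
    # 7. Every row belongs to the experiment being validated.
    if (visitors["experiment_id"] != experiment_id).any():
        problems.append(f"{name}: rows from another experiment")
    return problems
```

Returning early after the column and dtype checks keeps the value checks from failing on a column that cannot be compared numerically.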

Cross-variant checks:

  1. No visitor_id appears in more than one variant (a visitor can only be assigned to one arm).
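A minimal sketch of the cross-variant check, assuming each variant's visitors are a pandas DataFrame with a visitor_id column. The helper `visitors_in_multiple_variants` and its dict-of-DataFrames input are hypothetical stand-ins for iterating over data.variants.

```python
from collections import defaultdict

import pandas as pd


def visitors_in_multiple_variants(visitors_by_variant):
    """Map each visitor_id found in two or more variants to those variant names."""
    arms = defaultdict(set)
    for name, visitors in visitors_by_variant.items():
        for visitor_id in visitors["visitor_id"].unique():
            arms[visitor_id].add(name)
    # An empty result means every visitor was assigned to exactly one arm.
    return {vid: sorted(names) for vid, names in arms.items() if len(names) > 1}


control = pd.DataFrame({"visitor_id": ["a", "b"]})
treatment = pd.DataFrame({"visitor_id": ["b", "c"]})
print(visitors_in_multiple_variants({"control": control, "treatment": treatment}))
# {'b': ['control', 'treatment']}
```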

Strict-mode checks (strict=True, the default):

  1. All variants have the same set of extra feature columns (beyond VISITOR_SCHEMA).

  2. Extra feature columns have consistent dtypes across variants.
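The two strict-mode checks can be sketched as follows, assuming each variant's visitors are a pandas DataFrame. `SCHEMA_COLUMNS` is a hypothetical subset of VISITOR_SCHEMA (any other column counts as an extra feature column here), and `strict_feature_problems` is a hypothetical helper that compares every variant against the first one.

```python
import pandas as pd

# Hypothetical subset of VISITOR_SCHEMA column names; anything else counts
# as an "extra feature column" in this sketch.
SCHEMA_COLUMNS = {"visitor_id", "variant", "experiment_id", "revenue"}


def strict_feature_problems(visitors_by_variant):
    """Return problems with extra feature columns across variants."""
    if not visitors_by_variant:
        return []
    extras = {name: set(df.columns) - SCHEMA_COLUMNS
              for name, df in visitors_by_variant.items()}
    problems = []
    # 1. Every variant carries the same set of extra columns as the first one.
    ref_name, ref_cols = next(iter(extras.items()))
    for name, cols in extras.items():
        if cols != ref_cols:
            problems.append(
                f"{name}: extra columns {sorted(cols)} differ from "
                f"{ref_name}: {sorted(ref_cols)}"
            )
    # 2. Extra columns present in every variant share a single dtype.
    shared = set.intersection(*extras.values())
    for col in sorted(shared):
        dtypes = {name: str(df[col].dtype) for name, df in visitors_by_variant.items()}
        if len(set(dtypes.values())) > 1:
            problems.append(f"{col!r}: inconsistent dtypes {dtypes}")
    return problems
```

In this sketch an integer `age` column in one arm and a float `age` column in another is reported under check 2, while a `survey` column present only in treatment is reported under check 1.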

Set strict=False when feature-column asymmetry across variants is intentional. Example: a treatment arm collects an extra survey response column that doesn’t exist for control:

# Treatment adds a post-checkout "why did you buy?" column.
# Control visitors never see the survey, so the column is absent.
validate_observed_data(data, strict=False)

Parameters:
  • data (ObservedExperimentData) – The observed experiment data to validate.

  • strict (bool) – If True (default), require feature-column consistency across variants. If False, skip cross-variant column checks.

Raises:

SchemaViolation – On any violation, with a message identifying the variant and the specific problem.

Return type:

None

pytyche.validation.validate_alignment(array, data)

Confirm that a per-visitor array is aligned with concatenated visitors.

The expected length is the sum of n_visitors across all variants, which equals the row count of:

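pd.concat([v.visitors for v in data.variants], ignore_index=True)

Parameters:
  • array – The per-visitor array whose length is checked.

  • data – The observed experiment data whose concatenated visitor rows define the expected length.

Raises: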

AlignmentViolation – If the array length doesn’t match.

Return type:

None
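The length rule can be sketched directly. Here `check_alignment` is a hypothetical helper that takes a list of visitor DataFrames (standing in for `[v.visitors for v in data.variants]`) and raises a plain ValueError where pytyche raises AlignmentViolation. The example also shows why the check matters: once the lengths agree, position i in the array lines up with row i of the `ignore_index=True` concatenation.

```python
import numpy as np
import pandas as pd


def check_alignment(array, frames):
    """Raise ValueError unless len(array) equals the total visitor row count.

    `frames` stands in for [v.visitors for v in data.variants].
    """
    expected = sum(len(df) for df in frames)
    if len(array) != expected:
        raise ValueError(f"array has {len(array)} entries, expected {expected}")


control = pd.DataFrame({"visitor_id": ["a", "b"]})
treatment = pd.DataFrame({"visitor_id": ["c", "d", "e"]})
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

check_alignment(scores, [control, treatment])  # passes: 2 + 3 == 5

# Once aligned, position i in `scores` refers to row i of the concatenation.
combined = pd.concat([control, treatment], ignore_index=True)
combined["score"] = scores
```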

pytyche.validation.validate_rule(rule, data)

Confirm that a segment rule’s features exist in visitor data and have compatible types.

Checks:

  1. Each clause’s feature column exists in at least one variant’s DataFrame.

  2. Numeric rules (ComparisonRule, BetweenRule) reference numeric columns.

  3. Categorical rules (EqRule, InRule) reference non-numeric columns.

Does NOT check categorical rule values against a set of allowed values; there is no domain registry in Phase 1 scope.

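Parameters:
  • rule – The segment rule whose clauses are checked.

  • data – The observed experiment data whose variant DataFrames are searched for each clause's feature column.

Raises: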

RuleViolation – If a feature is missing or type-incompatible.

Return type:

None
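The three rule checks can be sketched as below. The rule classes here are hypothetical stand-ins that model only an assumed `feature` attribute (the real ComparisonRule, BetweenRule, EqRule, and InRule may have different fields), `rule_problems` is a hypothetical helper that receives the rule's clauses as a flat list, and the sketch reads the dtype from the first variant that has the column, which assumes dtypes are consistent across variants as strict mode requires.

```python
from dataclasses import dataclass

import pandas as pd
from pandas.api.types import is_numeric_dtype


# Hypothetical stand-ins for the rule classes named above; only the
# `feature` attribute is modeled, and the real classes may differ.
@dataclass
class ComparisonRule:
    feature: str


@dataclass
class BetweenRule:
    feature: str


@dataclass
class EqRule:
    feature: str


@dataclass
class InRule:
    feature: str


NUMERIC_RULES = (ComparisonRule, BetweenRule)
CATEGORICAL_RULES = (EqRule, InRule)


def rule_problems(clauses, frames):
    """Return problems for a flat list of clauses checked against variant DataFrames."""
    problems = []
    for clause in clauses:
        # 1. The feature must exist in at least one variant's DataFrame.
        holders = [df for df in frames if clause.feature in df.columns]
        if not holders:
            problems.append(f"{clause.feature!r}: not found in any variant")
            continue
        # Assumes the dtype is the same in every variant that has the column.
        numeric = is_numeric_dtype(holders[0][clause.feature])
        kind = type(clause).__name__
        # 2. Numeric rules need a numeric column.
        if isinstance(clause, NUMERIC_RULES) and not numeric:
            problems.append(f"{kind} on {clause.feature!r} needs a numeric column")
        # 3. Categorical rules need a non-numeric column.
        if isinstance(clause, CATEGORICAL_RULES) and numeric:
            problems.append(f"{kind} on {clause.feature!r} needs a non-numeric column")
    return problems
```

Note that pandas treats boolean columns as numeric under `is_numeric_dtype`, so a real implementation may need an explicit decision about which rule family a bool feature belongs to.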