pytyche.experiments.manifest

Versioned experiment manifest builder, validator, and atomic writer.

The canonical JSON Schema document lives at docs/specs/experiment-manifest-schema.json. Validation is hand-rolled rather than via jsonschema so error messages can name the offending field directly, and so this module imports cleanly without heavy ML deps.

Module Attributes

MANIFEST_SCHEMA_VERSION

Current schema version.

Functions

build_manifest(*, experiment_id, params, ...)

Build a manifest dict with required top-level fields populated.

validate_manifest(manifest)

Validate a manifest; raise ValueError on any violation.

write_manifest(manifest, path)

Atomically write manifest as JSON to path.

pytyche.experiments.manifest.MANIFEST_SCHEMA_VERSION: int = 1

Current schema version. Bump for breaking changes; keep in sync with docs/specs/experiment-manifest-schema.json ($id v1 → version 1).

pytyche.experiments.manifest.build_manifest(*, experiment_id, params, pytyche_extensions, data_provenance)

Build a manifest dict with required top-level fields populated.

git (sha / dirty / branch), env (python / platform), and timestamp_utc are resolved at call time from the ambient environment (git via subprocess against Path.cwd(); env via sys and platform; timestamp_utc via datetime.now(timezone.utc)).
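A minimal sketch of how these ambient values could be collected, assuming plain git subcommands and ISO 8601 timestamps. The helper names `_git` and `resolve_ambient`, the exact git invocations, and the None fallback outside a repository are illustrative assumptions, not the module's actual code.

```python
import platform
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def _git(*args: str) -> Optional[str]:
    """Run git in the current directory; return None if git is missing or fails."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=Path.cwd(),
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def resolve_ambient() -> dict[str, Any]:
    """Hypothetical helper: gather git, env, and timestamp_utc at call time."""
    # Any output from `git status --porcelain` means the working tree is dirty.
    status = _git("status", "--porcelain")
    return {
        "git": {
            "sha": _git("rev-parse", "HEAD"),
            "dirty": None if status is None else bool(status),
            "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        },
        "env": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Because everything is read at call time, two calls from different working directories or commits can yield different git blocks for otherwise identical arguments.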

pytyche_extensions is a {capability: content} dict; each entry is nested under the reserved top-level pytyche key, e.g. pytyche_extensions={"calibration": {...}} lands at manifest["pytyche"]["calibration"]. An empty extensions dict still produces an empty manifest["pytyche"] object so consumers can rely on the key existing.
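The nesting rule can be illustrated with a small stand-in. `nest_extensions` is a hypothetical helper, and the "method" entry is placeholder content rather than a documented calibration field.

```python
from typing import Any


def nest_extensions(pytyche_extensions: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: place each capability under the reserved "pytyche" key."""
    # Shallow copy, so adding or removing capabilities on the caller's dict
    # afterwards does not change the manifest fragment.
    return {"pytyche": dict(pytyche_extensions)}


fragment = nest_extensions({"calibration": {"method": "isotonic"}})
assert fragment["pytyche"]["calibration"] == {"method": "isotonic"}

# An empty mapping still yields the key, with an empty object as its value.
assert nest_extensions({}) == {"pytyche": {}}
```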

The returned dict is intentionally not validated here — call validate_manifest() separately if you want to assert correctness before writing.

Parameters:
  • experiment_id (str) – Unique identifier, conventionally {iso8601}_{short_sha}.

  • params (dict[str, Any]) – Free-form per-experiment hyperparameters.

  • pytyche_extensions (dict[str, Any]) – Mapping from capability name (e.g. "calibration") to its nested content object. Goes under the reserved pytyche top-level key.

  • data_provenance (dict[str, Any]) – Discriminated-union dict; either {"kind": "synthetic", "seed": int} or {"kind": "external", "hashes": {name: sha256_hex}}.

Returns:

The constructed manifest. Caller owns persistence.

Return type:

dict[str, Any]

pytyche.experiments.manifest.validate_manifest(manifest)

Validate a manifest; raise ValueError on any violation.

The parameter is typed as object rather than dict because callers pass in the result of json.load() on untrusted files, so the runtime value may be a list, None, a string, and so on. The initial isinstance check is the boundary: the remaining checks run only once the value is known to be a dict.

Checks performed (each failure raises ValueError with a message that names the offending field or fields; there are no generic catch-all messages):

  1. Top-level value is a dict.

  2. Required top-level fields are present (names listed in the error).

  3. No foreign top-level keys (anything not required and not pytyche).

  4. data_provenance is a dict with a valid kind discriminator, and for kind == "external" the hashes map is non-empty.

Passes silently when valid.

Parameters:

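manifest (object) – The value to validate, typically the result of json.load() and therefore not guaranteed to be a dict.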

Return type:

None
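The four checks listed above could be hand-rolled roughly as follows. This is a sketch under stated assumptions: the REQUIRED set is inferred from the fields that build_manifest() is described as populating, and the authoritative list is the schema document. The name `validate_sketch` and the message wording are illustrative only.

```python
# Assumed required fields; the real list is defined by
# docs/specs/experiment-manifest-schema.json and may differ.
REQUIRED = frozenset(
    {"experiment_id", "params", "data_provenance", "git", "env", "timestamp_utc"}
)
RESERVED = "pytyche"
KINDS = ("synthetic", "external")


def validate_sketch(manifest: object) -> None:
    """Hypothetical validator: raise ValueError naming the offending field(s)."""
    # 1. Boundary check: nothing below runs unless the value is a dict.
    if not isinstance(manifest, dict):
        raise ValueError(f"manifest must be a dict, got {type(manifest).__name__}")

    # 2. Every required field must be present; list all that are missing.
    missing = [name for name in sorted(REQUIRED) if name not in manifest]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # 3. Anything that is neither required nor the reserved key is foreign.
    foreign = sorted(
        str(key) for key in manifest if key not in REQUIRED and key != RESERVED
    )
    if foreign:
        raise ValueError(f"unexpected top-level keys: {foreign}")

    # 4. data_provenance is a discriminated union keyed on "kind".
    prov = manifest["data_provenance"]
    if not isinstance(prov, dict):
        raise ValueError("data_provenance must be a dict")
    kind = prov.get("kind")
    if kind not in KINDS:
        raise ValueError(f"data_provenance.kind must be one of {KINDS}, got {kind!r}")
    if kind == "external":
        hashes = prov.get("hashes")
        if not isinstance(hashes, dict) or not hashes:
            raise ValueError(
                "data_provenance.hashes must be a non-empty dict when kind is 'external'"
            )
```

Checking in this order means each later step can index into the manifest safely, and a caller fixing errors one at a time always sees the most fundamental problem first.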

pytyche.experiments.manifest.write_manifest(manifest, path)

Atomically write manifest as JSON to path.

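Writes to a temporary file in the same directory as path, then calls os.replace, so a partially written file is never visible at path. Keeping the temporary file in the destination directory keeps the rename on one filesystem, where it is atomic on POSIX. The temporary file is removed on failure.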

Parameters:
  • manifest (dict[str, Any]) – The manifest dict. Caller is responsible for validating it first if desired — this function only persists.

  • path (str | Path) – Destination path. The parent directory must already exist.

Return type:

None
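The temp-file-plus-rename pattern described for write_manifest() can be sketched with the standard library alone. `write_json_atomically` is a hypothetical stand-in; the indent and sort_keys settings and the fsync call are assumptions, not documented behaviour of the real function.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Union


def write_json_atomically(manifest: dict[str, Any], path: Union[str, Path]) -> None:
    """Hypothetical stand-in for write_manifest: temp file, then os.replace."""
    dest = Path(path)
    # Create the temp file next to the destination so the final rename
    # stays on one filesystem. The parent directory must already exist.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(manifest, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # assumption: flush to disk before renaming
        os.replace(tmp_name, dest)
    except BaseException:
        # Serialization or the rename failed: remove the temp file, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If serialization fails partway through, any existing file at the destination is left untouched, because the only operation that ever touches the destination is the final os.replace.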