Experiment Tracking¶

How pytyche’s calibration SBC sweeps and their downstream artifacts are versioned, logged, and reproduced: what each piece is for, how the pieces fit together, and the canonical invocations.

Note

This page is for developers running calibration sweeps from a source checkout of the repository — the pipeline files it describes (dvc.yaml, params.yaml, scripts/) are not part of the installed package.

Pieces¶

Artifact	Where	Owns
`dvc.yaml`	repo root	The pipeline DAG: `train_sweep → {fit_corrections, test_sweep} → evaluate_layered`.
`params.yaml`	repo root	Sweep IDs + per-stage CLI arguments. Edit before each pilot run.
`manifest.json`	each sweep dir	Per-sweep metadata: experiment_id, git sha + dirty flag, env (python/platform), params, data_provenance, and the `pytyche.calibration` extension content. Schema at `docs/specs/experiment-manifest-schema.json`.
`dvclive/`	each sweep dir	Per-fit metrics (`coverage_<level>`, `rmse`, `bias`, `wall_seconds`) and params (`config_seed`, `n_visitors`, `generator_family`). Inspectable as TSV via `cat sweep_dir/dvclive/metrics.tsv`.
`summary.json`	per-config dir	Existing per-fit summary (preserved unchanged from pre-tracking sweeps; consumed by `scripts/fit_sbc_correction.py`).
`aggregate.csv`	each sweep dir	Existing per-sweep aggregate (preserved unchanged).
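Because every artifact in the table is a plain file, a sweep can be inspected with nothing beyond the Python standard library. The following is a minimal sketch, not part of pytyche: `load_sweep` and `mean_metric` are hypothetical helper names, and the code assumes that `dvclive/metrics.tsv` starts with a header row naming the metric columns.

```python
import csv
import json
from pathlib import Path


def load_sweep(sweep_dir):
    """Return (manifest dict, list of per-fit metric rows) for one sweep directory."""
    sweep_dir = Path(sweep_dir)
    manifest = json.loads((sweep_dir / "manifest.json").read_text(encoding="utf-8"))
    # Assumption: metrics.tsv has a header row with column names such as rmse and bias.
    metrics_path = sweep_dir / "dvclive" / "metrics.tsv"
    with open(metrics_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    return manifest, rows


def mean_metric(rows, name):
    """Average one numeric metric column across fits, skipping blank cells."""
    values = [float(r[name]) for r in rows if r.get(name) not in (None, "")]
    return sum(values) / len(values) if values else float("nan")
```

Calling `load_sweep("runs/<id>")` and then `mean_metric(rows, "rmse")` gives a quick per-sweep summary without starting any service.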

Canonical invocations¶

# Edit params.yaml first: sweep IDs (train_sweep.id, test_sweep.id), master_seed, n_configs.
dvc repro train_sweep         # ~5 GPU-hr at the default n_configs=50; ~10 GPU-hr at n_configs=100.
dvc repro test_sweep          # ~1 GPU-hr at n_configs=10.
dvc repro fit_corrections     # quick (~minutes); fits R(p) + scale-family from the train sweep.
dvc repro evaluate_layered    # quick; applies corrections to the test sweep and scores.
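
The three cost estimates in the comments above are all consistent with roughly 0.1 GPU-hr per config. As a rough planning aid, that can be written as a linear rule; the linearity is an inference from those three figures rather than a measured guarantee, and `estimated_gpu_hours` is a hypothetical helper.

```python
# Rate implied by "~5 GPU-hr at the default n_configs=50".
GPU_HOURS_PER_CONFIG = 5 / 50


def estimated_gpu_hours(n_configs):
    """Rough sweep cost, assuming cost scales linearly with n_configs."""
    return n_configs * GPU_HOURS_PER_CONFIG
```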

The sweep IDs in params.yaml are the canonical cross-sweep identifiers. They’re used both as DVC’s stable output paths (runs/<id>/) and as the manifest’s experiment_id cross-reference target (test sweeps link to their train sweep via pytyche.calibration.links.trained_correction_from).
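
The cross-reference described above can be followed mechanically. This is a minimal sketch under two assumptions: the linked ID maps directly to a directory under `runs/`, and each sweep directory holds its own `manifest.json`. The name `train_sweep_dir` is hypothetical.

```python
import json
from pathlib import Path


def train_sweep_dir(test_sweep_dir, runs_root="runs"):
    """Follow a test sweep's manifest link to the directory of its train sweep.

    Returns None when the manifest carries no trained_correction_from link.
    """
    manifest_path = Path(test_sweep_dir) / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    calibration = manifest.get("pytyche", {}).get("calibration", {})
    links = calibration.get("links") or {}
    linked_id = links.get("trained_correction_from")
    if linked_id is None:
        return None
    # Assumption: sweep outputs live at <runs_root>/<id>/.
    return Path(runs_root) / linked_id
```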

Manifest schema (required top-level fields)¶

Field	Type	Notes
`manifest_schema_version`	integer	Currently `1`. Monotonic; bumped on breaking schema changes.
`experiment_id`	string	`{iso8601_utc}_{short_sha}`, e.g. `"2026-05-27T12-34-56Z_abc1234"`.
`timestamp_utc`	string	ISO 8601 UTC of experiment start.
`git`	object	`{sha, dirty, branch}`.
`env`	object	`{python, platform}` + optional library versions.
`params`	object	Free-form per-experiment hyperparameters.
`data_provenance`	object	Discriminated union: `{kind: "synthetic", seed: int}` or `{kind: "external", hashes: {name: sha256}}`.
`pytyche`	object	Reserved namespace for per-capability extension content. Currently has `calibration`.
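
The table above translates into a lightweight structural check. The sketch below is illustrative only: the authoritative definition is the JSON Schema file referenced under Pieces, `manifest_problems` is a hypothetical name, and the `experiment_id` pattern (including a 7 to 40 character hex short SHA) is an assumption inferred from the single example in the table.

```python
import re

# Required top-level fields and their JSON types, as listed in the table.
REQUIRED_FIELDS = {
    "manifest_schema_version": int,
    "experiment_id": str,
    "timestamp_utc": str,
    "git": dict,
    "env": dict,
    "params": dict,
    "data_provenance": dict,
    "pytyche": dict,
}

# Assumption: IDs look like "2026-05-27T12-34-56Z_abc1234".
EXPERIMENT_ID_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z_[0-9a-f]{7,40}$")


def manifest_problems(manifest):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in manifest:
            problems.append(f"missing field: {name}")
            continue
        value = manifest[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    exp_id = manifest.get("experiment_id")
    if isinstance(exp_id, str) and not EXPERIMENT_ID_RE.match(exp_id):
        problems.append("experiment_id does not match the assumed pattern")
    prov = manifest.get("data_provenance")
    if isinstance(prov, dict):
        kind = prov.get("kind")
        if kind == "synthetic":
            if not isinstance(prov.get("seed"), int):
                problems.append("synthetic provenance needs an integer seed")
        elif kind == "external":
            if not isinstance(prov.get("hashes"), dict):
                problems.append("external provenance needs a hashes object")
        else:
            problems.append("data_provenance.kind must be synthetic or external")
    return problems
```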

`pytyche.calibration` extension content¶

Set automatically by scripts/calibration_sweep_v2.py:

{
  "pytyche": {
    "calibration": {
      "master_seed": 20260527,
      "n_configs": 50,
      "scales": [250000],
      "generator_family": "v2",
      "sweep_kind": "train",         // or "test", "exploratory"
      "save_samples": false,         // true for test sweeps (--save-samples)
      "links": {                     // only present on test sweeps
        "trained_correction_from": "2026-05-27T12-34-56Z_abc1234"
      }
    }
  }
}
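
The comments in the example imply a few consistency rules that can be checked after a sweep finishes. This sketch encodes one reading of them: `calibration_problems` is a hypothetical helper, and treating a missing link on a test sweep as a problem is an assumption rather than a documented rule.

```python
# Sweep kinds named in the example above.
SWEEP_KINDS = {"train", "test", "exploratory"}


def calibration_problems(cal):
    """Check a pytyche.calibration block against the rules noted in the example."""
    problems = []
    kind = cal.get("sweep_kind")
    if kind not in SWEEP_KINDS:
        problems.append(f"unknown sweep_kind: {kind!r}")
    has_links = "links" in cal
    if kind == "test":
        links = cal.get("links") or {}
        # Assumption: every test sweep should name the train sweep it depends on.
        if "trained_correction_from" not in links:
            problems.append("test sweep lacks links.trained_correction_from")
    elif has_links:
        problems.append("links should only be present on test sweeps")
    return problems
```

For example, a block with `"sweep_kind": "train"` and no `links` key passes, while the same block with a `links` key is flagged.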

Why DVC + dvclive (not MLflow / wandb)¶

File-based discipline. Inspect with cat/grep/jq/pandas — no SaaS, no UI as source of truth.
Git-versioned config + content-addressed artifacts. Sweep IDs in params.yaml are the cross-stage identifiers; DVC caches large outputs and gives reproducibility on top.
Reserved per-capability namespace. Future capabilities (e.g., a Thompson-allocation tracking layer) extend the manifest under pytyche.<capability> rather than reinventing a manifest layer for each area.

MLflow, wandb, and Aim were considered and rejected: each makes a service or a UI the source of truth, where this workflow wants plain files that survive grep and version control.
