Experiment Tracking
How pytyche’s calibration SBC sweeps and their downstream artifacts are versioned, logged, and reproduced: what each piece is for, how the pieces fit together, and the canonical invocations.
Note
This page is for developers running calibration sweeps from a source
checkout of the repository — the pipeline files it describes
(dvc.yaml, params.yaml, scripts/) are not part of the installed
package.
Pieces
| Artifact | Where | Owns |
|---|---|---|
| | repo root | The pipeline DAG: |
| | repo root | Sweep IDs + per-stage CLI arguments. Edit before each pilot run. |
| | each sweep dir | Per-sweep metadata: experiment_id, git sha + dirty flag, env (python/platform), params, data_provenance, and the |
| | each sweep dir | Per-fit metrics ( |
| | per-config dir | Existing per-fit summary (preserved unchanged from pre-tracking sweeps; consumed by |
| | each sweep dir | Existing per-sweep aggregate (preserved unchanged). |
Canonical invocations
```shell
# Edit params.yaml first: sweep IDs (train_sweep.id, test_sweep.id), master_seed, n_configs.
dvc repro train_sweep       # ~5 GPU-hr at the default n_configs=50; ~10 GPU-hr at n_configs=100.
dvc repro test_sweep        # ~1 GPU-hr at n_configs=10.
dvc repro fit_corrections   # quick (~minutes); fits R(p) + scale-family from the train sweep.
dvc repro evaluate_layered  # quick; applies corrections to the test sweep and scores.
```
The sweep IDs in params.yaml are the canonical cross-sweep identifiers. They’re used both as DVC’s stable output paths (runs/<id>/) and as the manifest’s experiment_id cross-reference target (test sweeps link to their train sweep via pytyche.calibration.links.trained_correction_from).
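The four invocations above can also be driven from a small script. The following is a minimal sketch, not part of the repository: the helper names `repro_commands` and `run_pipeline` are hypothetical, and the stage order is just the order in which the commands are listed on this page. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List, Sequence

# Stage names copied from the canonical invocations, in the order listed there.
STAGES = ("train_sweep", "test_sweep", "fit_corrections", "evaluate_layered")


def repro_commands(stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Build one `dvc repro <stage>` argv per stage, preserving order."""
    return [["dvc", "repro", stage] for stage in stages]


def run_pipeline(stages: Sequence[str] = STAGES, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them one by one.

    Hypothetical helper: with dry_run=False it requires `dvc` on PATH and
    must be started from the repo root, where dvc.yaml lives.
    """
    commands = repro_commands(stages)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError, so a failing stage
            # stops the run before later stages start.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping the dry run as the default is deliberate: the train sweep costs several GPU-hours, so it seems safer to require an explicit `dry_run=False` before anything is launched.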
Manifest schema (required top-level fields)
| Field | Type | Notes |
|---|---|---|
| | integer | Currently |
| | string | |
| | string | ISO 8601 UTC of experiment start. |
| | object | |
| | object | |
| | object | Free-form per-experiment hyperparameters. |
| | object | Discriminated union: |
| | object | Reserved namespace for per-capability extension content. Currently has |
pytyche.calibration extension content
Set automatically by scripts/calibration_sweep_v2.py:
```json
{
  "pytyche": {
    "calibration": {
      "master_seed": 20260527,
      "n_configs": 50,
      "scales": [250000],
      "generator_family": "v2",
      "sweep_kind": "train",   // or "test", "exploratory"
      "save_samples": false,   // true for test sweeps (--save-samples)
      "links": {               // only present on test sweeps
        "trained_correction_from": "2026-05-27T12-34-56Z_abc1234"
      }
    }
  }
}
```
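As an illustration of how a consumer might read this block, here is a minimal sketch that checks `sweep_kind` and follows `links.trained_correction_from` to the train sweep's directory. It makes several assumptions: the function names are hypothetical, the input is the `{"pytyche": ...}` object itself (however it is nested in the manifest), the "only on test sweeps" comment is treated as a hard rule, and the linked ID is assumed to map to `runs/<id>/` as described for sweep IDs.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Values taken from the comment on "sweep_kind" in the example block.
SWEEP_KINDS = {"train", "test", "exploratory"}


def check_calibration_block(ext: Dict[str, Any]) -> Dict[str, Any]:
    """Return the inner pytyche.calibration dict after basic checks.

    Hypothetical validator: the rules mirror the comments in the example,
    not a schema published by the project.
    """
    cal = ext["pytyche"]["calibration"]
    kind = cal["sweep_kind"]
    if kind not in SWEEP_KINDS:
        raise ValueError(f"unknown sweep_kind: {kind!r}")
    if "links" in cal and kind != "test":
        raise ValueError("links is only expected on test sweeps")
    return cal


def linked_train_dir(ext: Dict[str, Any], runs_root: Path = Path("runs")) -> Optional[Path]:
    """Resolve the train sweep directory a test sweep points at, if any."""
    cal = check_calibration_block(ext)
    train_id = cal.get("links", {}).get("trained_correction_from")
    # Assumption: sweep IDs double as output paths, i.e. runs/<id>/.
    return runs_root / train_id if train_id else None


test_ext = {
    "pytyche": {
        "calibration": {
            "sweep_kind": "test",
            "save_samples": True,
            "links": {"trained_correction_from": "2026-05-27T12-34-56Z_abc1234"},
        }
    }
}
train_ext = {"pytyche": {"calibration": {"sweep_kind": "train", "save_samples": False}}}
```

Calling `linked_train_dir(test_ext)` yields the path `runs/2026-05-27T12-34-56Z_abc1234`, while `linked_train_dir(train_ext)` returns `None` because train sweeps carry no link.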
Why DVC + dvclive (not MLflow / wandb)
- File-based discipline. Inspect with cat/grep/jq/pandas; no SaaS, no UI as source of truth.
- Git-versioned config + content-addressed artifacts. Sweep IDs in params.yaml are the cross-stage identifiers; DVC caches large outputs and gives reproducibility on top.
- Reserved per-capability namespace. Future capabilities (e.g., a Thompson-allocation tracking layer) extend the manifest under pytyche.<capability> rather than reinventing a manifest layer per area.
MLflow, wandb, and Aim were considered and rejected: each makes a
service or a UI the source of truth, where this workflow wants plain
files that survive grep and version control.
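To make the plain-files point concrete, the sketch below indexes every sweep's manifest by `experiment_id` using only the standard library. It rests on assumptions: `load_manifests` is a hypothetical helper, manifests are assumed to be JSON files sitting directly inside each sweep directory, and the file name `manifest.json` is a placeholder because this page does not name the file.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_manifests(runs_root: Path, filename: str = "manifest.json") -> Dict[str, Dict[str, Any]]:
    """Index every <runs_root>/<sweep>/<filename> by its experiment_id.

    `filename` is a hypothetical default; pass the real manifest name.
    """
    manifests: Dict[str, Dict[str, Any]] = {}
    for path in sorted(runs_root.glob(f"*/{filename}")):
        with path.open(encoding="utf-8") as fh:
            manifest = json.load(fh)
        # experiment_id is listed among the per-sweep metadata fields.
        manifests[manifest["experiment_id"]] = manifest
    return manifests
```

A call such as `load_manifests(Path("runs"))` returns a plain dict, which can be handed to pandas or inspected directly without any tracking service running.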