---
title: "Experiment structure: cells and allocation"
review-state: drafting
last-human-review: "2026-06-12"
depends-on:
  - src/pytyche/experiment/cells.py
  - src/pytyche/experiment/recommendation.py
  - src/pytyche/analysis/_thompson.py
  - src/pytyche/viz/_cells.py
owner: tradcliffe
quadrant: concept
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Experiment structure: cells and allocation

An adaptive experiment has three jobs that pull in different
directions: **measure** honestly (you need a live control reference,
forever), **learn** what works for whom (every arm needs data in every
kind of visitor, or you can't tell), and **earn** (once you know what
works, traffic spent elsewhere is money left on the table). Pytyche's
answer is structural: each round's traffic is split into **cells**,
and each cell does exactly one of those jobs.

This page explains the canonical cell structure and the allocation
rule inside it — with pictures. The full design rationale lives in
[sequential targeting](sequential-targeting.md); the hands-on
walkthrough is the
[first adaptive experiment](../tutorials/first-adaptive-experiment.md)
tutorial.

## The canonical three cells

Every round, the recommendation engine proposes the same three-cell
shape. The Control and Explore weights are the operator's dials,
`min_control_weight` and `min_explore_weight` on
`pt.sequential_experiment(...)`; the defaults are shown here:

```{code-cell} ipython3
from types import SimpleNamespace

import pytyche as pt

# plot_cells reads `id` and `weight` — in real use you pass the
# engine's own cells (`plan.cells`), as the adaptive tutorial does.
layout = [
    SimpleNamespace(id="control", weight=0.05),
    SimpleNamespace(id="explore", weight=0.05),
    SimpleNamespace(id="optimized", weight=0.90),
]
pt.viz.plot_cells(layout);
```

- **Control** — everyone gets the baseline. This is the permanent
  measurement reference: however confident the model gets, lift is
  always "compared to a live control," never "compared to what we
  remember." It is also never droppable.
- **Explore** — uniform random over all active arms, ignoring the
  model entirely. This is the floor that keeps the experiment
  *identified*: every arm keeps receiving data in every segment, so
  the model can always re-estimate any arm anywhere — including
  noticing that an arm it wrote off has started working (population
  drift, product changes).
- **Optimized** — the remaining ~90%, routed by the current policy:
  a shallow decision tree assigns each visitor to a discovered
  segment, and the segment's allocation decides which arm they get.
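
Here is a minimal sketch of the routing just described. The helpers draw a visitor's cell and then an arm, and they compute the exact probability that a visitor in a given segment receives each arm. They assume cell membership is drawn independently per visitor with probability equal to the cell weight. They also take the segment's Optimized-cell allocation as a given dict. `route_visitor`, `assignment_probabilities`, and the arm names are hypothetical illustrations, not pytyche's implementation.

```{code-cell} ipython3
import numpy as np

# Hypothetical arms, plus the default cell weights from the layout above.
ARMS = ["control", "blue_button", "free_shipping"]
CELL_WEIGHTS = {"control": 0.05, "explore": 0.05, "optimized": 0.90}


def assignment_probabilities(segment_allocation):
    """Exact P(arm) for a visitor in one segment, summed over the three cells."""
    probs = {}
    for arm in ARMS:
        # Explore: uniform over all active arms, regardless of the model.
        p = CELL_WEIGHTS["explore"] / len(ARMS)
        # Optimized: the segment's current allocation.
        p += CELL_WEIGHTS["optimized"] * segment_allocation.get(arm, 0.0)
        # Control: everyone in this cell gets the baseline.
        if arm == "control":
            p += CELL_WEIGHTS["control"]
        probs[arm] = p
    return probs


def route_visitor(segment_allocation, rng):
    """Draw a cell, then an arm within it; returns (cell, arm)."""
    cells = list(CELL_WEIGHTS)
    cell = str(rng.choice(cells, p=[CELL_WEIGHTS[c] for c in cells]))
    if cell == "control":
        arm = "control"
    elif cell == "explore":
        arm = str(rng.choice(ARMS))
    else:
        arms = list(segment_allocation)
        arm = str(rng.choice(arms, p=[segment_allocation[a] for a in arms]))
    return cell, arm


allocation = {"control": 0.1, "blue_button": 0.2, "free_shipping": 0.7}
probs = assignment_probabilities(allocation)
print({arm: round(p, 4) for arm, p in probs.items()})
print(route_visitor(allocation, np.random.default_rng(1)))
```

Control's total probability collects contributions from all three cells. Every arm also keeps at least the Explore share of 0.05 / 3, whatever the segment's allocation says.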

The first two cells are deliberately dumb. All of the model's
intelligence is concentrated in the third — which is what the rest of
this page is about.

## Inside the Optimized cell: Thompson allocation

Within a segment, each arm's traffic share is **the posterior
probability that it is the segment's best arm**. That single sentence
is the whole allocation rule (Thompson sampling, at segment
granularity). Mechanically: each posterior draw casts a vote for the
arm it shows winning in that segment — control wins a draw exactly
when no treatment shows a positive lift — and an arm's share is its
win frequency over draws.
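
In symbols, using notation introduced here for illustration: write {math}`\Delta_{s,a}` for arm {math}`a`'s lift over control in segment {math}`s`, with control's own lift fixed at zero. Write {math}`\mathcal{D}` for the data so far and {math}`\Delta^{(m)}` for the {math}`m`-th of {math}`M` posterior draws. The allocation and its Monte Carlo estimate are then:

```{math}
w_{s,a} = \Pr\!\left(a = \arg\max_{a'} \Delta_{s,a'} \,\middle|\, \mathcal{D}\right)
\approx \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\!\left[a = \arg\max_{a'} \Delta^{(m)}_{s,a'}\right],
\qquad \Delta_{s,\mathrm{control}} \equiv 0
```

Ties at zero go to control, so control wins a draw exactly when no treatment's lift is positive.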

The consequence is the behavior you actually want from an adaptive
experiment, with no thresholds to tune. Early on, when the posterior
is still wide, contending arms split the segment's traffic — so the
next round gathers evidence exactly where the decision is open. As
evidence accumulates and the posterior sharpens, traffic concentrates
on the winner:

```{code-cell} ipython3
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
arms = ["control", "blue_button", "free_shipping"]


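def thompson_weights(mean, sd, n_draws=4000):
    """Win frequency per arm over posterior draws of the two lifts."""
    # Toy posterior: independent normal draws of each treatment's lift vs control.
    draws = rng.normal(mean, sd, size=(n_draws, 2))
    # Control (index 0) wins a draw when no treatment shows a positive lift;
    # otherwise the treatment with the largest lift wins.
    best = np.where(draws.max(axis=1) <= 0, 0, draws.argmax(axis=1) + 1)
    return np.bincount(best, minlength=len(arms)) / n_draws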


early = thompson_weights(mean=[0.04, 0.05], sd=0.06)  # round 1: wide
late = thompson_weights(mean=[0.01, 0.06], sd=0.015)  # round 4: sharp

fig, axes = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
for ax, w, title in [
    (axes[0], early, "early round — posterior still wide"),
    (axes[1], late, "later round — posterior has sharpened"),
]:
    ax.bar(arms, w)
    ax.set_title(title)
axes[0].set_ylabel("segment traffic share");
```

Two properties worth noticing:

- **Exploration costs almost nothing.** Thompson splits traffic
  substantially only between arms the posterior still rates as
  near-tied, and for near-tied arms the expected value given up by
  choosing the wrong one is small. Where one arm clearly dominates, it
  gets nearly everything.
- **Control is a first-class arm.** A segment where every treatment
  looks harmful allocates its traffic back to control automatically —
  "do nothing here" is a discoverable answer, not a failure mode.
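
Here is a minimal sketch of the first property. It uses the same independent-normal toy posteriors as the illustration above. The sketch computes a segment's Thompson weights together with their expected cost: the posterior-mean lift given up relative to sending everything to the current leader. `thompson_cost` and the numbers are illustrative assumptions, not pytyche output.

```{code-cell} ipython3
import numpy as np

rng_cost = np.random.default_rng(2)


def thompson_cost(mean, sd, n_draws=20_000):
    """Thompson weights and the posterior-expected lift they give up.

    The cost is measured against committing all traffic to the arm with the
    highest posterior-mean lift. Assumes independent normal posteriors for
    each treatment's lift over control; control's lift is fixed at 0.
    """
    draws = rng_cost.normal(mean, sd, size=(n_draws, len(mean)))
    lifts = np.column_stack([np.zeros(n_draws), draws])  # column 0 = control
    # argmax returns the first maximum, so control wins whenever no
    # treatment shows a positive lift.
    weights = np.bincount(lifts.argmax(axis=1), minlength=lifts.shape[1]) / n_draws
    expected = lifts.mean(axis=0)  # posterior-mean lift per arm
    return weights, expected.max() - weights @ expected


results = {}
for label, mean, sd in [
    ("near-tied", [0.050, 0.052], 0.02),
    ("clear winner", [0.010, 0.060], 0.005),
]:
    results[label] = thompson_cost(mean, sd)
    weights, cost = results[label]
    print(f"{label:12s} weights={np.round(weights, 3)}  expected cost={cost:.4f}")
```

In the near-tied segment, Thompson splits traffic roughly evenly but gives up only on the order of 0.001 of lift per visitor. In the resolved segment, it sends nearly everything to the leader, and the cost is essentially zero.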

Each segment gets its own mixture, so one round can simultaneously be
exploiting a resolved segment and still actively comparing arms in a
contested one. The `pt.viz.plot_policy_tree` rendering shows this
per-leaf: each leaf is labeled with its leading arm and its current
allocation.
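
Here is a sketch of that per-segment picture with three made-up leaves: one resolved, one still contested, and one where every treatment looks harmful. Each leaf's Thompson mixture is computed independently. As a result, the same round exploits the first leaf, splits traffic in the second, and returns the third to control. The leaf names, posterior numbers, and independent-normal model are assumptions for illustration, not values from a fitted pytyche model.

```{code-cell} ipython3
import numpy as np

rng_seg = np.random.default_rng(3)
ARM_NAMES = ["control", "blue_button", "free_shipping"]

# Hypothetical per-leaf posteriors: mean and sd of each treatment's lift
# over control (independent normals, for illustration only).
leaf_posteriors = {
    "mobile": {"mean": [0.01, 0.07], "sd": 0.01},  # resolved
    "desktop": {"mean": [0.03, 0.03], "sd": 0.02},  # contested
    "returning": {"mean": [-0.04, -0.02], "sd": 0.01},  # treatments look harmful
}


def leaf_allocation(mean, sd, n_draws=10_000):
    """Thompson mixture for one leaf: win frequency of each arm over draws."""
    draws = rng_seg.normal(mean, sd, size=(n_draws, len(mean)))
    lifts = np.column_stack([np.zeros(n_draws), draws])  # column 0 = control
    wins = np.bincount(lifts.argmax(axis=1), minlength=lifts.shape[1])
    return wins / n_draws


allocations = {leaf: leaf_allocation(**post) for leaf, post in leaf_posteriors.items()}
for leaf, w in allocations.items():
    leader = ARM_NAMES[int(w.argmax())]
    print(f"{leaf:10s} leader={leader:14s} allocation={np.round(w, 3)}")
```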

## Watching a whole experiment

Put the pieces together over rounds and the structure becomes visible:
the cell layout stays fixed while, inside the Optimized cell, the
policy tree's segments and their allocations evolve as the posterior
learns. `pt.viz.experiment_evolution_gif(exp.history, path)` renders
this directly from an experiment's history; the animation below is
the run from the canonical adaptive tutorial:

![Round-by-round evolution of cell allocations and the policy tree, animated](../_static/first-adaptive-experiment-evolution.gif)

Read it round by round: allocations start spread within each segment,
then concentrate as the posterior sharpens; segments themselves can
split or merge as the refit tree finds sharper structure; and the
Control and Explore floors persist untouched throughout — the
measurement and identification guarantees don't decay as the
experiment gets confident.

## Why this shape

The structure separates two duties that are tempting to conflate:

- **The floors are flat on purpose.** Control and Explore provide
  guarantees — a live reference, identification everywhere, drift
  detectability — and guarantees shouldn't depend on what the model
  currently believes. They are sized by operator policy, not by
  posterior state.
- **The adaptivity is concentrated and self-pricing.** All
  model-driven traffic shifting happens inside the Optimized cell,
  through a rule whose exploration spend is automatically proportional
  to how unresolved each decision is, and which spreads traffic most
  widely exactly where spreading costs least: between near-tied arms.

Assignment probabilities are recorded exactly for every visitor in
every cell, so the accumulated data stays analyzable across rounds no
matter how the allocation evolved.
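
Here is a minimal sketch of why exact probabilities keep the data analyzable. It assumes two equally common segments and a treatment that adds two points of conversion in both. It also assumes a second round whose allocation has shifted treatment toward the low-converting segment. The estimator is a standard Horvitz-Thompson (inverse-propensity) mean. It is swapped in here to illustrate the principle, not to describe pytyche's own analysis. The segments, rates, and allocations are all hypothetical.

```{code-cell} ipython3
import numpy as np

rng_ipw = np.random.default_rng(4)

# Hypothetical world: two equally common segments; treatment adds +0.02 to
# conversion in both, so the true population-average lift is +0.02.
BASE_RATE = {"A": 0.10, "B": 0.30}
TRUE_LIFT = 0.02

# P(treatment | segment) in each round: uniform first, then adaptive. In a
# real layout this would be the probability summed over all cells.
ROUND_ALLOCATIONS = [{"A": 0.5, "B": 0.5}, {"A": 0.9, "B": 0.2}]


def simulate(n_per_round=50_000):
    """Log (treated?, converted?, recorded P(arm actually received)) per visitor."""
    treated, converted, recorded_p = [], [], []
    for alloc in ROUND_ALLOCATIONS:
        in_b = rng_ipw.random(n_per_round) < 0.5
        p_treat = np.where(in_b, alloc["B"], alloc["A"])
        treat = rng_ipw.random(n_per_round) < p_treat
        rate = np.where(in_b, BASE_RATE["B"], BASE_RATE["A"]) + TRUE_LIFT * treat
        treated.append(treat)
        converted.append(rng_ipw.random(n_per_round) < rate)
        recorded_p.append(np.where(treat, p_treat, 1 - p_treat))
    return tuple(np.concatenate(log) for log in (treated, converted, recorded_p))


def ipw_mean(y, received, p):
    """Horvitz-Thompson estimate of the mean outcome if everyone got this arm."""
    return np.sum(y * received / p) / len(y)


treat, y, p = simulate()
naive_lift = y[treat].mean() - y[~treat].mean()
ipw_lift = ipw_mean(y, treat, p) - ipw_mean(y, ~treat, p)
print(f"naive lift {naive_lift:+.3f}   IPW lift {ipw_lift:+.3f}   truth {TRUE_LIFT:+.3f}")
```

The naive comparison of raw arm means comes out negative, because round 2 sends treatment mostly to segment A, whose baseline is low. Weighting each outcome by the inverse of its recorded assignment probability removes that imbalance and recovers a lift close to +0.02. This correction is only possible because each visitor's probability was logged when they were assigned.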

The floors' guarantees are bought with flat spend: Control and Explore
cost the same whether or not the posterior still needs them. The
structures below price that trade differently.
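
Here is the arithmetic of that flat spend as a sketch. Take a segment the posterior has fully resolved, and assume the Optimized cell already sends all of its traffic to the best arm, so only the floors route visitors elsewhere. `floor_cost` and the lift values are hypothetical. The weights are the defaults from the layout at the top of this page.

```{code-cell} ipython3
def floor_cost(lifts, control_weight=0.05, explore_weight=0.05):
    """Per-visitor expected lift forgone by the flat floors in a resolved segment.

    `lifts` maps every active arm (including "control", at 0.0) to its lift
    over control. Assumes the Optimized cell sends all of its traffic to the
    best arm, so only the Control and Explore cells send visitors elsewhere.
    """
    best = max(lifts.values())
    # Control cell: every visitor gets control instead of the best arm.
    control_loss = control_weight * (best - lifts["control"])
    # Explore cell: uniform over arms, so average the shortfall across arms.
    explore_loss = explore_weight * sum(best - v for v in lifts.values()) / len(lifts)
    return control_loss + explore_loss


cost = floor_cost({"control": 0.0, "blue_button": 0.01, "free_shipping": 0.06})
print(f"{cost:.4f}")  # 0.05 * 0.06 + 0.05 * (0.06 + 0.05 + 0.0) / 3 ≈ 0.0048
```

That is about 8% of the 0.06 best-arm lift. It is the same in round 20 as in round 2, because nothing in the calculation depends on the posterior.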

## Other experiment structures

:::{admonition} Forward-looking design intent — not shipped behavior
:class: warning

This section describes design direction for upcoming releases;
today's engine ships the three-cell structure above.
:::

The three-cell Thompson structure is one point on a spectrum of
equally valid designs, and it is the *hybrid* point: structural
floors around an adaptive core. The two pure endpoints are:

- **Continuous batched Thompson: one cell.** Since Thompson already
  spreads traffic where the posterior is uncertain and already treats
  control as a first-class arm, the floors can be folded into the
  allocation itself. A single cell's policy *is* the per-segment
  Thompson mixture. The round-1 allocation comes from prior beliefs (a
  flat prior gives a uniform first round, doing the Explore cell's job
  with the same rule that handles every later round), and minimum
  shares become floor parameters rather than separate cells. Maximally
  adaptive; nothing is spent on flat floors the posterior has already
  resolved past.
- **Deterministic policies as cells.** Each segment sends 100% of its
  optimized traffic to its current best arm, and alternatives are
  tested via **challenger cells** carrying alternate segment-to-arm
  mappings. A set of deterministic cells with weights induces exactly
  a per-segment traffic mixture, so nothing statistical is forfeited —
  what changes is legibility: each cell is a complete, nameable policy
  you could ship, and the cell-level scoreboard compares deployable
  policies head-to-head.
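
Both endpoints come down to per-segment mixtures, which a short sketch makes concrete. `fold_floors` expresses minimum shares as parameters on a single Thompson allocation. With floors matching the default 5% Control and 5% Explore, it reproduces exactly the per-visitor probabilities of the three-cell layout. `induced_mixture` computes the per-segment mixture implied by weighted deterministic policy cells. Both helpers, the segment names, and the policies are hypothetical illustrations, not pytyche API.

```{code-cell} ipython3
from collections import defaultdict

import numpy as np


def fold_floors(thompson_weights, floors):
    """One-cell allocation: per-arm minimum shares, Thompson on the remainder."""
    w = np.asarray(thompson_weights, dtype=float)
    f = np.asarray(floors, dtype=float)
    slack = 1.0 - f.sum()
    if slack < 0:
        raise ValueError("floors must sum to at most 1")
    return f + slack * w


def induced_mixture(policy_cells):
    """Per-segment arm shares induced by weighted deterministic policy cells.

    `policy_cells` is a list of (weight, {segment: arm}) pairs whose weights
    are shares of the traffic these cells cover (summing to 1).
    """
    mixture = defaultdict(lambda: defaultdict(float))
    for weight, policy in policy_cells:
        for segment, arm in policy.items():
            mixture[segment][arm] += weight
    return {segment: dict(shares) for segment, shares in mixture.items()}


# Floors equal to the default three-cell layout: 5% Control plus 5% Explore
# spread over three arms (control, blue_button, free_shipping).
floors = [0.05 + 0.05 / 3, 0.05 / 3, 0.05 / 3]
print(fold_floors([0.0, 0.02, 0.98], floors))

# An incumbent policy plus two challengers, each a complete segment-to-arm map.
incumbent = {"mobile": "free_shipping", "desktop": "blue_button"}
challenger_1 = {"mobile": "free_shipping", "desktop": "control"}
challenger_2 = {"mobile": "blue_button", "desktop": "blue_button"}
print(induced_mixture([(0.8, incumbent), (0.1, challenger_1), (0.1, challenger_2)]))
```

In the mixture, each segment ends up split 90/10 between the incumbent's arm and a challenger's. That is the same kind of split Thompson would produce, but every cell is a policy you could ship as-is.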

All three structures rest on the same statistical machinery — recorded
assignment probabilities, accumulated-data refits, per-segment
posterior contrasts — and differ in where they put exploration and
how auditable the moving parts are. The intended direction is for the
library to support these structures as first-class options — including
sizing challenger cells by the *value* of resolving the remaining
uncertainty (the `expected_value_of_one_more_round` machinery from
the [decision-theoretic inputs](decision-theoretic-inputs.md)) rather
than by uncertainty alone. The building blocks are already public:
`Cell`, `TreePolicy`, and arbitrary allocation maps.
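
This sketch shows why value and uncertainty can disagree, using the expected value of perfect information (EVPI), a standard decision-theoretic quantity. EVPI is the posterior-expected gain from learning the true lifts before choosing: E[max lift] minus max E[lift]. Here it is scaled by the share of traffic the decision governs. EVPI is an upper bound on what any further data could be worth. It is swapped in for illustration and does not describe what `expected_value_of_one_more_round` computes. The segments, traffic shares, and independent-normal posteriors are assumptions.

```{code-cell} ipython3
import numpy as np

rng_evpi = np.random.default_rng(5)


def evpi(mean, sd, n_draws=50_000):
    """Per-visitor EVPI: E[max lift] - max E[lift], control's lift fixed at 0.

    Assumes independent normal posteriors for each treatment's lift.
    """
    draws = rng_evpi.normal(mean, sd, size=(n_draws, len(mean)))
    lifts = np.column_stack([np.zeros(n_draws), draws])
    return lifts.max(axis=1).mean() - lifts.mean(axis=0).max()


# Same posterior width in every segment; what differs is how close the
# contest is and how much traffic the decision governs.
decisions = {
    "big_contested": {"mean": [0.03, 0.03], "sd": 0.02, "traffic": 0.60},
    "big_resolved": {"mean": [0.00, 0.08], "sd": 0.02, "traffic": 0.35},
    "tiny_contested": {"mean": [0.03, 0.03], "sd": 0.02, "traffic": 0.05},
}
values = {}
for name, d in decisions.items():
    values[name] = d["traffic"] * evpi(d["mean"], d["sd"])
    print(f"{name:15s} sd={d['sd']:.2f}  traffic-weighted EVPI={values[name]:.5f}")
```

The two contested segments have identical posteriors, but the one carrying twelve times the traffic is worth about twelve times as much to resolve. The resolved segment has the same posterior width yet almost nothing to gain. Sizing by posterior width alone would treat all three the same.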

## Related pages

- [Sequential targeting](sequential-targeting.md) — the full design
  rationale for the loop this structure serves.
- [Decision-theoretic inputs](decision-theoretic-inputs.md) — the
  quantities that decide when a treatment graduates out of the loop.
- [First adaptive experiment](../tutorials/first-adaptive-experiment.md)
  — the end-to-end tutorial this page's GIF comes from.
- [Working with the posterior](../tutorials/working-with-the-posterior.md)
  — `thompson_allocation` and `fit_policy_tree` in isolation.