Methods
Pilot design
The Phase 7 mechanism pilot crossed four domains (poetry generation, poetry interpretation, alternative uses, scientific creativity) with four base arms: haiku_bare, haiku_cascade, haiku_bare_2K_scorer (for H6 fairness), and haiku_generic_revise_2pass (for H7 fairness). A five-policy multiplex (always_draft, always_revise, event_gated, learned_gate, oracle) was run over the cascade arm. Per-domain n is bounded by two budgets, the managed-API cost envelope and the judge envelope. The table below gives the number of pairs that each per-domain hypothesis contributes to the H5 pool.
| Hypothesis (per-domain contrast) | n_pairs pooled into H5 |
|---|---|
| H1 | 5 |
| H2 | 10 |
| H3 | 6 |
| H4 | 4 |
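The design above can be enumerated as a grid of conditions. This is a minimal sketch that assumes each cascade-arm cell expands into one cell per commit policy, while the other three arms run once per domain. The snake_case domain labels are hypothetical. The arm names, policy names, and pair counts come from the text and table.

```python
from itertools import product

# Hypothetical snake_case labels for the four domains named in the text.
DOMAINS = ["poetry_generation", "poetry_interpretation",
           "alternative_uses", "scientific_creativity"]
BASE_ARMS = ["haiku_bare", "haiku_cascade",
             "haiku_bare_2K_scorer", "haiku_generic_revise_2pass"]
POLICIES = ["always_draft", "always_revise", "event_gated", "learned_gate", "oracle"]

# Per-hypothesis pair counts, copied from the table.
N_PAIRS = {"H1": 5, "H2": 10, "H3": 6, "H4": 4}


def design_cells():
    """Enumerate (domain, arm, policy) cells.

    Assumption: only the cascade arm is multiplexed over policies; the
    other arms get policy=None.
    """
    cells = []
    for domain, arm in product(DOMAINS, BASE_ARMS):
        if arm == "haiku_cascade":
            cells.extend((domain, arm, policy) for policy in POLICIES)
        else:
            cells.append((domain, arm, None))
    return cells


cells = design_cells()
print(len(cells))             # 4 domains x (3 plain arms + 5 cascade policies) = 32
print(sum(N_PAIRS.values()))  # 25 pairs across H1-H4
```

Under this layout, the 20 policy-tagged cells all share one set of cascade generations per domain. Only the commit decision differs between them.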
Substrate
During the Phase 7 pilot, both the cascade and the judge ran as parallel API calls against the managed Anthropic-API substrate (global.anthropic.claude-haiku-4-5-20251001-v1:0 for the cascade, global.anthropic.claude-sonnet-4-5-20250929-v1:0 for the judge). This deviation from the OAuth Claude CLI was justified by quota: the managed-API substrate can parallelise across domains, whereas the OAuth substrate cannot. Day-to-day operation, including showcase regeneration, runs against the OAuth substrate. ADR-006 records the deviation, and ADR-007 records the accompanying removal of the SDK code path. The plugin page is the operator-facing reference.
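The domain-level fan-out that the managed substrate permits can be sketched with the standard library's thread pool. This is a minimal sketch. `call_model` is a hypothetical blocking stand-in for whatever managed-API client the pipeline actually uses, and a single cascade call per item is a simplification. Only the two model IDs come from the text.

```python
from concurrent.futures import ThreadPoolExecutor

CASCADE_MODEL = "global.anthropic.claude-haiku-4-5-20251001-v1:0"
JUDGE_MODEL = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"


def call_model(model_id: str, prompt: str) -> str:
    """Hypothetical stand-in for one managed-API request (not a real client)."""
    return f"[{model_id}] {prompt[:20]}"


def run_domain(prompts: list[str]) -> list[str]:
    # Within one domain, each item is generated and then judged sequentially.
    verdicts = []
    for prompt in prompts:
        output = call_model(CASCADE_MODEL, prompt)
        verdicts.append(call_model(JUDGE_MODEL, f"Score this output: {output}"))
    return verdicts


def run_all(jobs: dict[str, list[str]], max_workers: int = 4) -> dict[str, list[str]]:
    # Up to max_workers domains run concurrently, one thread per domain job.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {domain: pool.submit(run_domain, prompts)
                   for domain, prompts in jobs.items()}
        return {domain: fut.result() for domain, fut in futures.items()}
```

Threads suffice here because the work is I/O-bound waiting on remote calls. A serial substrate corresponds to `max_workers=1`.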
Statistical protocol
Per-domain primary contrasts use paired permutation tests (50 000 permutations), with the Wilcoxon signed-rank test as a secondary non-parametric check. Effect sizes are Hedges' g, with BCa 95% intervals from 10 000 bootstrap resamples. H5 pools H1–H4 by inverse-variance fixed effects (ADR-005). The DerSimonian–Laird random-effects pool, pre-registered as the alternative, is reported alongside as a sensitivity check. Multiple-hypothesis correction uses the Holm procedure.
H8a is a paired permutation test on score(revision) − score(draft) over all cascade items. H8b reports binary-classifier metrics for each commit gate at a common threshold. H8c compares each policy against the bare control, with Holm correction across the six pairwise contrasts. H9 reports Spearman ρ and the sign agreement between the proxy composite delta and the Sonnet judge delta.
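The core pieces of this protocol can be sketched in stdlib Python. This is an illustration under stated assumptions, not the project's own code:

- The permutation test flips the signs of the paired differences and uses |mean| as the statistic.
- Hedges' g is standardised by the SD of the differences (d_z), with a common large-sample variance approximation. The project may use a different standardiser.
- The fixed-effect and DerSimonian–Laird pools follow the textbook formulas.

For the Wilcoxon backup and the BCa intervals, SciPy's `scipy.stats.wilcoxon` and `scipy.stats.bootstrap(..., method="BCa")` are the usual off-the-shelf routes.

```python
import math
import random
import statistics


def paired_permutation_test(diffs, n_perm=50_000, seed=0):
    """Two-sided sign-flip permutation test on paired differences.

    Under H0 each difference is symmetric about zero, so its sign is
    exchangeable; the test statistic is |mean difference|.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def hedges_g_paired(diffs):
    """Hedges' g for paired data, standardised by the SD of the differences.

    Returns (g, approximate sampling variance of g).
    """
    n = len(diffs)
    d = statistics.mean(diffs) / statistics.stdev(diffs)
    j = 1 - 3 / (4 * (n - 1) - 1)    # small-sample correction factor
    var_d = 1 / n + d**2 / (2 * n)   # common large-sample approximation
    return j * d, j**2 * var_d


def pool_effects(effects, variances):
    """Inverse-variance fixed-effect pool plus DerSimonian-Laird random-effects pool."""
    w = [1 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    mu_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": (mu_fe, math.sqrt(1 / sw)),
        "random": (mu_re, math.sqrt(1 / sum(w_re))),
        "tau2": tau2,
    }


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

In an H5-style pool, each domain's `(g, var_g)` from `hedges_g_paired` would feed `pool_effects`. The family of p-values from the per-domain or per-policy tests would then pass through `holm`. When τ² is zero, the random-effects pool collapses to the fixed-effect one, which is what makes it a natural sensitivity check.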
Reproducibility
Every primary number on this site can be regenerated with this short command sequence:
pce smoke
pce cascade --prompt "$(cat prompts/anushtubh.txt)" --model haiku --k 4 --seed 4242
python -m benchmarks.figures --version v0.4
python -m benchmarks.autoreport --version v0.4 --strict
See the reproducibility page for the full repro recipe, including the cost ledger, integrity probes, and the §0.5 unmerged-state critique.