Reproducibility
Every primary number on this site can be regenerated from the open repository. The three install paths (Cursor, Claude Code, standalone CLI) share the same cascade module (src/pce/cascade.py) and the same statistical pipeline (benchmarks/figures.py + benchmarks/autoreport.py).
Recipe — replicate every number on this site
# 1. clone + install
git clone https://github.com/sharathsphd/pratyabhijna
cd pratyabhijna
uv pip install -e .
pce smoke

# 2. regenerate the figure pack and autoreport
python -m benchmarks.figures --version v0.4
python -m benchmarks.autoreport --version v0.4 --strict

# 3. (optional) regenerate the showcase from cached Phase 7 results
python scripts/generate_v0_4_showcase.py
python tests/test_v0_4_showcase.py  # verifies all 9 demos exist with traces

# 4. rebuild the paper (subshell, so the working directory stays at the repo root)
(cd paper && tectonic -X compile main.tex)

# 5. rebuild the Astro site (paths are relative to the repo root)
(cd docs/site && pnpm install && python ../../scripts/prepare_site_data.py && pnpm build)
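Before running the recipe, it can help to confirm that the command-line tools it invokes are installed. The following is a minimal sketch, not part of the repository: the tool list is read off the recipe above (pce is omitted because step 1 installs it), and the helper name missing_tools is hypothetical.

```python
import shutil

# Tools invoked by the recipe above. "pce" is left out on purpose:
# it only becomes available after step 1 installs the package.
REQUIRED_TOOLS = ["git", "uv", "python", "tectonic", "pnpm"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All recipe prerequisites found on PATH.")
```

Running the script from any directory prints either the missing tools or a confirmation line. It does not check versions.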
Audit artefacts
- benchmarks/results_v0.4/stats.json — pre-registered hypothesis statistics.
- benchmarks/results_v0.4/judge.jsonl — per-item LLM-judge verdicts.
- benchmarks/results_v0.4/judge_agreement.json — Spearman ρ and sign-agreement (H9.v4).
- audit/v0.4/cost_ledger_merged.json — managed-API token-cost ledger across the four domains.
- audit/v0.4/integrity_probes_merged.jsonl — substrate integrity probes across the pilot run.
- audit/v0.4/lit_verification.jsonl & lit_new_entries.jsonl — bibliography verification + new-entry log.
- audit/v0.4/phase8_gate_report.json — Phase 8 Ralph gate stack pass/fail summary.
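A quick way to confirm that the audit artefacts listed above are present after cloning is to check that each file exists and parses. The sketch below is illustrative rather than part of the repository. It assumes only that .json files hold a single JSON document and .jsonl files hold one JSON document per line, and it makes no assumption about their schemas. lit_new_entries.jsonl is left out because the list does not spell out its full path. The function names are hypothetical.

```python
import json
from pathlib import Path

# Paths copied from the audit-artefact list; only those given in full.
ARTEFACTS = [
    "benchmarks/results_v0.4/stats.json",
    "benchmarks/results_v0.4/judge.jsonl",
    "benchmarks/results_v0.4/judge_agreement.json",
    "audit/v0.4/cost_ledger_merged.json",
    "audit/v0.4/integrity_probes_merged.jsonl",
    "audit/v0.4/lit_verification.jsonl",
    "audit/v0.4/phase8_gate_report.json",
]


def check_artefact(path: Path) -> str:
    """Return 'ok', 'missing', or 'invalid' for one artefact file."""
    if not path.is_file():
        return "missing"
    try:
        text = path.read_text(encoding="utf-8")
        if path.suffix == ".jsonl":
            # JSON Lines: every non-blank line must parse on its own.
            for line in text.splitlines():
                if line.strip():
                    json.loads(line)
        else:
            json.loads(text)
    except ValueError:
        # Covers json.JSONDecodeError and UnicodeDecodeError.
        return "invalid"
    return "ok"


def check_all(repo_root: str = ".") -> dict:
    """Map each artefact's relative path to its status."""
    root = Path(repo_root)
    return {rel: check_artefact(root / rel) for rel in ARTEFACTS}


if __name__ == "__main__":
    for rel, status in check_all().items():
        print(f"{status:8} {rel}")
```

Run from the repository root, this prints one status line per artefact. It verifies only presence and well-formedness, not that the numbers match the site.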
The §0.5 unmerged-state critique
Phase 7 of the v0.4 mechanism study completed on April 30, 2026, via parallel API calls against the managed Anthropic-API substrate, and the result tree was pushed to origin/v0.4-mechanism-study at commit 94ba97e. From that date until Phase 8 landed, the public main branch and the GitHub Pages site still told the v0.3 story: the public surface fetched results_v0.3/stats.json, the README's headline block was the v0.3 negative-result summary, and the repository's hypothesis table read H1.v3–H8.v3.
There are reasonable defences for the branch-only stance. The v0.4 paper, HTML, and release notes were not yet written; a premature merge would have surfaced raw stats without their academic interpretation. The Pages workflow had no v0.4 schema branch and would have either silently rendered v0.3 or broken. The COMPLETION_PROMISES_v0.4.md Phase 8 contract explicitly required the paper, HTML, README, and release notes to be ready before merge, not the other way around.
The costs are visible too. External readers landing on the GitHub Pages site for the first time on May 1, 2026 saw headline results that were already stale (the cascade-vs-bare null without the H8a / H8b mechanism findings). Collaborators did not see ADR-005 fixed-effects or ADR-006 typed Haiku errors on the trunk. The branch risked drifting if anything else landed on main. The cost ledgers and integrity probes from a reproducible managed-API pilot run were not yet visible on main, so anyone trying to replicate had to dig through branches.
The Phase 8 mitigation is not a defensive squash but an explicit acknowledgement: the merge happens at the end of Phase 8 in lockstep with the paper, the Astro site (which from day one reads results_v0.4/stats.json), the release notes, the v0.4.0 tag, and the GitHub release. The PR body for the Phase 8 mega-merge cites this section by name, and §10.8 of the paper opens with the same observation. We do not pretend the gap was costless; we record what it cost and why we accepted the cost.
A v0.5 process change is recorded in the v0.5 PRD: future phases that produce numbers should land a "preliminary results" PR within 48 hours of pilot completion, even if the full paper rewrite is still pending. Pages can show a "draft" badge during those windows.
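The 48-hour rule is simple enough to check mechanically. The sketch below is a hypothetical illustration of such a check, not something the PRD or the repository is known to contain. The function names are invented, and the example timestamp assumes midnight UTC on the Phase 7 completion date, because the text gives only the date.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window taken from the v0.5 process change described above.
PRELIMINARY_PR_WINDOW = timedelta(hours=48)


def preliminary_pr_deadline(pilot_completed_at: datetime) -> datetime:
    """Latest time a "preliminary results" PR may land."""
    return pilot_completed_at + PRELIMINARY_PR_WINDOW


def is_overdue(
    pilot_completed_at: datetime,
    now: datetime,
    pr_opened_at: Optional[datetime] = None,
) -> bool:
    """True if the PR landed late, or has not landed and the window has closed."""
    deadline = preliminary_pr_deadline(pilot_completed_at)
    if pr_opened_at is not None:
        return pr_opened_at > deadline
    return now > deadline


if __name__ == "__main__":
    # Assumed time of day: the source gives only the date April 30, 2026.
    completed = datetime(2026, 4, 30, tzinfo=timezone.utc)
    print("deadline:", preliminary_pr_deadline(completed).isoformat())
```

Under that assumed timestamp, the rule would have required a preliminary PR by the start of May 2, 2026 (UTC).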