RALPH WIGGUM

Iterative CLI loop. Calls `claude -p` up to 10 times. Pure stubborn determination. Tokens? Who counts tokens?

| STAT | RATING |
|---|---|
| SPEED | ⭐⭐⭐⭐⭐ |
| CONSISTENCY | ⭐⭐⭐ |
| TOKEN COST | 💀💀💀💀 |
| AVG TOKENS | 2,146 |
| AVG ITERS | 1.2 |
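
The Ralph loop described above can be sketched in a few lines. This is only an illustration: the function names `run_claude` and `ralph_loop`, the `is_done` stop check, and the choice to resend the same prompt each time are assumptions, not code taken from the repo. The only thing borrowed from the description is the `claude -p` call and the cap of 10 attempts.

```python
import subprocess
from typing import Callable, Tuple


def run_claude(prompt: str) -> str:
    """Call the Claude Code CLI once in print mode and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


def ralph_loop(
    prompt: str,
    is_done: Callable[[str], bool],
    run: Callable[[str], str] = run_claude,
    max_iters: int = 10,
) -> Tuple[str, int]:
    """Re-run the prompt until is_done(output) is true or max_iters is reached.

    Returns the last output and the number of iterations used.
    """
    output = ""
    for iteration in range(1, max_iters + 1):
        output = run(prompt)
        if is_done(output):  # hypothetical stop check, e.g. "tests pass"
            return output, iteration
    return output, max_iters
```

Passing `run` as a parameter keeps the loop testable without spending a single token: swap in a fake runner and check how many attempts it takes.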

ATTRACTORFLOW

Lyapunov-guided orchestrator. Phase-space trajectories. Spawns explorer/convergence subagents. Dynamical systems theory applied to code.

| STAT | RATING |
|---|---|
| SPEED | ⭐⭐⭐ |
| CONSISTENCY | ⭐⭐⭐⭐⭐ |
| TOKEN COST | 💀 |
| AVG TOKENS | 601 |
| AVG ITERS | 3.3 |
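
To give a feel for what "Lyapunov-guided" could mean in practice, here is a generic sketch of a finite-time, Lyapunov-style contraction rate: the mean log-ratio of successive error magnitudes. It is not AttractorFlow's implementation. The `errors` series (say, failing tests per iteration), the `tol` threshold, and the `DIVERGING` and `STALLED` labels are all assumptions made for illustration; only the `CONVERGING` label appears in this README.

```python
import math
from typing import Sequence


def contraction_rate(errors: Sequence[float]) -> float:
    """Mean log-ratio of successive error magnitudes.

    Negative means the trajectory is shrinking toward a fixed point,
    positive means it is moving away.
    """
    ratios = [
        math.log(nxt / cur)
        for cur, nxt in zip(errors, errors[1:])
        if cur > 0 and nxt > 0
    ]
    if not ratios:
        raise ValueError("need at least two positive error values")
    return sum(ratios) / len(ratios)


def classify(errors: Sequence[float], tol: float = 0.05) -> str:
    """Label a trajectory from the sign of its contraction rate."""
    rate = contraction_rate(errors)
    if rate < -tol:
        return "CONVERGING"
    if rate > tol:
        return "DIVERGING"  # hypothetical label
    return "STALLED"  # hypothetical label


print(classify([8, 4, 2, 1]))  # CONVERGING
```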
How the battle was judged
6 tasks total: 5 coding + 1 analysis. Difficulty tiers: 🟢 Standard (A,C) · 🔴 Hard (H)
4 repetitions per task per condition. Latin-square ordering counterbalances run order to limit carry-over effects.
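
A Latin-square ordering can be sketched with a simple cyclic construction, where each condition appears once in every position. The condition name `ralph` comes from the harness command in this README; `attractorflow` as an identifier, and the way repetitions are mapped onto rows, are assumptions.

```python
from typing import List, Sequence


def latin_square(conditions: Sequence[str]) -> List[List[str]]:
    """Cyclic Latin square: row i is the condition list rotated left by i."""
    k = len(conditions)
    return [[conditions[(row + col) % k] for col in range(k)] for row in range(k)]


def run_order(conditions: Sequence[str], reps: int) -> List[List[str]]:
    """Assign each repetition a row of the square, cycling through the rows."""
    square = latin_square(conditions)
    return [square[rep % len(square)] for rep in range(reps)]


for rep, order in enumerate(run_order(["ralph", "attractorflow"], reps=4), start=1):
    print(rep, order)
```

With two conditions and four repetitions, each agent goes first exactly twice, so neither benefits systematically from its position in the run order.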
Claude Sonnet 4.6 · Temperature=0 · Blinded: UUID-named outputs, condition hidden
0-10 rubric. Coding tasks: +2 bonus for passing pytest suite (cap 10).
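
The scoring rule above boils down to a clamp. A minimal sketch, assuming the bonus is added to the judge's rubric score and that the function and parameter names below are purely illustrative:

```python
def final_score(rubric_score: float, is_coding_task: bool, pytest_passed: bool) -> float:
    """Apply the +2 pytest bonus to coding tasks and cap the result at 10."""
    if not 0 <= rubric_score <= 10:
        raise ValueError("rubric score must be between 0 and 10")
    bonus = 2 if (is_coding_task and pytest_passed) else 0
    return min(10, rubric_score + bonus)


print(final_score(9, is_coding_task=True, pytest_passed=True))   # 10
print(final_score(7, is_coding_task=True, pytest_passed=True))   # 9
print(final_score(7, is_coding_task=False, pytest_passed=True))  # 7
```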
Each output saved as random UUID.txt. Judge never sees condition label.
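
Blinding of this kind could look roughly like the sketch below: the output goes into a file named by a random UUID, and the UUID-to-condition mapping is kept in a separate key file the judge never reads. The function name, the JSON key file, and the directory layout are assumptions, not the harness's actual code.

```python
import json
import uuid
from pathlib import Path


def save_blinded(output_text: str, condition: str, out_dir: Path, key_path: Path) -> str:
    """Write output to <uuid>.txt and record uuid -> condition in a separate key file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    blind_id = str(uuid.uuid4())
    (out_dir / f"{blind_id}.txt").write_text(output_text, encoding="utf-8")

    # The key file stays out of the judge's view; it is only used to unblind scores later.
    key = json.loads(key_path.read_text(encoding="utf-8")) if key_path.exists() else {}
    key[blind_id] = condition
    key_path.write_text(json.dumps(key, indent=2), encoding="utf-8")
    return blind_id
```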
Repeated-measures ANOVA · Bonferroni correction · Bootstrap 95% CIs (10,000 resamples)
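
Two of the statistical pieces named above are easy to sketch with the standard library: a percentile bootstrap confidence interval for a mean, and a Bonferroni adjustment. This is not the code in `benchmark.stats`, and the repeated-measures ANOVA is left out. The example scores `[10, 5, 10, 10]` are an inference that happens to match the reported H02 Ralph mean of 8.75 with one 5/10 rep; they are not read from `scores.csv`.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float], n_resamples: int = 10_000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean: resample with replacement, sort the means."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


illustrative_scores = [10, 5, 10, 10]  # inferred, see the note above
print(bootstrap_ci(illustrative_scores))
print(bonferroni([0.01, 0.04, 0.5]))  # hypothetical p-values
```

With only four scores per cell the interval is wide, which is worth keeping in mind when reading the leaderboard.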
HOW TO USE THIS IN YOUR OWN BATTLES
```bash
git clone https://github.com/SharathSPhD/strange-wiggum
cd strange-wiggum

uv venv .venv --python 3.13
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip install pingouin scipy pandas numpy

npm install -g @anthropic-ai/claude-code
claude login  # authenticate with Anthropic

claude plugin install attractor-flow
# Verify: claude plugin list

python -m benchmark.harness --tasks H02 --conditions ralph --reps 1
# Results appear in benchmark/results/scores.csv

python -m benchmark.stats
python -m benchmark.report
# Open benchmark/results/leaderboard.md
```

| TASK | DIFFICULTY | RALPH μ | AF μ | WINNER |
|---|---|---|---|---|
| A01 | 🟢 Analysis | 9.00 | 9.00 | 🤝 TIE |
| C02 | 🟢 Coding | 9.33 | 9.33 | 🤝 TIE |
| C08 | 🟢 Coding | 9.67 | 9.67 | 🤝 TIE |
| H01 | 🔴 Hard | 10.00 | 10.00 | 🤝 TIE |
| H02 | 🔴 Hard | 8.75 | 10.00 | ⚡ 🌀 ATTRACTORFLOW |
| H03 | 🔴 Hard | 10.00 | 9.67 | 🧒 RALPH |

H02 (Pratt Parser) was the decisive battle. Ralph's Haiku model failed the unary-minus-power precedence test (`-2**2` should evaluate to -4, not 4) in rep 2, scoring 5/10. AttractorFlow stayed CONVERGING and nailed all 21 tests.
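
For anyone wondering why `-2**2` trips parsers up, here is a minimal Pratt-style evaluator that gets it right. It is a sketch, not the H02 reference solution: the binding-power numbers, the token set, and the function names are assumptions. The key idea is that unary minus parses its operand with a binding power (30) lower than the left binding power of `**` (40), so the power grabs the `2` first.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|\*\*|[-+*/()])")

# op -> (left binding power, right binding power); ** is right-associative.
BINARY = {"+": (10, 10), "-": (10, 10), "*": (20, 20), "/": (20, 20), "**": (40, 39)}
UNARY_MINUS_BP = 30  # tighter than * and /, looser than **
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "**": operator.pow}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse(min_bp):
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "-":                      # prefix operator
            left = -parse(UNARY_MINUS_BP)
        elif tok == "(":
            left = parse(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok[0].isdigit():
            left = float(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Keep absorbing infix operators that bind tighter than our caller.
        while peek() in BINARY and BINARY[peek()][0] > min_bp:
            op = advance()
            left = OPS[op](left, parse(BINARY[op][1]))
        return left

    result = parse(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("-2**2"))    # -4.0
print(evaluate("(-2)**2"))  # 4.0
```

Setting the unary-minus power above 40 would flip the first result to 4.0, which is exactly the failure described above.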
⚔️ NO KNOCKOUT
NEITHER AGENT DOMINATED
NEITHER FIGHTER WAS KO'D.
ATTRACTORFLOW EDGES QUALITY BY +0.21
WHILE SPENDING 72% FEWER TOKENS.
THE REAL WIN: AF σ=0.60 VS RALPH σ=1.22
— ATTRACTOR IS MORE CONSISTENT.
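
The headline numbers can be rechecked from the stat cards and the ranking table. A quick arithmetic sketch (the dictionary layout is just for illustration; the values are the ones reported in this README):

```python
ralph = {"score": 9.42, "avg_tokens": 2146, "sigma": 1.22}
af = {"score": 9.63, "avg_tokens": 601, "sigma": 0.60}

quality_edge = round(af["score"] - ralph["score"], 2)
token_saving = 1 - af["avg_tokens"] / ralph["avg_tokens"]
sigma_ratio = ralph["sigma"] / af["sigma"]

print(f"quality edge: {quality_edge:+.2f}")    # +0.21
print(f"token saving: {token_saving:.0%}")     # 72%
print(f"Ralph/AF sigma: {sigma_ratio:.1f}x")   # 2.0x
```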
| RANK | CONDITION | SCORE | GRADE |
|---|---|---|---|
| 🥇 #1 | ATTRACTORFLOW | 9.63 | S |
| 🥈 #2 | RALPH WIGGUM | 9.42 | A |