Agents generate code with no record of what was tried, no gate it had to pass, no proof it works. 12-Factor AgentOps is the operating model that closes that gap. Before a change counts as done, something that didn't write it has to check it. No verdict, not done.
The proven layer first. Every verdict from an independent judge lands in a hash-chained ledger you can re-verify yourself. These are the real numbers from that ledger.
190 independent verdicts · 185 confirmed · 4 refuted · 3 caught, then fixed
// a real catch from the ledger · bead age-rhlx
REFUTED on 02230bb · an independent judge rejected the change
CONFIRMED on 950a925 · the fix passed a fresh verdict, merge cleared
190 verdicts across 169 beads, 2026-06-13 to 2026-07-01 · hash chain verified · ledger tip 1994cc271212… · boshu2/agentops, docs/evidence/membrane-receipts.md, auto-generated from docs/provenance/ledger.jsonl and chain-verified via ao provenance verify · synced 2026-07-01
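To make "hash-chained" and "re-verify yourself" concrete, here is a minimal sketch of how such a chain can be built and checked. It is an illustration under stated assumptions, not the actual ledger format or the logic behind ao provenance verify: the field names (prev_hash, payload, hash), the all-zero genesis value, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json

# Assumed starting value for the first entry's prev_hash (hypothetical).
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Append a verdict, linking it to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    )


def verify(ledger: list) -> str:
    """Recompute every link; return the tip hash, or raise ValueError on a mismatch."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["prev_hash"] != prev:
            raise ValueError(f"entry {i}: broken link to previous entry")
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            raise ValueError(f"entry {i}: payload does not match recorded hash")
        prev = entry["hash"]
    return prev
```

Appending the two age-rhlx verdicts shown above (REFUTED on 02230bb, then CONFIRMED on 950a925) and calling verify returns the tip hash. Editing either payload afterwards makes verify raise, because each entry's hash depends on everything before it. That is why a published tip is enough to check the whole history.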
// what it looks like in-session
> /validate --mixed the agent reported this PR done
// evidence sealed → fresh-context judges, Claude Code + Codex
claude · REFUTE /login has no rate limit (claimed covered, isn't)
codex · REFUTE token-bucket refill lacks jitter under burst
plan: token bucket, 5/min per IP, Redis-backed, jittered
// plan recorded to .agents/ in your repo
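For readers who want to see what the recorded plan implies, here is a rough sketch of a per-IP token bucket at 5 requests per minute with jittered refill. Several things are assumptions: the class and parameter names are hypothetical, an in-memory dict stands in for Redis, and "jittered" is read as scaling each refill by a small random factor so that many buckets do not refill in lockstep. The plan in the session may mean something different.

```python
import random
import time


class TokenBucket:
    """Per-key token bucket (hypothetical sketch, in-memory instead of Redis)."""

    def __init__(self, capacity=5, period=60.0, jitter=0.1,
                 clock=time.monotonic, rng=random.random):
        self.capacity = capacity
        self.rate = capacity / period  # tokens added per second
        self.jitter = jitter           # refill varies by +/- this fraction
        self.clock = clock
        self.rng = rng
        self.state = {}                # key -> (tokens, time of last update)

    def allow(self, key: str) -> bool:
        """Return True and spend one token if the key has one available."""
        now = self.clock()
        tokens, last = self.state.get(key, (float(self.capacity), now))
        # Scale the refill by a random factor in [1 - jitter, 1 + jitter].
        factor = 1.0 + self.jitter * (2.0 * self.rng() - 1.0)
        tokens = min(self.capacity, tokens + (now - last) * self.rate * factor)
        if tokens >= 1.0:
            self.state[key] = (tokens - 1.0, now)
            return True
        self.state[key] = (tokens, now)
        return False
```

A /login handler would call allow with the client IP as the key and reject the request when it returns False. A Redis-backed version would also need the read-modify-write step to be atomic, which this sketch does not attempt.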
// the bet
THE WAGER: Vendors will ship managed memory, review councils, and overnight learning loops natively, and they will lock them to their own runtimes. Your corpus stays in .agents/ in your repo, runs on whichever harness you already pay for, and ports to whichever frontier model wins next quarter.
Good architecture principles outlive every tool that implements them.