~/agentops
// 12-FACTOR AGENTOPS

Don't run production on vibes.

Agents generate code with no record of what was tried, no gate it had to pass, no proof it works. 12-Factor AgentOps is the operating model that closes that gap. Before a change counts as done, something that didn't write it has to check it. No verdict, not done.

Run it in your agent → · GitHub ↗ · Read the 12 factors →
LAYER 1
VALIDATION MEMBRANE
Fresh-context judges prove or reject the work before you trust it.
Catch "done" that isn't.
LAYER 2
EVIDENCE TRAIL
Every verdict lands hash-chained in .agents/ in your repo.
The proof is yours, greppable and portable.
LAYER 3
CONTEXT COMPILER
Compile the right slice into the next run.
No re-explaining your codebase each session.
LAYER 4
KNOWLEDGE RATCHET
Learnings promote into gates and constraints.
Compounding is measured, not claimed.
├ cold start · ↺ EVERY VERDICT RECORDED: THE LEDGER IS YOURS · proven ┤

// the 12 factors

all 12 →

The operating rules the site is named for: twelve, in four phases.

The 12 factors as a four-phase loop: Prepare, Bound, Select, Govern — Govern feeds back into Prepare

// see it work

The proven layer first. Before a change counts as done, an independent judge has to reach a verdict, and every verdict lands in a hash-chained ledger you can re-verify yourself. These are the real numbers from that ledger.

190 independent verdicts
185 confirmed
4 refuted
3 caught, then fixed
// a real catch from the ledger · bead age-rhlx
REFUTED on 02230bb · an independent judge rejected the change
CONFIRMED on 950a925 · the fix passed a fresh verdict, merge cleared

190 verdicts across 169 beads, 2026-06-13 to 2026-07-01 · hash chain verified · ledger tip 1994cc271212
boshu2/agentops docs/evidence/membrane-receipts.md, auto-generated from docs/provenance/ledger.jsonl and chain-verified via ao provenance verify · synced 2026-07-01
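
// how a hash chain makes the ledger re-verifiable (illustrative sketch)
The idea behind a hash-chained ledger is that each entry commits to the one before it, so editing any past verdict breaks every hash after it. The sketch below shows that general technique only. It is not the schema of docs/provenance/ledger.jsonl and not the implementation behind ao provenance verify: the field names (prev, payload, hash), the all-zero genesis value, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

# Assumed placeholder for "no previous entry"; the real ledger may differ.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> dict:
    """Append a payload, linking it to the current tip of the chain."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every link; any edited or reordered entry fails the check."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical payload fields, modelled on the catch shown above.
ledger = []
append(ledger, {"bead": "age-rhlx", "commit": "02230bb", "verdict": "REFUTED"})
append(ledger, {"bead": "age-rhlx", "commit": "950a925", "verdict": "CONFIRMED"})
intact = verify(ledger)        # True: chain is untouched

ledger[0]["payload"]["verdict"] = "CONFIRMED"   # rewrite history
tampered_ok = verify(ledger)   # False: the stored hash no longer matches
```

Because the tip hash depends on every earlier entry, publishing a single value (like the ledger tip above) is enough for anyone holding the file to detect a rewrite.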

// what it looks like in-session
> /validate --mixed the agent reported this PR done
// evidence sealed → fresh-context judges, Claude Code + Codex
claude · REFUTE /login has no rate limit (claimed covered, isn't)
codex · REFUTE token-bucket refill lacks jitter under burst
verdict: HOLD. not done.
> /research add rate limiting to /login
// loading context from .agents/ …
3 prior auth decisions cited · 2 planning rules · 1 learning
plan: token bucket, 5/min per IP, Redis-backed, jittered
// plan recorded to .agents/ in your repo
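
// the gate rule, sketched
The /validate exchange above follows a simple rule: a change is held unless something other than its author reaches a verdict, and any refutation holds it. The sketch below is one way to express that rule, not the project's code. The Verdict type, the gate function, the "CONFIRM" outcome label, and the "PASS" result are hypothetical names; only REFUTE and HOLD appear in the session output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    judge: str    # who judged, e.g. "claude" or "codex"
    outcome: str  # "CONFIRM" or "REFUTE" (labels assumed for this sketch)
    reason: str


def gate(verdicts: list, author: str) -> str:
    """Return "PASS" only if independent judges ruled and none refuted."""
    # A verdict from the author of the change does not count as independent.
    independent = [v for v in verdicts if v.judge != author]
    if not independent:
        return "HOLD"  # no verdict, not done
    if any(v.outcome == "REFUTE" for v in independent):
        return "HOLD"  # a single refutation blocks the change
    return "PASS"


# The session above: two fresh-context judges, both refuting.
session = [
    Verdict("claude", "REFUTE", "/login has no rate limit"),
    Verdict("codex", "REFUTE", "token-bucket refill lacks jitter under burst"),
]
result = gate(session, author="agent")  # "HOLD"
```

Treating a missing verdict the same as a refutation is the design choice that matters here: the default state of a change is not done.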

// the bet

THE WAGER — Vendors will ship managed memory, review councils, and overnight learning loops natively; they will lock them to their runtime. Your corpus stays in .agents/ in your repo, runs on whichever harness you already pay for, and is portable across whichever frontier model wins next quarter.

Good architecture principles outlive every tool that implements them.
see the proof →

// the three gaps it closes

The failure modes that make agent work unreliable, each closed by a named surface.

G1
JUDGMENT
Pressure-test plans before code.
closed by /pre-mortem · /validate · /council
G2
DURABLE LEARNING
Solved problems stay solved: the measured, still-unproven bet.
closed by /post-mortem · ao search · ao lookup
G3
LOOP CLOSURE
Shipped work compounds into the next session: measured, still unproven.
closed by /post-mortem · the promotion ratchet · /rpi
// pages
/factors · the twelve, in four phases
/cdlc · the context development lifecycle
/comparisons · how it stacks up against alternatives
/journey · doctrine evolution as a devlog arc
/skills · the catalog, enforced in your runtime
/install · run the loop in your harness
// your agent said done; something else proved it. the proof is yours.
the corpus stays yours