// 12-FACTOR AGENTOPS
Don't run production on vibes.
Agents generate code with no record of what was tried, no gate it had to pass, no proof it works. 12-Factor AgentOps is the operating model that closes that gap. Before a change counts as done, something that didn't write it has to check it. No verdict, not done.
LAYER 1
VALIDATION MEMBRANE
Fresh-context judges prove or reject the work before you trust it.
Catch "done" that isn't.
LAYER 2
EVIDENCE TRAIL
Every verdict lands hash-chained in .agents/ in your repo.
The proof is yours, greppable and portable.
LAYER 3
CONTEXT COMPILER
Compile the right slice into the next run.
No re-explaining your codebase each session.
LAYER 4
KNOWLEDGE RATCHET
Learnings promote into gates and constraints.
Compounding is measured, not claimed.
cold start → proven. Every verdict recorded; the ledger is yours.
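To make "hash-chained" concrete, here is a minimal sketch of an append-only verdict ledger. The entry fields (`prev_hash`, `verdict`, `hash`), the all-zero genesis value, and the use of SHA-256 over canonical JSON are illustrative assumptions, not the actual format written to .agents/.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as prev_hash for the first entry


def entry_hash(prev_hash: str, verdict: dict) -> str:
    """Hash the previous entry's hash together with this verdict's canonical JSON."""
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_verdict(ledger: list, verdict: dict) -> dict:
    """Append a verdict, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {
        "prev_hash": prev_hash,
        "verdict": verdict,
        "hash": entry_hash(prev_hash, verdict),
    }
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Recompute every hash; an edited, dropped, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["verdict"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical verdict records, for illustration only.
ledger = []
append_verdict(ledger, {"change": "abc123", "judge": "judge-1", "result": "pass"})
append_verdict(ledger, {"change": "def456", "judge": "judge-2", "result": "fail"})
print(verify_chain(ledger))  # True

ledger[0]["verdict"]["result"] = "fail"  # rewrite history after the fact
print(verify_chain(ledger))  # False
```

Because each entry commits to the one before it, a verdict cannot be quietly changed later without every subsequent hash failing to verify. Writing one JSON entry per line would keep such a ledger greppable.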
// the 12 factors
The operating rules the site is named for: twelve, in four phases.
PREPARE
I
Context
Put only task-relevant context in the window.
II
Track
Keep work state, decisions, and learnings in git.
III
Scope
Give each agent one bounded job and a fresh window.
BOUND
IV
Privilege
Act inside a least-privilege envelope untrusted input cannot widen.
V
Research
Inspect the integration surface before writing code.
VI
Isolate
Separate concurrent workers by workspace, context, and state.
SELECT
VII
Validate
The worker reports evidence; an independent checker writes the verdict.
VIII
Lock
Turn validated work into the new baseline.
IX
Extract
Record what the session learned, not only what it changed.
GOVERN
X
Compound
Wire learnings — positive and negative — back into future work.
XI
Supervise
Escalate up a clear tree; a stuck job goes to a fresh agent.
XII
Measure
Track fitness toward goals, not agent activity.
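Factor VII (the worker reports evidence; an independent checker writes the verdict) can be sketched as a small gate. The `Change` and `Verdict` types, their field names, and the identity check below are hypothetical names chosen for illustration; they are not the project's actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    checker_id: str  # who judged the work
    passed: bool
    notes: str = ""


@dataclass
class Change:
    author_id: str  # the worker agent that produced the change
    evidence: list  # what the worker reports, e.g. test logs or diffs
    verdict: Optional[Verdict] = None


def record_verdict(change: Change, verdict: Verdict) -> None:
    """Attach a verdict, refusing one written by the change's own author."""
    if verdict.checker_id == change.author_id:
        raise PermissionError("a worker cannot write the verdict on its own work")
    change.verdict = verdict


def is_done(change: Change) -> bool:
    """No verdict, not done; a failing verdict is also not done."""
    return change.verdict is not None and change.verdict.passed


change = Change(author_id="worker-1", evidence=["unit test log"])
print(is_done(change))  # False: evidence alone is not a verdict

try:
    record_verdict(change, Verdict(checker_id="worker-1", passed=True))
except PermissionError as err:
    print(f"rejected: {err}")

record_verdict(change, Verdict(checker_id="judge-1", passed=True))
print(is_done(change))  # True
```

The worker can only populate `evidence`; "done" is derived from a verdict it is not allowed to write, which is the separation the factor describes.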
// the bet
THE WAGER — Vendors will ship managed memory, review councils, and overnight learning loops natively; they will lock them to their runtime. Your corpus stays in .agents/ in your repo, runs on whichever harness you already pay for, and is portable across whichever frontier model wins next quarter.
Good architecture principles outlive every tool that implements them.
// the three gaps it closes
The failure modes that make agent work unreliable, each closed by a named surface.
G1
JUDGMENT
Pressure-test plans before code.
closed by /pre-mortem · /validate · /council
G2
DURABLE LEARNING
Solved problems stay solved: the measured, still-unproven bet.
closed by /post-mortem · ao search · ao lookup
G3
LOOP CLOSURE
Shipped work compounds into the next session: measured, still unproven.
closed by /post-mortem · the promotion ratchet · /rpi
// your agent said done; something else proved it. the proof is yours.
the corpus stays yours