> agentops · operational discipline for coding agents
Ship reliable code with unreliable agents that forget.
The moat is the context you’ve earned. AgentOps is how it compounds.
After Twelve-Factor App · 12-Factor Agents · HumanLayer
> doctrine · four layers · one flywheel · three gaps
The moat is the context you’ve earned. Every decision. Every scar. Every landmine. Each session writes a learning; each learning sharpens the next.
Each layer stands alone. Together they compose — run /rpi instead of eight commands by hand.
Knowledge in git, not chat history.
Learnings, findings, and handoffs land in plain-text .agents/ files. Diff them. Review them. Commit them. Decay the stale ones.
/retro · /forge · /inject · ao lookup
Gates that block, not advise.
Multi-model council reviews plans before build and code before commit. Independent Claude + Codex judges. No agent grades its own work.
/pre-mortem · /vibe · /council · /post-mortem
Skills, hooks, and the ao CLI.
Reusable building blocks. Call one. Compose several. Same shape at every scale.
/research · /plan · /implement · ao · hooks
End-to-end paths, hands-free or manual.
Named compositions of primitives. Run /rpi for the full lifecycle. Run /evolve to chase a goal autonomously. Same audit trail either way.
/rpi · /crank · /evolve · /dream
Capture. Score. Promote. Inject. Not memory — compounding.
1. Capture. Every session emits learnings.
2. Score. Five axes: specificity, actionability, novelty, context, confidence.
3. Promote. Learnings become patterns. Patterns become rules.
4. Inject. Next session starts loaded, not cold.
Stage 4 feeds the next session's stage 1. Escape velocity: retrieval × usage > decay.
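The four stages above can be sketched as code. This is a hypothetical model, not the shipped implementation: the five axis names and the `retrieval × usage > decay` inequality come from the doc, but the equal weighting, the citation fields, and the decay constant are invented for illustration.

```python
from dataclasses import dataclass

# Axis names are from the doc; everything else here is an assumption.
AXES = ("specificity", "actionability", "novelty", "context", "confidence")

@dataclass
class Learning:
    text: str
    scores: dict        # axis name -> score in 0.0..1.0
    citations: int      # times retrieved and used by later sessions
    age_sessions: int   # sessions since capture

def quality(learning: Learning) -> float:
    """Stage 2 (Score): unweighted mean across the five axes."""
    return sum(learning.scores[a] for a in AXES) / len(AXES)

def escape_velocity(corpus: list, decay_per_session: float = 0.05) -> bool:
    """The flywheel gate: retrieval x usage must outpace decay.

    retrieval: fraction of learnings cited at least once.
    usage:     mean citations per learning.
    decay:     invented linear staleness penalty per session of age.
    """
    retrieval = sum(1 for l in corpus if l.citations > 0) / len(corpus)
    usage = sum(l.citations for l in corpus) / len(corpus)
    decay = decay_per_session * sum(l.age_sessions for l in corpus) / len(corpus)
    return retrieval * usage > decay
```

A corpus whose learnings keep being cited passes the gate; a corpus of aging, never-retrieved notes fails it, which is the "folder of stale notes" failure mode the gate exists to catch.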
Twelve factors. Three proof obligations. Each factor closes one or more.
Pressure-test plans before code.
Plan looks coherent. Code passes tests. Both miss the edge case. No one challenged either.
Closed by: /pre-mortem · /vibe · /council
Solved problems stay solved.
Auth bug fixed Monday. Same auth bug returns Wednesday. The lesson lived in a chat transcript that got compacted.
Closed by: /retro · /forge · ao lookup
Shipped work informs the next session.
Code diff lands. No lesson extracted. No constraint hardened. Next session re-learns from scratch.
Closed by: /post-mortem · finding compiler · /evolve
We’d rather lose the argument before you install than after.
Claim
Compound growth is happening, not just possible.
Strongest critique
Every agent framework claims a learning loop. Most ship a folder of stale notes. A markdown directory looks like a step backwards.
Our answer
Escape velocity is measurable: retrieval × usage > decay. CI gates the inequality every run. When learnings stop being cited, the gate goes red and names the broken stage. Diff-able plain text is the only audit substrate that scales.
Evidence: agentops/GOALS.md directives 4–5 + scripts/check-flywheel-lifecycle.sh
1. Manage the context window like production input.
2. If it's not in git, it didn't happen.
3. Scoped task. Fresh context. Never reuse a saturated window.
4. Understand before you generate.
5. No agent grades its own work. Ever.
6. Validated work ratchets. It cannot regress.
7. Every session produces two outputs: the work, and the lessons.
8. Learnings flow into future sessions automatically.
9. Track fitness, not activity.
10. Own workspace. Own context. Zero shared mutable state.
11. Escalation flows up, never sideways.
12. Failures are data. Index them like successes.
> convergence · three primitives · multiple implementations
Any vendor can target the schema. We publish it.
Every harness gets absorbed into the model — memory, learning loops, validation gates included. The corpus is what stays yours. AgentOps is the bridge: build the moat before the harness commoditizes. Vendor-agnostic on purpose — use Claude and Codex side by side.
| Primitive | Industry framing | AgentOps surface | Status |
| --- | --- | --- | --- |
| Learning loop | Extract memory. Consolidate off-session. Inject next session. | /retro → /forge → /harvest → ao inject. Private overnight runs via /dream. Tiered promotion: learning → pattern → rule. | Shipped. |
| Skill packaging | Watch recurring patterns. Package them as reusable skills. | 69 skills. /heal-skill audits. /converter exports cross-runtime. ao flywheel close-loop drafts new skills from repeats. | Drafting works. Promotion polish in flight. |
| Adversarial verification | Independent agents audit other agents. Verdicts go to a human. | /council, /pre-mortem, /vibe, /post-mortem. Multi-model consensus with prediction tracking. Behavioral validation fires inside /validation. | Shipped. |
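The adversarial-verification primitive can be sketched as a gate. A minimal model under stated assumptions: "independent judges" and "no agent grades its own work" come from the doc, while the strict-majority rule and the judge names are invented for illustration.

```python
# Hypothetical council gate. Only the author-exclusion rule is from the
# doc; the strict-majority consensus rule is an assumption.

def council_verdict(votes: dict, author: str) -> bool:
    """Strict majority of independent judges; the author's vote is discarded."""
    independent = {judge: ok for judge, ok in votes.items() if judge != author}
    approvals = sum(independent.values())
    return approvals * 2 > len(independent)
```

The design choice worth noting: the author's vote is dropped before counting, not merely outweighed, so a self-approving agent can never tip its own verdict.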
Harness commoditizes. Corpus doesn’t. Skills, hooks, and a CLI smooth today’s sharp edges; the corpus is the moat that stays. Audit the contract at /spec/jobspec/v0.
Claim
Same skills across Claude Code, Codex, Cursor, OpenCode.
Strongest critique
'Cross-platform' usually means tested on the primary and shimmed for everything else. Users get burned on three of four.
Our answer
Three test tiers — structural, live inventory, live execution — gate Claude and Codex in CI. The conformance schema is public at /spec/jobspec/v0, so any vendor harness gets audited against the same contract we use. Externally falsifiable beats trust-us.
Evidence: agentops/GOALS.md directive 1 + tests/skills/test-runtime-*-smoke.sh + /spec/jobspec/v0/openapi.yaml
> spec · doctrine made executable
JobSpec OpenAPI v0 describes the current AgentOps daemon. Any vendor can target it. Audit it. Implement it. Push back on it.
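A vendor sizing up the contract might start from a fragment shaped roughly like this. Everything below is illustrative (invented path, field names, and status code, not the published schema); the file at /spec/jobspec/v0/openapi.yaml is the authoritative contract to audit.

```yaml
# Illustrative sketch only. Paths and properties are invented; audit
# /spec/jobspec/v0/openapi.yaml for the real JobSpec contract.
openapi: "3.0.3"
info:
  title: JobSpec daemon (sketch)
  version: "0.0.1"
paths:
  /jobs:
    post:
      summary: Submit a scoped job to the daemon
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [task, context]
              properties:
                task:
                  type: string
                  description: The scoped unit of work
                context:
                  type: string
                  description: Fresh context for this job; never a reused window
      responses:
        "202":
          description: Job accepted
```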