
> agentops · operational discipline for coding agents

Operational discipline for coding agents.

Ship reliable code with unreliable agents that forget.

The moat is the context you’ve earned. AgentOps is how it compounds.

After Twelve-Factor App · 12-Factor Agents · HumanLayer

The AgentOps knowledge flywheel

> doctrine · four layers · one flywheel · three gaps

What AgentOps is.

The moat is the context you’ve earned. Every decision. Every scar. Every landmine. Each session writes a learning; each learning sharpens the next.

01 / Four layers

Four-layer model

Each layer stands alone. Together they compose — run /rpi instead of eight commands by hand.

01

Bookkeeping

Knowledge in git, not chat history.

Learnings, findings, and handoffs land in plain-text .agents/ files. Diff them. Review them. Commit them. Decay the stale ones.
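For flavor, a single learning entry in `.agents/` might look like the sketch below. The field names and layout are our assumption, not the shipped schema; only the five scoring axes and the learning → pattern → rule tiers come from this page.

```yaml
# Hypothetical learning entry — illustrative fields, not the shipped schema.
id: learn-2025-02-14-auth-retry
kind: learning            # promotion tiers: learning -> pattern -> rule
summary: "Refresh the token before the 401, not after; the gateway caches auth failures."
score:                    # the five axes from the flywheel's Score stage
  specificity: 4
  actionability: 5
  novelty: 3
  context: 4
  confidence: 4
cited_by: []              # sessions that retrieved this; stale entries decay out
```

Because it is plain text in git, the entry can be diffed, reviewed, committed, and decayed like any other file.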

/retro · /forge · /inject · ao lookup
02

Validation

Gates that block, not advise.

Multi-model council reviews plans before build and code before commit. Independent Claude + Codex judges. No agent grades its own work.

/pre-mortem · /vibe · /council · /post-mortem
03

Primitives

Skills, hooks, and the ao CLI.

Reusable building blocks. Call one. Compose several. Same shape at every scale.

/research · /plan · /implement · ao hooks
04

Flows

End-to-end paths, hands-free or manual.

Named compositions of primitives. Run /rpi for the full lifecycle. Run /evolve to chase a goal autonomously. Same audit trail either way.

/rpi · /crank · /evolve · /dream

02 / Knowledge flywheel

Capture. Score. Promote. Inject. Not memory — compounding.

  1. Capture

     Every session emits learnings.

  2. Score

     Five axes: specificity, actionability, novelty, context, confidence.

  3. Promote

     Learnings become patterns. Patterns become rules.

  4. Inject

     Next session starts loaded, not cold.
Stage 4 feeds the next session's stage 1. Escape velocity: retrieval × usage > decay.
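The Score stage and the escape-velocity gate can be sketched in a few lines. This is a minimal illustration: the field names, the mean-of-axes score, and the half-life decay model are assumptions, not the shipped implementation.

```python
# Sketch of the flywheel gate: retrieval × usage > decay.
# Names and thresholds are illustrative, not the shipped implementation.
from dataclasses import dataclass

AXES = ("specificity", "actionability", "novelty", "context", "confidence")

@dataclass
class Learning:
    scores: dict          # one 1-5 score per axis
    retrievals: int = 0   # times injected into a later session
    citations: int = 0    # times actually used after injection
    age_days: int = 0

def axis_score(learning: Learning) -> float:
    """Mean of the five scoring axes."""
    return sum(learning.scores[a] for a in AXES) / len(AXES)

def escape_velocity(corpus: list[Learning], half_life: float = 90.0) -> bool:
    """True when retrieval × usage outpaces decay across the corpus."""
    momentum = sum(l.retrievals * l.citations for l in corpus)
    decay = sum(l.age_days / half_life for l in corpus)
    return momentum > decay
```

A CI gate built on a check like this fails the run when the inequality flips, which is the behavior the doctrine describes.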

03 / Three gaps the doctrine closes

Twelve factors. Three proof obligations. Each factor closes one or more of the gaps.

Three gaps converging

Judgment

Pressure-test plans before code.

Plan looks coherent. Code passes tests. Both miss the edge case. No one challenged either.

Closed by: /pre-mortem · /vibe · /council

Durable Learning

Solved problems stay solved.

Auth bug fixed Monday. Same auth bug returns Wednesday. The lesson lived in a chat transcript that got compacted.

Closed by: /retro · /forge · ao lookup

Loop Closure

Shipped work informs the next session.

Code diff lands. No lesson extracted. No constraint hardened. Next session re-learns from scratch.

Closed by: /post-mortem · finding compiler · /evolve

04 / Steelman the moat

We’d rather lose the argument before you install than after.

Claim

Compound growth is happening, not just possible.

Strongest critique

Every agent framework claims a learning loop. Most ship a folder of stale notes. A markdown directory looks like a step backwards.

Our answer

Escape velocity is measurable: retrieval × usage > decay. CI gates the inequality every run. When learnings stop being cited, the gate goes red and names the broken stage. Diff-able plain text is the only audit substrate that scales.

Evidence: agentops/GOALS.md directives 4–5 + scripts/check-flywheel-lifecycle.sh

Foundation (I–III) · Flow (IV–VI) · Knowledge (VII–IX) · Scale (X–XII)

convergence · three primitives · multiple implementations

Knowledge is the moat. AgentOps isn’t.

Any vendor can target the schema. We publish it.

Every harness gets absorbed into the model — memory, learning loops, validation gates included. The corpus is what stays yours. AgentOps is the bridge: build the moat before the harness commoditizes. Vendor-agnostic on purpose — use Claude and Codex side by side.

  • Learning loop

    Industry framing: Extract memory. Consolidate off-session. Inject next session.

    AgentOps surface: /retro → /forge → /harvest → ao inject. Private overnight runs via /dream. Tiered promotion: learning → pattern → rule.

    Status: Shipped.

  • Skill packaging

    Industry framing: Watch recurring patterns. Package them as reusable skills.

    AgentOps surface: 69 skills. /heal-skill audits. /converter exports cross-runtime. ao flywheel close-loop drafts new skills from repeats.

    Status: Drafting works. Promotion polish in flight.

  • Adversarial verification

    Industry framing: Independent agents audit other agents. Verdicts go to a human.

    AgentOps surface: /council, /pre-mortem, /vibe, /post-mortem. Multi-model consensus with prediction tracking. Behavioral validation fires inside /validation.

    Status: Shipped.

Harness commoditizes. Corpus doesn’t. Skills, hooks, and a CLI smooth today’s sharp edges; the corpus is the moat that stays. Audit the contract at /spec/jobspec/v0.

Steelman: cross-runtime

Claim

Same skills across Claude Code, Codex, Cursor, OpenCode.

Strongest critique

“Cross-platform” usually means tested on the primary runtime and shimmed for everything else. Users get burned on three of the four.

Our answer

Three test tiers — structural, live inventory, live execution — gate Claude and Codex in CI. The conformance schema is public at /spec/jobspec/v0, so any vendor harness gets audited against the same contract we use. Externally falsifiable beats trust-us.

Evidence: agentops/GOALS.md directive 1 + tests/skills/test-runtime-*-smoke.sh + /spec/jobspec/v0/openapi.yaml

spec · doctrine made executable

Conformance, in YAML.

JobSpec OpenAPI v0 describes the current AgentOps daemon. Any vendor can target it. Audit it. Implement it. Push back on it.
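As a flavor of what targeting the schema means, a vendor stub might start like the fragment below. The paths and fields here are illustrative only; the published contract at /spec/jobspec/v0/openapi.yaml is the one to audit.

```yaml
# Illustrative shape only — not the published spec.
openapi: "3.0.3"
info:
  title: JobSpec (illustrative stub)
  version: "0.0.0"
paths:
  /jobs:
    post:
      summary: Submit a job to the daemon
      responses:
        "202":
          description: Accepted for execution
```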

$ ao quick-start

Ship the next session smarter than this one.

Five minutes to install. One command to validate. Zero telemetry.