> agentops · context compiler for coding agents

Context compiler for coding agents.

Compile context. Gate output. Compound knowledge.

Every session reads from the corpus on the way in and writes back on the way out.

Vendor memory follows the chat. The corpus follows the team.

After Twelve-Factor App · 12-Factor Agents · HumanLayer

[Figure: The AgentOps knowledge flywheel]

> doctrine · three layers · one flywheel · three gaps

What AgentOps is.

Every coding session reads from the corpus on the way in and writes back on the way out, and what it writes is typed, versioned, validated, and decay-ranked. Your agent’s context is now an engineering artifact, not chat history.

01 / Three layers

Each layer solves a different problem. All three compound through the Context Development Life Cycle.

01

Context Compiler

Right context, right window, right time.

Assembles phase-scoped context packets, scores knowledge by utility and freshness, trims to the token budget, and delivers at session start automatically. Your agent starts loaded, not cold.

ao inject · ao context assemble · ao compile · 71 skills · 12 hooks
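
The scoring and trimming described above can be sketched roughly as follows. This is a minimal illustration, not the AgentOps implementation: the `Learning` fields, the half-life decay formula, and the greedy packing strategy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Learning:
    text: str
    utility: float   # hypothetical 0..1 usefulness score
    age_days: float  # days since the entry was last confirmed useful
    tokens: int      # estimated token cost of injecting the entry


def score(item: Learning, half_life_days: float = 30.0) -> float:
    """Utility discounted by exponential freshness decay (assumed formula)."""
    freshness = 0.5 ** (item.age_days / half_life_days)
    return item.utility * freshness


def assemble_packet(items: list[Learning], token_budget: int) -> list[Learning]:
    """Greedily keep the highest-scoring entries that still fit the budget."""
    packet, used = [], 0
    for item in sorted(items, key=score, reverse=True):
        if used + item.tokens <= token_budget:
            packet.append(item)
            used += item.tokens
    return packet


corpus = [
    Learning("Rotate auth tokens before retrying", utility=0.9, age_days=3, tokens=40),
    Learning("Old note about a removed build flag", utility=0.8, age_days=300, tokens=40),
    Learning("Run migrations before integration tests", utility=0.6, age_days=10, tokens=50),
]
packet = assemble_packet(corpus, token_budget=100)
print([item.text for item in packet])
```

With a 100-token budget, the 300-day-old note scores lowest despite its high utility and is the entry that gets trimmed.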
02

Validation Gates

Gates that block, not advise.

Multi-model consensus validates plans before build and code before commit. Independent Claude + Codex judges debate and return one auditable verdict. No agent grades its own work.

/pre-mortem · /vibe · /council · 61 eval suites · baseline A/B
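
As a rough sketch of a gate that blocks rather than advises: the `Verdict` type, the unanimity rule, and the two-judge minimum below are assumptions for illustration. The page does not specify how the judges' debate is reduced to one verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    judge: str    # illustrative label, e.g. "claude" or "codex"
    passed: bool
    reason: str


def gate(author: str, verdicts: list[Verdict]) -> tuple[bool, list[str]]:
    """Blocking gate: pass only if every independent judge passes.

    Verdicts from the authoring agent are discarded, so no agent
    grades its own work. Unanimity is an assumed policy.
    """
    independent = [v for v in verdicts if v.judge != author]
    if len(independent) < 2:
        return False, ["need at least two independent judges"]
    failures = [f"{v.judge}: {v.reason}" for v in independent if not v.passed]
    return (not failures), failures


ok, reasons = gate(
    author="builder-agent",
    verdicts=[
        Verdict("claude", True, "plan covers rollback"),
        Verdict("codex", False, "no test for empty input"),
    ],
)
print(ok, reasons)
```

Returning the failure reasons alongside the boolean keeps the single verdict auditable: a blocked commit carries the objection that blocked it.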
03

Knowledge Flywheel

The system gets smarter every session.

Every session extracts learnings. Learnings get scored and promoted to permanent patterns. Patterns become planning rules. The flywheel runs overnight unattended. Session 15 starts with everything session 1 learned.

/forge · ao flywheel · /evolve · /dream · 1,400+ learnings
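
One way to picture tiered promotion is a small state machine. The sketch below assumes promotion is driven by citation counts; the `Entry` type, the `thresholds` values, and the citation trigger are hypothetical and not taken from AgentOps.

```python
from dataclasses import dataclass

TIERS = ["learning", "pattern", "rule"]


@dataclass
class Entry:
    text: str
    tier: str = "learning"
    citations: int = 0  # times a later session cited this entry (assumed signal)


def promote(entry: Entry, thresholds: dict[str, int]) -> Entry:
    """Move an entry up one tier once it has been cited often enough."""
    idx = TIERS.index(entry.tier)
    if idx < len(TIERS) - 1 and entry.citations >= thresholds[entry.tier]:
        entry.tier = TIERS[idx + 1]
    return entry


# Hypothetical thresholds: 3 citations to become a pattern, 10 to become a rule.
thresholds = {"learning": 3, "pattern": 10}

entry = Entry("Pin the migration tool version in CI", citations=4)
promote(entry, thresholds)
print(entry.tier)
```

Each call moves an entry at most one tier, so a learning has to survive as a pattern before it can harden into a planning rule.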

02 / Knowledge flywheel

Capture. Score. Promote. Inject. Not memory — compounding.

  1. Capture

    Every session emits learnings.

  2. Score

    Five axes: specificity, actionability, novelty, context, confidence.

  3. Promote

    Learnings become patterns. Patterns become rules.

  4. Inject

    Next session starts loaded, not cold.

Stage 4 feeds the next session's stage 1. Escape velocity: retrieval × usage > decay.
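
The five scoring axes can be combined in many ways. The sketch below simply averages them with equal weight, which is an assumption; the 0 to 1 rating scale and the `score_learning` helper are also hypothetical.

```python
AXES = ("specificity", "actionability", "novelty", "context", "confidence")


def score_learning(ratings: dict[str, float]) -> float:
    """Average the five axis ratings; equal weighting is an assumption."""
    for axis in AXES:
        value = ratings.get(axis)
        if value is None or not 0.0 <= value <= 1.0:
            raise ValueError(f"axis {axis!r} needs a rating in [0, 1]")
    return sum(ratings[axis] for axis in AXES) / len(AXES)


sharp = score_learning({
    "specificity": 0.9, "actionability": 0.9, "novelty": 0.5,
    "context": 0.9, "confidence": 0.9,
})
vague = score_learning({
    "specificity": 0.2, "actionability": 0.1, "novelty": 0.5,
    "context": 0.3, "confidence": 0.4,
})
print(sharp > vague)
```

Rejecting incomplete ratings up front means a learning cannot reach the Promote stage with an axis silently unscored.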

03 / Three gaps the doctrine closes

Twelve factors. Three proof obligations. Each factor closes one or more.

[Figure: Three gaps converging]

Judgment

Pressure-test plans before code.

Plan looks coherent. Code passes tests. Both miss the edge case. No one challenged either.

Closed by: /pre-mortem · /vibe · /council

Durable Learning

Solved problems stay solved.

Auth bug fixed Monday. Same auth bug returns Wednesday. The lesson lived in a chat transcript that got compacted.

Closed by: /retro · /forge · ao lookup

Loop Closure

Shipped work informs the next session.

Code diff lands. No lesson extracted. No constraint hardened. Next session re-learns from scratch.

Closed by: /post-mortem · finding compiler · /evolve

04 / Steelman the moat

We’d rather lose the argument before you install than after.

Claim

Compound growth is happening, not just possible.

Strongest critique

Every agent framework claims a learning loop. Most ship a folder of stale notes. A markdown directory looks like a step backwards.

Our answer

Escape velocity is measurable: retrieval × usage > decay. CI gates the inequality every run. When learnings stop being cited, the gate goes red and names the broken stage. Diff-able plain text is the only audit substrate that scales.

Evidence: agentops/GOALS.md directives 4–5 + scripts/check-flywheel-lifecycle.sh
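
To make the inequality concrete, here is a minimal sketch of a check that could run in CI. It is not the logic of scripts/check-flywheel-lifecycle.sh: the rate definitions and the weakest-factor heuristic for naming the broken stage are assumptions.

```python
def check_escape_velocity(
    retrieval_rate: float,  # assumed: share of learnings retrieved per session
    usage_rate: float,      # assumed: share of retrieved learnings actually cited
    decay_rate: float,      # assumed: share of learnings going stale per session
) -> tuple[bool, str]:
    """Return (green, message) for the inequality retrieval x usage > decay."""
    if retrieval_rate * usage_rate > decay_rate:
        return True, "green"
    # Heuristic (assumption): blame whichever factor is weaker.
    stage = "retrieval" if retrieval_rate <= usage_rate else "usage"
    return False, f"red: {stage} is the weakest stage"


print(check_escape_velocity(0.6, 0.5, 0.1))
print(check_escape_velocity(0.5, 0.1, 0.08))
```

In the second call learnings are retrieved but rarely cited, so the product falls below decay and the check points at usage.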

Foundation (I - III)
Flow (IV - VI)
Knowledge (VII - IX)
Scale (X - XII)

> convergence · three primitives · multiple implementations

Knowledge is the moat. AgentOps isn’t.

Any vendor can target the schema. We publish it.

Every harness gets absorbed into the model — memory, learning loops, validation gates included. The corpus is what stays yours. AgentOps is the bridge: build the moat before the harness commoditizes. Vendor-agnostic on purpose — use Claude and Codex side by side.

  • Primitive: Learning loop

    Industry framing: Extract memory. Consolidate off-session. Inject next session.

    AgentOps surface: /retro → /forge → /harvest → ao inject. Private overnight runs via /dream. Tiered promotion: learning → pattern → rule.

    Status: Shipped.

  • Primitive: Skill packaging

    Industry framing: Watch recurring patterns. Package them as reusable skills.

    AgentOps surface: 69 skills. /heal-skill audits. /converter exports cross-runtime. ao flywheel close-loop drafts new skills from repeats.

    Status: Drafting works. Promotion polish in flight.

  • Primitive: Adversarial verification

    Industry framing: Independent agents audit other agents. Verdicts go to a human.

    AgentOps surface: /council, /pre-mortem, /vibe, /post-mortem. Multi-model consensus with prediction tracking. Behavioral validation fires inside /validation.

    Status: Shipped.

Harness commoditizes. Corpus doesn’t. Skills, hooks, and a CLI smooth today’s sharp edges; the corpus is the moat that stays. Audit the contract at /spec/jobspec/v0.

Steelman: cross-runtime

Claim

Same skills across Claude Code, Codex, Cursor, OpenCode.

Strongest critique

'Cross-platform' usually means tested on the primary and shimmed for everything else. Users get burned on three of four.

Our answer

Three test tiers — structural, live inventory, live execution — gate Claude and Codex in CI. The conformance schema is public at /spec/jobspec/v0, so any vendor harness gets audited against the same contract we use. Externally falsifiable beats trust-us.

Evidence: agentops/GOALS.md directive 1 + tests/skills/test-runtime-*-smoke.sh + /spec/jobspec/v0/openapi.yaml
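
The tier-by-runtime gating can be pictured as a pass/fail matrix. In this sketch the tier and runtime names come from the text above, while the `gate_matrix` helper, the result format, and the rule that a missing result counts as a failure are assumptions.

```python
TIERS = ("structural", "live inventory", "live execution")
RUNTIMES = ("claude", "codex")


def gate_matrix(results: dict[tuple[str, str], bool]) -> list[str]:
    """Return every failing runtime/tier cell; a missing result counts as a failure."""
    return [
        f"{runtime}/{tier}"
        for runtime in RUNTIMES
        for tier in TIERS
        if not results.get((runtime, tier), False)
    ]


results = {(runtime, tier): True for runtime in RUNTIMES for tier in TIERS}
results[("codex", "live execution")] = False

failing = gate_matrix(results)
print(failing or "all tiers green")
```

Treating an absent result as red keeps a runtime from passing simply because one of its tiers never ran.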

> spec · doctrine made executable

Conformance, in YAML.

JobSpec OpenAPI v0 describes the current AgentOps daemon. Any vendor can target it. Audit it. Implement it. Push back on it.

$ ao quick-start

Ship the next session smarter than this one.

Five minutes to install. One command to validate. Zero telemetry.