> agentops · context compiler for coding agents
Compile context. Gate output. Compound knowledge.
Every session reads from the corpus on the way in and writes back on the way out.
Vendor memory follows the chat. The corpus follows the team.
After Twelve-Factor App · 12-Factor Agents · HumanLayer
> doctrine · three layers · one flywheel · three gaps
Vendor memory follows the chat. The corpus follows the team. Every coding session reads from the corpus on the way in and writes back on the way out — typed, versioned, validated, decay-ranked. Your agent’s context is now an engineering artifact, not chat history.
Each layer solves a different problem. All three compound through the Context Development Life Cycle.
Right context, right window, right time.
Assembles phase-scoped context packets, scores knowledge by utility and freshness, trims to the token budget, and delivers at session start automatically. Your agent starts loaded, not cold.
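The exact ranking `ao compile` ships isn't reproduced here, but decay-ranked trimming to a token budget can be sketched as follows. `Learning`, `score`, `compile_packet`, and the 30-day half-life are all illustrative assumptions, not the shipped implementation:

```python
from dataclasses import dataclass
import time

@dataclass
class Learning:
    text: str
    tokens: int
    utility: float      # how often citing sessions succeeded, 0..1
    last_used: float    # unix timestamp of last citation

def score(item: Learning, now: float, half_life_days: float = 30.0) -> float:
    """Utility decayed by freshness: recent, useful learnings rank highest."""
    age_days = (now - item.last_used) / 86400
    freshness = 0.5 ** (age_days / half_life_days)  # exponential decay
    return item.utility * freshness

def compile_packet(corpus: list[Learning], budget: int) -> list[Learning]:
    """Greedy trim: take highest-scoring learnings until the token budget is spent."""
    now = time.time()
    packet, spent = [], 0
    for item in sorted(corpus, key=lambda l: score(l, now), reverse=True):
        if spent + item.tokens <= budget:
            packet.append(item)
            spent += item.tokens
    return packet
```

The design choice the sketch illustrates: a stale high-utility learning loses to a fresh moderate one, which is what keeps the packet from silting up with old wins.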
ao inject · ao context assemble · ao compile · 71 skills · 12 hooks

Gates that block, not advise.
Multi-model consensus validates plans before build and code before commit. Independent Claude + Codex judges debate and return one auditable verdict. No agent grades its own work.
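How `/council` aggregates verdicts isn't shown here; one minimal shape for a gate that blocks rather than advises is below. The unanimous-approval rule and the `Verdict` type are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    judge: str       # e.g. "claude", "codex"
    approve: bool
    rationale: str   # kept for the audit trail

def gate(verdicts: list[Verdict]) -> tuple[bool, str]:
    """Block unless every independent judge approves; name the dissenters."""
    dissent = [v for v in verdicts if not v.approve]
    if dissent:
        names = ", ".join(v.judge for v in dissent)
        return False, f"blocked: {names} dissented"
    return True, f"approved by {len(verdicts)} independent judges"
```

The point of the shape: the verdict carries who dissented and why, so the block is auditable instead of a bare failure.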
/pre-mortem · /vibe · /council · 61 eval suites · baseline A/B

The system gets smarter every session.
Every session extracts learnings. Learnings get scored and promoted to permanent patterns. Patterns become planning rules. The flywheel runs overnight unattended. Session 15 starts with everything session 1 learned.
/forge · ao flywheel · /evolve · /dream · 1,400+ learnings

Capture. Score. Promote. Inject. Not memory — compounding.
1. Capture: Every session emits learnings.
2. Score: Five axes — specificity, actionability, novelty, context, confidence.
3. Promote: Learnings become patterns. Patterns become rules.
4. Inject: Next session starts loaded, not cold.
Stage 4 feeds the next session's stage 1. Escape velocity: retrieval × usage > decay.
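The score-and-promote stages can be sketched as a tiny ladder. The five axis names come from the stage list above; the thresholds (0.8, 0.6, three citations) are invented for illustration, not the shipped cutoffs:

```python
AXES = ("specificity", "actionability", "novelty", "context", "confidence")

def learning_score(ratings: dict[str, float]) -> float:
    """Mean of the five axes, each rated 0..1."""
    return sum(ratings[a] for a in AXES) / len(AXES)

def tier(score: float, citations: int) -> str:
    """Illustrative promotion ladder: learning -> pattern -> rule."""
    if score >= 0.8 and citations >= 3:
        return "rule"       # hardened into a planning constraint
    if score >= 0.6:
        return "pattern"    # reusable, injected when relevant
    return "learning"       # raw capture, subject to decay
```

Whatever the real cutoffs, the shape is the claim: promotion is earned by score plus repeated citation, not by recency.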
Twelve factors. Three proof obligations. Each factor closes one or more.
Pressure-test plans before code.
Plan looks coherent. Code passes tests. Both miss the edge case. No one challenged either.
Closed by: /pre-mortem · /vibe · /council
Solved problems stay solved.
Auth bug fixed Monday. Same auth bug returns Wednesday. The lesson lived in a chat transcript that got compacted.
Closed by: /retro · /forge · ao lookup
Shipped work informs the next session.
Code diff lands. No lesson extracted. No constraint hardened. Next session re-learns from scratch.
Closed by: /post-mortem · finding compiler · /evolve
We’d rather lose the argument before you install than after.
Claim
Compound growth is happening, not just possible.
Strongest critique
Every agent framework claims a learning loop. Most ship a folder of stale notes. A markdown directory looks like a step backwards.
Our answer
Escape velocity is measurable: retrieval × usage > decay. CI gates the inequality every run. When learnings stop being cited, the gate goes red and names the broken stage. Diff-able plain text is the only audit substrate that scales.
Evidence: agentops/GOALS.md directives 4–5 + scripts/check-flywheel-lifecycle.sh
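`scripts/check-flywheel-lifecycle.sh` is the real gate; the shape of the inequality it enforces can be sketched like this. The metric names and exit-code contract are assumptions, only the inequality comes from the text:

```python
import sys

def flywheel_gate(retrievals: int, usages: int, decayed: int) -> int:
    """Red-gate the flywheel: fail CI when retrieval x usage stops outpacing decay."""
    if retrievals * usages > decayed:
        print("flywheel green: retrieval x usage > decay")
        return 0  # CI passes
    print("flywheel red: corpus is decaying faster than it is cited", file=sys.stderr)
    return 1      # CI fails and names the broken stage upstream
```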
1. Manage the context window like production input.
2. If it's not in git, it didn't happen.
3. Scoped task. Fresh context. Never reuse a saturated window.
4. Understand before you generate.
5. No agent grades its own work. Ever.
6. Validated work ratchets. It cannot regress.
7. Every session produces two outputs: the work, and the lessons.
8. Learnings flow into future sessions automatically.
9. Track fitness, not activity.
10. Own workspace. Own context. Zero shared mutable state.
11. Escalation flows up, never sideways.
12. Failures are data. Index them like successes.
convergence · three primitives · multiple implementations
Any vendor can target the schema. We publish it.
Every harness gets absorbed into the model — memory, learning loops, validation gates included. The corpus is what stays yours. AgentOps is the bridge: build the moat before the harness commoditizes. Vendor-agnostic on purpose — use Claude and Codex side by side.
| Primitive | Industry framing | AgentOps surface | Status |
| --- | --- | --- | --- |
| Learning loop | Extract memory. Consolidate off-session. Inject next session. | /retro → /forge → /harvest → ao inject. Private overnight runs via /dream. Tiered promotion: learning → pattern → rule. | Shipped. |
| Skill packaging | Watch recurring patterns. Package them as reusable skills. | 69 skills. /heal-skill audits. /converter exports cross-runtime. ao flywheel close-loop drafts new skills from repeats. | Drafting works. Promotion polish in flight. |
| Adversarial verification | Independent agents audit other agents. Verdicts go to a human. | /council, /pre-mortem, /vibe, /post-mortem. Multi-model consensus with prediction tracking. Behavioral validation fires inside /validation. | Shipped. |
Harness commoditizes. Corpus doesn’t. Skills, hooks, and a CLI smooth today’s sharp edges; the corpus is the moat that stays. Audit the contract at /spec/jobspec/v0.
Claim
Same skills across Claude Code, Codex, Cursor, OpenCode.
Strongest critique
'Cross-platform' usually means tested on the primary and shimmed for everything else. Users get burned on three of four.
Our answer
Three test tiers — structural, live inventory, live execution — gate Claude and Codex in CI. The conformance schema is public at /spec/jobspec/v0, so any vendor harness gets audited against the same contract we use. Externally falsifiable beats trust-us.
Evidence: agentops/GOALS.md directive 1 + tests/skills/test-runtime-*-smoke.sh + /spec/jobspec/v0/openapi.yaml
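The tiered gating the smoke tests implement can be sketched as a short-circuiting runner. The tier names come from the text above; `run_tiers` and the lambda checks are illustrative, not the repo's test scripts:

```python
from typing import Callable

def run_tiers(harness: str, tiers: dict[str, Callable[[str], bool]]) -> dict[str, bool]:
    """Run conformance tiers in order; a failed tier short-circuits the later ones."""
    results: dict[str, bool] = {}
    for name, check in tiers.items():
        ok = check(harness)
        results[name] = ok
        if not ok:
            break  # no point executing live tests against a structurally broken harness
    return results
```

Usage under the stated assumptions: structural checks run for every harness, live tiers only for the ones wired into CI, so a vendor sees exactly which tier it failed.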
spec · doctrine made executable
JobSpec OpenAPI v0 describes the current AgentOps daemon. Any vendor can target it. Audit it. Implement it. Push back on it.