The record is yours, portable and tamper-evident.
AgentOps is single-runtime by default, to keep first value cheap. The proof of sovereignty is not a benchmark; earlier benchmarks are saturated, so a score delta argues nothing. The proof is where the record lives: every verdict lands in a hash-chained ledger in your own repo, portable across Claude Code, Codex, Cursor, and OpenCode. When the next frontier model wins, the record comes with you, and you can re-verify it byte for byte. Mixed-model review is opt-in and table stakes; the moat is that the record is portable and tamper-evident, which the receipts ledger below proves directly.
"Portability is a slogan. Every tool says your data is yours." The difference is what backs the word here: the record is a hash-chained ledger in your repo, machine-derived, and it regenerates from the chain or fails to build. It is not an export button you have to trust; it is a record you can carry to the next vendor and re-verify line by line. The first exhibit below is that ledger.
Membrane receipts: verdicts are public, machine-derived, and hash-chain verified
The validation membrane keeps a provenance ledger of every verdict it renders, and the receipts are built straight from it: 213 ledger records, 190 verdict events, 185 CONFIRMED, 4 REFUTED, 1 ESCALATE, across 169 distinct beads reviewed between 2026-06-13 and 2026-07-01. The generator refuses to render if the hash chain is broken or tampered, so the numbers cannot drift from the ledger without the page itself failing to build.
The generator refuses to render if ao provenance verify reports a broken or tampered hash chain.
Every number derives from the ledger; regenerating the page recomputes them from the hash chain, so the receipts cannot quietly drift from what the membrane actually did.
agentops/docs/evidence/membrane-receipts.md + agentops/docs/provenance/ledger.jsonlCodex reframed "RPI is waterfall" → "plan.md is an artifact-boundary bug"
The maintainer asked the council to make RPI lean by re-architecting the loop. The Claude judges debated waterfall-vs-agile tradeoffs. Codex reframed the diagnosis: RPI execution is already wave-based; the bloat is that plan.md is treated as a shared god-artifact re-read roughly eight times per lifecycle. That reframe became the load-bearing decision: execution-packet.json became the contract, plan.md became commentary, and a mis-scoped re-architecture epic was avoided.
Discovery + planning manufacture the plan.md as a shared god artifact, then every later phase rehydrates pieces of it. Bloat is an artifact-boundary bug, not a methodology failure.
execution-packet.json adopted as the cross-phase contract; plan.md demoted to commentary; the re-architecture epic was reshaped into a packet-boundary cleanup instead of a methodology overhaul.
F6: session-misjoin in flywheel_citation_feedback.go (two Codex judges, independent)
The Adapt-phase feedback loop in the citation tracker uses a timestamp-generated fallback session id and exact-match session filtering, then applies utility penalties without filtering citations by session. Corrections can be misattributed across sessions or over-applied. A separate Codex judge found a complementary bug: skill_loaded citations resolve to skills/<name>/SKILL.md, but the feedback resolver only checks .agents/learnings|findings|patterns paths, so skill-correction penalties silently resolve to nothing. The Adapt phase the leverage hierarchy calls highest-leverage was partly miswired.
This is a real defect, not a framing issue: the "Adapt" phase the leverage hierarchy calls highest-leverage is partly mis-wired.
Filed as a critical bug against the citation feedback loop; the load-bearing claim that the flywheel adapts on correction signal was demoted to "partially adaptive" until the misjoin is fixed.
cli/cmd/ao/flywheel_citation_feedback.goThe mixed-model claim stays falsifiable: a queued test re-runs the same councils single-vendor and mixed on identical evidence, scored by an independent reviewer, and if a single vendor catches the same load-bearing findings, mixed-model loses its billing.
The record is the moat: machine-derived, hash-chained, and living in your own repo, so it stays yours when you change vendors. Mixed-model review earns its place by moving verdicts, not by sitting in the install matrix, and the exhibits above are the standing evidence. New runs land here as they accrue; this page is the running ledger, not a launch graphic.