sovereignty proof · artifact-as-evidence

The cross-vendor claim, falsifiable.

AgentOps 3.0 defaults to single-runtime to make install-to-first-value cheap; mixed-model is opt-in via /council --mixed. The capability is preserved because the corpus lives in .agents/ in your repo and is portable across Claude Code, Codex, Cursor, and OpenCode — so when the next frontier model wins, the corpus comes with you. The proof is not a benchmark; the v1 workbench is saturated. The proof is in-repo /council --mixed runs where the second-vendor judge moved the verdict, with file:line citations any reader can audit.

Steelman

"Cross-vendor coordination is marketing." Most multi-runtime tools test on the primary and shim everything else. If the second judge never disagrees in a load-bearing way, the second judge is decoration. We answer the steelman with three real artifacts where the second vendor moved the verdict.

The standing exhibit

#1 · 2026-05-15rpi-leanness reframe
Codex reframed "RPI is waterfall" → "plan.md is an artifact-boundary bug"
Surfaced by
codex (gpt-5.5), independent voice in mixed council; echoed by the skeptic seat
Missed by
All three Claude judges (agile-coach, ddd-architect, token-economist) framed the problem as a methodology issue, not a payload-boundary issue
The maintainer asked the council to make RPI lean by re-architecting the loop. The Claude judges debated waterfall-vs-agile tradeoffs. Codex re-framed the diagnosis: RPI execution is already wave-based; the bloat is that plan.md is treated as a shared god-artifact re-read ~8× per lifecycle (20–44 KB each time). That reframe became the load-bearing decision — execution-packet.json became the contract, plan.md became commentary — saving a mis-scoped re-architecture epic.
Discovery + planning manufacture the plan.md as a shared god artifact, then every later phase rehydrates pieces of it. Bloat is an artifact-boundary bug, not a methodology failure.
Load-bearing outcome
execution-packet.json adopted as the cross-phase contract; plan.md demoted to commentary; the re-architecture epic was reshaped into a packet-boundary cleanup instead of a methodology overhaul.
verdict: .agents/council/2026-05-15-rpi-leanness-council.md:23-30
#2 · 2026-05-16F6 — citation feedback misjoin
F6: session-misjoin in flywheel_citation_feedback.go (two Codex judges, independent)
Surfaced by
codex-1 and codex-3, independently — two Codex judges flagged the same defect from different angles in the same council run
Missed by
All three Claude judges (claude-1, claude-2, claude-3) reviewed the same evidence packet and did not surface this finding
The Adapt-phase feedback loop in the citation-tracker uses a timestamp-generated fallback session id and exact-match session filtering, then applies utility penalties without filtering citations by session — so corrections can be misattributed across sessions or over-applied. A separate codex judge found a complementary bug: skill_loaded citations resolve to skills/<name>/SKILL.md, but the feedback resolver only checks .agents/learnings|findings|patterns paths, so skill-correction penalties silently resolve to nothing. The Adapt phase the leverage hierarchy calls highest-leverage was partly mis-wired.
This is a real defect, not a framing issue: the "Adapt" phase the leverage hierarchy calls highest-leverage is partly mis-wired.
Load-bearing outcome
Filed as a critical bug against the citation feedback loop; the load-bearing claim that the flywheel adapts on correction signal was demoted to "partially adaptive" until the misjoin is fixed.
verdict: .agents/council/2026-05-16-research-cdlc-claims.md:91-100code: cli/cmd/ao/flywheel_citation_feedback.go
#3 · 2026-05-16F7 — hook asymmetry
F7: citation-tracker.sh exists but is not wired in PostToolUse(Read) (three Codex judges)
Surfaced by
codex-1, codex-2, and codex-3 — all three Codex judges flagged the manifest/observability gap
Missed by
All three Claude judges took the CDLC prose at face value and did not audit the hooks.json manifest against the claimed observability surface
The CDLC narrative presents one unified observability lifecycle, but the manifest tells a different story. citation-tracker.sh exists as a script but is not wired into hooks/hooks.json PostToolUse(Read), so passive read-citation tracking is documentation-only. Hook coverage is also runtime-asymmetric: the Codex manifest omits SessionEnd / context-guard / context-monitor that the Claude path has. The doctrinal claim of vendor-symmetric observability is broken at the manifest layer.
citation-tracker.sh exists but is not wired into hooks/hooks.json PostToolUse(Read), so passive read-citation tracking is claimed but inactive. Hook coverage is also runtime-asymmetric.
Load-bearing outcome
Filed as a P1 manifest-doctrine divergence; cross-vendor hook parity became a tracked SLO rather than an assumed property; the CDLC observability section was rewritten to describe actual wired hooks, not aspirational ones.
verdict: .agents/council/2026-05-16-research-cdlc-claims.md:102-107code: hooks/hooks.json + hooks/citation-tracker.sh

Ablation gate (queued)

A 1-day ablation experiment is queued before the PRODUCT.md "Strategic Bet" doctrine language gets re-asserted: re-run /council --deep against these three artifacts in three configurations each (Claude-only 3 judges, Codex-only 3 judges, mixed 3+3) with the same evidence packets, blind-scored by an independent reviewer. If single-vendor catches ≥80% of the load-bearing findings, mixed-model is demoted from the headline. If ≤50%, the synthesis hybrid stands. Refusing to run the falsification gate is itself a finding — see council Finding F4: "doctrinal claims require paired falsification gates."

The bet is honest: cross-vendor coordination earns its place by changing verdicts, not by sitting in the install matrix. These three artifacts are the standing exhibit. New mixed-vendor runs land here as they accrue; this page is the running ledger, not a launch graphic.