Rule
The worker reports what it did and the evidence for it. An independent checker — never the worker — writes the verdict that counts.
Split the two roles. The worker makes a claim ("this is done, here's the proof"). Something else — a test suite, a reviewer, another model — turns that claim into a verdict. Only the checker's verdict is binding. A worker grading its own work is not validation; it's a confidence statement wearing validation's clothes.
What AgentOps Enforces
- Keep claim and verdict separate: the worker supplies evidence, an independent surface decides.
- Use tests, linters, builds, type checks, reviewers, or a separate model as that surface.
- Tie acceptance to observable evidence, not to the worker's confidence.
- Prefer command output and review findings over "looks good."
- Escalate when no independent surface is available instead of letting self-review stand in for proof.
Failure Signal
- The closeout says "looks good" but names no command.
- The same context that wrote the patch only checks the happy path it expected.
- A high-risk change merges without a second surface.
- The validation step cannot fail.
Done Looks Like
The worker's claim and the checker's verdict are two different things written by two different parties, and the binding verdict — owned by the checker, not the worker — is recorded where the next agent can inspect it.