VII. Validate Externally | 12-Factor AgentOps

Rule

The worker reports what it did and the evidence for it. An independent checker — never the worker — writes the verdict that counts.

Split the two roles. The worker makes a claim ("this is done, here's the proof"). Something else — a test suite, a reviewer, another model — turns that claim into a verdict. Only the checker's verdict is binding. A worker grading its own work is not validation; it's a confidence statement wearing validation's clothes.

What AgentOps Enforces

Keep claim and verdict separate: the worker supplies evidence, an independent surface decides.
Use tests, linters, builds, type checks, reviewers, or a separate model as that surface.
Tie acceptance to observable evidence, not to the worker's confidence.
Prefer command output and review findings over "looks good."
Escalate when no independent surface is available instead of letting self-review stand in for proof.

Failure Signal

The closeout says "looks good" but names no command.
The same context that wrote the patch only checks the happy path it expected.
A high-risk change merges without a second surface.
The validation step cannot fail.

Done Looks Like

The worker's claim and the checker's verdict are two different things written by two different parties, and the binding verdict — owned by the checker, not the worker — is recorded where the next agent can inspect it.