Factor IV: Continuous Validation
Formal checkpoints before agents apply changes; validate at every step
| Aspect | Details |
|---|---|
| Primary Pillar | DevOps + SRE |
| Supporting Pillar | Learning Science |
| Enforces Laws | Law 3 (Document Context), Law 4 (Validate Before Execute) |
| Derived From | Validation gates + Pre-flight checks + Shift-left verification |
Summary
Before any agent applies changes—to data, systems, or decisions—validation gates must confirm correctness. These gates include automated checks, review steps, and human approvals. Prevention is cheaper than recovery.
The Problem
Agents that execute without validation:
- Break production systems silently
- Create cascading failures
- Waste time fixing self-created problems
- Erode trust in automation
- Have no safety net
Familiar pattern:
Agent generates action → Executes directly → Applies changes
↓
System breaks → Hours of recovery
Traditional approach: "Let's fix it if it breaks"
12-Factor AgentOps approach: Validate before every execution, every time
Why This Factor Exists
Grounding in the Five Pillars
Primary: DevOps + SRE
The validation gates pattern comes directly from DevOps shift-left testing and SRE pre-flight checks. DevOps teaches us that "the earlier you catch a problem, the cheaper it is to fix"—finding errors before execution costs seconds, finding them in production costs hours. SRE practices require pre-flight checklists before any change, just as pilots complete checklists even after 10,000 hours of flight time. The cost ratio is brutal: 1 (validation check) : 60 (post-execution fix) : 3600 (production failure).
Validation gates prevent catastrophic failures by enforcing quality at boundaries. Just as automated pipelines have gates between stages, agent workflows need gates between decision and execution. Prevention is always cheaper than recovery.
Supporting: Learning Science
Learning Science provides the progressive complexity model: students need checkpoints to consolidate understanding before advancing. For AI agents, validation gates serve as these checkpoints—research validates before planning, planning validates before implementation. This prevents compounding errors where a flawed foundation causes cascading failures in subsequent phases.
What This Factor Enforces
Law 3: Document Context for Future Agents
Validation gates enforce documentation by requiring structured entries. Pre-action hooks reject actions without "Context:" and "Learning:" sections, ensuring every change is documented for future agents. This automated enforcement turns Law 3 from a guideline into a requirement—validation fails if context is missing.
Concrete example: Pre-action hook checks for required sections, blocking execution until documentation is complete. Result: 100% compliance with Law 3 (from 70% voluntary compliance).
Law 4: Validate Before Execute
This factor IS Law 4 in action. Validation gates operationalize the principle that agents must never execute changes without validation. Multiple layers enforce this:
- Pre-action checks: Block actions with invalid parameters
- Business rule validation: Verify logic before execution
- Human gates: Require approval before high-risk changes
- Runtime checks: Verify preconditions before executing changes
The Principle
Validation as a Gate, Not an Option
Wrong approach:
1. Generate solution
2. Apply solution
3. Hope it works
Right approach:
1. Generate solution
2. ✅ Validate inputs (automated)
3. ✅ Validate logic (automated rules)
4. ✅ Review decision (human or AI)
5. ✅ Check side effects (dependency analysis)
6. Apply solution (only if all gates pass)
Types of Validation Gates
Level 1: Input Validation (100% automated)
- Parameter format validation
- Required field checks
- Data type verification
- Constraint enforcement
Level 2: Logic Validation (100% automated)
- Business rule checks
- Policy compliance
- Threshold verification
- Constraint validation
Level 3: Semantic Validation (automated + human)
- Decision review (does this make sense?)
- Impact analysis
- Risk assessment
- Quality checks
Level 4: Human Gates (selective)
- High-value decision approval
- Sensitive action authorization
- Policy exception review
- Final confirmation for critical changes
Why This Works
1. Shift-Left Verification
Traditional DevOps wisdom:
"The earlier you catch a problem, the cheaper it is to fix"
For AI agents:
- Catching input errors before action: 5 seconds
- Catching logic errors before completion: 5 minutes
- Catching production errors after deployment: 5 hours
Cost ratio: 1:60:3600
2. Pre-Flight Checks from Aviation
Aviation principle:
"Every pilot completes a pre-flight checklist, even after 10,000 hours"
For AI agents:
- Experienced agents still make mistakes
- Environmental conditions change
- New edge cases emerge constantly
- Checklists (validation) prevent disasters
3. Fail-Fast from Software Engineering
Fail-fast principle:
"Detect and report errors immediately, don't let them propagate"
For AI agents:
Bad: Generate → Execute → Complete → System breaks
Good: Generate → Validate (FAIL) → Fix → Validate (PASS) → Execute
Result: Failures caught before execution, not after
4. Trust Through Verification
Problem: "How do I trust AI-generated decisions?"
Solution: Validation gates provide objective proof
- Checks pass → Action is safe
- Checks fail → Action needs work
- Trust based on verification, not hope
Implementation
Pre-Action Validation
Validation before every action:
def validate_action(action):
# Gate 1: Input validation
if not validate_inputs(action.inputs):
return ValidationError("Invalid inputs")
# Gate 2: Business rules
if not check_business_rules(action):
return ValidationError("Business rule violation")
# Gate 3: Risk assessment
if action.risk_level > THRESHOLD:
if not get_human_approval(action):
return ValidationError("High-risk action requires approval")
return ValidationSuccess()
Domain-Specific Validation Examples
Customer Service Agent:
def validate_refund(refund_action):
# Check refund amount within policy
if refund_action.amount > customer.max_refund_limit:
return Error("Refund exceeds customer limit")
# Check for duplicate refunds
if has_recent_refund(customer, days=30):
return Warning("Recent refund detected - verify")
# Check authorization level
if refund_action.amount > agent.authorization_limit:
return RequireEscalation("Amount exceeds agent limit")
return Success()
Research Agent:
def validate_publication(research_action):
# Check source credibility
if not verify_sources(research_action.sources):
return Error("Unverified sources detected")
# Check for bias indicators
if detect_bias(research_action.content):
return Warning("Potential bias - review required")
# Check stakeholder approval
if research_action.is_external:
return RequireApproval("External publication requires approval")
return Success()
Sales Agent:
def validate_discount(discount_action):
# Check discount within policy
if discount_action.percentage > MAX_DISCOUNT:
return Error("Discount exceeds maximum allowed")
# Check customer eligibility
if not customer_eligible(discount_action.customer):
return Error("Customer not eligible for this discount")
# Check margin impact
if calculate_margin(discount_action) < MIN_MARGIN:
return RequireApproval("Low margin - manager approval needed")
return Success()
Agent-Level Validation Pattern
In agent workflow:
class AgentWorkflow:
def execute(self, task):
# 1. Generate solution
solution = self.generate_solution(task)
# 2. Validation gates
if not self.validate_inputs(solution):
raise ValidationError("Inputs invalid")
if not self.check_business_rules(solution):
raise ValidationError("Business rule violation")
if not self.verify_constraints(solution):
raise ValidationError("Constraint violation")
# 3. Human gate (if needed)
if task.requires_approval:
if not self.request_approval(solution):
raise ApprovalDenied("Human rejected action")
# 4. All gates passed - safe to apply
self.apply_solution(solution)
return solution
Validation Hierarchy
Fast → Slow → Expensive
1. Input validation (1 second, 100% automated)
2. Business rule check (10 seconds, 100% automated)
3. Policy compliance (1 minute, 100% automated)
4. Risk assessment (2 minutes, 100% automated)
5. Impact analysis (5 minutes, 100% automated)
6. Human review (variable, selective)
Principle: Fail fast at cheap gates, escalate to expensive gates only when needed
Validation
✅ You're doing this right if:
- Every action passes automated validation
- Pre-action checks prevent bad executions
- Validation catches issues before they propagate
- Humans review only high-risk changes
- Validation failures are rare (good solutions)
❌ You're doing this wrong if:
- Executing without running checks
- Disabling validation to "move faster"
- Skipping validation for urgent requests
- Manual validation only (no automation)
- Validation happens after execution
Real-World Evidence
Production Validation (200+ Sessions)
Before validation gates:
Average failed actions: 15% (3 out of 20)
Time to fix: 30 minutes per failure
Total cost: 450 minutes wasted
Success rate: 85%
After validation gates:
Average failed actions: 0.5% (1 out of 200)
Time to fix: 5 minutes per failure
Total cost: 5 minutes wasted
Success rate: 99.5%
Improvement: 90x reduction in failed actions, 6x reduction in time wasted
Specific Example: Customer Refund Processing
Before validation:
Process 50 refund requests
Manual review catches 8 invalid amounts
Manual review catches 3 policy violations
Time: 2 hours review + 1 hour fixes = 3 hours
After validation:
Process 50 refund requests
Pre-action check catches 8 invalid amounts (5 seconds)
Policy check catches 3 violations (10 seconds)
All refunds valid on first attempt
Time: 15 seconds
Improvement: 720x faster validation
Anti-Patterns
❌ The "Trust Me" Trap
Wrong: "I checked it manually, it's fine" Right: Automated validation every time, no exceptions
❌ The "Move Fast, Break Things" Trap
Wrong: Skip validation to execute faster Right: Validation makes you faster (less fixing broken things)
❌ The "Checks Are Slow" Trap
Wrong: Disable checks because they take time Right: Optimize checks, but never skip them
❌ The "Production Testing" Trap
Wrong: "We'll catch it in production" Right: Catch it before execution (validation gates)
Relationship to Other Factors
- Factor I: Automated Tracking: Hooks enforce validation before actions
- Factor II: Context Loading: Validation in isolated agent context prevents pollution
- Factor III: Focused Agents: Smaller agents → simpler validation
- Factor V: Measure Everything: Monitors validation gate success rates
- Factor XI: Fail-Safe Checks: Validation enforces governance
Validation Gate Design Patterns
Pattern 1: Fast Fail Gates
Principle: Run fast, cheap validations first
def validate_solution(solution):
# 1 second - input validation
if not check_inputs(solution):
return False # Fail immediately
# 10 seconds - business rules
if not check_business_rules(solution):
return False # Fail immediately
# 1 minute - policy compliance
if not check_policy(solution):
return False # Fail immediately
# Only run expensive validations if cheap ones pass
return True
Pattern 2: Parallel Validation
Principle: Run independent validations concurrently
import concurrent.futures
def validate_solution(solution):
with concurrent.futures.ThreadPoolExecutor() as executor:
# Run validations in parallel
futures = {
executor.submit(check_inputs, solution): "inputs",
executor.submit(check_policy, solution): "policy",
executor.submit(check_risk, solution): "risk"
}
# Collect results
for future in concurrent.futures.as_completed(futures):
check_name = futures[future]
if not future.result():
return False, f"{check_name} failed"
return True, "All validations passed"
Pattern 3: Progressive Validation
Principle: Validate incrementally as solution develops
def develop_solution(task):
# Research phase
research = research_agent.execute(task)
validate_research(research) # Gate 1
# Plan phase
plan = plan_agent.execute(research)
validate_plan(plan) # Gate 2
# Implementation phase
action = implement_agent.execute(plan)
validate_action(action) # Gate 3
# All phases validated before execution
execute(action)
Pattern 4: Idempotent Validation
Principle: Validation should be repeatable and deterministic
def validate_solution(solution):
# Same solution + same validation = same result
# No side effects, no state changes
# ✅ Good: Pure function
def check_inputs(solution):
return validate(solution).is_valid()
# ❌ Bad: Side effects
def check_inputs_bad(solution):
with open("validation.log", "a") as f: # Side effect!
f.write("Validating...")
return validate(solution).is_valid()
Next Steps
- Define validation rules for your agent's domain
- Implement pre-action checks in your workflows
- Measure validation effectiveness (catch rate, false positives)
- Optimize slow validations (parallel execution, caching)
- Document validation requirements for each workflow
Further Reading
- DevOps + SRE Pillar: ../docs/principles/four-pillars.md#pillar-1-devops--sre
- Law 4 (Validate Before Execute): ../docs/principles/nine-laws.md
Remember: Validation gates aren't overhead—they're insurance. The cost of prevention is always lower than the cost of recovery. Always validate before execute.