Skip to content
all factors
IVfoundation

Continuous Validation

Validate at every step. Catch errors in seconds.

10 min read

Factor IV: Continuous Validation

Formal checkpoints before agents apply changes; validate at every step

AspectDetails
Primary PillarDevOps + SRE
Supporting PillarLearning Science
Enforces LawsLaw 3 (Document Context), Law 4 (Validate Before Execute)
Derived FromValidation gates + Pre-flight checks + Shift-left verification

Summary

Before any agent applies changes—to data, systems, or decisions—validation gates must confirm correctness. These gates include automated checks, review steps, and human approvals. Prevention is cheaper than recovery.

The Problem

Agents that execute without validation:

  • Break production systems silently
  • Create cascading failures
  • Waste time fixing self-created problems
  • Erode trust in automation
  • Have no safety net

Familiar pattern:

Agent generates action → Executes directly → Applies changes
                                            ↓
                            System breaks → Hours of recovery

Traditional approach: "Let's fix it if it breaks"

12-Factor AgentOps approach: Validate before every execution, every time


Why This Factor Exists

Grounding in the Five Pillars

Primary: DevOps + SRE

The validation gates pattern comes directly from DevOps shift-left testing and SRE pre-flight checks. DevOps teaches us that "the earlier you catch a problem, the cheaper it is to fix"—finding errors before execution costs seconds, finding them in production costs hours. SRE practices require pre-flight checklists before any change, just as pilots complete checklists even after 10,000 hours of flight time. The cost ratio is brutal: 1 (validation check) : 60 (post-execution fix) : 3600 (production failure).

Validation gates prevent catastrophic failures by enforcing quality at boundaries. Just as automated pipelines have gates between stages, agent workflows need gates between decision and execution. Prevention is always cheaper than recovery.

Supporting: Learning Science

Learning Science provides the progressive complexity model: students need checkpoints to consolidate understanding before advancing. For AI agents, validation gates serve as these checkpoints—research validates before planning, planning validates before implementation. This prevents compounding errors where a flawed foundation causes cascading failures in subsequent phases.


What This Factor Enforces

Law 3: Document Context for Future Agents

Validation gates enforce documentation by requiring structured entries. Pre-action hooks reject actions without "Context:" and "Learning:" sections, ensuring every change is documented for future agents. This automated enforcement turns Law 3 from a guideline into a requirement—validation fails if context is missing.

Concrete example: Pre-action hook checks for required sections, blocking execution until documentation is complete. Result: 100% compliance with Law 3 (from 70% voluntary compliance).

Law 4: Validate Before Execute

This factor IS Law 4 in action. Validation gates operationalize the principle that agents must never execute changes without validation. Multiple layers enforce this:

  • Pre-action checks: Block actions with invalid parameters
  • Business rule validation: Verify logic before execution
  • Human gates: Require approval before high-risk changes
  • Runtime checks: Verify preconditions before executing changes

The Principle

Validation as a Gate, Not an Option

Wrong approach:

1. Generate solution
2. Apply solution
3. Hope it works

Right approach:

1. Generate solution
2. ✅ Validate inputs (automated)
3. ✅ Validate logic (automated rules)
4. ✅ Review decision (human or AI)
5. ✅ Check side effects (dependency analysis)
6. Apply solution (only if all gates pass)

Types of Validation Gates

Level 1: Input Validation (100% automated)

  • Parameter format validation
  • Required field checks
  • Data type verification
  • Constraint enforcement

Level 2: Logic Validation (100% automated)

  • Business rule checks
  • Policy compliance
  • Threshold verification
  • Constraint validation

Level 3: Semantic Validation (automated + human)

  • Decision review (does this make sense?)
  • Impact analysis
  • Risk assessment
  • Quality checks

Level 4: Human Gates (selective)

  • High-value decision approval
  • Sensitive action authorization
  • Policy exception review
  • Final confirmation for critical changes

Why This Works

1. Shift-Left Verification

Traditional DevOps wisdom:

"The earlier you catch a problem, the cheaper it is to fix"

For AI agents:

  • Catching input errors before action: 5 seconds
  • Catching logic errors before completion: 5 minutes
  • Catching production errors after deployment: 5 hours

Cost ratio: 1:60:3600

2. Pre-Flight Checks from Aviation

Aviation principle:

"Every pilot completes a pre-flight checklist, even after 10,000 hours"

For AI agents:

  • Experienced agents still make mistakes
  • Environmental conditions change
  • New edge cases emerge constantly
  • Checklists (validation) prevent disasters

3. Fail-Fast from Software Engineering

Fail-fast principle:

"Detect and report errors immediately, don't let them propagate"

For AI agents:

Bad: Generate → Execute → Complete → System breaks
Good: Generate → Validate (FAIL) → Fix → Validate (PASS) → Execute

Result: Failures caught before execution, not after

4. Trust Through Verification

Problem: "How do I trust AI-generated decisions?"

Solution: Validation gates provide objective proof

  • Checks pass → Action is safe
  • Checks fail → Action needs work
  • Trust based on verification, not hope

Implementation

Pre-Action Validation

Validation before every action:

def validate_action(action):
    # Gate 1: Input validation
    if not validate_inputs(action.inputs):
        return ValidationError("Invalid inputs")

    # Gate 2: Business rules
    if not check_business_rules(action):
        return ValidationError("Business rule violation")

    # Gate 3: Risk assessment
    if action.risk_level > THRESHOLD:
        if not get_human_approval(action):
            return ValidationError("High-risk action requires approval")

    return ValidationSuccess()

Domain-Specific Validation Examples

Customer Service Agent:

def validate_refund(refund_action):
    # Check refund amount within policy
    if refund_action.amount > customer.max_refund_limit:
        return Error("Refund exceeds customer limit")

    # Check for duplicate refunds
    if has_recent_refund(customer, days=30):
        return Warning("Recent refund detected - verify")

    # Check authorization level
    if refund_action.amount > agent.authorization_limit:
        return RequireEscalation("Amount exceeds agent limit")

    return Success()

Research Agent:

def validate_publication(research_action):
    # Check source credibility
    if not verify_sources(research_action.sources):
        return Error("Unverified sources detected")

    # Check for bias indicators
    if detect_bias(research_action.content):
        return Warning("Potential bias - review required")

    # Check stakeholder approval
    if research_action.is_external:
        return RequireApproval("External publication requires approval")

    return Success()

Sales Agent:

def validate_discount(discount_action):
    # Check discount within policy
    if discount_action.percentage > MAX_DISCOUNT:
        return Error("Discount exceeds maximum allowed")

    # Check customer eligibility
    if not customer_eligible(discount_action.customer):
        return Error("Customer not eligible for this discount")

    # Check margin impact
    if calculate_margin(discount_action) < MIN_MARGIN:
        return RequireApproval("Low margin - manager approval needed")

    return Success()

Agent-Level Validation Pattern

In agent workflow:

class AgentWorkflow:
    def execute(self, task):
        # 1. Generate solution
        solution = self.generate_solution(task)

        # 2. Validation gates
        if not self.validate_inputs(solution):
            raise ValidationError("Inputs invalid")

        if not self.check_business_rules(solution):
            raise ValidationError("Business rule violation")

        if not self.verify_constraints(solution):
            raise ValidationError("Constraint violation")

        # 3. Human gate (if needed)
        if task.requires_approval:
            if not self.request_approval(solution):
                raise ApprovalDenied("Human rejected action")

        # 4. All gates passed - safe to apply
        self.apply_solution(solution)

        return solution

Validation Hierarchy

Fast → Slow → Expensive

1. Input validation       (1 second,  100% automated)
2. Business rule check    (10 seconds, 100% automated)
3. Policy compliance      (1 minute,  100% automated)
4. Risk assessment        (2 minutes, 100% automated)
5. Impact analysis        (5 minutes, 100% automated)
6. Human review          (variable,  selective)

Principle: Fail fast at cheap gates, escalate to expensive gates only when needed


Validation

✅ You're doing this right if:

  • Every action passes automated validation
  • Pre-action checks prevent bad executions
  • Validation catches issues before they propagate
  • Humans review only high-risk changes
  • Validation failures are rare (good solutions)

❌ You're doing this wrong if:

  • Executing without running checks
  • Disabling validation to "move faster"
  • Skipping validation for urgent requests
  • Manual validation only (no automation)
  • Validation happens after execution

Real-World Evidence

Production Validation (200+ Sessions)

Before validation gates:

Average failed actions: 15% (3 out of 20)
Time to fix: 30 minutes per failure
Total cost: 450 minutes wasted
Success rate: 85%

After validation gates:

Average failed actions: 0.5% (1 out of 200)
Time to fix: 5 minutes per failure
Total cost: 5 minutes wasted
Success rate: 99.5%

Improvement: 90x reduction in failed actions, 6x reduction in time wasted

Specific Example: Customer Refund Processing

Before validation:

Process 50 refund requests
Manual review catches 8 invalid amounts
Manual review catches 3 policy violations
Time: 2 hours review + 1 hour fixes = 3 hours

After validation:

Process 50 refund requests
Pre-action check catches 8 invalid amounts (5 seconds)
Policy check catches 3 violations (10 seconds)
All refunds valid on first attempt
Time: 15 seconds

Improvement: 720x faster validation


Anti-Patterns

❌ The "Trust Me" Trap

Wrong: "I checked it manually, it's fine" Right: Automated validation every time, no exceptions

❌ The "Move Fast, Break Things" Trap

Wrong: Skip validation to execute faster Right: Validation makes you faster (less fixing broken things)

❌ The "Checks Are Slow" Trap

Wrong: Disable checks because they take time Right: Optimize checks, but never skip them

❌ The "Production Testing" Trap

Wrong: "We'll catch it in production" Right: Catch it before execution (validation gates)


Relationship to Other Factors

  • Factor I: Automated Tracking: Hooks enforce validation before actions
  • Factor II: Context Loading: Validation in isolated agent context prevents pollution
  • Factor III: Focused Agents: Smaller agents → simpler validation
  • Factor V: Measure Everything: Monitors validation gate success rates
  • Factor XI: Fail-Safe Checks: Validation enforces governance

Validation Gate Design Patterns

Pattern 1: Fast Fail Gates

Principle: Run fast, cheap validations first

def validate_solution(solution):
    # 1 second - input validation
    if not check_inputs(solution):
        return False  # Fail immediately

    # 10 seconds - business rules
    if not check_business_rules(solution):
        return False  # Fail immediately

    # 1 minute - policy compliance
    if not check_policy(solution):
        return False  # Fail immediately

    # Only run expensive validations if cheap ones pass
    return True

Pattern 2: Parallel Validation

Principle: Run independent validations concurrently

import concurrent.futures

def validate_solution(solution):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Run validations in parallel
        futures = {
            executor.submit(check_inputs, solution): "inputs",
            executor.submit(check_policy, solution): "policy",
            executor.submit(check_risk, solution): "risk"
        }

        # Collect results
        for future in concurrent.futures.as_completed(futures):
            check_name = futures[future]
            if not future.result():
                return False, f"{check_name} failed"

    return True, "All validations passed"

Pattern 3: Progressive Validation

Principle: Validate incrementally as solution develops

def develop_solution(task):
    # Research phase
    research = research_agent.execute(task)
    validate_research(research)  # Gate 1

    # Plan phase
    plan = plan_agent.execute(research)
    validate_plan(plan)  # Gate 2

    # Implementation phase
    action = implement_agent.execute(plan)
    validate_action(action)  # Gate 3

    # All phases validated before execution
    execute(action)

Pattern 4: Idempotent Validation

Principle: Validation should be repeatable and deterministic

def validate_solution(solution):
    # Same solution + same validation = same result
    # No side effects, no state changes

    # ✅ Good: Pure function
    def check_inputs(solution):
        return validate(solution).is_valid()

    # ❌ Bad: Side effects
    def check_inputs_bad(solution):
        with open("validation.log", "a") as f:  # Side effect!
            f.write("Validating...")
        return validate(solution).is_valid()

Next Steps

  1. Define validation rules for your agent's domain
  2. Implement pre-action checks in your workflows
  3. Measure validation effectiveness (catch rate, false positives)
  4. Optimize slow validations (parallel execution, caching)
  5. Document validation requirements for each workflow

Further Reading


Remember: Validation gates aren't overhead—they're insurance. The cost of prevention is always lower than the cost of recovery. Always validate before execute.