
VIII. Human Validation

Humans in the loop for critical decisions.

AI agents do the heavy lifting. Humans provide strategic oversight.


The Problem

Without Human Gates

  • A $10M feature gets built that nobody needed
  • Agents make expensive mistakes autonomously
  • The wrong solution gets implemented correctly
  • No opportunity for course correction
  • Trust erodes through unexpected changes

With Human Gates

  • Errors are caught when they are cheap to fix
  • Plans are reviewed before implementation, not after
  • Broken deployments drop to zero (see the data below)
  • AI and humans work as partners
  • 10 minutes of review saves hours of rework

The Solution

Full Autonomy

Week 1-6: Agent researches + plans + implements
Week 7: "We changed our mind"

Result: 6 weeks wasted, $10M equivalent

No checkpoints. No course correction. Expensive failures.

Strategic Gates

Week 1: Research -> [GATE] Review findings
Week 2: Plan -> [GATE] Approve plan
Week 3: Implement -> [GATE] Verify

Result: Catch errors at each boundary

Strategic oversight. Early detection. Safe execution.


The Four Gates

Strategic checkpoints between major phases:

Gate 1: Research

Did we research the right things?

Gate 2: Plan

Is this the right solution?

Gate 3: Implementation

Does implementation match plan?

Gate 4: Deployment

Safe to deploy to production?
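Wired into an agent pipeline, the gates sit between phases. A minimal sketch, assuming an agent.run(phase, input) interface and a simplified gate (all names here are illustrative, not from the original):

PHASES = [
    ("research", "Did we research the right things?"),
    ("plan", "Is this the right solution?"),
    ("implement", "Does implementation match plan?"),
    ("deploy", "Safe to deploy to production?"),
]

def run_with_gates(agent, gate):
    output = None
    for phase, question in PHASES:
        output = agent.run(phase, output)  # agent does the heavy lifting
        # Strategic checkpoint: a human answers the gate question before work continues
        if not gate.request_approval(phase, question, output):
            print(f"Stopped at the {phase} gate for course correction")
            return None
    return output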


Why Gates Work

::: info Progressive Complexity

Workflow:

Research -> [CHECKPOINT] Review
Plan -> [CHECKPOINT] Approve
Implement -> [CHECKPOINT] Verify

Catch errors at boundaries, when they are cheap, not after they compound into expensive failures.

Cost escalation:

  • Gate 1: 10 minutes to redirect
  • Gate 2: 1 hour to revise the plan
  • Gate 3: 4 hours to fix the implementation
  • No gate: days to recover from a production disaster

:::

Real Impact Data

Without gates:

  • 25 production deployments
  • Broken: 4 (16%)
  • Average fix time: 2 hours
  • Total time wasted: 8 hours

With gates:

  • 25 production deployments
  • Rejected at gate: 3 (12%)
  • Broken: 0 (0%)
  • Gate review time: 5 minutes each
  • Total time invested: 125 minutes

Result: broken deployments dropped from 4 to 0, and roughly 2 hours of review (125 minutes) replaced 8 hours of fixes, a net savings of about 6 hours.

The same pattern at project scale:

WITHOUT GATE:
Week 1-6: Agent builds the entire feature
Week 7: "We don't need this"
Result: 6 weeks wasted

WITH GATE:
Week 1: Research -> [GATE]
Product owner: "Too complex, simplify"
Result: a 30-minute review saved 6 weeks of work


Gate Design Principles

1. Strategic, Not Constant

| Wrong | Right |
| --- | --- |
| Human approves every line of code | Human approves the high-level plan |
| Constant interruptions | Strategic checkpoints |

2. Context-Rich Approval

Poor Request:
"Approve this change? (yes/no)"

Rich Request:
# Plan Approval

## Summary
Migrate authentication to JWT with refresh tokens

## Research Findings
- Current sessions expire too quickly
- Users experience auth interruptions

## Proposed Approach
Implement JWT refresh token rotation

## Alternatives Considered
- Alternative A: Longer session timeouts (rejected: security risk)
- Alternative B: OAuth delegation (rejected: too complex)

## Impact
- Files changed: 5
- Estimated time: 6 hours
- Risk: Medium (production auth system)

## Rollback Plan
Feature flag for instant rollback

Approve? [ ] Yes [ ] No [ ] Revise
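One way to enforce rich context is to make every section a required field, so an agent cannot file a bare yes/no request. A minimal sketch; the ApprovalRequest structure is illustrative, not part of the original:

from dataclasses import dataclass

@dataclass
class ApprovalRequest:
    # Mirrors the rich-request template above; every section is required
    summary: str
    findings: list
    approach: str
    alternatives: list  # (option, reason rejected) pairs
    impact: dict
    rollback: str

    def render(self) -> str:
        lines = ["# Plan Approval", "", "## Summary", self.summary]
        lines += ["", "## Research Findings"] + [f"- {f}" for f in self.findings]
        lines += ["", "## Proposed Approach", self.approach]
        lines += ["", "## Alternatives Considered"]
        lines += [f"- {opt} (rejected: {why})" for opt, why in self.alternatives]
        lines += ["", "## Impact"] + [f"- {k}: {v}" for k, v in self.impact.items()]
        lines += ["", "## Rollback Plan", self.rollback]
        return "\n".join(lines)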

Implementation Patterns

Synchronous Gate

import select
import sys

class HumanGate:
    def request_approval(self, context, timeout=3600):
        # Present context to the human reviewer
        print("===== APPROVAL REQUIRED =====")
        print(f"Phase: {context.phase}")
        print(f"Proposal: {context.proposal}")
        print(f"Impact: {context.impact}")
        print("Respond 'approved' or 'rejected':")

        # Wait for a reply on stdin, up to the timeout (POSIX select)
        ready, _, _ = select.select([sys.stdin], [], [], timeout)
        response = sys.stdin.readline().strip() if ready else None

        if response == "approved":
            return True
        elif response == "rejected":
            return False
        else:
            # Timeout or unrecognized input - default to safe (block)
            return False
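A usage sketch for the synchronous gate; SimpleNamespace stands in for whatever context object the agent actually carries:

from types import SimpleNamespace

context = SimpleNamespace(
    phase="plan",
    proposal="Migrate authentication to JWT with refresh tokens",
    impact="5 files, ~6 hours, medium risk",
)

# Block for up to 30 minutes; a timeout defaults to the safe choice (reject)
if HumanGate().request_approval(context, timeout=1800):
    print("Plan approved - proceeding to implementation")
else:
    print("Plan blocked - revise and resubmit")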

Async Gate (Non-Blocking)

from datetime import datetime

class AsyncGate:
    def request_approval(self, context):
        # Create a durable approval request; create_request is assumed to
        # persist it to whatever approval queue or store the system uses
        approval_id = create_request({
            'phase': context.phase,
            'proposal': context.proposal,
            'created_at': datetime.now()
        })

        # Notify the human reviewer (email, Slack, etc.)
        notify_human(approval_id)

        # Agent saves its state and exits; the run resumes after approval
        save_checkpoint(context.phase, approval_id)

        print(f"Approval request {approval_id} created")
        print(f"Resume with: agent resume --approval-id {approval_id}")

Validation

You're doing this right if:

  • Humans review plans before implementation
  • Gates are strategic (between phases), not constant
  • Context is rich (why, alternatives, impact)
  • Async gates don't block agent indefinitely
  • Approval rate 80-90% (agent proposals generally good)

You're doing this wrong if:

  • No human gates (fully autonomous for critical work)
  • Too many gates (human bottleneck)
  • Poor context (human can't make informed decision)
  • Synchronous gates block for hours/days
  • Approval rate <50% (agent proposals are bad)

Gate Decision Matrix

When to require human approval:

| Factor | Low | Medium | High |
| --- | --- | --- | --- |
| Impact | Few files | Multiple files | System-wide |
| Risk | Reversible | Production | Critical |
| Cost | Minutes | Hours | Days |
| Novelty | Routine | Uncommon | First time |
| Gate? | Optional | Recommended | Required |

Examples:

Low Impact

"Fix typo in README"

Gate: Optional (auto-proceed)

Medium Impact

"Refactor auth module"

Gate: Recommended

High Impact

"Migrate database schema"

Gate: Required
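One conservative reading of the matrix in code: the strictest factor wins, so a single high-risk dimension forces a gate. A sketch with illustrative level names:

LEVELS = {"low": 0, "medium": 1, "high": 2}
GATES = ["optional", "recommended", "required"]

def gate_requirement(impact, risk, cost, novelty):
    # Take the highest level across all four factors
    score = max(LEVELS[impact], LEVELS[risk], LEVELS[cost], LEVELS[novelty])
    return GATES[score]

print(gate_requirement("low", "low", "low", "low"))           # optional (README typo)
print(gate_requirement("medium", "medium", "medium", "low"))  # recommended (auth refactor)
print(gate_requirement("high", "high", "high", "medium"))     # required (schema migration)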


Related Factors

| Factor | Relationship |
| --- | --- |
| II. Context Loading | Gates enable context delegation |
| III. Focused Agents | Each gate reviews one phase's output |
| IV. Continuous Validation | Human gates supplement automated validation |
| VI. Resume Work | Gates are natural boundaries for session bundles |
| VII. Smart Routing | Route high-risk tasks to human-gated workflows |