Smart Routing

Right task, right agent, every time.

Factor VII: Smart Routing

Route work to best-fit workflows/agents with measured accuracy

| Aspect | Details |
|---|---|
| Primary Pillar | Learning Science |
| Supporting Pillar | DevOps + SRE |
| Enforces Laws | Law 1 (Extract Learnings), Law 2 (Improve System) |
| Derived From | Pattern recognition + Load balancing + Service mesh routing |

Summary

Not all tasks are the same. Intelligent routing analyzes incoming requests and directs them to the optimal workflow, agent, or skill based on task characteristics, historical patterns, and resource constraints. Measured routing accuracy enables continuous improvement.

The Problem

Without intelligent routing:

  • Users guess which workflow to use
  • Wrong workflow selection wastes time
  • Simple tasks routed to complex workflows (overkill)
  • Complex tasks routed to simple workflows (failure)
  • No learning from routing mistakes

Familiar pattern:

User: "Create a Kubernetes app"
System: "Use /create-app or /complex-workflow or /research-first?"
User: *Guesses wrong*
Result: 30 minutes wasted on wrong approach

Traditional approach: Manual workflow selection, tribal knowledge

12-Factor AgentOps approach: Intelligent router learns patterns, suggests best workflow


Why This Factor Exists

Grounding in the Five Pillars

Primary: Learning Science

Intelligent routing applies pattern recognition from Learning Science: experts recognize patterns instantly while novices analyze step-by-step. A routing system learns which task characteristics predict which workflow's success, building expertise over time. The learning curve shows this: Month 1 (75% accuracy, cold start), Month 3 (91% accuracy, pattern recognition established).

Human learning research shows that categorization skills develop through repeated exposure and feedback. For routing, each task is a training example: task characteristics + chosen workflow + outcome. Success reinforces the pattern, failure updates the model. Across the 110 validation cases reported below, the router reached 90.9% accuracy (100 of 110 routes correct), which is expert-level pattern recognition.

Supporting: DevOps + SRE

DevOps provides the service mesh routing pattern: route requests based on content, not just destination. Istio and Kubernetes route traffic intelligently based on headers, load, and health. For agents, routing analyzes task content (keywords, complexity, risk) to select the optimal workflow. Cost optimization follows: simple tasks go to quick workflows (30 seconds, $0.01) and complex tasks go to research workflows (3 hours, $2.00). A typo fix sent to the quick workflow instead of a 30-minute, $2.00 complex workflow finishes 60x faster and 200x cheaper.


What This Factor Enforces

Law 1: Extract Learnings

Routing enforces learning extraction through feedback loops. Every routing decision generates data: task → workflow → outcome. Analysis reveals patterns: "Create Kubernetes app" succeeds 93% of the time with applications-create-app, but only 40% of the time with a generic workflow. This pattern extraction feeds the routing model, improving future decisions.

Concrete example: Month 1 data shows simple tasks routed to complex workflows waste time. Learning extracted: "Simple tasks (no research needed) should route to quick-edit workflow." Router updated with this pattern. Month 2 shows 97% accuracy for simple tasks. The learning compounded into improved routing.

Law 2: Improve System

Routing continuously improves the system by optimizing workflow selection. Poor routes waste time (wrong workflow) and money (expensive workflow for cheap task). Intelligent routing learns from failures, improving accuracy from 75% (Month 1) to 91% (Month 3). Each percentage point improvement reduces wasted effort across all future tasks.

Concrete example: Routing accuracy dashboard shows 10 misrouted tasks per week costing 30 minutes each (5 hours wasted). Routing improvements reduce misroutes to 1 per week (30 minutes wasted). System improvement: 4.5 hours/week saved = 234 hours/year. The router's learning directly improves operational efficiency.


The Principle

Task Classification

Dimensions for routing decisions:

  1. Complexity:

    • Simple: Known pattern, clear solution
    • Medium: Some research needed
    • Complex: Research → Plan → Implement required
  2. Novelty:

    • Familiar: Similar task completed before
    • New: First time seeing this type
    • Novel: Requires innovation
  3. Risk:

    • Low: Reversible, non-production
    • Medium: Production, but tested
    • High: Production, untested, critical
  4. Scope:

    • Single file modification
    • Multiple files, one service
    • Multiple services, system-wide
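
The four dimensions above can be captured as a small typed record. This is a minimal sketch, not the project's data model: the names `Complexity`, `Novelty`, `Risk`, `Scope`, `TaskProfile`, and `pick_route` are hypothetical, and the mapping rules are illustrative assumptions that reuse workflow names appearing elsewhere in this document.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = 1    # known pattern, clear solution
    MEDIUM = 2    # some research needed
    COMPLEX = 3   # research -> plan -> implement required


class Novelty(Enum):
    FAMILIAR = 1  # similar task completed before
    NEW = 2       # first time seeing this type
    NOVEL = 3     # requires innovation


class Risk(Enum):
    LOW = 1       # reversible, non-production
    MEDIUM = 2    # production, but tested
    HIGH = 3      # production, untested, critical


class Scope(Enum):
    SINGLE_FILE = 1
    SINGLE_SERVICE = 2
    SYSTEM_WIDE = 3


@dataclass(frozen=True)
class TaskProfile:
    complexity: Complexity
    novelty: Novelty
    risk: Risk
    scope: Scope


def pick_route(profile: TaskProfile) -> str:
    """Illustrative mapping from a profile to a workflow name (assumed rules)."""
    if profile.complexity is Complexity.COMPLEX or profile.novelty is Novelty.NOVEL:
        return "research-plan-implement"
    if profile.risk is Risk.HIGH:
        return "validated-deployment"
    if profile.complexity is Complexity.SIMPLE and profile.risk is Risk.LOW:
        return "quick-edit"
    return "interactive"  # anything else: ask the user


typo_fix = TaskProfile(Complexity.SIMPLE, Novelty.FAMILIAR, Risk.LOW, Scope.SINGLE_FILE)
print(pick_route(typo_fix))  # quick-edit
```

Keeping the profile immutable makes it safe to log alongside the routing outcome, which is the raw material for the feedback loop described later.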

Routing Decision Tree

Task arrives
├─ Is this a known pattern?
│  ├─ Yes → Route to specialized workflow
│  └─ No → Analyze characteristics
│     ├─ Simple + Low Risk → Quick workflow
│     ├─ Complex + High Risk → Research → Plan → Implement
│     └─ Unknown → Interactive router (ask user)
│
└─ Historical accuracy for this route?
   ├─ >90% → Auto-route
   ├─ 70-90% → Suggest with confidence score
   └─ <70% → Present options, let user choose

The Router Pattern

Input: Task description
Process: Analyze → Classify → Match → Route
Output: Best-fit workflow + confidence score

class IntelligentRouter:
    def route(self, task_description):
        # 1. Extract features
        features = self.extract_features(task_description)

        # 2. Classify
        complexity = self.classify_complexity(features)
        risk = self.classify_risk(features)
        scope = self.classify_scope(features)

        # 3. Match to workflows
        candidates = self.find_matching_workflows(complexity, risk, scope)

        # 4. Rank by historical accuracy
        ranked = self.rank_by_accuracy(candidates, features)
        if not ranked:
            # No workflow matched: nothing to rank, so let the user choose
            return [], "choose"

        # 5. Return best match with confidence
        best_match = ranked[0]
        confidence = best_match.historical_accuracy

        if confidence > 0.90:
            return best_match, "auto"
        elif confidence > 0.70:
            return best_match, "suggested"
        else:
            return ranked[:3], "choose"

Why This Works

1. Pattern Recognition (Learning Science)

Human learning principle:

"Experts recognize patterns instantly; novices analyze step-by-step"

For AI routing:

Novice: Reads full task → Analyzes → Uncertain decision
Expert: Sees "Kubernetes StatefulSet" → Instantly routes to k8s-stateful workflow

Result: Pattern recognition enables instant, accurate routing

2. Service Mesh Routing (DevOps/SRE)

Kubernetes/Istio pattern:

"Route requests based on content, not just destination"

For AI agents:

Traditional: All tasks → Same agent
Intelligent: Tasks analyzed → Routed to specialized agents

Example:
- "Fix typo" → quick-edit agent (30 seconds)
- "Refactor architecture" → research-plan-implement (3 hours)

3. Feedback Loops Enable Accuracy

Continuous improvement:

Route task → Execute → Measure success → Update routing model

Example:
Week 1: "Create K8s app" → Simple workflow → Failed → Update model
Week 2: "Create K8s app" → Complex workflow → Success → Reinforce
Week 10: "Create K8s app" → Complex workflow (95% accuracy)
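
The loop above (route, execute, measure, update) can be sketched as a per-route success counter. This is a minimal illustration under stated assumptions: `RouteStats` and its method names are hypothetical, and a real router would likely persist these counts rather than hold them in memory.

```python
from collections import defaultdict


class RouteStats:
    """Tracks success counts per (task pattern, workflow) pair."""

    def __init__(self):
        # (pattern, workflow) -> [successes, attempts]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, pattern: str, workflow: str, success: bool) -> None:
        entry = self._counts[(pattern, workflow)]
        entry[0] += 1 if success else 0
        entry[1] += 1

    def accuracy(self, pattern: str, workflow: str) -> float:
        key = (pattern, workflow)
        if key not in self._counts:
            return 0.0  # no history yet for this route
        successes, attempts = self._counts[key]
        return successes / attempts

    def best_workflow(self, pattern: str):
        """Return (workflow, accuracy) with the highest accuracy, or None."""
        candidates = [
            (wf, s / n) for (p, wf), (s, n) in self._counts.items() if p == pattern
        ]
        return max(candidates, key=lambda c: c[1]) if candidates else None


stats = RouteStats()
stats.record("create-k8s-app", "simple", success=False)   # week 1: failed
stats.record("create-k8s-app", "complex", success=True)   # week 2: succeeded
print(stats.best_workflow("create-k8s-app"))  # ('complex', 1.0)
```

The accuracy returned here is the same quantity the decision tree compares against its 90% and 70% thresholds.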

4. Cost Optimization

Problem: Using expensive workflows for cheap tasks

Solution: Route by cost/benefit

Task: Fix typo in README
- Complex workflow: 30 minutes, $2.00 in compute
- Quick edit: 30 seconds, $0.01 in compute

Savings: 60x faster, 200x cheaper
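
The ratios above follow directly from the two rows. The sketch below reproduces them and adds one possible selection rule: pick the cheapest workflow that can handle the task. `WorkflowCost` and `cheapest_capable` are hypothetical names, and the `can_handle` predicate is an assumption standing in for whatever capability check a real router uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowCost:
    name: str
    seconds: float
    dollars: float


quick_edit = WorkflowCost("quick-edit", seconds=30, dollars=0.01)
complex_workflow = WorkflowCost("complex-workflow", seconds=30 * 60, dollars=2.00)

speedup = complex_workflow.seconds / quick_edit.seconds      # 1800 s / 30 s
cost_ratio = complex_workflow.dollars / quick_edit.dollars   # $2.00 / $0.01
print(f"{speedup:.0f}x faster, {cost_ratio:.0f}x cheaper")   # 60x faster, 200x cheaper


def cheapest_capable(workflows, can_handle):
    """Pick the cheapest workflow for which can_handle(workflow) is true."""
    capable = [w for w in workflows if can_handle(w)]
    return min(capable, key=lambda w: w.dollars) if capable else None


# A typo fix can be handled by either workflow, so the cheaper one wins.
choice = cheapest_capable([complex_workflow, quick_edit], can_handle=lambda w: True)
print(choice.name)  # quick-edit
```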

Implementation

Feature Extraction

Keywords matter:

import re

class FeatureExtractor:
    def extract_keywords(self, task_description):
        # One possible definition (the original leaves this helper unspecified):
        # lowercased word tokens, with hyphenated terms such as "system-wide" kept whole
        return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", task_description.lower())

    def extract_features(self, task_description):
        keywords = self.extract_keywords(task_description)
        words = set(keywords)

        def has_any(candidates):
            # Whole-word match, so "install" does not trigger 'all'
            # and "product" does not trigger 'prod'
            return any(kw in words for kw in candidates)

        features = {
            'keywords': keywords,
            'complexity_signals': [],
            'risk_signals': [],
            'scope_signals': []
        }

        # Complexity signals
        if has_any(['research', 'investigate', 'explore']):
            features['complexity_signals'].append('research_needed')

        if has_any(['architecture', 'design', 'plan']):
            features['complexity_signals'].append('planning_needed')

        # Risk signals
        if has_any(['production', 'prod', 'live']):
            features['risk_signals'].append('production')

        if has_any(['critical', 'urgent', 'emergency']):
            features['risk_signals'].append('high_priority')

        # Scope signals
        if has_any(['system-wide', 'all', 'entire']):
            features['scope_signals'].append('broad_scope')

        return features

Workflow Matching

Rule-based routing:

class WorkflowMatcher:
    def match(self, features):
        workflows = []

        # Simple, focused tasks
        if not features['complexity_signals'] and not features['risk_signals']:
            workflows.append({
                'workflow': 'quick-edit',
                'reason': 'Simple task, low risk'
            })

        # Research-heavy tasks
        if 'research_needed' in features['complexity_signals']:
            workflows.append({
                'workflow': 'research-plan-implement',
                'reason': 'Research required before implementation'
            })

        # High-risk production tasks
        if 'production' in features['risk_signals']:
            workflows.append({
                'workflow': 'validated-deployment',
                'reason': 'Production environment requires validation gates'
            })

        return workflows

Machine Learning Router (Advanced)

Train on historical data:

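from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

class MLRouter:
    def __init__(self):
        # TF-IDF is one possible text vectorizer (an assumption; the original
        # does not say how task descriptions are vectorized)
        self.vectorizer = TfidfVectorizer()
        self.model = RandomForestClassifier()
        self.trained = False

    def train(self, historical_tasks):
        # Extract features from past tasks
        X = self.vectorizer.fit_transform([task.description for task in historical_tasks])

        # Labels: which workflow was successful
        y = [task.successful_workflow for task in historical_tasks]

        # Train model
        self.model.fit(X, y)
        self.trained = True

    def get_top_n_workflows(self, probabilities, n=3):
        # Pair each workflow label with its probability, highest first
        ranked = sorted(zip(self.model.classes_, probabilities),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:n]

    def predict(self, task_description):
        if not self.trained:
            raise ValueError("Router not trained yet")

        # Vectorize new task
        features = self.vectorizer.transform([task_description])

        # Class probabilities, aligned with self.model.classes_
        probabilities = self.model.predict_proba(features)[0]
        best = probabilities.argmax()

        return {
            'workflow': self.model.classes_[best],
            'confidence': float(probabilities[best]),
            'alternatives': self.get_top_n_workflows(probabilities, n=3)
        }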

Interactive Routing

When confidence is low, ask:

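class InteractiveRouter:
    def __init__(self, router):
        # Any router whose predict() returns 'workflow', 'confidence', and 'alternatives'
        self.router = router

    def route_with_interaction(self, task_description):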
        # Try automatic routing first
        result = self.router.predict(task_description)

        if result['confidence'] > 0.90:
            # High confidence, auto-route
            return result['workflow']

        elif result['confidence'] > 0.70:
            # Medium confidence, suggest with option to override
            print(f"Suggested workflow: {result['workflow']} (confidence: {result['confidence']:.0%})")
            print(f"Alternatives: {result['alternatives']}")

            user_choice = input("Accept suggestion? (y/n): ")
            if user_choice.lower() == 'y':
                return result['workflow']
            else:
                return self.present_choices(result['alternatives'])

        else:
            # Low confidence, present choices
            print("I'm not sure which workflow is best. Here are the options:")
            return self.present_choices(result['alternatives'])

Validation

✅ You're doing this right if:

  • Routing accuracy measured (target: >90%)
  • Users rarely override router suggestions
  • Simple tasks route to simple workflows
  • Complex tasks route to research-first workflows
  • Routing decisions improve over time

❌ You're doing this wrong if:

  • No measurement of routing accuracy
  • Users constantly override suggestions
  • All tasks route to the same workflow
  • No learning from routing failures
  • Manual workflow selection required

Real-World Evidence

Production Routing Accuracy (110 Validation Cases)

Measured routing decisions:

Total tasks routed: 110
Correct routes: 100
Incorrect routes: 10
Accuracy: 90.9%

Breakdown by task type:

Simple tasks (35):     97% accuracy (34/35)
Medium tasks (50):     90% accuracy (45/50)
Complex tasks (25):    84% accuracy (21/25)
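
As a quick consistency check, the per-type counts above add up to the overall figure. The snippet only restates the numbers already given; the variable names are illustrative.

```python
# (correct, total) per task type, copied from the breakdown above
results = {
    "simple": (34, 35),
    "medium": (45, 50),
    "complex": (21, 25),
}

for task_type, (correct, total) in results.items():
    print(f"{task_type}: {correct}/{total} = {correct / total:.1%}")

total_correct = sum(correct for correct, _ in results.values())
total_tasks = sum(total for _, total in results.values())
overall = total_correct / total_tasks
print(f"overall: {total_correct}/{total_tasks} = {overall:.1%}")  # overall: 100/110 = 90.9%
```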

Learning curve:

Month 1: 75% accuracy (cold start)
Month 2: 85% accuracy (learning patterns)
Month 3: 91% accuracy (stable)

Specific Examples

Example 1: Kubernetes Application Creation

Task: "Create a new Kubernetes application for Redis caching"

Router analysis:

  • Keywords: "create", "Kubernetes", "application"
  • Complexity: Medium (known pattern)
  • Risk: Low (new app, not modifying existing)
  • Historical: 15 similar tasks, 93% success with applications-create-app

Route decision: applications-create-app workflow (confidence: 93%)
Outcome: Success ✅
Time: 10 minutes (vs. 45 minutes if routed to research-first)

Example 2: Architecture Redesign

Task: "Investigate migrating from monolith to microservices"

Router analysis:

  • Keywords: "investigate", "migrating", "architecture"
  • Complexity: High (research needed)
  • Risk: High (system-wide change)
  • Historical: 3 similar tasks, 100% success with research-plan-implement

Route decision: research-plan-implement workflow (confidence: 100%)
Outcome: Success ✅
Time: 3 hours (appropriate for complexity)

Example 3: Quick Typo Fix

Task: "Fix typo in README.md"

Router analysis:

  • Keywords: "fix typo"
  • Complexity: Low (simple edit)
  • Risk: Low (documentation)
  • Historical: 50+ similar tasks, 100% success with quick-edit

Route decision: quick-edit workflow (confidence: 100%)
Outcome: Success ✅
Time: 30 seconds (vs. 10 minutes if routed to full workflow)


Anti-Patterns

❌ The "One Size Fits All" Trap

Wrong: Route all tasks to the same workflow
Right: Match task characteristics to workflow capabilities

❌ The "Perfect Routing" Trap

Wrong: Spend hours building ML model for 10 tasks
Right: Start with rule-based routing, upgrade to ML when data exists

❌ The "Ignore Feedback" Trap

Wrong: Never measure routing accuracy
Right: Track successes/failures, continuously improve

❌ The "Black Box" Trap

Wrong: Route without explaining why
Right: Show reasoning (keywords detected, historical accuracy, confidence)


Relationship to Other Factors

  • Factor III: Focused Agents: Router selects which single-responsibility agent handles the task
  • Factor IV: Continuous Validation: Routing accuracy is a validation metric
  • Factor VI: Measure Everything: Measure routing decisions and outcomes
  • Factor IX: Mine Patterns: Routing patterns extracted from successful routes
  • Factor X: Small Iterations: Routing accuracy drives improvement backlog

Routing Patterns

Pattern 1: Cascading Router

Primary router (high-level)
├─ "Simple task" → Quick workflow router
│  ├─ "Edit file" → quick-edit
│  └─ "Format code" → code-formatter
│
├─ "Complex task" → Research workflow router
│  ├─ "New architecture" → research-plan-implement
│  └─ "Migration" → analysis-migration-validation
│
└─ "Uncertain" → Interactive router
   └─ Present options to user
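
The tree above can be sketched as routers that delegate to other routers until a workflow name is reached. This is an illustrative sketch: `CascadingRouter` and `by_keyword` are hypothetical names, and the keyword classifier is a toy stand-in for whatever classifier each level actually uses.

```python
from typing import Callable, Dict, Union

# A node is either a workflow name (leaf) or another router (branch).
Node = Union[str, "CascadingRouter"]


class CascadingRouter:
    def __init__(self, classify: Callable[[str], str], routes: Dict[str, Node],
                 fallback: str = "interactive"):
        self.classify = classify   # maps a task description to a route label
        self.routes = routes
        self.fallback = fallback   # used when no route matches the label

    def route(self, task: str) -> str:
        node = self.routes.get(self.classify(task), self.fallback)
        # Delegate to the next router in the cascade until a leaf is reached
        return node.route(task) if isinstance(node, CascadingRouter) else node


def by_keyword(mapping, default):
    """Toy classifier: the first keyword found in the task decides the label."""
    def classify(task):
        text = task.lower()
        return next((label for kw, label in mapping.items() if kw in text), default)
    return classify


quick = CascadingRouter(
    by_keyword({"format": "format", "typo": "edit"}, default="edit"),
    {"edit": "quick-edit", "format": "code-formatter"},
)
research = CascadingRouter(
    by_keyword({"migrat": "migration"}, default="architecture"),
    {"architecture": "research-plan-implement", "migration": "analysis-migration-validation"},
)
primary = CascadingRouter(
    by_keyword({"typo": "simple", "format": "simple",
                "architecture": "complex", "migrat": "complex"}, default="uncertain"),
    {"simple": quick, "complex": research},
)

print(primary.route("Fix typo in README.md"))               # quick-edit
print(primary.route("Migrate billing to the new cluster"))  # analysis-migration-validation
print(primary.route("Do something unusual"))                # interactive
```

Each level only needs to be accurate about a coarse decision, which keeps the individual classifiers simple and lets their accuracy be measured separately.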

Pattern 2: Context-Aware Routing

class ContextAwareRouter:
    def route(self, task, context):
        # Consider current context
        if context.current_phase == "research":
            # Already in research, continue with research workflows
            return self.research_workflows

        elif context.recent_failures > 3:
            # Many failures, route to safer, validated workflows
            return self.conservative_workflows

        elif context.time_remaining < 30 * 60:  # 30 minutes
            # Limited time, route to quick workflows
            return self.quick_workflows

        # Default: Standard routing
        return self.standard_route(task)

Pattern 3: Confidence Calibration

class CalibratedRouter:
    def calibrate_confidence(self, prediction, historical_data):
        # Adjust confidence based on calibration
        raw_confidence = prediction.confidence

        # Calibration curve from historical data
        calibrated = self.calibration_curve(raw_confidence, historical_data)

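        # Same thresholds as the routing decision tree
        if calibrated > 0.90:
            recommendation = 'auto'
        elif calibrated > 0.70:
            recommendation = 'suggest'
        else:
            recommendation = 'choose'

        return {
            'workflow': prediction.workflow,
            'raw_confidence': raw_confidence,
            'calibrated_confidence': calibrated,
            'recommendation': recommendation
        }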
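
The `calibration_curve` step is left abstract above. One simple way to fill it in is histogram binning: group past predictions by raw confidence and report how often predictions in the same bin were actually correct. The sketch below assumes `historical_data` is a list of `(raw_confidence, was_correct)` pairs, which the original does not specify, and it is unrelated to the scikit-learn function of the same name.

```python
from typing import List, Tuple


def calibration_curve(raw_confidence: float,
                      historical_data: List[Tuple[float, bool]],
                      n_bins: int = 10) -> float:
    """Histogram-binning calibration.

    Returns the observed success rate of past predictions that fell into the
    same confidence bin, or the raw value if that bin has no history yet.
    """
    def bin_of(conf: float) -> int:
        return min(int(conf * n_bins), n_bins - 1)  # 1.0 goes into the top bin

    target = bin_of(raw_confidence)
    outcomes = [ok for conf, ok in historical_data if bin_of(conf) == target]
    if not outcomes:
        return raw_confidence
    return sum(outcomes) / len(outcomes)


# Illustrative history: the model claimed 92-97% four times and was right three times.
history = [(0.95, True), (0.92, True), (0.97, False), (0.93, True)]
print(calibration_curve(0.94, history))  # 0.75
```

In this made-up history a raw confidence of 94% calibrates to 75%, which would move the route from "auto" down to "suggest".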

Metrics to Track

Routing Accuracy:

accuracy = correct_routes / total_routes
Target: >90%

User Override Rate:

override_rate = user_overrides / suggestions
Target: <10% (users trust suggestions)

Time Savings:

time_saved = (manual_selection_time - auto_routing_time) * num_routes
Example: (60s - 0s) * 100 routes = 100 minutes saved

Cost Optimization:

cost_saved = sum(wrong_workflow_cost - optimal_workflow_cost)
Example: 10 typo fixes routed to quick-edit ($0.01 each) instead of the complex workflow ($2.00 each) = 10 × $1.99 = $19.90 saved
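
The four formulas above can be computed from a log of routing records. This is a sketch under assumptions: `RouteRecord`, its fields, and `routing_metrics` are hypothetical names, and the sample costs reuse the per-task figures from the Cost Optimization section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RouteRecord:
    correct: bool         # did the chosen workflow succeed?
    suggested: bool       # was the route shown to the user as a suggestion?
    overridden: bool      # did the user reject that suggestion?
    baseline_cost: float  # dollars the task would have cost without routing
    actual_cost: float    # dollars spent on the workflow actually used


def routing_metrics(records: List[RouteRecord],
                    manual_selection_s: float = 60.0,
                    auto_routing_s: float = 0.0) -> Dict[str, float]:
    total = len(records)
    suggestions = sum(r.suggested for r in records)
    overrides = sum(r.overridden for r in records)
    return {
        "accuracy": sum(r.correct for r in records) / total if total else 0.0,
        "override_rate": overrides / suggestions if suggestions else 0.0,
        "time_saved_s": (manual_selection_s - auto_routing_s) * total,
        "cost_saved": sum(r.baseline_cost - r.actual_cost for r in records),
    }


# Ten routed tasks: one misroute, five suggestions of which one was overridden.
records = [
    RouteRecord(correct=(i != 0), suggested=(i < 5), overridden=(i == 1),
                baseline_cost=2.00, actual_cost=0.01)
    for i in range(10)
]
metrics = routing_metrics(records)
print(f"accuracy={metrics['accuracy']:.0%}, override_rate={metrics['override_rate']:.0%}")
```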

Next Steps

  1. Implement rule-based router for common task types
  2. Measure baseline accuracy on historical tasks
  3. Collect routing data for ML training (when volume justifies)
  4. Create confidence calibration based on historical accuracy
  5. Build interactive fallback for uncertain cases


Implementation Patterns

These patterns emerge from production deployments in Houston (local-first), Fractal (Kubernetes-native), and ai-platform (IC-hardened). They extend the conceptual routing principles with battle-tested infrastructure patterns.

Pattern 1: Composable Not Chainable (Blackboard Architecture)

The Problem: Traditional orchestration chains agents linearly (A → B → C), creating tight coupling and brittle pipelines.

The Solution: Blackboard-mediated coordination where agents read/write to shared state, infrastructure handles routing.

┌──────────────────────────────────────────────────────────────────────────┐
│                         BLACKBOARD ARCHITECTURE                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                    ┌─────────────────────────────┐                       │
│                    │        BLACKBOARD           │                       │
│                    │   (Shared State Store)      │                       │
│                    │                             │                       │
│                    │  ┌─────────┐ ┌──────────┐  │                       │
│                    │  │Decisions│ │Directives│  │                       │
│                    │  │(append) │ │ (upsert) │  │                       │
│                    │  └─────────┘ └──────────┘  │                       │
│                    └─────────────────────────────┘                       │
│                           ▲           │                                  │
│              ┌────────────┘           └────────────┐                     │
│              │ write                        read   │                     │
│              │                                     ▼                     │
│      ┌───────┴───────┐                    ┌───────────────┐             │
│      │   Agent A     │                    │   Agent B     │             │
│      │ (Research)    │                    │  (Planning)   │             │
│      └───────────────┘                    └───────────────┘             │
│              ▲                                     │                     │
│              │                                     │ write               │
│              │ read                                ▼                     │
│      ┌───────────────┐                    ┌───────────────┐             │
│      │   Agent D     │◄───── read ────────│   Agent C     │             │
│      │  (Review)     │                    │(Implementation)│             │
│      └───────────────┘                    └───────────────┘             │
│                                                                          │
│  KEY INSIGHT: Agents don't know about each other, only the blackboard   │
└──────────────────────────────────────────────────────────────────────────┘

Blackboard Data Model (from Fractal):

# Two types of blackboard entries

# Type 1: Decisions (append-only, audit trail)
apiVersion: fractal.ai/v1alpha1
kind: BlackboardEntry
metadata:
  name: research-findings-001
spec:
  type: decision
  phase: research
  content:
    findings:
      - 'API supports pagination via cursor'
      - 'Rate limit is 100 req/min'
    confidence: 0.92
  author: research-agent
  timestamp: '2025-01-15T10:30:00Z'

---
# Type 2: Directives (upsertable, current state)
apiVersion: fractal.ai/v1alpha1
kind: BlackboardEntry
metadata:
  name: current-approach
spec:
  type: directive
  content:
    approach: 'cursor-based-pagination'
    priority: 'latency'
    constraints:
      - 'no breaking changes'
  lastUpdated: '2025-01-15T10:35:00Z'

Why Composable Beats Chainable:

| Aspect | Chainable (A→B→C) | Composable (Blackboard) |
|---|---|---|
| Coupling | Tight (each agent knows next) | Loose (agents know only blackboard) |
| Failure | Chain breaks | Other agents continue |
| Scaling | Add more chains | Add more agents to same blackboard |
| Debugging | Trace through chain | Read blackboard state |
| Recovery | Restart chain | Resume from blackboard |
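
To make the decoupling concrete, here is a minimal in-memory sketch of the two entry types: decisions are append-only and directives are upserted. `Blackboard`, `Decision`, and the two agent functions are hypothetical names for illustration; the Fractal implementation stores these as Kubernetes resources rather than Python objects.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Decision:
    name: str
    phase: str
    author: str
    content: Dict[str, Any]


class Blackboard:
    """In-memory sketch: decisions are append-only, directives are upserted."""

    def __init__(self):
        self._decisions: List[Decision] = []
        self._directives: Dict[str, Dict[str, Any]] = {}

    def append_decision(self, decision: Decision) -> None:
        self._decisions.append(decision)      # existing entries are never changed

    def upsert_directive(self, name: str, content: Dict[str, Any]) -> None:
        self._directives[name] = content      # replaces any previous value

    def decisions(self, phase: Optional[str] = None) -> List[Decision]:
        return [d for d in self._decisions if phase is None or d.phase == phase]

    def directive(self, name: str) -> Optional[Dict[str, Any]]:
        return self._directives.get(name)


def research_agent(board: Blackboard) -> None:
    board.append_decision(Decision(
        "research-findings-001", "research", "research-agent",
        {"findings": ["API supports pagination via cursor"]},
    ))


def planning_agent(board: Blackboard) -> None:
    # Reads research output from the board; never calls the research agent
    if board.decisions(phase="research"):
        board.upsert_directive("current-approach", {"approach": "cursor-based-pagination"})


board = Blackboard()
research_agent(board)
planning_agent(board)
print(board.directive("current-approach"))  # {'approach': 'cursor-based-pagination'}
```

Neither agent function references the other, so either can be replaced, restarted, or run more than once without changing the rest of the system.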

Pattern 2: SharedInformer Caching (Kubernetes-Native)

The Problem: Agents polling for routing decisions create N×M API calls (N agents × M decisions).

The Solution: SharedInformer pattern from Kubernetes—local read cache with watch for updates.

┌─────────────────────────────────────────────────────────────────────────┐
│                    SHAREDINFORMER CACHING                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│     ┌─────────────────────────────────────────────────────────────┐    │
│     │                     API SERVER                               │    │
│     │              (Source of Truth)                               │    │
│     └─────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│                    ┌─────────┴─────────┐                               │
│                    │  Initial List +   │                               │
│                    │   Watch Stream    │                               │
│                    └─────────┬─────────┘                               │
│                              │                                          │
│                              ▼                                          │
│     ┌─────────────────────────────────────────────────────────────┐    │
│     │                  SHARED INFORMER                             │    │
│     │  ┌──────────────────────────────────────────────────────┐   │    │
│     │  │                    LOCAL CACHE                        │   │    │
│     │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐           │   │    │
│     │  │  │ Agent A  │  │ Agent B  │  │ Agent C  │           │   │    │
│     │  │  │  config  │  │  config  │  │  config  │           │   │    │
│     │  │  └──────────┘  └──────────┘  └──────────┘           │   │    │
│     │  └──────────────────────────────────────────────────────┘   │    │
│     │                                                              │    │
│     │  EVENT HANDLERS:                                             │    │
│     │  - OnAdd    → Route new agent                               │    │
│     │  - OnUpdate → Re-route changed agent                        │    │
│     │  - OnDelete → Remove from routing table                     │    │
│     └─────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│              ┌───────────────┼───────────────┐                         │
│              │               │               │                         │
│              ▼               ▼               ▼                         │
│     ┌─────────────┐ ┌─────────────┐ ┌─────────────┐                   │
│     │  Router 1   │ │  Router 2   │ │  Router 3   │                   │
│     │ (instant    │ │ (instant    │ │ (instant    │                   │
│     │  local read)│ │  local read)│ │  local read)│                   │
│     └─────────────┘ └─────────────┘ └─────────────┘                   │
│                                                                         │
│  BENEFIT: O(1) reads from local cache, watch keeps cache fresh         │
└─────────────────────────────────────────────────────────────────────────┘

Implementation (from Fractal):

// SharedInformer for routing decisions
type RoutingInformer struct {
    informer cache.SharedIndexInformer
    lister   v1alpha1.KAgentLister
}

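// ctx and resyncPeriod are passed in explicitly so the function compiles on its own.
// client must be the generated clientset for the Fractal CRDs; the type name
// fractalclient.Interface is an assumption, because the stock kubernetes.Interface
// has no FractalV1alpha1() accessor.
func NewRoutingInformer(ctx context.Context, client fractalclient.Interface, resyncPeriod time.Duration) *RoutingInformer {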
    informer := cache.NewSharedIndexInformer(
        &cache.ListWatch{
            ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
                return client.FractalV1alpha1().KAgents("").List(ctx, options)
            },
            WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
                return client.FractalV1alpha1().KAgents("").Watch(ctx, options)
            },
        },
        &v1alpha1.KAgent{},
        resyncPeriod,
        cache.Indexers{
            "byCapability": indexByCapability,  // Index agents by what they can do
            "byDomain":     indexByDomain,      // Index agents by domain
        },
    )

    return &RoutingInformer{informer: informer}
}

// Route task to best agent - O(1) local cache read
func (r *RoutingInformer) Route(task Task) (*v1alpha1.KAgent, error) {
    // Get agents with matching capability from local cache
    agents, err := r.informer.GetIndexer().ByIndex("byCapability", task.RequiredCapability)
    if err != nil {
        return nil, err
    }

    // Score and select best agent
    return r.selectBestAgent(agents, task)
}
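
For readers less familiar with client-go, the idea behind the informer can be modeled in a few lines: events keep a local index current, and routing reads only that index. This is a conceptual sketch in Python, not the Fractal code; `AgentCache` and its method names are hypothetical, chosen to mirror the OnAdd, OnUpdate, and OnDelete handlers in the diagram.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AgentCache:
    """Local cache kept fresh by add/update/delete events, indexed by capability."""

    def __init__(self):
        self._agents: Dict[str, List[str]] = {}   # agent name -> capabilities
        self._by_capability: Dict[str, Set[str]] = defaultdict(set)

    def on_add(self, name: str, capabilities: List[str]) -> None:
        self._agents[name] = list(capabilities)
        for cap in capabilities:
            self._by_capability[cap].add(name)

    def on_delete(self, name: str) -> None:
        for cap in self._agents.pop(name, []):
            self._by_capability[cap].discard(name)

    def on_update(self, name: str, capabilities: List[str]) -> None:
        self.on_delete(name)                      # drop stale index entries first
        self.on_add(name, capabilities)

    def agents_with(self, capability: str) -> List[str]:
        # Pure in-memory lookup: no API call on the routing path
        return sorted(self._by_capability.get(capability, ()))


cache = AgentCache()
cache.on_add("agent-a", ["research", "plan"])
cache.on_add("agent-b", ["research"])
print(cache.agents_with("research"))  # ['agent-a', 'agent-b']

cache.on_update("agent-a", ["plan"])
print(cache.agents_with("research"))  # ['agent-b']
```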

Pattern 3: Classification-Aware Routing (IC Pattern)

The Problem: In Intelligence Community (IC) environments, data cannot cross classification boundaries. Routing must respect security constraints.

The Solution: Classification-aware routing that keeps data within its security boundary.

┌─────────────────────────────────────────────────────────────────────────┐
│                 CLASSIFICATION-AWARE ROUTING                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  TASK ARRIVES WITH CLASSIFICATION LABEL                                 │
│                    │                                                    │
│                    ▼                                                    │
│     ┌─────────────────────────────────┐                                │
│     │     CLASSIFICATION ROUTER       │                                │
│     │   (Checks label, routes to      │                                │
│     │    appropriate namespace)       │                                │
│     └─────────────────────────────────┘                                │
│                    │                                                    │
│        ┌───────────┼───────────┐                                       │
│        │           │           │                                       │
│        ▼           ▼           ▼                                       │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                               │
│  │UNCLASS   │ │ SECRET   │ │TOP SECRET│                               │
│  │Namespace │ │Namespace │ │Namespace │                               │
│  │          │ │          │ │          │                               │
│  │┌────────┐│ │┌────────┐│ │┌────────┐│                               │
│  ││Agent A ││ ││Agent B ││ ││Agent C ││                               │
│  │└────────┘│ │└────────┘│ │└────────┘│                               │
│  │┌────────┐│ │┌────────┐│ │┌────────┐│                               │
│  ││Agent D ││ ││Agent E ││ ││Agent F ││                               │
│  │└────────┘│ │└────────┘│ │└────────┘│                               │
│  └──────────┘ └──────────┘ └──────────┘                               │
│        │           │           │                                       │
│        │    NetworkPolicy      │                                       │
│        │    BLOCKS cross-      │                                       │
│        │    namespace traffic  │                                       │
│        └───────────┴───────────┘                                       │
│                                                                         │
│  ENFORCEMENT: Kubernetes NetworkPolicies prevent data spillage         │
└─────────────────────────────────────────────────────────────────────────┘

NetworkPolicy Enforcement:

# Block all cross-namespace traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
  namespace: secret
spec:
  podSelector: {}  # Apply to all pods in namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}  # Only from same namespace
  egress:
    - to:
        - podSelector: {}  # Only to same namespace
    - to:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace (Kubernetes 1.21+)
              kubernetes.io/metadata.name: kube-system  # Allow DNS
      ports:  # sibling of "to", so it limits this rule to DNS traffic
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP  # DNS falls back to TCP for large responses

Classification-Aware Router:

class SecurityViolation(Exception):
    """Raised when routing or a response would cross a classification boundary."""


class ClassificationRouter:
    """Routes tasks to agents within classification boundary."""

    # Task, Agent, and Response are assumed to be defined elsewhere,
    # so the annotations below are written as strings.
    CLASSIFICATION_HIERARCHY = {
        'unclassified': 0,
        'cui': 1,
        'secret': 2,
        'top_secret': 3
    }

    def route(self, task: "Task") -> "Agent":
        # Get task classification
        task_level = self.CLASSIFICATION_HIERARCHY[task.classification]

        # Only agents in the namespace that matches the task's classification
        # are eligible. Sending a task to a lower-level namespace would spill
        # its data, and the NetworkPolicies block cross-namespace traffic.
        eligible_agents = [
            agent for agent in self.agents
            if self.CLASSIFICATION_HIERARCHY[agent.namespace] == task_level
        ]

        if not eligible_agents:
            raise SecurityViolation(
                f"No agents available at classification {task.classification}"
            )

        # Route to best agent within boundary
        return self.select_best(eligible_agents, task)

    def validate_response(self, response: "Response", task: "Task") -> bool:
        """Ensure response doesn't leak higher classification."""
        # detect_classification is assumed to return a numeric level (0-3)
        response_level = self.detect_classification(response.content)
        task_level = self.CLASSIFICATION_HIERARCHY[task.classification]

        if response_level > task_level:
            raise SecurityViolation(
                f"Response contains level {response_level} data for level {task_level} task"
            )
        return True

Pattern 4: Multi-Tier Model Routing (ai-platform)

The Problem: Different deployment environments have different constraints (GPU, latency, connectivity).

The Solution: Route to appropriate tier based on task requirements and available resources.

┌─────────────────────────────────────────────────────────────────────────┐
│                    MULTI-TIER MODEL ROUTING                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│                         TASK ARRIVES                                    │
│                              │                                          │
│                              ▼                                          │
│                    ┌─────────────────┐                                 │
│                    │  TIER ROUTER    │                                 │
│                    │                 │                                 │
│                    │ Evaluates:      │                                 │
│                    │ - Latency req   │                                 │
│                    │ - Model size    │                                 │
│                    │ - Connectivity  │                                 │
│                    │ - Cost budget   │                                 │
│                    └─────────────────┘                                 │
│                              │                                          │
│           ┌──────────────────┼──────────────────┐                      │
│           │                  │                  │                      │
│           ▼                  ▼                  ▼                      │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐          │
│  │    TIER 1       │ │    TIER 2       │ │    TIER 3       │          │
│  │ Tactical Edge   │ │   Datacenter    │ │   Connected     │          │
│  │                 │ │                 │ │                 │          │
│  │ • 7B quantized  │ │ • 70B-123B      │ │ • Frontier      │          │
│  │ • 3 GPU nodes   │ │ • Shared HPCaaS │ │ • Claude/GPT-4  │          │
│  │ • 0% internet   │ │ • Internal only │ │ • Full internet │          │
│  │ • <100ms latency│ │ • <1s latency   │ │ • <5s latency   │          │
│  │                 │ │                 │ │                 │          │
│  │ USE: Tactical   │ │ USE: Analysis,  │ │ USE: Complex    │          │
│  │ decisions, edge │ │ planning, heavy │ │ reasoning, code │          │
│  │ inference       │ │ compute         │ │ generation      │          │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Tier Selection Logic:

class TierRouter:
    """Routes to appropriate tier based on task requirements."""

    def select_tier(self, task: Task) -> Tier:
        # Tier 1: Edge - for latency-critical, simple tasks
        if task.latency_requirement_ms < 100:
            if task.complexity == "simple":
                return Tier.EDGE
            else:
                raise LatencyConstraintViolation(
                    "Complex task cannot meet <100ms latency"
                )

        # Tier 2: Datacenter - for analysis, medium complexity
        if task.requires_large_model and not task.requires_internet:
            return Tier.DATACENTER

        # Tier 3: Connected - for frontier capabilities
        if task.requires_frontier_model or task.requires_internet:
            if self.connectivity_available():
                return Tier.CONNECTED
            else:
                # Fallback to datacenter with degraded capability
                return Tier.DATACENTER

        # Default: Datacenter (best balance)
        return Tier.DATACENTER

    def route_with_fallback(self, task: Task) -> Response:
        """Route with automatic tier fallback."""
        preferred_tier = self.select_tier(task)

        try:
            return self.execute_on_tier(task, preferred_tier)
        except TierUnavailable:
            # Fallback chain: Connected → Datacenter → Edge
            for fallback in self.get_fallback_chain(preferred_tier):
                try:
                    return self.execute_on_tier(task, fallback)
                except TierUnavailable:
                    continue

            raise AllTiersUnavailable(task)
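
The fallback loop above depends on `get_fallback_chain` and `execute_on_tier`, which are not shown. Here is a small runnable sketch under stated assumptions: the chain only steps down in capability (following the "Connected → Datacenter → Edge" comment), tasks are plain strings, and the per-tier executors are hypothetical callables standing in for real model endpoints.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    EDGE = "edge"
    DATACENTER = "datacenter"
    CONNECTED = "connected"


class TierUnavailable(Exception):
    """A single tier could not serve the request."""


class AllTiersUnavailable(Exception):
    """Every tier in the chain failed."""


# Assumed ordering, most capable first.
FALLBACK_ORDER = [Tier.CONNECTED, Tier.DATACENTER, Tier.EDGE]


def get_fallback_chain(preferred: Tier) -> list[Tier]:
    """Tiers to try after `preferred`, in descending capability."""
    start = FALLBACK_ORDER.index(preferred)
    return FALLBACK_ORDER[start + 1:]


def route_with_fallback(
    task: str,
    preferred: Tier,
    executors: dict[Tier, Callable[[str], str]],
) -> str:
    """Try the preferred tier, then each fallback, until one succeeds."""
    for tier in [preferred, *get_fallback_chain(preferred)]:
        try:
            return executors[tier](task)
        except TierUnavailable:
            continue
    raise AllTiersUnavailable(task)


def offline(task: str) -> str:
    # Simulates a tier whose uplink is down.
    raise TierUnavailable("no uplink")


executors = {
    Tier.CONNECTED: offline,
    Tier.DATACENTER: lambda task: f"datacenter handled: {task}",
    Tier.EDGE: lambda task: f"edge handled: {task}",
}
result = route_with_fallback("summarize report", Tier.CONNECTED, executors)
```

Folding the preferred tier into the same loop as the fallbacks avoids the nested `try` blocks; whether a fallback should ever move upward (for example Edge to Datacenter) is a policy decision this sketch does not make.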

Anti-Patterns for Production Routing

❌ Anti-Pattern 1: Orchestrator Coupling

Wrong: Router calls orchestrator which calls agents
       Router → Orchestrator → Agent A → Agent B → Agent C

Right: Router writes to blackboard, agents react
       Router → Blackboard ← Agents (react to state changes)
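
To make the "Right" arrangement concrete, here is an in-memory sketch of a blackboard with an append-only decision log and upsertable directives. The `Blackboard` class, its method names, and the directive fields are all hypothetical; a production blackboard would be a durable store rather than a Python object.

```python
from collections import defaultdict
from typing import Callable


class Blackboard:
    """In-memory stand-in for a shared blackboard (hypothetical API)."""

    def __init__(self) -> None:
        self.decisions: list[dict] = []        # append-only audit trail
        self.directives: dict[str, dict] = {}  # upsertable current state
        self._watchers: dict[str, list[Callable[[str, dict], None]]] = defaultdict(list)

    def watch(self, kind: str, callback: Callable[[str, dict], None]) -> None:
        """Register an agent callback for directives of a given kind."""
        self._watchers[kind].append(callback)

    def record_decision(self, decision: dict) -> None:
        """Append a routing decision; existing entries are never modified."""
        self.decisions.append(decision)

    def upsert_directive(self, task_id: str, directive: dict) -> None:
        """Insert or replace the current directive, then notify watchers."""
        self.directives[task_id] = directive
        for callback in self._watchers[directive["kind"]]:
            callback(task_id, directive)


handled: list[tuple[str, str]] = []
board = Blackboard()

# An agent subscribes to state changes; it never hears from the router directly.
board.watch("deploy", lambda task_id, d: handled.append((task_id, d["kind"])))

# The router only writes to the blackboard.
board.record_decision({"task_id": "t-1", "routed_to": "deploy"})
board.upsert_directive("t-1", {"kind": "deploy", "payload": "create k8s app"})
```

The router holds no reference to any agent, so agents can be added, removed, or restarted without touching routing code.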

❌ Anti-Pattern 2: Synchronous Routing

Wrong: Router blocks waiting for agent response
       router.route(task)  # Blocks for 30 seconds

Right: Router enqueues, worker processes
       router.enqueue(task)  # Returns immediately
       worker.process_queue()  # Async processing
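
One way to sketch the enqueue-then-process shape is with Python's standard `asyncio.Queue`. The task names and the 10 ms sleep are placeholders for real agent work, and a production system would more likely use a durable queue than an in-process one.

```python
import asyncio


async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    """Drain the queue; each task is processed off the router's call path."""
    while True:
        task = await queue.get()
        await asyncio.sleep(0.01)  # stand-in for slow agent work
        results.append(f"done: {task}")
        queue.task_done()


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    consumer = asyncio.create_task(worker(queue, results))

    # "Routing" is just an enqueue: put_nowait returns immediately.
    for task in ("task-a", "task-b"):
        queue.put_nowait(task)

    # Wait here only so the demo can inspect the results.
    await queue.join()
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return results


results = asyncio.run(main())
```

The router's cost per task is a single non-blocking `put_nowait`, regardless of how long the agent takes to finish.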

❌ Anti-Pattern 3: No Classification Enforcement

Wrong: Trust agents to respect classification
       agent.process(task)  # Agent "promises" to stay in boundary

Right: Infrastructure enforces boundaries
       NetworkPolicy + NamespaceIsolation  # Cannot violate

Production Checklist for Smart Routing

## Routing Infrastructure Checklist

### Blackboard Architecture

- [ ] Decisions are append-only (audit trail)
- [ ] Directives are upsertable (current state)
- [ ] Agents read/write blackboard, not each other
- [ ] Blackboard survives agent failures

### Caching Layer

- [ ] SharedInformer or equivalent for routing data
- [ ] Local cache with watch-based updates
- [ ] O(1) routing lookups (not O(N) API calls)
- [ ] Cache invalidation on config changes

### Classification (if IC)

- [ ] NetworkPolicies block cross-namespace traffic
- [ ] Router enforces classification boundaries
- [ ] Response validation prevents data spillage
- [ ] Audit log for all routing decisions

### Multi-Tier (if applicable)

- [ ] Tier selection based on task requirements
- [ ] Automatic fallback chain defined
- [ ] Latency SLOs enforced per tier
- [ ] Cost tracking per tier
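
The caching items in the checklist can be illustrated with a local routing table that is updated by watch events instead of being re-fetched per request. This is a simplified sketch: `RoutingCache` and its methods are hypothetical, the event type strings mirror the Kubernetes watch event types (`ADDED`, `MODIFIED`, `DELETED`), and the task types and workflow names are examples only.

```python
from typing import Optional


class RoutingCache:
    """Local routing table kept current by watch events (hypothetical API)."""

    def __init__(self) -> None:
        self._routes: dict[str, str] = {}

    def on_event(self, event_type: str, task_type: str, workflow: Optional[str] = None) -> None:
        """Apply one watch event to the local table."""
        if event_type in ("ADDED", "MODIFIED"):
            if workflow is None:
                raise ValueError(f"{event_type} event requires a workflow")
            self._routes[task_type] = workflow
        elif event_type == "DELETED":
            self._routes.pop(task_type, None)

    def lookup(self, task_type: str) -> Optional[str]:
        """Constant-time dictionary lookup; no API call on the routing path."""
        return self._routes.get(task_type)


cache = RoutingCache()
cache.on_event("ADDED", "k8s-app", "/create-app")
cache.on_event("MODIFIED", "k8s-app", "/complex-workflow")  # config change
cache.on_event("ADDED", "research", "/research-first")
cache.on_event("DELETED", "research")
```

Invalidation falls out of the event stream: a `MODIFIED` or `DELETED` event replaces or removes the stale entry, so no separate expiry logic is needed in this sketch.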

Remember: Routing is about learning, not just automation. Every routing decision is a training example: measure accuracy, analyze failures, and feed the results back so the router gets smarter with every task.
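
Measuring accuracy starts with recording whether each routing decision turned out to be correct. The sketch below assumes a hypothetical `RoutingLog` that stores one `(workflow, correct)` pair per task; the four recorded outcomes are illustrative values, not measurements.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RoutingLog:
    """Records routing outcomes so accuracy can be measured over time."""

    outcomes: list[tuple[str, bool]] = field(default_factory=list)

    def record(self, workflow: str, correct: bool) -> None:
        self.outcomes.append((workflow, correct))

    def accuracy(self) -> float:
        """Fraction of tasks routed to a workflow that turned out correct."""
        if not self.outcomes:
            return 0.0
        return sum(correct for _, correct in self.outcomes) / len(self.outcomes)

    def failures_by_workflow(self) -> Counter:
        """Count misroutes per workflow to show where the router needs work."""
        return Counter(wf for wf, correct in self.outcomes if not correct)


log = RoutingLog()
# Illustrative outcomes only.
log.record("/create-app", True)
log.record("/create-app", True)
log.record("/complex-workflow", False)
log.record("/research-first", True)

overall = log.accuracy()
misroutes = log.failures_by_workflow()
```

Grouping failures by workflow points at which routing rule to revisit first, which is the "analyze failures" step in practice.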