Context Development Life Cycle

DevOps asked: what if ops looked more like dev? The answer was CI/CD, infrastructure as code, and the SDLC — the infinity loop that made software delivery a disciplined engineering practice.

CDLC asks the same question about coding agents: what if the context we feed them — skills, instructions, knowledge, constraints — were engineered with the same rigor as the code they produce?

The Parallel

Every phase of the software development lifecycle has a context counterpart.

| SDLC (Code) | CDLC (Context) | The Question |
|---|---|---|
| Plan | Generate | What context should exist? |
| Code + Build | Compile | How is raw context assembled into a precise packet? |
| Test | Test | Does this context produce the intended behavior? |
| Release | Distribute | How do other teams and projects get this context? |
| Deploy | Deliver | Did the right context reach the agent at session start? |
| Operate + Monitor | Observe | Is the context working? What signals come back? |
| Feedback → Plan | Adapt | What should change for next time? |

The SDLC produces deployable artifacts. The CDLC produces injectable context. Both compound through feedback loops. Both degrade without discipline.

The Seven Phases

Phase 1: Generate

Create the context that agents will consume.

You write prompts. You author skills. You pull documentation. You write specs that get broken down into agent-executable plans. Every ticket is context creation. Every Claude.md edit is context creation.

This is the phase most teams think is the whole job. Write a prompt, get an answer. CDLC says generation is one-seventh of the work.

12-Factor alignment: Factor I: Context Is Everything — manage what enters the context window like you manage what enters production. Factor IV: Research Before You Build — don't generate context from guesses.

Phase 2: Compile

Assemble raw context into phase-appropriate, role-scoped, freshness-weighted packets.

A prompt builder concatenates text. A context compiler selects the right pieces, ranks them by utility and freshness, trims to the token budget, and delivers the minimum viable context for the current phase. The agent doing research needs different context than the agent implementing code.
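The selection-rank-trim pipeline can be sketched in a few lines. This is a minimal illustration, not a real compiler: the `ContextItem` fields and the utility-times-freshness ranking are assumptions about how such a system might score candidates.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    phase: str        # which phase this context serves, e.g. "research", "implement"
    utility: float    # learned usefulness score in [0, 1]
    freshness: float  # decays as the item ages, in [0, 1]
    tokens: int

def compile_context(items, phase, budget):
    """Select phase-relevant items, rank by utility x freshness,
    and greedily trim to the token budget."""
    relevant = [i for i in items if i.phase == phase]
    ranked = sorted(relevant, key=lambda i: i.utility * i.freshness, reverse=True)
    packet, used = [], 0
    for item in ranked:
        if used + item.tokens <= budget:
            packet.append(item)
            used += item.tokens
    return packet
```

The point of the sketch: the output is the minimum viable packet for the current phase, not everything the repository knows.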

This is where most of the gap lives between teams that merely use AI coding agents and teams that get reliable output from them.

12-Factor alignment: Factor I: Context Is Everything — the 40% cognitive load rule. Load the right things, not all things.

Phase 3: Test

Validate that context produces the intended agent behavior.

You change two lines in your Claude.md. Do you know the impact? Without testing, you don't. Maybe the new wording conflicts with an existing instruction. Maybe it's ambiguous enough that one model interprets it differently than another. Maybe it's fine today but breaks when you switch from Claude to Codex.

Context testing is fundamentally different from code testing. Evals are non-deterministic. You run them five times and measure pass rate. Error budgets replace pass/fail. This is the hardest phase to get right, and the one most teams skip entirely.
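The repeated-run, error-budget idea can be made concrete. A hedged sketch, assuming an eval is just a callable that returns pass or fail; the five-trial count and 0.8 SLO are illustrative defaults, not prescribed values.

```python
def eval_pass_rate(run_eval, trials=5):
    """Run a non-deterministic eval several times and return the pass rate."""
    passes = sum(1 for _ in range(trials) if run_eval())
    return passes / trials

def within_error_budget(pass_rate, slo=0.8):
    """Error budgets replace pass/fail: the context change ships if the
    measured pass rate meets the SLO, not only if every run succeeds."""
    return pass_rate >= slo
```

Treating the threshold as an SLO means a single flaky failure doesn't block a good context change, but a real regression does.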

12-Factor alignment: Factor V: Validate Externally — your agent thinks its plan is great. Get a second opinion before it builds. Factor IX: Measure What Matters — if you don't measure context quality, you can't improve it.

Phase 4: Distribute

Package and share context across projects, teams, and runtimes.

Context that lives in one person's Claude.md doesn't scale. Skills are the packaging format — a bundle of instructions, scripts, references, and examples that any agent on any runtime can load. A registry lets teams discover and install packages. Dependencies get resolved. Versions get pinned.
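A skill manifest with pinned dependencies might look like the sketch below. The field names and the registry shape are hypothetical, not a real packaging format; the point is that versions are pinned and resolution fails loudly when a pin is missing.

```python
# Hypothetical skill manifest: names and fields are illustrative.
manifest = {
    "name": "testing-patterns",
    "version": "1.2.0",
    "files": ["SKILL.md", "scripts/run_tests.sh", "examples/"],
    "dependencies": {"git-conventions": "1.0.3"},  # pinned, not floating
}

def resolve(manifest, registry):
    """Resolve pinned dependencies against a registry index
    (a mapping of package name to available versions)."""
    resolved = {}
    for name, pinned in manifest["dependencies"].items():
        if pinned not in registry.get(name, []):
            raise LookupError(f"{name}=={pinned} not in registry")
        resolved[name] = pinned
    return resolved
```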

This is where context becomes an organizational asset. One team fixes a testing pattern, packages it as a skill, and every other team gets the fix on next install.

12-Factor alignment: Factor II: Track Everything in Git — context is code. Version it. Review it. Ship it.

Phase 5: Deliver

Inject the right context into the right session at the right time.

A compiled context packet is worthless if it doesn't reach the agent. Delivery is the moment where compilation meets the session. Lifecycle hooks fire at session start and load the relevant context automatically. The agent doesn't need to ask for it — the system delivers it.
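The hook mechanism can be sketched as a small event registry. The event name `session_start` and the injection behavior are assumptions for illustration, not a specific runtime's API.

```python
# Minimal lifecycle-hook sketch: handlers registered per event,
# fired by the runtime at the matching moment.
HOOKS = {"session_start": []}

def on(event):
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

def fire(event, session):
    for fn in HOOKS[event]:
        fn(session)

@on("session_start")
def inject_compiled_context(session):
    # Delivery: the compiled packet reaches the agent without it asking.
    session["context"] = session.get("context", []) + ["<compiled packet>"]
```

The agent never requests the packet; the `fire("session_start", ...)` call at session start delivers it.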

12-Factor alignment: Factor I: Context Is Everything — right context, right window, right time. Phase-specific. Role-scoped. Freshness-weighted.

Phase 6: Observe

Monitor whether delivered context produces good outcomes.

Three feedback channels matter:

  1. Agent logs. When agents say "I don't know how to do X" or retry the same command three times, that's a signal. Surface the pattern across sessions and the missing context becomes obvious.

  2. PR review feedback. A rejected PR means context failed. The agent had instructions, knowledge, and constraints — and still produced the wrong output. Fix the context, not just the PR.

  3. Production failures. Code generated from context runs in production. When it fails, trace back: what context was loaded when this code was written? Create a test case so it doesn't happen again.
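The first channel, surfacing retry patterns from agent logs, is simple to sketch. The `RUN ` log prefix and the threshold of three are assumptions; any structured log format with a command field would do.

```python
from collections import Counter

def missing_context_signals(log_lines, threshold=3):
    """Surface commands retried `threshold` or more times across sessions,
    a common symptom of missing context."""
    counts = Counter(line for line in log_lines if line.startswith("RUN "))
    return [cmd for cmd, n in counts.items() if n >= threshold]
```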

12-Factor alignment: Factor IX: Measure What Matters — telemetry is your feedback channel. Factor XII: Harvest Failures as Wisdom — every failure is a missing piece of context.

Phase 7: Adapt

Feed observations back into context improvement. Close the loop.

This is where the CDLC becomes a flywheel. Each session's outcomes improve the next session's context. Knowledge that works gets promoted — its utility score rises, so it appears more often in future injections. Knowledge that correlates with user corrections gets demoted. The system compounds without human intervention.
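The promote/demote step can be written as a tiny update rule. A sketch under stated assumptions: the exponential-style moves toward 1 and 0, and the 0.1 learning rate, are illustrative choices, not the system's actual algorithm.

```python
def update_utility(score, cited, corrected, lr=0.1):
    """Promote knowledge cited in successful sessions; demote knowledge
    that correlates with user corrections. Score stays in [0, 1]."""
    if cited:
        score += lr * (1.0 - score)  # move toward 1: inject more often
    if corrected:
        score -= lr * score          # move toward 0: inject less often
    return score
```

Because the updates are proportional to the distance from the bounds, repeated citations compound without the score ever exceeding 1, and repeated corrections decay it without going below 0.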

Adaptation is what separates a static rulebook from a living system. DevOps learned this with continuous improvement. SRE learned it with error budgets. The CDLC learns it with the knowledge flywheel.

12-Factor alignment: Factor VII: Extract Learnings — every session should leave the system smarter. Factor VIII: Compound Knowledge — the flywheel. Factor VI: Lock Progress Forward — ratchets prevent regression.

Why This Matters

LLMs are engines. Context is fuel. You can't tune the engine — that's the model vendor's job. But you can engineer the fuel.

DevOps proved that disciplined systems around indeterministic workers (humans) produce reliable output. SRE proved it again with SLOs and error budgets. Kubernetes proved it for infrastructure with control loops that reconcile actual state to desired state.

CDLC is the same proof for coding agents. The model stays the same. The context compounds. The system gets better with each use.

The Leverage Hierarchy

Not all phases are equal. Donella Meadows ranked twelve places to intervene in a system, from weakest (#12: tweak a number) to strongest (#1: change the paradigm). The CDLC phases climb that ladder.

| Leverage | Meadows Point | CDLC Phase | What It Means |
|---|---|---|---|
| Low | #12–#10: Parameters, buffers, structure | Generate | Writing a better prompt helps, but it's the lowest-leverage thing you can do. Most teams stop here. |
| Medium | #9–#8: Delays, balancing feedback | Compile, Test | Assembling the right context and validating it before delivery. Feedback loops that catch errors. |
| Threshold | #6: Information flows | Distribute, Deliver | Making context available where it's needed. This is the threshold between low and high leverage — the point where individual effort becomes organizational capability. |
| High | #5: Rules | Observe | Measuring what actually happens when context meets agents. Rules that govern what gets promoted, demoted, or discarded. |
| Highest | #4–#3: Self-organization, goals | Adapt | The system improves itself. Learnings promote automatically. Goals reconcile. The flywheel compounds without human intervention. |

The pattern: the phases most teams skip (Observe, Adapt) are the ones Meadows says matter most. Writing a prompt is #12. Building a system that improves its own context based on what it observes is #4. That's an 8-level leverage gap.

The 12 factors follow the same gradient. The Foundation tier (I–III) operates at the low end — necessary but low leverage. The Knowledge tier (VII–IX) operates at the high end — self-organization and goal-directed measurement. The factors are already ordered by increasing Meadows leverage. That ordering is load-bearing, not cosmetic.

How the 12 Factors Build the Flywheel

The 12 factors are not a flat list. They are a build order — four tiers that construct the three product layers in sequence. The flywheel is what emerges when all three layers are running.

| Tier | Factors | Product Layer | What It Builds | Theory |
|---|---|---|---|---|
| Foundation (I–IV) | Context Is Everything, Track in Git, One Agent One Job, Research First | Context Compiler | The substrate — context exists, is versioned, is scoped, is researched before use | Cognitive science (40% load, lost-in-middle). Meadows #12–#6. |
| Flow (V–VI) | Validate Externally, Lock Progress Forward | Validation Gates | The filter — bad context gets caught, good context can't regress | Brownian Ratchet (chaos + filter + one-way gate). Balancing feedback loops (Meadows #8–#7). |
| Knowledge (VII–IX) | Extract Learnings, Compound Knowledge, Measure What Matters | Knowledge Flywheel | The engine — learnings extract, score, promote, inject. The loop closes. | MemRL (reinforcement learning on episodic memory). Self-organization (Meadows #4). Escape velocity: σ×ρ > δ. |
| Scale (X–XII) | Isolate Workers, Supervise Hierarchically, Harvest Failures | Infrastructure | The multiplier — all three layers work across parallel agents. Failure becomes fuel. | Control theory (Kubernetes reconciliation loops). SRE (SLOs + error budgets). |

The flywheel doesn't exist until the Knowledge tier kicks in — but it can't function without the layers beneath it. You need context to exist (Foundation) and be validated (Flow) before you can extract learnings from it (Knowledge) and feed them back. Factor VIII (Compound Knowledge) is the climax: the moment the loop closes and starts compounding. Everything before it is setup. Everything after it is scale.

The theoretical threads

Each tier draws from a different body of theory. None of them alone explains the system:

  • Cognitive science (Sweller 1988, Liu 2023) constrains the Foundation: the 40% load rule, lost-in-middle attention mechanics, and buffer-sizing are why context management matters at all. Without these constraints, you could dump everything into the window and let the model sort it out. You can't.
  • The Brownian Ratchet operates in the Flow tier: agents produce noisy, variable output. The validation gates are the filter. The ratchet (Factor VI) is the one-way gate. Chaos + filter + gate = net forward progress despite variance.
  • MemRL (Zhang 2025) drives the Knowledge tier: reinforcement learning on episodic memory. Citation events become training signals. Utility scores update. High-utility learnings surface more often. Low-utility learnings decay. The flywheel has its own learning algorithm.
  • Control theory (Kubernetes-shaped reconciliation) enables the Scale tier: declared state (GOALS.md) + reconcile loop (/evolve) + error budgets (fitness gates). The system doesn't fire-and-forget — it continuously reconciles actual state to desired state.
  • Systems dynamics (Meadows 2008) provides the leverage hierarchy that orders all of it: the Foundation is necessary infrastructure (#12–#10), the Flow tier adds feedback (#8–#7), the Knowledge tier reaches self-organization (#4–#3). The highest-leverage phases are the ones most teams never build.
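The escape-velocity condition σ×ρ > δ can be seen in a toy simulation. This is one possible reading, an assumption for illustration: σ as the rate new learnings are extracted, ρ as the fraction retained and promoted, δ as the decay rate of existing knowledge.

```python
def knowledge_over_time(sigma, rho, delta, steps=50, k0=1.0):
    """Toy model of the flywheel: the knowledge stock compounds when
    extraction x retention (sigma * rho) outpaces decay (delta)."""
    k = k0
    for _ in range(steps):
        k += sigma * rho * k - delta * k  # net per-step growth: sigma*rho - delta
    return k
```

With σ×ρ above δ the stock grows geometrically; below it, the system forgets faster than it learns and the flywheel never spins up.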

The CDLC is the spine that connects these threads. Each phase is where a different theory becomes operational. No single thread is the answer. The loop that connects them is.

The Two Loops

At the individual level, the inner loop is: generate context, test it, iterate. This is the authoring loop. You're improving your own skills, honing your own Claude.md, crafting better instructions.

At the organizational level, the outer loop is: distribute context, observe usage, adapt, redistribute. One team publishes a skill. Other teams install it. Usage signals flow back. The skill improves. Everyone benefits.

The inner loop is where quality comes from. The outer loop is where scale comes from. CDLC needs both.

Attribution

The CDLC framework was first articulated by Patrick Debois at AI Engineer 2026. The same person who coined "DevOps" in 2009 recognized the pattern: a new class of indeterministic workers needs its own development lifecycle. AgentOps implements all seven phases.