Our Response

Why AgentOps exists despite the valid criticism of AI-assisted coding: the specific failure modes it gates against.


The Critics Are Right

Let's be clear: the research is valid.

  • METR study: AI tools can slow experienced developers
  • GitClear: Code quality degrades without discipline
  • Security research: AI introduces vulnerabilities at scale
  • Karpathy: Even the person who coined "vibe coding" stepped back from the approach

We don't dispute this. We embrace it.


The Missing Piece

The research describes what happens without operational discipline.

What it doesn't measure:

  • Teams with rigorous validation gates
  • Developers who understand every line shipped
  • Organizations with structured AI workflows
  • Production systems with proper oversight

AgentOps is the operating loop for the gap the research points at.


The Ecosystem

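AI-assisted development needs three things: a way to build agent applications, a methodology for coding with AI, and an operating loop. Two neighboring projects cover the first two; AgentOps covers the third: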

Project           Relationship
12-Factor Agents  How to build agent applications. We're how to operate with them.
Vibe Coding       The methodology of AI-assisted coding. AgentOps is the loop that keeps the work reliable.

Gene Kim shows the upside case for AI-assisted development. We focus on the operating discipline needed to pursue that upside without letting chaos drive the work.


What We Actually Measured

Our production environment over sustained use:

Metric               Result
Success rate         More predictable with explicit validation
Deployment velocity  Improved when work is scoped and reviewed
Code quality         Maintained through gates and review
Understanding        Required (can't ship what we can't explain)

How is this different from the research?

  1. Validation at every step — Factor VII enforced
  2. Context management — 40% rule prevents degradation
  3. Focused agents — Single responsibility, no sprawl
  4. Human checkpoints — Factor XI required for critical changes
  5. Institutional memory — Patterns mined and reused
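
As an illustration of the context-management point above, here is a minimal Python sketch of a context budget. It assumes the 40% rule means keeping context usage at or below 40% of the model's context window. That reading, along with the `ContextBudget` class, its method names, and the token counts, is hypothetical and not taken from the AgentOps codebase.

```python
from dataclasses import dataclass

# Assumption: the "40% rule" caps context usage at 40% of the window.
CONTEXT_LIMIT_PERCENT = 40


@dataclass
class ContextBudget:
    """Tracks how much of a (hypothetical) context window is in use."""

    window_tokens: int      # total context window size, in tokens
    used_tokens: int = 0    # tokens already loaded into context

    def can_add(self, tokens: int) -> bool:
        # Integer math avoids floating-point surprises right at the limit.
        projected = self.used_tokens + tokens
        return projected * 100 <= self.window_tokens * CONTEXT_LIMIT_PERCENT

    def add(self, tokens: int) -> None:
        # Refuse to load more context once the budget would be exceeded.
        if not self.can_add(tokens):
            raise ValueError(
                "context budget exceeded; summarize or start a fresh session"
            )
        self.used_tokens += tokens
```

With a 1,000-token window, a budget that already holds 300 tokens accepts another 100 and rejects 101. The point of the sketch is only that the limit is checked before content enters the context, not after quality has already dropped.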

The 12 Factors (Why They Exist)

Each factor addresses a failure mode from the research:

Research Finding                Factor Response
"Context degrades quality"      Factor I: Context Is Everything (manage what enters)
"Work gets lost"                Factor II: Track Everything in Git (if not in git, didn't happen)
"Code becomes unmaintainable"   Factor III: One Agent, One Job (scoped tasks)
"AI has too much access"        Factor IV: Enforce Least Privilege (minimum scope per agent)
"AI slowed developers"          Factor V: Research Before You Build (understand first)
"Wrong tools for wrong tasks"   Factor VI: Isolate Workers (own workspace, own context)
"Bugs go undetected"            Factor VII: Validate Externally (no self-grading)
"Big changes cause problems"    Factor VIII: Lock Progress Forward (ratchet, no regress)
"Same mistakes repeated"        Factor IX: Extract Learnings (two outputs per session)
"Knowledge stays siloed"        Factor X: Compound Knowledge (flywheel; failures indexed too)
"Critical errors slip through"  Factor XI: Supervise Hierarchically (escalation up)
"Can't measure improvement"     Factor XII: Measure Outcomes (fitness, not activity)

The factors are the operational controls the research says are missing.
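
To make two of these controls concrete, here is a minimal Python sketch of Factor VII (validate externally) combined with Factor VIII (lock progress forward). It is an illustration under stated assumptions, not the AgentOps implementation: the `Ratchet` class, the `Validator` type, and the two example checks are hypothetical names introduced only for this example.

```python
from typing import Callable, List

# A validator is any check that lives outside the agent that wrote the change.
Validator = Callable[[str], bool]


class Ratchet:
    """Holds accepted changes; state only moves forward on a passing change."""

    def __init__(self, validators: List[Validator]) -> None:
        self.validators = validators
        self.accepted: List[str] = []

    def submit(self, change: str) -> bool:
        # No self-grading: only the external validators decide.
        if all(check(change) for check in self.validators):
            self.accepted.append(change)  # progress is locked in
            return True
        return False  # a rejected change leaves accepted state untouched


# Hypothetical checks standing in for real gates (tests, linters, review).
def not_empty(change: str) -> bool:
    return bool(change.strip())


def no_todo(change: str) -> bool:
    return "TODO" not in change


ratchet = Ratchet([not_empty, no_todo])
ratchet.submit("add retry logic")   # passes both checks, so it is accepted
ratchet.submit("TODO fix later")    # fails no_todo, so it is rejected
```

The design choice worth noting is that a failed submission never modifies `accepted`, so a bad change cannot regress what has already been validated.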


Why Not Just... Not Use AI?

Fair question. Here's the honest answer:

Without AI:

  • Inconsistent success rate
  • High cognitive load
  • Repetitive work
  • Limited exploration

With AI but no discipline (vibe coding):

  • Variable quality
  • Security issues
  • Tech debt accumulation
  • Skill degradation

With AI + operational discipline (AgentOps):

  • More predictable delivery
  • Reduced cognitive load
  • Pattern reuse
  • Maintained understanding

The path forward isn't rejecting AI. It's operating it responsibly.


For The Skeptics

If you're skeptical, we respect that. Here's what we offer:

Transparency

  • Every claim has attribution
  • Failures shown alongside successes
  • Methodology documented
  • Results reproducible

Evidence

  • Production metrics (2 years)
  • Real infrastructure (not toy examples)
  • Long-term quality tracking
  • Before/after comparisons

Honesty

  • We acknowledge the research
  • We show what didn't work
  • We document failure patterns
  • We iterate publicly

The Invitation

We're not asking you to believe productivity claims.

We're asking you to examine:

  1. The failure patterns (real problems)
  2. The factors (proposed solutions)
  3. The evidence (measured results)

Then decide for yourself.


Get Started

  1. Read the failure patterns — What goes wrong
  2. See the factors — How we address them
  3. Review the skills — See the checks that support the factors
  4. Install AgentOps — Try the workflow in your own environment
