Answering the Skeptics

What the METR, GitClear, and Fortune 50 research on AI-assisted coding actually found, and why this site exists anyway.

What The Critics Say

"I tried to use Claude/Codex agents a few times but they just didn't work well enough at all." — Andrej Karpathy, the guy who coined "vibe coding," October 2025

The person who invented vibe coding hand-coded his next project. That's worth taking seriously.

I've read the research. I've experienced the failures myself. The critics aren't wrong about the problems. They're wrong about the conclusion.


The Research Says...

METR Study (July 2025)

The most rigorous study to date, a randomized controlled trial of experienced open-source developers, found something surprising:

| Metric | Developer Perception | Actual Result |
|---|---|---|
| Expected time savings | -24% (faster) | +19% (slower) |
| Perceived improvement | -20% (faster) | +19% (slower) |
| Perception-reality gap | ~40 percentage points | |

Translation: Developers felt 20% faster while actually being 19% slower.
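
The roughly 40-point gap follows from simple subtraction. Here is a minimal sketch, assuming both figures are expressed as percent change in task completion time, with negative meaning faster. The function name `perception_gap` is illustrative and does not come from the METR study.

```python
def perception_gap(perceived_change_pct: float, actual_change_pct: float) -> float:
    """Return the gap, in percentage points, between the measured change in
    task time and the change developers believed they experienced.

    Both inputs are percent change in completion time: negative = faster,
    positive = slower. (This sign convention is an assumption of the sketch.)
    """
    return actual_change_pct - perceived_change_pct


ACTUAL = 19.0      # measured: tasks took 19% longer
PERCEIVED = -20.0  # developers believed tasks took 20% less time
EXPECTED = -24.0   # developers expected tasks to take 24% less time

print(perception_gap(PERCEIVED, ACTUAL))  # 39.0
print(perception_gap(EXPECTED, ACTUAL))   # 43.0
```

Both results land close to the ~40 percentage points reported above, whether the comparison uses what developers expected or what they perceived.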

"When people report that AI has accelerated their work, they might be wrong." — METR Researchers

Source: METR Study


GitClear Analysis (2024-2025)

Code quality metrics paint a concerning picture:

| Metric | Change |
|---|---|
| Code churn (written then changed) | 2x increase (2021-2023) |
| Code duplication | 8x increase (2024) |
| Bugs per developer | +9% |
| Average PR size | +154% |

"No greater scourge to long-term code maintainability than copy-pasted code." — GitClear Report

Source: GitClear AI Code Quality 2025


Fortune 50 Security Analysis

A large-scale enterprise study revealed:

| Metric | Result |
|---|---|
| Code volume increase | 3-4x |
| Security issues increase | 10x |
| Issue types | Hardcoded credentials, privilege escalation, architectural flaws |

"These weren't simple bugs but exposed credentials, privilege escalation paths, and architectural design flaws—the kind of security blunders that get people fired."
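
One way to read the two multipliers together is to divide them. If security issues grew 10x while code volume grew 3-4x, the issue rate per unit of code rose by roughly 2.5x to 3.3x. The sketch below assumes both multipliers cover the same codebase and time window, which the summary above does not state. `issue_density_multiplier` is a hypothetical helper name.

```python
def issue_density_multiplier(issues_multiplier: float, volume_multiplier: float) -> float:
    """Return how much the issues-per-unit-of-code rate changed, given the
    growth multipliers for total issues and total code volume.

    Assumes both multipliers were measured over the same code and period.
    """
    if volume_multiplier <= 0:
        raise ValueError("volume_multiplier must be positive")
    return issues_multiplier / volume_multiplier


# Reported figures: 10x more security issues, 3-4x more code.
for volume in (3.0, 4.0):
    density = issue_density_multiplier(10.0, volume)
    print(f"volume {volume:.0f}x: issues per unit of code {density:.2f}x")
```

Under that assumption, the extra issues are not explained by extra code alone: each unit of code carries more security issues than before.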


Developer Survey Data (Stack Overflow 2024)

| Finding | Percentage |
|---|---|
| Developers saying AI made them "very productive" | 16.3% |
| Developers saying AI had "little or no effect" | 41.4% |
| Trust in AI accuracy (dropped from 40%) | 29% |

Top frustration: "AI solutions that are almost right, but not quite."


The Junior Developer Crisis

Employment data shows a troubling trend:

  • Junior developer employment (ages 22-25): down ~20% from late 2022 peak
  • AI tool adoption among juniors: nearly universal (Copilot, Claude, GPT)
  • Deep understanding of shipped code: declining

"Every junior dev I talk to has Copilot or Claude or GPT running 24/7. They're shipping code faster than ever. But when I dig deeper into their understanding of what they're shipping? That's where things get concerning."


Karpathy's Retreat

The most telling signal: Andrej Karpathy, who coined "vibe coding," hand-coded his next project.

His original definition was clear:

  • "Fully giving in to the vibes"
  • "Forget that the code even exists"
  • "For throwaway weekend projects"

When asked why he didn't use AI for Nanochat:

"I tried to use Claude/Codex agents a few times but they just didn't work well enough at all."

If the inventor isn't using it for real work, that's significant.


So Why Does This Site Exist?

Because I've seen both sides.

I've watched AI sessions spiral into debug hell. I've shipped code that broke in production. I've felt the frustration of being slower with AI than without it.

But I've also used AI assistance across infrastructure work where failure has real consequences. The gains came when the work was scoped, validated, reviewed, and captured for reuse.

The difference isn't the tools. The difference is operational discipline.

The critics are measuring AI-assisted development without discipline. That's like measuring DevOps without CI/CD: of course the results are bad.

Continue to: This Is NOT Vibe Coding →