Skip to content
all comparisons
comparison

AgentOps vs GSD (Get Shit Done) — Detailed Comparison

How AgentOps compares to GSD for AI coding agents. GSD optimizes within sessions with fresh context per agent. AgentOps carries knowledge between sessions with a compounding flywheel.

Upstream review: 2026-04-30Mirrored: 2026-04-30

AgentOps vs GSD (Get Shit Done)

GSD v1.34 is a full-featured spec-driven development framework for AI coding agents. It installs as slash commands into 7 runtimes (Claude Code, Gemini CLI, OpenCode, Codex, Copilot, Cursor, Antigravity) and solves "context rot" by spawning fresh-context subagents for each task. Trusted by engineers at Amazon, Google, Shopify, and Webflow.

Comparison updated April 2026. See the GSD repo for current features.


At a Glance

AspectGSDAgentOps
Philosophy"Ship fast — fresh context per agent""Operational layer for coding agents; technically a context compiler"
Core strengthMulti-agent orchestration with context isolation, multi-runtime supportCross-session memory, validation gates, knowledge flywheel
GitHubglittercowboy/get-shit-doneboshu2/agentops
Latestv1.34.2 (April 2026)v2.38.0 (April 2026)
Scale53 commands, 46 workflows, 16 agents50+ skills, compiled CLI, hooks, schemas
Primary useSpec-driven development with phased executionOngoing codebase work with persistent memory

What GSD Does Well

1. Fresh Context Per Agent

GSD's core innovation. Every spawned agent gets a clean 200K context window. Orchestrators stay thin, agents are disposable. This eliminates context rot — the quality degradation that happens as an AI fills its context window during long sessions.

2. Wave-Based Parallel Execution

Plans are grouped into dependency waves. Plans within a wave run in parallel (each with a fresh agent), waves run sequentially. Includes STATE.md file locking with atomic creation and spin-wait jitter.

Wave 1: [Plan A, Plan B, Plan C]  ← parallel, fresh 200K each
           ↓ (all complete)
Wave 2: [Plan D, Plan E]          ← parallel, depends on Wave 1
           ↓ (all complete)
Wave 3: [Plan F]                  ← sequential, depends on Wave 2

3. Model Cost Tiers

Four profiles (quality/balanced/budget/inherit) with per-agent model assignments. Each profile maps agents to opus/sonnet/haiku. This means routine plan-checking can run on budget models while critical execution stays on quality models.

4. Auto-Repair on Task Failure

When a task fails during execution, GSD auto-classifies the failure as RETRY (with adjustment), DECOMPOSE (break into sub-steps), or PRUNE (remove and escalate). Budget-controlled with a default of 2 attempts. This is structured recovery, not blind retries.

5. Comprehensive Validation Pipeline

Not just "human verification" anymore. GSD v1.27 has:

  • 8-dimension plan checker (max 3 iterations)
  • Nyquist validation audit (test coverage mapping)
  • Post-execution verification agent
  • User acceptance testing with auto-diagnosis
  • Cross-phase verification debt audit

6. Seven Runtime Support

Installs to Claude Code, Gemini CLI, OpenCode, Codex, Copilot, Cursor, and Antigravity. The installer transforms file content per runtime at install time (tool name mapping, agent frontmatter, hook events, path conventions).

7. Advisor Mode and Fast Path (New in v1.34)

/gsd:advisor provides research-backed discussion without execution. /gsd:fast skips planning for trivial tasks. Multi-repo workspace support added for cross-project orchestration.

7. Deep State Management

Full .planning/ directory with 20+ artifact types:

  • PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md
  • Per-phase directories with research, plans, summaries, verification, UAT
  • Session handoff (HANDOFF.json, continue-here.md)
  • Persistent threads, seeds, debug knowledge base, todos

Where GSD Falls Short

No Cross-Session Learning

GSD has persistence (state files, handoffs, threads) but no knowledge flywheel. There is no mechanism to extract what was learned in one session and inject it into the next. Every session starts with the same agent intelligence — the system does not get smarter over time.

┌─────────────────────────────────────────────────────────────────┐
│                         GSD                                     │
│                                                                 │
│  Session 1: discuss → plan → execute → verify → Done            │
│                                                   ↓             │
│                                              (state saved)      │
│                                                   ↓             │
│  Session 2: resume-work → (same state, same intelligence)       │
│                                                                 │
│  Session 100: (agents are no smarter than session 1)            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                      AGENTOPS                                   │
│                                                                 │
│  Session 1: research → plan → pre-mortem → crank → vibe → retro │
│                                              ↓                  │
│                                      (learnings extracted)      │
│                                              ↓                  │
│                                      (scored and stored)        │
│                                                                 │
│  Session 2: (inject prior knowledge) → better starting point    │
│                                                                 │
│  Session 100: (agent is a domain expert with scored knowledge)  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

No Strategic Goals or Direction

GSD executes phases within a project but has no mechanism for measuring progress toward higher-level objectives. No equivalent to GOALS.md, ao goals measure, or /evolve.

No Pre-Implementation Failure Prevention

GSD validates after execution (verify-work, UAT). The plan checker validates plan quality but does not simulate failure modes before implementation begins. AgentOps runs /pre-mortem to catch issues before code is written.

No Issue Graph or Dependency Tracking

GSD has wave-based parallelism for plans within a phase, but no cross-phase issue graph. There is no equivalent to beads (issue tracking with dependencies), which means no mechanical way to track blocked work across phases.

No Multi-Model Validation Council

GSD's review command supports cross-AI peer review (Gemini, Claude, Codex), but this is a review tool, not a multi-perspective validation council that runs adversarial analysis from different viewpoints simultaneously.


Feature Comparison

FeatureGSDAgentOpsWinner
Multi-runtime support✅ 7 runtimes⚠️ 4 supported runtime paths, less broad than GSDGSD
Fresh context per agent✅ Core design⚠️ Swarm workersGSD
Model cost tiers✅ 4 profiles❌ Not yetGSD
Auto-repair on failure✅ RETRY/DECOMPOSE/PRUNE⚠️ Crank retriesGSD
Context rot detection✅ Hooks at 35%/25%❌ Not yetGSD
Prompt injection defense✅ Advisory hook❌ Not yetGSD
Wave-based parallelism✅ Built in✅ Crank wavesTie
Plan validation✅ 8-dimension checker✅ Pre-mortem + councilTie
Human-in-loop gates✅ Configurable gates✅ Multiple gatesTie
State persistence✅ .planning/ directory✅ .agents/ directoryTie
Cross-session learning❌ No flywheel✅ Extract → score → injectAgentOps
Knowledge maturity❌ No scoring✅ Maturity tracking + decayAgentOps
Pre-mortem simulation❌ Post-execution only✅ Before implementationAgentOps
Multi-model council❌ Sequential review✅ Multi-perspectiveAgentOps
Issue graph execution❌ Phase-scoped waves✅ Beads + dependenciesAgentOps
Strategic goals❌ No goal tracking✅ GOALS.md + evolveAgentOps
Compiled CLI❌ Node.js tools✅ Go binary (ao)AgentOps

Workflow Comparison

GSD Workflow

/gsd:new-project     →  PROJECT.md, REQUIREMENTS.md, ROADMAP.md, config.json
         ↓
/gsd:discuss-phase   →  Capture decisions (/comparisons/CONTEXT)
         ↓
/gsd:plan-phase      →  Research → plan → 8-dimension verify (max 3 iterations)
         ↓
/gsd:execute-phase   →  Wave-based parallel execution (fresh agent per plan)
         ↓                  └── Node repair on failure (RETRY/DECOMPOSE/PRUNE)
/gsd:verify-work     →  UAT with auto-diagnosis
         ↓
/gsd:ship            →  Create PR from phase work
         ↓
       [next phase or complete-milestone]

AgentOps Workflow

/research     →  Explore codebase + inject prior knowledge
     ↓
/plan         →  Break into dependency-tracked issues (beads)
     ↓
/pre-mortem   →  Simulate failure modes before building
     ↓
/crank        →  Execute unblocked waves → validate → commit
     ↓
/vibe         →  Multi-aspect code validation (council optional)
     ↓
/post-mortem  →  Extract learnings → score → store for next session

Architecture Comparison

AspectGSDAgentOps
Commands53 prompt-based slash commands50+ skill definitions
Agents16 specialized (fresh context each)Skill-driven (swarm for parallelism)
CLI toolingNode.js (gsd-tools.cjs, 15 modules)Go binary (ao, structured subcommands)
Hooks5 JS hooks (statusline, context monitor, prompt guard, workflow guard, update check)Shell hooks (session lifecycle, tool gates, knowledge injection)
State.planning/ (Markdown + JSON).agents/ (Markdown + JSON)
Config.planning/config.json (40+ options).agentops.json + GOALS.md
Installnpm package, 3000-line installerShell script + Go binary
ParallelismWave-based with file lockingWave-based via crank + swarm

Overhead Comparison

                    SETUP TIME              ONGOING OVERHEAD
                    ══════════              ════════════════

GSD:                ████████░░░░░░░░        ████████░░░░░░░░
                    (npm install + init)    (moderate — .planning/ management)

AgentOps:           ████████░░░░░░░░        ████████░░░░░░░░
                    (install + init)        (moderate — hooks + .agents/)


                    SESSION VALUE           LONG-TERM VALUE
                    ═════════════           ═══════════════

GSD:                ████████████████        ████████░░░░░░░░
                    (strong execution)      (state persists, no learning)

AgentOps:           ████████████████        ████████████████
                    (strong execution)      (knowledge compounds)

Trade-off: GSD optimizes for execution quality per session. AgentOps optimizes for cumulative intelligence across sessions.


Use Case Fit

GSD is Best For

Use CaseWhy
Greenfield projectsStrong project setup + phased execution
Multi-runtime teams7 runtimes with one install
Cost-sensitive workModel cost tiers control spend
Complex single-phase workWave parallelism + auto-repair
Teams standardizing processClear phases with configurable gates

AgentOps is Best For

Use CaseWhy
Long-running codebasesKnowledge flywheel compounds value
Repeated maintenanceLearns from past sessions
Complex multi-phase workIssue graph + dependency execution
Risk-averse engineeringPre-mortem + council + vibe gates
Strategic directionGOALS.md + evolve loop

When to Choose GSD

  • You work across multiple AI runtimes and need one workflow
  • You want model cost control at the per-agent level
  • Your work is project-scoped (clear start and end)
  • You value fresh context per agent for quality in long sessions
  • You want auto-repair when tasks fail during execution

When to Choose AgentOps

  • You work on the same codebase across many sessions
  • You want the system to get smarter over time
  • You want failure prevention before implementation, not just verification after
  • You want dependency-tracked issue execution across work phases
  • You value strategic goal tracking and measured progress

Can They Work Together?

Partially. GSD and AgentOps both manage state directories and workflow orchestration, so running both simultaneously would create friction. However:

  • GSD's fresh-context-per-agent pattern is a technique AgentOps' swarm could adopt
  • GSD's model cost tiers solve a problem AgentOps does not yet address
  • AgentOps' knowledge flywheel fills GSD's biggest gap (no cross-session learning)

The most practical combination: use GSD for greenfield projects where you need fast phased execution, then bring AgentOps in when the project enters maintenance and long-term development where accumulated knowledge matters.


The Bottom Line

DimensionGSDAgentOps
PhilosophyFresh context, fast executionKnowledge compounds
OverheadModerateModerate
PersistenceState files (no learning)Knowledge flywheel
Validation8-dimension plan check + UATPre-mortem + council + vibe
ParallelismWave-based, fresh agentsWave-based, swarm workers
Cost control4-tier model profilesNot yet
Best forStrong execution per sessionCumulative intelligence across sessions

GSD is a serious framework for structured AI-assisted development. AgentOps differentiates on the knowledge flywheel — the system that makes every session smarter than the last.


The Honest Assessment

GSD is not a lightweight tool anymore. It is a comprehensive development framework with 53 commands, 16 agents, wave-based parallelism, auto-repair, model cost tiers, and deep state management. Dismissing it as "simple meta-prompting" is inaccurate.

Where GSD wins: Execution quality within a session. Fresh context per agent, cost control, auto-repair, and 7-runtime portability.

Where AgentOps wins: Intelligence across sessions. The knowledge flywheel (extract, score, inject, decay) has no equivalent in GSD. After 50 sessions on the same codebase, AgentOps is operating with accumulated domain knowledge while GSD agents start fresh every time.

Session 1:   GSD and AgentOps roughly equal
Session 10:  AgentOps has a library of scored learnings
Session 50:  AgentOps agents get injected domain expertise; GSD agents do not
Session 100: AgentOps is a domain expert; GSD is still starting from its .planning/ state