
Spec-driven development: how to ship with Claude Code without the rewrite

May 16, 2026 | 9 min read


Hand Claude Code a one-line prompt like "build me a booking page" and you'll get a booking page. You'll also get a data model nobody agreed to, an auth flow that conflicts with the rest of the app, and a test file that asserts the code does what the code does. Three weeks later, you're rewriting it.

The teams shipping fast with AI coding tools share one habit: they write the spec before they write the prompt. We call it spec-driven development, and on Savi projects it cuts rework by roughly 60% compared to prompt-and-pray.

This isn't a return to waterfall. The spec is short, lives next to the code, and changes daily. It exists for one reason: to give the model the constraints a human teammate would have asked for in standup.

Why prompt-only workflows break

Claude Code, Cursor, and Copilot are pattern matchers with judgment. The judgment runs on the context you give them. With a thin prompt, the model fills the gaps with the most common pattern from its training data. That pattern is almost never the one your codebase already uses.

The result is what every team using AI has now seen:

Style drift. The new file uses class components; the rest of the app uses hooks. The new endpoint returns snake_case; everything else is camelCase.
Phantom features. You asked for a booking page. You got a booking page, a calendar export, an email reminder, and a payment integration you didn't ask for and can't remove without breaking the rest.
Hallucinated dependencies. The code imports a package that exists, but the function being called doesn't. Without a type check, it builds. It crashes at runtime.
Test theatre. The model writes tests that re-implement the function and assert the result matches itself. They pass. They prove nothing.

Each of these is fixable. Together they're the reason your engineer says "I should have just written it myself." A spec collapses all four failure modes into one upstream decision.

What a spec actually contains

A useful spec for one feature fits in 200-600 words. Anything longer is a design doc and belongs somewhere else. The minimum viable spec has six sections.

Section	Purpose	Length
User story	Who, what, why in one sentence	1 line
Data shape	Inputs, outputs, types, persisted fields	5-15 lines
Happy path	Step-by-step success flow	3-8 steps
Edge cases	The three or four ways it breaks	3-4 bullets
Out of scope	Adjacent work the model will guess at	2-5 bullets
Acceptance tests	The exact assertions that prove it works	3-6 cases

The most valuable section is "out of scope." It's the only section the model can't reconstruct from a thin prompt: if you don't tell it what not to build, it builds it. Then you spend an hour deleting code that already passes tests.
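If the six sections are written as markdown headings (an assumption; the article doesn't prescribe a format), a short script can flag a spec that is missing one, most importantly "Out of scope", or that has grown past design-doc length. This is a minimal sketch: `checkSpec` and its defaults are hypothetical, with the 200-600 word bounds taken from the guidance above.

```typescript
// Minimal spec linter (sketch). Assumption: each of the six sections is a
// markdown heading such as "## Out of scope"; the article doesn't mandate this.
const REQUIRED_SECTIONS = [
  "User story",
  "Data shape",
  "Happy path",
  "Edge cases",
  "Out of scope",
  "Acceptance tests",
];

interface SpecCheck {
  missingSections: string[];
  wordCount: number;
  withinLength: boolean; // true when minWords <= wordCount <= maxWords
}

function checkSpec(markdown: string, minWords = 200, maxWords = 600): SpecCheck {
  // Collect every heading title (any level), lower-cased for comparison.
  const headings = new Set(
    markdown
      .split("\n")
      .map((line) => line.match(/^#{1,6}\s+(.+?)\s*$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1].toLowerCase()),
  );
  const missingSections = REQUIRED_SECTIONS.filter(
    (section) => !headings.has(section.toLowerCase()),
  );
  // Count whitespace-separated words, ignoring the "#" heading markers.
  const wordCount = markdown
    .replace(/^#{1,6}[ \t]+/gm, "")
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
  return {
    missingSections,
    wordCount,
    withinLength: wordCount >= minWords && wordCount <= maxWords,
  };
}
```

Run it against `features/booking.md` before handing the spec to a model; a non-empty `missingSections` means the spec isn't ready yet.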

The workflow, end to end

1. Draft the spec in plain text

Open a markdown file next to the code: features/booking.md. Write the six sections. Keep it in the repo. Commit it. The spec is part of the codebase, not a Notion page that drifts out of sync.
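Step 1 is easier to turn into a habit if every spec starts from the same skeleton. Below is a sketch assuming the `features/<name>.md` naming from the example above and markdown headings for the six sections; `specPath` and `specTemplate` are illustrative names, not part of any tool.

```typescript
// Hypothetical scaffolding helpers for step 1: one spec file per feature under
// features/, pre-filled with the six section headings and a hint for each.
const SECTION_HINTS: ReadonlyArray<[string, string]> = [
  ["User story", "Who, what, why in one sentence."],
  ["Data shape", "Inputs, outputs, types, persisted fields."],
  ["Happy path", "Step-by-step success flow, 3-8 steps."],
  ["Edge cases", "The three or four ways it breaks."],
  ["Out of scope", "Adjacent work the model will guess at."],
  ["Acceptance tests", "The exact assertions that prove it works."],
];

// "Booking Page" -> "features/booking-page.md"
function specPath(feature: string): string {
  const slug = feature
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces/punctuation into single hyphens
    .replace(/^-+|-+$/g, ""); // drop leading/trailing hyphens
  return `features/${slug}.md`;
}

// Markdown skeleton: a title, then each section heading with its hint as an
// HTML comment, which markdown renderers hide.
function specTemplate(feature: string): string {
  const sections = SECTION_HINTS.map(
    ([title, hint]) => `## ${title}\n\n<!-- ${hint} -->\n`,
  );
  return [`# ${feature}`, "", ...sections].join("\n");
}
```

With Node, something like `writeFileSync(specPath("Booking"), specTemplate("Booking"))` (from `node:fs`) creates `features/booking.md`, which you then fill in and commit alongside the code.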

2. Hand the spec to the model, not the prompt

In Claude Code or Cursor, attach the spec file as context and ask: "Implement the feature described in this spec. Write the acceptance tests first, then make them pass." The model now has constraints, a test target, and an explicit boundary.
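The instruction in step 2 works best when it's worded the same way for every feature. A hedged sketch of a helper that pairs the spec text with that instruction follows; the `<spec>` delimiter tags and the `buildSpecPrompt` name are assumptions, and how the string reaches the model (attached file, pasted message, or a CLI) depends on your tool.

```typescript
// Hypothetical helper for step 2: pairs the spec text with a fixed instruction so
// every feature is requested the same way. The <spec> tags are an assumed
// delimiter convention, not something any particular tool requires.
const SPEC_INSTRUCTION =
  "Implement the feature described in this spec. " +
  "Write the acceptance tests first, then make them pass.";

function buildSpecPrompt(specFile: string, specText: string): string {
  const body = specText.trim();
  if (body.length === 0) {
    // No spec, no prompt: refuse to build a request from an empty file.
    throw new Error(`${specFile} is empty; write the spec before prompting.`);
  }
  return [SPEC_INSTRUCTION, "", `<spec path="${specFile}">`, body, "</spec>"].join("\n");
}
```

Attaching `features/booking.md` directly in the editor achieves the same thing; the helper mainly matters if you script the request.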

3. Review the test names, not the implementation

Before reading a single line of generated code, read the test names. If the model wrote tests for things you didn't ask for, the spec was too vague. Fix the spec, regenerate the tests. Only when the test names match the acceptance criteria do you let it write the implementation.
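Step 3 can be partly automated: list the test titles the model wrote and diff them against the spec's acceptance criteria. The sketch below assumes Jest/Vitest-style `it("...")` / `test("...")` calls and acceptance criteria written as bullets under an "Acceptance tests" heading, phrased exactly like the expected test titles. The regex is deliberately simple and will miss titles containing escaped quotes; all function names are hypothetical.

```typescript
// Sketch for step 3: compare the test titles the model wrote with the spec's
// acceptance criteria. Assumptions: Jest/Vitest-style it("...")/test("...") calls,
// and criteria written as "- " or "* " bullets under an "Acceptance tests" heading.
function extractTestNames(testSource: string): string[] {
  // Simple regex: misses titles that contain escaped quotes.
  const pattern = /\b(?:it|test)\(\s*(["'`])(.*?)\1/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(testSource)) !== null) {
    names.push(match[2]);
  }
  return names;
}

function extractAcceptanceCriteria(specMarkdown: string): string[] {
  const lines = specMarkdown.split("\n");
  const start = lines.findIndex((line) => /^#{1,6}\s+acceptance tests\s*$/i.test(line));
  if (start === -1) return [];
  const criteria: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line)) break; // next section starts
    const bullet = line.match(/^\s*[-*]\s+(.+?)\s*$/);
    if (bullet) criteria.push(bullet[1]);
  }
  return criteria;
}

// Case- and whitespace-insensitive comparison.
const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

function compareTests(criteria: string[], testNames: string[]) {
  const wanted = new Set(criteria.map(normalize));
  const written = new Set(testNames.map(normalize));
  return {
    missing: criteria.filter((c) => !written.has(normalize(c))), // asked for, not tested
    unrequested: testNames.filter((t) => !wanted.has(normalize(t))), // tested, not asked for
  };
}
```

Anything in `unrequested` is the model building beyond the spec; anything in `missing` is an acceptance case it skipped. Either way, fix the spec and regenerate the tests before any implementation.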

4. Run tests; iterate against failures, not vibes

When something breaks, paste the failing test output back to the model with one sentence of context. Don't explain what you think is wrong. Let the model read the failure. Vibes-based feedback ("this feels off") burns context and produces unrelated changes. Failing-test feedback produces targeted fixes.
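For step 4, the feedback message is just one sentence of context plus the raw failure. This is a minimal sketch that enforces that shape. The one-sentence check is a crude punctuation count, and the 80-line truncation cap is an arbitrary assumption meant to keep long runner output from crowding the context window. Raise the cap if your runner prints the key assertion near the end of its output.

```typescript
// Hypothetical helper for step 4: one sentence of context, then the raw failure.
// The one-sentence check is a crude count of sentence-ending punctuation, and the
// 80-line cap is an arbitrary default to keep long runner output out of the context.
function failureFeedback(context: string, testOutput: string, maxLines = 80): string {
  const sentence = context.trim();
  const sentenceEnds = sentence.match(/[.!?](?=\s|$)/g) ?? [];
  if (sentenceEnds.length > 1) {
    throw new Error("Keep the context to one sentence; let the failing test do the talking.");
  }
  // Drop trailing whitespace/newlines, then keep at most maxLines lines.
  const lines = testOutput.replace(/\s+$/, "").split("\n");
  const kept =
    lines.length > maxLines
      ? lines.slice(0, maxLines).concat(`... (${lines.length - maxLines} more line(s) truncated)`)
      : lines;
  return `${sentence}\n\n${kept.join("\n")}`;
}
```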

5. Update the spec when reality disagrees

Halfway through, you'll discover an edge case you didn't list. Update the spec first, then ask for the change. The spec is the source of truth; the code follows. Teams that skip this step end up with code that no longer matches anyone's mental model after two weeks.
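The rule in step 5 (spec changes travel with code changes) can be checked before merge. This is a sketch that assumes a hypothetical layout, where feature code lives under `src/<feature>/` and its spec at `features/<feature>.md`. Feed it the output of `git diff --name-only`. It can only confirm that spec and code changed together, not that the spec was edited first.

```typescript
// Sketch of a pre-merge check for step 5. Hypothetical layout: feature code under
// src/<feature>/ and its spec at features/<feature>.md. Input is the list of changed
// paths, e.g. from `git diff --name-only main...HEAD`.
function specsNotUpdated(changedPaths: string[]): string[] {
  const changed = new Set(changedPaths);
  const touchedFeatures = new Set<string>();
  for (const path of changedPaths) {
    const m = path.match(/^src\/([^/]+)\//);
    if (m) touchedFeatures.add(m[1]);
  }
  // Features whose code changed but whose spec file did not.
  return Array.from(touchedFeatures)
    .filter((feature) => !changed.has(`features/${feature}.md`))
    .sort();
}
```

In CI you might fail the build when the list is non-empty, or just print it as a reminder; shared or renamed modules would need their own mapping.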

How much time this actually costs

The objection we hear: "writing specs is just doing the work twice." It isn't, and here's the math from the last five Savi projects.

Stage	Prompt-only	Spec-driven
Spec writing	0 min	20 min
First generation	10 min	15 min
Review and rework	90 min	25 min
Test authoring	40 min	0 min (in step 2)
Total per feature	140 min	60 min
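As a quick check on the table's arithmetic, the totals and the relative savings work out as follows:

```latex
\begin{aligned}
T_{\text{prompt}} &= 0 + 10 + 90 + 40 = 140\ \text{min} \\
T_{\text{spec}}   &= 20 + 15 + 25 + 0 = 60\ \text{min} \\
\frac{T_{\text{prompt}} - T_{\text{spec}}}{T_{\text{prompt}}} &= \frac{80}{140} \approx 57\% \quad \text{(total time per feature)} \\
\frac{90 - 25}{90} &= \frac{65}{90} \approx 72\% \quad \text{(review and rework alone)}
\end{aligned}
```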

The win compounds. By feature five, the spec format is muscle memory; the first draft takes 8 minutes. The model also gets better at your codebase because every spec it sees mirrors the last one. Pattern matching cuts both ways.

Where this fits in a real codebase

On Frootex, we ship every feature this way. The repo has a features/ directory with one markdown file per shipped feature. New engineers read the directory before they read the code, and they're productive in a day instead of a week. AI tools read the same directory and produce changes that match the rest of the codebase on the first try.

The pattern also works backwards. When you inherit a vibe-coded prototype, the first job is reverse-engineering specs from the existing code. Read a module, write the spec it should have had, commit the spec. Two weeks of this turns a brittle prototype into a codebase your team can extend. It's the same workflow we use when clients arrive after getting stuck on Lovable or Bolt.
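When backfilling specs for an inherited prototype, it helps to see which modules still lack one. A minimal sketch follows. It assumes, hypothetically, one spec per module named `features/<module>.md`. The module list would come from however your codebase is organized.

```typescript
// Hypothetical tracker for backfilling specs: which modules still lack
// features/<module>.md, and what fraction are covered so far.
function specCoverage(
  modules: string[],
  specFiles: string[],
): { missing: string[]; coverage: number } {
  const specced = new Set(
    specFiles
      .map((path) => path.match(/^features\/(.+)\.md$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1]),
  );
  const missing = modules.filter((name) => !specced.has(name)).sort();
  // An empty module list counts as fully covered.
  const coverage = modules.length === 0 ? 1 : (modules.length - missing.length) / modules.length;
  return { missing, coverage };
}
```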

The spec is the bottleneck. That's the point.

AI coding tools made writing code cheap. They didn't make deciding what to build cheap. Spec-driven development moves the slow part (thinking) ahead of the fast part (typing), which is the only sequence that scales. If the spec is hard to write, the feature isn't ready to build. That signal alone is worth the workflow.

Start small. Pick one feature this week. Write the six sections. Hand the spec to your AI tool with "tests first, then code." Track how long the rework takes. If it's shorter than the last one, you have your answer.

Frequently asked questions

What is spec-driven AI development?

Spec-driven development is a workflow where you write a detailed specification before letting an AI coding tool generate code. The spec defines the data model, API contracts, edge cases, and acceptance tests. The AI then implements against the spec instead of guessing intent from a one-line prompt.

How long should an AI coding spec be?

For a single feature, 200-600 words is enough. It should cover the user story, the data shape, the happy path, three or four edge cases, what's out of scope, and the acceptance tests. Longer specs crowd the context window and dilute focus. Shorter specs let the model invent details you'll have to undo.

Does spec-driven development slow you down?

It adds 15-30 minutes per feature upfront and removes 2-4 hours of rework downstream. Rewrites and merge conflicts cost more than the spec ever does. Teams that adopt the workflow ship roughly 40% faster after the first two weeks.

Can a non-technical founder write specs?

Yes for the what; no for the how. Founders describe the user outcome, the inputs, and the success criteria. A senior engineer translates that into data shapes, API contracts, and acceptance tests. The spec is the handoff document between product intent and AI output.

Want this workflow on your project?

We pair senior engineers with AI tooling to ship faster without the rewrites. 30-minute call, no sales pitch.
