How to write a spec for Claude Code (that it actually follows)
A field guide to writing specs that AI coding agents like Claude Code, Cursor, and Codex can execute without hallucinating scope — with a copy-paste template.
If you've used Claude Code, Cursor, or Codex for more than a week, you've hit this moment: you asked it to build something, it built a thing, and the thing was almost — but not quite — what you meant. So you nudged. Then you nudged again. Then you deleted half the file and tried a new prompt. Then you gave up and wrote it yourself.
The model isn't the problem. The prompt is. More specifically: the prompt is a spec, and your spec is a vibe.
This post is a field guide to writing specs that AI coding agents actually follow, based on what I've learned building ClearSpec — a tool that generates structured specs for exactly this purpose.
The core insight
The spec is the prompt. Everything you write in Cursor chat, in a Claude Code slash command, in an AGENTS.md, or in a Jira ticket that someone feeds to an LLM — it's all the same thing. It's a specification of what the code should do. The only question is whether the specification is any good.
A good spec answers a narrow set of questions before the model sees it:
- What is the user trying to accomplish? Not "add auth" — "let a returning user sign in with the same provider they used last time, so their data is still attached."
- Who is the user? A new visitor, a returning user, an admin, a bot. Each gets different behavior.
- What does success look like? Measurable, checkable, testable. Not "it works."
- What can break? Network failures, expired tokens, empty lists, rate limits, provider outages, malformed input.
- What's explicitly out of scope? This is the line that saves you from the model over-delivering. "Do NOT add email/password login. Do NOT rewrite the theme system."
Most "bad AI output" is actually a missing answer to one of these five questions.
The minimum viable spec
Here's the template I use. Copy-paste it, fill it in, hand it to whichever agent you're using. It works for a feature, a bug fix, or a refactor. Everything in brackets is what you fill in.
```markdown
# [One-line goal]

## Goal
[Two sentences. What this feature/fix/refactor accomplishes and why.]

## User stories
As a [role], I can [action], so that [outcome].
As a [role], I can [action], so that [outcome].
As a [role], I can [action], so that [outcome].

## Acceptance criteria
- [ ] [Specific, checkable behavior]
- [ ] [Specific, checkable behavior]
- [ ] [Specific, checkable behavior]

## Edge cases
- [What happens when X is empty]
- [What happens when the network fails]
- [What happens when two users do this at the same time]

## Failure states
- [How to handle error X — user-facing message, log entry, fallback]
- [How to handle error Y]

## Out of scope
- [Thing the agent might reasonably do but must NOT do in this change]
- [Another bounded constraint]

## Dependencies
- [External service, existing code path, or config the change depends on]

## Files likely to change
- [Path guesses or "unknown — discover from codebase"]
```
That's it. Roughly fifty lines of markdown once it's filled in. It turns a conversation with your AI agent from "keep trying until it gets it" into "one shot, review, ship."
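Because the section headings are fixed, you can even sanity-check a spec before handing it to an agent. A minimal TypeScript sketch — the helper name and the idea of a pre-flight lint are mine, not part of any tool:

```typescript
// Sections every spec from the template must contain.
const REQUIRED_SECTIONS = [
  "## Goal",
  "## User stories",
  "## Acceptance criteria",
  "## Edge cases",
  "## Failure states",
  "## Out of scope",
  "## Dependencies",
  "## Files likely to change",
];

// Returns the headings a spec is missing; an empty array means
// the spec at least has the right skeleton.
function missingSections(spec: string): string[] {
  return REQUIRED_SECTIONS.filter((heading) => !spec.includes(heading));
}
```

Run it over your spec file in a pre-commit hook or just in your head; a missing "Out of scope" section is the single most common gap.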
Walkthrough: "Add OAuth login" done badly and then done well
The vague version
"Add user authentication with OAuth so people can sign in."
What Claude Code does with this: builds a generic email/password form, then also adds OAuth as a secondary option, forgets to handle returning users, stores tokens in localStorage (bad), hardcodes a callback URL to http://localhost:3000 (also bad), and doesn't touch the database schema at all. Then you spend three hours fixing it.
The spec version
```markdown
# Add user authentication with Google and GitHub OAuth

## Goal
Let users sign in with Google or GitHub. No email/password. Sessions must
survive browser refreshes. Users must be able to link a second provider to
the same account.

## User stories
As a new visitor, I can sign up with Google or GitHub so I can start using
the app without creating a password.
As a returning user, I can sign in with the same provider I used before so
all my data is still attached.
As a security-conscious user, I can link a second OAuth provider to my
existing account so I have a backup sign-in method.

## Acceptance criteria
- [ ] OAuth flow completes in under 3 seconds on broadband
- [ ] Sessions persist in an httpOnly cookie, not localStorage
- [ ] Sessions expire after 7 days of inactivity
- [ ] Failed sign-ins are rate-limited to 10/minute per IP
- [ ] Two providers with the same email resolve to one account

## Edge cases
- User revokes app access from Google/GitHub settings after signing up
- Two providers return the same email address for different real people
- User signs up with Google, then tries to sign in later with GitHub using
  the same email
- Callback URL is called with an expired or replayed authorization code

## Failure states
- Provider is unreachable: show "Sign-in temporarily unavailable" and log
  the upstream error
- Database is unreachable: return HTTP 503 with a generic message; never
  expose database details to the client
- Session token is tampered with: invalidate immediately and return HTTP 401

## Out of scope
- Email/password authentication
- SSO with SAML or Okta
- Multi-factor authentication via SMS or TOTP
- Account deletion flow

## Dependencies
- Google OAuth 2.0 credentials in Google Cloud Console with authorized
  redirect URI `${APP_URL}/auth/callback/google`
- GitHub OAuth App registered with the matching callback URL

## Files likely to change
- src/lib/auth.ts
- src/app/api/auth/callback/route.ts
- src/middleware.ts
- supabase/migrations/XXX_add_oauth_providers.sql
```
Feed that to Claude Code and you'll get an implementation that's 90% ready on the first shot. The other 10% is style stuff specific to your codebase that the model can't know without reading it — which is what the "Files likely to change" section is for.
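To see why the precision pays off, take the criterion "two providers with the same email resolve to one account." Here's a minimal TypeScript sketch of the merge logic the agent would converge on — names are hypothetical and an in-memory map stands in for the database:

```typescript
type Account = { id: string; email: string; providers: string[] };

// In-memory stand-in for the accounts table (illustrative only).
const accountsByEmail = new Map<string, Account>();
let nextId = 1;

// Resolve an OAuth callback to exactly one account per email:
// a second provider with a known email links to the existing account.
function resolveAccount(provider: string, email: string): Account {
  const key = email.toLowerCase();
  const existing = accountsByEmail.get(key);
  if (existing) {
    if (!existing.providers.includes(provider)) {
      existing.providers.push(provider); // link, don't duplicate
    }
    return existing;
  }
  const account: Account = { id: String(nextId++), email: key, providers: [provider] };
  accountsByEmail.set(key, account);
  return account;
}
```

Without the criterion, the agent will happily create two accounts for the same person — the vague version of the prompt gives it no reason not to.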
The four rules that matter most
After shipping a couple hundred AI-coded features using specs like this, these are the rules that move the needle most:
1. Put "Out of scope" before the model asks
Agents love to over-deliver. They add features you didn't ask for because it "seems helpful." The cure is an explicit out-of-scope list. Every PR you've ever had to revert has a missing out-of-scope line somewhere.
2. Write edge cases in terms of behavior, not code
"Handle empty arrays" is useless. "When a user has zero specs, the dashboard shows a friendly empty state and a CTA to create the first one" is a spec. The first makes the agent defensive; the second makes it build the right thing.
3. Acceptance criteria should be checkable without reading code
If you can't verify the criterion by clicking around the app or reading a log, it's not an acceptance criterion. It's a wish.
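For instance, the walkthrough's "10/minute per IP" rate limit is checkable without reading the implementation: simulate eleven attempts and watch the eleventh get rejected. A minimal in-memory TypeScript sketch — the function name is hypothetical, and a real deployment would back this with shared state like Redis rather than a process-local map:

```typescript
const WINDOW_MS = 60_000; // one minute
const MAX_ATTEMPTS = 10;  // the spec's criterion: 10/minute per IP
const attempts = new Map<string, number[]>();

// Sliding-window check: returns false once an IP has made
// MAX_ATTEMPTS sign-in attempts within the last minute.
function allowSignIn(ip: string, now: number = Date.now()): boolean {
  const recent = (attempts.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_ATTEMPTS) {
    attempts.set(ip, recent);
    return false;
  }
  recent.push(now);
  attempts.set(ip, recent);
  return true;
}
```

The acceptance test mirrors the criterion word for word: ten attempts pass, the eleventh fails, and a minute later the same IP is allowed again.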
4. Treat the spec as living
Claude Code will ask clarifying questions mid-build. Every answer you give is a spec update. Paste the clarifications back into the spec document afterward, so next time you're in the same feature area, the context compounds instead of evaporating.
What ClearSpec does
This is the exact process we automated at ClearSpec. You type a one-sentence goal, the AI asks you the five questions from the top of this post, and you get a structured markdown spec that slots directly into whichever AI coding tool you're using. The spec lives in your dashboard, can be versioned, shared for review, and exported as a Claude Code prompt.
It's free to try. The first one is always the slowest — once you've done a few, you internalize the template and start writing them in your head before you open the editor.
That's the real shift. The tool is a crutch until the template is instinct.
This post was written for ClearSpec. If you have a spec that worked wonderfully — or one that failed in a way you learned from — reply on X or drop a note in the comments of a shared spec. I'm collecting examples for a follow-up post.