
How to write acceptance criteria for AI coding agents (with examples)

Acceptance criteria written for human engineers fail when handed to AI coding agents. Here's what makes the difference — with before/after examples for common feature types.

When you write acceptance criteria for a human engineer, you're writing a checklist that anchors a conversation. The engineer reads it, notices the ambiguities, and asks you what you meant. The back-and-forth is part of the process.

When you write acceptance criteria for an AI coding agent, there is no back-and-forth. The agent reads your criteria literally, fills in every gap with its best guess, and ships something that technically satisfies what you wrote — while completely missing what you meant.

This is the core problem with copy-pasting user stories into Claude Code or Cursor and expecting them to work. AC written for humans isn't precise enough for machines.

Here's how to write acceptance criteria that AI coding agents can actually execute.

Why human AC fails with AI agents

Consider a standard user story with acceptance criteria:

Feature: User authentication

AC:
- Users can sign in with email and password
- Invalid credentials show an error
- Users stay signed in after refreshing the page

A human engineer reads this and immediately knows: use the existing auth system, follow the existing UI patterns, persist the session via a cookie or token, return a 401 for invalid credentials. None of this is written down — it's filled in from context.

An AI agent reads this and makes all the same inferences — but from its training data, not your codebase. It might implement a custom session store instead of using your existing one. It might add a password reset flow you didn't ask for because "real auth flows have that." It might use a UI pattern from a tutorial it was trained on that doesn't match your component library.

The criteria were met. The implementation was wrong.

What good AC for AI agents looks like

Good acceptance criteria for AI agents are falsifiable and implementation-specific. Not "users can sign in" (behavioral intent) but "the /api/auth/login endpoint returns a 200 with a JWT on correct credentials and a 401 with {error: 'invalid_credentials'} on incorrect ones" (testable specification).

The rule of thumb: if you can't write a failing test for the criterion, the criterion is too vague.

Every acceptance criterion for an AI agent should answer four questions:

  1. Trigger: what action or input initiates this?
  2. Output: what exactly should result?
  3. Verification method: how can this be tested without manual inspection?
  4. Explicit exclusions: what should NOT happen or be implemented?

The fourth point is the most commonly skipped — and the one that causes the most scope drift. AI agents are helpful. If you don't tell them what to leave out, they'll include it.
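One way to keep the structure honest is to capture each criterion as a record type where all four answers are required fields. This is a hypothetical TypeScript sketch, not part of any tool:

```typescript
// Hypothetical shape for one acceptance criterion — making the four
// questions required fields means none of them can be silently skipped.
interface Criterion {
  trigger: string;       // what action or input initiates this?
  output: string;        // what exactly should result?
  verification: string;  // how is this tested without manual inspection?
  exclusions: string[];  // what should NOT happen or be implemented?
}

// Example instance, using the authentication story from above.
const loginCriterion: Criterion = {
  trigger: "POST /api/auth/login with {email, password}",
  output: "200 + JWT on correct credentials; 401 on wrong password",
  verification: "unit test per response case; e2e sign-in + refresh",
  exclusions: ["password reset", "OAuth", "email verification"],
};
```

A type checker won't judge whether the strings are precise, but it will refuse a criterion with an empty `exclusions` slot left off entirely.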

Before/after examples

Authentication

Before (human AC):

- Users can sign in with email and password
- Invalid credentials show an error
- Session persists after page refresh

After (AI agent AC):

- POST /api/auth/login with {email, password} returns:
  - 200 + {token: <jwt>, user: {id, email, plan}} on correct credentials
  - 401 + {error: "invalid_credentials"} on wrong password
  - 400 + {error: "missing_fields"} if email or password is absent
- The JWT is stored in localStorage under the key "auth_token"
- Subsequent requests include Authorization: Bearer <token> header
- Verification: unit test for each response case; e2e test that signs in and refreshes
- Out of scope: password reset, OAuth, email verification, rate limiting

The "out of scope" line alone eliminates a category of unwanted implementation. Without it, an AI agent may reasonably decide that a login form should also include "Forgot password?" — because that's what login forms have.
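The response contract above is concrete enough to sketch as a pure function. This is a hypothetical in-memory stand-in for the endpoint (the credential check and token are placeholders, not a real auth flow), but it shows why every response case is independently testable:

```typescript
type LoginResult =
  | { status: 200; body: { token: string; user: { id: string; email: string; plan: string } } }
  | { status: 400 | 401; body: { error: string } };

// Hypothetical stand-in for POST /api/auth/login. A real implementation
// would verify a password hash, not compare against a literal string.
function login(email?: string, password?: string): LoginResult {
  if (!email || !password) {
    return { status: 400, body: { error: "missing_fields" } };
  }
  if (password !== "correct-horse") { // placeholder credential check
    return { status: 401, body: { error: "invalid_credentials" } };
  }
  return {
    status: 200,
    body: { token: "jwt-placeholder", user: { id: "u1", email, plan: "free" } },
  };
}
```

Each bullet in the AC corresponds to exactly one branch, which is what makes the criteria falsifiable one at a time.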

UI component

Before (human AC):

- Show a loading spinner while data is fetching
- Display an empty state when there are no items
- Render item cards in a grid layout

After (AI agent AC):

- While fetching: render the existing <Spinner> component (src/components/ui/Spinner.tsx)
  centered in the parent container. Do not create a new spinner.
- Empty state: render the existing <EmptyState> component with title="No specs yet"
  and a "Create your first spec" button linking to /app/specs/new
- Grid: 1 column on mobile (<768px), 2 columns on tablet, 3 columns on desktop.
  Use the existing grid classes from src/app/[locale]/app/specs/page.tsx as the pattern.
- Verification: visual check at each breakpoint; Cypress test for empty state render
- Out of scope: pagination, sorting, filtering, drag-to-reorder

Pointing to existing components and files tells the agent what to reuse instead of what to create. Without explicit file references, it will create new components that duplicate existing ones.
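The breakpoint rule also maps to a small, testable function. A sketch — the 1024px tablet/desktop cutoff is an assumption here, since the AC above delegates that boundary to the existing grid classes:

```typescript
// Columns per viewport width, per the AC: 1 below 768px, 2 on tablet,
// 3 on desktop. The 1024px desktop cutoff is an assumed value.
function gridColumns(viewportWidth: number): 1 | 2 | 3 {
  if (viewportWidth < 768) return 1;
  if (viewportWidth < 1024) return 2;
  return 3;
}
```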

API endpoint

Before (human AC):

- Users can create a new spec
- The spec is saved to the database
- Returns the created spec

After (AI agent AC):

- POST /api/specs with {title, description, ...fields} (see Zod schema in src/lib/validation.ts)
- Auth: calls getAuthenticatedUser() from src/lib/auth — returns 401 if unauthenticated
- DB: inserts into the specs table with user_id from the authenticated user — no exceptions
- Response: 201 + {id, title, created_at, user_id} on success
- Response: 400 + {error: "validation_error", fields: {...}} on invalid input
- Response: 401 + {error: "unauthenticated"} if no valid session
- Verification: unit tests for all response cases; test with and without auth header
- Out of scope: spec versioning, collaborative editing, webhooks, email notifications

Every field that could vary should be specified. "Returns the created spec" is a description of intent — the "after" version is a contract.
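That contract can likewise be sketched as pure handler logic. The auth lookup and the Zod schema are stubbed with placeholders here (a `userId` passed in directly, two assumed required fields), purely to show the three response branches the AC demands:

```typescript
type CreateSpecResult =
  | { status: 201; body: { id: string; title: string; created_at: string; user_id: string } }
  | { status: 400; body: { error: "validation_error"; fields: Record<string, string> } }
  | { status: 401; body: { error: "unauthenticated" } };

// Hypothetical handler logic for POST /api/specs. In the real route,
// userId would come from getAuthenticatedUser() and validation from Zod.
function createSpec(
  userId: string | null,
  input: { title?: string; description?: string }
): CreateSpecResult {
  if (!userId) return { status: 401, body: { error: "unauthenticated" } };
  const fields: Record<string, string> = {};
  if (!input.title) fields.title = "required";
  if (!input.description) fields.description = "required";
  if (Object.keys(fields).length > 0) {
    return { status: 400, body: { error: "validation_error", fields } };
  }
  return {
    status: 201,
    body: { id: "spec_1", title: input.title!, created_at: new Date().toISOString(), user_id: userId },
  };
}
```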

Error handling

Before (human AC):

- Show an error message when the API call fails
- The user can retry

After (AI agent AC):

- On any non-2xx response from /api/specs: render the existing <ErrorBanner> component
  (src/components/ui/ErrorBanner.tsx) with the error message from response.error, or
  "Something went wrong. Please try again." if response.error is absent
- Show a "Retry" button that re-submits the form with the same values (do not clear the form)
- Network errors (fetch throws): same ErrorBanner with "Check your connection and try again."
- Success after retry: dismiss the error banner and continue normally
- Verification: test with network-disabled (Cypress cy.intercept returning 500)
- Out of scope: error analytics, support contact link, automatic retry with backoff
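
The message-selection rule above reduces to a single testable function. A hypothetical helper, assuming the caller distinguishes HTTP responses from thrown fetch errors:

```typescript
// Hypothetical helper deciding what <ErrorBanner> shows, per the AC:
// prefer the server's error message, fall back to generic copy, and use
// connection copy when fetch itself threw (no response at all).
function bannerMessage(
  outcome:
    | { kind: "http"; status: number; error?: string }
    | { kind: "network" }
): string | null {
  if (outcome.kind === "network") return "Check your connection and try again.";
  if (outcome.status >= 200 && outcome.status < 300) return null; // success: no banner
  return outcome.error ?? "Something went wrong. Please try again.";
}
```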

The falsifiable test rule in practice

Before finalizing any acceptance criterion, ask: "Can I write a failing test for this right now?"

  • "The page loads quickly" → can't write a test. Too vague.

  • "The initial page load completes in under 2 seconds on a 4G connection" → can write a Lighthouse test. Specific enough.

  • "The form validates correctly" → can't write a test. What's "correctly"?

  • "Submitting the form with an empty email field renders an error message: 'Email is required'" → can write a test. Specific.

This isn't about whether you'll actually write the tests — it's about whether the criterion is precise enough for an AI agent to implement correctly. If you can write a test for it, the agent can implement it to pass that test. If you can't, the agent will interpret it based on training data.
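The last criterion in that list translates into code almost verbatim — a sketch assuming a hypothetical validateForm helper:

```typescript
// Hypothetical form validator: returns field-level error messages.
function validateForm(values: { email: string }): Record<string, string> {
  const errors: Record<string, string> = {};
  if (values.email.trim() === "") errors.email = "Email is required";
  return errors;
}

// The criterion as a failing test you could write right now:
// an empty email must yield exactly the message "Email is required".
const result = validateForm({ email: "" });
```

"The form validates correctly" offers nothing this concrete to assert, which is exactly why it fails the rule.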

How to generate better AC faster

Writing AC at this level of precision is time-consuming when you do it by hand. The structured thinking is worth it — but the mechanical work of specifying triggers, outputs, verification methods, and exclusions for every criterion adds up.

ClearSpec generates this level of specificity from a plain-language description of the feature. It asks you five questions and produces acceptance criteria with inputs, outputs, edge cases, and explicit out-of-scope guards — ready to paste directly into a Claude Code or Cursor prompt.

If you're doing this manually, use the four-question structure (trigger, output, verification, exclusions) as a checklist for every criterion before it goes into a prompt.

Where AC fits in the larger workflow

Acceptance criteria are the "done condition" in the large feature workflow — the third component of every task prompt after context and the task description. Precise AC means precise done conditions means confident task completion.

They're also the core of a well-written spec — the bridge between "what this feature does" and "how to verify it's done." Every spec that goes into an AI coding agent should have AC at the level of specificity described here.


The pattern across all examples above is the same: turn behavioral descriptions into testable contracts. The more a criterion reads like a test, the better an AI agent can implement it.