Chapter 9

AI-Driven TDD — Let Cursor Write Tests, You Provide Requirements

Traditional TDD usually breaks down for one reason: writing tests takes too long, so people skip them under deadline pressure. AI-TDD hands the mechanical work of writing test code to the AI while you focus on describing requirements. This chapter walks through two complete AI-TDD cycles, Python FastAPI with pytest and TypeScript with Vitest, from "describe the interface" to "all tests green," and closes with a quality review checklist for AI-generated tests.

Chapter goals: Understand the division of labor in AI-TDD and its core loop; write prompts that generate high-quality tests; use coverage reports to drive AI test completion; know which AI-generated test patterns need manual review.

AI-TDD vs Traditional Development: Workflow Comparison

| | Traditional Dev (code first) | Traditional TDD (test first) | AI-TDD |
| --- | --- | --- | --- |
| Test coverage | 30-50% (frequently skipped) | 80%+ (hard to sustain) | 80%+ (AI handles repetitive work) |
| Time to write tests | High (so it gets skipped) | High (the pain point) | Low (AI writes, human reviews) |
| Test quality | Inconsistent | High (thought through) | Medium-high (needs review) |
| Edge case coverage | Often missed | Thorough | Systematic (AI enumerates well) |
| Actually executed | Often skipped | Strictly enforced | Strictly enforced |
| Team fit | Everyone | Teams with TDD discipline | Everyone |

The AI-TDD Core Loop

  1. Write the interface spec — function/endpoint signature + business rules (no implementation needed yet)
  2. AI generates tests — complete test suite including edge cases, based on your spec
  3. Run tests, all fail (expected): red is the correct TDD starting point. A test that fails before the implementation exists has shown that it can fail, so a later green result means something.
  4. AI implements the code to pass tests — provide the test file as context, ask AI to write the implementation
  5. Run tests, check results — some pass, some fail; move to next step
  6. AI fixes failing tests — paste failure output back to AI for analysis and fixes
  7. Repeat until all green

Retrofit mode works too: If you already have an implementation, ask AI to write tests for existing code. Strict test-first is not required — the goal is high-quality test coverage; the order is a means, not the end.

Full Walkthrough 1: Python FastAPI User Registration — AI-TDD

Step 1: Describe the spec, get AI to write tests

Key: you haven't written any implementation yet. Just the spec.

Cursor Chat — Round 1 Prompt

Write a complete pytest test suite for the following FastAPI endpoint.
No implementation exists yet — write tests first.

Endpoint spec:
POST /api/users/register
Request body: { "email": str, "password": str, "username": str }

Business rules:
- email must be a valid format (Pydantic EmailStr)
- password must be at least 8 characters, contain at least one uppercase letter and one digit
- username must be 3-20 characters, only letters, digits, and underscores
- Duplicate email returns 409 Conflict with code EMAIL_ALREADY_EXISTS
- Duplicate username returns 409 Conflict with code USERNAME_ALREADY_EXISTS (different code from email)
- Success returns 201 + { "id": uuid, "email": str, "username": str, "createdAt": datetime }
- Password must NEVER appear in any response

Test requirements:
- Use httpx AsyncClient with ASGITransport
- Mock UserService to avoid real database operations
- Fixtures in conftest.py, each test function independent

Output: complete conftest.py + tests/api/test_register.py

The AI generates a complete test file covering: success case (201 + correct response body), password-not-in-response security check, invalid email format (422), duplicate email (409 with correct error code), password too short (422), password no uppercase (422), password no digit (422), username too short/long/invalid-chars (422), duplicate username (409 with different code from email), and database exception (500 with no internal info leakage).
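
The password-not-in-response check is worth understanding rather than taking on trust. A minimal sketch of what such a check might do, assuming the response body has already been parsed from JSON into dicts and lists; the helper names `contains_key` and `leaks_secret` are hypothetical and not part of any library:

```python
import json


def contains_key(payload, forbidden: str = "password") -> bool:
    """Return True if any dict key in the (nested) payload equals `forbidden`."""
    if isinstance(payload, dict):
        return any(
            str(key).lower() == forbidden or contains_key(value, forbidden)
            for key, value in payload.items()
        )
    if isinstance(payload, list):
        return any(contains_key(item, forbidden) for item in payload)
    return False


def leaks_secret(payload, secret: str) -> bool:
    """Return True if the raw secret value appears anywhere in the serialized body."""
    return secret in json.dumps(payload)


# Example body shaped like the spec's success response (values are made up).
body = {"data": {"id": "1234", "email": "a@example.com", "username": "ada"}}
assert not contains_key(body)
assert not leaks_secret(body, "Secret123")
```

Checking both the key and the raw value catches the case where the password is echoed back under a different field name, such as inside a validation error message.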

Step 2: Run tests — all fail (correct)

pytest tests/api/test_register.py -v

# Output: all FAILED (or ERROR during collection if the app module cannot be imported yet), because the endpoint isn't implemented
# This is correct. Red is the TDD starting point.

Step 3: Ask AI to implement the endpoint

Cursor Chat — Round 2 Prompt

@tests/api/test_register.py @tests/conftest.py

These tests define the complete behavioral specification. Implement:

1. app/schemas/user.py — Pydantic v2 RegisterRequest and UserResponse
   (include password validation: 8 chars + uppercase + digit; username regex)

2. app/core/exceptions.py — EmailAlreadyExistsError and UsernameAlreadyExistsError

3. app/api/users.py — FastAPI router with exception handlers mapping
   custom exceptions to correct HTTP status codes
   Response format: success { data } error { error: { code, message } }

UserService implementation can be a stub — the tests mock it anyway.
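
When reviewing the schema the AI returns, it helps to have the validation rules in a form you can check by hand. The sketch below restates them in plain Python with no Pydantic dependency. It assumes "letters" means ASCII letters, which the spec does not state, and the names `password_errors` and `is_valid_username` are hypothetical.

```python
import re

# Assumption: "letters" in the spec means ASCII A-Z and a-z.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def password_errors(password: str) -> list[str]:
    """Return the list of violated password rules (empty list means valid)."""
    errors = []
    if len(password) < 8:
        errors.append("too_short")
    if not any(ch.isupper() for ch in password):
        errors.append("no_uppercase")
    if not any(ch.isdigit() for ch in password):
        errors.append("no_digit")
    return errors


def is_valid_username(username: str) -> bool:
    """3-20 characters; only letters, digits, and underscores."""
    return USERNAME_RE.fullmatch(username) is not None


print(password_errors("short"))      # ['too_short', 'no_uppercase', 'no_digit']
print(is_valid_username("ada_99"))   # True
```

In the real schema these checks would live inside the Pydantic model's validators so that violations surface as 422 responses. Returning every violated rule instead of stopping at the first makes failures easier to map back to individual tests.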

Step 4: Run tests, fix failures

Cursor Chat — Round 3 Prompt

pytest results:

FAILED test_password_no_uppercase_returns_422 — AssertionError: assert 201 == 422
FAILED test_duplicate_username_returns_409 — AssertionError: 'EMAIL_ALREADY_EXISTS' != 'USERNAME_ALREADY_EXISTS'

@app/schemas/user.py @app/api/users.py
Analyze the failures and fix them.

Repeat the "run → paste failures → AI fixes → run again" cycle until all tests are green. This cycle is the core of AI-TDD.
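
The "paste failures" step is mechanical enough to script. A minimal sketch, assuming pytest's short test summary format in which each failure is a line starting with `FAILED`; the function names and the prompt wording are illustrative choices, not a Cursor API:

```python
def extract_failures(pytest_output: str) -> list[str]:
    """Keep only the 'FAILED ...' summary lines from pytest's output."""
    return [
        line.strip()
        for line in pytest_output.splitlines()
        if line.strip().startswith("FAILED ")
    ]


def build_fix_prompt(pytest_output: str, context_files: list[str]) -> str:
    """Assemble a follow-up prompt; returns '' when nothing failed."""
    failures = extract_failures(pytest_output)
    if not failures:
        return ""
    mentions = " ".join(f"@{path}" for path in context_files)
    return (
        "pytest results:\n\n"
        + "\n".join(failures)
        + f"\n\n{mentions}\nAnalyze the failures and fix them."
    )


sample_output = """\
tests/api/test_register.py::test_register_success PASSED
FAILED tests/api/test_register.py::test_duplicate_username_returns_409 - AssertionError
"""
print(build_fix_prompt(sample_output, ["app/api/users.py"]))
```

Sending only the failing lines plus the relevant files keeps the prompt short. The full traceback can be appended when the one-line summary is not enough for the AI to locate the cause.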

Full Walkthrough 2: TypeScript — AI-TDD for a Business Logic Function

Scenario: AI-TDD for calculateOrderTotal — handles discount stacking, tax calculation, and integer arithmetic.

Cursor Chat — Prompt

Write a complete Vitest test suite for the following function spec.
No implementation yet — tests first.

Function: calculateOrderTotal(items: OrderItem[], options: CalculationOptions): OrderTotal

Types:
interface OrderItem {
  productId: string
  unitPrice: number      // in cents (avoids float issues)
  quantity: number
  discountPercent?: number  // 0-100, item-level discount
}
interface CalculationOptions {
  taxRate: number        // 0-1, e.g. 0.08 = 8%
  couponDiscount?: number // 0-1, order-level discount
  currency: 'USD' | 'CNY' | 'EUR'
}
interface OrderTotal {
  subtotal: number       // before discounts (cents)
  discountAmount: number // total discount (cents)
  taxAmount: number      // tax (cents)
  total: number          // final amount (cents)
  currency: string
}

Business rules:
- All amounts are integers (cents), floor when rounding
- Item discounts applied before coupon discount
- Tax applied after all discounts
- quantity >= 1; unitPrice > 0; discountPercent 0-100; couponDiscount 0-1
- Empty items array: throw EmptyOrderError
- Invalid params: throw ValidationError

Cover: correct calculations for various discount combinations, all edge values, all error cases.
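One test per scenario, descriptive names.

// TypeScript: key test examples (abbreviated)
// The module path is an assumption; adjust it to your project layout.
import { expect, it } from 'vitest'
import { calculateOrderTotal, EmptyOrderError } from './calculateOrderTotal'
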
it('applies item discount before coupon discount, tax applied last', () => {
  // unitPrice 2000, qty 1, item discount 10% → 1800
  // coupon discount 20% → 1440
  // tax 8% → floor(1440 * 0.08) = 115
  const result = calculateOrderTotal(
    [{ productId: 'p1', unitPrice: 2000, quantity: 1, discountPercent: 10 }],
    { taxRate: 0.08, couponDiscount: 0.2, currency: 'USD' }
  )
  expect(result.subtotal).toBe(2000)
  expect(result.discountAmount).toBe(560)   // 200 item + 360 coupon
  expect(result.taxAmount).toBe(115)
  expect(result.total).toBe(1555)
})

it('returns integer amounts for all fields (no floating point)', () => {
  const result = calculateOrderTotal(
    [{ productId: 'p1', unitPrice: 100, quantity: 1, discountPercent: 33 }],
    { taxRate: 0.075, currency: 'USD' }
  )
  expect(Number.isInteger(result.total)).toBe(true)
  expect(Number.isInteger(result.taxAmount)).toBe(true)
})

it('throws EmptyOrderError for empty items array', () => {
  expect(() => calculateOrderTotal([], { taxRate: 0.1, currency: 'USD' }))
    .toThrow(EmptyOrderError)
})
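
Before trusting expected values in AI-generated tests, recompute them independently. The sketch below reproduces the arithmetic of the first test in Python. It makes two assumptions the spec leaves open: flooring happens after each stage (item discount, coupon, tax), and rates are converted to exact fractions to avoid binary floating-point error. The function and field names are hypothetical ports of the TypeScript ones.

```python
from fractions import Fraction
from math import floor


def exact(rate) -> Fraction:
    """Convert a rate such as 0.08 or 10 to an exact fraction."""
    return Fraction(str(rate))


def calculate_order_total(items, tax_rate, coupon_discount=0):
    if not items:
        raise ValueError("empty order")  # stands in for EmptyOrderError
    subtotal = sum(i["unit_price"] * i["quantity"] for i in items)
    # Item-level discounts first, floored per line (an assumption).
    after_items = sum(
        floor(
            i["unit_price"] * i["quantity"]
            * (1 - exact(i.get("discount_percent", 0)) / 100)
        )
        for i in items
    )
    # Then the order-level coupon, then tax on the discounted amount.
    after_coupon = floor(after_items * (1 - exact(coupon_discount)))
    tax = floor(after_coupon * exact(tax_rate))
    return {
        "subtotal": subtotal,
        "discount_amount": subtotal - after_coupon,
        "tax_amount": tax,
        "total": after_coupon + tax,
    }


result = calculate_order_total(
    [{"unit_price": 2000, "quantity": 1, "discount_percent": 10}],
    tax_rate=0.08,
    coupon_discount=0.2,
)
print(result)
# {'subtotal': 2000, 'discount_amount': 560, 'tax_amount': 115, 'total': 1555}
```

The numbers match the test's expectations: 2000 becomes 1800 after the item discount, 1440 after the coupon, and floor(115.2) = 115 in tax. If the real implementation floors at different points, results can differ by a cent on other inputs, so the rounding stage is worth stating explicitly in the spec.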

Using Coverage Reports to Drive AI Test Completion

Cursor Chat

Here's the coverage report:

app/services/order_service.py  72%
  Missing: 45-52, 89, 102-115

@app/services/order_service.py
Look at lines 45-52, 89, 102-115.
Analyze what business scenario each group of lines represents.
Write tests for each scenario — tests that validate real business constraints,
not coverage-padding assertions.
Append to @tests/services/test_order_service.py
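
If you build this prompt often, the "Missing" column can be parsed and the matching source lines attached automatically. A minimal sketch, assuming the column contains only plain line numbers and ranges such as `45-52, 89`; branch entries like `45->47` are not handled, and both function names are hypothetical:

```python
def parse_missing(spec: str) -> list[tuple[int, int]]:
    """Turn '45-52, 89, 102-115' into [(45, 52), (89, 89), (102, 115)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        ranges.append((int(start), int(end or start)))
    return ranges


def uncovered_snippets(source: str, ranges) -> dict[tuple[int, int], str]:
    """Map each 1-based inclusive line range to the source text it covers."""
    lines = source.splitlines()
    return {
        (start, end): "\n".join(lines[start - 1:end])
        for start, end in ranges
    }


print(parse_missing("45-52, 89, 102-115"))
# [(45, 52), (89, 89), (102, 115)]
```

Pasting the snippets next to the line numbers lets the AI see exactly which branches are untested, which helps when the file has shifted since the report was generated.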

AI Test Quality Problems: Manual Review Checklist

| Quality Problem | What It Looks Like | Fix |
| --- | --- | --- |
| Testing implementation, not behavior | expect(mockRepo.findById).toHaveBeenCalledTimes(1): verifies internal calls, not system output | Add to prompt: "test observable behavior output, not internal function call counts" |
| Over-mocking | Even core business logic is mocked, so tests are testing mocks, not real code | Be explicit: "Mock [specific IO dependencies], do NOT mock [core business classes]" |
| Happy path only | 10 tests generated, all success cases, zero error scenarios | List specific error scenarios in the prompt; don't let AI decide what errors matter |
| Weak assertions | expect(response.status).toBe(200) and nothing else, with no response body checks | Add: "verify all key fields in the response body, not just status code" |
| Missing implicit business rules | AI doesn't know "VIP users skip inventory checks" unless you tell it | Manually add tests for critical business invariants; this is the 20% AI can't replace |
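
The first row is the easiest to miss in review, because both styles of test pass. A small Python sketch of the contrast, using `unittest.mock` from the standard library; `get_display_name` and the repository shape are hypothetical examples:

```python
from unittest.mock import Mock


def get_display_name(repo, user_id):
    """Look up a user and return a capitalized name, or 'Unknown'."""
    user = repo.find_by_id(user_id)
    return user["username"].title() if user else "Unknown"


def test_implementation_detail():
    # Brittle: only proves the repository was called once.
    # It would still pass if the returned name were wrong.
    repo = Mock()
    repo.find_by_id.return_value = {"username": "ada"}
    get_display_name(repo, 1)
    repo.find_by_id.assert_called_once()


def test_observable_behavior():
    # Robust: asserts on what the caller actually receives.
    repo = Mock()
    repo.find_by_id.return_value = {"username": "ada"}
    assert get_display_name(repo, 1) == "Ada"


def test_missing_user_behavior():
    repo = Mock()
    repo.find_by_id.return_value = None
    assert get_display_name(repo, 42) == "Unknown"
```

Here the repository is mocked because it represents IO, while the function under test runs for real. When reviewing AI output, tests of the first kind are candidates for rewriting into the second.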

Sustainable AI-TDD rhythm: Let AI generate 80% of boilerplate tests (happy paths, parameter validation, standard error cases). Use the saved time to handwrite the remaining 20% — business rules and implicit constraints that only you know. This division makes TDD genuinely sustainable in real projects.

Chapter Key Points

  1. AI-TDD solves the "who writes test code" problem: you describe requirements and review output, and AI handles the mechanical writing. This division makes 80% coverage achievable where 30-50% was the norm.
  2. Write the spec first, then ask for tests: the more detailed your spec (business rules, boundary conditions, error codes), the more complete the generated tests. Function signature + business rules + error cases are the three required elements.
  3. "All red" is the correct TDD starting point: running tests before implementation and seeing them all fail proves the tests are testing real behavior — not just passing by accident.
  4. Coverage reports are AI context: paste uncovered line numbers directly to AI, ask it to analyze what business scenarios those lines represent, then have it write tests for them. This is usually much faster than tracing the uncovered lines by hand.
  5. AI doesn't know your implicit business rules: only you know "VIP users skip inventory checks" — critical business path tests must be written by hand; this is the 20% of AI-TDD that can't be automated.