Chapter 9: AI-Driven TDD — Let Cursor Write Tests, You Provide Requirements
Traditional TDD usually breaks down for a practical reason: writing tests takes time, so people skip them under deadline pressure. AI-TDD hands the mechanical work of writing test code to the AI, while you focus on describing requirements. This chapter walks through two complete AI-TDD cycles, Python FastAPI with pytest and TypeScript with Jest/Vitest, from "describe the interface" to "all tests green," and closes with a quality review checklist for AI-generated tests.
Chapter goals: Understand the division of labor in AI-TDD and its core loop; write prompts that generate high-quality tests; use coverage reports to drive AI test completion; know which AI-generated test patterns need manual review.
AI-TDD vs Traditional Development: Workflow Comparison
| Dimension | Traditional Dev (code first) | Traditional TDD (test first) | AI-TDD |
|---|---|---|---|
| Test coverage | 30-50% (frequently skipped) | 80%+ (hard to sustain) | 80%+ (AI handles repetitive work) |
| Time to write tests | High (so it gets skipped) | High (the pain point) | Low (AI writes, human reviews) |
| Test quality | Inconsistent | High (thought through) | Medium-high (needs review) |
| Edge case coverage | Often missed | Thorough | Systematic (AI enumerates well) |
| Actually executed | Often skipped | Strictly enforced | Strictly enforced |
| Team fit | Everyone | Teams with TDD discipline | Everyone |
The AI-TDD Core Loop
1. Write the interface spec: function/endpoint signature plus business rules (no implementation needed yet)
2. AI generates tests: a complete test suite including edge cases, based on your spec
3. Run tests and watch them all fail (expected): red is the correct TDD starting point, because it confirms the tests depend on behavior that does not exist yet and cannot pass by accident
4. AI implements the code to pass the tests: provide the test file as context and ask the AI to write the implementation
5. Run tests and check results: some pass, some fail; move to the next step
6. AI fixes the failures: paste the failure output back to the AI for analysis and fixes
7. Repeat steps 5 and 6 until all green
Retrofit mode works too: If you already have an implementation, ask AI to write tests for existing code. Strict test-first is not required — the goal is high-quality test coverage; the order is a means, not the end.
Full Walkthrough 1: Python FastAPI User Registration — AI-TDD
Step 1: Describe the spec, get AI to write tests
Key: you haven't written any implementation yet. Just the spec.
Cursor Chat — Round 1 Prompt
Write a complete pytest test suite for the following FastAPI endpoint.
No implementation exists yet — write tests first.
Endpoint spec:
POST /api/users/register
Request body: { "email": str, "password": str, "username": str }
Business rules:
- email must be a valid format (Pydantic EmailStr)
- password must be at least 8 characters, contain at least one uppercase letter and one digit
- username must be 3-20 characters, only letters, digits, and underscores
- Duplicate email returns 409 Conflict with code EMAIL_ALREADY_EXISTS
- Duplicate username returns 409 Conflict with code USERNAME_ALREADY_EXISTS (different code from email)
- Success returns 201 + { "id": uuid, "email": str, "username": str, "createdAt": datetime }
- Password must NEVER appear in any response
Test requirements:
- Use httpx AsyncClient with ASGITransport
- Mock UserService to avoid real database operations
- Fixtures in conftest.py, each test function independent
Output: complete conftest.py + tests/api/test_register.py
The AI generates a complete test file covering: success case (201 + correct response body), password-not-in-response security check, invalid email format (422), duplicate email (409 with correct error code), password too short (422), password no uppercase (422), password no digit (422), username too short/long/invalid-chars (422), duplicate username (409 with different code from email), and database exception (500 with no internal info leakage).
Step 2: Run tests — all fail (correct)
pytest tests/api/test_register.py -v
# Output: all FAILED — the endpoint isn't implemented yet
# This is correct. Red is the TDD starting point.
Step 3: Ask AI to implement the endpoint
Cursor Chat — Round 2 Prompt
@tests/api/test_register.py @tests/conftest.py
These tests define the complete behavioral specification. Implement:
1. app/schemas/user.py — Pydantic v2 RegisterRequest and UserResponse
(include password validation: 8 chars + uppercase + digit; username regex)
2. app/core/exceptions.py — EmailAlreadyExistsError and UsernameAlreadyExistsError
3. app/api/users.py — FastAPI router with exception handlers mapping
custom exceptions to correct HTTP status codes
Response format: success returns the body defined in the spec ({ id, email, username, createdAt }); errors return { error: { code, message } }
UserService implementation can be a stub — the tests mock it anyway.
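The validation rules in the spec are small enough to sketch without any framework. The following is a minimal, dependency-free illustration of what the RegisterRequest schema has to enforce, not the code Cursor would generate. The function names are hypothetical, "letters" is assumed to mean ASCII letters, and in the real schema these checks would live inside Pydantic validators so that a violation surfaces as a 422 response.

```python
import re

# Assumption: "letters" in the username rule means ASCII letters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def password_problems(password: str) -> list[str]:
    """Return the password rules that `password` violates (empty list = valid)."""
    problems = []
    if len(password) < 8:
        problems.append("too_short")
    if not any(ch.isupper() for ch in password):
        problems.append("no_uppercase")
    if not any(ch.isdigit() for ch in password):
        problems.append("no_digit")
    return problems


def username_is_valid(username: str) -> bool:
    """3-20 characters, only letters, digits, and underscores."""
    return USERNAME_PATTERN.fullmatch(username) is not None
```

Each rule here corresponds to one of the generated 422 tests, which makes it a convenient reference when reviewing whether the AI's implementation covers every rule.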
Step 4: Run tests, fix failures
Cursor Chat — Round 3 Prompt
pytest results:
FAILED test_password_no_uppercase_returns_422 — AssertionError: assert 200 == 422
FAILED test_duplicate_username_returns_409 — AssertionError: 'EMAIL_ALREADY_EXISTS' != 'USERNAME_ALREADY_EXISTS'
@app/schemas/user.py @app/api/users.py
Analyze the failures and fix them.
Repeat the "run → paste failures → AI fixes → run again" cycle until all tests are green. This cycle is the core of AI-TDD.
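Pasting failures by hand gets tedious after a few rounds. As a rough sketch, the helpers below pull the FAILED lines out of pytest's short test summary and assemble a prompt in the style of Round 3. The helper names are hypothetical, and the code assumes the summary format `FAILED <test id> - <reason>` printed by recent pytest versions, with test ids that contain no spaces.

```python
import re

# Matches lines from pytest's "short test summary info" section, e.g.
# FAILED tests/api/test_register.py::test_x - AssertionError: assert 200 == 422
FAILED_LINE = re.compile(r"^FAILED\s+(?P<test>\S+)(?:\s+-\s+(?P<reason>.*))?$")


def extract_failures(pytest_output: str) -> list[tuple[str, str]]:
    """Return (test id, reason) pairs for every FAILED summary line."""
    failures = []
    for line in pytest_output.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures.append((match["test"], match["reason"] or ""))
    return failures


def build_fix_prompt(failures: list[tuple[str, str]], context_files: list[str]) -> str:
    """Assemble a chat prompt: failures first, then @file references."""
    lines = ["pytest results:"]
    for test, reason in failures:
        lines.append(f"FAILED {test} - {reason}" if reason else f"FAILED {test}")
    lines.append(" ".join(f"@{path}" for path in context_files))
    lines.append("Analyze the failures and fix them.")
    return "\n".join(lines)
```

The input would typically be the captured output of `pytest tests/api/test_register.py`, for example from `subprocess.run(..., capture_output=True, text=True).stdout`.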
Full Walkthrough 2: TypeScript — AI-TDD for a Business Logic Function
Scenario: AI-TDD for calculateOrderTotal — handles discount stacking, tax calculation, and integer arithmetic.
Cursor Chat — Prompt
Write a complete Vitest test suite for the following function spec.
No implementation yet — tests first.
Function: calculateOrderTotal(items: OrderItem[], options: CalculationOptions): OrderTotal
Types:
interface OrderItem {
  productId: string
  unitPrice: number // in cents (avoids float issues)
  quantity: number
  discountPercent?: number // 0-100, item-level discount
}
interface CalculationOptions {
  taxRate: number // 0-1, e.g. 0.08 = 8%
  couponDiscount?: number // 0-1, order-level discount
  currency: 'USD' | 'CNY' | 'EUR'
}
interface OrderTotal {
  subtotal: number // before discounts (cents)
  discountAmount: number // total discount (cents)
  taxAmount: number // tax (cents)
  total: number // final amount (cents)
  currency: string
}
Business rules:
- All amounts are integers (cents), floor when rounding
- Item discounts applied before coupon discount
- Tax applied after all discounts
- quantity >= 1; unitPrice > 0; discountPercent 0-100; couponDiscount 0-1
- Empty items array: throw EmptyOrderError
- Invalid params: throw ValidationError
Cover: correct calculations for various discount combinations, all edge values, all error cases.
One test per scenario, descriptive names.
// TypeScript: key test examples (abbreviated)
import { expect, it } from 'vitest'
// Assumed module path; point this at wherever the implementation will live.
import { calculateOrderTotal, EmptyOrderError } from './calculateOrderTotal'

it('applies item discount before coupon discount, tax applied last', () => {
  // unitPrice 2000, qty 1, item discount 10% → 1800
  // coupon discount 20% → 1440
  // tax 8% → floor(1440 * 0.08) = 115
  const result = calculateOrderTotal(
    [{ productId: 'p1', unitPrice: 2000, quantity: 1, discountPercent: 10 }],
    { taxRate: 0.08, couponDiscount: 0.2, currency: 'USD' }
  )
  expect(result.subtotal).toBe(2000)
  expect(result.discountAmount).toBe(560) // 200 item + 360 coupon
  expect(result.taxAmount).toBe(115)
  expect(result.total).toBe(1555)
})

it('returns integer amounts for all fields (no floating point)', () => {
  const result = calculateOrderTotal(
    [{ productId: 'p1', unitPrice: 100, quantity: 1, discountPercent: 33 }],
    { taxRate: 0.075, currency: 'USD' }
  )
  expect(Number.isInteger(result.total)).toBe(true)
  expect(Number.isInteger(result.taxAmount)).toBe(true)
})

it('throws EmptyOrderError for empty items array', () => {
  expect(() => calculateOrderTotal([], { taxRate: 0.1, currency: 'USD' }))
    .toThrow(EmptyOrderError)
})
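Expected values in AI-generated tests deserve a hand check, because a wrong expectation lets a wrong implementation pass. The sketch below recomputes the numbers from the tests above. It is written in Python to match the other added sketches in this chapter, and it is a cross-check, not the TypeScript implementation Cursor would produce. It assumes one reading of a rule the spec leaves open: each item discount, the coupon discount, and the tax are floored separately. Validation and error cases are left out.

```python
import math


def order_total(items, tax_rate, coupon_discount=0.0):
    """Recompute order totals in integer cents.

    Assumption: each item discount, the coupon discount, and the tax are
    floored separately; the spec only says "floor when rounding".
    """
    subtotal = sum(item["unit_price"] * item["quantity"] for item in items)
    item_discount = sum(
        math.floor(
            item["unit_price"] * item["quantity"] * item.get("discount_percent", 0) / 100
        )
        for item in items
    )
    after_items = subtotal - item_discount          # item discounts come first
    coupon_amount = math.floor(after_items * coupon_discount)
    taxable = after_items - coupon_amount           # tax applies after all discounts
    tax = math.floor(taxable * tax_rate)
    return {
        "subtotal": subtotal,
        "discount_amount": item_discount + coupon_amount,
        "tax_amount": tax,
        "total": taxable + tax,
    }
```

If the AI's expectations disagree with a recomputation like this, the usual cause is a different rounding order, which is worth settling in the spec before asking for an implementation.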
Using Coverage Reports to Drive AI Test Completion
Cursor Chat
Here's the coverage report:
app/services/order_service.py 72%
Missing: 45-52, 89, 102-115
@app/services/order_service.py
Look at lines 45-52, 89, 102-115.
Analyze what business scenario each group of lines represents.
Write tests for each scenario — tests that validate real business constraints,
not coverage-padding assertions.
Append to @tests/services/test_order_service.py
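The prompt works better when it carries the uncovered source lines themselves rather than bare numbers. The sketch below parses a Missing column like the one above and pulls out the matching lines. The helper names are hypothetical, and the code assumes plain line coverage, where the column holds only single lines and start-end ranges; branch coverage adds entries such as 45->47, which this does not handle.

```python
def parse_missing(spec: str) -> list[tuple[int, int]]:
    """Turn '45-52, 89, 102-115' into [(45, 52), (89, 89), (102, 115)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        ranges.append((int(start), int(end or start)))
    return ranges


def uncovered_snippets(source: str, spec: str) -> list[str]:
    """Return one numbered snippet per uncovered range, ready to paste."""
    lines = source.splitlines()
    snippets = []
    for start, end in parse_missing(spec):
        last = min(end, len(lines))  # tolerate a report newer than the file
        body = "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, last + 1))
        snippets.append(f"# lines {start}-{end}\n{body}")
    return snippets
```

Grouping by range keeps each snippet aligned with one business scenario, which is how the prompt asks the AI to reason about the gaps.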
AI Test Quality Problems: Manual Review Checklist
| Quality Problem | What It Looks Like | Fix |
|---|---|---|
| Testing implementation, not behavior | expect(mockRepo.findById).toHaveBeenCalledTimes(1), which verifies internal calls, not system output | Add to prompt: "test observable behavior output, not internal function call counts" |
| Over-mocking | Even core business logic is mocked — tests are testing mocks, not real code | Be explicit: "Mock [specific IO dependencies], do NOT mock [core business classes]" |
| Happy path only | 10 tests generated, all success cases, zero error scenarios | List specific error scenarios in the prompt — don't let AI decide what errors matter |
| Weak assertions | expect(response.status).toBe(200) and nothing else, with no response body checks | Add: "verify all key fields in the response body, not just status code" |
| Missing implicit business rules | AI doesn't know "VIP users skip inventory checks" unless you tell it | Manually add tests for critical business invariants — this is the 20% AI can't replace |
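The first row of the checklist is the easiest to miss in review, so here is a small illustration of the difference using Python's standard unittest.mock. OrderService, its repository, and the test names are all hypothetical and exist only to contrast the two styles.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical service used only to illustrate the two test styles."""

    def __init__(self, repo):
        self.repo = repo

    def get_summary(self, order_id):
        order = self.repo.find_by_id(order_id)
        if order is None:
            raise LookupError(f"order {order_id} not found")
        total = sum(item["price"] * item["qty"] for item in order["items"])
        return {"id": order["id"], "total": total}


def make_repo():
    # Only the IO boundary (the repository) is mocked.
    repo = Mock()
    repo.find_by_id.return_value = {
        "id": "o1",
        "items": [{"price": 1000, "qty": 2}, {"price": 500, "qty": 1}],
    }
    return repo


def test_implementation_coupled():
    # Weak: passes even if the total is computed incorrectly, and breaks on
    # harmless refactors that change how often the repository is called.
    repo = make_repo()
    OrderService(repo).get_summary("o1")
    repo.find_by_id.assert_called_once()


def test_observable_behavior():
    # Stronger: asserts on what the caller actually receives.
    summary = OrderService(make_repo()).get_summary("o1")
    assert summary == {"id": "o1", "total": 2500}


def test_missing_order_raises():
    repo = Mock()
    repo.find_by_id.return_value = None
    try:
        OrderService(repo).get_summary("missing")
    except LookupError:
        return
    raise AssertionError("expected LookupError")
```

Both styles pass today, but only the second and third would catch a regression in the returned total or in the error path.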
Sustainable AI-TDD rhythm: Let AI generate 80% of boilerplate tests (happy paths, parameter validation, standard error cases). Use the saved time to handwrite the remaining 20% — business rules and implicit constraints that only you know. This division makes TDD genuinely sustainable in real projects.
Chapter Key Points
- AI-TDD solves the "who writes test code" problem: you describe requirements and review output; AI handles the mechanical writing — this division makes 80% coverage achievable where 30% was the norm.
- Write the spec first, then ask for tests: the more detailed your spec (business rules, boundary conditions, error codes), the more complete the generated tests. Function signature + business rules + error cases are the three required elements.
- "All red" is the correct TDD starting point: running the tests before the implementation exists and seeing them all fail confirms that they depend on real behavior and cannot pass by accident.
- Coverage reports are AI context: paste uncovered line numbers directly to the AI, ask it to analyze what business scenarios those lines represent, then have it write tests for them. This is usually much faster than tracing the gaps by hand.
- AI doesn't know your implicit business rules: only you know "VIP users skip inventory checks" — critical business path tests must be written by hand; this is the 20% of AI-TDD that can't be automated.