Chapter 44: Claude Code and Test-Driven Development: AI Writes Tests, Sees Red, Fixes Code

44.1 Why TDD and AI Programming Are a Natural Pair

Test-Driven Development (TDD) is a well-established software engineering practice, but it meets resistance in day-to-day work: writing tests first feels slower than going straight to the implementation, so many developers skip it.

Claude Code changes that equation.

When AI is responsible for writing tests, the "tests take too long" excuse disappears. When AI can immediately see failing test output and fix the code, the TDD feedback loop goes from "requires the developer to manually execute multiple steps" to "Claude iterates automatically until everything is green."

The combination of TDD and AI programming solves two long-standing pain points:

- TDD's execution cost problem: AI takes over most of the time cost of writing tests by hand
- AI code's quality verification problem: tests provide an objective quality measure for AI-generated code

Put simply: when the AI has to make tests pass, it cannot talk its way past them. Tests are an executable behavioral specification: code passes only if it does what the tests check, which is also why the tests themselves still deserve your review.

44.2 The Three Phases of TDD: Red, Green, Refactor

Classic TDD follows a "Red-Green-Refactor" cycle:

Red:    Write a failing test
    ↓
Green:  Write minimal code to make the test pass
    ↓
Refactor: Improve the code while keeping tests passing
    ↓
(repeat)

With Claude Code's assistance, this loop can accelerate dramatically:

Human: Describe the expected behavior
    ↓
Claude: Translate behavior into test code (Red)
    ↓
Claude: Run tests, confirm they fail
    ↓
Claude: Write implementation code (minimal)
    ↓
Claude: Run tests; if still failing, continue adjusting
    ↓
Claude: Tests pass; display results (Green)
    ↓
Human/Claude: Assess code quality; refactor if necessary
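The automated part of this loop can be sketched in a few lines. This is a minimal model rather than how Claude Code is implemented: `runTests`, `revise`, and `maxAttempts` are hypothetical names introduced here for illustration. It shows the two properties that matter, namely that the loop refuses to start from a green run and that it stops after a bounded number of attempts.

```typescript
// Outcome of one test run; `failures` lists the names of failing tests.
interface TestRun {
  passed: boolean;
  failures: string[];
}

// Hypothetical hooks: `runTests` executes the suite, `revise` changes the
// implementation in response to the reported failures.
interface LoopHooks {
  runTests: () => TestRun;
  revise: (failures: string[]) => void;
}

// Revises the implementation until the suite is green and returns the number
// of revisions needed. Throws if the first run is already green or if the
// suite is still red after `maxAttempts` revisions.
export function iterateUntilGreen(hooks: LoopHooks, maxAttempts = 5): number {
  let run = hooks.runTests();
  if (run.passed) {
    // A test that passes before any implementation exists proves nothing.
    throw new Error('Expected a failing (red) run before implementing');
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    hooks.revise(run.failures);
    run = hooks.runTests();
    if (run.passed) {
      return attempt;
    }
  }
  throw new Error(
    `Still red after ${maxAttempts} attempts: ${run.failures.join(', ')}`
  );
}
```

The attempt limit is the human checkpoint: when the loop gives up, the remaining failures go back to the developer instead of being retried indefinitely.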

44.3 A TDD Example from Scratch

Here is a real-world scenario: adding a "currency formatting" function to a utility library.

Step 1: Clarify the Requirements Spec

Describe the feature to Claude, but do not ask it to write the implementation yet — ask it to write tests first:

I need a formatCurrency(amount, currency, locale) function with this specification:
- Formats a number as a localized currency string
- Supports USD, EUR, CNY currency codes
- Supports en-US, zh-CN, de-DE locales
- Negative amounts should display a minus sign
- Zero should return "0.00" format (with currency symbol)
- Non-numeric amounts should throw a TypeError

Please write the tests first. Do not write the implementation yet.

Step 2: Claude Writes Tests (Red)

Claude generates tests like these:

// src/utils/formatCurrency.test.ts
import { describe, it, expect } from 'vitest';
import { formatCurrency } from './formatCurrency';

describe('formatCurrency', () => {
  describe('basic currency formatting', () => {
    it('should format USD in US English style', () => {
      expect(formatCurrency(1234.56, 'USD', 'en-US')).toBe('$1,234.56');
    });

    it('should format EUR in German style', () => {
      expect(formatCurrency(1234.56, 'EUR', 'de-DE')).toBe('1.234,56 €');
    });

    it('should format CNY in Chinese style', () => {
      expect(formatCurrency(1234.56, 'CNY', 'zh-CN')).toBe('¥1,234.56');
    });
  });

  describe('edge cases', () => {
    it('negative amounts should display a minus sign', () => {
      expect(formatCurrency(-99.99, 'USD', 'en-US')).toBe('-$99.99');
    });

    it('zero should be formatted correctly', () => {
      expect(formatCurrency(0, 'USD', 'en-US')).toBe('$0.00');
    });

    it('non-numeric input should throw TypeError', () => {
      expect(() => formatCurrency('abc' as any, 'USD', 'en-US')).toThrow(TypeError);
      expect(() => formatCurrency(null as any, 'USD', 'en-US')).toThrow(TypeError);
    });
  });

  describe('large numbers', () => {
    it('should correctly handle numbers above one million', () => {
      expect(formatCurrency(1_000_000, 'USD', 'en-US')).toBe('$1,000,000.00');
    });
  });
});

Then run the tests:

pnpm test src/utils/formatCurrency.test.ts

The run is red: because formatCurrency does not exist yet, the import cannot be resolved and the test file fails before any individual test runs. Claude displays this failing output and confirms it before moving on.

Step 3: Claude Writes the Implementation (Green)

After confirming the red state, Claude writes a minimal implementation:

// src/utils/formatCurrency.ts

export function formatCurrency(
  amount: number,
  currency: 'USD' | 'EUR' | 'CNY',
  locale: 'en-US' | 'zh-CN' | 'de-DE'
): string {
  if (typeof amount !== 'number' || isNaN(amount)) {
    throw new TypeError(
      `Invalid amount: expected a number, got ${typeof amount}`
    );
  }

  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: currency,
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).format(amount);
}

Claude runs the tests again:

pnpm test src/utils/formatCurrency.test.ts

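Most tests should pass, but exact string matches on locale output are fragile. For example, Intl.NumberFormat separates the amount and the euro sign in de-DE with a no-break space (U+00A0), so an assertion written with an ordinary space fails even though the output looks identical. Output can also differ between runtimes that ship different ICU data. Claude analyzes any remaining failures and adjusts the implementation, or corrects an assertion that encoded the wrong expectation, until everything passes.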
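Locale output that looks identical to the expected string can still differ in its whitespace characters. One way to make such assertions less brittle, sketched here under the assumption that the spec does not care which kind of space is used, is to normalize the formatted string before comparing. `normalizeSpaces` is a hypothetical helper, not part of the original example; the alternative is to write `\u00a0` explicitly in the expected string.

```typescript
// Replaces the no-break space (U+00A0) and narrow no-break space (U+202F)
// that Intl formatters may emit with an ordinary space.
export function normalizeSpaces(formatted: string): string {
  return formatted.replace(/[\u00a0\u202f]/g, ' ');
}

const euros = new Intl.NumberFormat('de-DE', {
  style: 'currency',
  currency: 'EUR',
}).format(1234.56);

// Compare against a literal typed with an ordinary space.
console.log(normalizeSpaces(euros) === '1.234,56 €');
```

Normalizing trades strictness for stability: the test no longer pins the exact separator character, which is acceptable only if callers do not depend on it.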

Step 4: Refactoring

Once all tests pass, refactoring begins:

All tests are passing. Please refactor this function:
1. Use named type aliases instead of inline string literal unions for currency and locale
2. Extract an internal validateAmount function
3. Add JSDoc comments
Keep all tests passing.

After refactoring, Claude runs the tests once more to confirm no regressions were introduced.
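For orientation, the refactored function might look roughly like the following. This is a sketch of one plausible result, not Claude's actual output: the alias names `CurrencyCode` and `SupportedLocale` are assumptions, and behavior is intended to be identical to the implementation in Step 3.

```typescript
/** Currency codes accepted by formatCurrency. */
export type CurrencyCode = 'USD' | 'EUR' | 'CNY';

/** Locales accepted by formatCurrency. */
export type SupportedLocale = 'en-US' | 'zh-CN' | 'de-DE';

/**
 * Throws a TypeError unless `amount` is a number other than NaN.
 * Callers from untyped code may pass anything, hence `unknown`.
 */
function validateAmount(amount: unknown): asserts amount is number {
  if (typeof amount !== 'number' || Number.isNaN(amount)) {
    throw new TypeError(
      `Invalid amount: expected a number, got ${typeof amount}`
    );
  }
}

/**
 * Formats a number as a localized currency string with two decimals.
 *
 * @param amount - The numeric amount to format.
 * @param currency - ISO 4217 currency code.
 * @param locale - BCP 47 locale tag that controls separators and symbol placement.
 * @throws {TypeError} If `amount` is not a number.
 */
export function formatCurrency(
  amount: number,
  currency: CurrencyCode,
  locale: SupportedLocale
): string {
  validateAmount(amount);
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency,
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).format(amount);
}
```

Because the public signature and the thrown error are unchanged, the existing test file should pass without edits, which is the point of refactoring under a green suite.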

44.4 Test File Organization

When doing TDD with Claude Code's assistance, test file organization matters:

Strategy 1: Tests Alongside Implementation

src/
├── utils/
│   ├── formatCurrency.ts          # implementation
│   ├── formatCurrency.test.ts     # unit tests
│   ├── parseDate.ts
│   └── parseDate.test.ts

Advantage: implementation and tests are tightly associated, making it easy for Claude to locate and operate on them together.

Strategy 2: Separate Test Directory

src/
├── utils/
│   ├── formatCurrency.ts
│   └── parseDate.ts
tests/
├── unit/
│   └── utils/
│       ├── formatCurrency.test.ts
│       └── parseDate.test.ts
└── integration/
    └── api/
        └── users.test.ts

Advantage: clear separation of test types; unit and integration tests can be run independently.

Document your chosen strategy in CLAUDE.md so Claude knows where test files belong:

## Test File Conventions

- Unit tests live alongside implementation files, named `*.test.ts`
- Integration tests live in `tests/integration/`
- E2E tests live in `tests/e2e/`
- Run all tests: `pnpm test`
- Run only unit tests: `pnpm test:unit`

44.5 Getting Claude to Handle Tricky Test Scenarios

Testing Async Code

Please write tests for the following async function:
async function fetchUserData(userId: string): Promise<User>
- Returns a User object on success
- Throws a 404 error when userId does not exist
- Throws a network error on fetch failure
Mock the fetch call.

Claude generates tests that replace fetch with a vi.fn() mock and handle async boundaries correctly:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { fetchUserData } from './fetchUserData';

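const mockFetch = vi.fn();
// Replace the global fetch with the mock for every test in this file
vi.stubGlobal('fetch', mockFetch);

describe('fetchUserData', () => {
  beforeEach(() => {
    // mockReset clears recorded calls and any queued *Once return values
    mockFetch.mockReset();
  });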

  it('should return user data on success', async () => {
    const mockUser = { id: '123', name: 'Alice', email: 'alice@example.com' };
    mockFetch.mockResolvedValueOnce({
      ok: true,
      json: () => Promise.resolve(mockUser),
    });

    const result = await fetchUserData('123');
    expect(result).toEqual(mockUser);
    expect(mockFetch).toHaveBeenCalledWith('/api/users/123');
  });

  it('should throw a 404 error when user does not exist', async () => {
    mockFetch.mockResolvedValueOnce({
      ok: false,
      status: 404,
      json: () => Promise.resolve({ message: 'User not found' }),
    });

    await expect(fetchUserData('nonexistent')).rejects.toThrow('User not found');
  });

  it('should throw TypeError on network failure', async () => {
    mockFetch.mockRejectedValueOnce(new TypeError('Failed to fetch'));

    await expect(fetchUserData('123')).rejects.toThrow(TypeError);
  });
});
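To make the target of these tests concrete, here is one hypothetical implementation that would satisfy them. Only the `/api/users/` path comes from the tests above; the `User` fields, the error type, and the optional `fetchImpl` parameter are assumptions added for illustration. The parameter defaults to the global fetch, so the mocked-global tests work unchanged, while callers can also inject a fake directly.

```typescript
interface User {
  id: string;
  name: string;
  email: string;
}

// The narrow slice of a fetch response this function relies on, so test
// doubles only need to provide these members.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

export async function fetchUserData(
  userId: string,
  // Looked up at call time, so a global replaced by a test is picked up.
  fetchImpl: FetchLike = (globalThis as any).fetch
): Promise<User> {
  // Network failures reject here and propagate to the caller unchanged.
  const response = await fetchImpl(`/api/users/${encodeURIComponent(userId)}`);

  if (!response.ok) {
    // Surface the server's message when present, otherwise the status code.
    const body = (await response.json()) as { message?: string };
    throw new Error(
      body.message ?? `Request failed with status ${response.status}`
    );
  }

  return (await response.json()) as User;
}
```

Keeping the dependency narrow (`ResponseLike` rather than the full `Response` type) is what lets the tests get away with plain object literals as fake responses.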

Testing React Components

Write tests for the following React component using React Testing Library:
- UserCard component, props: { user: User, onEdit: () => void }
- Test: renders user name and email; clicking "Edit" button triggers onEdit

Claude generates:

import { render, screen, fireEvent } from '@testing-library/react';
import { describe, it, expect, vi } from 'vitest';
// Registers toBeInTheDocument and related matchers; can live in a shared setup file instead
import '@testing-library/jest-dom/vitest';
import { UserCard } from './UserCard';

const mockUser = {
  id: '1',
  name: 'Alice Johnson',
  email: 'alice@example.com',
};

describe('UserCard', () => {
  it('should render the user name and email', () => {
    render(<UserCard user={mockUser} onEdit={() => {}} />);

    expect(screen.getByText('Alice Johnson')).toBeInTheDocument();
    expect(screen.getByText('alice@example.com')).toBeInTheDocument();
  });

  it('clicking the edit button should call onEdit', () => {
    const onEdit = vi.fn();
    render(<UserCard user={mockUser} onEdit={onEdit} />);

    fireEvent.click(screen.getByRole('button', { name: /edit/i }));

    expect(onEdit).toHaveBeenCalledTimes(1);
  });
});

Testing Database Operations

Write integration tests for UserRepository.create:
- Successfully creates a user and returns an object with an id
- Throws UniqueConstraintError when email already exists
Use a test database and clean up after each test.

Claude handles database cleanup with an afterEach hook:

import { describe, it, expect, afterEach } from 'vitest';
import { UserRepository } from './UserRepository';
import { db } from '@/lib/test-db';  // dedicated test database connection

describe('UserRepository.create', () => {
  const repo = new UserRepository(db);

  afterEach(async () => {
    await db.user.deleteMany({ where: { email: { contains: '@test.example' } } });
  });

  // Placeholder addresses; they must contain @test.example so the
  // afterEach cleanup above removes them
  it('should successfully create a user and return an id', async () => {
    const created = await repo.create({
      name: 'Test User',
      email: 'new-user@test.example',
    });

    expect(created.id).toBeDefined();
    expect(created.name).toBe('Test User');
    expect(created.email).toBe('new-user@test.example');
  });

  it('should throw UniqueConstraintError when email is duplicated', async () => {
    await repo.create({ name: 'User 1', email: 'duplicate@test.example' });

    // A string argument matches against the error message, so this assumes
    // the message contains "UniqueConstraintError"
    await expect(
      repo.create({ name: 'User 2', email: 'duplicate@test.example' })
    ).rejects.toThrow('UniqueConstraintError');
  });
});

44.6 Coverage Analysis and Test Gap Filling

When an existing codebase lacks tests, you can ask Claude to analyze coverage and fill the gaps:

Run the test coverage report: pnpm test --coverage
Then identify files with coverage below 70% and prioritize writing tests for:
1. Functions containing business logic
2. Boundary condition handling code
3. Error handling branches

Claude will:

1. Run the coverage command and analyze the output
2. Identify uncovered code paths
3. Generate targeted tests for each uncovered path
4. Re-run coverage to confirm improvement
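The "identify files below 70%" step is simple enough to script. The sketch below assumes the coverage tool can emit an Istanbul-style `json-summary` report, where each key is a file path (plus a `total` entry) and each metric carries a `pct` field. `filesBelowThreshold` is a hypothetical helper and the sample numbers are invented for illustration.

```typescript
// One metric in an Istanbul-style json-summary report (assumed shape).
interface Metric {
  total: number;
  covered: number;
  skipped: number;
  pct: number;
}

interface FileCoverage {
  lines: Metric;
  statements: Metric;
  functions: Metric;
  branches: Metric;
}

// Keys are file paths, plus a "total" entry for the whole project.
type CoverageSummary = Record<string, FileCoverage>;

// Returns files whose line coverage is below `threshold`, worst first.
export function filesBelowThreshold(
  summary: CoverageSummary,
  threshold = 70
): string[] {
  return Object.keys(summary)
    .filter((file) => file !== 'total' && summary[file].lines.pct < threshold)
    .sort((a, b) => summary[a].lines.pct - summary[b].lines.pct);
}

// Invented numbers, for illustration only.
const metric = (pct: number): Metric => ({ total: 100, covered: pct, skipped: 0, pct });
const entry = (pct: number): FileCoverage => ({
  lines: metric(pct),
  statements: metric(pct),
  functions: metric(pct),
  branches: metric(pct),
});

const sample: CoverageSummary = {
  total: entry(65),
  'src/utils/parseDate.ts': entry(55),
  'src/utils/formatCurrency.ts': entry(100),
  'src/api/users.ts': entry(40),
};

console.log(filesBelowThreshold(sample));
// → [ 'src/api/users.ts', 'src/utils/parseDate.ts' ]
```

Sorting worst-first gives a natural work order, but line coverage alone does not say which gaps matter; the business-logic and error-handling priorities in the prompt above still need judgment.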

44.7 Configuring TDD Workflow in CLAUDE.md

Configure TDD as the default workflow in CLAUDE.md so Claude follows it automatically:

## Development Workflow: Test-Driven Development

**Default principle: Tests first, implementation second**

When writing code for new features or functions, Claude must:
1. Write the test file first (even if the function doesn't exist yet)
2. Run tests and confirm they fail (red)
3. Write a minimal implementation to make tests pass (green)
4. Report the passing test results

**Forbidden behaviors:**
- Do not write implementation code without first writing tests
- Do not skip the step of running tests
- Do not modify tests to make them pass (unless requirements changed)

**Test framework: Vitest**
Commands:
- `pnpm test` — run all tests
- `pnpm test <file-path>` — run tests for a specific file
- `pnpm test --coverage` — run with coverage report

44.8 Handling Common TDD Challenges

Challenge 1: Divergence Between Test Spec and Intent

Sometimes Claude's test specification has subtle differences from your expectations. Solution: after writing the tests but before running them, have Claude explain each test case in plain English. Confirm the intent matches before proceeding.

Challenge 2: Overly Complex Mock Design

Excessive mocking leads to brittle tests. Specify mock principles in CLAUDE.md:

## Test Mock Principles

- Only mock external dependencies (HTTP calls, database, filesystem)
- Do not mock internal functions of the module under test
- Prefer real implementations over mocks (in-memory DB > mock DB)
- Tests should verify behavior, not implementation details
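As an illustration of the "in-memory DB > mock DB" principle, the following sketch shows an in-memory stand-in for a user repository. It is an assumption-laden example rather than the book's UserRepository: the class name, the `clear` method, and the error message format are all hypothetical. Unlike a mock, it enforces the uniqueness rule itself, so a test exercises real behavior instead of a scripted return value.

```typescript
interface User {
  id: string;
  name: string;
  email: string;
}

interface NewUser {
  name: string;
  email: string;
}

export class UniqueConstraintError extends Error {
  constructor(field: string) {
    // The class name is repeated in the message so that message-based
    // assertions can match it.
    super(`UniqueConstraintError: ${field} already exists`);
    this.name = 'UniqueConstraintError';
  }
}

export class InMemoryUserRepository {
  // Keyed by email, which makes the uniqueness check a single lookup.
  private users = new Map<string, User>();
  private nextId = 1;

  // Async to mirror the shape of a database-backed repository.
  async create(input: NewUser): Promise<User> {
    if (this.users.has(input.email)) {
      throw new UniqueConstraintError('email');
    }
    const user: User = {
      id: String(this.nextId++),
      name: input.name,
      email: input.email,
    };
    this.users.set(user.email, user);
    return user;
  }

  // Test cleanup hook, playing the role of deleteMany in a database test.
  clear(): void {
    this.users.clear();
  }
}
```

An in-memory repository like this suits fast unit tests of code that depends on a repository; constraints enforced by the real database still need the slower integration tests.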

Challenge 3: Slow Test Execution

Database integration tests can be slow. Have Claude separate fast and slow tests:

# Fast unit tests (run on every commit in CI)
pnpm test:unit

# Slow integration tests (run on every PR in CI)
pnpm test:integration

Summary

The combination of TDD and Claude Code is mutually reinforcing: TDD gives Claude a precise specification for generating code, and Claude removes most of the execution cost of TDD.

Key takeaways:

- Having Claude write failing tests before writing the implementation is one of the most reliable ways to check that generated code does what you asked
- Claude can drive most of the Red-Green-Refactor cycle on its own; humans confirm at key checkpoints, especially when reviewing the tests
- Claude can generate mocks and test structures for complex scenarios including async code, React components, and database operations
- Configure TDD as the default workflow in CLAUDE.md to prevent Claude from skipping the test step
- Through coverage analysis, Claude can systematically fill test gaps in existing codebases
