Chapter 48: Fast Mode and Cost Control: Balancing Efficiency and Cost for High-Frequency Tasks

48.1 Understanding Claude Code's Cost Structure

Before using Claude Code for daily development, it is important to understand its cost structure. Every Claude Code conversation consumes Anthropic API tokens; the cost depends on:

Input tokens: everything sent to Claude (CLAUDE.md + conversation history + your request + file contents read in)
Output tokens: everything Claude generates (replies + code + tool call arguments)
The model used: different models (Claude Opus, Sonnet, Haiku) differ significantly in price

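Approximate reference prices per million tokens (a snapshot for order-of-magnitude reference only; prices change between model releases, so check Anthropic's pricing page for current figures):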

Claude Opus 4.5:    Input $15/M tokens, Output $75/M tokens
Claude Sonnet 4.5:  Input $3/M tokens,  Output $15/M tokens
Claude Haiku 3.5:   Input $0.8/M tokens, Output $4/M tokens

For a typical programming session (around 50K tokens), the cost is roughly:

Using Opus: $0.75–2.00
Using Sonnet: $0.15–0.40
Using Haiku: $0.04–0.10

These numbers look small individually, but with multiple sessions per day and a full team using Claude Code, costs accumulate rapidly. Understanding cost control strategies lets you cut spending significantly without sacrificing developer productivity.
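The ranges above follow from simple arithmetic. Here is a minimal sketch using the per-million-token prices listed in this section. The 45K/5K input/output split and the names PRICES and sessionCost are illustrative assumptions, not measurements:

```typescript
// Prices in USD per million tokens, copied from the reference list in this section.
type Price = { inputPerM: number; outputPerM: number };

const PRICES = {
  opus: { inputPerM: 15, outputPerM: 75 },
  sonnet: { inputPerM: 3, outputPerM: 15 },
  haiku: { inputPerM: 0.8, outputPerM: 4 },
};

// Cost in USD of one session, given its input and output token counts.
function sessionCost(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1_000_000;
}

// Assumed split for a 50K-token session: 45K input tokens, 5K output tokens.
const estimates = {
  opus: sessionCost(PRICES.opus, 45_000, 5_000),     // about $1.05
  sonnet: sessionCost(PRICES.sonnet, 45_000, 5_000), // about $0.21
  haiku: sessionCost(PRICES.haiku, 45_000, 5_000),   // about $0.056
};
```

Because output tokens cost five times as much as input tokens on every tier here, sessions that generate a lot of code land nearer the top of each range.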

48.2 What Is Fast Mode

Claude Code's Fast Mode is a mechanism that automatically or manually uses a faster, cheaper model in certain scenarios.

In normal mode, Claude Code uses your configured primary model (typically Claude Sonnet or Opus) for all tasks. Fast Mode allows Claude to switch to a lighter-weight model (such as Claude Haiku) in situations like:

Simple queries: answering a simple question about code
Minor formatting changes: adjusting indentation or reformatting strings
Status checks: confirming whether a file exists or reading simple config values
Repetitive tool calls: performing similar file operations in a loop

Running an Opus-class model on these tasks pays premium prices for capability the task does not need. Fast Mode lets Claude Code pick the model according to task complexity, which cuts the cost of routine operations while leaving the stronger model in place for the work that needs it.

Enabling Fast Mode

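Configure it in .claude/settings.json. The fastMode block below is this chapter's example; confirm the key names against the settings reference for your Claude Code version: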

{
  "model": "claude-sonnet-4-5",
  "fastMode": {
    "enabled": true,
    "model": "claude-3-5-haiku-latest",
    "triggers": [
      "simple-query",
      "file-check",
      "format-only"
    ]
  }
}

Or describe the model-use strategy in CLAUDE.md:

## Model Selection Strategy

For the following tasks, Fast Mode (Haiku model) may be used:
- Confirming whether a file exists
- Reading configuration values
- Simple formatting adjustments
- Single-line code changes

For the following tasks, the full model (Sonnet or Opus) is required:
- Architecture design and refactoring
- Complex bug analysis
- Multi-file coordinated changes
- Security-related code review

48.3 The Cost Impact of Context Window Growth

One easily overlooked cost source in Claude Code is context window accumulation. As a conversation grows longer, each new request must send the entire conversation history, causing input token counts to increase continuously.

Conversation Length vs. Cost

Request at turn 1:
  Input = CLAUDE.md (2K) + user message (0.5K) = 2.5K tokens

Request at turn 10:
  Input = CLAUDE.md (2K) + 9 turns of history (~20K) + user message (0.5K) = 22.5K tokens

Request at turn 20:
  Input = CLAUDE.md (2K) + 19 turns of history (~45K) + user message (0.5K) = 47.5K tokens

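In this example the request at turn 20 sends 19 times as many input tokens as the request at turn 1 (47.5K vs. 2.5K). In a session that runs 20 turns, the per-request input cost in the final turns is therefore roughly 10–20 times what it was at the start.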
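The growth pattern can be modeled in a few lines. This is a rough sketch and not a description of how Claude Code accounts for tokens: it assumes each turn adds a fixed 2.4K tokens of history (close to the ~45K over 19 turns in the example), and the function names are illustrative:

```typescript
// Input tokens sent at a given turn (1-based): fixed overhead plus accumulated history.
function inputTokensAtTurn(
  turn: number,
  claudeMd = 2_000,
  userMessage = 500,
  historyPerTurn = 2_400
): number {
  return claudeMd + userMessage + (turn - 1) * historyPerTurn;
}

// Total input tokens billed across a session of `turns` requests.
function cumulativeInputTokens(turns: number): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += inputTokensAtTurn(turn);
  }
  return total;
}

const lastTurn = inputTokensAtTurn(20);                 // 48,100 tokens
const wholeSession = cumulativeInputTokens(20);         // 506,000 tokens
const sonnetInputCost = (wholeSession * 3) / 1_000_000; // about $1.52 at $3/M input
```

Per-request input grows linearly with the turn number, so the session total grows quadratically. That is why long sessions cost disproportionately more than short ones.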

Using /compact to Compress Conversation History

/compact is a Claude Code built-in command that compresses the current session's conversation history, dramatically reducing context token count:

/compact

After execution, Claude condenses the conversation history into a concise summary, freeing large amounts of context space. Subsequent requests will have significantly fewer input tokens.

Recommended times to use it:

After completing a subtask (for example, after fixing a bug), before starting the next task
When the session exceeds 20–30 turns
When Claude begins showing signs of "forgetting" earlier instructions (possible context overflow)
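To judge whether compacting is worth it at a given point, the saving can be estimated from the history size and the number of turns still ahead. This is a back-of-the-envelope sketch: the 3K summary size is an assumed figure, and the estimate ignores the one-off request that produces the summary:

```typescript
// Input tokens saved over the remaining turns if the history is replaced by a summary now.
function tokensSavedByCompact(
  historyTokens: number,
  summaryTokens: number,
  remainingTurns: number
): number {
  const savedPerTurn = Math.max(0, historyTokens - summaryTokens);
  return savedPerTurn * remainingTurns;
}

// Example: 45K of history condensed to an assumed 3K summary, with 10 turns still to go.
const savedTokens = tokensSavedByCompact(45_000, 3_000, 10); // 420,000 input tokens
const savedUsd = (savedTokens * 3) / 1_000_000;              // $1.26 at $3/M input
```

The saving scales with the number of remaining turns, so compacting pays off most when done early in a long session, not near its end.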

Using /clear to Completely Reset Context

When you need to start a completely new task, the /clear command wipes the entire conversation history:

/clear

This is more thorough than /compact and is appropriate for complete task transitions. The tradeoff is that Claude loses all conversation history, though the project context provided by CLAUDE.md remains.

48.4 The Token Cost of CLAUDE.md

CLAUDE.md is loaded with every request; its size directly affects the baseline cost of each request.

Taking three CLAUDE.md sizes as examples, and assuming 50 Claude requests per day on the Sonnet model ($3 per million input tokens) and a 30-day month:

CLAUDE.md size    | Extra cost/request | Extra cost/day | Extra cost/month
100 lines (~3K)   | ~$0.009            | ~$0.45         | ~$13.50
500 lines (~15K)  | ~$0.045            | ~$2.25         | ~$67.50
1000 lines (~30K) | ~$0.090            | ~$4.50         | ~$135.00

This does not mean CLAUDE.md should be as short as possible: a well-written CLAUDE.md saves far more in repeated communication costs than it adds in token overhead. But it does suggest some housekeeping:

Remove content that is no longer relevant (for example, a completed migration plan)
Avoid repeating the same rule in multiple places
Use CLAUDE.md imports (the @path/to/file syntax) to split content into separate files; do not put everything in the main file
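The figures in the table above can be reproduced, or recomputed for your own request volume, with a short calculation. This is a minimal sketch; the function name and its default values (50 requests per day, $3 per million input tokens, 30 days per month) simply restate the assumptions behind the table:

```typescript
interface Overhead {
  perRequest: number;
  perDay: number;
  perMonth: number;
}

// Extra input cost in USD caused by sending a CLAUDE.md of `tokens` tokens with every request.
function claudeMdOverhead(
  tokens: number,
  requestsPerDay = 50,
  inputPricePerM = 3,
  daysPerMonth = 30
): Overhead {
  const perRequest = (tokens * inputPricePerM) / 1_000_000;
  const perDay = perRequest * requestsPerDay;
  return { perRequest, perDay, perMonth: perDay * daysPerMonth };
}

const smallFile = claudeMdOverhead(3_000);  // ~$0.009 per request, ~$0.45 per day, ~$13.50 per month
const largeFile = claudeMdOverhead(30_000); // ~$0.09 per request, ~$4.50 per day, ~$135 per month
```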

48.5 The Cost of Reading Files

Every Read tool call loads file content into the context; large files significantly increase costs.

Strategy 1: Avoid Reading Unnecessary Files

Tell Claude in CLAUDE.md which files do not need to be read:

## File Reading Principles

- Do not read files under node_modules/
- Do not read files under .git/
- Do not read compiled output under dist/ or build/
- Do not read log files longer than 1000 lines

Strategy 2: Use Grep Instead of Reading Full Files

In many cases you only need part of a file. The Grep tool is far more efficient than Read:

# Inefficient: read the entire file to find one function
Read src/services/user.ts (may be 500 lines)

# Efficient: use Grep to locate it
Search for "function getUserById" in src/services/

Grep returns only matching lines and minimal context, consuming far fewer tokens than reading a complete file.
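The difference is easy to see on a synthetic file. The sketch below is illustrative only: grepLines is a stand-in for the idea of a search tool and not Claude Code's Grep implementation, and the four-characters-per-token estimate is a rough rule of thumb:

```typescript
// Rough token estimate: about 4 characters per token (a rule of thumb, not an exact count).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Return only the lines containing `needle`, plus `context` lines on either side.
function grepLines(lines: string[], needle: string, context = 2): string[] {
  const keep = new Set<number>();
  lines.forEach((line, i) => {
    if (line.includes(needle)) {
      const first = Math.max(0, i - context);
      const last = Math.min(lines.length - 1, i + context);
      for (let j = first; j <= last; j++) {
        keep.add(j);
      }
    }
  });
  return lines.filter((_, i) => keep.has(i));
}

// Synthetic 500-line file with a single match on line 250.
const fileLines = Array.from({ length: 500 }, (_, i) =>
  i === 249
    ? "export function getUserById(id: string) {"
    : `  const value${i} = compute(${i});`
);

const fullReadTokens = estimateTokens(fileLines.join("\n"));
const grepTokens = estimateTokens(
  grepLines(fileLines, "function getUserById").join("\n")
);
```

With one match and two lines of context on each side, the search result contains 5 lines where a full read returns 500, and the token estimate shrinks by a similar proportion.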

Strategy 3: Specify a Read Range

When you must read a file, use the Read tool's offset and limit parameters:

Please read lines 50–100 of src/api/users.ts

This saves significant tokens compared to reading the entire file.
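A request such as "lines 50–100" has to be translated into an offset and a limit, and the limit is easy to get wrong by one. The sketch below assumes that offset is the 1-based line to start from and limit is the number of lines to return; the helper names are hypothetical:

```typescript
// Convert an inclusive, 1-based line range into an offset/limit pair.
function toOffsetLimit(startLine: number, endLine: number): { offset: number; limit: number } {
  if (startLine < 1 || endLine < startLine) {
    throw new RangeError(`Invalid line range ${startLine}-${endLine}`);
  }
  return { offset: startLine, limit: endLine - startLine + 1 };
}

// Apply an offset/limit pair to file contents already held in memory.
function readRange(lines: string[], offset: number, limit: number): string[] {
  return lines.slice(offset - 1, offset - 1 + limit);
}

const range = toOffsetLimit(50, 100); // { offset: 50, limit: 51 }
```

Lines 50 through 100 inclusive are 51 lines, not 50, which is why the limit is computed as end minus start plus one.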

48.6 Choosing the Right Model

Different tasks demand very different levels of model capability. Matching the model to the task is the most direct cost control lever.

Task-to-Model Matching Matrix

Task type                            Recommended model   Reason
─────────────────────────────────────────────────────────────────────────────────
Complex architecture design          Opus                Needs maximum reasoning
Multi-file refactor (>10 files)      Opus / Sonnet       Needs long-range context
Security review                      Opus                High cost of errors
Complex bug analysis                 Sonnet              Balanced capability/cost
Single-file feature implementation   Sonnet              Everyday workhorse
Code formatting                      Haiku               Simple task
Adding comments                      Haiku               Simple task
Simple Q&A                           Haiku               Simple task
File existence check                 Haiku               Minimal task

Switching models in Claude Code:

# Specify model at startup
claude --model claude-3-5-haiku-latest

# Or set default model in .claude/settings.json
{
  "model": "claude-sonnet-4-5"
}
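In automation scripts, the matching matrix can be encoded as a lookup table so the choice is made consistently. This is a minimal sketch: the task names and the pickModel helper are hypothetical, and the tier names still need mapping to whatever model identifiers your setup uses:

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

type TaskType =
  | "architecture"
  | "multi-file-refactor"
  | "security-review"
  | "bug-analysis"
  | "feature"
  | "formatting"
  | "comments"
  | "simple-qa"
  | "file-check";

// One entry per row of the task-to-model matrix.
const MODEL_FOR_TASK: Record<TaskType, ModelTier> = {
  architecture: "opus",
  // The matrix allows Opus or Sonnet here; this sketch picks the cheaper of the two.
  "multi-file-refactor": "sonnet",
  "security-review": "opus",
  "bug-analysis": "sonnet",
  feature: "sonnet",
  formatting: "haiku",
  comments: "haiku",
  "simple-qa": "haiku",
  "file-check": "haiku",
};

function pickModel(task: TaskType): ModelTier {
  return MODEL_FOR_TASK[task];
}
```

Typing the table as Record<TaskType, ModelTier> makes the compiler flag any task type that is added later without a model assignment.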

48.7 Cost Optimization for Batch Tasks

For batch processing tasks (such as bulk code reviews or bulk comment additions), several optimization strategies apply.

Strategy 1: Merge Requests into One Session

Instead of launching a separate session for each file, process them all in one session:

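# Inefficient: 100 files = 100 separate sessions = 100 session startups, each rebuilding project context from scratch

# More efficient: 100 files = 1 session, one conversation turn per file
# Project context is established once and shared by later turns; because history
# still grows with every turn (see 48.3), run /compact between groups of files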

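When using the SDK for batch processing, reuse the same session. The snippets in this chapter assume a thin ClaudeCode wrapper class around the SDK that keeps one session open per instance; adapt the calls to the SDK version you use:

const claude = new ClaudeCode({ cwd: projectDir });

for (const file of filesToProcess) {
  // Every file is handled by the same instance, so all turns share one session context
  await claude.query(`Process file: ${file}`);
}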

Strategy 2: Parallel but Isolated Sessions

For tasks that are independent of each other, use multiple parallel sessions (keeping each session's context lean):

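// One session per batch of 20 files (100 files would give 5 parallel sessions)
const batchSize = 20;

// Helper: split an array into consecutive chunks of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const batches = chunk(filesToProcess, batchSize);

await Promise.all(batches.map(async (batch) => {
  // A fresh instance per batch keeps each session's context small
  const claude = new ClaudeCode({ cwd: projectDir });
  for (const file of batch) {
    await claude.query(`Process file: ${file}`);
  }
}));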
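Batch size trades per-session startup cost against history growth inside each session. The sketch below makes that tradeoff visible under stated assumptions, all of them hypothetical: each session spends 30K tokens once to rebuild project context, and turn i of a session sends a 2K base plus i × 3K tokens of accumulated history:

```typescript
// Estimated input tokens for processing `fileCount` files in sessions of `batchSize` files.
function estimateBatchInputTokens(
  fileCount: number,
  batchSize: number,
  startupTokens = 30_000,
  baseTokens = 2_000,
  perFileTokens = 3_000
): number {
  if (batchSize < 1) {
    throw new RangeError("batchSize must be at least 1");
  }
  let total = 0;
  for (let start = 0; start < fileCount; start += batchSize) {
    const filesInSession = Math.min(batchSize, fileCount - start);
    total += startupTokens; // one-off cost of starting this session
    for (let turn = 1; turn <= filesInSession; turn++) {
      total += baseTokens + turn * perFileTokens; // history grows with every turn
    }
  }
  return total;
}

const estimatesBySize = [1, 5, 20, 100].map((size) => ({
  size,
  tokens: estimateBatchInputTokens(100, size),
}));
// Under these assumptions: 3,500,000 / 1,700,000 / 3,500,000 / 15,380,000 input tokens
```

With these particular numbers, a moderate batch size beats both extremes: one session per file pays the startup cost 100 times, while a single 100-file session pays for an ever-growing history. The best batch size depends on your own startup and per-file figures.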

48.8 Monitoring and Budget Management

Tracking Token Usage

When using the SDK, always record token usage:

interface UsageRecord {
  timestamp: string;
  task: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
}

// Prices are in USD per million tokens (same figures as the reference list in 48.1)
function calculateCost(model: string, input: number, output: number): number {
  const prices: Record<string, { input: number; output: number }> = {
    'claude-opus-4-5': { input: 15, output: 75 },
    'claude-sonnet-4-5': { input: 3, output: 15 },
    'claude-3-5-haiku-latest': { input: 0.8, output: 4 },
  };

  // Unknown models fall back to Sonnet pricing so the estimate is never silently zero
  const price = prices[model] ?? prices['claude-sonnet-4-5'];
  return (input * price.input + output * price.output) / 1_000_000;
}

// Record usage after every call
const usageLog: UsageRecord[] = [];
const model = 'claude-sonnet-4-5';

// result.usage is assumed to expose inputTokens and outputTokens
const result = await claude.query(prompt);
usageLog.push({
  timestamp: new Date().toISOString(),
  task: 'code-review',
  model,
  inputTokens: result.usage.inputTokens,
  outputTokens: result.usage.outputTokens,
  cost: calculateCost(model, result.usage.inputTokens, result.usage.outputTokens),
});
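Once usage records accumulate, they can be aggregated to show which task types account for most of the spend. This is a minimal sketch; costByTask and the sample records are illustrative and not part of any SDK:

```typescript
interface UsageRecord {
  timestamp: string;
  task: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
}

interface TaskSummary {
  task: string;
  totalCost: number;
  calls: number;
}

// Group records by task and sort the groups from most to least expensive.
function costByTask(records: UsageRecord[]): TaskSummary[] {
  const totals = new Map<string, { totalCost: number; calls: number }>();
  for (const r of records) {
    const entry = totals.get(r.task) ?? { totalCost: 0, calls: 0 };
    entry.totalCost += r.cost;
    entry.calls += 1;
    totals.set(r.task, entry);
  }
  return Array.from(totals, ([task, v]) => ({ task, ...v })).sort(
    (a, b) => b.totalCost - a.totalCost
  );
}

// Made-up sample data for illustration.
const sampleRecords: UsageRecord[] = [
  { timestamp: "2025-01-01T09:00:00Z", task: "code-review", model: "claude-sonnet-4-5", inputTokens: 100_000, outputTokens: 10_000, cost: 0.5 },
  { timestamp: "2025-01-01T10:00:00Z", task: "refactor", model: "claude-opus-4-5", inputTokens: 50_000, outputTokens: 5_000, cost: 1.0 },
  { timestamp: "2025-01-01T11:00:00Z", task: "code-review", model: "claude-sonnet-4-5", inputTokens: 50_000, outputTokens: 5_000, cost: 0.25 },
];

const ranking = costByTask(sampleRecords);
```

Sorting by total cost, not by call count, surfaces the tasks where a cheaper model or a leaner context would save the most.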

Setting Budget Limits

In automated workflows, add budget checks to prevent unexpected high spend:

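const MAX_DAILY_COST_USD = 10;
// In-memory running total: reset it at the start of each day, or persist it, in a long-running process
let dailyCost = 0;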

async function queryWithBudget(prompt: string): Promise<string> {
  if (dailyCost >= MAX_DAILY_COST_USD) {
    throw new Error(
      `Daily budget of $${MAX_DAILY_COST_USD} exhausted. Stopping execution.`
    );
  }

  const result = await claude.query(prompt);
  const cost = calculateCost(
    'claude-sonnet-4-5',
    result.usage.inputTokens,
    result.usage.outputTokens
  );
  dailyCost += cost;

  console.log(`This call: $${cost.toFixed(4)} | Running daily total: $${dailyCost.toFixed(4)}`);

  return result.response;
}
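A plain counter never resets unless the process restarts, so a long-running automation needs some notion of the current day. The sketch below is one possible approach, not a prescribed one: the DailyBudget class is hypothetical, uses UTC calendar days, and takes the current time as a parameter so it can be tested:

```typescript
class DailyBudget {
  private limitUsd: number;
  private spentUsd = 0;
  private currentDay = "";

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Reset the running total when the UTC calendar day changes.
  private rollOver(now: Date): void {
    const day = now.toISOString().slice(0, 10); // e.g. "2025-01-31"
    if (day !== this.currentDay) {
      this.currentDay = day;
      this.spentUsd = 0;
    }
  }

  // True while today's spend is still below the limit.
  canSpend(now: Date = new Date()): boolean {
    this.rollOver(now);
    return this.spentUsd < this.limitUsd;
  }

  // Add the cost of a finished call and return today's running total.
  record(costUsd: number, now: Date = new Date()): number {
    this.rollOver(now);
    this.spentUsd += costUsd;
    return this.spentUsd;
  }
}
```

A wrapper such as queryWithBudget would call canSpend() before each request and record() after it. Because the check happens before the call, the final request of the day can still push the total slightly past the limit.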

48.9 Cost Control Best Practices Checklist

Here is a comprehensive cost control checklist:

CLAUDE.md optimization:

Remove outdated content
Avoid repeating the same rule
Use CLAUDE.md imports (@path/to/file) to split content into separate files
Target size: 200–500 lines

Conversation management:

Use /compact after completing a subtask
Use /clear when starting a completely new topic
Avoid mixing entirely different tasks in one session

File reading:

Prefer Grep over Read whenever possible
Set file size limits (do not read files longer than 500 lines unless necessary)
Use offset/limit to specify read ranges

Model selection:

Haiku for simple tasks
Sonnet for everyday tasks
Opus for critical decisions

Monitoring:

Log token usage
Set daily/monthly budget caps
Periodically analyze high-cost tasks and look for optimization opportunities

Summary

Cost control is an important component of engineering-grade Claude Code usage. Through thoughtful model selection, context management, and batch optimization, you can keep API costs in a reasonable range without reducing developer productivity.

Key takeaways:

Fast Mode allows simple tasks to use lightweight models, dramatically reducing everyday usage costs
Use /compact regularly to compress conversation history and prevent unbounded context growth
CLAUDE.md size directly affects the baseline cost of every request, so keep it concise yet sufficiently detailed
Task-to-model matching: Opus for complex tasks, Sonnet for everyday tasks, Haiku for simple tasks
Batch tasks reduce the cost of repeatedly loading context by reusing the same session instance
Add budget monitoring and limits to automated workflows to prevent unexpected high spend
