Chapter 48: Fast Mode and Cost Control: Balancing Efficiency and Cost for High-Frequency Tasks
48.1 Understanding Claude Code's Cost Structure
Before using Claude Code for daily development, it is important to understand its cost structure. Every Claude Code conversation consumes Anthropic API tokens; the cost depends on:
- Input tokens: everything sent to Claude (CLAUDE.md + conversation history + your request + file contents read in)
- Output tokens: everything Claude generates (replies + code + tool call arguments)
- The model used: different models (Claude Opus, Sonnet, Haiku) differ significantly in price
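Approximate reference prices in USD per million tokens (order-of-magnitude reference only; pricing changes across model versions, so check Anthropic's current pricing page before budgeting):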
Claude Opus 4.5: Input $15/M tokens, Output $75/M tokens
Claude Sonnet 4.5: Input $3/M tokens, Output $15/M tokens
Claude Haiku 3.5: Input $0.8/M tokens, Output $4/M tokens
For a typical programming session (around 50K tokens), the cost is roughly:
- Using Opus: $0.75–2.00
- Using Sonnet: $0.15–0.40
- Using Haiku: $0.04–0.10
These numbers look small individually, but with multiple sessions per day and a full team using Claude Code, costs accumulate rapidly. Understanding cost control strategies lets you cut spending significantly without sacrificing developer productivity.
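To make the accumulation concrete, here is a minimal sketch of a monthly projection. The per-session cost of $0.27 sits inside the Sonnet range above; the session count, team size, and working days are illustrative assumptions, not measurements.

```typescript
// Rough monthly spend projection for a team.
// Every input value here is an illustrative assumption.
interface TeamUsage {
  costPerSessionUsd: number; // e.g. 0.27 for a ~50K-token Sonnet session
  sessionsPerDevPerDay: number;
  developers: number;
  workingDaysPerMonth: number;
}

function projectMonthlyCost(u: TeamUsage): number {
  return (
    u.costPerSessionUsd *
    u.sessionsPerDevPerDay *
    u.developers *
    u.workingDaysPerMonth
  );
}

const sonnetTeam: TeamUsage = {
  costPerSessionUsd: 0.27,
  sessionsPerDevPerDay: 8,
  developers: 10,
  workingDaysPerMonth: 22,
};

console.log(
  `Projected monthly cost: $${projectMonthlyCost(sonnetTeam).toFixed(2)}`
);
```

Under these assumptions a per-session cost of well under a dollar becomes several hundred dollars per month, which is why the strategies in the following sections matter at team scale.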
48.2 What Is Fast Mode
Claude Code's Fast Mode is a mechanism that switches to a faster, cheaper model in certain scenarios, either automatically or on request.
In normal mode, Claude Code uses your configured primary model (typically Claude Sonnet or Opus) for all tasks. Fast Mode allows Claude to switch to a lighter-weight model (such as Claude Haiku) in situations like:
- Simple queries: answering a simple question about code
- Minor formatting changes: adjusting indentation or reformatting strings
- Status checks: confirming whether a file exists or reading simple config values
- Repetitive tool calls: performing similar file operations in a loop
Using an Opus-class model for these tasks is pure waste. Fast Mode lets Claude Code adaptively select the model based on task complexity, dramatically cutting costs while maintaining quality.
Enabling Fast Mode
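Configure it in .claude/settings.json. Treat the fastMode block below as illustrative: the key and trigger names may differ between Claude Code versions, so confirm them against the settings reference before relying on them:
{
  "model": "claude-sonnet-4-5",
  "fastMode": {
    "enabled": true,
    "model": "claude-3-5-haiku-latest",
    "triggers": [
      "simple-query",
      "file-check",
      "format-only"
    ]
  }
}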
Or describe the model-use strategy in CLAUDE.md:
## Model Selection Strategy
For the following tasks, Fast Mode (Haiku model) may be used:
- Confirming whether a file exists
- Reading configuration values
- Simple formatting adjustments
- Single-line code changes
For the following tasks, the full model (Sonnet or Opus) is required:
- Architecture design and refactoring
- Complex bug analysis
- Multi-file coordinated changes
- Security-related code review
48.3 The Cost Impact of Context Window Growth
One easily overlooked cost source in Claude Code is context window accumulation. As a conversation grows longer, each new request must send the entire conversation history, causing input token counts to increase continuously.
Conversation Length vs. Cost
Request at turn 1:
Input = CLAUDE.md (2K) + user message (0.5K) = 2.5K tokens
Request at turn 10:
Input = CLAUDE.md (2K) + 9 turns of history (~20K) + user message (0.5K) = 22.5K tokens
Request at turn 20:
Input = CLAUDE.md (2K) + 19 turns of history (~45K) + user message (0.5K) = 47.5K tokens
In this 20-turn example, the final request sends about 19 times as many input tokens as the first (47.5K vs. 2.5K), so the late turns of a long session cost far more per request than the early ones.
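The growth pattern above can be sketched as a simple model. The CLAUDE.md and user message sizes come from the example; avgTurnTokens (2,300 tokens of history added per completed turn) is an assumed average chosen to land close to the figures shown, and the model ignores compaction.

```typescript
// Simplified model of input growth in one session.
// avgTurnTokens is an assumed average, not a measured value.
interface SessionShape {
  claudeMdTokens: number;
  userMessageTokens: number;
  avgTurnTokens: number; // history added by each completed turn
}

// Input tokens sent with the request at a given 1-based turn.
function inputTokensAtTurn(turn: number, s: SessionShape): number {
  return s.claudeMdTokens + (turn - 1) * s.avgTurnTokens + s.userMessageTokens;
}

// Total input tokens sent across the first `turns` requests.
function cumulativeInputTokens(turns: number, s: SessionShape): number {
  let total = 0;
  for (let t = 1; t <= turns; t++) {
    total += inputTokensAtTurn(t, s);
  }
  return total;
}

const shape: SessionShape = {
  claudeMdTokens: 2000,
  userMessageTokens: 500,
  avgTurnTokens: 2300,
};

for (const turn of [1, 10, 20]) {
  console.log(`turn ${turn}: ${inputTokensAtTurn(turn, shape)} input tokens`);
}
console.log(`20-turn total: ${cumulativeInputTokens(20, shape)} input tokens`);
```

Because each request resends everything before it, the session total grows roughly with the square of the turn count: under these assumptions the 20 requests together send about 487K input tokens, even though no single request exceeds 50K.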
Using /compact to Compress Conversation History
/compact is a Claude Code built-in command that compresses the current session's conversation history, dramatically reducing context token count:
/compact
After execution, Claude condenses the conversation history into a concise summary, freeing large amounts of context space. Subsequent requests will have significantly fewer input tokens.
Recommended times to use it:
- After completing a subtask (for example, after fixing a bug), before starting the next task
- When the session exceeds 20–30 turns
- When Claude begins showing signs of "forgetting" earlier instructions (possible context overflow)
Using /clear to Completely Reset Context
When you need to start a completely new task, the /clear command wipes the entire conversation history:
/clear
This is more thorough than /compact and is appropriate for complete task transitions. The tradeoff is that Claude loses all conversation history, though the project context provided by CLAUDE.md remains.
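The guidance on when to use /compact versus /clear can be summarized as a small decision helper. This is only a sketch of the rule of thumb: the SessionState fields and the 20-turn threshold (the low end of the 20–30 range mentioned earlier) are assumptions, and nothing here calls Claude Code itself.

```typescript
type ContextAction = "continue" | "compact" | "clear";

// Hypothetical description of where a session currently stands.
interface SessionState {
  turns: number;
  subtaskJustCompleted: boolean;
  startingUnrelatedTask: boolean;
}

// Assumed threshold: low end of the 20–30 turn guideline.
const COMPACT_TURN_THRESHOLD = 20;

function suggestContextAction(s: SessionState): ContextAction {
  // A completely new task gains nothing from old history.
  if (s.startingUnrelatedTask) return "clear";
  // Finished subtasks and long sessions are good points to summarize.
  if (s.subtaskJustCompleted || s.turns > COMPACT_TURN_THRESHOLD) {
    return "compact";
  }
  return "continue";
}

console.log(
  suggestContextAction({
    turns: 25,
    subtaskJustCompleted: false,
    startingUnrelatedTask: false,
  })
);
```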
48.4 The Token Cost of CLAUDE.md
CLAUDE.md is loaded with every request; its size directly affects the baseline cost of each request.
Using different CLAUDE.md sizes as examples, assuming 50 Claude requests per day using the Sonnet model:
CLAUDE.md size | Extra cost/request | Extra cost/day | Extra cost/month
100 lines (~3K) | ~$0.009 | ~$0.45 | ~$13.50
500 lines (~15K) | ~$0.045 | ~$2.25 | ~$67.50
1000 lines (~30K)| ~$0.090 | ~$4.50 | ~$135.00
This does not mean CLAUDE.md should be as short as possible — a well-written CLAUDE.md saves far more in repeated communication costs than it adds in token overhead. But it does suggest:
- Remove content that is no longer relevant (for example, a completed migration plan)
- Avoid repeating the same rule in multiple places
- Use @import for selective loading; do not put everything in the main file
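The figures in the table above follow from simple arithmetic, sketched below. The function name and the 30-day month are assumptions; the token counts, request volume, and the $3 per million input tokens Sonnet price are the ones used in the table.

```typescript
interface Overhead {
  perRequest: number;
  perDay: number;
  perMonth: number;
}

// Extra input cost caused by sending CLAUDE.md with every request.
function claudeMdOverhead(
  claudeMdTokens: number,
  requestsPerDay: number,
  inputPricePerMTok: number,
  daysPerMonth: number = 30 // assumed month length
): Overhead {
  const perRequest = (claudeMdTokens * inputPricePerMTok) / 1_000_000;
  const perDay = perRequest * requestsPerDay;
  return { perRequest, perDay, perMonth: perDay * daysPerMonth };
}

for (const tokens of [3_000, 15_000, 30_000]) {
  const o = claudeMdOverhead(tokens, 50, 3);
  console.log(
    `${tokens} tokens: $${o.perRequest.toFixed(3)}/request, ` +
      `$${o.perDay.toFixed(2)}/day, $${o.perMonth.toFixed(2)}/month`
  );
}
```

Plugging in your own request volume and model price shows quickly whether trimming CLAUDE.md is worth the effort for your team.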
48.5 The Cost of Reading Files
Every Read tool call loads file content into the context; large files significantly increase costs.
Strategy 1: Avoid Reading Unnecessary Files
Tell Claude in CLAUDE.md which files do not need to be read:
## File Reading Principles
- Do not read files under node_modules/
- Do not read files under .git/
- Do not read compiled output under dist/ or build/
- Do not read log files longer than 1000 lines
Strategy 2: Use Grep Instead of Reading Full Files
In many cases you only need part of a file. The Grep tool is far more efficient than Read:
# Inefficient: read the entire file to find one function
Read src/services/user.ts (may be 500 lines)
# Efficient: use Grep to locate it
Search for "function getUserById" in src/services/
Grep returns only matching lines and minimal context, consuming far fewer tokens than reading a complete file.
Strategy 3: Specify a Read Range
When you must read a file, use the Read tool's offset and limit parameters:
Please read lines 50–100 of src/api/users.ts
This saves significant tokens compared to reading the entire file.
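To get a feel for the savings, the sketch below compares a whole synthetic file with a 51-line range taken from it. The 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and the generated file is only a stand-in for src/api/users.ts.

```typescript
// Rough heuristic: about 4 characters per token.
// This is an approximation, not the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Return lines start..end (1-based, inclusive), mimicking a ranged read.
function sliceLines(text: string, start: number, end: number): string {
  return text.split("\n").slice(start - 1, end).join("\n");
}

// Synthetic 500-line file used purely for illustration.
const file = Array.from(
  { length: 500 },
  (_, i) => `const value${i + 1} = compute(${i + 1});`
).join("\n");

const range = sliceLines(file, 50, 100);

console.log(`full file:    ~${estimateTokens(file)} tokens`);
console.log(`lines 50-100: ~${estimateTokens(range)} tokens`);
```

The range is roughly a tenth of the file here, and the token estimate shrinks in proportion. The same reasoning applies to Grep, which returns only matching lines plus a little context.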
48.6 Choosing the Right Model
Different tasks demand very different levels of model capability. Matching the model to the task is the most direct cost control lever.
Task-to-Model Matching Matrix
Task type Recommended model Reason
────────────────────────────────────────────────────────────────────
Complex architecture design Opus Needs maximum reasoning
Multi-file refactor (>10 files) Opus / Sonnet Needs long-range context
Security review Opus High cost of errors
Complex bug analysis Sonnet Balanced capability/cost
Single-file feature implementation Sonnet Everyday workhorse
Code formatting Haiku Simple task
Adding comments Haiku Simple task
Simple Q&A Haiku Simple task
File existence check Haiku Minimal task
Switching models in Claude Code:
# Specify model at startup
claude --model claude-3-5-haiku-latest
# Or set default model in .claude/settings.json
{
  "model": "claude-sonnet-4-5"
}
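For scripted workflows, the matching matrix above can be encoded as a lookup table. This is a sketch: the TaskType labels are hypothetical names for the rows of the matrix, and where the matrix allows either Opus or Sonnet, Sonnet is chosen here as the cheaper option.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// Hypothetical labels for the rows of the matching matrix.
type TaskType =
  | "architecture-design"
  | "large-refactor"
  | "security-review"
  | "bug-analysis"
  | "single-file-feature"
  | "formatting"
  | "add-comments"
  | "simple-qa"
  | "file-check";

const MODEL_FOR_TASK: Record<TaskType, ModelTier> = {
  "architecture-design": "opus",
  "large-refactor": "sonnet", // matrix allows Opus or Sonnet
  "security-review": "opus",
  "bug-analysis": "sonnet",
  "single-file-feature": "sonnet",
  "formatting": "haiku",
  "add-comments": "haiku",
  "simple-qa": "haiku",
  "file-check": "haiku",
};

function recommendModel(task: TaskType): ModelTier {
  return MODEL_FOR_TASK[task];
}

console.log(recommendModel("security-review"));
console.log(recommendModel("formatting"));
```

Typing the table as Record<TaskType, ModelTier> makes the compiler flag any task type that lacks a model assignment.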
48.7 Cost Optimization for Batch Tasks
For batch processing tasks (such as bulk code reviews or bulk comment additions), several optimization strategies apply.
Strategy 1: Merge Requests into One Session
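Instead of launching a separate session for each file, process them all in one session. Keep in mind that history accumulates with every turn (see 48.3), so for long batches run /compact between files or cap the number of files per session: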
# Inefficient: 100 files = 100 separate sessions = 100 CLAUDE.md load costs
# Efficient: 100 files = 1 session, one conversation turn per file
# CLAUDE.md is loaded once; subsequent turns share the context
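When using the SDK for batch processing, reuse the same ClaudeCode instance. ClaudeCode, projectDir, and filesToProcess are placeholders here; substitute the client class and query call that your SDK version actually exports:
const claude = new ClaudeCode({ cwd: projectDir });
for (const file of filesToProcess) {
  // All files are processed within the same instance (same session context)
  await claude.query(`Process file: ${file}`);
}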
Strategy 2: Parallel but Isolated Sessions
For tasks that are independent of each other, use multiple parallel sessions (keeping each session's context lean):
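// Split an array into consecutive batches of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
// With 100 files and a batch size of 20, this runs 5 parallel sessions
const batchSize = 20;
const batches = chunk(filesToProcess, batchSize);
await Promise.all(batches.map(async (batch) => {
  const claude = new ClaudeCode({ cwd: projectDir });
  for (const file of batch) {
    await claude.query(`Process file: ${file}`);
  }
}));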
48.8 Monitoring and Budget Management
Tracking Token Usage
When using the SDK, always record token usage:
interface UsageRecord {
  timestamp: string;
  task: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number; // USD
}
// Prices are USD per million tokens; keep them in sync with current pricing
function calculateCost(model: string, input: number, output: number): number {
  const prices: Record<string, { input: number; output: number }> = {
    'claude-opus-4-5': { input: 15, output: 75 },
    'claude-sonnet-4-5': { input: 3, output: 15 },
    'claude-3-5-haiku-latest': { input: 0.8, output: 4 },
  };
  // Unknown models fall back to Sonnet pricing
  const price = prices[model] ?? prices['claude-sonnet-4-5'];
  return (input * price.input + output * price.output) / 1_000_000;
}
// Record usage after every call
// (result.usage field names depend on the SDK version; adjust as needed)
const usageLog: UsageRecord[] = [];
const result = await claude.query(prompt);
const record: UsageRecord = {
  timestamp: new Date().toISOString(),
  task: 'code-review',
  model: 'claude-sonnet-4-5',
  inputTokens: result.usage.inputTokens,
  outputTokens: result.usage.outputTokens,
  cost: calculateCost(
    'claude-sonnet-4-5',
    result.usage.inputTokens,
    result.usage.outputTokens
  ),
};
usageLog.push(record);
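Once records like these are collected, they can be aggregated to find the most expensive task types. The sketch below assumes the same UsageRecord shape as above; the helper name summarizeCostByTask and the sample records are made up for illustration.

```typescript
interface UsageRecord {
  timestamp: string;
  task: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number; // USD
}

interface TaskSummary {
  task: string;
  calls: number;
  totalCost: number;
}

// Group records by task and sort by total cost, highest first.
function summarizeCostByTask(records: UsageRecord[]): TaskSummary[] {
  const totals = new Map<string, { calls: number; totalCost: number }>();
  for (const r of records) {
    const entry = totals.get(r.task) ?? { calls: 0, totalCost: 0 };
    entry.calls += 1;
    entry.totalCost += r.cost;
    totals.set(r.task, entry);
  }
  return Array.from(totals, ([task, t]) => ({
    task,
    calls: t.calls,
    totalCost: t.totalCost,
  })).sort((a, b) => b.totalCost - a.totalCost);
}

// Made-up sample data for illustration only.
const sampleRecords: UsageRecord[] = [
  { timestamp: "t1", task: "code-review", model: "claude-sonnet-4-5", inputTokens: 10000, outputTokens: 1500, cost: 0.05 },
  { timestamp: "t2", task: "formatting", model: "claude-sonnet-4-5", inputTokens: 2000, outputTokens: 300, cost: 0.01 },
  { timestamp: "t3", task: "code-review", model: "claude-sonnet-4-5", inputTokens: 15000, outputTokens: 1700, cost: 0.07 },
];

for (const s of summarizeCostByTask(sampleRecords)) {
  console.log(`${s.task}: ${s.calls} calls, $${s.totalCost.toFixed(2)}`);
}
```

A summary like this points to where a cheaper model or a smaller context would save the most.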
Setting Budget Limits
In automated workflows, add budget checks to prevent unexpected high spend:
const MAX_DAILY_COST_USD = 10;
let dailyCost = 0; // running total in USD; reset it when a new day starts
async function queryWithBudget(prompt: string): Promise<string> {
  // The check runs before the call, so the last call may overshoot the cap slightly
  if (dailyCost >= MAX_DAILY_COST_USD) {
    throw new Error(
      `Daily budget of $${MAX_DAILY_COST_USD} exhausted. Stopping execution.`
    );
  }
  const result = await claude.query(prompt);
  const cost = calculateCost(
    'claude-sonnet-4-5',
    result.usage.inputTokens,
    result.usage.outputTokens
  );
  dailyCost += cost;
  console.log(`This call: $${cost.toFixed(4)} | Running daily total: $${dailyCost.toFixed(4)}`);
  return result.response;
}
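In a long-running process, the running total also has to reset when the day changes. Here is one possible sketch of a tracker that does so; the DailyBudget class, its method names, and the injectable clock are assumptions made for illustration, and days are counted in UTC.

```typescript
// Hypothetical budget tracker that resets when the UTC date changes.
class DailyBudget {
  private spentUsd = 0;
  private day: string;
  private readonly maxDailyUsd: number;
  private readonly now: () => Date;

  // `now` is injectable so the rollover can be tested without waiting a day.
  constructor(maxDailyUsd: number, now: () => Date = () => new Date()) {
    this.maxDailyUsd = maxDailyUsd;
    this.now = now;
    this.day = this.today();
  }

  // UTC calendar day in YYYY-MM-DD form.
  private today(): string {
    return this.now().toISOString().slice(0, 10);
  }

  private rollOverIfNewDay(): void {
    const today = this.today();
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  // Call before each request; throws once the cap has been reached.
  assertAvailable(): void {
    this.rollOverIfNewDay();
    if (this.spentUsd >= this.maxDailyUsd) {
      throw new Error(`Daily budget of $${this.maxDailyUsd} exhausted.`);
    }
  }

  // Call after each request with that request's cost.
  record(costUsd: number): void {
    this.rollOverIfNewDay();
    this.spentUsd += costUsd;
  }

  spentToday(): number {
    this.rollOverIfNewDay();
    return this.spentUsd;
  }
}

const budget = new DailyBudget(10);
budget.assertAvailable();
budget.record(0.27);
console.log(`Spent today: $${budget.spentToday().toFixed(2)}`);
```

In a wrapper like queryWithBudget, assertAvailable() would replace the manual comparison and record() would replace the increment of the global counter.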
48.9 Cost Control Best Practices Checklist
Here is a comprehensive cost control checklist:
CLAUDE.md optimization:
- Remove outdated content
- Avoid repeating the same rule
- Use @import for selective loading
- Target size: 200–500 lines
Conversation management:
- Use /compact after completing a subtask
- Use /clear when starting a completely new topic
- Avoid mixing entirely different tasks in one session
File reading:
- Prefer Grep over Read whenever possible
- Set file size limits (do not read files longer than 500 lines unless necessary)
- Use offset/limit to specify read ranges
Model selection:
- Haiku for simple tasks
- Sonnet for everyday tasks
- Opus for critical decisions
Monitoring:
- Log token usage
- Set daily/monthly budget caps
- Periodically analyze high-cost tasks and look for optimization opportunities
Summary
Cost control is an important component of engineering-grade Claude Code usage. Through thoughtful model selection, context management, and batch optimization, you can keep API costs in a reasonable range without reducing developer productivity.
Key takeaways:
- Fast Mode allows simple tasks to use lightweight models, dramatically reducing everyday usage costs
- Use /compact regularly to compress conversation history and prevent unbounded context growth
- CLAUDE.md size directly affects the baseline cost of every request, so keep it concise yet sufficiently detailed
- Task-to-model matching: Opus for complex tasks, Sonnet for everyday tasks, Haiku for simple tasks
- Batch tasks reduce the cost of repeatedly loading context by reusing the same session instance
- Add budget monitoring and limits to automated workflows to prevent unexpected high spend