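Chapter 27

Context Editing + Compaction: A Complete Strategy for Selective History Clearing and Server-Side Automatic Summarization

Chapter 27: Context Compaction: Automatic Summarization and Lossless Conversation Continuation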

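27.1 The Hard Limit of the Context Window

Claude's context window has a fixed token limit. Even with models that support 200K tokens, long-running agent tasks (multi-step code debugging, document analysis, complex research) still reach that limit.

**Context compaction** is the core technique for dealing with this problem. Its goal: when the conversation history exceeds the budget, shrink the context with as little information loss as possible so that the conversation can continue seamlessly.

Compared with brute-force truncation (dropping early messages), compaction preserves the key semantics through summarization. This keeps the agent from "forgetting" and then repeating completed work or making contradictory decisions.

When is compaction needed?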

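def should_compact(messages: list[dict], system: str,
                   threshold_ratio: float = 0.75,
                   model: str = "claude-opus-4-5") -> bool:
    """
    Decide whether the context needs compaction.
    Triggers once estimated token usage exceeds threshold_ratio of the model limit.
    """
    MODEL_LIMITS = {
        "claude-opus-4-5": 200_000,
        "claude-sonnet-4-5": 200_000,
        "claude-haiku-4-5": 200_000,
    }

    # Rough estimate of current token usage.
    # Only text is counted; tool_use and tool_result blocks are ignored here.
    total_chars = len(system)
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total_chars += len(content)
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and "text" in block:
                    total_chars += len(block["text"])

    # 1 token ≈ 3-4 characters for English text (rough estimate; CJK text
    # usually needs more tokens per character, so this undercounts there)
    estimated_tokens = total_chars // 3
    # Look up the limit for the model in use; unknown models fall back to 200K
    limit = MODEL_LIMITS.get(model, 200_000)

    return estimated_tokens / limit > threshold_ratio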
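
The character heuristic above can be far off, especially for non-English text or tool-heavy histories. Below is a minimal sketch of the same check with the counter injected as a callable, so that an exact count can be swapped in later. The names `TokenCounter`, `char_estimate` and `needs_compaction` are hypothetical. The commented-out exact counter assumes your SDK version exposes a token-counting call of that shape.

```python
from typing import Callable

# Hypothetical type alias: anything that maps (system, messages) to a token count.
TokenCounter = Callable[[str, list[dict]], int]


def char_estimate(system: str, messages: list[dict]) -> int:
    """Fallback heuristic: roughly 3 characters per token (an assumption)."""
    total = len(system) + sum(len(str(m.get("content", ""))) for m in messages)
    return total // 3


def needs_compaction(
    system: str,
    messages: list[dict],
    counter: TokenCounter = char_estimate,
    limit: int = 200_000,
    threshold_ratio: float = 0.75,
) -> bool:
    """Return True once the counted tokens exceed threshold_ratio of the limit."""
    return counter(system, messages) / limit > threshold_ratio


# With the Anthropic SDK, an exact counter might look like this (not executed here,
# and assumed rather than verified against your SDK version):
#   counter = lambda system, messages: client.messages.count_tokens(
#       model="claude-opus-4-5", system=system, messages=messages
#   ).input_tokens

if __name__ == "__main__":
    history = [{"role": "user", "content": "x" * 600}]  # about 200 estimated tokens
    print(needs_compaction("", history, limit=1_000))
    print(needs_compaction("", history, limit=1_000, threshold_ratio=0.1))
```

Injecting the counter keeps the threshold logic testable offline while leaving room for an exact count in production.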

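27.2 Claude Code's Built-in Compaction Mechanism

Claude Code (Anthropic's official CLI) has context compaction built in. Understanding how it works helps when implementing a similar mechanism in a self-built agent.

Trigger conditions

Compaction in Claude Code is triggered in the situations below. Treat the thresholds as illustrative: they are implementation details that can differ between versions.

Budget trigger: the current context token count exceeds ~75% of the model limit
Manual trigger: the user runs the /compact command
Task switch: the user is detected starting a new request unrelated to the current task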
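
For a self-built agent, the three triggers above can be sketched as a single decision function. This is not Claude Code's implementation. `Trigger`, `TurnState` and `compaction_trigger` are hypothetical names, and the `task_switched` flag is assumed to come from whatever topic detector you use.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    NONE = "none"
    MANUAL = "manual"
    BUDGET = "budget"
    TASK_SWITCH = "task_switch"


@dataclass
class TurnState:
    used_tokens: int             # current context size, however you measure it
    token_limit: int             # model context limit
    user_input: str              # raw text of the incoming user message
    task_switched: bool = False  # set by an external topic detector (assumption)


def compaction_trigger(state: TurnState, budget_ratio: float = 0.75) -> Trigger:
    """Return the first trigger that applies, checked in priority order."""
    # An explicit user command always wins
    if state.user_input.strip().startswith("/compact"):
        return Trigger.MANUAL
    # Budget pressure comes next
    if state.used_tokens / state.token_limit > budget_ratio:
        return Trigger.BUDGET
    # A detected task switch is the weakest signal
    if state.task_switched:
        return Trigger.TASK_SWITCH
    return Trigger.NONE


if __name__ == "__main__":
    print(compaction_trigger(TurnState(160_000, 200_000, "fix the failing test")))
```

Returning the reason rather than a bare boolean makes it easier to log why a compaction happened.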

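Compaction flow

Original conversation history (100K tokens)
         │
         ▼
┌──────────────────────────────┐
│  Summary-generation subtask  │
│  Model: Claude Haiku         │
│  Goal: extract key state     │
└──────────────────────────────┘
         │
         ▼
Summary message (~2K tokens)
         │
         ▼
┌─────────────────────────────────┐
│  Rebuild conversation history   │
│  [summary] + [recent messages]  │
└─────────────────────────────────┘
         │
         ▼
Compacted context (~20K tokens)

Summary contents

Claude Code's compaction summary is tailored to coding work. An illustrative example of what it contains: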

## Conversation Summary (Auto-generated)

### Task Progress
- Completed: Set up FastAPI project structure, created auth module skeleton
- In Progress: Implementing JWT token validation middleware
- Pending: Write unit tests for auth endpoints

### Key Decisions Made
- Using PyJWT library (not python-jose) for token handling
- Token expiry: 15 minutes for access, 7 days for refresh
- Storing refresh tokens in Redis with user_id as key

### Current File State
- Modified files: src/auth/middleware.py, src/auth/models.py
- Key implementation: JWTMiddleware class in middleware.py (line 45-89)

### Active Context
- Working on: validate_token() function
- Last error: AttributeError on line 67, payload["sub"] not found
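
When building your own summaries, it can help to hold this structure as data and render it to text only at the last step. Here is a minimal sketch under that assumption. `SessionSummary` and its field names are hypothetical and merely mirror the example headings above.

```python
from dataclasses import dataclass, field


@dataclass
class SessionSummary:
    """Structured form of a compaction summary (hypothetical schema)."""
    completed: list[str] = field(default_factory=list)
    in_progress: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    modified_files: list[str] = field(default_factory=list)
    last_error: str = ""

    def render(self) -> str:
        """Render to the markdown layout shown above, skipping empty sections."""
        lines = ["## Conversation Summary (Auto-generated)", "", "### Task Progress"]
        for label, items in (("Completed", self.completed),
                             ("In Progress", self.in_progress),
                             ("Pending", self.pending)):
            if items:
                lines.append(f"- {label}: {', '.join(items)}")
        if self.decisions:
            lines += ["", "### Key Decisions Made"]
            lines += [f"- {d}" for d in self.decisions]
        if self.modified_files:
            lines += ["", "### Current File State",
                      f"- Modified files: {', '.join(self.modified_files)}"]
        if self.last_error:
            lines += ["", "### Active Context", f"- Last error: {self.last_error}"]
        return "\n".join(lines)


if __name__ == "__main__":
    summary = SessionSummary(
        completed=["Set up FastAPI project structure"],
        decisions=["Using PyJWT library (not python-jose) for token handling"],
        modified_files=["src/auth/middleware.py", "src/auth/models.py"],
    )
    print(summary.render())
```

Keeping the summary structured also makes later checks easier, for example asserting that every modified file still appears after a second round of compaction.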

27.3 Implementing a Custom Compaction Strategy

Basic compactor

import json  # needed by _generate_summary to serialize tool inputs
from dataclasses import dataclass

import anthropic

@dataclass
class CompactionResult:
    summary: str
    compressed_messages: list[dict]
    tokens_saved: int
    summary_message_count: int

class ContextCompactor:
    """Custom context compactor."""

    def __init__(self, client: anthropic.Anthropic):
        self.client = client
        # Use a fast, inexpensive model to generate summaries
        self.summary_model = "claude-haiku-4-5"
    
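    def _generate_summary(self, messages: list[dict], task_type: str = "general") -> str:
        """Generate a structured summary of the message history."""

        prompts = {
            "general": """Produce a structured summary of the conversation history below, covering:
1. Major tasks completed and decisions made
2. Work currently in progress
3. Outstanding to-do items
4. Any important constraints or preference settings

The summary must be detailed enough for a new assistant instance to pick up the work seamlessly without repeating completed steps.""",

            "coding": """Summarize the coding session below, covering:
1. Task goal and completion status
2. Files modified and the key changes
3. Current blocker or error (if any)
4. Important technical decisions (libraries used, architecture choices, etc.)
5. Next actions""",

            "research": """Summarize the research session below, covering:
1. Research question and goals
2. Key information and findings obtained so far
3. Hypotheses or paths already ruled out
4. What still needs to be investigated"""
        }

        # Unknown task types fall back to the general prompt
        prompt = prompts.get(task_type, prompts["general"])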
        
        # Build the history text
        history_parts = []
        for msg in messages:
            role = msg["role"].upper()
            content = msg.get("content", "")
            if isinstance(content, str):
                # Truncate overly long individual messages
                truncated = content[:2000] + ("..." if len(content) > 2000 else "")
                history_parts.append(f"{role}: {truncated}")
            elif isinstance(content, list):
                for block in content:
                    if isinstance(block, dict):
                        if block.get("type") == "text":
                            text = block["text"][:1000]
                            history_parts.append(f"{role}: {text}")
                        elif block.get("type") == "tool_use":
                            history_parts.append(
                                f"{role}: [tool call: {block['name']}({json.dumps(block['input'])[:200]})]"
                            )
                        elif block.get("type") == "tool_result":
                            result_text = str(block.get("content", ""))[:500]
                            history_parts.append(f"TOOL_RESULT: {result_text}")

        history_text = "\n\n".join(history_parts)

        response = self.client.messages.create(
            model=self.summary_model,
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"{prompt}\n\n---\n\n{history_text}"
            }]
        )

        return response.content[0].text
    
    def compact(
        self,
        messages: list[dict],
        keep_recent_turns: int = 3,
        task_type: str = "general"
    ) -> CompactionResult:
        """
        Compact the conversation history.

        Args:
            messages: Full conversation history
            keep_recent_turns: Number of most recent turns to leave uncompacted
            task_type: Task type; selects the summary prompt

        Returns:
            CompactionResult with the compacted message list and statistics
        """

        # Compute the split point
        keep_count = keep_recent_turns * 2  # user + assistant
        if len(messages) <= keep_count:
            return CompactionResult(
                summary="",
                compressed_messages=messages,
                tokens_saved=0,
                summary_message_count=0
            )

        # Slice by explicit index: messages[:-0] would be empty when keep_count is 0
        split = len(messages) - keep_count
        to_summarize = messages[:split]
        to_keep = messages[split:]

        # Generate the summary
        summary = self._generate_summary(to_summarize, task_type)

        # Build the compacted message list
        summary_message = {
            "role": "user",
            "content": f"[Auto-generated conversation summary - stands in for the earliest {len(to_summarize)} messages]\n\n{summary}"
        }
        summary_ack = {
            "role": "assistant",
            "content": "I understand the earlier conversation context. Please continue."
        }

        compressed = [summary_message, summary_ack] + to_keep

        # Estimate tokens saved (rough)
        original_chars = sum(len(str(m.get("content", ""))) for m in to_summarize)
        summary_chars = len(summary)
        tokens_saved = (original_chars - summary_chars) // 3

        return CompactionResult(
            summary=summary,
            compressed_messages=compressed,
            tokens_saved=max(0, tokens_saved),
            summary_message_count=len(to_summarize)
        )
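
One caveat with a fixed `keep_count` split: in tool-using conversations the boundary can fall between a `tool_use` block and its `tool_result`, and the API rejects a `tool_result` whose `tool_use` is no longer in the preceding message. Below is a minimal sketch of a boundary that backs up until the kept tail starts with a plain user message. `safe_split_index` and `_has_tool_result` are hypothetical helpers, not part of any SDK.

```python
def _has_tool_result(msg: dict) -> bool:
    """True if the message carries at least one tool_result block."""
    content = msg.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )


def safe_split_index(messages: list[dict], keep_count: int) -> int:
    """Index where the kept tail starts; 0 means nothing is summarized."""
    if keep_count <= 0:
        return len(messages)
    split = max(0, len(messages) - keep_count)
    # Move the boundary earlier until the tail starts with a plain user message,
    # so no tool_result is separated from the tool_use that produced it.
    while split > 0 and (
        messages[split]["role"] != "user" or _has_tool_result(messages[split])
    ):
        split -= 1
    return split


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "start"},
        {"role": "assistant", "content": "ok"},
        {"role": "user", "content": "write the file"},
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": "t1", "name": "write_file", "input": {}}]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "t1", "content": "done"}]},
        {"role": "assistant", "content": "file written"},
    ]
    split = safe_split_index(history, keep_count=2)
    to_summarize, to_keep = history[:split], history[split:]
    print(split, len(to_summarize), len(to_keep))
```

The tail may end up longer than `keep_count`. That trades a little compaction for a history the API will accept.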

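Smart compaction: a tiered retention strategy

Not all history is equally important. Important tool-call results and key decisions should be kept intact: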

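class SmartCompactor(ContextCompactor):
    """Smart compactor: treats different kinds of messages differently."""

    CRITICAL_TOOL_NAMES = {
        "write_file", "execute_code", "database_query",
        "api_call", "create_resource"
    }

    def _classify_messages(self, messages: list[dict]) -> tuple[list, list]:
        """Split messages into critical and ordinary ones.

        A tool_use block and its tool_result must stay together, so when one
        side of a pair is critical the neighbouring message is kept as well.
        """
        critical_idx: set[int] = set()
        for i, msg in enumerate(messages):
            if not self._is_critical(msg):
                continue
            critical_idx.add(i)
            content = msg.get("content", "")
            if not isinstance(content, list):
                continue
            types = {b.get("type") for b in content if isinstance(b, dict)}
            if "tool_use" in types and i + 1 < len(messages):
                critical_idx.add(i + 1)  # keep the matching tool_result
            if "tool_result" in types and i > 0:
                critical_idx.add(i - 1)  # keep the originating tool_use

        critical = [m for i, m in enumerate(messages) if i in critical_idx]
        ordinary = [m for i, m in enumerate(messages) if i not in critical_idx]
        return critical, ordinary

    def _is_critical(self, msg: dict) -> bool:
        """Decide whether a message is critical (should be kept intact)."""
        content = msg.get("content", "")

        # Check tool-call messages
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    if block.get("type") == "tool_use":
                        tool_name = block.get("name", "")
                        if tool_name in self.CRITICAL_TOOL_NAMES:
                            return True
                    elif block.get("type") == "tool_result":
                        # Tool results that contain error information
                        result_str = str(block.get("content", ""))
                        if "error" in result_str.lower() or "exception" in result_str.lower():
                            return True

        # The user stated an important constraint
        # (keyword list covers both Chinese and English phrasing)
        if isinstance(content, str):
            critical_keywords = ["不能", "必须", "禁止", "要求", "约束",
                                  "must not", "required", "constraint"]
            if any(kw in content.lower() for kw in critical_keywords):
                return True

        return False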
    
    def smart_compact(
        self,
        messages: list[dict],
        keep_recent_turns: int = 3,
        task_type: str = "coding"
    ) -> CompactionResult:
        """Smart compaction: keep critical messages, summarize ordinary ones."""

        # Slice by explicit index so keep_recent_turns=0 behaves correctly
        split = max(0, len(messages) - keep_recent_turns * 2)
        to_process = messages[:split]
        to_keep = messages[split:]

        critical, ordinary = self._classify_messages(to_process)

        # Summarize the ordinary messages
        summary = ""
        if ordinary:
            summary = self._generate_summary(ordinary, task_type)

        # Build the compacted message list
        compressed = []

        if summary:
            compressed.append({
                "role": "user",
                "content": f"[Conversation summary]\n{summary}"
            })
            compressed.append({
                "role": "assistant",
                "content": "I have read the summary. Please continue."
            })

        # Critical messages are kept intact
        compressed.extend(critical)
        # Recent messages are kept intact
        compressed.extend(to_keep)

        return CompactionResult(
            summary=summary,
            compressed_messages=compressed,
            tokens_saved=len(to_process) * 100,  # rough guess: ~100 tokens per message
            summary_message_count=len(ordinary)
        )
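
Because selective retention reorders and drops messages, it is worth checking the result before sending it. The sketch below flags `tool_use` and `tool_result` blocks whose partner is not in the adjacent message, which is the pairing the Messages API expects. `find_orphan_tool_blocks` and `_ids` are hypothetical helper names.

```python
def _ids(msg: dict, block_type: str, key: str) -> set[str]:
    """Collect the ids of all blocks of a given type in one message."""
    content = msg.get("content")
    if not isinstance(content, list):
        return set()
    return {
        b[key] for b in content
        if isinstance(b, dict) and b.get("type") == block_type
    }


def find_orphan_tool_blocks(messages: list[dict]) -> list[str]:
    """Return tool ids whose tool_use/tool_result partner is missing.

    Assumes each tool_result sits in the message right after its tool_use.
    A trailing assistant tool_use with no result yet is also reported.
    """
    orphans: list[str] = []
    for i, msg in enumerate(messages):
        next_results = (
            _ids(messages[i + 1], "tool_result", "tool_use_id")
            if i + 1 < len(messages) else set()
        )
        prev_uses = _ids(messages[i - 1], "tool_use", "id") if i > 0 else set()
        # tool_use blocks must be answered in the next message
        orphans += sorted(_ids(msg, "tool_use", "id") - next_results)
        # tool_result blocks must refer to the previous message
        orphans += sorted(_ids(msg, "tool_result", "tool_use_id") - prev_uses)
    return orphans


if __name__ == "__main__":
    compacted = [
        {"role": "user", "content": "[Conversation summary]\n..."},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "t9", "content": "error: boom"}]},
    ]
    print(find_orphan_tool_blocks(compacted))
```

An empty list means the pairing is intact. Anything else is a reason to fall back to a more conservative split.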

27.4 Integrating with the Agent Loop

Wiring the compaction mechanism into the agent's main loop:

import anthropic
import json

class CompactionAwareAgent:
    """Agent with automatic compaction built in."""

    COMPACT_THRESHOLD = 0.75  # trigger compaction at 75% usage
    MODEL_TOKEN_LIMIT = 200_000

    def __init__(self):
        self.client = anthropic.Anthropic()
        self.compactor = SmartCompactor(self.client)
        self.messages: list[dict] = []
        self.system = ""
        self.compaction_count = 0

    def _estimate_tokens(self) -> int:
        """Estimate current token usage."""
        total = len(self.system)
        for msg in self.messages:
            content = msg.get("content", "")
            if isinstance(content, str):
                total += len(content)
            else:
                total += len(str(content))
        return total // 3  # rough: 3 characters ≈ 1 token

    def _maybe_compact(self):
        """Check usage and compact if needed."""
        estimated = self._estimate_tokens()
        if estimated / self.MODEL_TOKEN_LIMIT > self.COMPACT_THRESHOLD:
            print(f"[Compaction] Token usage at {estimated/self.MODEL_TOKEN_LIMIT:.1%}, compacting...")

            result = self.compactor.smart_compact(
                self.messages,
                keep_recent_turns=5,
                task_type="coding"
            )

            self.messages = result.compressed_messages
            self.compaction_count += 1

            print(f"[Compaction] Done. Saved roughly {result.tokens_saved:,} tokens, "
                  f"summarized {result.summary_message_count} messages")
    
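    def run_turn(self, user_input: str, tools: list[dict] | None = None) -> str:
        """Run one conversation turn, compacting automatically when needed."""

        # Append the user message
        self.messages.append({"role": "user", "content": user_input})

        while True:
            # Check whether compaction is needed before every API call
            self._maybe_compact()

            # Call Claude
            kwargs = {
                "model": "claude-opus-4-5",
                "max_tokens": 4096,
                "messages": self.messages,
            }
            if self.system:
                kwargs["system"] = self.system
            if tools:
                kwargs["tools"] = tools

            response = self.client.messages.create(**kwargs)

            # Store content blocks as plain dicts so the compactor can inspect them
            self.messages.append({
                "role": "assistant",
                "content": self._blocks_to_dicts(response.content),
            })

            if response.stop_reason != "tool_use":
                break

            # Run the requested tools, then loop so the model sees the results
            # (simplified; replaces the original broken recursive call)
            tool_results = self._execute_tools(response.content)
            self.messages.append({"role": "user", "content": tool_results})

        # Extract the reply text
        return next(
            (block.text for block in response.content if block.type == "text"), ""
        )

    @staticmethod
    def _blocks_to_dicts(content: list) -> list[dict]:
        """Convert SDK content blocks to plain dicts (text and tool_use only)."""
        blocks = []
        for block in content:
            if block.type == "text" and block.text:
                blocks.append({"type": "text", "text": block.text})
            elif block.type == "tool_use":
                blocks.append({"type": "tool_use", "id": block.id,
                               "name": block.name, "input": block.input})
        return blocks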
    
    def _execute_tools(self, content: list) -> list[dict]:
        """Execute tool calls (placeholder implementation)."""
        results = []
        for block in content:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"[tool {block.name} executed successfully]"
                })
        return results
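
The agent above estimates usage from character counts. Each API response also carries a `usage` object with measured token counts, which gives a more reliable trigger. Here is a minimal sketch, assuming the usage object exposes `input_tokens` and `output_tokens` and, when prompt caching is used, the two cache fields read below. `prompt_tokens` and `usage_ratio` are hypothetical helper names.

```python
from types import SimpleNamespace


def prompt_tokens(usage) -> int:
    """Tokens the last request placed in the context window.

    Cache fields are read defensively because they may be absent or None.
    """
    return (
        usage.input_tokens
        + (getattr(usage, "cache_creation_input_tokens", 0) or 0)
        + (getattr(usage, "cache_read_input_tokens", 0) or 0)
    )


def usage_ratio(usage, token_limit: int = 200_000) -> float:
    """Share of the window the next request will start from.

    The last response is appended to the history, so its output tokens
    count toward the next prompt as well.
    """
    return (prompt_tokens(usage) + usage.output_tokens) / token_limit


if __name__ == "__main__":
    # Stand-in for response.usage; a real object would come from the SDK
    fake_usage = SimpleNamespace(
        input_tokens=100_000,
        output_tokens=50_000,
        cache_creation_input_tokens=None,
        cache_read_input_tokens=10_000,
    )
    ratio = usage_ratio(fake_usage)
    print(f"{ratio:.0%}", ratio > 0.75)
```

In `_maybe_compact`, a ratio like this could replace `_estimate_tokens()` from the second call onward, with the character heuristic kept only for the first request.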

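27.5 Validating Compaction Quality

Compaction is a double-edged sword: compress too aggressively and key information is lost; compress too conservatively and little is gained. The following is one way to evaluate compaction quality:

Information retention test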

def test_compaction_quality(
    client: anthropic.Anthropic,
    original_messages: list[dict],
    compacted_messages: list[dict],
    test_questions: list[str]
) -> float:
    """
    Evaluate information retention after compaction with a Q&A test.
    Checks whether the same question gets the same answer under the
    original conversation and under the compacted conversation.
    """
    if not test_questions:
        raise ValueError("test_questions must not be empty")

    consistent = 0

    for question in test_questions:
        # Answer with the original context
        original_answer = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=256,
            messages=original_messages + [
                {"role": "user", "content": f"Answer briefly (one sentence): {question}"}
            ]
        ).content[0].text

        # Answer with the compacted context
        compacted_answer = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=256,
            messages=compacted_messages + [
                {"role": "user", "content": f"Answer briefly (one sentence): {question}"}
            ]
        ).content[0].text

        # Use Claude to judge semantic consistency
        judge = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": f"""Decide whether the following two answers are semantically consistent:
A: {original_answer}
B: {compacted_answer}
Answer only YES or NO"""
            }]
        ).content[0].text.strip().upper()

        if "YES" in judge:
            consistent += 1

    return consistent / len(test_questions)
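
The Q&A test above costs three model calls per question. A cheap offline check can complement it: extract concrete facts (file paths, function names) from the original text and measure how many survive in the summary. This is only a sketch under assumed patterns. `FACT_PATTERNS`, `extract_facts` and `fact_retention` are hypothetical, and substring matching is a deliberately crude proxy for retention.

```python
import re

# Hypothetical patterns for facts worth tracking in a coding session
FACT_PATTERNS = [
    r"[\w./-]+\.(?:py|js|ts|json|yaml|toml)\b",  # file paths with common extensions
    r"\b[A-Za-z_]\w*\(\)",                        # function names written as name()
]


def extract_facts(text: str) -> set[str]:
    """Collect every substring that matches one of the fact patterns."""
    facts: set[str] = set()
    for pattern in FACT_PATTERNS:
        facts.update(re.findall(pattern, text))
    return facts


def fact_retention(original_text: str, summary: str) -> float:
    """Fraction of facts from the original that appear verbatim in the summary."""
    facts = extract_facts(original_text)
    if not facts:
        return 1.0  # nothing to lose
    kept = sum(1 for fact in facts if fact in summary)
    return kept / len(facts)


if __name__ == "__main__":
    original = ("Edited src/auth/middleware.py and src/auth/models.py; "
                "validate_token() fails.")
    summary = "Modified src/auth/middleware.py; validate_token() raises."
    print(sorted(extract_facts(original)))
    print(round(fact_retention(original, summary), 2))
```

A low score does not prove the summary is bad, but it is a useful signal for deciding when to run the more expensive model-based test.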

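27.6 Hierarchical Compaction: Multi-Level Summaries

For very long sessions (hundreds of turns), a single-level summary may still be too long. Hierarchical summarization addresses this: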

def hierarchical_compact(
    client: anthropic.Anthropic,
    messages: list[dict],
    max_summary_tokens: int = 2000
) -> list[dict]:
    """
    Hierarchical compaction: summarize the history in chunks, then
    summarize the chunk summaries a second time.
    """

    compactor = ContextCompactor(client)

    # Level 1: split the history into chunks of 20 messages and summarize each
    chunk_size = 20
    chunks = [messages[i:i+chunk_size] for i in range(0, len(messages), chunk_size)]

    # With fewer than two chunks there is nothing to summarize
    if len(chunks) < 2:
        return messages

    level1_summaries = []
    for i, chunk in enumerate(chunks[:-1]):  # the last chunk stays in raw form
        summary = compactor._generate_summary(chunk)
        level1_summaries.append(f"[Segment {i+1}]\n{summary}")

    # Level 2: summarize all level-1 summaries again
    if len(level1_summaries) > 3:
        combined = "\n\n".join(level1_summaries)
        level2_response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=max_summary_tokens,
            messages=[{
                "role": "user",
                "content": f"Merge the following segment summaries into one coherent overall summary:\n\n{combined}"
            }]
        )
        final_summary = level2_response.content[0].text
    else:
        final_summary = "\n\n".join(level1_summaries)

    # Build the final compacted message list
    return [
        {"role": "user", "content": f"[Session history summary]\n{final_summary}"},
        {"role": "assistant", "content": "I understand the earlier context. Please continue."}
    ] + chunks[-1]  # keep the last chunk of messages intact
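
The function above stops at two levels. The same idea generalizes to any depth by merging summaries in groups until the result fits a budget. Below is a minimal sketch with the summarizer injected as a callable, so it can wrap a model call in practice and a stub in tests. `reduce_summaries`, `max_chars` and `fan_in` are hypothetical names, and the character budget is an assumption standing in for a token budget.

```python
from typing import Callable


def reduce_summaries(
    summaries: list[str],
    summarize: Callable[[str], str],
    max_chars: int = 6_000,
    fan_in: int = 4,
) -> str:
    """Merge summaries level by level until the joined text fits max_chars.

    `summarize` is any callable that shortens text. A single summary that is
    still over budget is returned as-is rather than looping forever.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(summaries)
    while len(level) > 1 and len("\n\n".join(level)) > max_chars:
        # Each pass merges groups of fan_in summaries into one shorter summary
        level = [
            summarize("\n\n".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
    return "\n\n".join(level)


if __name__ == "__main__":
    # Stub summarizer for demonstration: keeps the first 10 characters
    def stub(text: str) -> str:
        return text[:10]

    merged = reduce_summaries(["a" * 100] * 8, stub, max_chars=50, fan_in=4)
    print(len(merged))
```

Each pass shrinks the number of summaries by roughly a factor of `fan_in`, so the number of model calls stays close to linear in the number of level-1 summaries.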

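Summary

Context compaction is an unavoidable core problem in engineering long-session agents. A good compaction strategy lets an agent stay coherent across tasks of any length instead of stalling because it "forgot" earlier work.

Key points:

Trigger compaction proactively at 70-80% context usage rather than waiting to fail once the limit is exceeded
Use a lightweight model (such as Claude Haiku) to generate summaries and keep compaction costs down
Classify intelligently: keep key decisions and tool-call results intact; summarize ordinary dialogue
Keep the most recent N turns in full so that upcoming operations have precise context
Set up compaction quality tests to catch silent loss of key information

The next chapter takes a close look at RAG architecture: how retrieval-augmented generation lets Claude draw on external knowledge bases that extend beyond the context window.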
