Chapter 26: Context Editing: Context Injection, Modification, and Precise Control
26.1 Context Is Claude's Entire World
From Claude's perspective, apart from what it learned in training, everything it knows in a given moment comes from what it received in the current API call: the system prompt, the message history, and tool results. This totality, the context, is Claude's complete reality for that interaction.
Context Engineering is the discipline of deliberately designing this reality. It is not about dumping information at Claude; it is about precisely deciding what information to include, in what format, at what position, and at what moment in the conversation.
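The Messages API is stateless, so your application assembles and re-sends all of this context on every call. The sketch below shows where each source of context sits in a single request body.

- The `get_weather` tool, its schema, the `toolu_example_1` id and the `context_sources` helper are hypothetical placeholders.
- Tool definitions are listed as well, because Claude reads them too.

```python
# One Messages API request body, written as a plain dict. The get_weather tool,
# its schema, and the tool_use id are hypothetical placeholders.
request = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    # Source 1: the system prompt
    "system": "You are a concise weather assistant.",
    # Tool definitions are context too: Claude reads them to decide on tool calls
    "tools": [{
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    # Source 2: the message history, re-sent in full on every call
    "messages": [
        {"role": "user", "content": "Do I need an umbrella in Paris today?"},
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": "toolu_example_1",
             "name": "get_weather", "input": {"city": "Paris"}},
        ]},
        # Source 3: tool results, which travel back to Claude inside a user turn
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "toolu_example_1",
             "content": "Light rain, 14 C"},
        ]},
    ],
}


def context_sources(req: dict) -> list[str]:
    """List, in order, the kinds of context a request body supplies."""
    sources = []
    if req.get("system"):
        sources.append("system")
    if req.get("tools"):
        sources.append("tools")
    for msg in req["messages"]:
        content = msg["content"]
        if isinstance(content, str):
            sources.append(f"{msg['role']}:text")
        else:
            sources.extend(f"{msg['role']}:{block['type']}" for block in content)
    return sources


print(context_sources(request))
# ['system', 'tools', 'user:text', 'assistant:tool_use', 'user:tool_result']
```

Passing `**request` to `client.messages.create` would send exactly this context. Anything not in the dict does not exist for Claude during that call.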
Context Editing differs from the Memory Tool covered in the previous chapter:
- Memory Tool manages cross-session persistence: what survives between conversations
- Context Editing manages within-session information: what Claude sees right now
Both are essential, and they compose: memory retrieval feeds into context injection.
26.2 Four Dimensions of Context
Dimension 1: Position
Claude's attention distribution is not uniform across its context window. Research on long-context language models has documented a "lost in the middle" effect: information placed at the very beginning or the very end of the context is used more reliably than information buried in the middle.
Practical implications:
| Position | Priority Level | Best Used For |
|---|---|---|
| System prompt opening | Highest | Core role definition, hard constraints |
| System prompt end | High | Dynamic real-time data, current session state |
| Early conversation history | Medium | Long-term background, project setup |
| Recent messages (tail) | Highest | The active task, immediate instructions |
| Assistant turn prefix (prefill) | Directive | Forcing output format |
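The same ordering applies within a single user turn: put bulky reference material first and the active question last. Anthropic's long-context prompting tips make a similar recommendation. Two assumptions in this minimal sketch:

- `build_long_context_turn` is a hypothetical helper.
- The `<documents>`/`<document>` tag names are a prompt convention, not an API feature.

```python
def build_long_context_turn(documents: dict[str, str], question: str) -> str:
    """Build one user turn with long documents at the top and the question at the tail."""
    parts = ["<documents>"]
    for i, (source, text) in enumerate(documents.items(), start=1):
        parts.append(
            f'<document index="{i}">\n'
            f"<source>{source}</source>\n"
            f"<document_content>\n{text}\n</document_content>\n"
            "</document>"
        )
    parts.append("</documents>")
    # The instruction goes last, where the recency effect works in its favor
    parts.append(question)
    return "\n\n".join(parts)


turn = build_long_context_turn(
    {"q1_report.txt": "(full report text)", "q2_report.txt": "(full report text)"},
    "Compare revenue trends across the two reports.",
)
```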
Dimension 2: Format
The format of injected information shapes how Claude processes it:
```xml
<!-- XML tags: clear structure, good for heterogeneous content blocks -->
<user_profile>
  <name>Alex Chen</name>
  <expertise>Python, FastAPI, PostgreSQL</expertise>
  <preference>Concise responses, code over prose</preference>
</user_profile>
<current_task>
Refactor the authentication module to follow JWT best practices
</current_task>
```

The same information expressed in Markdown:

```markdown
# Session Context
## User Profile
- Name: Alex Chen
- Stack: Python, FastAPI, PostgreSQL
- Style preference: Concise, code-first
## Current Task
Refactor the authentication module to follow JWT best practices
```
XML tags excel when mixing multiple distinct content types. Markdown feels more natural for documentation and code generation tasks.
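When these blocks are generated from application data, a small renderer keeps the tags consistent. It also escapes characters that would otherwise break the structure. A sketch, assuming flat `dict` fields; `render_xml_block` and `render_markdown_block` are hypothetical helpers, not part of any SDK:

```python
from xml.sax.saxutils import escape


def render_xml_block(tag: str, fields: dict[str, str]) -> str:
    """Render a flat dict as an XML-tagged block such as <user_profile>...</user_profile>.

    Values are escaped (&, <, >) so user-supplied text cannot close or forge tags.
    """
    lines = [f"<{tag}>"]
    for key, value in fields.items():
        lines.append(f"  <{key}>{escape(str(value))}</{key}>")
    lines.append(f"</{tag}>")
    return "\n".join(lines)


def render_markdown_block(title: str, fields: dict[str, str]) -> str:
    """Render the same fields as a Markdown section with a bullet list."""
    lines = [f"## {title}"]
    lines += [f"- {key.replace('_', ' ').capitalize()}: {value}" for key, value in fields.items()]
    return "\n".join(lines)
```

Both renderers take the same `dict`. You can therefore switch formats per task without touching the data layer: for example, call `render_xml_block("user_profile", profile)` for mixed content and `render_markdown_block("User Profile", profile)` for documentation-style prompts.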
Dimension 3: Timing
When to inject matters as much as what to inject:
| When | What to Inject | How |
|---|---|---|
| Session start | User profile, memories, project background | System prompt |
| After tool calls | Tool execution results | Tool result messages |
| Before user message | Real-time data (prices, queries, current state) | Prefix injection |
| Midway through a long conversation | Compacted summary of dropped history | Replace old messages |
Dimension 4: Density
More context is not always better. Information overload aggravates the lost-in-the-middle problem and increases latency and cost. The guiding principle is ruthless relevance: inject only what the current task actually needs.
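A minimal sketch of ruthless relevance works in three steps:

1. Score each candidate snippet against the current task.
2. Drop the weak ones.
3. Stop adding snippets once a token budget is spent.

Two placeholder assumptions: the keyword-overlap scorer and the ~3-characters-per-token estimate. A real system would more likely use embeddings and an actual token counter.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words, used by the placeholder relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(task: str, snippets: list[str], max_tokens: int,
                   min_score: float = 0.1) -> list[str]:
    """Return the snippets relevant to `task`, most relevant first, within a token budget."""
    task_words = _words(task)

    def score(snippet: str) -> float:
        # Placeholder relevance: share of the task's words that the snippet contains
        return len(task_words & _words(snippet)) / max(len(task_words), 1)

    chosen, used = [], 0
    for snippet in sorted(snippets, key=score, reverse=True):
        if score(snippet) < min_score:
            break  # the list is sorted, so every remaining snippet is below the threshold too
        cost = len(snippet) // 3 + 1  # rough estimate: ~3 characters per token
        if used + cost > max_tokens:
            continue  # does not fit; a shorter, less relevant snippet still might
        chosen.append(snippet)
        used += cost
    return chosen
```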
26.3 Structured System Prompt Design
```python
import anthropic
from datetime import datetime, timezone

def build_system_prompt(
    user: dict,
    memories: list[dict],
    active_tools: list[str],
    domain: str = "general",
) -> str:
    """Build a layered system prompt: static sections first, dynamic sections last"""
    sections = []

    # Layer 1: Role definition (static)
    sections.append("""## Role
You are an intelligent assistant focused on developer productivity.
Your responses must:
- Be concise and direct; avoid unnecessary preamble
- Prefer code examples over lengthy prose explanations
- When uncertain, say so explicitly rather than guessing""")

    # Layer 2: User context (dynamic)
    if user:
        sections.append(f"""## User Profile
- Name: {user.get('name', 'Unknown')}
- Stack: {', '.join(user.get('skills', []))}
- Language preference: {user.get('language', 'English')}""")

    # Layer 3: Retrieved memories (dynamic, capped at five entries)
    if memories:
        mem_lines = [f"- [{m['category']}] {m['content']}" for m in memories[:5]]
        sections.append("## Relevant History\n" + "\n".join(mem_lines))

    # Layer 4: Available tools (conditional)
    if active_tools:
        sections.append("## Available Tools\nYou have access to: " + ", ".join(active_tools))

    # Layer 5: Real-time state (last, for the strongest recency signal)
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    sections.append(f"""## Current State
- Time: {now} UTC
- Domain: {domain}""")

    return "\n\n".join(sections)
```
The layering principle: static, universal content goes first; highly dynamic, session-specific content goes last. This places the most current information near the tail of the system prompt, where Claude's attention is strongest.
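You can make the principle explicit by tagging each section with how often it changes and sorting on that tag. A new section then cannot accidentally land in the wrong place. In this sketch, the `PromptLayer` type, the `assemble_system` helper and the volatility scale (0 = never changes, higher = changes more often) are all assumptions:

```python
from dataclasses import dataclass


@dataclass
class PromptLayer:
    name: str
    text: str
    volatility: int  # 0 = identical for every request; higher = changes more often


def assemble_system(layers: list[PromptLayer]) -> list[dict]:
    """Order layers from least to most volatile as Messages API system text blocks.

    sorted() is stable, so layers with equal volatility keep their given order.
    Layers with empty text are dropped rather than sent as empty sections.
    """
    kept = [layer for layer in layers if layer.text.strip()]
    ordered = sorted(kept, key=lambda layer: layer.volatility)
    return [{"type": "text", "text": f"## {layer.name}\n{layer.text}"} for layer in ordered]
```

The returned list can be passed as the `system` argument of `client.messages.create`, which accepts either a string or a list of text blocks. Prompt caching matches on prefixes, so a stable, static-first prefix also tends to help cache hit rates.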
26.4 Injection Techniques
Technique 1: User Message Prefix Injection
Inject real-time data in front of the user's message, keeping the system prompt clean and reusable:
```python
import json
import anthropic

def inject_realtime_data(user_message: str, context: dict) -> str:
    """Prepend real-time data to a user message"""
    injections = []
    if "db_results" in context:
        results = context["db_results"]
        injections.append(
            f"[Database Query Results]\n"
            f"```json\n{json.dumps(results, indent=2)}\n```"
        )
    if "stock" in context:
        s = context["stock"]
        injections.append(
            f"[Live Data] {s['symbol']}: ${s['price']} ({s['change']:+.2f}%)"
        )
    if injections:
        # A horizontal rule separates injected data from the user's own words
        return "\n".join(injections) + "\n\n---\n\n" + user_message
    return user_message

client = anthropic.Anthropic()
enriched_msg = inject_realtime_data(
    "Analyze our user growth trend",
    {"db_results": {
        "monthly_active_users": [1200, 1450, 1680, 1920, 2310],
        "churn_rate": 0.032,
        "period": "Jan-May 2025",
    }},
)
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": enriched_msg}],
)
```
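A variant of the same technique sends the injected data and the user's words as separate text blocks inside one user message. This keeps them distinguishable in logs, and you can drop the injected block later without string surgery. The list-of-text-blocks `content` shape is standard in the Messages API. The `build_enriched_content` helper and its tag-per-label convention are assumptions of this sketch:

```python
import json


def build_enriched_content(user_message: str, context: dict) -> list[dict]:
    """Return a user-message content list: injected data blocks first, user text last."""
    blocks = []
    for label, data in context.items():
        payload = json.dumps(data, indent=2, sort_keys=True)
        # Wrap each injected payload in an XML-style tag named after its label
        blocks.append({"type": "text", "text": f"<{label}>\n{payload}\n</{label}>"})
    # The user's actual request stays last, at the tail of the turn
    blocks.append({"type": "text", "text": user_message})
    return blocks


content = build_enriched_content(
    "Analyze our user growth trend",
    {"db_results": {"churn_rate": 0.032, "period": "Jan-May 2025"}},
)
# Then: client.messages.create(..., messages=[{"role": "user", "content": content}])
```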
Technique 2: Assistant Prefill
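If you place an incomplete assistant message at the end of the messages array, Claude continues from that exact starting point. Among prompt-level techniques, this is one of the most reliable ways to enforce an output format. The response contains only the continuation, so the caller must re-attach the prefix. Prefill is also unavailable when extended thinking is enabled.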
````python
import anthropic

def get_json_output(prompt: str, json_prefix: str = "{") -> str:
    """Force JSON output via assistant prefill"""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": prompt},
            # Claude will continue from this exact prefix
            {"role": "assistant", "content": json_prefix},
        ],
    )
    # The response holds only the continuation, so re-attach the prefix
    return json_prefix + response.content[0].text

# Force structured analysis output
result = get_json_output(
    prompt="""Analyze the time complexity of this function:

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
```

Return JSON with keys: time_complexity, space_complexity, explanation""",
    json_prefix='{"time_complexity": "',
)
# Returns: {"time_complexity": "O(n²)", "space_complexity": "O(1)", ...}
````
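Two failure modes need handling around prefill:

- A prefix that ends in whitespace is rejected for a final assistant turn.
- A response cut off by `max_tokens` is not valid JSON.

A small sketch with two hypothetical helpers, `validate_prefill` and `parse_prefilled_json`, covers both:

```python
import json


def validate_prefill(prefix: str) -> str:
    """Reject prefills the API will not accept: a final assistant turn cannot end in whitespace."""
    if prefix != prefix.rstrip():
        raise ValueError("assistant prefill must not end with whitespace")
    return prefix


def parse_prefilled_json(prefix: str, continuation: str, stop_reason: str):
    """Re-attach the prefill to Claude's continuation and decode the first JSON value.

    continuation is response.content[0].text; stop_reason is response.stop_reason.
    """
    if stop_reason == "max_tokens":
        # The JSON was cut off mid-value, so parsing would fail
        raise ValueError("response truncated by max_tokens; retry with a larger limit")
    text = prefix + continuation
    # raw_decode stops after the first complete value, ignoring any trailing prose
    value, _end = json.JSONDecoder().raw_decode(text)
    return value
```

Inside `get_json_output`, make two calls:

- Before the request: `validate_prefill(json_prefix)`.
- After it: `parse_prefilled_json(json_prefix, response.content[0].text, response.stop_reason)`, which returns a `dict` instead of a raw string.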
Technique 3: Conversation History Editing
Sometimes you need to surgically modify the conversation history, for example to inject context, correct a prior response, or insert synthetic exchanges:
```python
import copy

class ConversationEditor:
    """Precise conversation history manipulation"""

    def __init__(self):
        self.messages: list[dict] = []

    def append(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def inject_after_index(self, index: int, role: str, content: str):
        """Insert a message immediately after position `index`"""
        self.messages.insert(index + 1, {"role": role, "content": content})

    def replace_last_assistant(self, new_content: str):
        """Replace the most recent assistant response (for correction)"""
        for i in range(len(self.messages) - 1, -1, -1):
            if self.messages[i]["role"] == "assistant":
                self.messages[i]["content"] = new_content
                return

    def trim_to_recent(self, n_turns: int):
        """Keep only the most recent N turns (assumes user/assistant alternation)"""
        keep = n_turns * 2  # each turn = user + assistant
        if len(self.messages) > keep:
            # Guard keep == 0: messages[-0:] would return the whole list
            self.messages = self.messages[-keep:] if keep else []
        # Conservative invariant: the history should start on a user turn
        while self.messages and self.messages[0]["role"] != "user":
            self.messages.pop(0)

    def inject_reminder_before_last_user(self, reminder: str):
        """Prefix a reminder to the content of the most recent user message"""
        for i in range(len(self.messages) - 1, -1, -1):
            if self.messages[i]["role"] == "user":
                original = self.messages[i]["content"]
                tag = f"[Reminder: {reminder}]"
                if isinstance(original, str):
                    self.messages[i]["content"] = f"{tag}\n\n{original}"
                else:
                    # Content-block list: prepend a text block instead
                    self.messages[i]["content"] = [{"type": "text", "text": tag}, *original]
                return

    def get(self) -> list[dict]:
        # Deep copy so callers cannot mutate the internal history by accident
        return copy.deepcopy(self.messages)
```
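After surgical edits, it is worth linting the history before sending it. Trimming in particular can orphan a `tool_result` whose `tool_use` was dropped. The sketch below checks a conservative set of invariants:

- The first message is a user turn.
- Roles alternate.
- Every `tool_result` refers to a `tool_use` id from the immediately preceding assistant turn.

The API may be more lenient than these checks in places, so treat `history_problems` (a hypothetical helper) as a defensive lint rather than a statement of the API's exact rules:

```python
def history_problems(messages: list[dict]) -> list[str]:
    """Lint a message list after manual edits; an empty result means no problems found."""
    problems = []
    if messages and messages[0]["role"] != "user":
        problems.append("first message is not a user turn")
    for i, msg in enumerate(messages):
        prev = messages[i - 1] if i > 0 else None
        if msg["role"] not in ("user", "assistant"):
            problems.append(f"message {i}: unexpected role {msg['role']!r}")
        if prev is not None and prev["role"] == msg["role"]:
            problems.append(f"message {i}: two consecutive {msg['role']} turns")
        if not isinstance(msg["content"], list):
            continue
        # tool_use ids offered by the immediately preceding assistant turn
        offered = set()
        if prev is not None and prev["role"] == "assistant" and isinstance(prev["content"], list):
            offered = {b["id"] for b in prev["content"] if b.get("type") == "tool_use"}
        for block in msg["content"]:
            if block.get("type") == "tool_result" and block.get("tool_use_id") not in offered:
                problems.append(f"message {i}: tool_result without a matching tool_use")
    return problems
```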
Technique 4: Dynamic Role Composition
Instead of maintaining separate system prompts for every use case, compose them from modular blocks:
```python
class DynamicPromptBuilder:
    """Compose system prompts from reusable role blocks"""

    BLOCKS = {
        "code_reviewer": """
## Code Review Mode
You are a strict code reviewer. Focus on:
- Security vulnerabilities (injection, auth bypass, data exposure)
- Performance issues (N+1 queries, memory leaks, blocking I/O)
- Maintainability (naming, function length, separation of concerns)
Tag each issue: [CRITICAL] [HIGH] [MEDIUM] [LOW]""",
        "architect": """
## Architecture Mode
You are a systems architect. Evaluate:
- Scalability: Does this hold at 10x load?
- Coupling: Can components be replaced independently?
- Data consistency: What guarantees exist in distributed scenarios?
Describe designs using C4 Model layers (Context → Container → Component)""",
        "technical_writer": """
## Technical Writing Mode
You are a technical documentation specialist. Requirements:
- Audience: senior engineers, no hand-holding
- Structure with headers, lists, code blocks
- Every concept paired with a runnable example""",
    }

    def compose(self, base: str, roles: list[str], extra: str = "") -> str:
        """Join the base prompt, the requested role blocks, and optional extra context"""
        blocks = [base]
        for role in roles:
            if role in self.BLOCKS:  # unknown role names are skipped silently
                blocks.append(self.BLOCKS[role].strip())
        if extra:
            blocks.append(f"## Additional Context\n{extra}")
        return "\n\n".join(blocks)
```
26.5 Long-Conversation Context Management
Sliding Window
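```python
def sliding_window(messages: list[dict], max_tokens: int = 150_000,
                   min_turns: int = 5) -> list[dict]:
    """Keep the conversation within a token budget, always preserving recent turns"""

    def est_tokens(msg: dict) -> int:
        # Rough heuristic (~3 characters per token); use a real token counter in production
        content = msg.get("content", "")
        if isinstance(content, str):
            return len(content) // 3
        # Content-block list: estimate text blocks, flat 100 for anything else
        return sum(len(b.get("text", "")) // 3 if b.get("type") == "text" else 100
                   for b in content)

    # Index-based split avoids the messages[-0:] pitfall when min_turns == 0
    split = max(len(messages) - min_turns * 2, 0)
    must_keep = messages[split:]
    optional = messages[:split]

    budget = max_tokens - sum(est_tokens(m) for m in must_keep)
    selected = []
    # Walk backwards so the most recent optional messages are kept first
    for msg in reversed(optional):
        cost = est_tokens(msg)
        if cost > budget:
            break
        selected.insert(0, msg)
        budget -= cost

    result = selected + must_keep
    # Drop leading non-user messages so the window starts on a user turn
    while result and result[0]["role"] != "user":
        result.pop(0)
    return result
```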
History Summarization
When the sliding window discards early history, summarize it first instead of losing it outright:
```python
import anthropic

def summarize_history(client: anthropic.Anthropic,
                      old_messages: list[dict]) -> list[dict]:
    """Compress old turns into a synthetic user/assistant exchange"""
    # Flatten the old turns into a transcript, truncating each message to 400 characters
    history = "\n".join(
        f"{m['role'].upper()}: {str(m['content'])[:400]}"
        for m in old_messages
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for summarization
        max_tokens=400,
        messages=[{
            "role": "user",
            "content": f"Summarize the key facts and decisions in 3-5 sentences:\n\n{history}",
        }],
    )
    summary = resp.content[0].text
    # Return a user/assistant pair so the compacted history still alternates roles
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood, I have the prior context."},
    ]
```
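The two pieces compose: split off the old turns, summarize them, and prepend the summary pair to the recent turns. The sketch takes the summarizer as a parameter, so you can plug in `summarize_history` in production or a stub in tests. `compact_history` and its `keep_recent` parameter are hypothetical names for this example:

```python
from typing import Callable

Summarizer = Callable[[list[dict]], list[dict]]


def _is_plain_user_turn(msg: dict) -> bool:
    """True for a user message that is not carrying tool results."""
    if msg["role"] != "user":
        return False
    content = msg["content"]
    return isinstance(content, str) or not any(
        block.get("type") == "tool_result" for block in content
    )


def compact_history(messages: list[dict], keep_recent: int,
                    summarize: Summarizer) -> list[dict]:
    """Replace all but roughly the last `keep_recent` messages with a summary pair.

    The cut moves earlier until the kept tail starts with a plain user turn, so the
    summary's closing assistant message is followed by a user message and no
    tool_use/tool_result pair is split across the cut.
    """
    # Always keep at least the latest message
    cut = max(len(messages) - max(keep_recent, 1), 0)
    while cut > 0 and not _is_plain_user_turn(messages[cut]):
        cut -= 1
    if cut == 0:
        return list(messages)  # nothing old enough to compact
    return summarize(messages[:cut]) + messages[cut:]
```

In production, the `summarize` argument could be `lambda old: summarize_history(client, old)`.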
26.6 Measuring Context Effectiveness
A context design is only as good as its measurable impact on Claude's outputs. Build automated test suites:
```python
import anthropic

def evaluate_system_prompt(client: anthropic.Anthropic,
                           system: str,
                           test_cases: list[dict]) -> float:
    """Measure the instruction-following rate across test cases"""
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    passed = 0
    for case in test_cases:
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            system=system,
            messages=[{"role": "user", "content": case["input"]}],
        )
        output = resp.content[0].text
        # A case passes only if every check returns True
        if all(check(output) for check in case["checks"]):
            passed += 1
    return passed / len(test_cases)

test_suite = [
    {
        "input": "Analyze bubble sort complexity",
        "checks": [
            lambda r: r.strip().startswith("{"),  # must look like JSON
            lambda r: "O(n" in r,                 # must contain Big-O notation
        ],
    },
    {
        "input": "What is recursion?",
        "checks": [
            lambda r: "```" in r,  # must include a code example
        ],
    },
]

client = anthropic.Anthropic()
# my_system_prompt: the system prompt under test, e.g. the output of build_system_prompt()
score = evaluate_system_prompt(client, my_system_prompt, test_suite)
print(f"Instruction-following rate: {score:.0%}")
```
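Outputs vary from call to call, so a single run per case is a noisy signal. The sketch below makes three changes:

- It separates the harness from the API call. `generate` is any function from `(system, user_input)` to text, for example a thin wrapper around `client.messages.create`.
- It samples each case several times.
- It compares candidate prompts side by side.

The names `pass_rate` and `compare_prompts`, and the default of 3 samples, are assumptions of this example:

```python
from typing import Callable

# (system_prompt, user_input) -> model output text
Generate = Callable[[str, str], str]


def pass_rate(generate: Generate, system: str, test_cases: list[dict],
              samples: int = 3) -> float:
    """Fraction of (case, sample) runs in which every check passes."""
    if not test_cases or samples < 1:
        raise ValueError("need at least one test case and one sample")
    passed = 0
    for case in test_cases:
        for _ in range(samples):
            output = generate(system, case["input"])
            passed += int(all(check(output) for check in case["checks"]))
    return passed / (len(test_cases) * samples)


def compare_prompts(generate: Generate, prompts: dict[str, str],
                    test_cases: list[dict], samples: int = 3) -> dict[str, float]:
    """Score several named candidate system prompts on the same test suite."""
    return {name: pass_rate(generate, system, test_cases, samples)
            for name, system in prompts.items()}


# A real generate function could wrap the API (requires the anthropic package):
# def claude_generate(system, user_input):
#     resp = client.messages.create(
#         model="claude-haiku-4-5", max_tokens=512, system=system,
#         messages=[{"role": "user", "content": user_input}],
#     )
#     return resp.content[0].text
```

A stub `generate` makes the harness itself testable offline before you spend API calls on it.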
Summary
Context Editing is the most granular layer of Claude engineering. By controlling position, format, timing, and density, you can systematically improve output quality and consistency.
Key techniques:
- Layered system prompts: static role definition → dynamic user context → retrieved memories → real-time state
- Message prefix injection: add real-time data without polluting the reusable system prompt
- Assistant prefill: one of the most reliable prompt-level mechanisms for enforcing output format
- Conversation history editing: inject synthetic exchanges, correct prior responses, trim old turns
- Sliding window + summarization: maintain context quality as conversations grow long
- Automated evaluation: measure instruction-following rate to validate context designs
The next chapter covers Claude's built-in context compaction mechanism: the automatic summarization that kicks in when the context window approaches its limit.