Chapter 26: Context Editing: Context Injection, Modification, and Precise Control

26.1 Context Is Claude's Entire World

From Claude's perspective, everything it knows in a given moment comes from what it received in the current API call: the system prompt, the message history, and tool results. This totality, the context, is Claude's complete reality for that interaction.

Context Engineering is the discipline of deliberately designing this reality. It is not about dumping information at Claude; it is about precisely deciding what information to include, in what format, at what position, and at what moment in the conversation.

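Context Editing differs from the Memory Tool covered in the previous chapter: the Memory Tool persists information across sessions, while Context Editing controls what Claude sees within the current call.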

Both are essential, and they compose: memory retrieval feeds into context injection.

26.2 Four Dimensions of Context

Dimension 1: Position

Claude's attention is not uniform across its context window. Published long-context research describes a "lost in the middle" effect: information at the very beginning and very end of the context tends to be used more reliably than information in the middle.

Practical implications:

| Position | Priority Level | Best Used For |
|---|---|---|
| System prompt opening | Highest | Core role definition, hard constraints |
| System prompt end | High | Dynamic real-time data, current session state |
| Early conversation history | Medium | Long-term background, project setup |
| Recent messages (tail) | Highest | The active task, immediate instructions |
| Assistant turn prefix (prefill) | Directive | Forcing output format |
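
One way to act on the position effect is to reorder retrieved snippets so the most relevant ones sit at the edges of the injected block and the least relevant in the middle. The sketch below assumes each snippet already carries a relevance score; the function name `order_for_edges` and the sample scores are hypothetical.

```python
def order_for_edges(items: list[tuple[str, float]]) -> list[str]:
    """Place the highest-scoring items at the start and end, lowest in the middle."""
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    front: list[str] = []
    back: list[str] = []
    for i, (text, _score) in enumerate(ranked):
        # Alternate: best item to the front, second best to the back, and so on
        if i % 2 == 0:
            front.append(text)
        else:
            back.append(text)
    # Reverse the back half so the second-best item ends up last
    return front + back[::-1]


# Hypothetical snippets with made-up relevance scores
docs = [("A", 0.9), ("B", 0.8), ("C", 0.5), ("D", 0.2), ("E", 0.1)]
print(order_for_edges(docs))  # ['A', 'C', 'E', 'D', 'B']
```

The two strongest snippets (A and B) land first and last, while the weakest (E) sits in the middle, where it is least likely to matter.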

Dimension 2: Format

The format of injected information shapes how Claude processes it:

<!-- XML tags: clear structure, good for heterogeneous content blocks -->
<user_profile>
  <name>Alex Chen</name>
  <expertise>Python, FastAPI, PostgreSQL</expertise>
  <preference>Concise responses, code over prose</preference>
</user_profile>

<current_task>
  Refactor the authentication module to follow JWT best practices
</current_task>

<!-- Markdown: reads naturally for documentation and code generation tasks -->
# Session Context
## User Profile
- Name: Alex Chen
- Stack: Python, FastAPI, PostgreSQL
- Style preference: Concise, code-first

## Current Task
Refactor the authentication module to follow JWT best practices

XML tags excel when mixing multiple distinct content types. Markdown feels more natural for documentation and code generation tasks.
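
To compare the two formats on your own data, it can help to render the same dictionary both ways. This is a minimal sketch; `to_xml` and `to_markdown` are hypothetical helpers, and they assume one level of nesting and keys that are valid tag names.

```python
from xml.sax.saxutils import escape


def to_xml(blocks: dict) -> str:
    """Render each top-level key as an XML tag; nested dicts become child tags."""
    parts = []
    for tag, value in blocks.items():
        if isinstance(value, dict):
            children = "\n".join(
                f"  <{k}>{escape(str(v))}</{k}>" for k, v in value.items()
            )
            parts.append(f"<{tag}>\n{children}\n</{tag}>")
        else:
            # escape() protects against stray < > & in free text
            parts.append(f"<{tag}>\n  {escape(str(value))}\n</{tag}>")
    return "\n\n".join(parts)


def to_markdown(blocks: dict, title: str = "Session Context") -> str:
    """Render the same data as Markdown headings and bullet lists."""
    lines = [f"# {title}"]
    for section, value in blocks.items():
        lines.append(f"## {section.replace('_', ' ').title()}")
        if isinstance(value, dict):
            lines.extend(f"- {k.title()}: {v}" for k, v in value.items())
        else:
            lines.append(str(value))
        lines.append("")
    return "\n".join(lines).rstrip()


context = {
    "user_profile": {"name": "Alex Chen", "expertise": "Python, FastAPI, PostgreSQL"},
    "current_task": "Refactor the authentication module to follow JWT best practices",
}
print(to_xml(context))
print(to_markdown(context))
```

Keeping the data in one dictionary and choosing the renderer per task makes it cheap to test which format a given prompt responds to better.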

Dimension 3: Timing

When to inject matters as much as what to inject:

| When | What to Inject | How |
|---|---|---|
| Session start | User profile, memories, project background | System prompt |
| After tool calls | Tool execution results | Tool result messages |
| Before user message | Real-time data (prices, queries, current state) | Prefix injection |
| Mid long-conversation | Compacted summary of dropped history | Replace old messages |

Dimension 4: Density

More context is not always better. Excess information aggravates the lost-in-the-middle effect and increases latency and cost. The guiding principle is ruthless relevance: inject only what the current task actually needs.
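
As a rough illustration of ruthless relevance, the sketch below filters candidate snippets by naive word overlap with the task. Word overlap is only a stand-in for a real relevance score (an embedding search, for example), it counts common words such as "the", and `select_relevant` and the sample memories are hypothetical.

```python
import re


def select_relevant(task: str, candidates: list[str],
                    max_items: int = 3, min_overlap: int = 1) -> list[str]:
    """Keep only candidates that share words with the task, best matches first."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    task_words = words(task)
    scored = []
    for text in candidates:
        overlap = len(task_words & words(text))
        if overlap >= min_overlap:
            scored.append((overlap, text))
    # Sort by overlap only; the sort is stable, so ties keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _overlap, text in scored[:max_items]]


task = "Refactor the authentication module to follow JWT best practices"
memories = [  # made-up sample data
    "User prefers concise, code-first answers",
    "The authentication module uses session cookies today",
    "JWT tokens currently never expire",
    "Team lunch is on Fridays",
]
for line in select_relevant(task, memories):
    print(line)
```

Only the two memories that mention the authentication module or JWT survive; the other two would have cost tokens without helping the task.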

26.3 Structured System Prompt Design

import anthropic
from datetime import datetime, timezone

def build_system_prompt(
    user: dict,
    memories: list[dict],
    active_tools: list[str],
    domain: str = "general"
) -> str:
    """Build a layered system prompt with static and dynamic sections"""

    sections = []

    # Layer 1: Role definition (static)
    sections.append("""## Role
You are an intelligent assistant focused on developer productivity.
Your responses must:
- Be concise and direct; avoid unnecessary preamble
- Prefer code examples over lengthy prose explanations
- When uncertain, say so explicitly rather than guessing""")

    # Layer 2: User context (dynamic)
    if user:
        sections.append(f"""## User Profile
- Name: {user.get('name', 'Unknown')}
- Stack: {', '.join(user.get('skills', []))}
- Language preference: {user.get('language', 'English')}""")

    # Layer 3: Retrieved memories (dynamic), capped at the first five
    if memories:
        mem_lines = [f"- [{m['category']}] {m['content']}" for m in memories[:5]]
        sections.append("## Relevant History\n" + "\n".join(mem_lines))

    # Layer 4: Available tools (conditional)
    if active_tools:
        sections.append("## Available Tools\nYou have access to: " + ", ".join(active_tools))

    # Layer 5: Real-time state (last, for the highest recency signal)
    # datetime.now(timezone.utc) replaces utcnow(), deprecated since Python 3.12
    sections.append(f"""## Current State
- Time: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC
- Domain: {domain}""")

    return "\n\n".join(sections)

The layering principle: static, universal content goes first; highly dynamic, session-specific content goes last. This positions the most-current information near the tail of the system prompt, where Claude's attention is strongest.

26.4 Injection Techniques

Technique 1: User Message Prefix Injection

Inject real-time data in front of the user's message, keeping the system prompt clean and reusable:

import json

def inject_realtime_data(user_message: str, context: dict) -> str:
    """Prepend real-time data to a user message"""
    injections = []

    if "db_results" in context:
        results = context["db_results"]
        injections.append(
            f"[Database Query Results]\n"
            f"```json\n{json.dumps(results, indent=2)}\n```"
        )

    if "stock" in context:
        s = context["stock"]
        injections.append(
            f"[Live Data] {s['symbol']}: ${s['price']} ({s['change']:+.2f}%)"
        )

    if injections:
        return "\n".join(injections) + "\n\n---\n\n" + user_message
    return user_message


client = anthropic.Anthropic()
enriched_msg = inject_realtime_data(
    "Analyze our user growth trend",
    {"db_results": {
        "monthly_active_users": [1200, 1450, 1680, 1920, 2310],
        "churn_rate": 0.032,
        "period": "Jan–May 2025"
    }}
)

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": enriched_msg}]
)

Technique 2: Assistant Prefill

By placing an incomplete assistant message at the end of the messages array, you make Claude continue from that exact starting point. This is a reliable way to enforce output format, with one constraint: the prefill must not end in trailing whitespace, or the API rejects the request.

def get_json_output(prompt: str, json_prefix: str = "{") -> str:
    """Force JSON output via assistant prefill"""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[
            {"role": "user", "content": prompt},
            # Claude will continue from this exact prefix
            {"role": "assistant", "content": json_prefix}
        ]
    )
    return json_prefix + response.content[0].text


# Force structured analysis output
result = get_json_output(
    prompt="""Analyze the time complexity of this function:

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
```

Return JSON with keys: time_complexity, space_complexity, explanation""",
    json_prefix='{"time_complexity": "'
)
# Returns: {"time_complexity": "O(n²)", "space_complexity": "O(1)", ...}
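
Prefill fixes how the output starts, not how it ends, so the joined string is still worth parsing defensively. The sketch below is one possible approach; `parse_prefilled_json` is a hypothetical helper that works on plain strings and makes no API call.

```python
import json
from typing import Optional


def parse_prefilled_json(prefix: str, continuation: str) -> Optional[dict]:
    """Join the prefill with the model's continuation and parse it as JSON."""
    text = (prefix + continuation).lstrip()
    try:
        # raw_decode parses the first JSON value and ignores any trailing text
        obj, _end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Simulated continuation with trailing chatter after the JSON object
parsed = parse_prefilled_json(
    '{"time_complexity": "',
    'O(n^2)", "space_complexity": "O(1)"}\n\nLet me know if you need more detail.',
)
print(parsed)  # {'time_complexity': 'O(n^2)', 'space_complexity': 'O(1)'}
```

Returning `None` on failure lets the caller decide whether to retry or fall back, rather than crashing on a malformed response.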


Technique 3: Conversation History Editing

Sometimes you need to surgically modify the conversation history, whether to inject context, correct a prior response, or insert synthetic exchanges:

class ConversationEditor:
    """Precise conversation history manipulation"""

    def __init__(self):
        self.messages: list[dict] = []

    def append(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def inject_after_index(self, index: int, role: str, content: str):
        """Insert a message at a specific position"""
        self.messages.insert(index + 1, {"role": role, "content": content})

    def replace_last_assistant(self, new_content: str):
        """Replace the most recent assistant response (for correction)"""
        for i in range(len(self.messages) - 1, -1, -1):
            if self.messages[i]["role"] == "assistant":
                self.messages[i]["content"] = new_content
                return

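    def trim_to_recent(self, n_turns: int):
        """Keep only the most recent N conversation turns"""
        keep = n_turns * 2  # each turn = user + assistant
        if len(self.messages) > keep:
            self.messages = self.messages[-keep:]
            # The Messages API expects the first message to be a user turn
            while self.messages and self.messages[0]["role"] != "user":
                self.messages.pop(0)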

    def inject_reminder_before_last_user(self, reminder: str):
        """Insert a system reminder before the last user message"""
        for i in range(len(self.messages) - 1, -1, -1):
            if self.messages[i]["role"] == "user":
                original = self.messages[i]["content"]
                if isinstance(original, str):
                    self.messages[i]["content"] = f"[Reminder: {reminder}]\n\n{original}"
                return

    def get(self) -> list[dict]:
        return self.messages.copy()
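
Because inserts and trims can leave the history in an odd shape, a quick check before sending can catch mistakes early. This sketch treats consecutive same-role messages as a warning sign; that is an assumption of the example rather than a documented API rule, and `validate_history` is a hypothetical helper.

```python
def validate_history(messages: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the history looks sane."""
    if not messages:
        return ["history is empty"]
    problems = []
    if messages[0]["role"] != "user":
        problems.append("first message is not a user turn")
    for i, msg in enumerate(messages):
        role = msg["role"]
        if role not in ("user", "assistant"):
            problems.append(f"message {i} has unexpected role '{role}'")
        # Compare with the previous message to detect a broken alternation
        if i > 0 and role == messages[i - 1]["role"]:
            problems.append(f"messages {i - 1} and {i} share role '{role}'")
    return problems


history = [
    {"role": "user", "content": "Review this function"},
    {"role": "user", "content": "[Reminder: answer in JSON]"},
    {"role": "assistant", "content": "{}"},
]
print(validate_history(history))  # ["messages 0 and 1 share role 'user'"]
```

Running a check like this after `inject_after_index` or `trim_to_recent` makes editing errors visible before they turn into confusing model behavior.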

Technique 4: Dynamic Role Composition

Instead of maintaining separate system prompts for every use case, compose them from modular blocks:

class DynamicPromptBuilder:
    BLOCKS = {
        "code_reviewer": """
## Code Review Mode
You are a strict code reviewer. Focus on:
- Security vulnerabilities (injection, auth bypass, data exposure)
- Performance issues (N+1 queries, memory leaks, blocking I/O)
- Maintainability (naming, function length, separation of concerns)
Tag each issue: [CRITICAL] [HIGH] [MEDIUM] [LOW]""",

        "architect": """
## Architecture Mode
You are a systems architect. Evaluate:
- Scalability: Does this hold at 10x load?
- Coupling: Can components be replaced independently?
- Data consistency: What guarantees exist in distributed scenarios?
Describe designs using C4 Model layers (Context → Container → Component)""",

        "technical_writer": """
## Technical Writing Mode
You are a technical documentation specialist. Requirements:
- Audience: senior engineers, no hand-holding
- Structure with headers, lists, code blocks
- Every concept paired with a runnable example"""
    }

    def compose(self, base: str, roles: list[str], extra: str = "") -> str:
        blocks = [base]
        for role in roles:
            if role in self.BLOCKS:
                blocks.append(self.BLOCKS[role])
        if extra:
            blocks.append(f"## Additional Context\n{extra}")
        return "\n\n".join(blocks)
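
The `compose` method above silently skips role names it does not recognize, which can hide typos. A stricter variant might fail loudly instead. The sketch below is one such variant under that assumption; `compose_strict` and its `blocks` parameter are hypothetical and not part of the class above.

```python
def compose_strict(base: str, roles: list[str], blocks: dict[str, str]) -> str:
    """Compose a prompt from named blocks, failing loudly on unknown names."""
    unknown = [r for r in roles if r not in blocks]
    if unknown:
        raise KeyError(f"Unknown prompt blocks: {', '.join(unknown)}")
    # dict.fromkeys drops duplicate role names while keeping their order
    ordered = list(dict.fromkeys(roles))
    return "\n\n".join([base] + [blocks[r].strip() for r in ordered])


sample_blocks = {"reviewer": "\n## Code Review Mode\nBe strict.", "writer": "## Writing Mode"}
print(compose_strict("You are an assistant.", ["reviewer", "writer", "reviewer"], sample_blocks))
```

Whether to skip or raise is a design choice: skipping suits user-supplied role lists, while raising suits role lists that live in code.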

26.5 Long-Conversation Context Management

Sliding Window

def sliding_window(messages: list[dict], max_tokens: int = 150_000,
                   min_turns: int = 5) -> list[dict]:
    """Keep conversation within token budget, preserving recent turns"""

    def est_tokens(msg: dict) -> int:
        content = msg.get("content", "")
        return len(content) // 3 if isinstance(content, str) else 100

    must_keep = messages[-(min_turns * 2):]
    optional = messages[:-(min_turns * 2)]

    budget = max_tokens - sum(est_tokens(m) for m in must_keep)
    selected = []
    for msg in reversed(optional):
        cost = est_tokens(msg)
        if budget - cost > 0:
            selected.insert(0, msg)
            budget -= cost
        else:
            break

    return selected + must_keep
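
The `est_tokens` helper above charges a flat 100 tokens for any message whose content is not a string, which can badly misjudge large tool results. The sketch below extends the same characters-per-token heuristic to list-style content. It assumes blocks are dicts carrying their text under a `text` or `content` key, and it stays a rough estimate; the API's own token counting is the better source when exact figures matter.

```python
def estimate_tokens(message: dict, chars_per_token: int = 3) -> int:
    """Rough token estimate that also handles list-style content blocks."""
    content = message.get("content", "")
    if isinstance(content, str):
        return len(content) // chars_per_token
    total_chars = 0
    for block in content:
        if not isinstance(block, dict):
            continue
        # Count text blocks and string tool results; other block types are skipped
        for key in ("text", "content"):
            value = block.get(key)
            if isinstance(value, str):
                total_chars += len(value)
    return total_chars // chars_per_token


text_msg = {"role": "user", "content": "abcdef"}
block_msg = {"role": "user", "content": [{"type": "text", "text": "abcdefghi"}]}
print(estimate_tokens(text_msg), estimate_tokens(block_msg))  # 2 3
```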

History Summarization

When the sliding window discards early history, compress that history into a summary first:

def summarize_history(client: anthropic.Anthropic,
                      old_messages: list[dict]) -> list[dict]:
    history = "\n".join(
        f"{m['role'].upper()}: {str(m['content'])[:400]}"
        for m in old_messages
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",  # Cheap model for summarization
        max_tokens=400,
        messages=[{
            "role": "user",
            "content": f"Summarize the key facts and decisions in 3-5 sentences:\n\n{history}"
        }]
    )
    summary = resp.content[0].text
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood, I have the prior context."}
    ]
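
The two strategies combine naturally: keep a recent tail verbatim and replace everything older with a summary pair. The sketch below shows one way to wire that up. The summarizer is passed in as a callable that returns a string, so the example runs without an API call; `compact_history` and the fake summarizer are hypothetical.

```python
from typing import Callable


def compact_history(messages: list[dict], keep_recent: int,
                    summarize: Callable[[list[dict]], str]) -> list[dict]:
    """Replace everything except roughly the last `keep_recent` messages with a summary pair."""
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    # Move the split forward until the kept tail starts with a user message,
    # so the (user, assistant) summary pair is followed by a user turn
    while split < len(messages) and messages[split]["role"] != "user":
        split += 1
    old, recent = messages[:split], messages[split:]
    summary_pair = [
        {"role": "user", "content": f"[Conversation summary]\n{summarize(old)}"},
        {"role": "assistant", "content": "Understood, I have the prior context."},
    ]
    return summary_pair + recent


msgs = [
    {"role": "user", "content": "u1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
# Fake summarizer standing in for a model call
compacted = compact_history(msgs, 2, lambda old: f"{len(old)} earlier messages")
print([m["role"] for m in compacted])  # ['user', 'assistant', 'user']
```

In real use the callable would wrap a model call like the one in `summarize_history`, returning only the summary text.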

26.6 Measuring Context Effectiveness

A context design is only as good as its measurable impact on Claude's outputs. Build automated test suites:

def evaluate_system_prompt(client: anthropic.Anthropic,
                            system: str,
                            test_cases: list[dict]) -> float:
    """Measure instruction-following rate across test cases"""
    passed = 0
    for case in test_cases:
        resp = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            system=system,
            messages=[{"role": "user", "content": case["input"]}]
        )
        output = resp.content[0].text
        if all(check(output) for check in case["checks"]):
            passed += 1
    # Guard against an empty suite to avoid ZeroDivisionError
    return passed / len(test_cases) if test_cases else 0.0


test_suite = [
    {
        "input": "Analyze bubble sort complexity",
        "checks": [
            lambda r: r.strip().startswith("{"),   # must be JSON
            lambda r: "O(n" in r,                  # must contain Big-O notation
        ]
    },
    {
        "input": "What is recursion?",
        "checks": [
            lambda r: "```" in r,                  # must include code example
        ]
    }
]

# my_system_prompt is the system prompt under test; define it before running this
score = evaluate_system_prompt(client, my_system_prompt, test_suite)
print(f"Instruction-following rate: {score:.0%}")
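
The `startswith("{")` check in the suite above passes for any text that merely begins with a brace. A stricter check can parse the response instead. The helpers below are a sketch of that idea; `is_json_object` and `has_keys` are hypothetical names that plug into the `checks` lists as ordinary callables.

```python
import json


def is_json_object(text: str) -> bool:
    """True only if the whole response parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def has_keys(*keys: str):
    """Build a check that passes when the response is a JSON object with all keys."""
    def check(text: str) -> bool:
        return is_json_object(text) and all(k in json.loads(text) for k in keys)
    return check


strict_checks = [is_json_object, has_keys("time_complexity", "space_complexity")]
sample = '{"time_complexity": "O(n^2)", "space_complexity": "O(1)"}'
print(all(check(sample) for check in strict_checks))  # True
```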

Summary

Context Editing is the most granular layer of Claude engineering. By controlling position, format, timing, and density, you can systematically improve output quality and consistency.

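Key techniques: layered system prompts, user message prefix injection, assistant prefill, conversation history editing, dynamic role composition, sliding windows, and history summarization.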

The next chapter covers Claude's built-in context compaction mechanism: the automatic summarization that kicks in when the context window approaches its limit.
