Chapter 25: Memory Tool: External Memory Storage and Cross-Session Knowledge Persistence

25.1 Why Agents Need External Memory

Claude's native context window, even at 200K tokens in Claude 3.7 Sonnet, is fundamentally transient. When a session ends, everything in that window disappears. For agents that need to accumulate knowledge across days, weeks, or months — tracking user preferences, project decisions, evolving requirements — this imposes a hard architectural ceiling.

The Memory Tool solves this by converting ephemeral in-context working memory into persistent long-term storage. Rather than clumsily concatenating all prior conversations into every new prompt (which quickly exhausts context budgets and buries relevant facts under noise), Memory Tool gives the agent a structured, searchable external brain.

The fundamental shift is one of agency: instead of the system passively feeding history to Claude, Claude actively decides what to remember, what to retrieve, and what to forget. This mirrors how human experts work — a doctor doesn't replay every prior patient conversation before a follow-up; they recall the relevant history and update it with new findings.

Three Layers of Agent Memory

Layer	Storage Location	Lifespan	Typical Content
Working memory	Context window (in-context)	Single session	Current conversation, tool call results
Episodic memory	External database	Weeks to months	User preferences, past decisions, project context
Semantic memory	Vector database	Long-term	Domain knowledge, factual information, documents

Memory Tool primarily serves episodic and semantic memory management.

25.2 Standard Tool Definitions

Within Anthropic's tool use framework, the Memory Tool in this chapter is defined as a set of three custom tools, each described by a JSON Schema. The definitions are passed in the tools parameter of each API call, so Claude can decide when to call them; the application executes each call and returns the result.

{
  "name": "memory_store",
  "description": "Store important information in the persistent memory system. Call this when you discover information useful for future interactions: user preferences, project progress, key decisions, important facts.",
  "input_schema": {
    "type": "object",
    "properties": {
      "content": {
        "type": "string",
        "description": "The information to store. Should be concise yet self-contained — readable without the surrounding conversation."
      },
      "category": {
        "type": "string",
        "enum": ["user_preference", "project_context", "factual_knowledge",
                 "decision_log", "relationship", "task_progress"],
        "description": "Category for filtering during retrieval"
      },
      "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Keyword tags for semantic retrieval"
      },
      "importance": {
        "type": "integer",
        "minimum": 1,
        "maximum": 5,
        "description": "Importance score 1-5. Affects retrieval priority and forgetting policy."
      },
      "expires_at": {
        "type": "string",
        "format": "date-time",
        "description": "Optional expiry time (ISO 8601). Omit for permanent storage."
      }
    },
    "required": ["content", "category", "importance"]
  }
}

{
  "name": "memory_retrieve",
  "description": "Retrieve relevant information from persistent memory. Call at the start of complex tasks, or when you need to recall past context about the user or project.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language description of what you're looking for"
      },
      "categories": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Limit retrieval to these categories. Empty means search all."
      },
      "limit": {
        "type": "integer",
        "default": 10,
        "description": "Maximum number of results to return"
      },
      "min_importance": {
        "type": "integer",
        "minimum": 1,
        "maximum": 5,
        "default": 1,
        "description": "Filter out memories below this importance level"
      }
    },
    "required": ["query"]
  }
}

{
  "name": "memory_delete",
  "description": "Delete a memory entry that is no longer valid. Use when information has become outdated, contradicted, or the user requests it be forgotten.",
  "input_schema": {
    "type": "object",
    "properties": {
      "memory_id": {"type": "string", "description": "ID of the memory entry to delete"},
      "reason": {"type": "string", "description": "Reason for deletion (written to audit log)"}
    },
    "required": ["memory_id", "reason"]
  }
}
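Claude usually follows the schema, but the application that executes the call still owns validation. The following is a minimal sketch of a hand-rolled check for memory_store inputs. The function name validate_store_input and its error strings are illustrative assumptions, not part of any SDK.

```python
from datetime import datetime

# Hypothetical helper: mirrors the memory_store schema above by hand.
ALLOWED_CATEGORIES = {
    "user_preference", "project_context", "factual_knowledge",
    "decision_log", "relationship", "task_progress",
}

def validate_store_input(inp: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    errors = []
    for field in ("content", "category", "importance"):
        if field not in inp:
            errors.append(f"missing required field: {field}")

    content = inp.get("content")
    if "content" in inp and not (isinstance(content, str) and content.strip()):
        errors.append("content must be a non-empty string")

    category = inp.get("category")
    if "category" in inp and not (isinstance(category, str)
                                  and category in ALLOWED_CATEGORIES):
        errors.append(f"unknown category: {category!r}")

    imp = inp.get("importance")
    # bool is a subclass of int in Python, so exclude it explicitly
    if "importance" in inp and not (isinstance(imp, int)
                                    and not isinstance(imp, bool)
                                    and 1 <= imp <= 5):
        errors.append("importance must be an integer from 1 to 5")

    tags = inp.get("tags", [])
    if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
        errors.append("tags must be a list of strings")

    if inp.get("expires_at") is not None:
        try:
            # Note: a trailing "Z" is only accepted from Python 3.11 onward
            datetime.fromisoformat(inp["expires_at"])
        except (TypeError, ValueError):
            errors.append("expires_at must be an ISO 8601 timestamp")
    return errors
```

Returning the error list to Claude as the tool result gives the model a chance to correct the call instead of failing silently.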

25.3 Storage Backends

Vector Database: Semantic Retrieval

Vector databases store each memory entry as a high-dimensional embedding and retrieve entries by cosine similarity to the query embedding. As a result, "user prefers async Python" and "user likes awaitable interfaces" can both match a query about asynchronous code, even though they share few keywords.
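To make the ranking step concrete, here is a sketch of cosine-similarity retrieval over hand-made three-dimensional vectors. The vectors and the top_k helper are invented for illustration; a real encoder such as BAAI/bge-m3 produces 1024-dimensional vectors.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy embeddings (made up): axis 0 ~ async programming, axis 1 ~ UI theme,
# axis 2 ~ billing
memories = {
    "user prefers async Python": [0.9, 0.1, 0.0],
    "user likes awaitable interfaces": [0.8, 0.2, 0.1],
    "user wants dark mode": [0.1, 0.9, 0.0],
}

def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k memory texts whose vectors are closest to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories.items()]
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

A query vector pointing along the "async programming" axis ranks both async-related memories above the dark-mode entry, although the two texts share almost no words.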

Recommended options:

Qdrant — Rust-native, excellent filtering, ideal for self-hosted deployments
Chroma — Best Python ecosystem integration, great for development
Pinecone — Fully managed cloud service, zero ops overhead
pgvector — PostgreSQL extension, ideal for teams already running Postgres

import uuid
from datetime import datetime, timezone

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchAny, PointIdsList,
    PointStruct, Range, VectorParams,
)
from sentence_transformers import SentenceTransformer


class VectorMemoryBackend:
    """Qdrant-backed vector memory storage"""

    def __init__(self, collection_name: str = "agent_memory"):
        self.client = QdrantClient(host="localhost", port=6333)
        # BAAI/bge-m3 is multilingual (including English and Chinese)
        # and produces 1024-dimensional dense vectors
        self.encoder = SentenceTransformer("BAAI/bge-m3")
        self.collection = collection_name
        self._ensure_collection()

    def _ensure_collection(self):
        # Create the collection on first use; size must match the encoder
        names = [c.name for c in self.client.get_collections().collections]
        if self.collection not in names:
            self.client.create_collection(
                collection_name=self.collection,
                vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
            )

    @staticmethod
    def _is_live(expires_at: str | None) -> bool:
        """True if the entry has no expiry or the expiry is still in the future."""
        if not expires_at:
            return True
        # Parse instead of comparing strings, so offsets and "Z" are handled
        expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
        if expiry.tzinfo is None:  # treat naive timestamps as UTC
            expiry = expiry.replace(tzinfo=timezone.utc)
        return expiry > datetime.now(timezone.utc)

    def store(self, content: str, category: str, tags: list[str],
              importance: int, expires_at: str | None = None) -> str:
        memory_id = str(uuid.uuid4())
        vector = self.encoder.encode(content).tolist()
        self.client.upsert(
            collection_name=self.collection,
            points=[PointStruct(
                id=memory_id,
                vector=vector,
                payload={
                    "content": content, "category": category,
                    "tags": tags, "importance": importance,
                    # naive UTC timestamp in ISO 8601 form
                    "created_at": datetime.utcnow().isoformat(),
                    "expires_at": expires_at
                }
            )]
        )
        return memory_id

    def retrieve(self, query: str, categories: list[str] | None = None,
                 limit: int = 10, min_importance: int = 1) -> list[dict]:
        query_vector = self.encoder.encode(query).tolist()

        # Build payload filters with the client's typed models
        conditions = []
        if categories:
            conditions.append(
                FieldCondition(key="category", match=MatchAny(any=categories)))
        if min_importance > 1:
            conditions.append(
                FieldCondition(key="importance", range=Range(gte=min_importance)))

        # Note: recent qdrant-client releases deprecate search() in favor of
        # query_points(); check the version you have installed.
        results = self.client.search(
            collection_name=self.collection,
            query_vector=query_vector,
            query_filter=Filter(must=conditions) if conditions else None,
            limit=limit,
            with_payload=True
        )
        # Expired entries are dropped after the search, so fewer than
        # `limit` results may be returned.
        return [
            {"id": str(r.id), "content": r.payload["content"],
             "category": r.payload["category"],
             "importance": r.payload["importance"],
             "score": r.score, "created_at": r.payload["created_at"]}
            for r in results
            if self._is_live(r.payload.get("expires_at"))
        ]

    def delete(self, memory_id: str, reason: str):
        self.client.delete(
            collection_name=self.collection,
            points_selector=PointIdsList(points=[memory_id])
        )
        # Stand-in for a real audit log
        print(f"[Memory Audit] Deleted {memory_id}: {reason}")

Key-Value Store: Structured Retrieval

For highly structured memories that require exact-match lookups (user settings, configuration flags, deterministic facts), a plain table in a relational store such as SQLite, queried by key or category, is more appropriate than a vector index:

import sqlite3, json, uuid
from datetime import datetime

class KVMemoryBackend:
    def __init__(self, db_path: str = "memory.db"):
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id TEXT PRIMARY KEY,
                content TEXT NOT NULL,
                category TEXT NOT NULL,
                tags TEXT,
                importance INTEGER DEFAULT 3,
                created_at TEXT NOT NULL,
                expires_at TEXT,
                deleted_at TEXT
            )
        """)
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_cat ON memories(category)")
        self.conn.commit()

    def store(self, content: str, category: str, tags: list[str],
              importance: int, expires_at: str | None = None) -> str:
        mid = str(uuid.uuid4())
        self.conn.execute(
            "INSERT INTO memories VALUES (?,?,?,?,?,?,?,?)",
            (mid, content, category, json.dumps(tags), importance,
             datetime.utcnow().isoformat(), expires_at, None)
        )
        self.conn.commit()
        return mid
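The class above only covers writes. A possible read and soft-delete path over the same table is sketched below as standalone functions. get_by_category and soft_delete are hypothetical names, and the sketch assumes that deleted_at is meant as a soft-delete marker, which the schema suggests but the text does not state.

```python
import json
import sqlite3
from datetime import datetime, timezone

def get_by_category(conn: sqlite3.Connection, category: str,
                    limit: int = 10) -> list[dict]:
    """Exact-match lookup: rows not soft-deleted, most important first.

    Expiry filtering is omitted here for brevity.
    """
    rows = conn.execute(
        "SELECT id, content, tags, importance FROM memories "
        "WHERE category = ? AND deleted_at IS NULL "
        "ORDER BY importance DESC, created_at DESC LIMIT ?",
        (category, limit),
    ).fetchall()
    return [
        {"id": r[0], "content": r[1],
         "tags": json.loads(r[2] or "[]"), "importance": r[3]}
        for r in rows
    ]

def soft_delete(conn: sqlite3.Connection, memory_id: str) -> bool:
    """Mark a row as deleted; returns False if it was missing or already deleted."""
    cur = conn.execute(
        "UPDATE memories SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), memory_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keeping the row and stamping deleted_at preserves an audit trail. A separate hard-delete path would still be needed for erasure requests.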

25.4 Full Integration with Claude API

import anthropic, json

class MemoryEnabledAgent:
    def __init__(self, user_id: str):
        self.client = anthropic.Anthropic()
        self.memory = VectorMemoryBackend(f"memory_{user_id}")
        self.tools = self._define_tools()

    def _define_tools(self) -> list[dict]:
        # Assumes the three tool definition dicts from section 25.2 are
        # bound to these module-level names
        return [memory_store_tool, memory_retrieve_tool, memory_delete_tool]

    def _execute_tool(self, name: str, inp: dict) -> str:
        if name == "memory_store":
            mid = self.memory.store(
                inp["content"], inp["category"],
                inp.get("tags", []), inp["importance"], inp.get("expires_at")
            )
            return json.dumps({"success": True, "memory_id": mid})
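        elif name == "memory_retrieve":
            results = self.memory.retrieve(
                inp["query"], inp.get("categories"),
                inp.get("limit", 10),         # same default as the tool schema
                inp.get("min_importance", 1)  # honor the optional importance filter
            )
            return json.dumps({"memories": results})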
        elif name == "memory_delete":
            self.memory.delete(inp["memory_id"], inp["reason"])
            return json.dumps({"success": True})
        return json.dumps({"error": f"Unknown tool: {name}"})

    def chat(self, user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]
        system = """You are a persistent-memory assistant.

At the start of each conversation:
1. Call memory_retrieve to find context relevant to the user's request
2. Incorporate retrieved memories into your response
3. Call memory_store when you discover important new facts
4. Call memory_delete when you find outdated or contradicted memories

Storage priorities:
- Explicit user preferences → importance 5
- Key project decisions → importance 4  
- Useful background context → importance 3
- Transient details → do not store"""

        while True:
            response = self.client.messages.create(
                model="claude-opus-4-5",
                max_tokens=2048,
                system=system,
                tools=self.tools,
                messages=messages
            )
            if response.stop_reason == "tool_use":
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        result = self._execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result
                        })
                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": tool_results})
                continue
            # No more tool calls: return all text blocks, concatenated
            return "".join(
                block.text for block in response.content if block.type == "text"
            )
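Exercising the tool loop against a live Qdrant instance makes unit tests slow. One option, sketched here, is an in-process stand-in that exposes the same store, retrieve, and delete methods as VectorMemoryBackend. InMemoryBackend is a hypothetical test double; it ranks by naive word overlap rather than embeddings, so it is not a substitute for semantic search.

```python
import uuid

class InMemoryBackend:
    """Hypothetical test double mirroring the VectorMemoryBackend method names."""

    def __init__(self):
        self._items: dict[str, dict] = {}

    def store(self, content, category, tags, importance, expires_at=None) -> str:
        mid = str(uuid.uuid4())
        self._items[mid] = {
            "id": mid, "content": content, "category": category,
            "tags": tags, "importance": importance, "expires_at": expires_at,
        }
        return mid

    def retrieve(self, query, categories=None, limit=10,
                 min_importance=1) -> list[dict]:
        words = set(query.lower().split())
        hits = []
        for item in self._items.values():
            if categories and item["category"] not in categories:
                continue
            if item["importance"] < min_importance:
                continue
            # Score = fraction of query words that appear in the content
            overlap = len(words & set(item["content"].lower().split()))
            if overlap:
                hits.append({**item, "score": overlap / len(words)})
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:limit]

    def delete(self, memory_id, reason):
        # The reason is ignored here; a real backend would log it
        self._items.pop(memory_id, None)
```

To use it, the agent would need to accept a backend as a constructor argument, which the class above does not currently do.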

25.5 Retrieval Strategies

Proactive Prefetch at Session Start

Rather than waiting for Claude to decide to retrieve memories, inject the most relevant ones directly into the system prompt:

def build_system_with_memories(self, user_message: str, base_system: str) -> str:
    memories = self.memory.retrieve(query=user_message, limit=5, min_importance=3)
    if not memories:
        return base_system
    mem_block = "\n".join(
        f"- [{m['category']}] {m['content']}" for m in memories
    )
    return base_system + f"\n\n## Relevant Memory\n{mem_block}"
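Prefetched memories compete with everything else for context space, so it may help to cap the injected block. The sketch below is a standalone variant that takes already-retrieved memories and stops adding lines once a character budget is reached. format_memory_block and max_chars are illustrative names, and a character count is only a rough proxy for tokens.

```python
def format_memory_block(memories: list[dict], base_system: str,
                        max_chars: int = 1500) -> str:
    """Append memories (assumed pre-sorted by relevance) until the budget is spent."""
    lines, used = [], 0
    for m in memories:
        line = f"- [{m['category']}] {m['content']}"
        if used + len(line) + 1 > max_chars:
            break  # stop at the first entry that would exceed the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline
    if not lines:
        return base_system
    return base_system + "\n\n## Relevant Memory\n" + "\n".join(lines)
```

Stopping at the first entry that does not fit, rather than skipping it and trying shorter ones, keeps the most relevant memories in order at the cost of sometimes leaving budget unused.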

Temporal Decay

Memories should become less influential over time. A half-life decay model prevents stale information from dominating:

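import math
from datetime import datetime

def effective_importance(importance: int, created_at: str,
                         half_life_days: float = 30.0) -> float:
    # created_at is a naive UTC ISO timestamp, as written by the storage backends
    age = datetime.utcnow() - datetime.fromisoformat(created_at)
    days = age.total_seconds() / 86400  # fractional days, not truncated
    # exp(-ln 2 * t / half_life) halves the weight every half_life_days
    decay = math.exp(-math.log(2) * days / half_life_days)
    return importance * decay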
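As a worked example of the half-life model: with the default 30-day half-life, an importance-4 memory counts as 2.0 after 30 days and 1.0 after 60. The sketch below combines that decayed weight with the similarity score to rerank results. The multiplicative blend and the names decayed and rerank are assumptions for illustration; the chapter does not prescribe how the two signals should be combined. The now parameter exists only to make the functions deterministic.

```python
from datetime import datetime

def decayed(importance: int, created_at: str, now: datetime,
            half_life_days: float = 30.0) -> float:
    """Importance halved once per half-life; created_at and now are naive UTC."""
    age_days = (now - datetime.fromisoformat(created_at)).total_seconds() / 86400
    return importance * 0.5 ** (max(age_days, 0.0) / half_life_days)

def rerank(results: list[dict], now: datetime) -> list[dict]:
    """Sort by similarity score weighted by decayed importance (an assumed blend)."""
    return sorted(
        results,
        key=lambda r: r["score"] * decayed(r["importance"], r["created_at"], now),
        reverse=True,
    )
```

Under this blend, a four-month-old importance-5 memory with a slightly higher similarity score ranks below a day-old importance-3 memory, which is the intended effect of decay.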

Conflict Detection

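Before storing a new memory, check whether a near-identical entry already exists in the same category. Embedding similarity alone cannot tell a restatement from a contradiction (two opposite preferences on the same topic may embed close together), so this simple policy lets the newer entry supersede any match above a similarity threshold:

def store_with_dedup(self, content: str, category: str, importance: int) -> str:
    # Look for close matches within the same category only
    existing = self.memory.retrieve(query=content, categories=[category], limit=3)
    # 0.92 is a tunable cosine-similarity threshold; results arrive sorted by score
    if existing and existing[0]["score"] > 0.92:
        self.memory.delete(existing[0]["id"], f"Superseded by: {content[:50]}")
    return self.memory.store(content, category, [], importance)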

25.6 Production Engineering Considerations

Capacity management: Storage cost and retrieval noise both grow with collection size. Implement a periodic pruning job that removes the entries with the lowest effective importance once a collection exceeds a threshold (e.g., 10,000 entries per user).
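One way to pick eviction candidates is sketched below: rank entries by decayed importance and return the IDs that fall beyond the cap. select_for_pruning is a hypothetical helper, and the entry dicts are assumed to carry id, importance, and created_at as in the retrieval results shown earlier. How the full set of entries is listed (for example with a paginated scan) is left out.

```python
import math
from datetime import datetime

def select_for_pruning(entries: list[dict], max_entries: int, now: datetime,
                       half_life_days: float = 30.0) -> list[str]:
    """Return IDs of the entries to delete so that max_entries remain."""
    def weight(entry: dict) -> float:
        age = now - datetime.fromisoformat(entry["created_at"])
        age_days = max(age.total_seconds() / 86400, 0.0)
        return entry["importance"] * math.exp(
            -math.log(2) * age_days / half_life_days)

    if len(entries) <= max_entries:
        return []
    ranked = sorted(entries, key=weight, reverse=True)
    # Everything past the cap is a pruning candidate, best-ranked first
    return [entry["id"] for entry in ranked[max_entries:]]
```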

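User isolation: Each user's memories must be strictly namespaced. Per-user collection names give the strongest separation. A shared collection with a user_id payload filter also works, but only if every query path applies the filter, so enforce it in one place in the backend rather than at each call site.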
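The agent in section 25.4 interpolates user_id directly into the collection name, so an unusual or malicious ID could produce an invalid name or collide with another namespace. A hedged sketch of one mitigation is to derive the name from a hash of the ID. collection_for_user and the memory_ prefix are illustrative choices.

```python
import hashlib

def collection_for_user(user_id: str, prefix: str = "memory_") -> str:
    """Map an arbitrary user ID to a fixed-length, hex-only collection name."""
    if not user_id:
        raise ValueError("user_id must be non-empty")
    # 32 hex characters = 128 bits of the SHA-256 digest
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}{digest}"
```

The mapping is deterministic, so the same user always lands in the same collection, while the name itself never contains user-supplied characters.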

Privacy compliance: Provide a "forget everything" endpoint that erases all of a user's memories (this supports the right to erasure under GDPR Article 17). Never store passwords, API keys, or payment credentials in the memory system. Encrypt memory content at rest.
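A store-time guard can catch some of the obvious cases before content reaches the backend. The patterns below are rough, illustrative heuristics (an sk- style key prefix, a password-like assignment, a 13 to 16 digit card-like number), not a complete or authoritative detector. looks_sensitive is a hypothetical name.

```python
import re

# Illustrative heuristics only; tune or replace for real deployments
_SENSITIVE_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),                  # key-like token
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key)\b\s*[:=]\s*\S+"),
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),                   # card-like digit run
]

def looks_sensitive(content: str) -> bool:
    """True if any heuristic pattern matches the candidate memory content."""
    return any(p.search(content) for p in _SENSITIVE_PATTERNS)
```

A caller could refuse the memory_store call when this returns True and report the refusal in the tool result, so that Claude can rephrase the memory without the secret.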

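Latency: Run retrieval asynchronously so it does not block other work, and overlap it with request preparation where possible; memories that are injected into the system prompt must arrive before the API call that uses them. A well-tuned Qdrant instance can return top-10 results in roughly 10ms or less for collections under 1M entries, though the figure depends on hardware and configuration, and computing the query embedding often takes longer than the search itself.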
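Since the backend shown earlier is synchronous, one way to keep it off the event loop is asyncio.to_thread, with a timeout so that a slow lookup degrades to "no memories" rather than delaying the reply. This is a sketch: prefetch_memories is a hypothetical wrapper, and the 0.2 second timeout is an arbitrary placeholder.

```python
import asyncio

async def prefetch_memories(backend, query: str,
                            timeout_s: float = 0.2) -> list[dict]:
    """Run a blocking backend.retrieve in a worker thread with a deadline."""
    try:
        return await asyncio.wait_for(
            # Positional args: query, categories=None, limit=5, min_importance=3
            asyncio.to_thread(backend.retrieve, query, None, 5, 3),
            timeout=timeout_s,
        )
    except asyncio.TimeoutError:
        # The worker thread may still finish in the background;
        # its result is simply discarded.
        return []
```

Falling back to an empty list trades completeness for responsiveness. Whether that trade is acceptable depends on how much the agent relies on prefetched context.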

Summary

The Memory Tool transforms Claude from a single-session assistant into a long-term knowledge partner. Key takeaways:

Define three tools with standard JSON Schema: memory_store, memory_retrieve, memory_delete
Use vector databases for semantic retrieval; key-value stores for structured exact-match lookups
Combine proactive prefetch (system prompt injection) with agent-driven retrieval for best results
Apply temporal decay and conflict detection to keep memories fresh and consistent
Address capacity limits, user isolation, and privacy compliance before production deployment

The next chapter explores Context Editing — how to surgically inject, modify, and control the information Claude sees within a single session.
