Computer Use: Complete Practical Guide and Security Protection for Screenshot Control, Browser Automation and Desktop Operations
Chapter 25: Memory Tool: External Memory Storage and Cross-Session Knowledge Persistence
25.1 Why Agents Need External Memory
Claude's native context window, even at 200K tokens in Claude 3.7 Sonnet, is fundamentally transient. When a session ends, everything in that window disappears. For agents that need to accumulate knowledge across days, weeks, or months — tracking user preferences, project decisions, evolving requirements — this imposes a hard architectural ceiling.
The Memory Tool solves this by converting ephemeral in-context working memory into persistent long-term storage. Rather than clumsily concatenating all prior conversations into every new prompt (which quickly exhausts context budgets and buries relevant facts under noise), Memory Tool gives the agent a structured, searchable external brain.
The fundamental shift is one of agency: instead of the system passively feeding history to Claude, Claude actively decides what to remember, what to retrieve, and what to forget. This mirrors how human experts work — a doctor doesn't replay every prior patient conversation before a follow-up; they recall the relevant history and update it with new findings.
Three Layers of Agent Memory
| Layer | Storage Location | Lifespan | Typical Content |
|---|---|---|---|
| Working memory | Context window (in-context) | Single session | Current conversation, tool call results |
| Episodic memory | External database | Weeks to months | User preferences, past decisions, project context |
| Semantic memory | Vector database | Long-term | Domain knowledge, factual information, documents |
Memory Tool primarily serves episodic and semantic memory management.
25.2 Standard Tool Definitions
Within Anthropic's tool use framework, the Memory Tool described in this chapter is defined as a set of three custom tool definitions, each with a JSON Schema input_schema. These definitions are passed in the tools parameter of each API call, so Claude can call them on its own when appropriate.
{
  "name": "memory_store",
  "description": "Store important information in the persistent memory system. Call this when you discover information useful for future interactions: user preferences, project progress, key decisions, important facts.",
  "input_schema": {
    "type": "object",
    "properties": {
      "content": {
        "type": "string",
        "description": "The information to store. Should be concise yet self-contained, readable without the surrounding conversation."
      },
      "category": {
        "type": "string",
        "enum": ["user_preference", "project_context", "factual_knowledge",
                 "decision_log", "relationship", "task_progress"],
        "description": "Category for filtering during retrieval"
      },
      "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Keyword tags for semantic retrieval"
      },
      "importance": {
        "type": "integer",
        "minimum": 1,
        "maximum": 5,
        "description": "Importance score 1-5. Affects retrieval priority and forgetting policy."
      },
      "expires_at": {
        "type": "string",
        "format": "date-time",
        "description": "Optional expiry time (ISO 8601). Omit for permanent storage."
      }
    },
    "required": ["content", "category", "importance"]
  }
}

{
  "name": "memory_retrieve",
  "description": "Retrieve relevant information from persistent memory. Call at the start of complex tasks, or when you need to recall past context about the user or project.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language description of what you're looking for"
      },
      "categories": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Limit retrieval to these categories. Empty means search all."
      },
      "limit": {
        "type": "integer",
        "default": 10,
        "description": "Maximum number of results to return"
      },
      "min_importance": {
        "type": "integer",
        "minimum": 1,
        "maximum": 5,
        "default": 1,
        "description": "Filter out memories below this importance level"
      }
    },
    "required": ["query"]
  }
}

{
  "name": "memory_delete",
  "description": "Delete a memory entry that is no longer valid. Use when information has become outdated, contradicted, or the user requests it be forgotten.",
  "input_schema": {
    "type": "object",
    "properties": {
      "memory_id": {"type": "string", "description": "ID of the memory entry to delete"},
      "reason": {"type": "string", "description": "Reason for deletion (written to audit log)"}
    },
    "required": ["memory_id", "reason"]
  }
}
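Because Claude generates the tool input, the receiving code may still want to check it before writing anything to storage. The following is a minimal sketch of such a check for memory_store, mirroring the constraints in the schema above. The function name validate_memory_store_input and the ALLOWED_CATEGORIES constant are hypothetical helpers, not part of any SDK.

```python
from datetime import datetime

# Mirrors the "enum" of the category property in the memory_store schema.
ALLOWED_CATEGORIES = {
    "user_preference", "project_context", "factual_knowledge",
    "decision_log", "relationship", "task_progress",
}


def validate_memory_store_input(inp: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    errors = []
    # Required fields, as listed in the schema's "required" array.
    for field in ("content", "category", "importance"):
        if field not in inp:
            errors.append(f"missing required field: {field}")
    if "content" in inp and not isinstance(inp["content"], str):
        errors.append("content must be a string")
    if "category" in inp:
        category = inp["category"]
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            errors.append(f"unknown category: {category!r}")
    importance = inp.get("importance")
    if importance is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(importance, bool) or not isinstance(importance, int):
            errors.append("importance must be an integer")
        elif not 1 <= importance <= 5:
            errors.append("importance must be between 1 and 5")
    tags = inp.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        errors.append("tags must be a list of strings")
    expires_at = inp.get("expires_at")
    if expires_at is not None:
        try:
            # Note: Python versions before 3.11 reject a trailing "Z" here.
            datetime.fromisoformat(expires_at)
        except (TypeError, ValueError):
            errors.append("expires_at must be an ISO 8601 timestamp")
    return errors
```

Returning a list of problems rather than raising lets the caller send all of them back to Claude in a single tool_result, so the model can correct its input in one retry.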
25.3 Storage Backends
Vector Database: Semantic Retrieval
Vector databases transform each memory entry into a high-dimensional embedding and retrieve entries by cosine similarity. This means "user prefers async Python" and "user likes awaitable interfaces" will match the same query, even without shared keywords.
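To make the similarity ranking concrete, here is a small sketch that computes cosine similarity by hand. The three-dimensional vectors are made-up stand-ins for real embeddings (which have hundreds or thousands of dimensions), chosen only so that the two related sentences point in a similar direction.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; a real encoder would produce these.
memories = {
    "user prefers async Python": [0.9, 0.1, 0.0],
    "user likes awaitable interfaces": [0.8, 0.2, 0.1],
    "deploy target is AWS eu-west-1": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # stand-in for an embedded query about async style

# Rank stored memories from most to least similar to the query.
ranked = sorted(
    memories,
    key=lambda text: cosine_similarity(query_vector, memories[text]),
    reverse=True,
)
```

Both async-related entries score far above the unrelated deployment note, even though the two sentences share almost no keywords.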
Recommended options:
- Qdrant — Rust-native, excellent filtering, ideal for self-hosted deployments
- Chroma — Best Python ecosystem integration, great for development
- Pinecone — Fully managed cloud service, zero ops overhead
- pgvector — PostgreSQL extension, ideal for teams already running Postgres
import uuid
from datetime import datetime

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchAny, PointIdsList,
    PointStruct, Range, VectorParams,
)
from sentence_transformers import SentenceTransformer


class VectorMemoryBackend:
    """Qdrant-backed vector memory storage"""

    def __init__(self, collection_name: str = "agent_memory"):
        self.client = QdrantClient(host="localhost", port=6333)
        # BAAI/bge-m3 supports both English and Chinese
        self.encoder = SentenceTransformer("BAAI/bge-m3")
        self.collection = collection_name
        self._ensure_collection()

    def _ensure_collection(self):
        names = [c.name for c in self.client.get_collections().collections]
        if self.collection not in names:
            self.client.create_collection(
                collection_name=self.collection,
                # size must match the encoder's output dimension (1024 for bge-m3)
                vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
            )

    def store(self, content: str, category: str, tags: list[str],
              importance: int, expires_at: str | None = None) -> str:
        memory_id = str(uuid.uuid4())
        vector = self.encoder.encode(content).tolist()
        self.client.upsert(
            collection_name=self.collection,
            points=[PointStruct(
                id=memory_id,
                vector=vector,
                payload={
                    "content": content, "category": category,
                    "tags": tags, "importance": importance,
                    "created_at": datetime.utcnow().isoformat(),
                    "expires_at": expires_at
                }
            )]
        )
        return memory_id

    def retrieve(self, query: str, categories: list[str] | None = None,
                 limit: int = 10, min_importance: int = 1) -> list[dict]:
        query_vector = self.encoder.encode(query).tolist()
        # Build payload filters with the client's typed models
        must_conditions = []
        if categories:
            must_conditions.append(
                FieldCondition(key="category", match=MatchAny(any=categories)))
        if min_importance > 1:
            must_conditions.append(
                FieldCondition(key="importance", range=Range(gte=min_importance)))
        # Newer qdrant-client releases deprecate search() in favor of query_points()
        results = self.client.search(
            collection_name=self.collection,
            query_vector=query_vector,
            query_filter=Filter(must=must_conditions) if must_conditions else None,
            limit=limit,
            with_payload=True
        )
        # Expired entries are dropped after the search, so fewer than `limit`
        # results may come back. The string comparison is chronological only
        # if expires_at uses the same naive-UTC ISO format as `now`.
        now = datetime.utcnow().isoformat()
        return [
            {"id": str(r.id), "content": r.payload["content"],
             "category": r.payload["category"],
             "importance": r.payload["importance"],
             "score": r.score, "created_at": r.payload["created_at"]}
            for r in results
            if not r.payload.get("expires_at") or r.payload["expires_at"] > now
        ]

    def delete(self, memory_id: str, reason: str):
        self.client.delete(
            collection_name=self.collection,
            points_selector=PointIdsList(points=[memory_id])
        )
        print(f"[Memory Audit] Deleted {memory_id}: {reason}")
Key-Value Store: Structured Retrieval
For highly structured memories that require exact-match lookups (user settings, configuration flags, deterministic facts), a simple keyed store is more appropriate than a vector index. The example below uses a single SQLite table:
import json
import sqlite3
import uuid
from datetime import datetime


class KVMemoryBackend:
    def __init__(self, db_path: str = "memory.db"):
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id TEXT PRIMARY KEY,
                content TEXT NOT NULL,
                category TEXT NOT NULL,
                tags TEXT,
                importance INTEGER DEFAULT 3,
                created_at TEXT NOT NULL,
                expires_at TEXT,
                deleted_at TEXT
            )
        """)
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_cat ON memories(category)")
        self.conn.commit()

    def store(self, content: str, category: str, tags: list[str],
              importance: int, expires_at: str | None = None) -> str:
        mid = str(uuid.uuid4())
        # Values follow the column order of the CREATE TABLE statement;
        # tags are serialized to JSON, deleted_at starts as NULL
        self.conn.execute(
            "INSERT INTO memories VALUES (?,?,?,?,?,?,?,?)",
            (mid, content, category, json.dumps(tags), importance,
             datetime.utcnow().isoformat(), expires_at, None)
        )
        self.conn.commit()
        return mid
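The class above only shows the write path. As a rough sketch of the matching read and delete paths, the following self-contained example runs an exact-match lookup by category and a soft delete against the same table layout, using an in-memory SQLite database. The function names fetch_by_category and soft_delete are illustrative assumptions, as is the choice to mark rows through the deleted_at column instead of removing them.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id TEXT PRIMARY KEY, content TEXT NOT NULL, category TEXT NOT NULL,
    tags TEXT, importance INTEGER DEFAULT 3, created_at TEXT NOT NULL,
    expires_at TEXT, deleted_at TEXT
)
"""


def _now() -> str:
    # All timestamps share one UTC ISO format so string comparison is chronological.
    return datetime.now(timezone.utc).isoformat()


def store(conn, content: str, category: str, importance: int = 3, tags=None) -> str:
    mid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO memories (id, content, category, tags, importance, created_at)"
        " VALUES (?,?,?,?,?,?)",
        (mid, content, category, json.dumps(tags or []), importance, _now()),
    )
    conn.commit()
    return mid


def fetch_by_category(conn, category: str, min_importance: int = 1,
                      limit: int = 10) -> list[dict]:
    """Exact-match lookup that skips soft-deleted and expired rows."""
    rows = conn.execute(
        "SELECT id, content, importance FROM memories"
        " WHERE category = ? AND importance >= ? AND deleted_at IS NULL"
        " AND (expires_at IS NULL OR expires_at > ?)"
        " ORDER BY importance DESC, created_at DESC LIMIT ?",
        (category, min_importance, _now(), limit),
    ).fetchall()
    return [{"id": r[0], "content": r[1], "importance": r[2]} for r in rows]


def soft_delete(conn, memory_id: str) -> None:
    """Mark a row as deleted but keep it for auditing."""
    conn.execute("UPDATE memories SET deleted_at = ? WHERE id = ?", (_now(), memory_id))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
dark = store(conn, "UI theme: dark", "user_preference", importance=5)
store(conn, "Prefers metric units", "user_preference", importance=3)
soft_delete(conn, dark)
remaining = fetch_by_category(conn, "user_preference")
```

A soft delete keeps an audit trail, but it does not satisfy a request to erase data; a hard DELETE would be needed for that case.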
25.4 Full Integration with Claude API
import json

import anthropic


class MemoryEnabledAgent:
    def __init__(self, user_id: str):
        self.client = anthropic.Anthropic()
        self.memory = VectorMemoryBackend(f"memory_{user_id}")
        self.tools = self._define_tools()

    def _define_tools(self) -> list[dict]:
        # Returns the three tool definitions shown in section 25.2; the names
        # below are assumed to be module-level dicts holding that JSON
        return [memory_store_tool, memory_retrieve_tool, memory_delete_tool]

    def _execute_tool(self, name: str, inp: dict) -> str:
        if name == "memory_store":
            mid = self.memory.store(
                inp["content"], inp["category"],
                inp.get("tags", []), inp["importance"], inp.get("expires_at")
            )
            return json.dumps({"success": True, "memory_id": mid})
        elif name == "memory_retrieve":
            # Defaults match the memory_retrieve schema (limit 10, min_importance 1)
            results = self.memory.retrieve(
                inp["query"], inp.get("categories"),
                inp.get("limit", 10), inp.get("min_importance", 1)
            )
            return json.dumps({"memories": results})
        elif name == "memory_delete":
            self.memory.delete(inp["memory_id"], inp["reason"])
            return json.dumps({"success": True})
        return json.dumps({"error": f"Unknown tool: {name}"})

    def chat(self, user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]
        system = """You are a persistent-memory assistant.
At the start of each conversation:
1. Call memory_retrieve to find context relevant to the user's request
2. Incorporate retrieved memories into your response
3. Call memory_store when you discover important new facts
4. Call memory_delete when you find outdated or contradicted memories
Storage priorities:
- Explicit user preferences → importance 5
- Key project decisions → importance 4
- Useful background context → importance 3
- Transient details → do not store"""
        while True:
            response = self.client.messages.create(
                model="claude-opus-4-5",
                max_tokens=2048,
                system=system,
                tools=self.tools,
                messages=messages
            )
            if response.stop_reason == "tool_use":
                # Run every requested tool, then return all results in one user turn
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        result = self._execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result
                        })
                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": tool_results})
                continue
            # No more tool calls: return the first text block of the final answer
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return ""
25.5 Retrieval Strategies
Proactive Prefetch at Session Start
Rather than waiting for Claude to decide to retrieve memories, inject the most relevant ones directly into the system prompt:
# Method of MemoryEnabledAgent
def build_system_with_memories(self, user_message: str, base_system: str) -> str:
    memories = self.memory.retrieve(query=user_message, limit=5, min_importance=3)
    if not memories:
        return base_system
    mem_block = "\n".join(
        f"- [{m['category']}] {m['content']}" for m in memories
    )
    return base_system + f"\n\n## Relevant Memory\n{mem_block}"
Temporal Decay
Memories should become less influential over time. A half-life decay model prevents stale information from dominating:
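import math
from datetime import datetime


def effective_importance(importance: int, created_at: str,
                         half_life_days: float = 30.0) -> float:
    # created_at is the naive-UTC ISO string written by store()
    age = datetime.utcnow() - datetime.fromisoformat(created_at)
    days = age.total_seconds() / 86400  # fractional days, not truncated
    # ln(2) / half_life is the decay rate that halves the score every half_life_days
    decay = math.exp(-math.log(2) * days / half_life_days)
    return importance * decay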
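As a worked example of the half-life model, the sketch below evaluates importance × 2^(−age / half_life) for a few ages. It uses a hypothetical variant, decayed_importance, that takes the current time as an explicit argument so the numbers are reproducible; that parameter is an assumption for illustration only.

```python
import math
from datetime import datetime, timedelta


def decayed_importance(importance: int, created_at: datetime, now: datetime,
                       half_life_days: float = 30.0) -> float:
    """Importance scaled by exponential decay with the given half-life."""
    age_days = (now - created_at).total_seconds() / 86400
    return importance * math.exp(-math.log(2) * age_days / half_life_days)


now = datetime(2025, 1, 31)

# One half-life (30 days) halves the score: 4 -> 2.0
one_half_life = decayed_importance(4, now - timedelta(days=30), now)

# Two half-lives (60 days) quarter it: 5 -> 1.25
old_but_important = decayed_importance(5, now - timedelta(days=60), now)

# A one-day-old memory of importance 3 has barely decayed
fresh_but_minor = decayed_importance(3, now - timedelta(days=1), now)
```

With a 30-day half-life, a two-month-old importance-5 memory ranks below a day-old importance-3 one, which is the intended effect: recency can outweigh the original score.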
Conflict Detection
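Before storing a new memory, check whether a near-identical entry already exists. The code below treats a very high similarity score (above 0.92) as a sign that the new entry restates or updates an existing one, and replaces it. Similarity alone cannot distinguish an update from a contradiction, so this is a deduplication heuristic and not a full contradiction check:
# Method of MemoryEnabledAgent
def store_with_dedup(self, content: str, category: str, importance: int) -> str:
    existing = self.memory.retrieve(query=content, categories=[category], limit=3)
    # Results are ordered by similarity, so only the best match needs checking
    if existing and existing[0]["score"] > 0.92:
        self.memory.delete(existing[0]["id"], f"Superseded by: {content[:50]}")
    return self.memory.store(content, category, [], importance)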
25.6 Production Engineering Considerations
Capacity management: Storage and search cost grow with collection size. Implement a periodic pruning job that removes the entries with the lowest effective importance once a collection exceeds a threshold (e.g., 10,000 entries per user).
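A pruning pass of this kind can be sketched as a pure selection step that decides which IDs to delete. The helper name select_for_pruning and the entry shape (a dict with id and effective_importance keys) are assumptions for illustration; the actual deletion would go through the storage backend.

```python
def select_for_pruning(entries: list[dict], max_entries: int) -> list[str]:
    """Return the IDs of the lowest-scoring entries beyond max_entries."""
    overflow = len(entries) - max_entries
    if overflow <= 0:
        return []  # under the threshold, nothing to prune
    by_score = sorted(entries, key=lambda e: e["effective_importance"])
    return [e["id"] for e in by_score[:overflow]]


entries = [
    {"id": "a", "effective_importance": 4.2},
    {"id": "b", "effective_importance": 0.3},
    {"id": "c", "effective_importance": 2.5},
    {"id": "d", "effective_importance": 0.9},
]
# With room for only two entries, the two weakest are selected.
to_delete = select_for_pruning(entries, max_entries=2)
```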
User isolation: Each user's memories must be strictly namespaced. Per-user collection names or a user_id metadata filter keep one user's memories out of another user's results, provided every read and write path applies the namespace.
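One way to make the namespace hard to forget is to bind it once, when the store object is created, so that callers never pass a user_id per call. The sketch below shows the idea with a plain list standing in for a shared backend; the class name ScopedMemoryStore is hypothetical.

```python
class ScopedMemoryStore:
    """In-memory stand-in for a shared backend; all access is bound to one user_id."""

    def __init__(self, shared_rows: list[dict], user_id: str):
        self._rows = shared_rows
        self._user_id = user_id

    def store(self, content: str) -> None:
        # The user_id is attached here, never supplied by the caller.
        self._rows.append({"user_id": self._user_id, "content": content})

    def retrieve(self) -> list[str]:
        # Every read applies the same user_id filter.
        return [r["content"] for r in self._rows if r["user_id"] == self._user_id]


rows: list[dict] = []
alice = ScopedMemoryStore(rows, "alice")
bob = ScopedMemoryStore(rows, "bob")
alice.store("prefers dark mode")
bob.store("prefers light mode")
```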
Privacy compliance: Provide a "forget everything" endpoint so that erasure requests under GDPR Article 17 can be honored. Never store passwords, API keys, or payment credentials in the memory system, and encrypt memory content at rest.
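A simple guard in front of the store call can catch the most obvious credentials. The patterns below are illustrative assumptions (an sk- style key prefix, card-like digit runs, and password assignments); they are far from exhaustive and would need tuning for a real deployment.

```python
import re

# Hypothetical patterns; extend these for the credential formats you handle.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),    # API-key-like tokens
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # 13 to 16 digit card-like numbers
    re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),  # "password: ..." assignments
]


def contains_secret(text: str) -> bool:
    """True if any pattern matches, meaning the text should not be stored."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def safe_to_store(content: str) -> bool:
    return not contains_secret(content)
```

Pattern matching like this reduces accidental leaks but cannot prove a text is free of secrets, so it complements encryption at rest and does not replace it.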
Latency: Retrieval operations should be asynchronous and, where the request flow allows, run in parallel with the first Claude API call. As a rough guide, a well-tuned Qdrant instance can return top-10 results in under 10ms for collections under 1M entries, though actual figures depend on hardware and configuration.
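The overlap can be sketched with asyncio.gather. Both coroutines below are stubs that sleep in place of real I/O; fetch_memories and first_model_call are hypothetical names, and the example only applies when the first model call does not itself depend on the retrieved memories.

```python
import asyncio


async def fetch_memories(query: str) -> list[str]:
    # Stand-in for an async vector search.
    await asyncio.sleep(0.05)
    return [f"memory relevant to: {query}"]


async def first_model_call(user_message: str) -> str:
    # Stand-in for the first model request.
    await asyncio.sleep(0.05)
    return f"draft reply to: {user_message}"


async def handle_turn(user_message: str) -> dict:
    # Both awaitables run concurrently; total wait is about the slower of the two.
    memories, draft = await asyncio.gather(
        fetch_memories(user_message),
        first_model_call(user_message),
    )
    return {"memories": memories, "draft": draft}


result = asyncio.run(handle_turn("async preferences"))
```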
Summary
The Memory Tool transforms Claude from a single-session assistant into a long-term knowledge partner. Key takeaways:
- Define three tools with standard JSON Schema: memory_store, memory_retrieve, memory_delete
- Use vector databases for semantic retrieval; key-value stores for structured exact-match lookups
- Combine proactive prefetch (system prompt injection) with agent-driven retrieval for best results
- Apply temporal decay and conflict detection to keep memories fresh and consistent
- Address capacity limits, user isolation, and privacy compliance before production deployment
The next chapter explores Context Editing — how to surgically inject, modify, and control the information Claude sees within a single session.