Chapter 16: Building AI Agents with Claude API — From Tool Use to Autonomous Task Execution
An Agent is not a smarter chatbot — it's a different execution model entirely. You give it a goal; it autonomously plans steps, calls tools, reads results, decides next actions, and loops until done. This chapter covers the complete Tool Use API mechanics, implements a runnable code-review Agent, explains Memory management and safety boundaries, and delivers a 200-line production-ready Agent framework you can fork and deploy today.
Chapter goals: Master the complete Claude Tool Use call loop; implement a runnable code-review Agent independently; understand Agent Memory and safety boundary design; take away a forkable Agent framework you can use immediately.
Agent vs Ordinary LLM Call: The Core Difference
| Dimension | Ordinary LLM Call | Agent |
|---|---|---|
| Interaction | You ask → AI answers → done (one round) | You give a goal → AI loops: plan + execute → done |
| Tool use | None | Reads files, runs commands, writes output, etc. |
| Iterations | 1 | N, until task is complete or limit is hit |
| State | Carried only in the messages list | Tool call history + optional external Memory |
| Best for | Q&A, generation, translation | Multi-step automation, code analysis, data pipelines |
| Examples | Chat conversation | Claude Code, GitHub Copilot Workspace |
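Key mechanism: Claude signals intent via `stop_reason`. `"end_turn"` means Claude has finished its reply and considers the task complete. `"tool_use"` means it wants to call a tool: you execute it, return the result, and let Claude continue. Any other value, such as `"max_tokens"` (the reply hit the token limit), must end the loop or be handled explicitly. That loop is everything an Agent is.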
Complete Tool Use Implementation (Runnable Code)
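The following implements a three-tool Agent with tool definitions, execution logic, and the call loop. It runs as-is once the `anthropic` package is installed and the `ANTHROPIC_API_KEY` environment variable is set. Note that `run_command` executes arbitrary shell commands with `shell=True`, so try it in a disposable checkout first: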
```python
import subprocess
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return output",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string"}
            },
            "required": ["command"]
        }
    }
]


def execute_tool(name: str, inputs: dict) -> str:
    """Run one tool call and always return a string for the tool_result block."""
    if name == "read_file":
        try:
            return Path(inputs["path"]).read_text()[:5000]  # cap large files
        except FileNotFoundError:
            return f"Error: File not found: {inputs['path']}"
    elif name == "write_file":
        Path(inputs["path"]).write_text(inputs["content"])
        return f"Wrote {len(inputs['content'])} chars to {inputs['path']}"
    elif name == "run_command":
        try:
            result = subprocess.run(
                inputs["command"], shell=True,
                capture_output=True, text=True, timeout=30
            )
        except subprocess.TimeoutExpired:
            return "Error: command timed out after 30s"
        # Truncate so a noisy command cannot flood the context window
        return (result.stdout + result.stderr)[:5000] or "(no output)"
    return f"Error: unknown tool: {name}"


def run_agent(task: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # The final reply may hold several blocks; keep only the text
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason != "tool_use":
            # e.g. "max_tokens": stop instead of re-sending the same request
            return f"Stopped early: stop_reason={response.stop_reason}"
        # 1) Echo Claude's reply (with its tool_use blocks) into the history first...
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"  → calling: {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        # 2) ...then answer every tool_use block in a single user message
        messages.append({"role": "user", "content": tool_results})
    return "Max turns reached, task incomplete"


if __name__ == "__main__":
    result = run_agent(
        "Scan all Python files in src/, find functions without type annotations, "
        "write a report to type_report.md"
    )
    print(result)
```
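Critical ordering: You must append the assistant's reply (containing the `tool_use` block) to `messages` before appending the `tool_result`. If the order is reversed, or the assistant message is missing, the API rejects the request: every `tool_result` must reference, by `tool_use_id`, a `tool_use` block in the immediately preceding assistant message.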
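To make the ordering rule concrete, here is a minimal sketch of the message history after one tool round-trip, written as plain dicts (the API accepts dicts as well as the SDK's block objects). The tool id `toolu_example_1`, the file name, and the `check_tool_pairing` helper are hypothetical illustrations, not part of the Anthropic SDK; the helper simply encodes the pairing rule so you can assert it in your own tests.

```python
# Message history after one tool round-trip, as plain dicts.
# "toolu_example_1" is an illustrative id; real ids come from the API response.
messages = [
    {"role": "user", "content": "How many lines are in setup.py?"},
    {   # 1) Claude's reply, echoed back verbatim (contains the tool_use block)
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I'll count the lines."},
            {"type": "tool_use", "id": "toolu_example_1",
             "name": "run_command", "input": {"command": "wc -l setup.py"}},
        ],
    },
    {   # 2) Your reply: one tool_result per tool_use, matched by id
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_example_1",
             "content": "42 setup.py"},
        ],
    },
]


def _blocks(msg):
    """Content blocks of a message; plain-string content (or no message) has none."""
    return msg["content"] if msg and isinstance(msg["content"], list) else []


def check_tool_pairing(history: list[dict]) -> list[str]:
    """Hypothetical helper: report tool_use/tool_result ordering problems."""
    problems = []
    for i, msg in enumerate(history):
        prev = history[i - 1] if i > 0 else None
        nxt = history[i + 1] if i + 1 < len(history) else None
        if msg["role"] == "assistant":
            # Every tool_use must be answered in the very next message
            asked = {b["id"] for b in _blocks(msg) if b.get("type") == "tool_use"}
            answered = {b["tool_use_id"] for b in _blocks(nxt)
                        if b.get("type") == "tool_result"}
            for tid in sorted(asked - answered):
                problems.append(f"message {i}: tool_use {tid} has no tool_result next")
        else:
            # Every tool_result must match a tool_use in the preceding assistant message
            asked = {b["id"] for b in _blocks(prev)
                     if prev["role"] == "assistant" and b.get("type") == "tool_use"}
            for b in _blocks(msg):
                if b.get("type") == "tool_result" and b["tool_use_id"] not in asked:
                    problems.append(
                        f"message {i}: tool_result {b['tool_use_id']} has no matching tool_use"
                    )
    return problems


print(check_tool_pairing(messages))  # []
# Swapping the last two messages breaks both sides of the pairing
print(check_tool_pairing([messages[0], messages[2], messages[1]]))  # 2 problems
```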
Practice: Code Review Agent
This section builds a code-review Agent that autonomously fetches the changed files, reads each diff, identifies issues, and writes a structured report. It relies on three tools: `get_changed_files`, `read_file_diff`, and `create_review_report`.
The system prompt specifies exact steps in order — without this, the Agent may skip files or generate the report before reading all diffs. Constrain the review to security, bugs, and performance only; skip style and naming to keep signal-to-noise high.
Measured performance: On an 8-file PR, this Agent makes ~12 tool calls and completes in ~40 seconds, reliably catching SQL string concatenation and missing try/except blocks that humans often miss on first pass.
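The three tools and the step-ordered system prompt can be sketched as follows. Only the tool names come from this chapter; the prompt wording, the JSON schemas, the `git diff <base>...HEAD` commands, the default base branch `main`, and the `review_report.md` output path are assumptions for illustration, not the author's exact implementation.

```python
import subprocess
from pathlib import Path

# Illustrative prompt: the fixed order of steps matters; the wording is an assumption.
REVIEW_SYSTEM_PROMPT = """You are a code reviewer. Work in exactly this order:
1. Call get_changed_files once to list every changed file.
2. Call read_file_diff for EACH listed file. Do not skip any file.
3. Only after every diff has been read, call create_review_report exactly once.
Report only security issues, bugs, and performance problems. Ignore style and naming."""

REVIEW_TOOLS = [
    {
        "name": "get_changed_files",
        "description": "List paths of files changed on this branch versus a base branch. Call this first.",
        "input_schema": {
            "type": "object",
            "properties": {"base": {"type": "string", "description": "Base branch, default main"}},
            "required": [],
        },
    },
    {
        "name": "read_file_diff",
        "description": "Return the unified diff of ONE changed file against the base branch.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "A path returned by get_changed_files"},
                "base": {"type": "string"},
            },
            "required": ["path"],
        },
    },
    {
        "name": "create_review_report",
        "description": "Write the final review report. Call once, after all diffs are read.",
        "input_schema": {
            "type": "object",
            "properties": {
                "findings": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "file": {"type": "string"},
                            "severity": {"type": "string", "enum": ["high", "medium", "low"]},
                            "category": {"type": "string", "enum": ["security", "bug", "performance"]},
                            "message": {"type": "string"},
                        },
                        "required": ["file", "severity", "category", "message"],
                    },
                },
            },
            "required": ["findings"],
        },
    },
]


def get_changed_files(base: str = "main") -> str:
    r = subprocess.run(["git", "diff", "--name-only", f"{base}...HEAD"],
                       capture_output=True, text=True)
    return r.stdout or r.stderr or "(no changed files)"


def read_file_diff(path: str, base: str = "main") -> str:
    r = subprocess.run(["git", "diff", f"{base}...HEAD", "--", path],
                       capture_output=True, text=True)
    return (r.stdout or r.stderr or "(empty diff)")[:5000]  # truncate large diffs


def create_review_report(findings: list[dict], out: str = "review_report.md") -> str:
    lines = ["# Code Review Report", ""]
    if not findings:
        lines.append("No security, bug, or performance issues found.")
    for item in findings:
        lines.append(f"- [{item['severity']}] {item['category']} in `{item['file']}`: {item['message']}")
    Path(out).write_text("\n".join(lines) + "\n")
    return f"Report written to {out} ({len(findings)} finding(s))"
```

Making `findings` a typed array (rather than free text) is what keeps the report structured: Claude has to fill in file, severity, and category for every issue, and the enums keep it inside the security/bug/performance scope.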
Memory Management
| Type | Implementation | Lifetime | Use case |
|---|---|---|---|
| Short-term | messages list | Current run only | Tool call history, intermediate results |
| Long-term | SQLite / Redis | Across runs | Project conventions, past findings, user preferences |
Add a remember tool to your tool list. The handler writes key-value pairs to a JSON file or SQLite DB. At the start of each run, load the stored facts and inject them into the system prompt — the Agent carries project knowledge across sessions without re-discovering it every time.
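A minimal sketch of that `remember` tool, assuming a JSON file as the store. `MEMORY_FILE`, `load_memory`, and `build_system_prompt` are hypothetical names chosen for this example; the schema wording is also an assumption.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical location

# Tool spec to add to the tools list passed to messages.create
REMEMBER_TOOL = {
    "name": "remember",
    "description": ("Store a durable fact about this project (a convention, a recurring "
                    "finding, a user preference) so future runs can reuse it."),
    "input_schema": {
        "type": "object",
        "properties": {
            "key": {"type": "string", "description": "Short stable identifier, e.g. test_command"},
            "value": {"type": "string", "description": "The fact to remember"},
        },
        "required": ["key", "value"],
    },
}


def load_memory(path: Path = MEMORY_FILE) -> dict:
    """Return all stored facts, or an empty dict on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def remember(key: str, value: str, path: Path = MEMORY_FILE) -> str:
    """Tool handler: upsert one key-value pair and persist it."""
    facts = load_memory(path)
    facts[key] = value
    path.write_text(json.dumps(facts, indent=2, ensure_ascii=False))
    return f"Remembered {key}"


def build_system_prompt(base: str, path: Path = MEMORY_FILE) -> str:
    """Called once at the start of each run to inject stored facts."""
    facts = load_memory(path)
    if not facts:
        return base
    lines = "\n".join(f"- {k}: {v}" for k, v in sorted(facts.items()))
    return f"{base}\n\nKnown project information:\n{lines}"
```

Dispatch `remember` tool calls to the handler like any other tool; because the system prompt is rebuilt at each run start, facts stored in one session show up in the next.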
Safety Boundaries: Which Tools to Give and Which to Withhold
| Tool type | Risk | Recommendation |
|---|---|---|
| Read file, search code | Low | Give freely; restrict to project directory |
| Write file | Medium | Give; exclude .env and key files from allowed paths |
| Execute shell command | High | Withhold or restrict to allowlist (git, pytest, etc.) |
| Delete file | Extreme | Never give to the Agent; perform deletions manually |
| Database write | High | Give read-only access; route any write through Human-in-the-loop approval |
| Send email / message | High | Always require human confirmation; these actions are irreversible and should not run unattended |
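The "restrict to project directory", "exclude key files", and "allowlist" recommendations in the table can be sketched as two guard functions. The constants below (project root taken from the current directory, the blocked names and suffixes, the `git`/`pytest` allowlist) are illustrative assumptions, and `safe_path`, `check_command`, and `run_allowed` are hypothetical helpers. Note that this allowlist checks only the program name, so `git push` would still pass; a stricter version would also check subcommands.

```python
import shlex
import subprocess
from pathlib import Path

PROJECT_ROOT = Path.cwd().resolve()       # assumption: the agent runs from the repo root
WRITE_BLOCKED_NAMES = {".env"}            # illustrative deny list for write_file
WRITE_BLOCKED_SUFFIXES = {".pem", ".key"}
ALLOWED_COMMANDS = {"git", "pytest"}      # illustrative allowlist for run_command
SHELL_OPERATORS = set(";|&$`<>\n")


def safe_path(raw: str, root: Path = PROJECT_ROOT, for_write: bool = False) -> Path:
    """Resolve a model-supplied path and refuse anything outside the project."""
    root = root.resolve()
    p = (root / raw).resolve()      # collapses "..", follows symlinks; absolute raw replaces root
    if not p.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path outside project: {raw}")
    if for_write and (p.name in WRITE_BLOCKED_NAMES or p.suffix in WRITE_BLOCKED_SUFFIXES):
        raise PermissionError(f"writes to {p.name} are blocked")
    return p


def check_command(cmd: str) -> list[str]:
    """Allow only simple commands whose program is on the allowlist."""
    if any(ch in SHELL_OPERATORS for ch in cmd):
        raise PermissionError("shell operators are not allowed")
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not on allowlist: {argv[0] if argv else '(empty)'}")
    return argv


def run_allowed(cmd: str) -> str:
    try:
        argv = check_command(cmd)
    except PermissionError as exc:
        return f"Rejected: {exc}"  # returned to Claude as the tool_result text
    # No shell=True: argv goes straight to the program, so no shell expansion happens
    r = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return (r.stdout + r.stderr)[:3000]
```

Returning the rejection as text, rather than raising, lets Claude see why the call failed and choose a permitted alternative.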
Implement a `dangerous=True` flag on Tool objects. Before executing any dangerous tool, print the tool name and inputs, prompt the user to approve, and proceed only on an explicit "y". As long as every irreversible tool carries the flag, this one pattern ensures none of them runs without a human seeing it first.
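The same gate can also be written as a decorator that wraps any tool handler, as a standalone alternative to the inline check in `Agent._run_tool`. This is a sketch: `require_approval` and `send_message` are hypothetical names, and the `ask` parameter (defaulting to the built-in `input`) is an assumption added so the gate can be tested without a terminal.

```python
import functools
import json
from typing import Callable


def require_approval(tool_name: str, ask: Callable[[str], str] = input):
    """Decorator: run the wrapped tool handler only after an explicit 'y'."""
    def decorator(handler: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(handler)
        def wrapper(**inputs) -> str:
            print(f"[!] Agent wants to run: {tool_name}")
            print(f"    inputs: {json.dumps(inputs, ensure_ascii=False)}")
            # Anything but "y" (including just pressing Enter) counts as "no"
            if ask("    Approve? [y/N] ").strip().lower() != "y":
                return "Rejected by user."  # sent back to Claude as the tool_result
            return handler(**inputs)
        return wrapper
    return decorator


@require_approval("send_message")
def send_message(to: str, body: str) -> str:
    """Hypothetical irreversible tool."""
    return f"sent {len(body)} chars to {to}"
```

Default-deny is the important design choice here: an empty or mistyped answer rejects the call, so an inattentive user cannot approve an action by accident.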
200-Line Production Agent Framework
"""
agent.py — Production-ready Claude Agent framework
Features: tool call loop / Human-in-the-loop / long-term memory / logging
"""
import json
import logging
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Callable
import anthropic
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s"
)
log = logging.getLogger("agent")
@dataclass
class Tool:
name: str
description: str
input_schema: dict
handler: Callable
dangerous: bool = False # True = requires human approval
def api_spec(self) -> dict:
return {
"name": self.name,
"description": self.description,
"input_schema": self.input_schema
}
class Memory:
"""SQLite long-term memory"""
def __init__(self, path: str = "agent_memory.db"):
self.conn = sqlite3.connect(path)
self.conn.execute(
"CREATE TABLE IF NOT EXISTS mem "
"(agent TEXT, key TEXT, val TEXT, ts TEXT, PRIMARY KEY(agent,key))"
)
self.conn.commit()
def set(self, agent: str, key: str, val: Any):
self.conn.execute(
"INSERT OR REPLACE INTO mem VALUES (?,?,?,?)",
(agent, key, json.dumps(val, ensure_ascii=False), datetime.now().isoformat())
)
self.conn.commit()
def get(self, agent: str, key: str, default=None) -> Any:
row = self.conn.execute(
"SELECT val FROM mem WHERE agent=? AND key=?", (agent, key)
).fetchone()
return json.loads(row[0]) if row else default
def all(self, agent: str) -> dict:
rows = self.conn.execute(
"SELECT key, val FROM mem WHERE agent=?", (agent,)
).fetchall()
return {r[0]: json.loads(r[1]) for r in rows}
@dataclass
class AgentConfig:
model: str = "claude-sonnet-4-6"
max_tokens: int = 4096
max_turns: int = 30
system: str = ""
use_memory: bool = False
memory_db: str = "agent_memory.db"
auto_approve: bool = False # Keep False in production
class Agent:
def __init__(self, agent_id: str, tools: list[Tool], config: AgentConfig = None):
self.id = agent_id
self.tools = {t.name: t for t in tools}
self.cfg = config or AgentConfig()
self.client = anthropic.Anthropic()
self.memory = Memory(self.cfg.memory_db) if self.cfg.use_memory else None
self._call_log: list[dict] = []
def _run_tool(self, name: str, inputs: dict) -> str:
tool = self.tools.get(name)
if not tool:
return f"ERROR: unknown tool '{name}'"
if tool.dangerous and not self.cfg.auto_approve:
print(f"\n[!] Agent [{self.id}] wants to run: {name}")
print(f" inputs: {json.dumps(inputs, ensure_ascii=False)}")
ans = input(" Approve? [y/N] ").strip().lower()
if ans != "y":
return "Rejected by user."
try:
result = tool.handler(**inputs)
self._call_log.append({"tool": name, "ok": True})
log.info(f"tool OK: {name}")
return str(result)
except Exception as exc:
self._call_log.append({"tool": name, "ok": False, "err": str(exc)})
log.error(f"tool FAIL: {name} -> {exc}")
return f"ERROR: {exc}"
def _build_system(self) -> str:
sys_prompt = self.cfg.system
if self.memory:
mem = self.memory.all(self.id)
if mem:
facts = "\n".join(f"- {k}: {v}" for k, v in mem.items())
sys_prompt += f"\n\nKnown project information:\n{facts}"
return sys_prompt
def run(self, task: str) -> str:
log.info(f"Agent [{self.id}] start: {task[:60]}...")
self._call_log.clear()
system = self._build_system()
messages = [{"role": "user", "content": task}]
api_tools = [t.api_spec() for t in self.tools.values()]
for turn in range(self.cfg.max_turns):
log.info(f"turn {turn + 1}/{self.cfg.max_turns}")
kwargs = dict(
model=self.cfg.model,
max_tokens=self.cfg.max_tokens,
tools=api_tools,
messages=messages
)
if system:
kwargs["system"] = system
resp = self.client.messages.create(**kwargs)
messages.append({"role": "assistant", "content": resp.content})
if resp.stop_reason == "end_turn":
final = next(
(b.text for b in resp.content if hasattr(b, "text")), ""
)
log.info(f"done in {len(self._call_log)} tool calls")
if self.memory:
self.memory.set(self.id, "_last_task", {
"task": task[:200],
"at": datetime.now().isoformat(),
"calls": len(self._call_log)
})
return final
if resp.stop_reason == "tool_use":
results = []
for blk in resp.content:
if blk.type == "tool_use":
res = self._run_tool(blk.name, blk.input)
results.append({
"type": "tool_result",
"tool_use_id": blk.id,
"content": res
})
messages.append({"role": "user", "content": results})
else:
break
log.warning("max turns reached")
return "Task incomplete: max turns reached"
# ── Usage example ──────────────────────────────────────────
if __name__ == "__main__":
import subprocess as sp
def _read(path: str) -> str:
return Path(path).read_text()[:4000]
def _git_diff(base: str = "main") -> str:
r = sp.run(["git", "diff", f"{base}...HEAD"], capture_output=True, text=True)
return r.stdout[:5000]
def _shell(cmd: str) -> str:
r = sp.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
return (r.stdout + r.stderr)[:3000]
agent = Agent(
agent_id="code-reviewer",
tools=[
Tool("read_file", "Read a file",
{"type": "object",
"properties": {"path": {"type": "string"}},
"required": ["path"]},
_read),
Tool("git_diff", "Get git diff",
{"type": "object",
"properties": {"base": {"type": "string"}},
"required": []},
_git_diff),
Tool("run_shell", "Execute a shell command",
{"type": "object",
"properties": {"cmd": {"type": "string"}},
"required": ["cmd"]},
_shell,
dangerous=True),
],
config=AgentConfig(
system="You are a strict code reviewer focused on security and performance.",
use_memory=True,
auto_approve=False
)
)
result = agent.run("Review all code changes in this PR and output a structured review report.")
print(result)
Framework highlights: Roughly 200 lines, yet it includes the full tool call loop, Human-in-the-loop approval, SQLite long-term memory, logging, and max-turn protection. Instantiate `Agent` with your own tool list and it runs immediately; subclass it only when you need to change behavior such as how `_build_system` assembles the system prompt.
Chapter Key Points
- An Agent is a loop: Check `stop_reason`. `tool_use` means execute the tools and return the results; `end_turn` means done. That loop is the entire Agent mechanism.
- Tool descriptions determine accuracy: Precise, specific text in each tool's `description` field is the single biggest factor in whether Claude calls the right tool with the right arguments.
- Always truncate tool output: Cap tool return values at 3000–5000 characters. Long outputs burn through the context window fast and cause Agents to fail mid-task.
- Dangerous tools need human confirmation: One `input()` call before executing an irreversible operation is the simplest and most effective safety measure you can add.
- SQLite is sufficient for long-term memory: Store project conventions and past findings in SQLite and inject them into the system prompt at the start of each run. Python ships `sqlite3` in its standard library, so no extra dependencies are needed.
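The truncation rule is easy to centralize. A minimal sketch, assuming a 4000-character default (inside the 3000–5000 range above) and a head-plus-tail strategy; `truncate_output` is a hypothetical helper, and keeping the tail is a design choice based on the fact that tracebacks and test summaries usually print last.

```python
def truncate_output(text: str, limit: int = 4000) -> str:
    """Keep at most `limit` original characters: half from the head, half from the tail.

    A marker in the middle tells Claude how much was cut, so it can ask
    for a narrower read (a line range, a grep) instead of assuming it saw everything.
    """
    if len(text) <= limit:
        return text
    head = limit // 2
    tail = limit - head
    omitted = len(text) - limit
    # Slice the tail by absolute index so tail == 0 yields "" rather than the whole text
    return text[:head] + f"\n...[{omitted} chars truncated]...\n" + text[len(text) - tail:]
```

Applying it once, for example to the return value of `execute_tool` or `Agent._run_tool`, covers every tool without touching the individual handlers.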
Next chapter: team workflows — how to standardize .cursorrules across a team, control API costs, and integrate AI Review into your CI pipeline.