Chapter 16

Building AI Agents with Claude API — From Tool Use to Autonomous Execution

An Agent is not a smarter chatbot — it's a different execution model entirely. You give it a goal; it autonomously plans steps, calls tools, reads results, decides next actions, and loops until done. This chapter covers the complete Tool Use API mechanics, implements a runnable code-review Agent, explains Memory management and safety boundaries, and delivers a 200-line production-ready Agent framework you can fork and deploy today.

Chapter goals: Master the complete Claude Tool Use call loop; implement a runnable code-review Agent independently; understand Agent Memory and safety boundary design; take away a forkable Agent framework you can use immediately.

Agent vs Ordinary LLM Call: The Core Difference

Dimension | Ordinary LLM Call | Agent
Interaction | You ask → AI answers → done (one round) | You give a goal → AI loops: plan + execute → done
Tool use | None | Reads files, runs commands, writes output, etc.
Iterations | 1 | N, until task is complete or limit is hit
State | Carried only in the messages list | Tool call history + optional external Memory
Best for | Q&A, generation, translation | Multi-step automation, code analysis, data pipelines
Examples | Chat conversation | Claude Code, GitHub Copilot Workspace

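Key mechanism: Claude signals what it wants next through the response's stop_reason field. "end_turn" means Claude has finished its turn, which in an Agent loop means the final answer is ready. "tool_use" means it wants to call a tool: you execute it, return the result, and let Claude continue. Any other value, such as "max_tokens" (the output was cut off), should be handled as its own case rather than treated as completion. That loop is everything an Agent is.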

Complete Tool Use Implementation (Runnable Code)

The following implements a three-tool Agent with tool definitions, execution logic, and the call loop. It is copy-paste runnable once the anthropic package is installed and the ANTHROPIC_API_KEY environment variable is set:

import anthropic
import subprocess
from pathlib import Path

client = anthropic.Anthropic()

tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return output",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string"}
            },
            "required": ["command"]
        }
    }
]

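def execute_tool(name: str, inputs: dict) -> str:
    """Run one tool and always return a string; failures are reported to Claude as text."""
    try:
        if name == "read_file":
            return Path(inputs["path"]).read_text()
        if name == "write_file":
            Path(inputs["path"]).write_text(inputs["content"])
            return f"Wrote {len(inputs['content'])} chars to {inputs['path']}"
        if name == "run_command":
            # shell=True runs arbitrary commands; see the safety section before using it outside a sandbox
            result = subprocess.run(
                inputs["command"], shell=True,
                capture_output=True, text=True, timeout=30
            )
            # Truncate so long output does not flood the context window
            return (result.stdout + result.stderr)[:5000]
        return f"Error: unknown tool: {name}"
    except subprocess.TimeoutExpired:
        return "Error: command timed out after 30 seconds"
    except (OSError, UnicodeDecodeError) as exc:
        # Covers missing files, permission errors, and binary files
        return f"Error: {exc}"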

def run_agent(task: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": task}]

    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

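        if response.stop_reason == "end_turn":
            # The final reply may contain several blocks; join the text ones
            return "".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason != "tool_use":
            # e.g. "max_tokens": stop instead of re-sending the same messages every turn
            return f"Stopped early: stop_reason={response.stop_reason}"

        # Append the assistant turn first, then the matching tool results
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"  → calling: {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max turns reached, task incomplete"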

result = run_agent(
    "Scan all Python files in src/, find functions without type annotations, "
    "write a report to type_report.md"
)
print(result)

Critical ordering: Append the assistant's reply (containing the tool_use blocks) to messages before appending the tool_result message. Each tool_result must reference, by tool_use_id, a tool_use block in the assistant message immediately before it. If the order is wrong or a result is missing, the API rejects the request.
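To make the ordering rule concrete, here is a minimal sketch of a local check you could run on a messages list before sending it. It assumes the messages are plain dicts (the SDK returns objects, so real code would convert them first), the helper name check_tool_pairing is hypothetical, and the id "toolu_01" is made up for illustration. It only checks that each tool_result has a matching tool_use in the previous assistant message, not the reverse direction.

```python
def check_tool_pairing(messages: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every tool_result is paired."""
    problems = []
    for i, msg in enumerate(messages):
        # Only user messages with block lists can carry tool_result blocks
        if msg["role"] != "user" or isinstance(msg["content"], str):
            continue
        result_ids = [
            b["tool_use_id"] for b in msg["content"] if b.get("type") == "tool_result"
        ]
        if not result_ids:
            continue
        # Collect the tool_use ids requested by the message directly before this one
        prev = messages[i - 1] if i > 0 else None
        requested = set()
        if prev and prev["role"] == "assistant" and not isinstance(prev["content"], str):
            requested = {b["id"] for b in prev["content"] if b.get("type") == "tool_use"}
        for rid in result_ids:
            if rid not in requested:
                problems.append(
                    f"message {i}: tool_result {rid} has no matching tool_use "
                    "in the previous assistant message"
                )
    return problems


# Illustrative conversation: one tool round in the correct order
good = [
    {"role": "user", "content": "Read a.py"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll read the file."},
        {"type": "tool_use", "id": "toolu_01", "name": "read_file",
         "input": {"path": "a.py"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "print('hi')"},
    ]},
]
# Same messages with the tool_result placed before the assistant turn
bad = [good[0], good[2], good[1]]

print(check_tool_pairing(good))
print(check_tool_pairing(bad))
```

A check like this is cheap to run in tests and tends to surface ordering mistakes earlier than an API error would.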

Practice: Code Review Agent

A code-review Agent that autonomously fetches changed files, reads each diff, identifies issues, and writes a structured report. The three key tools are get_changed_files, read_file_diff, and create_review_report.

The system prompt specifies exact steps in order — without this, the Agent may skip files or generate the report before reading all diffs. Constrain the review to security, bugs, and performance only; skip style and naming to keep signal-to-noise high.
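One possible shape for those three tools and the ordered system prompt is sketched below. Only the tool names come from this chapter; the schemas, the prompt wording, the base-branch parameter, and the report layout are assumptions made for illustration, not the original implementation.

```python
import subprocess
from pathlib import Path

# Hypothetical system prompt: numbered steps so no file is skipped
REVIEW_SYSTEM = """You are a code reviewer. Follow these steps in order:
1. Call get_changed_files once to list the files changed in this PR.
2. Call read_file_diff for every file in that list. Do not skip any.
3. Only after all diffs are read, call create_review_report exactly once.
Report security, bug, and performance issues only. Ignore style and naming."""


def get_changed_files(base: str = "main") -> str:
    """List changed file paths relative to the base branch."""
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...HEAD"],
                         capture_output=True, text=True, timeout=30)
    return out.stdout or out.stderr


def read_file_diff(path: str, base: str = "main") -> str:
    """Return the diff for one file, truncated to protect the context window."""
    out = subprocess.run(["git", "diff", f"{base}...HEAD", "--", path],
                         capture_output=True, text=True, timeout=30)
    return (out.stdout or out.stderr)[:5000]


def create_review_report(findings: list[dict], path: str = "review_report.md") -> str:
    """Write findings as a Markdown list, one line per finding."""
    lines = ["# Code Review Report", ""]
    for f in findings:
        lines.append(f"- [{f['severity']}] {f['file']}: {f['issue']}")
    Path(path).write_text("\n".join(lines) + "\n")
    return f"Wrote {len(findings)} findings to {path}"


REVIEW_TOOLS = [
    {
        "name": "get_changed_files",
        "description": "List paths of files changed between the base branch and HEAD.",
        "input_schema": {"type": "object",
                         "properties": {"base": {"type": "string"}},
                         "required": []},
    },
    {
        "name": "read_file_diff",
        "description": "Return the diff of one changed file against the base branch.",
        "input_schema": {"type": "object",
                         "properties": {"path": {"type": "string"},
                                        "base": {"type": "string"}},
                         "required": ["path"]},
    },
    {
        "name": "create_review_report",
        "description": "Write the final report. Call once, after every diff has been read.",
        "input_schema": {
            "type": "object",
            "properties": {
                "findings": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "file": {"type": "string"},
                            "severity": {"type": "string",
                                         "enum": ["high", "medium", "low"]},
                            "issue": {"type": "string"},
                        },
                        "required": ["file", "severity", "issue"],
                    },
                }
            },
            "required": ["findings"],
        },
    },
]
```

Making the report a structured array rather than free text is a design choice: it nudges the model to tie each issue to a file and severity, which keeps the output easy to post-process.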

Measured performance: On an 8-file PR, this Agent makes ~12 tool calls and completes in ~40 seconds, reliably catching SQL string concatenation and missing try/except blocks that humans often miss on first pass.

Memory Management

Type | Implementation | Lifetime | Use case
Short-term | messages list | Current run only | Tool call history, intermediate results
Long-term | SQLite / Redis | Across runs | Project conventions, past findings, user preferences

Add a remember tool to your tool list. The handler writes key-value pairs to a JSON file or SQLite DB. At the start of each run, load the stored facts and inject them into the system prompt — the Agent carries project knowledge across sessions without re-discovering it every time.
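A minimal sketch of that remember tool, using the JSON-file variant, might look like the following. The file name agent_facts.json, the helper names, and the schema are assumptions for illustration; the "Known project information" heading mirrors the one used by the framework later in this chapter.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_facts.json")  # hypothetical location

# Tool definition to add to the tools list passed to the API
remember_tool = {
    "name": "remember",
    "description": "Store a durable fact about this project for future runs.",
    "input_schema": {
        "type": "object",
        "properties": {
            "key": {"type": "string", "description": "Short identifier"},
            "value": {"type": "string", "description": "The fact to store"},
        },
        "required": ["key", "value"],
    },
}


def load_facts(path: Path = MEMORY_FILE) -> dict:
    """Read all stored facts; an absent file means no facts yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def remember(key: str, value: str, path: Path = MEMORY_FILE) -> str:
    """Handler for the remember tool: upsert one key-value pair."""
    facts = load_facts(path)
    facts[key] = value
    path.write_text(json.dumps(facts, indent=2, ensure_ascii=False))
    return f"Remembered {key}"


def build_system_prompt(base: str, path: Path = MEMORY_FILE) -> str:
    """Append stored facts to the base system prompt at the start of a run."""
    facts = load_facts(path)
    if not facts:
        return base
    lines = "\n".join(f"- {k}: {v}" for k, v in facts.items())
    return f"{base}\n\nKnown project information:\n{lines}"
```

A JSON file is enough for a single process; with concurrent Agents, a database such as SQLite would be the safer choice because whole-file rewrites can race.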

Safety Boundaries: Which Tools to Give and Which to Withhold

Tool type | Risk | Recommendation
Read file, search code | Low | Give freely; restrict to project directory
Write file | Medium | Give; exclude .env and key files from allowed paths
Execute shell command | High | Withhold or restrict to allowlist (git, pytest, etc.)
Delete file | Extreme | Never give; deletions stay a manual, human-confirmed step
Database write | High | Read-only access only; writes trigger Human-in-the-loop
Send email / message | High | Always require human confirmation; irreversible actions should not be automated

Implement a dangerous=True flag on Tool objects. Before executing any dangerous tool, print the name and inputs, prompt the user to approve, and only proceed on explicit "y". This single gate ensures that no tool marked dangerous runs before a human has seen its exact inputs.
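The path and allowlist restrictions from the table above can be sketched as follows. The names safe_path, run_allowlisted, ALLOWED_COMMANDS, and BLOCKED_NAMES are hypothetical, and the check is deliberately coarse: it only looks at the program name, so an allowlisted program can still do harm through its own arguments. Treat it as a starting point under those assumptions.

```python
import shlex
import subprocess
from pathlib import Path

PROJECT_ROOT = Path.cwd()            # assumption: the Agent runs from the project root
ALLOWED_COMMANDS = {"git", "pytest"}  # programs the Agent may invoke
BLOCKED_NAMES = {".env"}             # extend with key and credential files


def safe_path(path: str, root: Path = PROJECT_ROOT) -> Path:
    """Resolve a path and refuse anything outside root or on the blocked list."""
    root = root.resolve()
    resolved = (root / path).resolve()
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"{path} is outside the project directory")
    if resolved.name in BLOCKED_NAMES:
        raise PermissionError(f"{path} is on the blocked list")
    return resolved


def run_allowlisted(command: str) -> str:
    """Run a command only if its program name is allowlisted; never use a shell."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return f"Error: command not allowed: {command}"
    # Passing a list with shell=False means ';' and '&&' are not interpreted
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return (result.stdout + result.stderr)[:5000]
```

File handlers would call safe_path before reading or writing, and the shell tool would call run_allowlisted instead of passing the raw string to a shell.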

200-Line Production Agent Framework

"""
agent.py — Production-ready Claude Agent framework
Features: tool call loop / Human-in-the-loop / long-term memory / logging
"""
import json
import logging
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any, Callable

import anthropic

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)
log = logging.getLogger("agent")


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict
    handler: Callable
    dangerous: bool = False  # True = requires human approval

    def api_spec(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema
        }


class Memory:
    """SQLite long-term memory"""
    def __init__(self, path: str = "agent_memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS mem "
            "(agent TEXT, key TEXT, val TEXT, ts TEXT, PRIMARY KEY(agent,key))"
        )
        self.conn.commit()

    def set(self, agent: str, key: str, val: Any):
        self.conn.execute(
            "INSERT OR REPLACE INTO mem VALUES (?,?,?,?)",
            (agent, key, json.dumps(val, ensure_ascii=False), datetime.now().isoformat())
        )
        self.conn.commit()

    def get(self, agent: str, key: str, default=None) -> Any:
        row = self.conn.execute(
            "SELECT val FROM mem WHERE agent=? AND key=?", (agent, key)
        ).fetchone()
        return json.loads(row[0]) if row else default

    def all(self, agent: str) -> dict:
        rows = self.conn.execute(
            "SELECT key, val FROM mem WHERE agent=?", (agent,)
        ).fetchall()
        return {r[0]: json.loads(r[1]) for r in rows}


@dataclass
class AgentConfig:
    model: str = "claude-sonnet-4-6"
    max_tokens: int = 4096
    max_turns: int = 30
    system: str = ""
    use_memory: bool = False
    memory_db: str = "agent_memory.db"
    auto_approve: bool = False  # Keep False in production


class Agent:
    def __init__(self, agent_id: str, tools: list[Tool], config: AgentConfig = None):
        self.id = agent_id
        self.tools = {t.name: t for t in tools}
        self.cfg = config or AgentConfig()
        self.client = anthropic.Anthropic()
        self.memory = Memory(self.cfg.memory_db) if self.cfg.use_memory else None
        self._call_log: list[dict] = []

    def _run_tool(self, name: str, inputs: dict) -> str:
        tool = self.tools.get(name)
        if not tool:
            return f"ERROR: unknown tool '{name}'"

        if tool.dangerous and not self.cfg.auto_approve:
            print(f"\n[!] Agent [{self.id}] wants to run: {name}")
            print(f"    inputs: {json.dumps(inputs, ensure_ascii=False)}")
            ans = input("    Approve? [y/N] ").strip().lower()
            if ans != "y":
                return "Rejected by user."

        try:
            result = tool.handler(**inputs)
            self._call_log.append({"tool": name, "ok": True})
            log.info(f"tool OK: {name}")
            return str(result)
        except Exception as exc:
            self._call_log.append({"tool": name, "ok": False, "err": str(exc)})
            log.error(f"tool FAIL: {name} -> {exc}")
            return f"ERROR: {exc}"

    def _build_system(self) -> str:
        sys_prompt = self.cfg.system
        if self.memory:
            mem = self.memory.all(self.id)
            if mem:
                facts = "\n".join(f"- {k}: {v}" for k, v in mem.items())
                sys_prompt += f"\n\nKnown project information:\n{facts}"
        return sys_prompt

    def run(self, task: str) -> str:
        log.info(f"Agent [{self.id}] start: {task[:60]}...")
        self._call_log.clear()

        system = self._build_system()
        messages = [{"role": "user", "content": task}]
        api_tools = [t.api_spec() for t in self.tools.values()]

        for turn in range(self.cfg.max_turns):
            log.info(f"turn {turn + 1}/{self.cfg.max_turns}")
            kwargs = dict(
                model=self.cfg.model,
                max_tokens=self.cfg.max_tokens,
                tools=api_tools,
                messages=messages
            )
            if system:
                kwargs["system"] = system

            resp = self.client.messages.create(**kwargs)
            messages.append({"role": "assistant", "content": resp.content})

            if resp.stop_reason == "end_turn":
                final = next(
                    (b.text for b in resp.content if hasattr(b, "text")), ""
                )
                log.info(f"done in {len(self._call_log)} tool calls")
                if self.memory:
                    self.memory.set(self.id, "_last_task", {
                        "task": task[:200],
                        "at": datetime.now().isoformat(),
                        "calls": len(self._call_log)
                    })
                return final

            if resp.stop_reason == "tool_use":
                results = []
                for blk in resp.content:
                    if blk.type == "tool_use":
                        res = self._run_tool(blk.name, blk.input)
                        results.append({
                            "type": "tool_result",
                            "tool_use_id": blk.id,
                            "content": res
                        })
                messages.append({"role": "user", "content": results})
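            else:
                # Any other stop_reason (e.g. "max_tokens"): stop and report it accurately
                log.warning(f"stopped early: stop_reason={resp.stop_reason}")
                return f"Task incomplete: stop_reason={resp.stop_reason}"

        log.warning("max turns reached")
        return "Task incomplete: max turns reached"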


# ── Usage example ──────────────────────────────────────────
if __name__ == "__main__":
    import subprocess as sp

    def _read(path: str) -> str:
        return Path(path).read_text()[:4000]

    def _git_diff(base: str = "main") -> str:
        r = sp.run(["git", "diff", f"{base}...HEAD"], capture_output=True, text=True)
        return r.stdout[:5000]

    def _shell(cmd: str) -> str:
        r = sp.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
        return (r.stdout + r.stderr)[:3000]

    agent = Agent(
        agent_id="code-reviewer",
        tools=[
            Tool("read_file", "Read a file",
                 {"type": "object",
                  "properties": {"path": {"type": "string"}},
                  "required": ["path"]},
                 _read),
            Tool("git_diff", "Get git diff",
                 {"type": "object",
                  "properties": {"base": {"type": "string"}},
                  "required": []},
                 _git_diff),
            Tool("run_shell", "Execute a shell command",
                 {"type": "object",
                  "properties": {"cmd": {"type": "string"}},
                  "required": ["cmd"]},
                 _shell,
                 dangerous=True),
        ],
        config=AgentConfig(
            system="You are a strict code reviewer focused on security and performance.",
            use_memory=True,
            auto_approve=False
        )
    )

    result = agent.run("Review all code changes in this PR and output a structured review report.")
    print(result)

Framework highlights: Under 200 lines, yet includes the full tool call loop, Human-in-the-loop approval, SQLite long-term memory, logging of every tool call, and max-turn protection. Instantiate Agent with your tool list and an AgentConfig, and it runs immediately; no subclassing is needed.

Chapter Key Points

  1. An Agent is a loop: Check stop_reason. tool_use means execute and return results; end_turn means done. That loop is the entire Agent mechanism.
  2. Tool descriptions determine accuracy: Precise, specific descriptions in each tool's description field are the single biggest factor in whether Claude calls the right tool with the right arguments.
  3. Always truncate tool output: Cap tool return values at 3000–5000 characters. Long outputs burn context window fast and cause Agents to fail mid-task.
  4. Dangerous tools need human confirmation: One input() call before executing irreversible operations is the simplest and most effective safety measure you can add.
  5. SQLite is sufficient for long-term memory: Store project conventions and past findings in SQLite, inject them into the system prompt at run start. No extra dependencies needed.
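As an illustration of point 3, here is a small sketch of a truncation helper that keeps both the start and the end of long output, since error messages often appear at the end. The name truncate_output, the 4000-character default, and the two-thirds split are assumptions, not values from the framework above. Note that the returned string is slightly longer than the limit because of the marker line.

```python
def truncate_output(text: str, limit: int = 4000) -> str:
    """Keep the head and tail of long tool output and mark what was dropped."""
    if len(text) <= limit:
        return text
    head = limit * 2 // 3          # most of the budget goes to the beginning
    tail = limit - head            # the rest keeps the end of the output
    omitted = len(text) - limit
    return (
        f"{text[:head]}"
        f"\n... [{omitted} characters omitted] ...\n"
        f"{text[len(text) - tail:]}"
    )
```

Telling the model how much was omitted lets it decide whether to re-run the tool with a narrower query instead of reasoning over silently missing data.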

Next chapter: team workflows — how to standardize .cursorrules across a team, control API costs, and integrate AI Review into your CI pipeline.
