Chapter 50

Building AI Agents: Tool Use and Multi-Turn Dialogue

L1: Concept - What Is an AI Agent and Why Tool Use Is the Enabling Technology

From Automation Scripts to Autonomous Agents

Throughout the history of software engineering, automation has been the primary lever for improving efficiency. But traditional automation has a fundamental limitation: it can only execute pre-written, fixed logic. When the external environment changes or unexpected situations arise, traditional scripts fail or produce incorrect results.

AI Agents represent a paradigm shift. Unlike fixed scripts, an Agent can:

  1. Perceive: Receive inputs from the external world (user messages, file contents, API responses, sensor data)
  2. Reason: Based on perceived information combined with internal knowledge, formulate a plan of action
  3. Act: Execute tool calls, write files, send requests, interact with external systems
  4. Observe: Watch the results of actions and incorporate them into the next round of reasoning

This "Perceive → Reason → Act → Observe" loop is the essence of an Agent. It closely mirrors how a human engineer handles complex tasks: you receive a requirement, think about how to approach it, implement a solution, examine the results, and adjust.

Why Tool Use Is Essential

Large Language Models (LLMs) are powerful reasoning engines, but they have several intrinsic limitations:

Knowledge cutoff: Claude's training data has a cutoff date. It doesn't know what happened in today's news and cannot tell you the current stock price.

No side effects: An LLM can only generate text. It cannot actually send an email, read a file from your disk, or call your company's internal API.

Limited computational precision: For exact mathematical calculations, like computing 1234567 × 9876543, an LLM may make errors. It excels at reasoning, not precise computation.

Tool Use is the technology that resolves these limitations. Through tool calls, an LLM can retrieve current information, take actions in external systems, and hand exact computation to code.

Tool Use transforms an LLM from a "consultant who only talks" into an "engineer who can actually get things done." This is a qualitative change, not merely quantitative.
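As a minimal sketch of how a tool closes the precision gap, the handler below multiplies two integers exactly with Go's math/big package. The tool name multiply, its input shape, and the choice to pass operands as decimal strings are assumptions made for illustration; they are not part of any built-in tool set.

```go
package main

import (
    "encoding/json"
    "fmt"
    "math/big"
)

// multiplyInput is a hypothetical input shape for an illustrative "multiply" tool.
// Operands are decimal strings so that large values survive JSON encoding.
type multiplyInput struct {
    A string `json:"a"`
    B string `json:"b"`
}

// multiplyHandler computes a*b exactly using arbitrary-precision integers.
func multiplyHandler(input json.RawMessage) (string, error) {
    var in multiplyInput
    if err := json.Unmarshal(input, &in); err != nil {
        return "", fmt.Errorf("invalid input: %w", err)
    }
    a, ok := new(big.Int).SetString(in.A, 10)
    if !ok {
        return "", fmt.Errorf("a is not a decimal integer: %q", in.A)
    }
    b, ok := new(big.Int).SetString(in.B, 10)
    if !ok {
        return "", fmt.Errorf("b is not a decimal integer: %q", in.B)
    }
    return new(big.Int).Mul(a, b).String(), nil
}

func main() {
    out, err := multiplyHandler(json.RawMessage(`{"a":"1234567","b":"9876543"}`))
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(out) // 12193254061881
}
```

The model only has to decide that the calculation is needed and supply the operands; the arithmetic itself is done by ordinary code.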

Agent Application Scenarios

Having understood the essence of Agents, let's examine some typical applications:

Research Assistant: Given a research topic, the Agent automatically searches relevant papers, reads key articles, synthesizes information, and generates a structured report. This process may require dozens of tool calls.

Code Assistant: Understands codebase structure, locates bugs, writes fixes, and reruns the tests until they pass. Every step requires autonomous Agent decision-making.

Data Analysis Agent: Accepts data analysis requirements described in natural language, writes SQL queries, runs them against the database, visualizes the results, and composes an analysis report.

Customer Service Agent: Understands customer issues, queries the order system, processes refund requests, and sends confirmation emails, handling the entire workflow without human intervention.


L2: Principles - The Anthropic Tool Use Protocol and Conversation Management

Tool Use Protocol in Detail

Anthropic's Tool Use protocol is a carefully designed JSON specification that defines how LLMs interact with external tools. Understanding every detail of this protocol is foundational to building robust Agents.

Tool Definition

Each tool is defined by three core fields:

{
  "name": "search_web",
  "description": "Search the web for current information. Use this when you need up-to-date information that may not be in your training data. Returns a list of search results with titles, URLs, and snippets.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to execute"
      },
      "num_results": {
        "type": "integer",
        "description": "Number of results to return (default: 5, max: 20)",
        "default": 5
      }
    },
    "required": ["query"]
  }
}

The name field is the tool's unique identifier. Names should be clear, descriptive, and LLM-friendly. Avoid abbreviations or obscure names.

The description field is critically important: it directly influences when and how the LLM uses this tool. A good description, like the one above, states what the tool does, when to use it, and what it returns.

The input_schema is a standard JSON Schema defining the parameters the tool accepts. Each field's description is equally important: the LLM relies on these descriptions to populate arguments correctly.
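Because the schema travels with the tool definition, the same document can be used on your side to sanity-check arguments before a handler runs. The sketch below looks only at the schema's required list; the helper name missingRequired is hypothetical, and a production system would more likely use a full JSON Schema validator.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// schema captures only the part of a JSON Schema that this sketch inspects.
type schema struct {
    Required []string `json:"required"`
}

// missingRequired returns the names listed under "required" in inputSchema
// that are absent from the tool input object.
func missingRequired(inputSchema, input json.RawMessage) ([]string, error) {
    var s schema
    if err := json.Unmarshal(inputSchema, &s); err != nil {
        return nil, fmt.Errorf("parse schema: %w", err)
    }
    var args map[string]json.RawMessage
    if err := json.Unmarshal(input, &args); err != nil {
        return nil, fmt.Errorf("parse input: %w", err)
    }
    var missing []string
    for _, name := range s.Required {
        if _, ok := args[name]; !ok {
            missing = append(missing, name)
        }
    }
    return missing, nil
}

func main() {
    searchSchema := json.RawMessage(`{
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "num_results": {"type": "integer"}
        },
        "required": ["query"]
    }`)
    missing, err := missingRequired(searchSchema, json.RawMessage(`{"num_results": 3}`))
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(missing) // [query]
}
```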

The tool_use Content Block

When Claude decides to call a tool, it returns a tool_use content block in its response:

{
  "type": "tool_use",
  "id": "toolu_01A09q90qw90lq917835lq9",
  "name": "search_web",
  "input": {
    "query": "Go programming language generics tutorial 2024",
    "num_results": 5
  }
}

The id field is a unique identifier for this tool call; you need it when returning the tool result. A single response may contain multiple tool_use blocks, representing parallel tool calls. Executing them concurrently is an important way to reduce latency.
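A response's content is an array that can mix text blocks with one or more tool_use blocks. As a rough sketch, the function below decodes such an array and keeps only the tool calls. The struct and function names, and the ids in the sample payload, are illustrative.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// block holds the fields of a content block that matter for tool calls.
type block struct {
    Type  string          `json:"type"`
    Text  string          `json:"text,omitempty"`
    ID    string          `json:"id,omitempty"`
    Name  string          `json:"name,omitempty"`
    Input json.RawMessage `json:"input,omitempty"`
}

// toolCalls decodes a content array and returns its tool_use blocks in order.
func toolCalls(content []byte) ([]block, error) {
    var blocks []block
    if err := json.Unmarshal(content, &blocks); err != nil {
        return nil, fmt.Errorf("decode content: %w", err)
    }
    var calls []block
    for _, b := range blocks {
        if b.Type == "tool_use" {
            calls = append(calls, b)
        }
    }
    return calls, nil
}

func main() {
    content := []byte(`[
        {"type": "text", "text": "I'll run two searches."},
        {"type": "tool_use", "id": "toolu_A", "name": "search_web", "input": {"query": "Go generics"}},
        {"type": "tool_use", "id": "toolu_B", "name": "search_web", "input": {"query": "Go type parameters"}}
    ]`)
    calls, err := toolCalls(content)
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    for _, c := range calls {
        fmt.Printf("%s calls %s with %s\n", c.ID, c.Name, c.Input)
    }
}
```

Keeping Input as json.RawMessage defers decoding to the individual tool, which is the only code that knows the expected shape.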

Tool Result Responses

After executing a tool, you return the result to Claude in a specific format:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "Search results:\n1. Go Generics Tutorial - go.dev/doc/tutorial/generics\n..."
    }
  ]
}

Note: tool_result is sent with the user role, because it is information returned to Claude from "the external world."

Error Handling

When tool execution fails, return a result with is_error: true:

{
  "type": "tool_result",
  "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
  "is_error": true,
  "content": "Error: Network timeout after 30 seconds"
}

Claude will decide how to proceed based on the error information: it may retry, try a different approach, or report the error to the user.
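The success and failure shapes differ only in is_error and the content text, so one small helper can produce both. The following is a sketch with illustrative names (resultFor, errTimeout) rather than a fixed API; it builds a user message that carries one successful and one failed result.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
)

// toolResult mirrors the tool_result block shown above.
type toolResult struct {
    Type      string `json:"type"`
    ToolUseID string `json:"tool_use_id"`
    Content   string `json:"content"`
    IsError   bool   `json:"is_error,omitempty"`
}

// userMessage wraps one or more tool results in a user-role message.
type userMessage struct {
    Role    string       `json:"role"`
    Content []toolResult `json:"content"`
}

// errTimeout is a stand-in failure used by the example.
var errTimeout = errors.New("network timeout after 30 seconds")

// resultFor converts a tool's (output, error) pair into a tool_result block.
func resultFor(toolUseID, output string, err error) toolResult {
    r := toolResult{Type: "tool_result", ToolUseID: toolUseID, Content: output}
    if err != nil {
        r.IsError = true
        r.Content = "Error: " + err.Error()
    }
    return r
}

func main() {
    msg := userMessage{
        Role: "user",
        Content: []toolResult{
            resultFor("toolu_01", "3 results found", nil),
            resultFor("toolu_02", "", errTimeout),
        },
    }
    out, err := json.MarshalIndent(msg, "", "  ")
    if err != nil {
        fmt.Println("marshal:", err)
        return
    }
    fmt.Println(string(out))
}
```

Reporting the failure as data, instead of aborting the run, leaves the recovery decision to the model.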

Multi-Turn Conversation Management

The heart of an Agent is maintaining conversation history. The Messages API is stateless, so every request must include the complete history for Claude to understand the current context.

The structure of conversation history:

[System Prompt]
  |
[User Message 1]
[Claude Response 1] (may contain tool_use)
[Tool Result 1]    (sent as user message)
[Claude Response 2] (may contain more tool_use or final answer)
[Tool Result 2]
...
[Claude Final Response]

This structure has several important constraints:

  1. Messages must strictly alternate: user → assistant → user → assistant...
  2. If an assistant message contains tool_use, the next user message must contain the corresponding tool_result
  3. A single user message can contain multiple tool_result blocks (for parallel calls)
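Constraints 2 and 3 are easy to violate when history is trimmed or edited, so it can help to check them before sending a request. The sketch below uses simplified message types and a hypothetical helper, validateToolPairs, which reports the first tool_use that has no matching tool_result in the following user message.

```go
package main

import "fmt"

// block is a simplified content block.
type block struct {
    Type      string // "text", "tool_use", or "tool_result"
    ID        string // set on tool_use blocks
    ToolUseID string // set on tool_result blocks
}

type message struct {
    Role    string
    Content []block
}

// validateToolPairs checks that every tool_use in an assistant message is
// answered by a tool_result in the user message that immediately follows it.
func validateToolPairs(history []message) error {
    for i, msg := range history {
        if msg.Role != "assistant" {
            continue
        }
        var pending []string
        for _, b := range msg.Content {
            if b.Type == "tool_use" {
                pending = append(pending, b.ID)
            }
        }
        if len(pending) == 0 {
            continue
        }
        if i+1 >= len(history) || history[i+1].Role != "user" {
            return fmt.Errorf("message %d: tool_use not followed by a user message", i)
        }
        answered := make(map[string]bool)
        for _, b := range history[i+1].Content {
            if b.Type == "tool_result" {
                answered[b.ToolUseID] = true
            }
        }
        for _, id := range pending {
            if !answered[id] {
                return fmt.Errorf("message %d: no tool_result for tool_use %s", i, id)
            }
        }
    }
    return nil
}

func main() {
    history := []message{
        {Role: "user", Content: []block{{Type: "text"}}},
        {Role: "assistant", Content: []block{
            {Type: "tool_use", ID: "toolu_A"},
            {Type: "tool_use", ID: "toolu_B"},
        }},
        // Only one of the two parallel calls is answered here.
        {Role: "user", Content: []block{{Type: "tool_result", ToolUseID: "toolu_A"}}},
    }
    fmt.Println(validateToolPairs(history)) // message 1: no tool_result for tool_use toolu_B
}
```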

Agent Main Loop Design

The Agent's main loop is the core control flow of the entire system:

Initialize (set system prompt, tool list)
↓
Receive user input
↓
[Main loop begins]
Send request to Claude API
↓
Receive response
↓
If response contains tool_use:
  Execute all tools (potentially in parallel)
  Add results to history
  Continue main loop
↓
If response is end_turn (final answer):
  Return answer to user
  Wait for next user input
[Main loop ends]

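Loop termination conditions: in the implementation below, the loop ends when Claude returns a final answer (stop_reason is end_turn, or the response contains no tool_use block), when an API call fails, or when the iteration limit is reached.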
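To see the control flow in isolation, here is a minimal sketch of the loop with the model and the tools replaced by plain functions. The step type, runLoop, and the stubbed model are assumptions for illustration; no API call is made.

```go
package main

import (
    "errors"
    "fmt"
)

// step is what one (stubbed) model call returns.
type step struct {
    StopReason string   // "tool_use" or "end_turn"
    ToolNames  []string // tools requested in this turn
    Text       string   // final answer when StopReason is "end_turn"
}

var errMaxIterations = errors.New("exceeded maximum iterations")

// runLoop calls the model until it produces a final answer, returns an error,
// or the iteration limit is reached. Tool outputs are fed back as observations.
func runLoop(
    model func(observations []string) (step, error),
    runTool func(name string) string,
    maxIterations int,
) (string, error) {
    var observations []string
    for i := 0; i < maxIterations; i++ {
        s, err := model(observations)
        if err != nil {
            return "", fmt.Errorf("iteration %d: %w", i, err)
        }
        if s.StopReason == "end_turn" || len(s.ToolNames) == 0 {
            return s.Text, nil
        }
        for _, name := range s.ToolNames {
            observations = append(observations, runTool(name))
        }
    }
    return "", errMaxIterations
}

func main() {
    // Stub: ask for a search until two observations exist, then answer.
    model := func(observations []string) (step, error) {
        if len(observations) < 2 {
            return step{StopReason: "tool_use", ToolNames: []string{"search_web"}}, nil
        }
        text := fmt.Sprintf("answer based on %d observations", len(observations))
        return step{StopReason: "end_turn", Text: text}, nil
    }
    runTool := func(name string) string { return "result of " + name }

    answer, err := runLoop(model, runTool, 20)
    fmt.Println(answer, err) // answer based on 2 observations <nil>

    _, err = runLoop(model, runTool, 1)
    fmt.Println(err) // exceeded maximum iterations
}
```

The iteration cap is the only guard against a model that keeps requesting tools, which is why it is checked independently of the stop reason.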


L3: Code Practice - Building a Complete Go Agent

Project Structure

agent/
├── main.go          # Entry point
├── agent.go         # Core agent logic
├── tools.go         # Tool definitions and implementations
├── client.go        # Anthropic API client
└── history.go       # Conversation history management

Client Wrapper

// client.go
package agent

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

const (
    AnthropicAPIURL = "https://api.anthropic.com/v1/messages"
    DefaultModel    = "claude-opus-4-5"
    MaxTokens       = 4096
)

type Client struct {
    apiKey     string
    httpClient *http.Client
    model      string
}

func NewClient() *Client {
    return &Client{
        apiKey:     os.Getenv("ANTHROPIC_API_KEY"),
        httpClient: &http.Client{},
        model:      DefaultModel,
    }
}

// MessageRequest represents the complete request sent to Claude
type MessageRequest struct {
    Model     string      `json:"model"`
    MaxTokens int         `json:"max_tokens"`
    System    string      `json:"system,omitempty"`
    Messages  []Message   `json:"messages"`
    Tools     []ToolDef   `json:"tools,omitempty"`
}

// Message represents a single message in the dialogue
type Message struct {
    Role    string         `json:"role"`
    Content []ContentBlock `json:"content"`
}

// ContentBlock can be text, a tool call, or a tool result
type ContentBlock struct {
    Type      string          `json:"type"`
    Text      string          `json:"text,omitempty"`
    ID        string          `json:"id,omitempty"`
    Name      string          `json:"name,omitempty"`
    Input     json.RawMessage `json:"input,omitempty"`
    ToolUseID string          `json:"tool_use_id,omitempty"`
    Content   string          `json:"content,omitempty"`
    IsError   bool            `json:"is_error,omitempty"`
}

// MessageResponse is the response from the Claude API
type MessageResponse struct {
    ID         string         `json:"id"`
    Type       string         `json:"type"`
    Role       string         `json:"role"`
    Content    []ContentBlock `json:"content"`
    Model      string         `json:"model"`
    StopReason string         `json:"stop_reason"`
    Usage      Usage          `json:"usage"`
}

type Usage struct {
    InputTokens  int `json:"input_tokens"`
    OutputTokens int `json:"output_tokens"`
}

// ToolDef defines a single tool
type ToolDef struct {
    Name        string          `json:"name"`
    Description string          `json:"description"`
    InputSchema json.RawMessage `json:"input_schema"`
}

func (c *Client) SendMessage(ctx context.Context, req MessageRequest) (*MessageResponse, error) {
    body, err := json.Marshal(req)
    if err != nil {
        return nil, fmt.Errorf("marshal request: %w", err)
    }

    httpReq, err := http.NewRequestWithContext(ctx, "POST", AnthropicAPIURL, bytes.NewReader(body))
    if err != nil {
        return nil, fmt.Errorf("create request: %w", err)
    }

    httpReq.Header.Set("Content-Type", "application/json")
    httpReq.Header.Set("x-api-key", c.apiKey)
    httpReq.Header.Set("anthropic-version", "2023-06-01")

    resp, err := c.httpClient.Do(httpReq)
    if err != nil {
        return nil, fmt.Errorf("send request: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        var errResp struct {
            Error struct {
                Message string `json:"message"`
            } `json:"error"`
        }
        // Best effort: if the error body is not valid JSON, Message stays empty.
        _ = json.NewDecoder(resp.Body).Decode(&errResp)
        return nil, fmt.Errorf("API error %d: %s", resp.StatusCode, errResp.Error.Message)
    }

    var msgResp MessageResponse
    if err := json.NewDecoder(resp.Body).Decode(&msgResp); err != nil {
        return nil, fmt.Errorf("decode response: %w", err)
    }

    return &msgResp, nil
}

Tool Definitions and Implementations

// tools.go
package agent

import (
    "context"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
    "os"
    "strings"
    "time"
)

// ToolHandler is the type for tool execution functions
type ToolHandler func(ctx context.Context, input json.RawMessage) (string, error)

// ToolRegistry manages all tools
type ToolRegistry struct {
    defs     []ToolDef
    handlers map[string]ToolHandler
}

func NewToolRegistry() *ToolRegistry {
    return &ToolRegistry{
        handlers: make(map[string]ToolHandler),
    }
}

func (r *ToolRegistry) Register(def ToolDef, handler ToolHandler) {
    r.defs = append(r.defs, def)
    r.handlers[def.Name] = handler
}

func (r *ToolRegistry) Execute(ctx context.Context, name string, input json.RawMessage) (string, error) {
    handler, ok := r.handlers[name]
    if !ok {
        return "", fmt.Errorf("unknown tool: %s", name)
    }
    return handler(ctx, input)
}

func (r *ToolRegistry) Definitions() []ToolDef {
    return r.defs
}

// --- Tool Implementations ---

// SearchWebInput is the input for the search_web tool
type SearchWebInput struct {
    Query      string `json:"query"`
    NumResults int    `json:"num_results"`
}

// SearchWebHandler implements web search using the DuckDuckGo Instant Answer API
func SearchWebHandler(ctx context.Context, input json.RawMessage) (string, error) {
    var req SearchWebInput
    if err := json.Unmarshal(input, &req); err != nil {
        return "", fmt.Errorf("invalid input: %w", err)
    }
    if req.NumResults == 0 {
        req.NumResults = 5
    }

    apiURL := fmt.Sprintf("https://api.duckduckgo.com/?q=%s&format=json&no_html=1&skip_disambig=1",
        url.QueryEscape(req.Query))

    httpReq, err := http.NewRequestWithContext(ctx, "GET", apiURL, nil)
    if err != nil {
        return "", err
    }
    httpReq.Header.Set("User-Agent", "GoAgent/1.0")

    client := &http.Client{Timeout: 10 * time.Second}
    resp, err := client.Do(httpReq)
    if err != nil {
        return "", fmt.Errorf("search request failed: %w", err)
    }
    defer resp.Body.Close()

    var result struct {
        Abstract       string `json:"Abstract"`
        AbstractSource string `json:"AbstractSource"`
        AbstractURL    string `json:"AbstractURL"`
        RelatedTopics  []struct {
            Text     string `json:"Text"`
            FirstURL string `json:"FirstURL"`
        } `json:"RelatedTopics"`
    }

    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return "", fmt.Errorf("decode response: %w", err)
    }

    var sb strings.Builder
    sb.WriteString(fmt.Sprintf("Search results for: %s\n\n", req.Query))

    if result.Abstract != "" {
        sb.WriteString(fmt.Sprintf("Summary: %s\nSource: %s\nURL: %s\n\n",
            result.Abstract, result.AbstractSource, result.AbstractURL))
    }

    count := 0
    for _, topic := range result.RelatedTopics {
        if count >= req.NumResults {
            break
        }
        if topic.Text != "" {
            sb.WriteString(fmt.Sprintf("%d. %s\n   URL: %s\n\n", count+1, topic.Text, topic.FirstURL))
            count++
        }
    }

    // The header line is always written, so test the results themselves.
    if result.Abstract == "" && count == 0 {
        return fmt.Sprintf("No results found for query: %s", req.Query), nil
    }
    return sb.String(), nil
}

// ReadFileInput is the input for the read_file tool
type ReadFileInput struct {
    Path     string `json:"path"`
    MaxLines int    `json:"max_lines"`
}

// ReadFileHandler reads a local file
func ReadFileHandler(ctx context.Context, input json.RawMessage) (string, error) {
    var req ReadFileInput
    if err := json.Unmarshal(input, &req); err != nil {
        return "", fmt.Errorf("invalid input: %w", err)
    }
    if req.MaxLines == 0 {
        req.MaxLines = 500
    }

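    // Minimal security check: rejects any path containing "..". It does not
    // confine reads to a directory, so absolute paths anywhere on disk are still allowed.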
    if strings.Contains(req.Path, "..") {
        return "", fmt.Errorf("path traversal not allowed")
    }

    f, err := os.Open(req.Path)
    if err != nil {
        return "", fmt.Errorf("open file: %w", err)
    }
    defer f.Close()

    content, err := io.ReadAll(io.LimitReader(f, 1<<20)) // read at most 1MB
    if err != nil {
        return "", fmt.Errorf("read file: %w", err)
    }

    lines := strings.Split(string(content), "\n")
    if len(lines) > req.MaxLines {
        lines = lines[:req.MaxLines]
        lines = append(lines, fmt.Sprintf("\n... (truncated, showing first %d lines)", req.MaxLines))
    }

    return strings.Join(lines, "\n"), nil
}
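The substring check in ReadFileHandler blocks ".." but still accepts any absolute path. One stricter option, sketched here under the assumption that the agent should only read inside a single workspace directory, is to resolve the requested path against that root and reject anything that lands outside it. The function name resolveWithin and the /srv/workspace root are hypothetical, and the check is purely lexical: it does not resolve symlinks.

```go
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// resolveWithin joins a relative path onto root and rejects results that
// escape root. It works on the path text only and does not follow symlinks.
func resolveWithin(root, requested string) (string, error) {
    if filepath.IsAbs(requested) {
        return "", fmt.Errorf("absolute paths are not allowed: %q", requested)
    }
    full := filepath.Join(root, requested) // Join also cleans "." and ".." segments
    rel, err := filepath.Rel(root, full)
    if err != nil {
        return "", fmt.Errorf("resolve path: %w", err)
    }
    if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
        return "", fmt.Errorf("path escapes root: %q", requested)
    }
    return full, nil
}

func main() {
    root := "/srv/workspace" // hypothetical workspace directory
    for _, p := range []string{"docs/readme.md", "docs/../notes.txt", "../etc/passwd", "/etc/passwd"} {
        full, err := resolveWithin(root, p)
        if err != nil {
            fmt.Println("rejected:", err)
            continue
        }
        fmt.Println("allowed: ", full)
    }
}
```

With this approach the tool schema would describe the path as relative to the workspace instead of absolute.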

// BuildDefaultRegistry creates a tool registry with the default tool set
func BuildDefaultRegistry() *ToolRegistry {
    registry := NewToolRegistry()

    registry.Register(ToolDef{
        Name:        "search_web",
        Description: "Search the web for current information. Use when you need up-to-date facts, news, or technical documentation not in your training data.",
        InputSchema: json.RawMessage(`{
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
                "num_results": {"type": "integer", "description": "Number of results (default: 5, max: 20)"}
            },
            "required": ["query"]
        }`),
    }, SearchWebHandler)

    registry.Register(ToolDef{
        Name:        "read_file",
        Description: "Read the contents of a local file. Use to access code files, configuration, or documents on disk.",
        InputSchema: json.RawMessage(`{
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Absolute path to the file"},
                "max_lines": {"type": "integer", "description": "Maximum lines to read (default: 500)"}
            },
            "required": ["path"]
        }`),
    }, ReadFileHandler)

    return registry
}

Agent Core Logic

// agent.go
package agent

import (
    "context"
    "encoding/json"
    "fmt"
    "sync"
)

const (
    MaxIterations = 20
    SystemPrompt  = `You are a helpful research assistant with access to tools for searching the web, reading files, executing code, and sending emails.

When given a task:
1. Break it down into steps
2. Use tools to gather information and take actions
3. Synthesize results into a clear, well-structured response
4. If asked to write a report, use markdown formatting

Always explain what you're doing and why before using a tool.`
)

// Agent encapsulates the complete agent logic
type Agent struct {
    client   *Client
    registry *ToolRegistry
    history  []Message
    mu       sync.Mutex

    // Statistics
    totalInputTokens  int
    totalOutputTokens int
    iterationCount    int
}

func NewAgent(client *Client, registry *ToolRegistry) *Agent {
    return &Agent{
        client:   client,
        registry: registry,
    }
}

// Run processes a single user request and returns the final response
func (a *Agent) Run(ctx context.Context, userMessage string) (string, error) {
    a.mu.Lock()
    defer a.mu.Unlock()

    // Add user message to history
    a.history = append(a.history, Message{
        Role:    "user",
        Content: []ContentBlock{{Type: "text", Text: userMessage}},
    })

    // Main loop
    for iteration := 0; iteration < MaxIterations; iteration++ {
        a.iterationCount++

        req := MessageRequest{
            Model:     DefaultModel,
            MaxTokens: MaxTokens,
            System:    SystemPrompt,
            Messages:  a.history,
            Tools:     a.registry.Definitions(),
        }

        resp, err := a.client.SendMessage(ctx, req)
        if err != nil {
            return "", fmt.Errorf("iteration %d: %w", iteration, err)
        }

        a.totalInputTokens += resp.Usage.InputTokens
        a.totalOutputTokens += resp.Usage.OutputTokens

        // Add Claude's response to history
        a.history = append(a.history, Message{
            Role:    "assistant",
            Content: resp.Content,
        })

        // If no tool calls, return the final answer
        if resp.StopReason == "end_turn" || !hasToolUse(resp.Content) {
            return extractText(resp.Content), nil
        }

        // Execute all tool calls (in parallel)
        toolResults, err := a.executeTools(ctx, resp.Content)
        if err != nil {
            return "", fmt.Errorf("execute tools: %w", err)
        }

        // Add tool results as a user message
        a.history = append(a.history, Message{
            Role:    "user",
            Content: toolResults,
        })
    }

    return "", fmt.Errorf("exceeded maximum iterations (%d)", MaxIterations)
}

// executeTools executes all tool calls in parallel
func (a *Agent) executeTools(ctx context.Context, blocks []ContentBlock) ([]ContentBlock, error) {
    var toolUses []ContentBlock
    for _, block := range blocks {
        if block.Type == "tool_use" {
            toolUses = append(toolUses, block)
        }
    }

    results := make([]ContentBlock, len(toolUses))
    var wg sync.WaitGroup

    for i, toolUse := range toolUses {
        wg.Add(1)
        go func(idx int, tu ContentBlock) {
            defer wg.Done()

            output, err := a.registry.Execute(ctx, tu.Name, tu.Input)
            result := ContentBlock{
                Type:      "tool_result",
                ToolUseID: tu.ID,
            }

            if err != nil {
                result.IsError = true
                result.Content = fmt.Sprintf("Error: %s", err.Error())
            } else {
                result.Content = output
            }
            results[idx] = result
        }(i, toolUse)
    }

    wg.Wait()
    return results, nil
}

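// Stats returns the Agent's runtime statistics.
// It takes the same lock as Run, so it blocks while a run is in progress
// instead of reading the counters while they are being updated.
func (a *Agent) Stats() map[string]int {
    a.mu.Lock()
    defer a.mu.Unlock()
    return map[string]int{
        "total_input_tokens":  a.totalInputTokens,
        "total_output_tokens": a.totalOutputTokens,
        "iterations":          a.iterationCount,
        "history_length":      len(a.history),
    }
}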

// Reset clears the conversation history (starts a new conversation)
func (a *Agent) Reset() {
    a.mu.Lock()
    defer a.mu.Unlock()
    a.history = nil
    a.totalInputTokens = 0
    a.totalOutputTokens = 0
    a.iterationCount = 0
}

func hasToolUse(blocks []ContentBlock) bool {
    for _, b := range blocks {
        if b.Type == "tool_use" {
            return true
        }
    }
    return false
}

func extractText(blocks []ContentBlock) string {
    var parts []string
    for _, b := range blocks {
        if b.Type == "text" && b.Text != "" {
            parts = append(parts, b.Text)
        }
    }
    result := ""
    for i, p := range parts {
        if i > 0 {
            result += "\n"
        }
        result += p
    }
    return result
}

Main Program: Research Agent Example

// main.go
package main

import (
    "context"
    "fmt"
    "log"
    "strings"
    "time"

    "github.com/yourorg/agent"
)

func main() {
    client := agent.NewClient()
    registry := agent.BuildDefaultRegistry()
    a := agent.NewAgent(client, registry)

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
    defer cancel()

    task := `Please research the current state of Go generics (as of 2024) and write a comprehensive report covering:
1. Key features and syntax
2. Common use cases and patterns
3. Performance implications
4. Community adoption status

Search the web for recent information, then compile a structured markdown report.
Finally, send the report to [email protected] with subject "Go Generics Report 2024".`

    fmt.Println("Starting research agent...")
    fmt.Println(strings.Repeat("-", 60))

    result, err := a.Run(ctx, task)
    if err != nil {
        log.Fatalf("Agent error: %v", err)
    }

    fmt.Println("\nFinal Result:")
    fmt.Println(result)

    stats := a.Stats()
    fmt.Printf("\nStats: %d iterations, %d input tokens, %d output tokens\n",
        stats["iterations"], stats["total_input_tokens"], stats["total_output_tokens"])
}

L4: Advanced - Multi-Agent Orchestration, Memory, and Evaluation

Multi-Agent Orchestration (Subagent Delegation)

A single Agent has limited capability. For complex tasks, we can decompose work across multiple specialized sub-Agents:

// orchestrator.go - Multi-Agent orchestrator

type OrchestratorAgent struct {
    *Agent
    subagents map[string]*Agent
}

// DelegateToSubagent delegates a task to a named sub-Agent
func (o *OrchestratorAgent) DelegateToSubagent(ctx context.Context, name, task string) (string, error) {
    subagent, ok := o.subagents[name]
    if !ok {
        return "", fmt.Errorf("subagent %s not found", name)
    }
    // Sub-Agents have their own independent conversation history and tool sets
    subagent.Reset()
    return subagent.Run(ctx, task)
}

// The orchestrator can define a delegate_task tool
func buildDelegateTool(orchestrator *OrchestratorAgent) ToolHandler {
    return func(ctx context.Context, input json.RawMessage) (string, error) {
        var req struct {
            AgentName string `json:"agent_name"`
            Task      string `json:"task"`
        }
        if err := json.Unmarshal(input, &req); err != nil {
            return "", err
        }
        return orchestrator.DelegateToSubagent(ctx, req.AgentName, req.Task)
    }
}

The ReAct Pattern (Reasoning + Acting)

ReAct is a pattern that forces the Agent to reason explicitly before acting, implemented through a system prompt:

const ReActSystemPrompt = `You are a ReAct agent. For each step, follow this format:

Thought: [Analyze the current situation and decide what to do next]
Action: [The tool to call and why]
Observation: [What you learned from the tool result]

Repeat until you have enough information to provide a final answer.

Final Answer: [Your comprehensive response]`

This pattern makes the Agent's reasoning process visible, auditable, and far easier to debug.

Agent Memory

Long-running Agents need a memory mechanism. Pairing the Agent with a vector database enables semantic memory:

type AgentMemory struct {
    embedder  EmbeddingClient
    store     VectorStore
    maxTokens int
}

func (m *AgentMemory) Store(ctx context.Context, content string) error {
    embedding, err := m.embedder.Embed(ctx, content)
    if err != nil {
        return err
    }
    return m.store.Insert(ctx, embedding, content)
}

func (m *AgentMemory) Recall(ctx context.Context, query string, k int) ([]string, error) {
    queryEmbedding, err := m.embedder.Embed(ctx, query)
    if err != nil {
        return nil, err
    }
    results, err := m.store.Search(ctx, queryEmbedding, k)
    if err != nil {
        return nil, err
    }
    var memories []string
    for _, r := range results {
        memories = append(memories, r.Content)
    }
    return memories, nil
}

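// Inject relevant memories into the Agent's system prompt.
// Assumes Agent has a `memory *AgentMemory` field and that a BaseSystemPrompt
// constant exists; neither is defined in agent.go above.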
func (a *Agent) buildSystemPromptWithMemory(ctx context.Context, query string) (string, error) {
    memories, err := a.memory.Recall(ctx, query, 5)
    if err != nil {
        return BaseSystemPrompt, nil // degrade gracefully on failure
    }
    if len(memories) == 0 {
        return BaseSystemPrompt, nil
    }
    memoryContext := "Relevant memories from past conversations:\n"
    for _, m := range memories {
        memoryContext += "- " + m + "\n"
    }
    return BaseSystemPrompt + "\n\n" + memoryContext, nil
}
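The EmbeddingClient and VectorStore types are left undefined above. To make the recall step concrete, the following sketch shows one possible in-memory store that ranks entries by cosine similarity. It is a simplification: the methods take no context and return no error, hand-written three-dimensional vectors stand in for real embeddings, and every name in it is hypothetical.

```go
package main

import (
    "fmt"
    "math"
    "sort"
)

type entry struct {
    vec     []float64
    content string
}

// memoryStore is a toy in-memory vector store.
type memoryStore struct {
    entries []entry
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero length.
func cosine(a, b []float64) float64 {
    if len(a) != len(b) {
        return 0
    }
    var dot, na, nb float64
    for i := range a {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    if na == 0 || nb == 0 {
        return 0
    }
    return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func (s *memoryStore) Insert(vec []float64, content string) {
    s.entries = append(s.entries, entry{vec: vec, content: content})
}

// Search returns the contents of the k entries most similar to query.
func (s *memoryStore) Search(query []float64, k int) []string {
    type scored struct {
        content string
        score   float64
    }
    ranked := make([]scored, 0, len(s.entries))
    for _, e := range s.entries {
        ranked = append(ranked, scored{e.content, cosine(query, e.vec)})
    }
    sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].score > ranked[j].score })
    if k > len(ranked) {
        k = len(ranked)
    }
    if k < 0 {
        k = 0
    }
    out := make([]string, 0, k)
    for _, r := range ranked[:k] {
        out = append(out, r.content)
    }
    return out
}

func main() {
    store := &memoryStore{}
    store.Insert([]float64{1, 0, 0}, "user prefers Go")
    store.Insert([]float64{0, 1, 0}, "deploys happen on Fridays")
    store.Insert([]float64{0.9, 0.1, 0}, "user maintains a Go CLI tool")
    fmt.Println(store.Search([]float64{1, 0, 0}, 2))
    // [user prefers Go user maintains a Go CLI tool]
}
```

A linear scan like this is fine for a few thousand memories; a dedicated vector database becomes worthwhile when the collection or the query rate grows.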

Cost Control Strategies

In production environments, controlling Agent costs is critical:

type CostController struct {
    maxIterations   int
    maxInputTokens  int
    maxOutputTokens int
    modelSelector   func(estimatedComplexity int) string
}

// Dynamic model selection: Haiku for simple tasks, Opus for complex ones
func adaptiveModelSelector(estimatedComplexity int) string {
    switch {
    case estimatedComplexity < 3:
        return "claude-haiku-4-5" // Fast, cheap
    case estimatedComplexity < 7:
        return "claude-sonnet-4-5" // Balanced
    default:
        return "claude-opus-4-5" // Most capable
    }
}

// Token budget tracking
type TokenBudget struct {
    MaxTotal int
    Used     int
}

func (b *TokenBudget) Remaining() int {
    return b.MaxTotal - b.Used
}

func (b *TokenBudget) Consume(tokens int) error {
    if b.Used+tokens > b.MaxTotal {
        return fmt.Errorf("token budget exceeded: used %d, limit %d", b.Used+tokens, b.MaxTotal)
    }
    b.Used += tokens
    return nil
}
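TokenBudget as written is not safe for concurrent use. If several agents share one budget, for example an orchestrator and its sub-Agents, a mutex around the check-and-update keeps the limit exact. This is a sketch of that variation; safeBudget and consumeConcurrently are illustrative names.

```go
package main

import (
    "fmt"
    "sync"
)

// safeBudget is a TokenBudget variant that can be shared across goroutines.
type safeBudget struct {
    mu       sync.Mutex
    maxTotal int
    used     int
}

// Consume reserves tokens, or returns an error if that would exceed the limit.
// The check and the update happen under one lock, so the limit is never overshot.
func (b *safeBudget) Consume(tokens int) error {
    b.mu.Lock()
    defer b.mu.Unlock()
    if b.used+tokens > b.maxTotal {
        return fmt.Errorf("token budget exceeded: used %d, limit %d", b.used+tokens, b.maxTotal)
    }
    b.used += tokens
    return nil
}

// Used reports how many tokens have been consumed so far.
func (b *safeBudget) Used() int {
    b.mu.Lock()
    defer b.mu.Unlock()
    return b.used
}

// consumeConcurrently has n goroutines each try to consume the same number of
// tokens and reports how many of them succeeded.
func consumeConcurrently(b *safeBudget, n, tokens int) int {
    var wg sync.WaitGroup
    var mu sync.Mutex
    succeeded := 0
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if b.Consume(tokens) == nil {
                mu.Lock()
                succeeded++
                mu.Unlock()
            }
        }()
    }
    wg.Wait()
    return succeeded
}

func main() {
    b := &safeBudget{maxTotal: 500}
    ok := consumeConcurrently(b, 100, 10)
    fmt.Println(ok, b.Used()) // 50 500
}
```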

Agent Evaluation Framework

Evaluating Agents is more complex than evaluating single LLM calls. You must consider the entire trajectory:

type AgentEvaluator struct {
    testCases []AgentTestCase
}

type AgentTestCase struct {
    Name          string
    UserMessage   string
    ExpectedTools []string                 // Expected set of tools to be called
    EvalFn        func(result string) bool // Function to evaluate the final result
    MaxIterations int
    MaxCost       float64
}

func (e *AgentEvaluator) RunBenchmark(ctx context.Context, agentFactory func() *Agent) BenchmarkResult {
    var results []TestResult
    for _, tc := range e.testCases {
        a := agentFactory()
        start := time.Now()
        result, err := a.Run(ctx, tc.UserMessage)
        duration := time.Since(start)

        stats := a.Stats()
        passed := err == nil && tc.EvalFn(result)

        results = append(results, TestResult{
            Name:       tc.Name,
            Passed:     passed,
            Duration:   duration,
            Iterations: stats["iterations"],
            Tokens:     stats["total_input_tokens"] + stats["total_output_tokens"],
        })
    }
    return BenchmarkResult{Tests: results}
}
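RunBenchmark scores only the final text; the ExpectedTools field is never consulted. One way to use it, sketched below with simplified message types, is to collect the tool names that appear in the assistant's messages and report any expected tool that was never called. This assumes the evaluator can read the agent's history, which the Agent type above does not currently expose; toolsCalled and missingTools are hypothetical helpers.

```go
package main

import "fmt"

// block and message are simplified versions of the chapter's types.
type block struct {
    Type string
    Name string
}

type message struct {
    Role    string
    Content []block
}

// toolsCalled lists the distinct tool names requested by the assistant,
// in order of first use.
func toolsCalled(history []message) []string {
    seen := make(map[string]bool)
    var names []string
    for _, msg := range history {
        if msg.Role != "assistant" {
            continue
        }
        for _, b := range msg.Content {
            if b.Type == "tool_use" && !seen[b.Name] {
                seen[b.Name] = true
                names = append(names, b.Name)
            }
        }
    }
    return names
}

// missingTools returns the expected tools that do not appear in called.
func missingTools(expected, called []string) []string {
    have := make(map[string]bool, len(called))
    for _, n := range called {
        have[n] = true
    }
    var missing []string
    for _, n := range expected {
        if !have[n] {
            missing = append(missing, n)
        }
    }
    return missing
}

func main() {
    history := []message{
        {Role: "user", Content: []block{{Type: "text"}}},
        {Role: "assistant", Content: []block{{Type: "tool_use", Name: "search_web"}}},
        {Role: "user", Content: []block{{Type: "tool_result"}}},
        {Role: "assistant", Content: []block{{Type: "text"}}},
    }
    called := toolsCalled(history)
    fmt.Println(called)                                                  // [search_web]
    fmt.Println(missingTools([]string{"search_web", "read_file"}, called)) // [read_file]
}
```

A test case could then pass only when the final answer is acceptable and missingTools returns nothing.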

Sandboxed Code Execution

Code execution in production must occur in an isolated environment:

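// Using Docker for true sandboxed execution.
// ExecuteCodeInput is assumed to carry Language and Code string fields; it is
// not defined in this chapter. The function also needs the os/exec import.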
func DockerCodeExecutor(ctx context.Context, input json.RawMessage) (string, error) {
    var req ExecuteCodeInput
    if err := json.Unmarshal(input, &req); err != nil {
        return "", err
    }

    // Docker flags: resource limits, network isolation, read-only filesystem
    cmd := exec.CommandContext(ctx, "docker", "run",
        "--rm",                  // Delete container after execution
        "--network=none",        // No network access
        "--memory=256m",         // Memory limit
        "--cpus=0.5",            // CPU limit
        "--read-only",           // Read-only filesystem
        "--tmpfs=/tmp:size=64m", // Temporary directory
        "-i",
        fmt.Sprintf("code-sandbox-%s:latest", req.Language),
    )

    cmd.Stdin = strings.NewReader(req.Code)
    output, err := cmd.CombinedOutput()
    if err != nil {
        return "", fmt.Errorf("execution failed: %s\nOutput: %s", err, output)
    }

    // Truncate overly long output
    if len(output) > 10000 {
        output = append(output[:10000], []byte("\n... (output truncated)")...)
    }

    return string(output), nil
}
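DockerCodeExecutor builds the image name directly from req.Language, which the model supplies. A small allowlist, sketched below, keeps that value from selecting an arbitrary image. The set of languages is an assumption; the image naming follows the code-sandbox-<language>:latest pattern used above.

```go
package main

import (
    "fmt"
    "strings"
)

// allowedLanguages is a hypothetical set of languages with a sandbox image.
var allowedLanguages = map[string]bool{
    "python":     true,
    "go":         true,
    "javascript": true,
}

// sandboxImage maps a requested language to its sandbox image name and
// rejects anything that is not on the allowlist.
func sandboxImage(language string) (string, error) {
    lang := strings.ToLower(strings.TrimSpace(language))
    if !allowedLanguages[lang] {
        return "", fmt.Errorf("unsupported language: %q", language)
    }
    return fmt.Sprintf("code-sandbox-%s:latest", lang), nil
}

func main() {
    for _, lang := range []string{"Python", "go", "ubuntu:latest --privileged"} {
        image, err := sandboxImage(lang)
        if err != nil {
            fmt.Println("rejected:", err)
            continue
        }
        fmt.Println("image:", image)
    }
}
```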

Summary

Building a production-grade Go AI Agent requires deep understanding of:

  1. Protocol layer: Anthropic Tool Use JSON specification, message structure, stop_reason handling
  2. Concurrent execution: Using Go goroutines to execute multiple tool calls in parallel, dramatically reducing latency
  3. History management: Correctly maintaining conversation history, handling token limits, implementing summary compression
  4. Security: Sandbox isolation for tool execution, path traversal protection, resource limits
  5. Observability: Iteration counts, token consumption, tool call logs, error tracing
  6. Cost control: Dynamic model selection, token budgets, maximum iteration limits

Agent technology is still evolving rapidly, but the core principles described above should remain useful as it does. Master them, and you have a solid foundation for building more complex AI applications.
