Building AI Agents: Tool Use and Multi-Turn Dialogue
L1: Concept — What Is an AI Agent and Why Tool Use Is the Enabling Technology
From Automation Scripts to Autonomous Agents
Throughout the history of software engineering, automation has been the primary lever for improving efficiency. But traditional automation has a fundamental limitation: it can only execute pre-written, fixed logic. When the external environment changes or unexpected situations arise, traditional scripts fail or produce incorrect results.
AI Agents represent a paradigm shift. Unlike fixed scripts, an Agent can:
- Perceive: Receive inputs from the external world — user messages, file contents, API responses, sensor data
- Reason: Based on perceived information combined with internal knowledge, formulate a plan of action
- Act: Execute tool calls, write files, send requests, interact with external systems
- Observe: Watch the results of actions and incorporate them into the next round of reasoning
This "Perceive → Reason → Act → Observe" loop is the essence of an Agent. It mirrors, in a striking way, how a human engineer handles complex tasks: you receive a requirement, think about how to approach it, implement a solution, examine the results, and adjust.
Why Tool Use Is Essential
Large Language Models (LLMs) are powerful reasoning engines, but they have several intrinsic limitations:
Knowledge cutoff: Claude's training data has a cutoff date. It doesn't know what happened in today's news and cannot tell you the current stock price.
No side effects: An LLM can only generate text. It cannot actually send an email, read a file from your disk, or call your company's internal API.
Limited computational precision: For exact mathematical calculations, like computing 1234567 × 9876543, an LLM may make errors. It excels at reasoning, not precise computation.
Tool Use is the technology that resolves these limitations. Through tool calls, an LLM can:
- Call a search engine for real-time information
- Read and write the local filesystem
- Execute code and obtain exact results
- Interact with arbitrary REST APIs
- Operate databases
Tool Use transforms an LLM from a "consultant who only talks" into an "engineer who can actually get things done." This is a qualitative change, not merely quantitative.
Agent Application Scenarios
Having understood the essence of Agents, let's examine some typical applications:
Research Assistant: Given a research topic, the Agent automatically searches relevant papers, reads key articles, synthesizes information, and generates a structured report. This process may require dozens of tool calls.
Code Assistant: Understands the codebase structure, locates bugs, writes fixes, and reruns the tests until they pass. Every step requires autonomous Agent decision-making.
Data Analysis Agent: Accepts data analysis requirements described in natural language, automatically writes SQL queries, calls the database, visualizes results, and composes an analysis report.
Customer Service Agent: Understands customer issues, queries the order system, processes refund requests, sends confirmation emails — the entire workflow without human intervention.
L2: Principles — The Anthropic Tool Use Protocol and Conversation Management
Tool Use Protocol in Detail
Anthropic's Tool Use protocol is a carefully designed JSON specification that defines how LLMs interact with external tools. Understanding every detail of this protocol is foundational to building robust Agents.
Tool Definition
Each tool is defined by three core fields:
{
"name": "search_web",
"description": "Search the web for current information. Use this when you need up-to-date information that may not be in your training data. Returns a list of search results with titles, URLs, and snippets.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query to execute"
},
"num_results": {
"type": "integer",
"description": "Number of results to return (default: 5, max: 20)",
"default": 5
}
},
"required": ["query"]
}
}
The name field is the tool's unique identifier. Names should be clear, descriptive, and LLM-friendly. Avoid abbreviations or obscure names.
The description field is critically important — it directly influences when and how the LLM uses this tool. Descriptions should:
- State what the tool does
- Explain when it should be used (trigger conditions)
- Describe what type of data it returns
- Note its limitations and caveats
The input_schema is a standard JSON Schema defining the parameters the tool accepts. Each field's description is equally important — the LLM relies on these descriptions to correctly populate arguments.
The tool_use Content Block
When Claude decides to call a tool, it returns a tool_use content block in its response:
{
"type": "tool_use",
"id": "toolu_01A09q90qw90lq917835lq9",
"name": "search_web",
"input": {
"query": "Go programming language generics tutorial 2024",
"num_results": 5
}
}
The id field is a unique identifier for this tool call, needed when returning the tool result. A single response may contain multiple tool_use blocks, representing parallel tool calls. Executing them concurrently can noticeably reduce end-to-end latency.
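To make the shape concrete, the following sketch decodes a response's content array and keeps only the tool_use blocks. The contentBlock type and the toolCalls helper are names chosen for this example; the field names follow the JSON shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contentBlock covers the fields needed for text and tool_use blocks.
type contentBlock struct {
	Type  string          `json:"type"`
	Text  string          `json:"text,omitempty"`
	ID    string          `json:"id,omitempty"`
	Name  string          `json:"name,omitempty"`
	Input json.RawMessage `json:"input,omitempty"`
}

// toolCalls decodes a content array and returns the tool_use blocks in their original order.
func toolCalls(content []byte) ([]contentBlock, error) {
	var blocks []contentBlock
	if err := json.Unmarshal(content, &blocks); err != nil {
		return nil, fmt.Errorf("decode content: %w", err)
	}
	var calls []contentBlock
	for _, b := range blocks {
		if b.Type == "tool_use" {
			calls = append(calls, b)
		}
	}
	return calls, nil
}
```

Keeping Input as json.RawMessage defers decoding to the individual tool handler, which knows the expected argument shape.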
Tool Result Responses
After executing a tool, you return the result to Claude in a specific format:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"content": "Search results:\n1. Go Generics Tutorial - go.dev/doc/tutorial/generics\n..."
}
]
}
Note: tool_result is sent with the user role, because it is information returned to Claude from "the external world."
Error Handling
When tool execution fails, return a result with is_error: true:
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"is_error": true,
"content": "Error: Network timeout after 30 seconds"
}
Claude will decide how to proceed based on the error information — it may retry, try a different approach, or report the error to the user.
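The success and error shapes above can be produced by one small helper. This is a sketch under the assumption that a tool handler returns (string, error); newToolResult, encodeToolResults, and errTimeout are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
)

// toolResult mirrors the tool_result content block shown above.
type toolResult struct {
	Type      string `json:"type"`
	ToolUseID string `json:"tool_use_id"`
	Content   string `json:"content"`
	IsError   bool   `json:"is_error,omitempty"`
}

// errTimeout is a sample error used to demonstrate the failure path.
var errTimeout = errors.New("network timeout after 30 seconds")

// newToolResult converts a handler's (output, err) pair into a tool_result block.
// On failure the error text becomes the content and is_error is set.
func newToolResult(toolUseID, output string, err error) toolResult {
	res := toolResult{Type: "tool_result", ToolUseID: toolUseID, Content: output}
	if err != nil {
		res.IsError = true
		res.Content = "Error: " + err.Error()
	}
	return res
}

// encodeToolResults wraps the blocks in a single user-role message.
func encodeToolResults(results []toolResult) ([]byte, error) {
	msg := struct {
		Role    string       `json:"role"`
		Content []toolResult `json:"content"`
	}{Role: "user", Content: results}
	return json.Marshal(msg)
}
```

Reporting a failure as a tool_result rather than aborting the run gives the model a chance to recover, which is the behavior described above.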
Multi-Turn Conversation Management
The heart of an Agent is maintaining conversation history. Every round of interaction requires sending the complete history to Claude so that it can understand the current context.
The structure of conversation history:
[System Prompt]
|
[User Message 1]
[Claude Response 1] (may contain tool_use)
[Tool Result 1] (sent as user message)
[Claude Response 2] (may contain more tool_use or final answer)
[Tool Result 2]
...
[Claude Final Response]
This structure has several important constraints:
- Messages must strictly alternate: user → assistant → user → assistant...
- If an assistant message contains tool_use, the next user message must contain the corresponding tool_result
- A single user message can contain multiple tool_result blocks (for parallel calls)
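These constraints are easy to violate when history is edited or truncated, so a structural check before each request can be useful. The sketch below uses simplified, hypothetical block and message types (no JSON tags) and checks only the rules listed above.

```go
package main

import "fmt"

// block is a simplified content block: only the fields needed for validation.
type block struct {
	Type      string // "text", "tool_use", or "tool_result"
	ID        string // set on tool_use blocks
	ToolUseID string // set on tool_result blocks
}

type message struct {
	Role    string
	Content []block
}

// validateHistory checks role alternation and tool_use / tool_result pairing.
// A history that ends with an assistant tool_use is accepted as "in progress".
func validateHistory(history []message) error {
	pending := map[string]bool{} // tool_use IDs still waiting for a result
	for i, m := range history {
		wantRole := "user"
		if i%2 == 1 {
			wantRole = "assistant"
		}
		if m.Role != wantRole {
			return fmt.Errorf("message %d: role %q, expected %q", i, m.Role, wantRole)
		}
		if m.Role == "assistant" {
			for _, b := range m.Content {
				if b.Type == "tool_use" {
					pending[b.ID] = true
				}
			}
			continue
		}
		// User message: it must answer every pending tool_use from the previous turn.
		for _, b := range m.Content {
			if b.Type != "tool_result" {
				continue
			}
			if !pending[b.ToolUseID] {
				return fmt.Errorf("message %d: tool_result for unknown id %q", i, b.ToolUseID)
			}
			delete(pending, b.ToolUseID)
		}
		if len(pending) > 0 {
			return fmt.Errorf("message %d: %d tool_use block(s) left without a tool_result", i, len(pending))
		}
	}
	return nil
}
```

A check like this is cheap enough to run on every iteration during development and can be disabled in production.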
Agent Main Loop Design
The Agent's main loop is the core control flow of the entire system:
Initialize (set system prompt, tool list)
↓
Receive user input
↓
[Main loop begins]
Send request to Claude API
↓
Receive response
↓
If response contains tool_use:
    Execute all tools (potentially in parallel)
    Add results to history
    Continue main loop
↓
If response is end_turn (final answer):
    Return answer to user
    Wait for next user input
[Main loop ends]
Loop termination conditions:
- stop_reason == "end_turn": Claude considers the task complete
- stop_reason == "max_tokens": Token limit reached, needs truncation handling
- Exceeded maximum iteration count (prevents infinite loops)
- Unrecoverable error in tool execution
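The termination conditions can be collected into a single decision function that the main loop consults after each response. This is a sketch: the nextStep type and decide function are hypothetical names, and the "tool_use" stop reason is included as the counterpart of the end_turn and max_tokens values listed above.

```go
package main

import "fmt"

// nextStep tells the main loop what to do after a response arrives.
type nextStep int

const (
	stepReturnAnswer     nextStep = iota // final answer: hand it to the user
	stepRunTools                         // execute tool calls and loop again
	stepHandleTruncation                 // output was cut off at the token limit
)

// decide maps a stop reason and the current iteration count to the next step.
// It returns an error when the loop should abort.
func decide(stopReason string, iteration, maxIterations int) (nextStep, error) {
	if iteration >= maxIterations {
		return 0, fmt.Errorf("exceeded maximum iterations (%d)", maxIterations)
	}
	switch stopReason {
	case "end_turn":
		return stepReturnAnswer, nil
	case "tool_use":
		return stepRunTools, nil
	case "max_tokens":
		return stepHandleTruncation, nil
	default:
		// Unknown reasons are treated as errors rather than silently looping.
		return 0, fmt.Errorf("unhandled stop_reason %q", stopReason)
	}
}
```

Keeping this logic in one place makes each exit path explicit and easy to unit test without calling the API.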
L3: Code Practice — Building a Complete Go Agent
Project Structure
agent/
├── main.go # Entry point
├── agent.go # Core agent logic
├── tools.go # Tool definitions and implementations
├── client.go # Anthropic API client
└── history.go # Conversation history management
Client Wrapper
// client.go
package agent
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
)
const (
AnthropicAPIURL = "https://api.anthropic.com/v1/messages"
DefaultModel = "claude-opus-4-5"
MaxTokens = 4096
)
type Client struct {
apiKey string
httpClient *http.Client
model string
}
func NewClient() *Client {
return &Client{
apiKey: os.Getenv("ANTHROPIC_API_KEY"),
httpClient: &http.Client{},
model: DefaultModel,
}
}
// MessageRequest represents the complete request sent to Claude
type MessageRequest struct {
Model string `json:"model"`
MaxTokens int `json:"max_tokens"`
System string `json:"system,omitempty"`
Messages []Message `json:"messages"`
Tools []ToolDef `json:"tools,omitempty"`
}
// Message represents a single message in the dialogue
type Message struct {
Role string `json:"role"`
Content []ContentBlock `json:"content"`
}
// ContentBlock can be text, a tool call, or a tool result
type ContentBlock struct {
Type string `json:"type"`
Text string `json:"text,omitempty"`
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Input json.RawMessage `json:"input,omitempty"`
ToolUseID string `json:"tool_use_id,omitempty"`
Content string `json:"content,omitempty"`
IsError bool `json:"is_error,omitempty"`
}
// MessageResponse is the response from the Claude API
type MessageResponse struct {
ID string `json:"id"`
Type string `json:"type"`
Role string `json:"role"`
Content []ContentBlock `json:"content"`
Model string `json:"model"`
StopReason string `json:"stop_reason"`
Usage Usage `json:"usage"`
}
type Usage struct {
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
}
// ToolDef defines a single tool
type ToolDef struct {
Name string `json:"name"`
Description string `json:"description"`
InputSchema json.RawMessage `json:"input_schema"`
}
func (c *Client) SendMessage(ctx context.Context, req MessageRequest) (*MessageResponse, error) {
body, err := json.Marshal(req)
if err != nil {
return nil, fmt.Errorf("marshal request: %w", err)
}
httpReq, err := http.NewRequestWithContext(ctx, "POST", AnthropicAPIURL, bytes.NewReader(body))
if err != nil {
return nil, fmt.Errorf("create request: %w", err)
}
httpReq.Header.Set("Content-Type", "application/json")
httpReq.Header.Set("x-api-key", c.apiKey)
httpReq.Header.Set("anthropic-version", "2023-06-01")
resp, err := c.httpClient.Do(httpReq)
if err != nil {
return nil, fmt.Errorf("send request: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
var errResp struct {
Error struct {
Message string `json:"message"`
} `json:"error"`
}
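// Surface a decode failure instead of returning an empty error message
if err := json.NewDecoder(resp.Body).Decode(&errResp); err != nil {
return nil, fmt.Errorf("API error %d (unreadable error body: %v)", resp.StatusCode, err)
}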
return nil, fmt.Errorf("API error %d: %s", resp.StatusCode, errResp.Error.Message)
}
var msgResp MessageResponse
if err := json.NewDecoder(resp.Body).Decode(&msgResp); err != nil {
return nil, fmt.Errorf("decode response: %w", err)
}
return &msgResp, nil
}
Tool Definitions and Implementations
// tools.go
package agent
import (
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
"os"
"strings"
"time"
)
// ToolHandler is the type for tool execution functions
type ToolHandler func(ctx context.Context, input json.RawMessage) (string, error)
// ToolRegistry manages all tools
type ToolRegistry struct {
defs []ToolDef
handlers map[string]ToolHandler
}
func NewToolRegistry() *ToolRegistry {
return &ToolRegistry{
handlers: make(map[string]ToolHandler),
}
}
func (r *ToolRegistry) Register(def ToolDef, handler ToolHandler) {
r.defs = append(r.defs, def)
r.handlers[def.Name] = handler
}
func (r *ToolRegistry) Execute(ctx context.Context, name string, input json.RawMessage) (string, error) {
handler, ok := r.handlers[name]
if !ok {
return "", fmt.Errorf("unknown tool: %s", name)
}
return handler(ctx, input)
}
func (r *ToolRegistry) Definitions() []ToolDef {
return r.defs
}
// --- Tool Implementations ---
// SearchWebInput is the input for the search_web tool
type SearchWebInput struct {
Query string `json:"query"`
NumResults int `json:"num_results"`
}
// SearchWebHandler implements web search using the DuckDuckGo Instant Answer API
func SearchWebHandler(ctx context.Context, input json.RawMessage) (string, error) {
var req SearchWebInput
if err := json.Unmarshal(input, &req); err != nil {
return "", fmt.Errorf("invalid input: %w", err)
}
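if req.NumResults <= 0 {
req.NumResults = 5
}
if req.NumResults > 20 {
req.NumResults = 20 // enforce the maximum stated in the tool description
}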
apiURL := fmt.Sprintf("https://api.duckduckgo.com/?q=%s&format=json&no_html=1&skip_disambig=1",
url.QueryEscape(req.Query))
httpReq, err := http.NewRequestWithContext(ctx, "GET", apiURL, nil)
if err != nil {
return "", err
}
httpReq.Header.Set("User-Agent", "GoAgent/1.0")
client := &http.Client{Timeout: 10 * time.Second}
resp, err := client.Do(httpReq)
if err != nil {
return "", fmt.Errorf("search request failed: %w", err)
}
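defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("search request returned status %d", resp.StatusCode)
}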
var result struct {
Abstract string `json:"Abstract"`
AbstractSource string `json:"AbstractSource"`
AbstractURL string `json:"AbstractURL"`
RelatedTopics []struct {
Text string `json:"Text"`
FirstURL string `json:"FirstURL"`
} `json:"RelatedTopics"`
}
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return "", fmt.Errorf("decode response: %w", err)
}
var sb strings.Builder
sb.WriteString(fmt.Sprintf("Search results for: %s\n\n", req.Query))
if result.Abstract != "" {
sb.WriteString(fmt.Sprintf("Summary: %s\nSource: %s\nURL: %s\n\n",
result.Abstract, result.AbstractSource, result.AbstractURL))
}
count := 0
for _, topic := range result.RelatedTopics {
if count >= req.NumResults {
break
}
if topic.Text != "" {
sb.WriteString(fmt.Sprintf("%d. %s\n URL: %s\n\n", count+1, topic.Text, topic.FirstURL))
count++
}
}
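// The header line is always written, so check the collected results rather than sb.Len()
if result.Abstract == "" && count == 0 {
return fmt.Sprintf("No results found for query: %s", req.Query), nil
}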
return sb.String(), nil
}
// ReadFileInput is the input for the read_file tool
type ReadFileInput struct {
Path string `json:"path"`
MaxLines int `json:"max_lines"`
}
// ReadFileHandler reads a local file
func ReadFileHandler(ctx context.Context, input json.RawMessage) (string, error) {
var req ReadFileInput
if err := json.Unmarshal(input, &req); err != nil {
return "", fmt.Errorf("invalid input: %w", err)
}
if req.MaxLines == 0 {
req.MaxLines = 500
}
// Security check: reject paths containing "..". This is only a basic guard;
// it does not restrict which absolute paths the tool may read.
if strings.Contains(req.Path, "..") {
return "", fmt.Errorf("path traversal not allowed")
}
f, err := os.Open(req.Path)
if err != nil {
return "", fmt.Errorf("open file: %w", err)
}
defer f.Close()
content, err := io.ReadAll(io.LimitReader(f, 1<<20)) // read at most 1MB
if err != nil {
return "", fmt.Errorf("read file: %w", err)
}
lines := strings.Split(string(content), "\n")
if len(lines) > req.MaxLines {
lines = lines[:req.MaxLines]
lines = append(lines, fmt.Sprintf("\n... (truncated, showing first %d lines)", req.MaxLines))
}
return strings.Join(lines, "\n"), nil
}
// BuildDefaultRegistry creates a tool registry with the default tool set
func BuildDefaultRegistry() *ToolRegistry {
registry := NewToolRegistry()
registry.Register(ToolDef{
Name: "search_web",
Description: "Search the web for current information. Use when you need up-to-date facts, news, or technical documentation not in your training data.",
InputSchema: json.RawMessage(`{
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query"},
"num_results": {"type": "integer", "description": "Number of results (default: 5, max: 20)"}
},
"required": ["query"]
}`),
}, SearchWebHandler)
registry.Register(ToolDef{
Name: "read_file",
Description: "Read the contents of a local file. Use to access code files, configuration, or documents on disk.",
InputSchema: json.RawMessage(`{
"type": "object",
"properties": {
"path": {"type": "string", "description": "Absolute path to the file"},
"max_lines": {"type": "integer", "description": "Maximum lines to read (default: 500)"}
},
"required": ["path"]
}`),
}, ReadFileHandler)
return registry
}
Agent Core Logic
// agent.go
package agent
import (
"context"
"encoding/json"
"fmt"
"sync"
)
const (
MaxIterations = 20
SystemPrompt = `You are a helpful research assistant with access to tools for searching the web, reading files, executing code, and sending emails.
When given a task:
1. Break it down into steps
2. Use tools to gather information and take actions
3. Synthesize results into a clear, well-structured response
4. If asked to write a report, use markdown formatting
Always explain what you're doing and why before using a tool.`
)
// Agent encapsulates the complete agent logic
type Agent struct {
client *Client
registry *ToolRegistry
history []Message
mu sync.Mutex
// Statistics
totalInputTokens int
totalOutputTokens int
iterationCount int
}
func NewAgent(client *Client, registry *ToolRegistry) *Agent {
return &Agent{
client: client,
registry: registry,
}
}
// Run processes a single user request and returns the final response
func (a *Agent) Run(ctx context.Context, userMessage string) (string, error) {
a.mu.Lock()
defer a.mu.Unlock()
// Add user message to history
a.history = append(a.history, Message{
Role: "user",
Content: []ContentBlock{{Type: "text", Text: userMessage}},
})
// Main loop
for iteration := 0; iteration < MaxIterations; iteration++ {
a.iterationCount++
req := MessageRequest{
Model: a.client.model, // use the model configured on the client
MaxTokens: MaxTokens,
System: SystemPrompt,
Messages: a.history,
Tools: a.registry.Definitions(),
}
resp, err := a.client.SendMessage(ctx, req)
if err != nil {
return "", fmt.Errorf("iteration %d: %w", iteration, err)
}
a.totalInputTokens += resp.Usage.InputTokens
a.totalOutputTokens += resp.Usage.OutputTokens
// Add Claude's response to history
a.history = append(a.history, Message{
Role: "assistant",
Content: resp.Content,
})
// If no tool calls, return the final answer
if resp.StopReason == "end_turn" || !hasToolUse(resp.Content) {
return extractText(resp.Content), nil
}
// Execute all tool calls (in parallel)
toolResults, err := a.executeTools(ctx, resp.Content)
if err != nil {
return "", fmt.Errorf("execute tools: %w", err)
}
// Add tool results as a user message
a.history = append(a.history, Message{
Role: "user",
Content: toolResults,
})
}
return "", fmt.Errorf("exceeded maximum iterations (%d)", MaxIterations)
}
// executeTools executes all tool calls in parallel
func (a *Agent) executeTools(ctx context.Context, blocks []ContentBlock) ([]ContentBlock, error) {
var toolUses []ContentBlock
for _, block := range blocks {
if block.Type == "tool_use" {
toolUses = append(toolUses, block)
}
}
results := make([]ContentBlock, len(toolUses))
var wg sync.WaitGroup
for i, toolUse := range toolUses {
wg.Add(1)
go func(idx int, tu ContentBlock) {
defer wg.Done()
output, err := a.registry.Execute(ctx, tu.Name, tu.Input)
result := ContentBlock{
Type: "tool_result",
ToolUseID: tu.ID,
}
if err != nil {
result.IsError = true
result.Content = fmt.Sprintf("Error: %s", err.Error())
} else {
result.Content = output
}
results[idx] = result
}(i, toolUse)
}
wg.Wait()
return results, nil
}
// Stats returns the Agent's runtime statistics
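func (a *Agent) Stats() map[string]int {
// Take the same lock as Run and Reset so concurrent callers see consistent values
a.mu.Lock()
defer a.mu.Unlock()
return map[string]int{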
"total_input_tokens": a.totalInputTokens,
"total_output_tokens": a.totalOutputTokens,
"iterations": a.iterationCount,
"history_length": len(a.history),
}
}
// Reset clears the conversation history (starts a new conversation)
func (a *Agent) Reset() {
a.mu.Lock()
defer a.mu.Unlock()
a.history = nil
a.totalInputTokens = 0
a.totalOutputTokens = 0
a.iterationCount = 0
}
func hasToolUse(blocks []ContentBlock) bool {
for _, b := range blocks {
if b.Type == "tool_use" {
return true
}
}
return false
}
func extractText(blocks []ContentBlock) string {
var parts []string
for _, b := range blocks {
if b.Type == "text" && b.Text != "" {
parts = append(parts, b.Text)
}
}
result := ""
for i, p := range parts {
if i > 0 {
result += "\n"
}
result += p
}
return result
}
Main Program: Research Agent Example
// main.go
package main
import (
"context"
"fmt"
"log"
"strings"
"time"
"github.com/yourorg/agent"
)
func main() {
client := agent.NewClient()
registry := agent.BuildDefaultRegistry()
a := agent.NewAgent(client, registry)
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
task := `Please research the current state of Go generics (as of 2024) and write a comprehensive report covering:
1. Key features and syntax
2. Common use cases and patterns
3. Performance implications
4. Community adoption status
Search the web for recent information, then compile a structured markdown report.
Finally, send the report to [email protected] with subject "Go Generics Report 2024".`
fmt.Println("Starting research agent...")
fmt.Println(strings.Repeat("-", 60))
result, err := a.Run(ctx, task)
if err != nil {
log.Fatalf("Agent error: %v", err)
}
fmt.Println("\nFinal Result:")
fmt.Println(result)
stats := a.Stats()
fmt.Printf("\nStats: %d iterations, %d input tokens, %d output tokens\n",
stats["iterations"], stats["total_input_tokens"], stats["total_output_tokens"])
}
L4: Advanced — Multi-Agent Orchestration, Memory, and Evaluation
Multi-Agent Orchestration (Subagent Delegation)
A single Agent has limited capability. For complex tasks, we can decompose work across multiple specialized sub-Agents:
// orchestrator.go - Multi-Agent orchestrator
type OrchestratorAgent struct {
*Agent
subagents map[string]*Agent
}
// DelegateToSubagent delegates a task to a named sub-Agent
func (o *OrchestratorAgent) DelegateToSubagent(ctx context.Context, name, task string) (string, error) {
subagent, ok := o.subagents[name]
if !ok {
return "", fmt.Errorf("subagent %s not found", name)
}
// Sub-Agents have their own independent conversation history and tool sets
subagent.Reset()
return subagent.Run(ctx, task)
}
// The orchestrator can define a delegate_task tool
func buildDelegateTool(orchestrator *OrchestratorAgent) ToolHandler {
return func(ctx context.Context, input json.RawMessage) (string, error) {
var req struct {
AgentName string `json:"agent_name"`
Task string `json:"task"`
}
if err := json.Unmarshal(input, &req); err != nil {
return "", err
}
return orchestrator.DelegateToSubagent(ctx, req.AgentName, req.Task)
}
}
The ReAct Pattern (Reasoning + Acting)
ReAct is a pattern that forces the Agent to reason explicitly before acting, implemented through a system prompt:
const ReActSystemPrompt = `You are a ReAct agent. For each step, follow this format:
Thought: [Analyze the current situation and decide what to do next]
Action: [The tool to call and why]
Observation: [What you learned from the tool result]
Repeat until you have enough information to provide a final answer.
Final Answer: [Your comprehensive response]`
This pattern makes the Agent's reasoning process visible, auditable, and far easier to debug.
Agent Memory with Vector Search
Long-running Agents need a memory mechanism. Pairing the Agent with a vector database enables semantic memory, where past content is retrieved by similarity of meaning rather than exact keywords:
type AgentMemory struct {
embedder EmbeddingClient
store VectorStore
maxTokens int
}
func (m *AgentMemory) Store(ctx context.Context, content string) error {
embedding, err := m.embedder.Embed(ctx, content)
if err != nil {
return err
}
return m.store.Insert(ctx, embedding, content)
}
func (m *AgentMemory) Recall(ctx context.Context, query string, k int) ([]string, error) {
queryEmbedding, err := m.embedder.Embed(ctx, query)
if err != nil {
return nil, err
}
results, err := m.store.Search(ctx, queryEmbedding, k)
if err != nil {
return nil, err
}
var memories []string
for _, r := range results {
memories = append(memories, r.Content)
}
return memories, nil
}
// Inject relevant memories into the Agent's system prompt.
// Assumes Agent has a memory *AgentMemory field and that BaseSystemPrompt is defined elsewhere.
func (a *Agent) buildSystemPromptWithMemory(ctx context.Context, query string) (string, error) {
memories, err := a.memory.Recall(ctx, query, 5)
if err != nil {
return BaseSystemPrompt, nil // degrade gracefully on failure
}
if len(memories) == 0 {
return BaseSystemPrompt, nil
}
memoryContext := "Relevant memories from past conversations:\n"
for _, m := range memories {
memoryContext += "- " + m + "\n"
}
return BaseSystemPrompt + "\n\n" + memoryContext, nil
}
Cost Control Strategies
In production environments, controlling Agent costs is critical:
type CostController struct {
maxIterations int
maxInputTokens int
maxOutputTokens int
modelSelector func(estimatedComplexity int) string
}
// Dynamic model selection: Haiku for simple tasks, Opus for complex ones
func adaptiveModelSelector(estimatedComplexity int) string {
switch {
case estimatedComplexity < 3:
return "claude-haiku-4-5" // Fast, cheap
case estimatedComplexity < 7:
return "claude-sonnet-4-5" // Balanced
default:
return "claude-opus-4-5" // Most capable
}
}
// Token budget tracking
type TokenBudget struct {
MaxTotal int
Used int
}
func (b *TokenBudget) Remaining() int {
return b.MaxTotal - b.Used
}
func (b *TokenBudget) Consume(tokens int) error {
if b.Used+tokens > b.MaxTotal {
return fmt.Errorf("token budget exceeded: used %d, limit %d", b.Used+tokens, b.MaxTotal)
}
b.Used += tokens
return nil
}
Agent Evaluation Framework
Evaluating Agents is more complex than evaluating single LLM calls. You must consider the entire trajectory:
type AgentEvaluator struct {
testCases []AgentTestCase
}
type AgentTestCase struct {
Name string
UserMessage string
ExpectedTools []string // Expected set of tools to be called
EvalFn func(result string) bool // Function to evaluate the final result
MaxIterations int
MaxCost float64
}
func (e *AgentEvaluator) RunBenchmark(ctx context.Context, agentFactory func() *Agent) BenchmarkResult {
var results []TestResult
for _, tc := range e.testCases {
a := agentFactory()
start := time.Now()
result, err := a.Run(ctx, tc.UserMessage)
duration := time.Since(start)
stats := a.Stats()
passed := err == nil && tc.EvalFn(result)
results = append(results, TestResult{
Name: tc.Name,
Passed: passed,
Duration: duration,
Iterations: stats["iterations"],
Tokens: stats["total_input_tokens"] + stats["total_output_tokens"],
})
}
return BenchmarkResult{Tests: results}
}
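RunBenchmark above scores only the final text, while AgentTestCase also declares ExpectedTools. One way to evaluate the trajectory is to collect the tool names from the conversation history and compare them with the expected set. The sketch below uses simplified, hypothetical block and message types; toolsCalled and missingTools are illustrative helpers.

```go
package main

// block and message are simplified stand-ins for the history types.
type block struct {
	Type string // "text", "tool_use", or "tool_result"
	Name string // tool name, set on tool_use blocks
}

type message struct {
	Role    string
	Content []block
}

// toolsCalled returns the names of all tools invoked by the assistant, in call order.
func toolsCalled(history []message) []string {
	var names []string
	for _, m := range history {
		if m.Role != "assistant" {
			continue
		}
		for _, b := range m.Content {
			if b.Type == "tool_use" {
				names = append(names, b.Name)
			}
		}
	}
	return names
}

// missingTools returns the expected tool names that never appear in called.
func missingTools(expected, called []string) []string {
	seen := make(map[string]bool, len(called))
	for _, c := range called {
		seen[c] = true
	}
	var missing []string
	for _, e := range expected {
		if !seen[e] {
			missing = append(missing, e)
		}
	}
	return missing
}
```

A test case could then be marked as passed only when missingTools returns an empty slice and the final-result check succeeds.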
Sandboxed Code Execution
Code execution in production must occur in an isolated environment:
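// Using Docker for sandboxed execution. ExecuteCodeInput is assumed to be a struct with
// Language and Code string fields; the os/exec and strings packages must be imported.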
func DockerCodeExecutor(ctx context.Context, input json.RawMessage) (string, error) {
var req ExecuteCodeInput
if err := json.Unmarshal(input, &req); err != nil {
return "", err
}
// Docker flags: resource limits, network isolation, read-only filesystem
cmd := exec.CommandContext(ctx, "docker", "run",
"--rm", // Delete container after execution
"--network=none", // No network access
"--memory=256m", // Memory limit
"--cpus=0.5", // CPU limit
"--read-only", // Read-only filesystem
"--tmpfs=/tmp:size=64m", // Temporary directory
"-i",
fmt.Sprintf("code-sandbox-%s:latest", req.Language),
)
cmd.Stdin = strings.NewReader(req.Code)
output, err := cmd.CombinedOutput()
if err != nil {
return "", fmt.Errorf("execution failed: %s\nOutput: %s", err, output)
}
// Truncate overly long output
if len(output) > 10000 {
output = append(output[:10000], []byte("\n... (output truncated)")...)
}
return string(output), nil
}
Summary
Building a production-grade Go AI Agent requires deep understanding of:
- Protocol layer: Anthropic Tool Use JSON specification, message structure, stop_reason handling
- Concurrent execution: Using Go goroutines to execute multiple tool calls in parallel, dramatically reducing latency
- History management: Correctly maintaining conversation history, handling token limits, implementing summary compression
- Security: Sandbox isolation for tool execution, path traversal protection, resource limits
- Observability: Iteration counts, token consumption, tool call logs, error tracing
- Cost control: Dynamic model selection, token budgets, maximum iteration limits
Agent technology is still evolving rapidly, but the core principles described above are likely to remain valid for a considerable time. Master them, and you will have a solid foundation for building more complex AI applications.