Chapter 7: Messages API Complete Reference: Parameters, Versions, and Compatibility

7.1 API Overview

POST https://api.anthropic.com/v1/messages is the single core endpoint for the Claude API. Understanding each parameter's precise semantics—not just what it does, but what its defaults are, what its limits are, and what breaks when you misuse it—is the foundation of building reliable production systems.

Full Request Structure

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [...],
  "system": "...",
  "temperature": 1.0,
  "top_p": null,
  "top_k": null,
  "stop_sequences": [],
  "stream": false,
  "tools": [],
  "tool_choice": {"type": "auto"},
  "thinking": null,
  "metadata": {}
}

Full Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "..."},
    {"type": "tool_use", "id": "toolu_...", "name": "...", "input": {...}}
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 200,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
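
Because content is a list of typed blocks and usage splits input tokens across three counters, two small helpers are often useful when logging responses. The following is a minimal sketch that works on the parsed JSON body as a plain dict; the helper names and the sample values are illustrative assumptions, not part of the API.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate every text block, skipping tool_use and other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_input_tokens(usage: dict[str, int]) -> int:
    """Sum uncached, cache-write, and cache-read input tokens."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )


# Hypothetical response body, shaped like the structure above
sample_response = {
    "content": [
        {"type": "text", "text": "It is "},
        {"type": "tool_use", "id": "toolu_x", "name": "get_weather", "input": {}},
        {"type": "text", "text": "sunny."},
    ],
    "usage": {
        "input_tokens": 100,
        "output_tokens": 200,
        "cache_creation_input_tokens": 20,
        "cache_read_input_tokens": 30,
    },
}

print(extract_text(sample_response))
print(total_input_tokens(sample_response["usage"]))
```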

7.2 Required Parameters

`model`

The model ID string. Case-sensitive.

# Current model IDs (as of 2025)
AVAILABLE_MODELS = {
    "claude-opus-4-6": {
        "context_window": 200_000,
        "max_output_tokens": 32_000,
        "extended_thinking": True,
        "vision": True,
        "input_price_per_m": 15.00,
        "output_price_per_m": 75.00,
    },
    "claude-sonnet-4-6": {
        "context_window": 200_000,
        "max_output_tokens": 64_000,
        "extended_thinking": True,
        "vision": True,
        "input_price_per_m": 3.00,
        "output_price_per_m": 15.00,
    },
    "claude-haiku-4-5-20251001": {
        "context_window": 200_000,
        "max_output_tokens": 64_000,
        "extended_thinking": True,
        "vision": True,
        "input_price_per_m": 1.00,
        "output_price_per_m": 5.00,
    },
}
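
Per-million-token prices like those in the table convert to a per-request cost with simple arithmetic. The sketch below takes the prices as arguments rather than hard-coding them, since published prices change; the function name is an assumption made for illustration.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one request in USD, given prices per million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# 100 input tokens and 200 output tokens at $3 / $15 per million tokens
cost = estimate_cost_usd(100, 200, 3.00, 15.00)
print(f"${cost:.4f}")
```

Output tokens usually dominate the bill at these ratios, which is one more reason to set max_tokens deliberately.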

Version pinning: Model IDs without a date stamp (e.g., claude-sonnet-4-6) point to the current latest snapshot of that version, which may change. For production systems where response consistency matters, use a date-stamped ID if Anthropic provides one for that version.

`max_tokens`

The maximum number of tokens the model may generate in this response. Required; no default.

# Per-model output token limits
OUTPUT_TOKEN_LIMITS = {
    "claude-opus-4-6":          32_000,
    "claude-sonnet-4-6":        64_000,
    "claude-haiku-4-5-20251001": 64_000,
}

# IMPORTANT: max_tokens is a ceiling, not a target.
# The model stops naturally when done (stop_reason = "end_turn").
# It is truncated only when it reaches max_tokens (stop_reason = "max_tokens").

Always check stop_reason in production code:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,   # might be too low for a detailed answer
    messages=[{"role": "user", "content": "Explain how TCP/IP works in detail."}]
)

if response.stop_reason == "max_tokens":
    # Response was cut off — the answer is incomplete
    # Options: increase max_tokens, or implement continuation logic
    print("WARNING: Response was truncated")

elif response.stop_reason == "end_turn":
    # Model finished naturally
    print(response.content[0].text)
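
In a larger codebase it can help to centralize this branching. The sketch below maps the stop_reason values discussed in this chapter to a coarse status label; the labels and the function name are assumptions for illustration, and unknown values fall through to a safe default so that new stop reasons do not crash the caller.

```python
# Hypothetical status labels; only the keys are API values from this chapter
STOP_REASON_STATUS = {
    "end_turn": "complete",
    "max_tokens": "truncated",
    "stop_sequence": "stopped_at_sequence",
    "tool_use": "needs_tool_result",
}


def classify_stop_reason(stop_reason: str) -> str:
    """Map a stop_reason to a coarse status, defaulting to 'unknown'."""
    return STOP_REASON_STATUS.get(stop_reason, "unknown")


for reason in ("end_turn", "max_tokens", "something_new"):
    print(reason, "->", classify_stop_reason(reason))
```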

`messages`

The conversation history array. Each element has role and content.

Roles: "user" and "assistant" only. The API does not accept a "system" role in messages (system prompt goes in the separate system parameter).

Content forms:

# Form 1: string (text only)
{"role": "user", "content": "Hello"}

# Form 2: content block array (enables multimodal)
{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": "/9j/4AAQ..."
            }
        }
    ]
}

# Image via URL (must be publicly accessible)
{
    "type": "image",
    "source": {"type": "url", "url": "https://example.com/photo.jpg"}
}

Message ordering rules:

Messages must begin with a user turn
user and assistant turns are expected to alternate; per the API reference, consecutive turns with the same role are combined into a single turn rather than rejected
The final message must be from user (unless using assistant prefill)

# Valid
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Explain binary search"},
]

# Consecutive user messages: accepted, but the API combines them into one turn
unmerged_messages = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "Are you there?"},
]

# Clearer: merge them yourself so the stored history matches what the model sees
fixed_messages = [
    {"role": "user", "content": "Hello, are you there?"}
]
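
Merging same-role turns can be automated before a request is sent. This is a minimal sketch, assuming messages are plain dicts whose content is either a string or a list of content blocks; merge_consecutive_turns is a hypothetical helper name, not an SDK function.

```python
from typing import Any


def _as_blocks(content: Any) -> list[dict]:
    """Normalize string content to a one-element list of text blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)  # copy so the caller's list is not mutated


def merge_consecutive_turns(messages: list[dict]) -> list[dict]:
    """Collapse runs of same-role messages into one message per run."""
    merged: list[dict] = []
    for msg in messages:
        blocks = _as_blocks(msg["content"])
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": msg["role"], "content": blocks})
    return merged


history = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "Are you there?"},
    {"role": "assistant", "content": "Yes"},
]
print(merge_consecutive_turns(history))
```

Keeping each original message as its own text block, rather than concatenating strings, preserves the boundaries between them.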

7.3 Optional Parameters: Sampling Control

`temperature`

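Controls output randomness. Range: 0.0–1.0. Default: 1.0. When Extended Thinking is enabled, the value must stay at 1.0; other values are rejected. Even at 0.0, output is not guaranteed to be fully deterministic.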

Low temperature → more deterministic, repeatable
High temperature → more creative, varied

Practical reference values:
  Code generation / math:    0.0–0.2
  Factual Q&A:               0.0–0.3
  Summarization / extraction: 0.0–0.5
  Conversational chat:        0.5–0.8
  Creative writing:           0.7–1.0
  Brainstorming:              0.9–1.0

# Deterministic code generation
code_resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.1,
    messages=[{"role": "user", "content": "Write a binary search in Python"}]
)

# Creative story writing
story_resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    temperature=0.9,
    messages=[{"role": "user", "content": "Write the opening of a time-travel story"}]
)

temperature is not a quality dial. Low temperature does not fix a poorly structured prompt—it just makes the poor output more repeatable.

`top_p` and `top_k`

Additional sampling controls. Anthropic recommends adjusting only one sampling parameter at a time.

top_p (nucleus sampling): Sample only from the smallest set of most-probable tokens whose cumulative probability reaches top_p. top_p=0.9 excludes the long tail of low-probability tokens.
top_k: Sample only from the top-k most probable tokens.

For most production use cases, adjusting only temperature is sufficient. Use top_p or top_k only when you have a specific reason.
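
To make the difference between the two filters concrete, here is a conceptual sketch on a toy four-token distribution. It illustrates the general technique only and is not a description of how the API implements sampling; the function names and the probabilities are invented for the example.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest high-probability prefix whose mass reaches top_p."""
    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(sorted(top_k_filter(toy, 2)))    # candidate set with k=2
print(sorted(top_p_filter(toy, 0.9)))  # candidate set with top_p=0.9
```

top_k fixes the number of candidates regardless of how confident the model is, while top_p lets the candidate set shrink when one token dominates and grow when the distribution is flat.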

`stop_sequences`

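A list of strings (max 4, max 256 chars each). Generation stops immediately when any of these strings appears in the output. The matched string is not included in the returned text; it is reported separately in the response's stop_sequence field.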

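# Stop at a sentinel the prompt asks the model to emit
# (a stop sequence must contain at least one non-whitespace character,
#  so a bare "\n" is rejected)
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    stop_sequences=["END"],
    messages=[{"role": "user", "content": "Give the capital of France in one word, then write END."}]
)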

# Stop at a custom delimiter (useful with structured prompts)
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    stop_sequences=["</answer>"],
    messages=[{"role": "user",
               "content": "Answer in <answer> tags: what is a hash table?"}]
)

# Check which stop sequence triggered
if resp.stop_reason == "stop_sequence":
    print(f"Stopped at: {resp.stop_sequence!r}")
    # resp.stop_sequence contains the exact string that triggered the stop
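
Since the matched stop sequence is reported in stop_sequence rather than left in the generated text, a closing delimiter such as </answer> has to be re-appended if downstream code expects balanced tags. A minimal sketch, with a hypothetical helper name:

```python
from typing import Optional


def restore_stop_sequence(
    text: str, stop_reason: str, stop_sequence: Optional[str]
) -> str:
    """Re-append the delimiter that ended generation, if one did."""
    if stop_reason == "stop_sequence" and stop_sequence:
        return text + stop_sequence
    return text


# Values as they might come back from the </answer> example above
partial = "<answer>A hash table maps keys to values via a hash function."
print(restore_stop_sequence(partial, "stop_sequence", "</answer>"))
```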

7.4 The `system` Parameter

The system prompt can be a string or a content block array (the latter enables prompt caching):

# Simple string
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[...]
)

# Content block array with cache control
STATIC_DOCS = "..."  # thousands of tokens of reference material

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant.\n\n" + STATIC_DOCS,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[...]
)
# First call: writes the prefix to the cache (cache writes are billed at a
#   premium over the base input price)
# Subsequent calls within the 5-minute cache lifetime: cached tokens are
#   read at a 90% discount

Restriction: The system parameter supports only text content blocks—no images, no tool results.
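
Whether caching pays off depends on how many calls reuse the prefix before the cache entry expires. The sketch below compares the two cases. The read multiplier of 0.10 follows from the 90% discount mentioned above; the write multiplier of 1.25 is an assumption about the surcharge and should be checked against current pricing, as should the simplification that every call lands within the cache lifetime.

```python
def uncached_prefix_cost(prefix_tokens: int, calls: int, price_per_m: float) -> float:
    """USD cost of sending the same prefix on every call without caching."""
    return prefix_tokens * calls * price_per_m / 1_000_000


def cached_prefix_cost(
    prefix_tokens: int,
    calls: int,
    price_per_m: float,
    write_multiplier: float = 1.25,  # assumed cache-write surcharge
    read_multiplier: float = 0.10,   # 90% discount on cache reads
) -> float:
    """USD cost with caching: one cache write, then (calls - 1) cache reads."""
    if calls < 1:
        return 0.0
    per_token = price_per_m / 1_000_000
    write = prefix_tokens * per_token * write_multiplier
    reads = prefix_tokens * per_token * read_multiplier * (calls - 1)
    return write + reads


# A 10,000-token prefix reused across 10 calls at $3 per million input tokens
print(f"uncached: ${uncached_prefix_cost(10_000, 10, 3.00):.4f}")
print(f"cached:   ${cached_prefix_cost(10_000, 10, 3.00):.4f}")
```

Under these assumptions a single call is more expensive with caching than without, so caching only makes sense for prefixes that are actually reused.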

7.5 Tool Use Parameters

`tools`

Defines the tools the model can call. Each tool has a name, description, and a JSON schema for its inputs:

tools = [
    {
        "name": "get_weather",          # letters, digits, underscores, hyphens; max 64 chars
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Paris', 'Tokyo'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit, defaults to celsius"
                }
            }
        }
    }
]
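
Malformed tool definitions are easier to catch locally than from an API error. The following is a minimal lint sketch, not a full JSON Schema validator. It assumes the tool-name pattern is 1 to 64 characters of letters, digits, underscores, or hyphens, and the empty-description check is a local style rule rather than an API requirement; validate_tool is a hypothetical helper.

```python
import re

# Assumed name pattern: 1-64 chars of letters, digits, '_' or '-'
TOOL_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def validate_tool(tool: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the tool looks OK."""
    problems = []
    if not TOOL_NAME_PATTERN.fullmatch(tool.get("name", "")):
        problems.append("name must be 1-64 chars of letters, digits, '_' or '-'")
    if not tool.get("description"):
        problems.append("description is missing or empty (local style rule)")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append("input_schema.type must be 'object'")
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"required field {field!r} is not defined in properties")
    return problems


weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "required": ["city"],
        "properties": {"city": {"type": "string"}},
    },
}
print(validate_tool(weather_tool))
```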

`tool_choice`

# Auto (default): model decides whether and which tool to call
tool_choice = {"type": "auto"}

# Any: model must call a tool, but can choose which one
tool_choice = {"type": "any"}

# Specific tool: model must call this specific tool
tool_choice = {"type": "tool", "name": "get_weather"}

# Disabled: model cannot use tools even if defined
tool_choice = {"type": "none"}

Complete Tool Call Loop

import anthropic
import json

client = anthropic.Anthropic()

def run_weather_agent(user_query: str) -> str:
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            }
        }
    }]

    messages = [{"role": "user", "content": user_query}]

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        if resp.stop_reason != "tool_use":
            # Model finished; return final text
            return next(
                (b.text for b in resp.content if b.type == "text"), ""
            )

        # Add assistant turn (including tool_use blocks)
        messages.append({"role": "assistant", "content": resp.content})

        # Execute all tool calls and collect results
        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue

            if block.name == "get_weather":
                # Simulate a real API call
                city = block.input.get("city", "")
                units = block.input.get("units", "celsius")
                result = {
                    "city": city,
                    "temperature": 22 if units == "celsius" else 72,
                    "condition": "sunny",
                    "humidity": 65,
                }
            else:
                result = {"error": f"Unknown tool: {block.name}"}

            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })

        # Add tool results as a user turn
        messages.append({"role": "user", "content": tool_results})
        # Loop: model will now generate a response using the tool results
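
As the number of tools grows, the if/else chain inside the loop is commonly replaced by a dispatch table. The sketch below shows one way to do that. It works on plain dicts rather than SDK block objects so it can run offline, the get_weather stub and handler registry are assumptions for illustration, and failures are reported back to the model through the is_error field of the tool_result block.

```python
import json
from typing import Any, Callable


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stub handler standing in for a real weather lookup."""
    return {"city": city, "temperature": 22 if units == "celsius" else 72}


# Hypothetical registry: tool name -> Python callable
TOOL_HANDLERS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(content_blocks: list[dict]) -> list[dict]:
    """Execute every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue
        try:
            handler = TOOL_HANDLERS.get(block["name"])
            if handler is None:
                raise ValueError(f"Unknown tool: {block['name']}")
            payload, is_error = json.dumps(handler(**block["input"])), False
        except Exception as exc:  # report the failure to the model instead of crashing
            payload, is_error = str(exc), True
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": payload,
            "is_error": is_error,
        })
    return results


blocks = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_2", "name": "get_stock", "input": {}},
]
for result in run_tool_calls(blocks):
    print(result)
```

Returning an error result for every failed call keeps the one-result-per-tool_use pairing intact, so the conversation can continue.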

7.6 Extended Thinking Parameter

`thinking`

Enables Extended Thinking on models that support it (see the extended_thinking flag in the model table in 7.2):

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16_000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8_000   # max tokens allocated to thinking
    },
    messages=[{"role": "user", "content": "Prove that there are infinitely many prime numbers."}]
)

# Response has two block types
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking: {len(block.thinking)} chars]\n{block.thinking[:300]}...\n")
    elif block.type == "text":
        print(f"[Answer]\n{block.text}")

Extended Thinking constraints:

budget_tokens must be ≥ 1,024
max_tokens must be greater than budget_tokens
Thinking tokens are billed at output token prices
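temperature must stay at 1.0 when thinking is enabled; other values are rejected
top_k cannot be set when thinking is enabled; top_p is limited to the range 0.95–1.0
Forced tool use (tool_choice of type "any" or "tool") and assistant prefill cannot be combined with thinking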
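
Several of these constraints can be checked locally before a request is sent. The sketch below covers only the budget, max_tokens, and temperature rules from the list above; it is a pre-flight check under those stated rules, not a replacement for the API's own validation, and check_thinking_params is a hypothetical helper.

```python
def check_thinking_params(params: dict) -> list[str]:
    """Return violations of the thinking constraints; empty means none found."""
    thinking = params.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return []  # thinking is off, nothing to check

    problems = []
    budget = thinking.get("budget_tokens", 0)
    if budget < 1024:
        problems.append(f"budget_tokens is {budget}; minimum is 1024")
    if params.get("max_tokens", 0) <= budget:
        problems.append("max_tokens must be greater than budget_tokens")
    if params.get("temperature", 1.0) != 1.0:
        problems.append("temperature must be 1.0 when thinking is enabled")
    return problems


request = {
    "max_tokens": 2_000,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 4_000},
}
for problem in check_thinking_params(request):
    print("-", problem)
```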

Multi-Turn with Thinking Blocks

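Thinking blocks can be passed back in subsequent messages to preserve reasoning continuity. Append resp.content as-is rather than reconstructing it, so the blocks reach the API unmodified; this matters most when a thinking turn is followed by tool results: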

def multi_turn_reasoning(questions: list[str]) -> list[str]:
    messages = []
    answers = []

    for question in questions:
        messages.append({"role": "user", "content": question})

        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=8_000,
            thinking={"type": "enabled", "budget_tokens": 4_000},
            messages=messages,
        )

        # Preserve the full response (including thinking blocks) in history
        messages.append({
            "role": "assistant",
            "content": resp.content   # includes both thinking and text blocks
        })

        answer = next((b.text for b in resp.content if b.type == "text"), "")
        answers.append(answer)

    return answers

7.7 Request Metadata

`metadata`

An object describing the request, used for logging and abuse tracking. The only documented field is user_id; other keys are not part of the documented schema, so keep session IDs, A/B-test variants, and similar tags in your own logs:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    metadata={
        "user_id": "usr_12345",          # opaque end-user identifier
    },
    messages=[{"role": "user", "content": "..."}]
)

user_id has special significance: Anthropic uses it to track per-user behavior for abuse detection. Providing it allows Anthropic to take targeted action against problematic users without affecting your entire API key. Use a UUID, hash, or other opaque identifier rather than a name, email address, or phone number.
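
One way to produce a stable user_id without exposing internal identifiers is a keyed hash. This is a sketch under the assumption that you hold a server-side secret; the usr_ prefix, the truncation length, and the function name are arbitrary choices for illustration.

```python
import hashlib
import hmac


def opaque_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an internal user ID."""
    digest = hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"usr_{digest[:32]}"


SECRET = b"replace-with-a-server-side-secret"  # hypothetical; load from config in practice
print(opaque_user_id("alice@example.com", SECRET))
```

The same input always yields the same identifier, so per-user tracking still works, while the original value cannot be read back from the request.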

`betas`

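Opt into experimental features that are not yet generally available. Beta flags travel in the anthropic-beta request header rather than in the JSON body; in the Python SDK they are passed as betas on the client.beta.messages namespace:

# Enable a specific beta feature (placeholder identifier)
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["feature-identifier-2024-01"],
    messages=[...]
)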

Beta feature identifiers are published in the Anthropic changelog. Features graduate from beta to GA on their own timelines; once GA, the betas flag is no longer required (though including it doesn't break anything).

7.8 API Version and Compatibility

Version Header

Every request requires anthropic-version: 2023-06-01. The Python and TypeScript SDKs add this automatically.

# Manual HTTP — must include the version header
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'
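
For comparison, the same request can be assembled in Python without the SDK. The sketch below only builds the request object with the standard library and does not send it; build_request is a hypothetical helper, and the API key shown is a placeholder.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a Messages API request with the required headers."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request(
    "sk-placeholder",
    {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req)  (requires a real key and network access)
```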

Anthropic's Compatibility Guarantee

Version format: YYYY-MM-DD (e.g., 2023-06-01)

Guarantees:
  - Breaking changes never introduced to an existing version
  - New fields added to existing versions are always optional
  - Deprecation notices given at least 6 months in advance
  - Currently only one stable version: 2023-06-01

Safe Model Upgrade Checklist

Before switching model versions in production:

import json

def validate_model_upgrade(
    old_model: str,
    new_model: str,
    test_cases: list[dict],
    system_prompt: str = "",
) -> dict:
    """
    Compare old and new model on a test set before promoting to production.
    test_cases: [{"prompt": str, "expected_format": str|None, "min_length": int}]
    A regression is a check the old model passes but the new model fails.
    """
    def failed_checks(model: str, case: dict, kwargs: dict) -> dict:
        resp = client.messages.create(model=model, **kwargs)
        # Use the first text block; content[0] is not guaranteed to be text
        text = next((b.text for b in resp.content if b.type == "text"), "")
        failures = {}

        # Check format requirements
        if case.get("expected_format") == "json":
            try:
                json.loads(text)
            except json.JSONDecodeError:
                failures["format"] = "Response is not valid JSON"

        # Check minimum length
        if len(text) < case.get("min_length", 0):
            failures["length"] = f"Response too short: {len(text)} chars"
        return failures

    regressions = []

    for case in test_cases:
        kwargs = {
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": case["prompt"]}],
        }
        if system_prompt:
            kwargs["system"] = system_prompt

        old_failures = failed_checks(old_model, case, kwargs)
        new_failures = failed_checks(new_model, case, kwargs)

        # Only count checks that the old model passed and the new model failed
        for check, issue in new_failures.items():
            if check not in old_failures:
                regressions.append({"prompt": case["prompt"][:80], "issue": issue})

    return {
        "total_cases": len(test_cases),
        "regressions": len(regressions),
        "regression_details": regressions,
        "safe_to_upgrade": len(regressions) == 0,
    }

7.9 Complete Error Reference

HTTP Status   Error Type                    Meaning
───────────   ─────────────────────────     ─────────────────────────────────
400           invalid_request_error         Malformed request or invalid parameters
401           authentication_error          Invalid or missing API key
403           permission_error              API key lacks permission for the resource
404           not_found_error               Requested resource (e.g. model ID) not found
413           request_too_large             Request body exceeds size limit
429           rate_limit_error              Rate limit exceeded
500           api_error                     Anthropic server error
529           overloaded_error              Service temporarily overloaded
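
The table splits naturally into errors worth retrying (429, 500, 529) and errors that will fail again until the request is fixed. The sketch below encodes that split with full-jitter exponential backoff. The base delay and cap are arbitrary assumptions, and the official SDKs already retry some failures automatically, so a helper like this is mainly relevant for hand-rolled HTTP clients.

```python
import random

# Transient conditions from the table above; 4xx request errors are not retried
RETRYABLE_STATUS = {429, 500, 529}


def is_retryable(status: int) -> bool:
    """True if the same request may succeed when sent again later."""
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


for status in (400, 429, 529):
    print(status, "retry" if is_retryable(status) else "do not retry")
```

Randomizing the delay keeps many clients that failed at the same moment from retrying in lockstep.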

Common 400 Error Causes

# 1. max_tokens exceeds model limit
client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=100_000,  # exceeds this model's output limit → 400
    messages=[...]
)

# 2. Unsupported role in messages
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "Be brief"},   # only "user" and "assistant" are accepted → 400
        {"role": "user", "content": "Q1"},
    ]
)

# 3. Total input exceeds context window
# system(80K) + messages(130K) = 210K > 200K → 400

# 4. Extended Thinking + custom temperature (incompatible)
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    temperature=0.2,  # must stay at 1.0 when thinking is enabled → 400
    messages=[...]
)

# 5. Extended Thinking budget_tokens too small
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 500},  # min is 1024 → 400
    messages=[...]
)

Summary

The Messages API is compact but has several non-obvious constraints that cause production issues:

max_tokens is a ceiling; always check stop_reason for truncation
messages should alternate user/assistant; the API combines consecutive same-role turns, so merge them yourself to keep the stored history explicit
temperature default is 1.0 (not 0 as some developers assume); set explicitly for reproducible workloads
stop_sequences are checked against the generated output, not the full context
Extended Thinking has several incompatibilities: temperature must stay at 1.0, top_k cannot be set, top_p is limited to 0.95–1.0, and forced tool use is not allowed
Tool calls require a loop: the model stops with stop_reason="tool_use", you execute, then send results back as a user turn
Metadata user_id enables per-user abuse tracking and is worth including in production systems

The next chapter covers multi-turn conversation management in depth: how to handle long conversations, implement context compression, and design session persistence that scales.
