Chapter 7: Messages API Complete Reference: Parameters, Versions, and Compatibility

7.1 API Overview

POST https://api.anthropic.com/v1/messages is the single core endpoint for the Claude API. Understanding each parameter's precise semantics (not just what it does, but what its defaults are, what its limits are, and what breaks when you misuse it) is the foundation of building reliable production systems.

Full Request Structure

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [...],
  "system": "...",
  "temperature": 1.0,
  "top_p": null,
  "top_k": null,
  "stop_sequences": [],
  "stream": false,
  "tools": [],
  "tool_choice": {"type": "auto"},
  "thinking": null,
  "metadata": {}
}

Full Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "..."},
    {"type": "tool_use", "id": "toolu_...", "name": "...", "input": {...}}
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 200,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
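
Because content is a list of typed blocks rather than a single string, client code usually needs a small step that separates text from tool calls and totals the token counts. The sketch below works on the raw JSON body as a plain dict; summarize_response is a hypothetical helper name, not part of any SDK, and it assumes the field names shown in the response structure above.

```python
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Split a Messages API response body into text, tool calls, and token counts."""
    content = resp.get("content", [])

    # Concatenate every text block; tool_use blocks are collected separately.
    text = "".join(b["text"] for b in content if b.get("type") == "text")
    tool_calls = [
        {"id": b["id"], "name": b["name"], "input": b["input"]}
        for b in content
        if b.get("type") == "tool_use"
    ]

    usage = resp.get("usage", {})
    # input_tokens excludes cached tokens, so the total is the sum of all three fields.
    total_input = sum(
        usage.get(key) or 0
        for key in ("input_tokens", "cache_creation_input_tokens", "cache_read_input_tokens")
    )

    return {
        "text": text,
        "tool_calls": tool_calls,
        "stop_reason": resp.get("stop_reason"),
        "total_input_tokens": total_input,
        "output_tokens": usage.get("output_tokens") or 0,
    }
```

With the Python SDK the same fields are available as attributes (for example response.usage.input_tokens), so the dict lookups would become attribute access.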

7.2 Required Parameters

model

The model ID string. Case-sensitive.

# Current model IDs (as of 2025)
AVAILABLE_MODELS = {
    "claude-opus-4-6": {
        "context_window": 200_000,
        "max_output_tokens": 32_000,
        "extended_thinking": True,
        "vision": True,
        "input_price_per_m": 15.00,
        "output_price_per_m": 75.00,
    },
    "claude-sonnet-4-6": {
        "context_window": 200_000,
        "max_output_tokens": 64_000,
        "extended_thinking": True,
        "vision": True,
        "input_price_per_m": 3.00,
        "output_price_per_m": 15.00,
    },
    "claude-haiku-4-5-20251001": {
        "context_window": 200_000,
        "max_output_tokens": 8_192,
        "extended_thinking": False,
        "vision": True,
        "input_price_per_m": 0.25,
        "output_price_per_m": 1.25,
    },
}
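
The per-million-token prices in the table translate into a per-request cost with one line of arithmetic. The sketch below is a minimal illustration; estimate_cost_usd is a hypothetical helper, and the Sonnet prices are copied from the table above and should be checked against current pricing before use.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost of one request in USD, given prices per million tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Prices taken from this chapter's table for claude-sonnet-4-6 (assumed current).
SONNET_PRICES = {"input_price_per_m": 3.00, "output_price_per_m": 15.00}

# 100 input tokens and 200 output tokens, as in the sample usage block.
cost = estimate_cost_usd(100, 200, **SONNET_PRICES)
```

Output tokens dominate the bill at these prices, which is one reason to set max_tokens deliberately rather than to the model maximum.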

Version pinning: Model IDs without a date stamp (e.g., claude-sonnet-4-6) point to the current latest snapshot of that version, which may change. For production systems where response consistency matters, use a date-stamped ID if Anthropic provides one for that version.

max_tokens

The maximum number of tokens the model may generate in this response. Required; no default.

# Per-model output token limits
OUTPUT_TOKEN_LIMITS = {
    "claude-opus-4-6":          32_000,
    "claude-sonnet-4-6":        64_000,
    "claude-haiku-4-5-20251001": 8_192,
}

# IMPORTANT: max_tokens is a ceiling, not a target.
# The model stops naturally when done (stop_reason = "end_turn").
# It is truncated only when it reaches max_tokens (stop_reason = "max_tokens").

Always check stop_reason in production code:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,   # might be too low for a detailed answer
    messages=[{"role": "user", "content": "Explain how TCP/IP works in detail."}]
)

if response.stop_reason == "max_tokens":
    # Response was cut off: the answer is incomplete
    # Options: increase max_tokens, or implement continuation logic
    print("WARNING: Response was truncated")

elif response.stop_reason == "end_turn":
    # Model finished naturally
    print(response.content[0].text)
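
The "continuation logic" option mentioned in the comment above can be sketched as a loop that feeds the truncated text back and asks the model to carry on. This is one possible approach under stated assumptions, not the only one: create is a hypothetical callable that takes the same keyword arguments as the endpoint and returns the response body as a dict, and the follow-up prompt wording is illustrative.

```python
from typing import Callable


def complete_with_continuation(
    create: Callable[..., dict],
    messages: list[dict],
    max_rounds: int = 3,
    **params,
) -> str:
    """Re-request after max_tokens truncation, up to max_rounds calls in total."""
    history = list(messages)
    parts: list[str] = []

    for _ in range(max_rounds):
        resp = create(messages=list(history), **params)
        text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
        parts.append(text)

        # Stop on anything other than truncation, or if nothing was produced.
        if resp["stop_reason"] != "max_tokens" or not text:
            break

        # Keep roles alternating: partial answer, then a request to continue.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Continue exactly where you left off."})

    return "".join(parts)
```

Joining the parts is a heuristic: the model may repeat or rephrase a few words at the seam, so for strict formats such as JSON it is usually safer to raise max_tokens instead.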

messages

The conversation history array. Each element has role and content.

Roles: "user" and "assistant" only. The API does not accept a "system" role in messages (system prompt goes in the separate system parameter).

Content forms:

# Form 1: string (text only)
{"role": "user", "content": "Hello"}

# Form 2: content block array (enables multimodal)
{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": "/9j/4AAQ..."
            }
        }
    ]
}

# Image via URL (must be publicly accessible)
{
    "type": "image",
    "source": {"type": "url", "url": "https://example.com/photo.jpg"}
}

Message alternation rules (violations cause HTTP 400):

# Valid
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Explain binary search"},
]

# Invalid: consecutive user messages → HTTP 400
bad_messages = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "Are you there?"},  # ERROR
]

# Fix: merge into one user message
fixed_messages = [
    {"role": "user", "content": "Hello, are you there?"}
]
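
The manual merge shown above can be automated when history is assembled from several sources (for example, queued user inputs). The following is a minimal sketch; merge_consecutive_turns is a hypothetical helper that normalizes every message to the content-block form so that string and block-array messages can be combined.

```python
def _as_blocks(content) -> list[dict]:
    """Normalize string content to a one-element text block list."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return [dict(block) for block in content]  # shallow copies; input is left untouched


def merge_consecutive_turns(messages: list[dict]) -> list[dict]:
    """Merge adjacent messages that share a role into a single message."""
    merged: list[dict] = []
    for msg in messages:
        blocks = _as_blocks(msg["content"])
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": msg["role"], "content": blocks})
    return merged
```

Applied to bad_messages above, this yields one user message containing two text blocks, which keeps the original wording of each input instead of rewriting it into a single sentence.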

7.3 Optional Parameters: Sampling Control

temperature

Controls output randomness. Range: 0.0–1.0. Default: 1.0. When Extended Thinking is enabled, forced to 1.0.

Low temperature → more deterministic, repeatable
High temperature → more creative, varied

Practical reference values:
  Code generation / math:    0.0–0.2
  Factual Q&A:               0.0–0.3
  Summarization / extraction: 0.0–0.5
  Conversational chat:        0.5–0.8
  Creative writing:           0.7–1.0
  Brainstorming:              0.9–1.0

# Near-deterministic code generation
code_resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.1,
    messages=[{"role": "user", "content": "Write a binary search in Python"}]
)

# Creative story writing
story_resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    temperature=0.9,
    messages=[{"role": "user", "content": "Write the opening of a time-travel story"}]
)

temperature is not a quality dial. Low temperature does not fix a poorly structured prompt; it just makes the poor output more repeatable.

top_p and top_k

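Additional sampling controls. top_p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p; top_k restricts it to the k most likely tokens. Anthropic recommends adjusting only one sampling parameter at a time.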

For most production use cases, adjusting only temperature is sufficient. Use top_p or top_k only when you have a specific reason.

stop_sequences

A list of strings (max 4, max 256 chars each). Generation stops immediately when any of these strings appears in the output.

# Force single-line output
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    stop_sequences=["\n"],
    messages=[{"role": "user", "content": "Give the capital of France in one word."}]
)

# Stop at a custom delimiter (useful with structured prompts)
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    stop_sequences=["</answer>"],
    messages=[{"role": "user",
               "content": "Answer in <answer> tags: what is a hash table?"}]
)

# Check which stop sequence triggered
if resp.stop_reason == "stop_sequence":
    print(f"Stopped at: {resp.stop_sequence!r}")
    # resp.stop_sequence contains the exact string that triggered the stop
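
When a closing tag is used as the stop sequence, the matched string itself is not part of the returned text, so the output typically ends right after the answer body with no closing tag. A small parser can handle both that case and the case where the tag does appear. This is a sketch; extract_tagged_answer is a hypothetical helper and the tag name follows the example above.

```python
def extract_tagged_answer(text: str, tag: str = "answer") -> str:
    """Return the body of <tag>...</tag>, tolerating a missing closing tag."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"

    # Drop everything up to and including the opening tag, if present.
    start = text.find(open_tag)
    body = text[start + len(open_tag):] if start != -1 else text

    # Drop the closing tag and anything after it, if present.
    end = body.find(close_tag)
    if end != -1:
        body = body[:end]

    return body.strip()
```

For example, extract_tagged_answer("Sure.\n<answer>A hash table maps keys to values.") returns the sentence without the preamble or the tag.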

7.4 The system Parameter

The system prompt can be a string or a content block array (the latter enables prompt caching):

# Simple string
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[...]
)

# Content block array with cache control
STATIC_DOCS = "..."  # thousands of tokens of reference material

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant.\n\n" + STATIC_DOCS,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[...]
)
# First call: writes to cache (cache writes are billed about 25% above the base input price)
# Subsequent calls within 5 minutes: cached tokens are read at a 90% discount
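
Whether caching pays off depends on how many calls reuse the prefix before it expires. The arithmetic can be sketched as follows. The 0.10 read multiplier corresponds to the 90% discount mentioned above; the 1.25 write multiplier is an assumption about the size of the write surcharge and should be checked against current pricing. The function names are hypothetical, and every call is assumed to land inside the cache lifetime.

```python
def input_cost_usd(tokens: int, price_per_m: float, multiplier: float = 1.0) -> float:
    """Cost of input tokens in USD at a given price multiplier."""
    return tokens * price_per_m * multiplier / 1_000_000


def cached_prefix_cost_usd(
    prefix_tokens: int,
    calls: int,
    price_per_m: float,
    write_multiplier: float = 1.25,  # assumed surcharge on the first (cache-writing) call
    read_multiplier: float = 0.10,   # 90% discount on later (cache-reading) calls
) -> float:
    """Total prefix cost over `calls` requests: one cache write, then cache reads."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    write = input_cost_usd(prefix_tokens, price_per_m, write_multiplier)
    reads = (calls - 1) * input_cost_usd(prefix_tokens, price_per_m, read_multiplier)
    return write + reads


# A 10,000-token prefix reused across 10 calls at $3.00 per million input tokens.
with_cache = cached_prefix_cost_usd(10_000, 10, 3.00)
without_cache = 10 * input_cost_usd(10_000, 3.00)
```

Under these assumptions a single call costs more with caching than without, and the second call already recovers the surcharge.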

Restriction: The system parameter supports only text content blocks: no images, no tool results.

7.5 Tool Use Parameters

tools

Defines the tools the model can call. Each tool has a name, description, and a JSON schema for its inputs:

tools = [
    {
        "name": "get_weather",          # alphanumeric + underscores, max 64 chars
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Paris', 'Tokyo'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit, defaults to celsius"
                }
            }
        }
    }
]
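
The input_schema guides the model, but the input it produces is still worth checking before a tool is executed. The sketch below covers only the schema features used in the example above (required, string type, enum); validate_tool_input is a hypothetical helper, and a full JSON Schema validator would be the more complete choice in production.

```python
def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed these checks."""
    errors: list[str] = []
    properties = schema.get("properties", {})

    for key in schema.get("required", []):
        if key not in tool_input:
            errors.append(f"missing required field: {key}")

    for key, value in tool_input.items():
        spec = properties.get(key)
        if spec is None:
            continue  # JSON Schema allows extra properties unless told otherwise
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")

    return errors


# Schema copied from the get_weather definition above.
WEATHER_SCHEMA = {
    "type": "object",
    "required": ["city"],
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
}
```

Returning the problems as strings makes it easy to send them back to the model in a tool result so it can correct the call.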

tool_choice

# Auto (default): model decides whether and which tool to call
tool_choice = {"type": "auto"}

# Any: model must call a tool, but can choose which one
tool_choice = {"type": "any"}

# Specific tool: model must call this specific tool
tool_choice = {"type": "tool", "name": "get_weather"}

# Disabled: model cannot use tools even if defined
tool_choice = {"type": "none"}

Complete Tool Call Loop

import anthropic
import json

client = anthropic.Anthropic()

def run_weather_agent(user_query: str) -> str:
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            }
        }
    }]

    messages = [{"role": "user", "content": user_query}]

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        if resp.stop_reason != "tool_use":
            # Model finished; return final text
            return next(
                (b.text for b in resp.content if b.type == "text"), ""
            )

        # Add assistant turn (including tool_use blocks)
        messages.append({"role": "assistant", "content": resp.content})

        # Execute all tool calls and collect results
        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue

            if block.name == "get_weather":
                # Simulate a real API call
                city = block.input.get("city", "")
                units = block.input.get("units", "celsius")
                result = {
                    "city": city,
                    "temperature": 22 if units == "celsius" else 72,
                    "condition": "sunny",
                    "humidity": 65,
                }
            else:
                result = {"error": f"Unknown tool: {block.name}"}

            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })

        # Add tool results as a user turn
        messages.append({"role": "user", "content": tool_results})
        # Loop: model will now generate a response using the tool results
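
As the number of tools grows, the if/else dispatch inside the loop is often replaced by a registry, and failures are reported back to the model with the is_error flag on the tool result instead of crashing the loop. The following is a minimal sketch of that pattern; TOOL_REGISTRY, run_tool, and the stubbed get_weather are hypothetical names for illustration.

```python
import json
from typing import Callable


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stub standing in for a real weather API call."""
    return {"city": city, "temperature": 22 if units == "celsius" else 72}


TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool(tool_use_id: str, name: str, tool_input: dict) -> dict:
    """Execute one tool call and wrap the outcome as a tool_result block."""
    try:
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            raise LookupError(f"Unknown tool: {name}")
        result = handler(**tool_input)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(result),
        }
    except Exception as exc:
        # Report the failure to the model so it can retry or explain the problem.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```

Inside the loop, each tool_use block would then become run_tool(block.id, block.name, block.input). Since the loop above runs until the model stops requesting tools, a maximum-iterations counter is a common additional guard.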

7.6 Extended Thinking Parameter

thinking

Enables the Extended Thinking mode for supported models (Opus and Sonnet):

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16_000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8_000   # max tokens allocated to thinking
    },
    messages=[{"role": "user", "content": "Prove that there are infinitely many prime numbers."}]
)

# Response has two block types
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking: {len(block.thinking)} chars]\n{block.thinking[:300]}...\n")
    elif block.type == "text":
        print(f"[Answer]\n{block.text}")

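Extended Thinking constraints:

  - temperature is forced to 1.0 (see 7.3)
  - budget_tokens must be at least 1,024 and smaller than max_tokens
  - stop_sequences cannot be combined with thinking (see 7.9)
  - Supported on Opus and Sonnet models only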
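
Constraints on thinking that this chapter states elsewhere (temperature fixed at 1.0, a 1,024-token minimum for budget_tokens) can be checked locally before a request is sent, along with the requirement that the thinking budget leave room for the answer inside max_tokens. The sketch below is a hypothetical pre-flight helper, not an SDK function.

```python
from typing import Optional

MIN_THINKING_BUDGET = 1024  # minimum budget_tokens cited in section 7.9


def check_thinking_params(
    max_tokens: int,
    thinking: Optional[dict],
    temperature: Optional[float] = None,
) -> list[str]:
    """Return problems with a thinking configuration; empty list means none found."""
    problems: list[str] = []
    if not thinking or thinking.get("type") != "enabled":
        return problems  # nothing to check when thinking is off

    budget = thinking.get("budget_tokens")
    if budget is None:
        problems.append("budget_tokens is required when thinking is enabled")
    else:
        if budget < MIN_THINKING_BUDGET:
            problems.append(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
        if budget >= max_tokens:
            problems.append("budget_tokens must be smaller than max_tokens")

    if temperature is not None and temperature != 1.0:
        problems.append("temperature must be left at 1.0 when thinking is enabled")

    return problems
```

The request shown earlier (max_tokens=16_000, budget_tokens=8_000) passes these checks with an empty list.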

Multi-Turn with Thinking Blocks

Thinking blocks can be passed back in subsequent messages to preserve reasoning continuity:

def multi_turn_reasoning(questions: list[str]) -> list[str]:
    messages = []
    answers = []

    for question in questions:
        messages.append({"role": "user", "content": question})

        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=8_000,
            thinking={"type": "enabled", "budget_tokens": 4_000},
            messages=messages,
        )

        # Preserve the full response (including thinking blocks) in history
        messages.append({
            "role": "assistant",
            "content": resp.content   # includes both thinking and text blocks
        })

        answer = next((b.text for b in resp.content if b.type == "text"), "")
        answers.append(answer)

    return answers

7.7 Request Metadata

metadata

Metadata attached to the request. The only field the API reference documents is user_id, an identifier for the end user that Anthropic may use for abuse detection:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    metadata={
        "user_id": "usr_12345",          # opaque end-user identifier, not a name or email
    },
    messages=[{"role": "user", "content": "..."}]
)

Keep other request context (session ID, request source, A/B test variant) in your own logs, keyed by the response id, rather than in metadata.

user_id has special significance: Anthropic uses it to track per-user behavior for abuse detection. Providing it allows Anthropic to take targeted action against problematic users without affecting your entire API key.
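
Because user_id leaves your system, it is usually derived from the internal identifier rather than sent as-is. One way to do that, sketched below, is a keyed hash: the same user always maps to the same value, but the value cannot be reversed without the key. opaque_user_id and the secret are hypothetical; the secret is assumed to come from your own configuration.

```python
import hashlib
import hmac


def opaque_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an internal user ID."""
    return hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative values only; load the real secret from configuration.
user_id = opaque_user_id("alice@example.com", secret=b"replace-with-a-real-secret")
```

The result is a 64-character hex string that can be passed as metadata={"user_id": user_id}.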

betas

Opt into experimental features that are not yet generally available. Over raw HTTP, beta features are requested with the anthropic-beta header; in the Python SDK, the betas argument is accepted by client.beta.messages.create rather than client.messages.create:

# Enable a specific beta feature
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["feature-identifier-2024-01"],   # placeholder identifier
    messages=[...]
)

Beta feature identifiers are published in the Anthropic changelog. Features graduate from beta to GA on their own timelines; once GA, the betas flag is no longer required (though including it doesn't break anything).

7.8 API Version and Compatibility

Version Header

Every request requires anthropic-version: 2023-06-01. The Python and TypeScript SDKs add this automatically.

# Manual HTTP: must include the version header
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'
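
The same request can be assembled without an SDK using only the Python standard library, which makes the three required headers explicit. This is a sketch; build_messages_request is a hypothetical helper that constructs the request object without sending it.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with the headers the API requires."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_messages_request(
    api_key="sk-placeholder",  # placeholder; read the real key from the environment
    body={
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
```

Passing the request to urllib.request.urlopen would send it; separating construction from sending keeps the header logic easy to unit-test.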

Anthropic's Compatibility Guarantee

Version format: YYYY-MM-DD (e.g., 2023-06-01)

Guarantees:
  - Breaking changes never introduced to an existing version
  - New fields added to existing versions are always optional
  - Deprecation notices given at least 6 months in advance
  - Currently only one stable version: 2023-06-01

Safe Model Upgrade Checklist

Before switching model versions in production:

import json

def validate_model_upgrade(
    old_model: str,
    new_model: str,
    test_cases: list[dict],
    system_prompt: str = "",
) -> dict:
    """
    Run the candidate model over a test set before promoting it to production.
    test_cases: [{"prompt": str, "expected_format": str|None, "min_length": int}]
    old_model is recorded in the report only; the checks run against new_model.
    """
    regressions = []

    for case in test_cases:
        kwargs = {
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": case["prompt"]}],
        }
        if system_prompt:
            kwargs["system"] = system_prompt

        new_resp = client.messages.create(model=new_model, **kwargs)
        # Take the first text block; content[0] is not guaranteed to be text
        new_text = next((b.text for b in new_resp.content if b.type == "text"), "")

        # Check format requirements
        if case.get("expected_format") == "json":
            try:
                json.loads(new_text)
            except json.JSONDecodeError:
                regressions.append({
                    "prompt": case["prompt"][:80],
                    "issue": "New model does not produce valid JSON",
                })

        # Check minimum length
        if len(new_text) < case.get("min_length", 0):
            regressions.append({
                "prompt": case["prompt"][:80],
                "issue": f"Response too short: {len(new_text)} chars",
            })

    return {
        "old_model": old_model,
        "new_model": new_model,
        "total_cases": len(test_cases),
        "regressions": len(regressions),
        "regression_details": regressions,
        "safe_to_upgrade": len(regressions) == 0,
    }

7.9 Complete Error Reference

HTTP Status   Error Type                    Meaning
───────────   ─────────────────────────     ─────────────────────────────────
400           invalid_request_error         Bad request parameters
401           authentication_error          Invalid or missing API key
403           permission_error              API key lacks permission for the resource
404           not_found_error               Model ID does not exist
413           request_too_large             Request body exceeds size limit
422           unprocessable_entity_error    Content violates usage policy
429           rate_limit_error              Rate limit exceeded
500           api_error                     Anthropic server error
529           overloaded_error              Service temporarily overloaded
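
The table splits naturally into errors that a retry can fix and errors that it cannot. The sketch below retries with exponential backoff and jitter. Treating 429, 500, and 529 as transient is a common convention and an assumption here, not something stated above; ApiStatusError and call_with_retry are hypothetical names. The official SDKs already retry some failures on their own, so a wrapper like this matters mainly for raw HTTP clients.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 500, 529}  # assumed transient; 4xx request errors are not retried


class ApiStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiStatusError as err:
            if err.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting.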

Common 400 Error Causes

# 1. max_tokens exceeds model limit
client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=100_000,  # Haiku limit is 8,192 → 400
    messages=[...]
)

# 2. Non-alternating messages
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Q1"},
        {"role": "user", "content": "Q2"},   # consecutive user → 400
    ]
)

# 3. Total input exceeds context window
# system(80K) + messages(130K) = 210K > 200K → 400

# 4. Extended Thinking + stop_sequences (incompatible)
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    stop_sequences=["END"],  # not supported with thinking → 400
    messages=[...]
)

# 5. Extended Thinking budget_tokens too small
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 500},  # min is 1024 → 400
    messages=[...]
)
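
Causes 1 and 3 above can be caught before the request leaves your process if the token counts are known. The sketch below is a hypothetical pre-flight check; the limits are passed in as arguments rather than hard-coded, and the input token count is assumed to come from a token-counting call or an estimate.

```python
def preflight_problems(
    input_tokens: int,
    max_tokens: int,
    context_window: int,
    max_output_tokens: int,
) -> list[str]:
    """Return reasons this request would likely be rejected with HTTP 400."""
    problems: list[str] = []
    if max_tokens > max_output_tokens:
        problems.append(
            f"max_tokens {max_tokens} exceeds the model output limit {max_output_tokens}"
        )
    if input_tokens > context_window:
        problems.append(
            f"input of {input_tokens} tokens exceeds the context window {context_window}"
        )
    return problems


# Cause 3 from the list above: 80K system + 130K messages against a 200K window.
too_long = preflight_problems(
    input_tokens=80_000 + 130_000,
    max_tokens=1024,
    context_window=200_000,
    max_output_tokens=64_000,
)
```

Failing fast this way avoids paying the network round trip for a request that cannot succeed, and the returned messages can go straight into logs.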

Summary

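The Messages API is compact but has several non-obvious constraints that cause production issues:

  - max_tokens is required and is a ceiling, not a target; check stop_reason to detect truncation
  - messages accepts only the "user" and "assistant" roles, in alternation; the system prompt goes in the system parameter
  - The system parameter supports only text content blocks
  - Extended Thinking forces temperature to 1.0, needs a budget_tokens of at least 1,024, and cannot be combined with stop_sequences
  - Model IDs without a date stamp can point to a new snapshot over time
  - Raw HTTP requests must include the anthropic-version header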

The next chapter covers multi-turn conversation management in depth: how to handle long conversations, implement context compression, and design session persistence that scales.
