Messages API Complete Parameter Reference: All 27 Parameters, Defaults, and Best Practices
Chapter 7: Messages API Complete Reference: Parameters, Versions, and Compatibility
7.1 API Overview
POST https://api.anthropic.com/v1/messages is the single core endpoint for the Claude API. Understanding each parameter's precise semantics—not just what it does, but what its defaults are, what its limits are, and what breaks when you misuse it—is the foundation of building reliable production systems.
Full Request Structure
{
"model": "claude-sonnet-4-6",
"max_tokens": 1024,
"messages": [...],
"system": "...",
"temperature": 1.0,
"top_p": null,
"top_k": null,
"stop_sequences": [],
"stream": false,
"tools": [],
"tool_choice": {"type": "auto"},
"thinking": null,
"metadata": {},
"betas": []
}
Full Response Structure
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{"type": "text", "text": "..."},
{"type": "tool_use", "id": "toolu_...", "name": "...", "input": {...}}
],
"model": "claude-sonnet-4-6",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 100,
"output_tokens": 200,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}
7.2 Required Parameters
model
The model ID string. Case-sensitive.
# Current model IDs (as of 2025)
AVAILABLE_MODELS = {
"claude-opus-4-6": {
"context_window": 200_000,
"max_output_tokens": 32_000,
"extended_thinking": True,
"vision": True,
"input_price_per_m": 15.00,
"output_price_per_m": 75.00,
},
"claude-sonnet-4-6": {
"context_window": 200_000,
"max_output_tokens": 64_000,
"extended_thinking": True,
"vision": True,
"input_price_per_m": 3.00,
"output_price_per_m": 15.00,
},
"claude-haiku-4-5-20251001": {
"context_window": 200_000,
"max_output_tokens": 8_192,
"extended_thinking": False,
"vision": True,
"input_price_per_m": 0.25,
"output_price_per_m": 1.25,
},
}
Version pinning: Model IDs without a date stamp (e.g., claude-sonnet-4-6) point to the current latest snapshot of that version, which may change. For production systems where response consistency matters, use a date-stamped ID if Anthropic provides one for that version.
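A quick way to audit a configuration for unpinned IDs is to look for a date suffix. The sketch below assumes that a date-stamped ID ends in an eight-digit `-YYYYMMDD` suffix, as `claude-haiku-4-5-20251001` does in the table above; the helper names are illustrative and not part of any SDK.

```python
import re

# Assumption: a pinned ID ends in "-YYYYMMDD" (eight digits).
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def is_date_stamped(model_id: str) -> bool:
    """Return True if the model ID ends in an eight-digit date stamp."""
    return _DATE_SUFFIX.search(model_id) is not None


def unpinned_models(model_ids: list[str]) -> list[str]:
    """Return the IDs that carry no date stamp and may therefore move."""
    return [m for m in model_ids if not is_date_stamped(m)]


if __name__ == "__main__":
    configured = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]
    print(unpinned_models(configured))
```

Running a check like this in CI is one option for catching an alias that slipped into a production configuration.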
max_tokens
The maximum number of tokens the model may generate in this response. Required; no default.
# Per-model output token limits
OUTPUT_TOKEN_LIMITS = {
"claude-opus-4-6": 32_000,
"claude-sonnet-4-6": 64_000,
"claude-haiku-4-5-20251001": 8_192,
}
# IMPORTANT: max_tokens is a ceiling, not a target.
# The model stops naturally when done (stop_reason = "end_turn").
# It is truncated only when it reaches max_tokens (stop_reason = "max_tokens").
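Because a max_tokens value above the model's limit is rejected, it can help to clamp the value before sending. This is a minimal sketch that reuses the limits from the table above; `clamp_max_tokens` is a hypothetical helper, and the limits themselves should be checked against current model documentation.

```python
# Limits copied from the table above; treat them as illustrative values.
OUTPUT_TOKEN_LIMITS = {
    "claude-opus-4-6": 32_000,
    "claude-sonnet-4-6": 64_000,
    "claude-haiku-4-5-20251001": 8_192,
}


def clamp_max_tokens(model: str, requested: int) -> int:
    """Return a max_tokens value that does not exceed the model's output limit."""
    if requested < 1:
        raise ValueError("max_tokens must be at least 1")
    if model not in OUTPUT_TOKEN_LIMITS:
        raise ValueError(f"No known output limit for model {model!r}")
    return min(requested, OUTPUT_TOKEN_LIMITS[model])


if __name__ == "__main__":
    print(clamp_max_tokens("claude-haiku-4-5-20251001", 100_000))
```

Clamping silently changes the caller's request, so logging whenever the value is reduced is a reasonable addition.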
Always check stop_reason in production code:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=200, # might be too low for a detailed answer
messages=[{"role": "user", "content": "Explain how TCP/IP works in detail."}]
)
if response.stop_reason == "max_tokens":
    # Response was cut off; the answer is incomplete
    # Options: increase max_tokens, or implement continuation logic
    print("WARNING: Response was truncated")
elif response.stop_reason == "end_turn":
    # Model finished naturally
    print(response.content[0].text)
messages
The conversation history array. Each element has role and content.
Roles: "user" and "assistant" only. The API does not accept a "system" role in messages (system prompt goes in the separate system parameter).
Content forms:
# Form 1: string (text only)
{"role": "user", "content": "Hello"}
# Form 2: content block array (enables multimodal)
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": "/9j/4AAQ..."
}
}
]
}
# Image via URL (must be publicly accessible)
{
"type": "image",
"source": {"type": "url", "url": "https://example.com/photo.jpg"}
}
Message alternation rules (violations cause HTTP 400):
- Messages must begin with a user turn
- user and assistant turns must alternate strictly
- The final message must be from user (unless using assistant prefill)
# Valid
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi! How can I help?"},
{"role": "user", "content": "Explain binary search"},
]
# Invalid — consecutive user messages → HTTP 400
bad_messages = [
{"role": "user", "content": "Hello"},
{"role": "user", "content": "Are you there?"}, # ERROR
]
# Fix: merge into one user message
fixed_messages = [
{"role": "user", "content": "Hello, are you there?"}
]
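The merge fix can be applied automatically before every request. The following is a sketch only: `merge_consecutive_turns` is a hypothetical helper, and joining string contents with a blank line is an assumption about what reads naturally, not an API requirement.

```python
def _as_blocks(content):
    """Normalize string content to a list of text blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def merge_consecutive_turns(messages: list[dict]) -> list[dict]:
    """Collapse adjacent messages that share a role into a single message."""
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            prev, cur = merged[-1]["content"], msg["content"]
            if isinstance(prev, str) and isinstance(cur, str):
                # Assumption: a blank line is an acceptable separator.
                merged[-1]["content"] = prev + "\n\n" + cur
            else:
                # Mixed or block content: concatenate the block lists.
                merged[-1]["content"] = _as_blocks(prev) + _as_blocks(cur)
        else:
            # Copy so the caller's message dicts are never mutated.
            merged.append({"role": msg["role"], "content": msg["content"]})
    return merged


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hello"},
        {"role": "user", "content": "Are you there?"},
    ]
    print(merge_consecutive_turns(history))
```

The helper only fixes adjacency; it does not check that the first message is a user turn.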
7.3 Optional Parameters: Sampling Control
temperature
Controls output randomness. Range: 0.0–1.0. Default: 1.0. When Extended Thinking is enabled, forced to 1.0.
Low temperature → more deterministic, repeatable
High temperature → more creative, varied
Practical reference values:
Code generation / math: 0.0–0.2
Factual Q&A: 0.0–0.3
Summarization / extraction: 0.0–0.5
Conversational chat: 0.5–0.8
Creative writing: 0.7–1.0
Brainstorming: 0.9–1.0
# Deterministic code generation
code_resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
temperature=0.1,
messages=[{"role": "user", "content": "Write a binary search in Python"}]
)
# Creative story writing
story_resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
temperature=0.9,
messages=[{"role": "user", "content": "Write the opening of a time-travel story"}]
)
temperature is not a quality dial. Low temperature does not fix a poorly structured prompt—it just makes the poor output more repeatable.
top_p and top_k
Additional sampling controls. Anthropic recommends adjusting only one sampling parameter at a time.
- top_p (nucleus sampling): Sample only from the smallest set of most probable tokens whose cumulative probability reaches top_p. top_p=0.9 excludes the long tail of low-probability tokens.
- top_k: Sample only from the k most probable tokens.
For most production use cases, adjusting only temperature is sufficient. Use top_p or top_k only when you have a specific reason.
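To make the two filters concrete, here is a toy illustration on a hand-written next-token distribution. It shows the general technique only; the token probabilities are invented, and nothing here reflects how the API implements sampling internally.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest high-probability prefix whose mass reaches p, then renormalize."""
    kept = []
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


if __name__ == "__main__":
    # Invented distribution for illustration.
    next_token = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
    print(sorted(top_k_filter(next_token, 2)))    # tokens surviving top_k=2
    print(sorted(top_p_filter(next_token, 0.9)))  # tokens surviving top_p=0.9
```

With this distribution, top_k=2 keeps "the" and "a", while top_p=0.9 also keeps "an" because the first two tokens cover only 0.8 of the probability mass.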
stop_sequences
A list of strings (max 4, max 256 chars each). Generation stops immediately when any of these strings appears in the output.
# Force single-line output
resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=100,
stop_sequences=["\n"],
messages=[{"role": "user", "content": "Give the capital of France in one word."}]
)
# Stop at a custom delimiter (useful with structured prompts)
resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
stop_sequences=["</answer>"],
messages=[{"role": "user",
"content": "Answer in <answer> tags: what is a hash table?"}]
)
# Check which stop sequence triggered
if resp.stop_reason == "stop_sequence":
    print(f"Stopped at: {resp.stop_sequence!r}")
    # resp.stop_sequence contains the exact string that triggered the stop
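The stopping behavior can be mimicked locally, which is handy when unit-testing parsers that depend on it. This sketch assumes that the matched sequence is not part of the returned text; `truncate_at_stop` is a hypothetical helper and not an SDK function.

```python
from typing import Optional


def truncate_at_stop(text: str, stop_sequences: list[str]) -> tuple[str, Optional[str]]:
    """Cut text at the earliest stop sequence; return (kept_text, matched_sequence)."""
    earliest: Optional[tuple[int, str]] = None
    for seq in stop_sequences:
        if not seq:
            continue  # ignore empty strings, which would match at index 0
        idx = text.find(seq)
        if idx != -1 and (earliest is None or idx < earliest[0]):
            earliest = (idx, seq)
    if earliest is None:
        return text, None
    # Assumption: the stop sequence itself is excluded from the output.
    return text[: earliest[0]], earliest[1]


if __name__ == "__main__":
    raw = "<answer>A hash table maps keys to values.</answer> trailing text"
    print(truncate_at_stop(raw, ["</answer>"]))
```

The second element of the result plays the role of stop_sequence in the response: it is None when nothing matched.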
7.4 The system Parameter
The system prompt can be a string or a content block array (the latter enables prompt caching):
# Simple string
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are a helpful assistant.",
messages=[...]
)
# Content block array with cache control
STATIC_DOCS = "..." # thousands of tokens of reference material
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=[
{
"type": "text",
"text": "You are a helpful assistant.\n\n" + STATIC_DOCS,
"cache_control": {"type": "ephemeral"},
}
],
messages=[...]
)
# First call: writes to cache (slight surcharge)
# Subsequent calls within 5 minutes: 90% discount on those tokens
Restriction: The system parameter supports only text content blocks—no images, no tool results.
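Whether caching pays off depends on how often the cached prefix is reused. The estimate below is a rough sketch: the 90% read discount comes from the comment above, while the 1.25x write multiplier is an assumption standing in for the "slight surcharge", and every call is assumed to land within the cache lifetime.

```python
def cached_prefix_cost(
    prefix_tokens: int,
    calls: int,
    input_price_per_m: float,
    write_multiplier: float = 1.25,  # assumption: surcharge on the first (write) call
    read_multiplier: float = 0.10,   # 90% discount on cache reads
) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) in dollars for the prefix only."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_call = prefix_tokens * input_price_per_m / 1_000_000
    without_cache = per_call * calls
    with_cache = per_call * (write_multiplier + read_multiplier * (calls - 1))
    return without_cache, with_cache


if __name__ == "__main__":
    plain, cached = cached_prefix_cost(50_000, calls=10, input_price_per_m=3.00)
    print(f"without cache: ${plain:.4f}, with cache: ${cached:.4f}")
```

Under these assumptions, a 50,000-token prefix reused across 10 calls costs about $1.50 uncached versus about $0.32 cached. With a single call the cached path is more expensive, because only the write surcharge applies.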
7.5 Tool Use Parameters
tools
Defines the tools the model can call. Each tool has a name, description, and a JSON schema for its inputs:
tools = [
{
"name": "get_weather", # alphanumeric + underscores, max 64 chars
"description": "Get current weather for a city",
"input_schema": {
"type": "object",
"required": ["city"],
"properties": {
"city": {
"type": "string",
"description": "City name, e.g. 'Paris', 'Tokyo'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit, defaults to celsius"
}
}
}
}
]
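The model's tool inputs are generated text, so it is worth checking them against the schema before executing anything. The validator below is a deliberately small sketch that handles only the `required`, string `type`, and `enum` keywords used in this chapter; a full JSON Schema library would be the more complete choice.

```python
WEATHER_SCHEMA = {
    "type": "object",
    "required": ["city"],
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
}


def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input passed these checks."""
    errors = []
    for name in schema.get("required", []):
        if name not in tool_input:
            errors.append(f"missing required field: {name}")
    properties = schema.get("properties", {})
    for name, value in tool_input.items():
        spec = properties.get(name)
        if spec is None:
            continue  # unknown fields are tolerated in this sketch
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors


if __name__ == "__main__":
    print(validate_tool_input(WEATHER_SCHEMA, {"units": "kelvin"}))
```

Returning the problems as strings makes it easy to send them back to the model inside a tool_result so it can correct the call.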
tool_choice
# Auto (default): model decides whether and which tool to call
tool_choice = {"type": "auto"}
# Any: model must call a tool, but can choose which one
tool_choice = {"type": "any"}
# Specific tool: model must call this specific tool
tool_choice = {"type": "tool", "name": "get_weather"}
# Disabled: model cannot use tools even if defined
tool_choice = {"type": "none"}
Complete Tool Call Loop
import anthropic
import json
client = anthropic.Anthropic()
def run_weather_agent(user_query: str) -> str:
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            }
        }
    }]
    messages = [{"role": "user", "content": user_query}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # Model finished; return final text
            return next(
                (b.text for b in resp.content if b.type == "text"), ""
            )
        # Add assistant turn (including tool_use blocks)
        messages.append({"role": "assistant", "content": resp.content})
        # Execute all tool calls and collect results
        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            if block.name == "get_weather":
                # Simulate a real API call
                city = block.input.get("city", "")
                units = block.input.get("units", "celsius")
                result = {
                    "city": city,
                    "temperature": 22 if units == "celsius" else 72,
                    "condition": "sunny",
                    "humidity": 65,
                }
            else:
                result = {"error": f"Unknown tool: {block.name}"}
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })
        # Add tool results as a user turn
        messages.append({"role": "user", "content": tool_results})
        # Loop: model will now generate a response using the tool results
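Once there is more than one tool, the if/else chain in the loop above is often replaced with a dispatch table. The sketch below shows one way to do that; `TOOL_HANDLERS` and `execute_tool_call` are hypothetical names, the weather handler is a stub, and setting `is_error` on failed results is a design choice for signaling failures back to the model.

```python
import json
from typing import Callable


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stub handler standing in for a real weather lookup."""
    return {"city": city, "temperature": 22 if units == "celsius" else 72}


# Map tool names to the Python callables that implement them.
TOOL_HANDLERS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def execute_tool_call(name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Run one tool call and wrap the outcome as a tool_result block."""
    handler = TOOL_HANDLERS.get(name)
    failed = False
    if handler is None:
        result, failed = {"error": f"Unknown tool: {name}"}, True
    else:
        try:
            result = handler(**tool_input)
        except Exception as exc:  # report handler failures instead of crashing the loop
            result, failed = {"error": str(exc)}, True
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }
    if failed:
        block["is_error"] = True
    return block


if __name__ == "__main__":
    print(execute_tool_call("get_weather", {"city": "Paris"}, "toolu_example"))
```

Inside the loop, each tool_use block would then map to `execute_tool_call(block.name, block.input, block.id)`.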
7.6 Extended Thinking Parameter
thinking
Enables the Extended Thinking mode for supported models (Opus and Sonnet):
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=16_000,
thinking={
"type": "enabled",
"budget_tokens": 8_000 # max tokens allocated to thinking
},
messages=[{"role": "user", "content": "Prove that there are infinitely many prime numbers."}]
)
# Response has two block types
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking: {len(block.thinking)} chars]\n{block.thinking[:300]}...\n")
    elif block.type == "text":
        print(f"[Answer]\n{block.text}")
Extended Thinking constraints:
- budget_tokens must be ≥ 1,024
- max_tokens must be greater than budget_tokens
- Thinking tokens are billed at output token prices
- temperature is forced to 1.0 when thinking is enabled
- top_p and top_k cannot be set when thinking is enabled
- stop_sequences cannot be used with Extended Thinking
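These constraints are easy to check before a request leaves your service. The function below is a sketch that encodes the constraints exactly as listed in this section; `validate_thinking_request` is a hypothetical helper, and the rules should be rechecked against current documentation before relying on it.

```python
def validate_thinking_request(params: dict) -> list[str]:
    """Return violations of the Extended Thinking constraints listed in this section."""
    thinking = params.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return []  # nothing to check when thinking is off
    problems = []
    budget = thinking.get("budget_tokens", 0)
    if budget < 1024:
        problems.append("budget_tokens must be at least 1024")
    if params.get("max_tokens", 0) <= budget:
        problems.append("max_tokens must be greater than budget_tokens")
    if params.get("temperature", 1.0) != 1.0:
        problems.append("temperature must be left at 1.0")
    for name in ("top_p", "top_k"):
        if params.get(name) is not None:
            problems.append(f"{name} cannot be set")
    if params.get("stop_sequences"):
        problems.append("stop_sequences cannot be used")
    return problems


if __name__ == "__main__":
    request = {
        "max_tokens": 5000,
        "thinking": {"type": "enabled", "budget_tokens": 500},
        "stop_sequences": ["END"],
    }
    for problem in validate_thinking_request(request):
        print(problem)
```

Failing fast on the client side gives a clearer error message than a generic HTTP 400 and avoids a network round trip.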
Multi-Turn with Thinking Blocks
Thinking blocks can be passed back in subsequent messages to preserve reasoning continuity:
def multi_turn_reasoning(questions: list[str]) -> list[str]:
    messages = []
    answers = []
    for question in questions:
        messages.append({"role": "user", "content": question})
        resp = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=8_000,
            thinking={"type": "enabled", "budget_tokens": 4_000},
            messages=messages,
        )
        # Preserve the full response (including thinking blocks) in history
        messages.append({
            "role": "assistant",
            "content": resp.content  # includes both thinking and text blocks
        })
        answer = next((b.text for b in resp.content if b.type == "text"), "")
        answers.append(answer)
    return answers
7.7 Request Metadata
metadata
An object describing the request, used for logging and abuse tracking. The documented field is user_id, an external identifier for the end user. Treat it as the only supported key, and keep other request context (session, client version, experiment variant) in your own logs rather than in metadata:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    metadata={
        "user_id": "usr_12345",  # opaque end-user identifier; avoid names, emails, phone numbers
    },
    messages=[{"role": "user", "content": "..."}]
)
user_id has special significance: Anthropic uses it to track per-user behavior for abuse detection. Providing it allows Anthropic to take targeted action against problematic users without affecting your entire API key.
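Since user_id leaves your infrastructure, one common approach is to derive it from the internal identifier with a keyed hash rather than sending the raw value. The sketch below uses HMAC-SHA256 from the standard library; `opaque_user_id` and the secret handling are illustrative assumptions, not an Anthropic requirement.

```python
import hashlib
import hmac


def opaque_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an internal user ID."""
    return hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Assumption: in production the secret comes from a secrets manager, not source code.
    secret = b"replace-with-a-real-secret"
    print(opaque_user_id("alice@example.com", secret))
```

The same internal ID always maps to the same value, so per-user tracking still works, but the value reveals nothing about the user without the secret.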
betas
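Opt into experimental features that are not yet generally available. Over raw HTTP, beta flags travel in the anthropic-beta request header; in the Python SDK, pass betas to client.beta.messages.create, which sets that header for you:
# Enable a specific beta feature
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["feature-identifier-2024-01"],
    messages=[...]
)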
Beta feature identifiers are published in the Anthropic changelog. Features graduate from beta to GA on their own timelines; once GA, the betas flag is no longer required (though including it doesn't break anything).
7.8 API Version and Compatibility
Version Header
Every request requires anthropic-version: 2023-06-01. The Python and TypeScript SDKs add this automatically.
# Manual HTTP — must include the version header
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'
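For services that cannot use an SDK, the same request can be assembled with the Python standard library. This is a sketch that only builds the request object; `build_messages_request` is a hypothetical helper, and actually sending it (for example with `urllib.request.urlopen`) needs a valid key plus error handling.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"


def build_messages_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a Messages API request with the required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": API_VERSION,
            "content-type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_messages_request(
        "sk-placeholder",  # placeholder, not a real key
        {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": "Hello"}],
        },
    )
    print(req.get_method(), req.full_url)
```

Keeping the version string in one constant makes a future version change a single-line edit.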
Anthropic's Compatibility Guarantee
Version format: YYYY-MM-DD (e.g., 2023-06-01)
Guarantees:
- Breaking changes never introduced to an existing version
- New fields added to existing versions are always optional
- Deprecation notices given at least 6 months in advance
- Currently only one stable version: 2023-06-01
Safe Model Upgrade Checklist
Before switching model versions in production:
import json

def validate_model_upgrade(
    old_model: str,
    new_model: str,
    test_cases: list[dict],
    system_prompt: str = "",
) -> dict:
    """
    Check the candidate model against a test set before promoting it to production.
    Only new_model is called; old_model is recorded in the report for traceability.
    test_cases: [{"prompt": str, "expected_format": str|None, "min_length": int}]
    """
    regressions = []
    for case in test_cases:
        kwargs = {
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": case["prompt"]}],
        }
        if system_prompt:
            kwargs["system"] = system_prompt
        new_resp = client.messages.create(model=new_model, **kwargs)
        # Take the first text block; content[0] is not guaranteed to be text
        new_text = next((b.text for b in new_resp.content if b.type == "text"), "")
        # Check format requirements
        if case.get("expected_format") == "json":
            try:
                json.loads(new_text)
            except json.JSONDecodeError:
                regressions.append({
                    "prompt": case["prompt"][:80],
                    "issue": "New model does not produce valid JSON",
                })
        # Check minimum length
        if len(new_text) < case.get("min_length", 0):
            regressions.append({
                "prompt": case["prompt"][:80],
                "issue": f"Response too short: {len(new_text)} chars",
            })
    return {
        "old_model": old_model,
        "new_model": new_model,
        "total_cases": len(test_cases),
        "regressions": len(regressions),
        "regression_details": regressions,
        "safe_to_upgrade": len(regressions) == 0,
    }
7.9 Complete Error Reference
HTTP Status   Error Type                   Meaning
───────────   ──────────────────────────   ───────────────────────────────────────────────────
400           invalid_request_error        Bad request parameters
401           authentication_error         Invalid or missing API key
403           permission_error             API key lacks permission for the requested resource
404           not_found_error              Requested resource (e.g., model ID) does not exist
413           request_too_large            Request body exceeds size limit
422           unprocessable_entity_error   Content violates usage policy
429           rate_limit_error             Rate limit exceeded
500           api_error                    Anthropic server error
529           overloaded_error             Service temporarily overloaded
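In client code, the table usually reduces to one question: is this status worth retrying? The sketch below treats 429, 500, and 529 as transient and everything else as a caller error; that split is an assumption based on the meanings above, and the backoff uses the common exponential "full jitter" scheme.

```python
import random
from typing import Callable

# Assumption: these statuses are transient; the remaining ones need a code or config fix.
RETRYABLE_STATUS = {429, 500, 529}


def is_retryable(status: int) -> bool:
    """Return True if a retry has a reasonable chance of succeeding."""
    return status in RETRYABLE_STATUS


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    for status in (400, 429, 529):
        print(status, "retry" if is_retryable(status) else "fail fast")
    print(f"delay before attempt 3: {backoff_delay(3):.2f}s")
```

Retrying a 400 or 401 only repeats the same failure, which is why the non-retryable statuses are surfaced immediately.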
Common 400 Error Causes
# 1. max_tokens exceeds model limit
client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=100_000, # Haiku limit is 8,192 → 400
messages=[...]
)
# 2. Non-alternating messages
client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[
{"role": "user", "content": "Q1"},
{"role": "user", "content": "Q2"}, # consecutive user → 400
]
)
# 3. Total input exceeds context window
# system(80K) + messages(130K) = 210K > 200K → 400
# 4. Extended Thinking + stop_sequences (incompatible)
client.messages.create(
model="claude-opus-4-6",
max_tokens=5000,
thinking={"type": "enabled", "budget_tokens": 2000},
stop_sequences=["END"], # not supported with thinking → 400
messages=[...]
)
# 5. Extended Thinking budget_tokens too small
client.messages.create(
model="claude-opus-4-6",
max_tokens=5000,
thinking={"type": "enabled", "budget_tokens": 500}, # min is 1024 → 400
messages=[...]
)
Summary
The Messages API is compact but has several non-obvious constraints that cause production issues:
- max_tokens is a ceiling; always check stop_reason for truncation
- messages must strictly alternate user/assistant; merge consecutive same-role turns before sending
- temperature default is 1.0 (not 0 as some developers assume); set explicitly for reproducible workloads
- stop_sequences are checked against the generated output, not the full context
- Extended Thinking has several incompatibilities: no stop_sequences, no top_p/top_k, forced temperature=1.0
- Tool calls require a loop: the model stops with stop_reason="tool_use", you execute, then send results back as a user turn
- Metadata user_id enables per-user abuse tracking and is worth including in production systems
The next chapter covers multi-turn conversation management in depth: how to handle long conversations, implement context compression, and design session persistence that scales.