Chapter 14: OpenAI-Compatible Endpoint: Migration Guide and Handling Differences
14.1 Why an OpenAI-Compatible Endpoint?
Anthropic provides an endpoint that closely mirrors the OpenAI Chat Completions API. Developers can call Claude models using the openai Python library, LangChain, LiteLLM, and other OpenAI-dependent tooling by changing only two lines: base_url and api_key.
Primary value propositions:
- Zero-code migration — existing OpenAI projects reach Claude with a two-line config change
- Ecosystem compatibility — tools built on the OpenAI SDK (LangChain, CrewAI, agent frameworks) work out of the box
- A/B testing — switch providers within a single codebase to compare cost and quality
- Unified routing layer — when using LiteLLM or similar tools, Claude becomes a drop-in backend
Important caveat: The OpenAI-compatible endpoint is a least-common-denominator interface. It does not expose Claude-exclusive features such as Extended Thinking, Prompt Caching control, or beta parameters like PDF support. For those capabilities, use the native Anthropic SDK.
14.2 Basic Configuration
Using the OpenAI Python library to call Claude
import openai
client = openai.OpenAI(
base_url="https://api.anthropic.com/v1/",
api_key="your-anthropic-api-key"
)
response = client.chat.completions.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[
{"role": "system", "content": "You are a professional technical advisor."},
{"role": "user", "content": "Explain microservices architecture."}
]
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
model="claude-sonnet-4-6",
max_tokens=1024,
stream=True,
messages=[{"role": "user", "content": "Write a poem about software."}]
)
for chunk in stream:
    # Skip chunks that carry no choices or no text delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
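When the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of printed. A minimal sketch, assuming a hypothetical helper named collect_stream that accepts any iterable of chunk objects shaped like the ones in the streaming loop; it is not part of the openai library:

```python
from typing import Any, Iterable


def collect_stream(stream: Iterable[Any]) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)
```

With a live stream this would be called as text = collect_stream(stream); because the helper only reads attributes, it can also be exercised with stand-in objects in unit tests.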
Environment variable configuration
# Set these environment variables to avoid hardcoding credentials
export OPENAI_BASE_URL=https://api.anthropic.com/v1/
export OPENAI_API_KEY=your-anthropic-api-key
import openai
# With the env vars set, the default client routes to Claude automatically
client = openai.OpenAI()
14.3 Model Name Mapping
When using the compatibility endpoint, use Anthropic model IDs, not OpenAI model names:
| OpenAI model | Claude equivalent | Notes |
|---|---|---|
| gpt-4o | claude-sonnet-4-6 | Best balance of capability and speed |
| gpt-4o-mini | claude-haiku-4-5-20251001 | Fast and cost-efficient |
| o1 | claude-opus-4-6 | Strongest reasoning |
| gpt-3.5-turbo | claude-haiku-4-5-20251001 | Low latency |
MODEL_MAP = {
"fast": "claude-haiku-4-5-20251001",
"standard": "claude-sonnet-4-6",
"powerful": "claude-opus-4-6"
}
def create_completion(tier: str, messages: list) -> str:
    # Look up the Claude model ID for the requested tier
    response = client.chat.completions.create(
        model=MODEL_MAP[tier],
        max_tokens=1024,
        messages=messages
    )
    return response.choices[0].message.content
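The pairings in the mapping table can also be applied programmatically when call sites still pass OpenAI model names. A minimal sketch, assuming a hypothetical helper named to_claude_model (not part of any SDK) that uses only the pairings listed in the table:

```python
# Pairings copied from the mapping table; extend as needed.
OPENAI_TO_CLAUDE = {
    "gpt-4o": "claude-sonnet-4-6",
    "gpt-4o-mini": "claude-haiku-4-5-20251001",
    "o1": "claude-opus-4-6",
    "gpt-3.5-turbo": "claude-haiku-4-5-20251001",
}


def to_claude_model(model: str) -> str:
    """Translate an OpenAI model name to the Claude ID from the table.

    Hypothetical helper: names that already start with "claude-" pass
    through unchanged; unknown names raise instead of guessing.
    """
    if model.startswith("claude-"):
        return model
    try:
        return OPENAI_TO_CLAUDE[model]
    except KeyError:
        raise ValueError(f"No Claude equivalent configured for {model!r}") from None
```

Raising on unknown names is a deliberate choice here: silently falling back to a default model would hide configuration mistakes during migration.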
14.4 Tool Calling (Function Calling) Compatibility
Claude's tool calling is exposed through the compatibility endpoint using the same schema as OpenAI's function calling:
import json, openai
client = openai.OpenAI(
base_url="https://api.anthropic.com/v1/",
api_key="your-anthropic-api-key"
)
tools = [{
"type": "function",
"function": {
"name": "get_stock_price",
"description": "Get the current price of a stock.",
"parameters": {
"type": "object",
"properties": {
"symbol": {"type": "string", "description": "Ticker symbol, e.g. AAPL"},
"currency": {"type": "string", "enum": ["USD", "EUR"]}
},
"required": ["symbol"]
}
}
}]
def tool_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        choice = response.choices[0]
        if choice.finish_reason == "stop":
            return choice.message.content
        if choice.finish_reason == "tool_calls":
            # Append assistant message with tool calls
            messages.append({
                "role": "assistant",
                "content": choice.message.content,
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": "function",
                        "function": {"name": tc.function.name, "arguments": tc.function.arguments}
                    }
                    for tc in choice.message.tool_calls
                ]
            })
            # Execute tools and append results
            for tc in choice.message.tool_calls:
                args = json.loads(tc.function.arguments)
                if tc.function.name == "get_stock_price":
                    result = {"symbol": args["symbol"], "price": 150.23, "currency": "USD"}
                else:
                    result = {"error": "unknown function"}
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })
        else:
            # Any other finish_reason: stop looping
            break
    return "No response"
print(tool_loop("What's Apple's stock price?"))
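As the number of tools grows, the if/else dispatch inside the loop becomes harder to maintain. One possible alternative is a small registry keyed by function name. This is a sketch under stated assumptions: TOOL_REGISTRY, register_tool, and run_tool are hypothetical names introduced here for illustration, and the price data is the same placeholder used in the loop.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so a loop can dispatch to it."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@register_tool
def get_stock_price(symbol: str, currency: str = "USD") -> dict:
    # Placeholder data; a real tool would query a price feed.
    return {"symbol": symbol, "price": 150.23, "currency": currency}


def run_tool(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the "tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or mismatched argument names: report back to the model.
        return json.dumps({"error": str(exc)})
```

Inside the loop, the per-call branch would then reduce to passing tc.function.name and tc.function.arguments to run_tool and appending the returned string as the tool message content. Errors are returned as JSON rather than raised so the model can see them and retry.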
14.5 LangChain Integration
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
llm = ChatOpenAI(
model="claude-sonnet-4-6",
openai_api_key="your-anthropic-api-key",
openai_api_base="https://api.anthropic.com/v1/",
max_tokens=1024
)
messages = [
SystemMessage(content="You are a professional code reviewer."),
HumanMessage(content="Review this Python code:\n\ndef divide(a, b):\n    return a / b")
]
response = llm.invoke(messages)
print(response.content)
LangChain LCEL (modern chain syntax)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(
model="claude-haiku-4-5-20251001",
openai_api_key="your-anthropic-api-key",
openai_api_base="https://api.anthropic.com/v1/"
)
prompt = ChatPromptTemplate.from_template("Explain in one sentence: {concept}")
chain = prompt | llm | StrOutputParser()
concepts = ["quantum computing", "blockchain", "federated learning"]
results = chain.batch([{"concept": c} for c in concepts])
for concept, result in zip(concepts, results):
    print(f"{concept}: {result}")
14.6 LiteLLM for Multi-Provider Routing
LiteLLM provides a single interface for calling OpenAI, Claude, Gemini, and other providers:
import litellm
# Method 1: Anthropic prefix
response = litellm.completion(
model="anthropic/claude-sonnet-4-6",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# Method 2: Via compatibility endpoint
response = litellm.completion(
model="openai/claude-sonnet-4-6",
api_base="https://api.anthropic.com/v1/",
api_key="your-anthropic-api-key",
messages=[{"role": "user", "content": "Hello!"}]
)
# Method 3: Load-balanced fallback routing
from litellm import Router
router = Router(
model_list=[
{
"model_name": "smart",
"litellm_params": {
"model": "anthropic/claude-sonnet-4-6",
"api_key": "your-anthropic-api-key"
}
},
{
"model_name": "smart", # Fallback for the same alias
"litellm_params": {
"model": "gpt-4o-mini",
"api_key": "your-openai-api-key"
}
}
],
routing_strategy="latency-based-routing"
)
response = router.completion(
model="smart",
messages=[{"role": "user", "content": "What is 2+2?"}]
)
14.7 Critical Differences and Gotchas
Parameter behavior differences
| Parameter | OpenAI behavior | Claude compatibility behavior |
|---|---|---|
| temperature | 0–2 | Recommended 0–1; Claude default is 1.0 |
| n | Generate N completions | Not supported; always returns 1 |
| presence_penalty | Supported | Ignored; no effect |
| frequency_penalty | Supported | Ignored; no effect |
| logprobs | Supported | Not supported |
| max_tokens | Optional | Required in Claude |
| stop | Supported | Supported |
| stream | Supported | Supported |
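The differences in the table can be normalized in one place before a request is sent. A minimal sketch, assuming a hypothetical helper named sanitize_kwargs; the default of 1024 for max_tokens and the choice to clamp temperature to 1.0 are illustrative decisions, not documented API behavior:

```python
# Parameters the table lists as unsupported or ignored on the Claude side.
DROPPED_PARAMS = ("logprobs", "presence_penalty", "frequency_penalty")


def sanitize_kwargs(kwargs: dict, default_max_tokens: int = 1024) -> tuple[dict, list[str]]:
    """Return (cleaned_kwargs, notes) without mutating the input.

    Hypothetical helper; default_max_tokens=1024 is an arbitrary choice.
    """
    cleaned = dict(kwargs)
    notes = []
    if cleaned.get("max_tokens") is None:
        cleaned["max_tokens"] = default_max_tokens
        notes.append(f"max_tokens set to {default_max_tokens}")
    if cleaned.get("n") not in (None, 1):
        cleaned["n"] = 1
        notes.append("n forced to 1")
    temperature = cleaned.get("temperature")
    if temperature is not None and temperature > 1.0:
        cleaned["temperature"] = 1.0
        notes.append("temperature clamped to 1.0")
    for name in DROPPED_PARAMS:
        if name in cleaned:
            del cleaned[name]
            notes.append(f"{name} removed")
    return cleaned, notes
```

Returning the notes alongside the cleaned arguments lets the caller log what changed, which helps when comparing outputs between providers.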
The max_tokens requirement — most common gotcha
# This works with OpenAI but FAILS with Claude
try:
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Hello"}]
        # Missing max_tokens!
    )
except Exception as e:
    print(f"Error: {e}")  # max_tokens is required for Claude
# Always include max_tokens
response = client.chat.completions.create(
model="claude-sonnet-4-6",
max_tokens=1024, # Required
messages=[{"role": "user", "content": "Hello"}]
)
System message handling
# Multiple system messages: only the first is reliably honored
messages = [
{"role": "system", "content": "You are a Python expert."}, # Used
{"role": "user", "content": "Explain decorators."},
{"role": "system", "content": "Be brief."}, # May be ignored
]
# Recommended: single system message at position 0
messages_correct = [
{"role": "system", "content": "You are a Python expert. Be brief."},
{"role": "user", "content": "Explain decorators."}
]
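Existing OpenAI code sometimes injects system messages mid-conversation, so the consolidation recommended here can be automated. A minimal sketch, assuming a hypothetical helper named merge_system_messages and plain string content in every system message:

```python
def merge_system_messages(messages: list[dict]) -> list[dict]:
    """Collapse all system messages into one at position 0.

    Hypothetical helper: assumes each system message has plain string
    content; the other messages keep their relative order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    return [{"role": "system", "content": " ".join(system_parts)}] + others
```

Applied to the first messages list in this section, the helper produces the same structure as messages_correct. Joining with a space is a simplification; a newline separator may read better for long instructions.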
Features only available via native Anthropic SDK
import anthropic
client = anthropic.Anthropic()
# 1. Extended Thinking — no OpenAI equivalent
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[{"role": "user", "content": "Solve a hard math problem."}]
)
# 2. Prompt Caching control
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=[{
"type": "text",
"text": "Very long system prompt...",
"cache_control": {"type": "ephemeral"}
}],
messages=[{"role": "user", "content": "Question"}]
)
# 3. Message Batches API
batch = client.messages.batches.create(requests=[...])
# 4. PDF support via betas (the betas parameter belongs to the beta namespace)
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["pdfs-2024-09-25"],
    messages=[...]
)
14.8 Migration Walkthrough
Simple migration: two-line change
# BEFORE (OpenAI)
import openai
client = openai.OpenAI(api_key="sk-openai-key")
response = client.chat.completions.create(
model="gpt-4o",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}]
)
# AFTER (Claude via compatibility endpoint)
import openai
client = openai.OpenAI(
base_url="https://api.anthropic.com/v1/", # Changed
api_key="your-anthropic-api-key" # Changed
)
response = client.chat.completions.create(
model="claude-sonnet-4-6", # Changed model name
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}]
)
Provider-agnostic abstraction layer
import os, openai
def build_client(provider: str = None) -> tuple[openai.OpenAI, str]:
    """Return (client, default_model) for the specified provider."""
    provider = provider or os.environ.get("LLM_PROVIDER", "anthropic")
    if provider == "anthropic":
        return (
            openai.OpenAI(
                base_url="https://api.anthropic.com/v1/",
                api_key=os.environ["ANTHROPIC_API_KEY"]
            ),
            "claude-sonnet-4-6"
        )
    elif provider == "openai":
        return (
            openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
            "gpt-4o"
        )
    else:
        raise ValueError(f"Unknown provider: {provider}")
# Application code stays unchanged as you swap providers
client, model = build_client()
response = client.chat.completions.create(
model=model,
max_tokens=1024,
messages=[{"role": "user", "content": "Explain RAG."}]
)
print(response.choices[0].message.content)
14.9 Debugging Compatibility Issues
import os
import openai

def diagnose_request(messages: list, **kwargs) -> dict:
    """Detect common compatibility issues before sending the request."""
    warnings = []
    if "max_tokens" not in kwargs:
        warnings.append("max_tokens is missing (required for Claude); defaulting to 1024")
        kwargs["max_tokens"] = 1024  # Auto-fix
    # Treat an explicit None the same as "not set"
    if (kwargs.get("n") or 1) > 1:
        warnings.append(f"n={kwargs['n']} is not supported; only n=1 works")
    temperature = kwargs.get("temperature")
    if temperature is not None and temperature > 1.0:
        warnings.append(f"temperature={temperature} exceeds Claude's recommended range [0, 1]")
    for p in ("logprobs", "presence_penalty", "frequency_penalty"):
        if p in kwargs and kwargs[p] not in (None, 0, False):
            warnings.append(f"{p} is ignored by Claude")
    for w in warnings:
        print(f"WARNING: {w}")
    try:
        client = openai.OpenAI(
            base_url="https://api.anthropic.com/v1/",
            api_key=os.environ["ANTHROPIC_API_KEY"]
        )
        # kwargs must include model, e.g. model="claude-sonnet-4-6"
        response = client.chat.completions.create(messages=messages, **kwargs)
        return {"ok": True, "content": response.choices[0].message.content}
    except openai.BadRequestError as e:
        return {"ok": False, "error": str(e)}
Summary
The OpenAI-compatible endpoint dramatically lowers the barrier to adopting Claude in OpenAI-based codebases. Key takeaways:
- Change base_url to https://api.anthropic.com/v1/ and swap the API key; most code just works
- Tool calling and streaming are fully supported through the compatibility layer
- max_tokens is required for Claude, and omitting it is the single most common migration gotcha
- n > 1, logprobs, presence_penalty, and frequency_penalty are unsupported or ignored
- Extended Thinking, Prompt Caching, Batch API, and beta features require the native Anthropic SDK
- Use LiteLLM for multi-provider routing with automatic fallback between Claude and OpenAI models