Chapter 14: OpenAI-Compatible Endpoint: Migration Guide and Handling Differences

14.1 Why an OpenAI-Compatible Endpoint?

Anthropic provides an endpoint that closely mirrors the OpenAI Chat Completions API. Developers can call Claude models from the openai Python library, LangChain, LiteLLM, and other OpenAI-dependent tooling by changing two client settings, base_url and api_key, and passing a Claude model ID.

Primary value propositions:

Low-friction migration: existing OpenAI projects reach Claude by changing the client configuration and the model name
Ecosystem compatibility: tools built on the OpenAI SDK (LangChain, CrewAI, agent frameworks) work out of the box
A/B testing: switch providers within a single codebase to compare cost and quality
Unified routing layer: when using LiteLLM or similar tools, Claude becomes a drop-in backend

Important caveat: The OpenAI-compatible endpoint is a least-common-denominator interface. It does not expose Claude-exclusive features such as Extended Thinking, Prompt Caching control, or beta parameters like PDF support. For those capabilities, use the native Anthropic SDK.

14.2 Basic Configuration

Using the OpenAI Python library to call Claude

import openai

client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a professional technical advisor."},
        {"role": "user", "content": "Explain microservices architecture."}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    stream=True,
    messages=[{"role": "user", "content": "Write a poem about software."}]
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
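When the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of printed. The sketch below is a minimal illustration, not part of the SDK: collect_stream_text and fake_chunk are hypothetical helpers, and the fake chunks only imitate the attribute shape used in the loop above (chunk.choices[0].delta.content) so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage-only chunk) carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:  # skip None and empty strings (role-only or tool-call deltas)
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a stream chunk, used only for this demo."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]
print(collect_stream_text(demo_stream))
```

With a real client, the same function should accept the stream object returned by client.chat.completions.create(..., stream=True), since it only relies on the attributes already used above.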

Environment variable configuration

# Set these environment variables to avoid hardcoding credentials
export OPENAI_BASE_URL=https://api.anthropic.com/v1/
export OPENAI_API_KEY=your-anthropic-api-key

import openai

# With the env vars set, the default client routes to Claude automatically
client = openai.OpenAI()

14.3 Model Name Mapping

When using the compatibility endpoint, use Anthropic model IDs, not OpenAI model names:

OpenAI model	Claude equivalent	Notes
gpt-4o	claude-sonnet-4-6	Best balance of capability and speed
gpt-4o-mini	claude-haiku-4-5-20251001	Fast and cost-efficient
o1	claude-opus-4-6	Strongest reasoning
gpt-3.5-turbo	claude-haiku-4-5-20251001	Low latency

MODEL_MAP = {
    "fast": "claude-haiku-4-5-20251001",
    "standard": "claude-sonnet-4-6",
    "powerful": "claude-opus-4-6"
}

def create_completion(tier: str, messages: list) -> str:
    response = client.chat.completions.create(
        model=MODEL_MAP[tier],
        max_tokens=1024,
        messages=messages
    )
    return response.choices[0].message.content
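For codebases that still pass OpenAI model names around, a small translation step can apply the table above at the call site. This is a sketch under assumptions: OPENAI_TO_CLAUDE and translate_model are hypothetical names, the pairings simply restate the table, and falling back to a default for unknown names is an illustrative policy rather than anything the endpoint does.

```python
# Rough equivalents taken from the table above; adjust to your own evaluation results.
OPENAI_TO_CLAUDE = {
    "gpt-4o": "claude-sonnet-4-6",
    "gpt-4o-mini": "claude-haiku-4-5-20251001",
    "o1": "claude-opus-4-6",
    "gpt-3.5-turbo": "claude-haiku-4-5-20251001",
}


def translate_model(name: str, default: str = "claude-sonnet-4-6") -> str:
    """Map an OpenAI model name to a Claude model ID.

    Names that already look like Claude IDs pass through unchanged;
    unknown names fall back to `default` (an assumed policy, not an API rule).
    """
    if name.startswith("claude-"):
        return name
    return OPENAI_TO_CLAUDE.get(name, default)


print(translate_model("gpt-4o-mini"))
```

In a stricter setup, raising an error for unknown names may be preferable to a silent default, so that a typo does not quietly route to a different model tier.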

14.4 Tool Calling (Function Calling) Compatibility

Claude's tool calling is exposed through the compatibility endpoint using the same schema as OpenAI's function calling:

import json, openai

client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {"type": "string", "description": "Ticker symbol, e.g. AAPL"},
                "currency": {"type": "string", "enum": ["USD", "EUR"]}
            },
            "required": ["symbol"]
        }
    }
}]

def tool_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(10):  # Cap the number of tool rounds so the loop cannot run forever
        response = client.chat.completions.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        choice = response.choices[0]

        if choice.finish_reason == "stop":
            return choice.message.content

        if choice.finish_reason == "tool_calls":
            # Append assistant message with tool calls
            messages.append({
                "role": "assistant",
                "content": choice.message.content,
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": "function",
                        "function": {"name": tc.function.name, "arguments": tc.function.arguments}
                    }
                    for tc in choice.message.tool_calls
                ]
            })
            # Execute tools and append results
            for tc in choice.message.tool_calls:
                args = json.loads(tc.function.arguments)
                if tc.function.name == "get_stock_price":
                    result = {"symbol": args["symbol"], "price": 150.23, "currency": "USD"}
                else:
                    result = {"error": "unknown function"}

                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })
        else:
            break

    return "No response"

print(tool_loop("What's Apple's stock price?"))
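As the number of tools grows, the if/else chain inside the loop above can be replaced with a lookup table. The following is one possible sketch, not a prescribed pattern: TOOL_REGISTRY, register_tool, and dispatch_tool_call are hypothetical names, and the stock price is the same placeholder value used above. The function returns a JSON string that could be used as the content of a "tool" message.

```python
import json
from typing import Callable

# Hypothetical registry: tool name -> Python callable taking keyword arguments.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {}


def register_tool(name: str):
    """Decorator that records a function under the given tool name."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("get_stock_price")
def get_stock_price(symbol: str, currency: str = "USD") -> dict:
    # Placeholder value, same as the stub in the loop above.
    return {"symbol": symbol, "price": 150.23, "currency": currency}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string for the tool message."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(fn(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


print(dispatch_tool_call("get_stock_price", '{"symbol": "AAPL"}'))
```

Returning errors as JSON rather than raising gives the model a chance to correct its arguments on the next round, at the cost of hiding failures from the caller unless they are also logged.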

14.5 LangChain Integration

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_key="your-anthropic-api-key",
    openai_api_base="https://api.anthropic.com/v1/",
    max_tokens=1024
)

messages = [
    SystemMessage(content="You are a professional code reviewer."),
    HumanMessage(content="Review this Python code:\n\ndef divide(a, b):\n    return a / b")
]
response = llm.invoke(messages)
print(response.content)

LangChain LCEL (modern chain syntax)

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="claude-haiku-4-5-20251001",
    openai_api_key="your-anthropic-api-key",
    openai_api_base="https://api.anthropic.com/v1/"
)

prompt = ChatPromptTemplate.from_template("Explain in one sentence: {concept}")
chain = prompt | llm | StrOutputParser()

concepts = ["quantum computing", "blockchain", "federated learning"]
results = chain.batch([{"concept": c} for c in concepts])
for concept, result in zip(concepts, results):
    print(f"{concept}: {result}")

14.6 LiteLLM for Multi-Provider Routing

LiteLLM provides a single interface for calling OpenAI, Claude, Gemini, and other providers:

import litellm

# Method 1: Anthropic prefix
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Method 2: Via compatibility endpoint
response = litellm.completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Method 3: Load-balanced fallback routing
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "smart",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-6",
                "api_key": "your-anthropic-api-key"
            }
        },
        {
            "model_name": "smart",   # Second deployment under the same alias
            "litellm_params": {
                "model": "gpt-4o-mini",
                "api_key": "your-openai-api-key"
            }
        }
    ],
    routing_strategy="latency-based-routing"
)

response = router.completion(
    model="smart",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
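To make the idea of fallback concrete without depending on any routing library, the sketch below tries a list of backends in order and returns the first successful result. It is a plain-Python illustration of the concept, not LiteLLM's implementation or API: call_with_fallback, flaky_primary, and healthy_secondary are hypothetical names, and the two backends are simulated so the example runs offline.

```python
from typing import Callable, Sequence


def call_with_fallback(
    backends: Sequence[tuple[str, Callable[[], str]]]
) -> tuple[str, str]:
    """Try each (name, zero-argument callable) in order.

    Returns (name, result) for the first backend that succeeds and raises
    RuntimeError if every backend fails.
    """
    errors = []
    for name, call in backends:
        try:
            return name, call()
        except Exception as exc:  # in real code, catch the SDK's specific error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def flaky_primary() -> str:
    """Simulated backend that always times out."""
    raise TimeoutError("simulated timeout")


def healthy_secondary() -> str:
    """Simulated backend that always answers."""
    return "4"


winner, answer = call_with_fallback(
    [("claude", flaky_primary), ("openai", healthy_secondary)]
)
print(winner, answer)
```

In practice each callable would wrap a real request, for example a lambda around client.chat.completions.create, and a router library adds what this sketch omits: cooldowns, retries, and latency tracking.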

14.7 Critical Differences and Gotchas

Parameter behavior differences

Parameter	OpenAI behavior	Claude compatibility behavior
`temperature`	0–2	Recommended 0–1; Claude default is 1.0
`n`	Generate N completions	Not supported — always returns 1
`presence_penalty`	Supported	Ignored — no effect
`frequency_penalty`	Supported	Ignored — no effect
`logprobs`	Supported	Not supported
`max_tokens`	Optional	Required in Claude
`stop`	Supported	Supported
`stream`	Supported	Supported
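One way to apply the table above is to normalize request parameters before they reach the client. The sketch below is an assumption-laden illustration: sanitize_params is a hypothetical helper, the default of 1024 for max_tokens is arbitrary, and clamping temperature to 1.0 reflects the recommended range in the table rather than a documented client-side requirement.

```python
def sanitize_params(params: dict, default_max_tokens: int = 1024) -> tuple[dict, list[str]]:
    """Return (cleaned_params, notes) following the parameter table above."""
    cleaned = dict(params)  # copy so the caller's dict is not mutated
    notes = []

    if "max_tokens" not in cleaned:
        cleaned["max_tokens"] = default_max_tokens
        notes.append(f"max_tokens missing; defaulted to {default_max_tokens}")

    if cleaned.get("n", 1) != 1:
        notes.append(f"n={cleaned['n']} reset to 1")
        cleaned["n"] = 1

    temperature = cleaned.get("temperature")
    if temperature is not None and temperature > 1.0:
        notes.append(f"temperature={temperature} clamped to 1.0")
        cleaned["temperature"] = 1.0

    for name in ("presence_penalty", "frequency_penalty", "logprobs"):
        if name in cleaned:
            del cleaned[name]
            notes.append(f"{name} dropped (ignored or unsupported)")

    return cleaned, notes


cleaned, notes = sanitize_params(
    {"model": "claude-sonnet-4-6", "n": 3, "temperature": 1.5, "logprobs": True}
)
print(cleaned)
for note in notes:
    print("NOTE:", note)
```

Returning the notes alongside the cleaned parameters keeps the rewriting visible, which matters when comparing output quality across providers.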

The `max_tokens` requirement — most common gotcha

# This works with OpenAI but FAILS with Claude
try:
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Hello"}]
        # Missing max_tokens!
    )
except Exception as e:
    print(f"Error: {e}")  # max_tokens is required for Claude

# Always include max_tokens
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,   # Required
    messages=[{"role": "user", "content": "Hello"}]
)

System message handling

# Multiple system messages: do not rely on their position in the conversation
messages = [
    {"role": "system", "content": "You are a Python expert."},   # Used
    {"role": "user", "content": "Explain decorators."},
    {"role": "system", "content": "Be brief."},   # Position is not preserved and the text may be ignored
]
]

# Recommended: single system message at position 0
messages_correct = [
    {"role": "system", "content": "You are a Python expert. Be brief."},
    {"role": "user", "content": "Explain decorators."}
]
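Existing conversation histories can be brought into the recommended shape mechanically. The helper below is a minimal sketch: merge_system_messages is a hypothetical name, it assumes every message is a dict with string "role" and "content" fields, and joining with a single space is an arbitrary choice.

```python
def merge_system_messages(messages: list[dict]) -> list[dict]:
    """Collapse all system messages into one message at position 0.

    Non-system messages keep their relative order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    return [{"role": "system", "content": " ".join(system_parts)}] + others


merged = merge_system_messages([
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Explain decorators."},
    {"role": "system", "content": "Be brief."},
])
print(merged)
```

Note that merging changes meaning when a later system message was intended to apply only from that point in the conversation onward, so histories that depend on that timing need manual review.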

Features only available via native Anthropic SDK

import anthropic
client = anthropic.Anthropic()

# 1. Extended Thinking — no OpenAI equivalent
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Solve a hard math problem."}]
)

# 2. Prompt Caching control
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Very long system prompt...",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Question"}]
)

# 3. Message Batches API
batch = client.messages.batches.create(requests=[...])

# 4. PDF support via betas (the betas argument belongs to client.beta.messages)
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["pdfs-2024-09-25"],
    messages=[...]
)

14.8 Migration Walkthrough

Simple migration: two client settings plus the model name

# BEFORE (OpenAI)
import openai
client = openai.OpenAI(api_key="sk-openai-key")
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

# AFTER (Claude via compatibility endpoint)
import openai
client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",   # Changed
    api_key="your-anthropic-api-key"             # Changed
)
response = client.chat.completions.create(
    model="claude-sonnet-4-6",                   # Changed model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Provider-agnostic abstraction layer

import os, openai
from typing import Optional

def build_client(provider: Optional[str] = None) -> tuple[openai.OpenAI, str]:
    """Return (client, default_model) for the specified provider."""
    provider = provider or os.environ.get("LLM_PROVIDER", "anthropic")

    if provider == "anthropic":
        return (
            openai.OpenAI(
                base_url="https://api.anthropic.com/v1/",
                api_key=os.environ["ANTHROPIC_API_KEY"]
            ),
            "claude-sonnet-4-6"
        )
    elif provider == "openai":
        return (
            openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
            "gpt-4o"
        )
    else:
        raise ValueError(f"Unknown provider: {provider}")

# Application code stays unchanged as you swap providers
client, model = build_client()

response = client.chat.completions.create(
    model=model,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG."}]
)
print(response.choices[0].message.content)
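If more providers are added later, the if/elif chain above can be expressed as data. The sketch below shows one way to do that, under assumptions: ProviderSpec, PROVIDERS, and client_settings are hypothetical names, and the function only returns keyword arguments intended for openai.OpenAI(...) rather than constructing a client, which keeps it testable without credentials or network access.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderSpec:
    base_url: Optional[str]  # None means "use the SDK default"
    api_key_env: str         # name of the environment variable holding the key
    default_model: str


PROVIDERS = {
    "anthropic": ProviderSpec(
        "https://api.anthropic.com/v1/", "ANTHROPIC_API_KEY", "claude-sonnet-4-6"
    ),
    "openai": ProviderSpec(None, "OPENAI_API_KEY", "gpt-4o"),
}


def client_settings(provider: str, env: Mapping[str, str]) -> tuple[dict, str]:
    """Return (client keyword arguments, default_model) without creating a client."""
    try:
        spec = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None
    if spec.api_key_env not in env:
        raise RuntimeError(f"{spec.api_key_env} is not set")
    kwargs = {"api_key": env[spec.api_key_env]}
    if spec.base_url is not None:
        kwargs["base_url"] = spec.base_url
    return kwargs, spec.default_model


kwargs, model = client_settings("anthropic", {"ANTHROPIC_API_KEY": "test-key"})
print(kwargs, model)
```

Typical usage would be kwargs, model = client_settings(name, os.environ) followed by openai.OpenAI(**kwargs); passing the environment in as a mapping is a design choice that makes the lookup easy to test.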

14.9 Debugging Compatibility Issues

import os
import openai

def diagnose_request(messages: list, **kwargs) -> dict:
    """Detect common compatibility issues before sending the request."""
    warnings = []

    if "max_tokens" not in kwargs:
        warnings.append("max_tokens is missing — required for Claude")
        kwargs["max_tokens"] = 1024  # Auto-fix

    if kwargs.get("n", 1) > 1:
        warnings.append(f"n={kwargs['n']} is not supported; only n=1 works")

    if kwargs.get("temperature", 1.0) > 1.0:
        warnings.append(f"temperature={kwargs['temperature']} exceeds Claude's recommended range [0, 1]")

    for p in ("logprobs", "presence_penalty", "frequency_penalty"):
        if p in kwargs and kwargs[p] not in (None, 0, False):
            warnings.append(f"{p} is ignored by Claude")

    for w in warnings:
        print(f"WARNING: {w}")

    try:
        client = openai.OpenAI(
            base_url="https://api.anthropic.com/v1/",
            api_key=os.environ["ANTHROPIC_API_KEY"]
        )
        response = client.chat.completions.create(messages=messages, **kwargs)
        return {"ok": True, "content": response.choices[0].message.content}
    except openai.BadRequestError as e:
        return {"ok": False, "error": str(e)}

Summary

The OpenAI-compatible endpoint dramatically lowers the barrier to adopting Claude in OpenAI-based codebases. Key takeaways:

Change base_url to https://api.anthropic.com/v1/, swap the API key, and pass a Claude model ID; most code then works unchanged
Tool calling and streaming are fully supported through the compatibility layer
max_tokens is required for Claude — the single most common migration gotcha
n > 1, logprobs, presence_penalty, and frequency_penalty are unsupported or ignored
Extended Thinking, Prompt Caching, Batch API, and beta features require the native Anthropic SDK
Use LiteLLM for multi-provider routing with automatic fallback between Claude and OpenAI models
