Chapter 14

Chapter 14: OpenAI-Compatible Endpoint: Migration Guide and Handling Differences

14.1 Why an OpenAI-Compatible Endpoint?

Anthropic provides an endpoint that closely mirrors the OpenAI Chat Completions API. Developers can call Claude models from the openai Python library, LangChain, LiteLLM, and other OpenAI-dependent tooling by changing two client settings, base_url and api_key, and passing a Claude model ID in place of the OpenAI model name.

Primary value propositions:

  1. Zero-code migration: existing OpenAI projects reach Claude with a two-line config change
  2. Ecosystem compatibility: tools built on the OpenAI SDK (LangChain, CrewAI, agent frameworks) work out of the box
  3. A/B testing: switch providers within a single codebase to compare cost and quality
  4. Unified routing layer: when using LiteLLM or similar tools, Claude becomes a drop-in backend

Important caveat: The OpenAI-compatible endpoint is a least-common-denominator interface. It does not expose Claude-exclusive features such as Extended Thinking, Prompt Caching control, or beta parameters like PDF support. For those capabilities, use the native Anthropic SDK.

14.2 Basic Configuration

Using the OpenAI Python library to call Claude

import openai

client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a professional technical advisor."},
        {"role": "user", "content": "Explain microservices architecture."}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    stream=True,
    messages=[{"role": "user", "content": "Write a poem about software."}]
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
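
When the complete text is also needed after streaming (for logging or caching, say), the deltas can be accumulated as they arrive. The following is a minimal sketch, assuming chunks shaped like the ones in the loop above. `collect_stream` and `_fake_chunk` are hypothetical helper names, not part of the openai library, and the stand-in chunks exist only so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Concatenate the text deltas of a chat-completions stream."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [_fake_chunk("Hello"), _fake_chunk(None), _fake_chunk(", world")]
print(collect_stream(fake_stream))
```

In real use, the object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `fake_stream`.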

Environment variable configuration

# Shell: set these environment variables to avoid hardcoding credentials
export OPENAI_BASE_URL=https://api.anthropic.com/v1/
export OPENAI_API_KEY=your-anthropic-api-key

# Python:
import openai

# With the env vars set, the default client routes to Claude automatically
client = openai.OpenAI()

14.3 Model Name Mapping

When using the compatibility endpoint, use Anthropic model IDs, not OpenAI model names:

OpenAI model     Claude equivalent            Notes
gpt-4o           claude-sonnet-4-6            Best balance of capability and speed
gpt-4o-mini      claude-haiku-4-5-20251001    Fast and cost-efficient
o1               claude-opus-4-6              Strongest reasoning
gpt-3.5-turbo    claude-haiku-4-5-20251001    Low latency

MODEL_MAP = {
    "fast": "claude-haiku-4-5-20251001",
    "standard": "claude-sonnet-4-6",
    "powerful": "claude-opus-4-6"
}

def create_completion(tier: str, messages: list) -> str:
    response = client.chat.completions.create(
        model=MODEL_MAP[tier],
        max_tokens=1024,
        messages=messages
    )
    return response.choices[0].message.content
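
Codebases that still pass OpenAI model names around can translate them at a single choke point rather than editing every call site. The sketch below encodes the rough equivalences from the mapping in this section. `OPENAI_TO_CLAUDE` and `to_claude_model` are hypothetical names introduced for illustration, and the pairings are this chapter's suggestions, not an official mapping.

```python
# Suggested equivalences taken from this section's mapping (not an official list).
OPENAI_TO_CLAUDE = {
    "gpt-4o": "claude-sonnet-4-6",
    "gpt-4o-mini": "claude-haiku-4-5-20251001",
    "o1": "claude-opus-4-6",
    "gpt-3.5-turbo": "claude-haiku-4-5-20251001",
}


def to_claude_model(name: str) -> str:
    """Translate an OpenAI model name to a Claude model ID.

    Names that already look like Claude IDs pass through unchanged;
    unknown names raise instead of silently guessing.
    """
    if name.startswith("claude-"):
        return name
    try:
        return OPENAI_TO_CLAUDE[name]
    except KeyError:
        raise ValueError(f"No Claude equivalent configured for {name!r}") from None


print(to_claude_model("gpt-4o"))
```

Raising on unknown names is a deliberate choice here: a silent default could route traffic to a more expensive tier than intended.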

14.4 Tool Calling (Function Calling) Compatibility

Claude's tool calling is exposed through the compatibility endpoint using the same schema as OpenAI's function calling:

import json, openai

client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {"type": "string", "description": "Ticker symbol, e.g. AAPL"},
                "currency": {"type": "string", "enum": ["USD", "EUR"]}
            },
            "required": ["symbol"]
        }
    }
}]

def tool_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        choice = response.choices[0]

        if choice.finish_reason == "stop":
            return choice.message.content

        if choice.finish_reason == "tool_calls":
            # Append assistant message with tool calls
            messages.append({
                "role": "assistant",
                "content": choice.message.content,
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": "function",
                        "function": {"name": tc.function.name, "arguments": tc.function.arguments}
                    }
                    for tc in choice.message.tool_calls
                ]
            })
            # Execute tools and append results
            for tc in choice.message.tool_calls:
                args = json.loads(tc.function.arguments)
                if tc.function.name == "get_stock_price":
                    result = {"symbol": args["symbol"], "price": 150.23, "currency": "USD"}
                else:
                    result = {"error": "unknown function"}

                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })
        else:
            break

    return "No response"

print(tool_loop("What's Apple's stock price?"))
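
As the number of tools grows, the if/else dispatch inside the loop becomes awkward. One possible alternative is a small registry keyed by function name. This is a sketch only: `TOOL_REGISTRY`, `register_tool`, and `run_tool` are hypothetical names, and the stock price is the same placeholder value used in the walkthrough above.

```python
import json
from typing import Callable

# Maps tool names (as declared in the tools schema) to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {}


def register_tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Decorator that registers a function under its own name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@register_tool
def get_stock_price(symbol: str, currency: str = "USD") -> dict:
    # Placeholder data; a real implementation would query a market data source.
    return {"symbol": symbol, "price": 150.23, "currency": currency}


def run_tool(name: str, arguments: str) -> str:
    """Run a registered tool from its JSON arguments.

    Always returns a JSON string, so the result can be used directly as
    the content of a "tool" message, including on failure.
    """
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments or "{}")
        return json.dumps(fn(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


print(run_tool("get_stock_price", '{"symbol": "AAPL"}'))
```

Inside the loop, the body of the `for tc in ...` block would then reduce to appending `run_tool(tc.function.name, tc.function.arguments)` as the tool message content.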

14.5 LangChain Integration

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    model="claude-sonnet-4-6",
    openai_api_key="your-anthropic-api-key",
    openai_api_base="https://api.anthropic.com/v1/",
    max_tokens=1024
)

messages = [
    SystemMessage(content="You are a professional code reviewer."),
    HumanMessage(content="Review this Python code:\n\ndef divide(a, b):\n    return a / b")
]
response = llm.invoke(messages)
print(response.content)

LangChain LCEL (modern chain syntax)

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="claude-haiku-4-5-20251001",
    openai_api_key="your-anthropic-api-key",
    openai_api_base="https://api.anthropic.com/v1/"
)

prompt = ChatPromptTemplate.from_template("Explain in one sentence: {concept}")
chain = prompt | llm | StrOutputParser()

concepts = ["quantum computing", "blockchain", "federated learning"]
results = chain.batch([{"concept": c} for c in concepts])
for concept, result in zip(concepts, results):
    print(f"{concept}: {result}")

14.6 LiteLLM for Multi-Provider Routing

LiteLLM provides a single interface for calling OpenAI, Claude, Gemini, and other providers:

import litellm

# Method 1: Anthropic prefix
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Method 2: Via compatibility endpoint
response = litellm.completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://api.anthropic.com/v1/",
    api_key="your-anthropic-api-key",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Method 3: Load-balanced fallback routing
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "smart",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-6",
                "api_key": "your-anthropic-api-key"
            }
        },
        {
            "model_name": "smart",   # Fallback for the same alias
            "litellm_params": {
                "model": "gpt-4o-mini",
                "api_key": "your-openai-api-key"
            }
        }
    ],
    routing_strategy="latency-based-routing"
)

response = router.completion(
    model="smart",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

14.7 Critical Differences and Gotchas

Parameter behavior differences

Parameter           OpenAI behavior           Claude compatibility behavior
temperature         0–2                       Recommended 0–1; Claude default is 1.0
n                   Generate N completions    Not supported; always returns 1
presence_penalty    Supported                 Ignored; no effect
frequency_penalty   Supported                 Ignored; no effect
logprobs            Supported                 Not supported
max_tokens          Optional                  Required in Claude
stop                Supported                 Supported
stream              Supported                 Supported
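
The differences listed above can be applied mechanically before a request is sent. The sketch below takes the behaviors exactly as this section describes them and returns a cleaned copy of the request parameters together with a list of what was changed. `adapt_params`, `IGNORED_PARAMS`, and the default of 1024 tokens are assumptions made for illustration, not part of any SDK.

```python
# Parameters this section lists as ignored or unsupported on the compatibility endpoint.
IGNORED_PARAMS = ("presence_penalty", "frequency_penalty", "logprobs")


def adapt_params(params: dict, default_max_tokens: int = 1024) -> tuple[dict, list[str]]:
    """Return (adapted_params, notes) without mutating the input dict."""
    adapted = dict(params)
    notes = []

    # Drop parameters that would have no effect.
    for name in IGNORED_PARAMS:
        if name in adapted:
            adapted.pop(name)
            notes.append(f"dropped {name}")

    # Only a single completion per request is available.
    if adapted.get("n", 1) != 1:
        adapted["n"] = 1
        notes.append("n forced to 1")

    # Keep temperature inside the recommended 0 to 1 range.
    temperature = adapted.get("temperature")
    if temperature is not None and temperature > 1.0:
        adapted["temperature"] = 1.0
        notes.append("temperature clamped to 1.0")

    # Supply max_tokens when the caller omitted it.
    if "max_tokens" not in adapted:
        adapted["max_tokens"] = default_max_tokens
        notes.append(f"max_tokens defaulted to {default_max_tokens}")

    return adapted, notes


adapted, notes = adapt_params({"temperature": 1.5, "n": 3, "presence_penalty": 0.5})
print(adapted)
print(notes)
```

Returning the notes instead of printing them leaves the caller free to log, raise, or ignore each adjustment.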

The max_tokens requirement: the most common gotcha

# This works with OpenAI but FAILS with Claude
try:
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Hello"}]
        # Missing max_tokens!
    )
except Exception as e:
    print(f"Error: {e}")  # max_tokens is required for Claude

# Always include max_tokens
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,   # Required
    messages=[{"role": "user", "content": "Hello"}]
)

System message handling

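# Multiple system messages: Claude accepts a single initial system prompt, so a
# system message placed mid-conversation does not act at that position
# (Anthropic's compatibility notes describe such messages as being concatenated
# and moved to the start of the conversation)
messages = [
    {"role": "system", "content": "You are a Python expert."},   # Used
    {"role": "user", "content": "Explain decorators."},
    {"role": "system", "content": "Be brief."},   # Not applied at this position
]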

# Recommended: single system message at position 0
messages_correct = [
    {"role": "system", "content": "You are a Python expert. Be brief."},
    {"role": "user", "content": "Explain decorators."}
]
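
Consolidating system messages on the client side makes the outcome explicit regardless of how the endpoint treats later ones. A minimal sketch, assuming every message's content is a plain string; `merge_system_messages` is a hypothetical helper name.

```python
def merge_system_messages(messages: list[dict]) -> list[dict]:
    """Fold all system messages into one message at position 0.

    Non-system messages keep their relative order. Assumes string content.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    return [{"role": "system", "content": " ".join(system_parts)}] + others


scattered = [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Explain decorators."},
    {"role": "system", "content": "Be brief."},
]
for message in merge_system_messages(scattered):
    print(message)
```

Note that merging discards positional intent: an instruction meant to apply only from a certain turn onward becomes global, so it is worth reviewing merged prompts by hand.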

Features only available via native Anthropic SDK

import anthropic
client = anthropic.Anthropic()

# 1. Extended Thinking: no OpenAI equivalent
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Solve a hard math problem."}]
)

# 2. Prompt Caching control
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Very long system prompt...",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Question"}]
)

# 3. Message Batches API
batch = client.messages.batches.create(requests=[...])

# 4. PDF support via betas (the betas argument is accepted by client.beta.messages)
response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    betas=["pdfs-2024-09-25"],
    messages=[...]
)

14.8 Migration Walkthrough

Simple migration: two-line change

# BEFORE (OpenAI)
import openai
client = openai.OpenAI(api_key="sk-openai-key")
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

# AFTER (Claude via compatibility endpoint)
import openai
client = openai.OpenAI(
    base_url="https://api.anthropic.com/v1/",   # Changed
    api_key="your-anthropic-api-key"             # Changed
)
response = client.chat.completions.create(
    model="claude-sonnet-4-6",                   # Changed model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Provider-agnostic abstraction layer

import os, openai

def build_client(provider: str = None) -> tuple[openai.OpenAI, str]:
    """Return (client, default_model) for the specified provider."""
    provider = provider or os.environ.get("LLM_PROVIDER", "anthropic")

    if provider == "anthropic":
        return (
            openai.OpenAI(
                base_url="https://api.anthropic.com/v1/",
                api_key=os.environ["ANTHROPIC_API_KEY"]
            ),
            "claude-sonnet-4-6"
        )
    elif provider == "openai":
        return (
            openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
            "gpt-4o"
        )
    else:
        raise ValueError(f"Unknown provider: {provider}")

# Application code stays unchanged as you swap providers
client, model = build_client()

response = client.chat.completions.create(
    model=model,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain RAG."}]
)
print(response.choices[0].message.content)

14.9 Debugging Compatibility Issues

import os
import openai

def diagnose_request(messages: list, **kwargs) -> dict:
    """Detect common compatibility issues before sending the request.

    Pass model= and any other request parameters through kwargs.
    """
    warnings = []

    if "max_tokens" not in kwargs:
        warnings.append("max_tokens is missing; it is required for Claude")
        kwargs["max_tokens"] = 1024  # Auto-fix

    if kwargs.get("n", 1) > 1:
        warnings.append(f"n={kwargs['n']} is not supported; only n=1 works")

    if kwargs.get("temperature", 1.0) > 1.0:
        warnings.append(f"temperature={kwargs['temperature']} exceeds Claude's recommended range [0, 1]")

    for p in ("logprobs", "presence_penalty", "frequency_penalty"):
        if p in kwargs and kwargs[p] not in (None, 0, False):
            warnings.append(f"{p} is ignored by Claude")

    for w in warnings:
        print(f"WARNING: {w}")

    try:
        client = openai.OpenAI(
            base_url="https://api.anthropic.com/v1/",
            api_key=os.environ["ANTHROPIC_API_KEY"]
        )
        response = client.chat.completions.create(messages=messages, **kwargs)
        return {"ok": True, "content": response.choices[0].message.content}
    except openai.BadRequestError as e:
        return {"ok": False, "error": str(e)}

Summary

The OpenAI-compatible endpoint dramatically lowers the barrier to adopting Claude in OpenAI-based codebases. Key takeaways:

  1. Change base_url to https://api.anthropic.com/v1/ and swap the API key; most code just works
  2. Tool calling and streaming are fully supported through the compatibility layer
  3. max_tokens is required for Claude, and it is the single most common migration gotcha
  4. n > 1, logprobs, presence_penalty, and frequency_penalty are unsupported or ignored
  5. Extended Thinking, Prompt Caching, Batch API, and beta features require the native Anthropic SDK
  6. Use LiteLLM for multi-provider routing with automatic fallback between Claude and OpenAI models