OpenAI-Compatible Endpoint and Zero-Change Migration: Complete Guide from GPT-4 to Claude
Chapter 6: Response Format Control: JSON Mode, XML Tags, and Structured Output
6.1 Why Format Control Is a Production Requirement
In most production systems, Claude's output is not presented directly to users; it is parsed, validated, stored, and passed downstream. If the format is unreliable, the pipeline breaks at runtime. Format failures are particularly insidious because they are intermittent: a model that follows a format 95% of the time still fails on roughly one call in twenty, often passing tests and then generating hard-to-debug errors in production.
The core challenge: language models are probabilistic. They generate the statistically most likely next token given the context, not a string that is guaranteed to satisfy a JSON schema. Even with explicit format instructions, a model may occasionally:
- Prepend or append explanatory text around the JSON
- Use single quotes instead of double quotes
- Return a string where a number is expected
- Omit optional fields or add unexpected keys
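These failure modes can be reproduced offline with the standard library. The sketch below uses hand-written sample strings (illustrative, not captured model output) and a hypothetical `diagnose` helper to show which failures `json.loads` rejects outright and which parse successfully yet still violate the expected shape.

```python
import json

# Hand-written samples that mimic the failure modes listed above.
SAMPLES = {
    "surrounding_text": 'Here is the JSON:\n{"price": 19.99}',
    "single_quotes": "{'price': 19.99}",
    "wrong_type": '{"price": "19.99"}',
    "unexpected_key": '{"price": 19.99, "note": "approx"}',
}

def diagnose(raw: str) -> str:
    """Classify a raw string as a parse error, a shape error, or ok."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "parse_error"
    if not isinstance(data, dict):
        return "shape_error"
    # Shape checks: 'price' must be a number and no other keys are allowed.
    if not isinstance(data.get("price"), (int, float)) or set(data) != {"price"}:
        return "shape_error"
    return "ok"

results = {name: diagnose(raw) for name, raw in SAMPLES.items()}
```

The first two samples fail loudly at parse time. The last two parse without complaint and only surface if the result is validated, which is why parsing alone is not a sufficient check.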
This chapter covers three main approaches, roughly ordered by reliability: prompt-level instructions, XML tags, and Tool Use (function calling) for enforced schema adherence.
6.2 Prompt-Level JSON Control
The Basic Approach
The simplest method is explicit instruction:
import json
import re
import anthropic

client = anthropic.Anthropic()

def extract_product_info(description: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="""You are a product information extractor.
Extract structured data from product descriptions.
Return ONLY valid JSON with no surrounding text, no code fences, no explanation.
JSON structure:
{
  "name": "product name",
  "category": "category string",
  "price": number or null,
  "features": ["feature1", "feature2"],
  "target_audience": "description of target user"
}""",
        messages=[{"role": "user", "content": description}]
    )
    raw = response.content[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return _repair_json(raw)

def _repair_json(text: str) -> dict:
    # Remove code fences
    text = re.sub(r'^```(?:json)?\s*\n?', '', text, flags=re.MULTILINE)
    text = re.sub(r'\n?```\s*$', '', text, flags=re.MULTILINE).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Extract the first JSON object or array
        m = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', text)
        if m:
            return json.loads(m.group(1))
        raise ValueError(f"Could not parse model output as JSON: {text[:200]}")
More Reliable: Assistant Prefill
The most underused technique for JSON reliability is assistant prefill: supply the beginning of Claude's response in the messages array. Since the model must continue from where you left off, starting with { makes it virtually impossible for the model to prepend explanatory text.
def extract_with_prefill(text: str) -> dict:
    """
    Force JSON output by prefilling the assistant's turn with '{'
    """
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Extract key information from text. Return JSON only.",
        messages=[
            {"role": "user", "content": f"Extract key information from:\n\n{text}"},
            {"role": "assistant", "content": "{"},  # <-- prefill
        ]
    )
    # The model continues from '{', so prepend it back
    json_str = "{" + response.content[0].text
    return json.loads(json_str)
This works because Claude generates completions, not rewrites. Starting with { puts it in a JSON-completion context from the first token.
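Prefill controls how the response starts, not how it ends: the model can still append commentary after the closing brace. One possible way to cope, sketched here with the standard library's `json.JSONDecoder.raw_decode`, is to parse the first complete JSON value and ignore whatever follows. The helper name `parse_prefilled` and the sample continuation are illustrative assumptions, not part of any SDK.

```python
import json

def parse_prefilled(continuation: str, prefill: str = "{") -> dict:
    """Rejoin the prefill with the model's continuation and parse the
    first complete JSON value, ignoring any trailing text."""
    text = (prefill + continuation).lstrip()
    # raw_decode returns (value, index where parsing stopped).
    value, _end = json.JSONDecoder().raw_decode(text)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value

# Hand-written continuation, shaped like what follows a '{' prefill.
sample = '"name": "Ada", "year": 1843}\n\nLet me know if you need more.'
parsed = parse_prefilled(sample)
```

A continuation that is cut off mid-object still raises `json.JSONDecodeError`, so truncation is reported rather than silently accepted.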
Providing a JSON Schema
For complex schemas, including the schema itself in the prompt substantially reduces structural errors:
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "date", "total", "line_items"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "CNY", "JPY"]},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description", "quantity", "unit_price"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"},
                }
            }
        }
    }
}

def extract_invoice(invoice_text: str) -> dict:
    schema_str = json.dumps(INVOICE_SCHEMA, indent=2)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"""Extract invoice data.
Output ONLY valid JSON matching the schema below. No explanation.
Schema:
{schema_str}
Invoice text:
{invoice_text}"""}]
    )
    return json.loads(response.content[0].text.strip())
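Putting the schema in the prompt only asks for conformance; it does not check it. Section 6.6 uses the `jsonschema` package for that. As a dependency-free illustration of what such a check involves, here is a deliberately small validator covering only the keywords that appear in INVOICE_SCHEMA (type, required, properties, items, enum). The function `check` is hypothetical and is no substitute for a full JSON Schema implementation.

```python
# Maps JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
}

def check(value, schema: dict, path: str = "$") -> list[str]:
    """Return a list of human-readable violations (empty means valid)."""
    errors: list[str] = []
    expected = _TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so reject it explicitly.
    if expected and (not isinstance(value, expected) or isinstance(value, bool)):
        return [f"{path}: expected {schema['type']}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors

# A trimmed-down schema in the same style as INVOICE_SCHEMA.
schema = {
    "type": "object",
    "required": ["invoice_number", "total", "line_items"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "CNY", "JPY"]},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description", "quantity"],
                "properties": {"quantity": {"type": "number"}},
            },
        },
    },
}

# Hand-written example with three deliberate violations.
bad = {"invoice_number": "INV-1", "total": "12.50", "currency": "GBP",
       "line_items": [{"description": "Widget"}]}
problems = check(bad, schema)
```

Collecting every violation with its path, rather than stopping at the first, tends to make logs from a failing extraction easier to act on.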
6.3 XML Tags: Organizing Complex Multi-Part Output
XML tags are the approach Anthropic itself recommends for separating distinct content types within a single response. They shine when the output combines things that don't fit into a flat JSON structure: code mixed with explanation, reasoning alongside a final answer, or multiple independently parseable sections.
Basic XML Tag Pattern
from xml.etree import ElementTree as ET

def analyze_code_xml(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="""You are a code review expert. Respond in exactly this XML format:
<analysis>
  <severity>critical|high|medium|low|none</severity>
  <issues>
    <issue>
      <type>bug|performance|security|style</type>
      <line>line number or N/A</line>
      <description>description of the issue</description>
    </issue>
  </issues>
  <fixed_code>corrected code here</fixed_code>
  <explanation>explanation of changes</explanation>
</analysis>""",
        messages=[{"role": "user", "content": f"Review this code:\n\n```python\n{code}\n```"}]
    )
    return _parse_analysis_xml(response.content[0].text)

def _parse_analysis_xml(text: str) -> dict:
    # Isolate the <analysis> element so surrounding prose is ignored
    m = re.search(r'<analysis>(.*?)</analysis>', text, re.DOTALL)
    if not m:
        raise ValueError(f"No <analysis> tag in response: {text[:200]}")
    try:
        # Note: ET.fromstring needs well-formed XML; a raw '<' or '&'
        # inside <fixed_code> raises ParseError
        root = ET.fromstring(f"<analysis>{m.group(1)}</analysis>")
        return {
            "severity": root.findtext("severity"),
            "issues": [
                {
                    "type": issue.findtext("type"),
                    "line": issue.findtext("line"),
                    "description": issue.findtext("description"),
                }
                for issue in root.findall(".//issue")
            ],
            "fixed_code": root.findtext("fixed_code"),
            "explanation": root.findtext("explanation"),
        }
    except ET.ParseError as e:
        raise ValueError(f"XML parse error: {e}") from e
Lightweight Regex Extraction
For simpler cases where you only need a few specific tags, regex extraction is faster and less fragile than full XML parsing:
def extract_tags(text: str, *tag_names: str) -> dict[str, str | None]:
    """
    Extract content from named XML tags. Returns None for missing tags.
    Handles multi-line content.
    """
    return {
        tag: (m.group(1).strip() if (m := re.search(
            rf'<{tag}>(.*?)</{tag}>', text, re.DOTALL
        )) else None)
        for tag in tag_names
    }

# Usage
result = extract_tags(response_text, "answer", "confidence", "reasoning")
print(result["answer"])      # direct answer
print(result["confidence"])  # confidence score
print(result["reasoning"])   # step-by-step reasoning
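One concrete reason regex extraction can be less fragile: code inside a tag often contains `<` or `&`, which are not legal in XML character data unless escaped. The sketch below uses a hand-written response (not real model output) and two hypothetical helpers to show strict parsing failing where a regex still recovers the content.

```python
import re
from xml.etree import ElementTree as ET

# Hand-written response whose <fixed_code> contains a raw '<' character.
response_text = (
    "<analysis><severity>low</severity>"
    "<fixed_code>if a < b and c:\n    swap(a, b)</fixed_code></analysis>"
)

def strict_parse_ok(text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def regex_extract(text: str, tag: str):
    """Return the content of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1).strip() if m else None

strict_ok = strict_parse_ok(response_text)        # False: '<' is not escaped
fixed_code = regex_extract(response_text, "fixed_code")
```

If strict parsing is a requirement, the code section would have to be escaped or wrapped in CDATA, and the prompt would need to ask for that explicitly.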
Chain-of-Thought with XML
def solve_with_cot(problem: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"""Solve the following problem.
Show your reasoning inside <thinking> tags.
Give your final answer inside <answer> tags.
<problem>
{problem}
</problem>"""}]
    )
    return extract_tags(response.content[0].text, "thinking", "answer")
6.4 Tool Use: The Most Reliable Structured Output
Tool use (function calling) is the most reliable of the three mechanisms for structured output. The principle: you define a function with a JSON schema for its parameters, and when the model "calls" the function, the API returns the arguments as a structured object rather than as free text. This sidesteps most of the uncertainty of free-form generation, although the arguments are still worth validating against the schema before they enter the pipeline.
Using Tool Use for Pure Extraction (No Real Function Call)
You don't need to actually execute the tool. Define it, tell the model to use it, and treat the tool call arguments as your structured output:
def extract_entities(text: str) -> dict:
    """
    Use tool_choice to force schema-compliant JSON output.
    The tool is never actually executed; it's just a schema enforcement mechanism.
    """
    tools = [
        {
            "name": "save_entities",
            "description": "Save extracted entities from the text",
            "input_schema": {
                "type": "object",
                "required": ["persons", "organizations", "locations", "dates"],
                "properties": {
                    "persons": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": ["name", "role"],
                            "properties": {
                                "name": {"type": "string"},
                                "role": {"type": "string"},
                            }
                        }
                    },
                    "organizations": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "locations": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "dates": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    ]
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "tool", "name": "save_entities"},  # force this specific tool
        messages=[{"role": "user", "content": f"Extract all entities from:\n\n{text}"}]
    )
    for block in response.content:
        if block.type == "tool_use" and block.name == "save_entities":
            return block.input  # already a dict, no parsing needed
    raise RuntimeError("Model did not generate a tool call")
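Forcing a tool call does not guarantee a complete one. If generation stops at `max_tokens`, the tool input may be cut short, so it seems prudent to inspect `stop_reason` and the required keys before trusting `block.input`. The sketch below runs against stand-in objects built with `SimpleNamespace` rather than a live API response; `get_tool_input` is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace

def get_tool_input(response, tool_name: str, required: list[str]) -> dict:
    """Return the named tool call's input, or raise if it looks unusable."""
    if response.stop_reason == "max_tokens":
        raise RuntimeError("response truncated; tool input may be incomplete")
    for block in response.content:
        if block.type == "tool_use" and block.name == tool_name:
            missing = [k for k in required if k not in block.input]
            if missing:
                raise ValueError(f"tool input missing keys: {missing}")
            return block.input
    raise RuntimeError(f"no {tool_name!r} tool call in response")

# Stand-in for a Messages API response (same attribute names, fake data).
fake = SimpleNamespace(
    stop_reason="tool_use",
    content=[SimpleNamespace(
        type="tool_use",
        name="save_entities",
        input={"persons": [], "organizations": ["Acme"],
               "locations": [], "dates": []},
    )],
)
entities = get_tool_input(
    fake, "save_entities", ["persons", "organizations", "locations", "dates"]
)
```

Raising distinct errors for truncation, missing keys, and a missing tool call keeps the three cases separable in logs and retry logic.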
Multiple Tools: Let the Model Choose Output Type
def classify_or_extract(query: str) -> dict:
    """
    Define two tools; let the model choose which schema applies.
    """
    tools = [
        {
            "name": "classify_sentiment",
            "description": "Use when the input asks for sentiment analysis",
            "input_schema": {
                "type": "object",
                "required": ["sentiment", "score", "explanation"],
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral", "mixed"]
                    },
                    "score": {
                        "type": "number",
                        "minimum": -1.0,
                        "maximum": 1.0
                    },
                    "explanation": {"type": "string"}
                }
            }
        },
        {
            "name": "extract_facts",
            "description": "Use when the input contains factual content to extract",
            "input_schema": {
                "type": "object",
                "required": ["facts", "summary"],
                "properties": {
                    "facts": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "statement": {"type": "string"},
                                "confidence": {"type": "number"}
                            }
                        }
                    },
                    "summary": {"type": "string"}
                }
            }
        }
    ]
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "auto"},
        messages=[{"role": "user", "content": query}]
    )
    for block in response.content:
        if block.type == "tool_use":
            return {"tool": block.name, "data": block.input}
    # Model chose not to call a tool
    text = next((b.text for b in response.content if b.type == "text"), "")
    return {"tool": None, "text": text}
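Once the model has picked a tool, the caller still has to route the result. A dispatch table keyed by tool name is one straightforward way to do that. The sketch below consumes dictionaries of the shape returned by `classify_or_extract`; the handler functions and `route` are hypothetical placeholders.

```python
def handle_sentiment(data: dict) -> str:
    return f"sentiment={data['sentiment']} score={data['score']:+.2f}"

def handle_facts(data: dict) -> str:
    return f"{len(data['facts'])} fact(s): {data['summary']}"

# Maps each tool name to the function that consumes its arguments.
HANDLERS = {
    "classify_sentiment": handle_sentiment,
    "extract_facts": handle_facts,
}

def route(result: dict) -> str:
    """Dispatch a classify_or_extract-style result to its handler."""
    tool = result.get("tool")
    if tool is None:
        # The model answered in plain text instead of calling a tool.
        return f"text: {result.get('text', '')}"
    handler = HANDLERS.get(tool)
    if handler is None:
        raise KeyError(f"no handler registered for tool {tool!r}")
    return handler(result["data"])

line = route({
    "tool": "classify_sentiment",
    "data": {"sentiment": "positive", "score": 0.8, "explanation": "upbeat"},
})
```

Handling the `tool is None` branch explicitly matters with `tool_choice` set to auto, because a plain-text answer is a legitimate outcome rather than an error.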
TypeScript Tool Use Example
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// author and publication_date are not in the schema's "required" list,
// so they may be absent from the tool input rather than null.
interface ExtractedData {
  title: string;
  author?: string;
  publication_date?: string;
  key_points: string[];
}

async function extractDocumentMetadata(text: string): Promise<ExtractedData> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    tools: [
      {
        name: "save_metadata",
        description: "Save document metadata",
        input_schema: {
          type: "object" as const,
          required: ["title", "key_points"],
          properties: {
            title: { type: "string" },
            author: { type: "string" },
            publication_date: { type: "string" },
            key_points: {
              type: "array",
              items: { type: "string" },
            },
          },
        },
      },
    ],
    tool_choice: { type: "tool", name: "save_metadata" },
    messages: [{ role: "user", content: `Extract metadata from:\n\n${text}` }],
  });
  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (!toolUse || toolUse.type !== "tool_use") {
    throw new Error("No tool call in response");
  }
  // The cast is unchecked; validate the input if the fields are critical
  return toolUse.input as ExtractedData;
}
6.5 Robust Parser Design
Regardless of which approach you choose, production parsers need graceful degradation.
Tolerant JSON Parser
from typing import Any

def robust_json_parse(text: str) -> Any:
    """
    Multi-strategy JSON parser that handles common model output issues.
    """
    # Strategy 1: direct parse
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: strip code fences
    cleaned = re.sub(r'^```(?:json)?\s*\n?', '', text.strip(), flags=re.MULTILINE)
    cleaned = re.sub(r'\n?```\s*$', '', cleaned, flags=re.MULTILINE).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Strategy 3: extract first JSON object or array
    m = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', cleaned)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 4: json5 for lenient parsing (single quotes, trailing commas, etc.)
    try:
        import json5  # pip install json5
        return json5.loads(cleaned)
    except Exception:
        # Covers both a missing json5 package (ImportError) and parse failures
        pass
    raise ValueError(f"Cannot parse as JSON: {text[:300]}")
Format Repair via LLM
When parsing fails entirely, ask Claude to fix the JSON, using an inexpensive Haiku call:
def parse_or_repair(raw: str, schema: dict) -> dict:
    try:
        return robust_json_parse(raw)
    except ValueError:
        repair_resp = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            messages=[{"role": "user", "content": f"""The following text should be JSON
but failed to parse. Fix it and return only valid JSON that matches the schema.
Schema: {json.dumps(schema)}
Text to fix: {raw}"""}]
        )
        # The repair output can itself arrive wrapped in fences or prose,
        # so run it through the tolerant parser rather than json.loads
        return robust_json_parse(repair_resp.content[0].text)
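A repair call can fail too, so it may be worth bounding the number of attempts instead of assuming one pass is enough. The sketch below shows one way to structure that loop with the repair step injected as a function, which also makes it testable without network access. `parse_with_repair` and `toy_repair` are hypothetical names; in practice the repair function would wrap the Haiku call.

```python
import json
from typing import Callable

def parse_with_repair(
    raw: str,
    repair_fn: Callable[[str], str],
    max_attempts: int = 2,
) -> dict:
    """Try to parse; on failure, ask repair_fn for a fixed string and retry."""
    text = raw
    for attempt in range(max_attempts + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                break
            text = repair_fn(text)
    raise ValueError(f"still invalid after {max_attempts} repair attempt(s)")

# Toy repair function standing in for the LLM call: swaps quote style.
def toy_repair(text: str) -> str:
    return text.replace("'", '"')

fixed = parse_with_repair("{'ok': true}", toy_repair)
```

Capping attempts keeps cost and latency predictable, and the final `ValueError` gives the caller a single place to log the input that could not be recovered.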
6.6 Format Consistency Testing
Track format reliability in CI and production:
import jsonschema
from dataclasses import dataclass
from typing import Callable

@dataclass
class FormatTestReport:
    total: int
    parse_errors: int
    schema_errors: int
    success_rate: float
    error_samples: list[dict]

def test_format_consistency(
    call_fn: Callable[[str], str],
    test_inputs: list[str],
    schema: dict,
    sample_limit: int = 3,
) -> FormatTestReport:
    parse_errors, schema_errors = 0, 0
    error_samples: list[dict] = []
    for inp in test_inputs:
        raw = call_fn(inp)
        try:
            parsed = robust_json_parse(raw)
            jsonschema.validate(parsed, schema)
        except (ValueError, json.JSONDecodeError) as e:
            parse_errors += 1
            if len(error_samples) < sample_limit:
                error_samples.append({"input": inp[:80], "output": raw[:150],
                                      "error": str(e), "type": "parse"})
        except jsonschema.ValidationError as e:
            schema_errors += 1
            if len(error_samples) < sample_limit:
                error_samples.append({"input": inp[:80], "output": raw[:150],
                                      "error": e.message, "type": "schema"})
    total = len(test_inputs)
    total_errors = parse_errors + schema_errors
    return FormatTestReport(
        total=total,
        parse_errors=parse_errors,
        schema_errors=schema_errors,
        success_rate=(total - total_errors) / total if total else 0.0,
        error_samples=error_samples,
    )
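How many test inputs are enough? If format failures occur independently with probability p per call, the chance of seeing at least one failure in n calls is 1 - (1 - p)^n. The sketch below computes the smallest n that surfaces a given failure rate with a chosen confidence. Independence is a simplifying assumption, since real failures tend to cluster on particular inputs, and `min_samples` is a hypothetical helper.

```python
import math

def min_samples(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one failure in n calls) >= confidence,
    assuming independent failures at the given per-call rate."""
    if not (0 < failure_rate < 1 and 0 < confidence < 1):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

n_for_5_percent = min_samples(0.05)   # 59
n_for_1_percent = min_samples(0.01)   # 299
```

Under these assumptions, a suite of a few dozen inputs is likely to miss a 1% failure rate entirely, which argues for larger samples or continuous tracking in production.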
6.7 Choosing the Right Approach
Comparison summary:
                       Prompt Instructions   XML Tags     Tool Use
---------------------  --------------------  -----------  -------------------
Implementation effort  Low                   Low          Medium
Format reliability     ~90%                  ~95%         ~99%+
Best for               Simple JSON           Multi-part   Critical pipelines
Data type support      JSON/text             Any text     Strongly typed JSON
Parse overhead         Need parser           Regex/XML    SDK returns dict
Token overhead         Lowest                Low          Higher (tool defs)
Decision guide:
- Prototyping / simple extraction: prompt instructions + tolerant parser
- Mixed text and code output: XML tags
- Production critical path / zero-tolerance for format errors: Tool Use
- High-volume / cost-sensitive batch processing: optimized prompt instructions + repair fallback
Summary
Reliable structured output is essential for integrating Claude into production pipelines. The three-level approach:
- Prompt instructions + assistant prefill: simplest, ~90% reliable; use a robust parser with fallback repair for the remaining 10%
- XML tags: natural for multi-section responses; regex extraction is fast and readable; best when output mixes code, text, and structured data
- Tool Use with tool_choice: enforces JSON schema at the API level, achieving near-100% format compliance; the right choice for any critical-path extraction task
Regardless of the approach, always build a tolerant parser, test format consistency on a representative sample, and have a repair strategy for the cases where the model's output doesn't parse cleanly.
The next chapter dives into the Messages API in full detail: every parameter, its semantics, version compatibility, and the gotchas that trip up developers in production.