Chapter 6: Response Format Control: JSON Mode, XML Tags, and Structured Output

6.1 Why Format Control Is a Production Requirement

In most production systems, Claude's output is not shown directly to users; it is parsed, validated, stored, and passed downstream. If the format is unreliable, the pipeline breaks at runtime. Format failures are particularly insidious because they are intermittent: a model that follows a format 95% of the time passes most manual tests and still fails on roughly one request in twenty, producing errors that are hard to reproduce and debug.
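
To make the 95% figure concrete, the arithmetic below is a small sketch. The per-call rate comes from the paragraph above; the daily volume and the chain length are assumptions chosen only for illustration, and failures are assumed to be independent.

```python
# Illustrative arithmetic only: the 95% per-call figure comes from the text,
# while the volume and chain length below are assumed for the example.
per_call_success = 0.95
requests_per_day = 10_000      # assumed traffic
calls_per_request = 3          # assumed number of chained model calls

# Expected number of malformed responses per day for single-call requests.
expected_failures = requests_per_day * (1 - per_call_success)

# Probability that every call in a chain returns well-formed output,
# assuming failures are independent of each other.
chain_success = per_call_success ** calls_per_request

print(f"expected malformed responses/day: {expected_failures:.0f}")
print(f"chain of {calls_per_request} calls succeeds: {chain_success:.1%}")
```

Under these assumptions about 500 responses a day are malformed, and a three-call chain completes cleanly only about 86% of the time.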

The core challenge: language models are probabilistic. They sample each token from a probability distribution conditioned on the context, and nothing in that process guarantees a string that satisfies a JSON schema. Even with explicit format instructions, a model may occasionally:

Prepend or append explanatory text around the JSON
Use single quotes instead of double quotes
Return a string where a number is expected
Omit fields the consumer expects or add unexpected keys
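
The four failure modes above do not surface in the same place. The sketch below uses hypothetical model outputs and a hand-written check for a single `price` field (both are assumptions for illustration) to show that the first two fail at parse time, while the last two parse cleanly and are caught only by validation.

```python
import json

# Hypothetical model outputs, one per failure mode listed above.
samples = {
    "surrounding_text": 'Here is the data:\n{"price": 19.99}',
    "single_quotes": "{'price': 19.99}",
    "wrong_type": '{"price": "19.99"}',
    "unexpected_key": '{"price": 19.99, "notes": "on sale"}',
}

def classify(raw: str) -> str:
    """Return 'parse_error', 'schema_error', or 'ok' for a raw model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "parse_error"
    # Minimal hand-written check: exactly one key, 'price', holding a number.
    if not isinstance(data, dict) or set(data) != {"price"}:
        return "schema_error"
    price = data["price"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return "schema_error"
    return "ok"

results = {name: classify(raw) for name, raw in samples.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The practical consequence is that a pipeline needs both a parser and a validator: `json.loads` succeeding says nothing about field types or extra keys.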

This chapter covers three main approaches, roughly ordered by reliability: prompt-level instructions, XML tags, and Tool Use (function calling) for enforced schema adherence.

6.2 Prompt-Level JSON Control

The Basic Approach

The simplest method is explicit instruction:

import json
import re
import anthropic

client = anthropic.Anthropic()

def extract_product_info(description: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="""You are a product information extractor.
Extract structured data from product descriptions.

Return ONLY valid JSON with no surrounding text, no code fences, no explanation.
JSON structure:
{
  "name": "product name",
  "category": "category string",
  "price": number or null,
  "features": ["feature1", "feature2"],
  "target_audience": "description of target user"
}""",
        messages=[{"role": "user", "content": description}]
    )

    raw = response.content[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return _repair_json(raw)

def _repair_json(text: str) -> dict:
    # Remove code fences
    text = re.sub(r'^```(?:json)?\s*\n?', '', text, flags=re.MULTILINE)
    text = re.sub(r'\n?```\s*$', '', text, flags=re.MULTILINE).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Extract the first JSON object or array
        m = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', text)
        if m:
            return json.loads(m.group(1))
        raise ValueError(f"Could not parse model output as JSON: {text[:200]}")
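
The pattern in `_repair_json` is greedy: it captures everything from the first opening brace to the last closing brace, so it fails when the reply contains two objects or a stray brace in trailing prose. One alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, is to let the decoder itself decide where the first value ends. The function name `first_json_value` is hypothetical.

```python
import json

def first_json_value(text: str):
    """Return the first decodable JSON object or array embedded in text."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one value starting at index i and ignores
            # whatever follows it, returning (value, end_index).
            value, _end = decoder.raw_decode(text, i)
            return value
        except json.JSONDecodeError:
            continue  # not a valid JSON start; keep scanning
    raise ValueError(f"No JSON value found in: {text[:200]}")

reply = 'Result: {"a": 1} and, for comparison, {"b": 2}'
print(first_json_value(reply))
```

The scan is quadratic in the worst case, which is rarely a concern for replies bounded by `max_tokens`.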

More Reliable: Assistant Prefill

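An underused technique for JSON reliability is assistant prefill: supply the beginning of Claude's response as a final assistant turn in the messages array. Because the model must continue from where you left off, starting with { makes it virtually impossible for the model to prepend explanatory text. Prefill does not prevent trailing text after the closing brace, and it cannot be combined with extended thinking, so keep a tolerant parser behind it.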

def extract_with_prefill(text: str) -> dict:
    """
    Force JSON output by prefilling the assistant's turn with '{'
    """
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Extract key information from text. Return JSON only.",
        messages=[
            {"role": "user", "content": f"Extract key information from:\n\n{text}"},
            {"role": "assistant", "content": "{"},  # <-- prefill
        ]
    )

    # The model continues from '{', so prepend it back
    json_str = "{" + response.content[0].text
    return json.loads(json_str)

This works because Claude generates completions, not rewrites. Starting with { puts it in a JSON-completion context from the first token.
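
Two details are easy to miss when wiring prefill into real code: the prefill has to be reattached before parsing, and a reply cut off by `max_tokens` will never parse. The helpers below are a minimal sketch that runs without an API call; `build_prefill_messages` and `join_prefill` are hypothetical names, and trimming trailing whitespace reflects the assumption that the API rejects a prefill ending in whitespace.

```python
import json

def build_prefill_messages(user_text: str, prefill: str = "{") -> list[dict]:
    """Build a messages list whose last turn is a partial assistant reply."""
    # Assumption: a prefill ending in whitespace is rejected, so trim it.
    return [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": prefill.rstrip()},
    ]

def join_prefill(prefill: str, completion: str, stop_reason: str) -> dict:
    """Reattach the prefill and parse, refusing output cut off by max_tokens."""
    if stop_reason == "max_tokens":
        raise ValueError("Response was truncated; raise max_tokens and retry")
    return json.loads(prefill.rstrip() + completion)

# Simulated continuation, standing in for response.content[0].text.
messages = build_prefill_messages("Extract key information from: ...")
data = join_prefill("{", '"name": "Widget", "price": 9.5}', "end_turn")
print(data)
```

In a real call, `stop_reason` would come from `response.stop_reason` and `completion` from the first text block of the response.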

Providing a JSON Schema

For complex schemas, including the schema itself in the prompt substantially reduces structural errors:

INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "date", "total", "line_items"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "CNY", "JPY"]},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description", "quantity", "unit_price"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"},
                }
            }
        }
    }
}

def extract_invoice(invoice_text: str) -> dict:
    schema_str = json.dumps(INVOICE_SCHEMA, indent=2)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"""Extract invoice data.
Output ONLY valid JSON matching the schema below. No explanation.

Schema:
{schema_str}

Invoice text:
{invoice_text}"""}]
    )
    return json.loads(response.content[0].text.strip())
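
Note that `extract_invoice` returns whatever parses; it never checks the result against the schema it sent. The sketch below is a dependency-free checker covering only `type`, `required`, `enum`, `properties`, and `items`. It is an illustration, not a replacement for a full validator such as the `jsonschema` package used in section 6.6, and the name `check_schema` and the sample data are hypothetical.

```python
TYPE_MAP = {"object": dict, "array": list, "string": str,
            "number": (int, float), "integer": int, "boolean": bool}

def check_schema(data, schema: dict, path: str = "$") -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    kind = schema.get("type")
    if kind in TYPE_MAP:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(data, bool) and kind in ("number", "integer")
        if bool_as_number or not isinstance(data, TYPE_MAP[kind]):
            return [f"{path}: expected {kind}, got {type(data).__name__}"]
    if "enum" in schema and data not in schema["enum"]:
        problems.append(f"{path}: {data!r} not in {schema['enum']}")
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                problems.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                problems += check_schema(data[key], sub, f"{path}.{key}")
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            problems += check_schema(item, schema["items"], f"{path}[{i}]")
    return problems

# A trimmed, illustrative schema and a deliberately bad extraction result.
schema = {
    "type": "object",
    "required": ["invoice_number", "total", "line_items"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "CNY", "JPY"]},
        "line_items": {"type": "array", "items": {
            "type": "object", "required": ["description", "quantity"],
            "properties": {"description": {"type": "string"},
                           "quantity": {"type": "number"}}}},
    },
}
bad = {"invoice_number": 17, "total": "10", "currency": "GBP",
       "line_items": [{"description": "Widget"}]}
problems = check_schema(bad, schema)
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one makes the output usable as feedback in a retry prompt.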

6.3 XML Tags: Organizing Complex Multi-Part Output

Anthropic's prompting guidance recommends XML tags for separating distinct content types within a single response. They shine when the output combines things that don't fit into a flat JSON structure: code mixed with explanation, reasoning alongside a final answer, or multiple independently parseable sections.

Basic XML Tag Pattern

from xml.etree import ElementTree as ET

def analyze_code_xml(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="""You are a code review expert. Respond in exactly this XML format:

<analysis>
<severity>critical|high|medium|low|none</severity>
<issues>
  <issue>
    <type>bug|performance|security|style</type>
    <line>line number or N/A</line>
    <description>description of the issue</description>
  </issue>
</issues>
<fixed_code>corrected code here</fixed_code>
<explanation>explanation of changes</explanation>
</analysis>""",
        messages=[{"role": "user", "content": f"Review this code:\n\n```python\n{code}\n```"}]
    )

    return _parse_analysis_xml(response.content[0].text)

def _parse_analysis_xml(text: str) -> dict:
    m = re.search(r'<analysis>(.*?)</analysis>', text, re.DOTALL)
    if not m:
        raise ValueError(f"No <analysis> tag in response: {text[:200]}")

    try:
        root = ET.fromstring(f"<analysis>{m.group(1)}</analysis>")
        return {
            "severity": root.findtext("severity"),
            "issues": [
                {
                    "type": issue.findtext("type"),
                    "line": issue.findtext("line"),
                    "description": issue.findtext("description"),
                }
                for issue in root.findall(".//issue")
            ],
            "fixed_code": root.findtext("fixed_code"),
            "explanation": root.findtext("explanation"),
        }
    except ET.ParseError as e:
        raise ValueError(f"XML parse error: {e}") from e
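
One caveat with `_parse_analysis_xml`: source code placed inside `<fixed_code>` often contains `<` or `&`, which are not legal as raw characters in XML content, so a strict parser rejects the whole reply. The sketch below shows the failure on a hypothetical reply and one possible fallback, per-tag regex extraction. The function name `parse_analysis_tolerant` is an assumption for illustration.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical model reply: the code inside <fixed_code> contains '<' and '&'.
reply = """<analysis>
<severity>low</severity>
<fixed_code>if a < b and flags & 1:
    swap(a, b)</fixed_code>
<explanation>Use a strict comparison.</explanation>
</analysis>"""

def parse_analysis_tolerant(text: str) -> dict:
    """Try a strict XML parse first; fall back to per-tag regex extraction."""
    fields = ("severity", "fixed_code", "explanation")
    try:
        root = ET.fromstring(text.strip())
        return {name: root.findtext(name) for name in fields}
    except ET.ParseError:
        out = {}
        for name in fields:
            m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
            out[name] = m.group(1).strip() if m else None
        return out

parsed = parse_analysis_tolerant(reply)
print(parsed["fixed_code"])
```

The two paths differ in one respect: the strict parser decodes entities such as `&lt;`, while the regex path returns the text exactly as the model wrote it.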

Lightweight Regex Extraction

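For simpler cases where you only need a few specific tags, regex extraction is simpler and more tolerant than full XML parsing, because it does not require the whole response to be well-formed XML (raw < or & characters in code are enough to break a strict parser):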

def extract_tags(text: str, *tag_names: str) -> dict[str, str | None]:
    """
    Extract content from named XML tags. Returns None for missing tags.
    Handles multi-line content.
    """
    return {
        tag: (m.group(1).strip() if (m := re.search(
            rf'<{tag}>(.*?)</{tag}>', text, re.DOTALL
        )) else None)
        for tag in tag_names
    }

# Usage (response_text is the text of a model reply, e.g. response.content[0].text)
result = extract_tags(response_text, "answer", "confidence", "reasoning")
print(result["answer"])      # direct answer
print(result["confidence"])  # confidence score
print(result["reasoning"])   # step-by-step reasoning
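
Because `extract_tags` relies on `re.search`, it returns only the first occurrence of each tag. When a reply repeats a tag, such as several `<issue>` elements, a `re.findall` variant collects all of them. This is a sketch with a hypothetical name, `extract_all_tags`, and like the original it does not handle a tag nested inside another tag of the same name.

```python
import re

def extract_all_tags(text: str, tag: str) -> list[str]:
    """Return the stripped content of every <tag>...</tag> pair, in order."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    return [m.strip() for m in re.findall(pattern, text, re.DOTALL)]

sample = "<issue>unused import</issue>\n<issue>\nmissing null check\n</issue>"
print(extract_all_tags(sample, "issue"))
```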

Chain-of-Thought with XML

def solve_with_cot(problem: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{"role": "user", "content": f"""Solve the following problem.
Show your reasoning inside <thinking> tags.
Give your final answer inside <answer> tags.

<problem>
{problem}
</problem>"""}]
    )

    return extract_tags(response.content[0].text, "thinking", "answer")

6.4 Tool Use: The Most Reliable Structured Output

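Tool use (function calling) is the most reliable of the three mechanisms for structured output. The principle: you define a function with a JSON schema for its parameters, and when the model "calls" the function the API returns the arguments as an already-parsed JSON object rather than free text. This removes the extraction step and most formatting errors. Schema conformance is still not guaranteed in the default mode (a required field can be missing or a value can have the wrong type), so validate the input before using it.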

Using Tool Use for Pure Extraction (No Real Function Call)

You don't need to actually execute the tool. Define it, tell the model to use it, and treat the tool call arguments as your structured output:

def extract_entities(text: str) -> dict:
    """
    Use tool_choice to force schema-compliant JSON output.
    The tool is never actually executed—it's just a schema enforcement mechanism.
    """
    tools = [
        {
            "name": "save_entities",
            "description": "Save extracted entities from the text",
            "input_schema": {
                "type": "object",
                "required": ["persons", "organizations", "locations", "dates"],
                "properties": {
                    "persons": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": ["name", "role"],
                            "properties": {
                                "name": {"type": "string"},
                                "role": {"type": "string"},
                            }
                        }
                    },
                    "organizations": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "locations": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "dates": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    ]

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "tool", "name": "save_entities"},  # force this specific tool
        messages=[{"role": "user", "content": f"Extract all entities from:\n\n{text}"}]
    )

    for block in response.content:
        if block.type == "tool_use" and block.name == "save_entities":
            return block.input  # already a dict, no parsing needed

    raise RuntimeError("Model did not generate a tool call")
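
Since a tool call's input can still be incomplete, it is worth checking it before it flows downstream. The sketch below separates the block lookup from the API call so it can run offline; `get_tool_input` is a hypothetical helper, and `SimpleNamespace` objects stand in for the SDK's content blocks.

```python
from types import SimpleNamespace

def get_tool_input(content_blocks, tool_name: str, required: list[str]) -> dict:
    """Return the named tool call's input after checking required keys exist."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            missing = [key for key in required if key not in block.input]
            if missing:
                raise ValueError(f"{tool_name} input is missing: {missing}")
            return block.input
    raise RuntimeError(f"No {tool_name} tool call in response")

# Stand-in for response.content so the sketch runs without an API call.
fake_content = [
    SimpleNamespace(type="text", text="Saving entities."),
    SimpleNamespace(type="tool_use", name="save_entities",
                    input={"persons": [], "organizations": ["Acme"],
                           "locations": [], "dates": []}),
]
entities = get_tool_input(fake_content, "save_entities",
                          ["persons", "organizations", "locations", "dates"])
print(entities["organizations"])
```

For nested schemas a full validator is the better fit; the key check here only guards the top level.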

Multiple Tools: Let the Model Choose Output Type

def classify_or_extract(query: str) -> dict:
    """
    Define two tools; let the model choose which schema applies.
    """
    tools = [
        {
            "name": "classify_sentiment",
            "description": "Use when the input asks for sentiment analysis",
            "input_schema": {
                "type": "object",
                "required": ["sentiment", "score", "explanation"],
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral", "mixed"]
                    },
                    "score": {
                        "type": "number",
                        "minimum": -1.0,
                        "maximum": 1.0
                    },
                    "explanation": {"type": "string"}
                }
            }
        },
        {
            "name": "extract_facts",
            "description": "Use when the input contains factual content to extract",
            "input_schema": {
                "type": "object",
                "required": ["facts", "summary"],
                "properties": {
                    "facts": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "statement": {"type": "string"},
                                "confidence": {"type": "number"}
                            }
                        }
                    },
                    "summary": {"type": "string"}
                }
            }
        }
    ]

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "auto"},  # use {"type": "any"} to require one of the tools
        messages=[{"role": "user", "content": query}]
    )

    for block in response.content:
        if block.type == "tool_use":
            return {"tool": block.name, "data": block.input}

    # Model chose not to call a tool
    text = next((b.text for b in response.content if b.type == "text"), "")
    return {"tool": None, "text": text}
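
The dict returned by `classify_or_extract` still has to be routed somewhere. A dispatch table keyed on the tool name is one simple way to do that; the handler functions and their output formats below are hypothetical.

```python
def handle_sentiment(data: dict) -> str:
    """Format a classify_sentiment result as a one-line summary."""
    return f"{data['sentiment']} ({data['score']:+.2f})"

def handle_facts(data: dict) -> str:
    """Format an extract_facts result as a one-line summary."""
    return f"{len(data['facts'])} facts: {data['summary']}"

HANDLERS = {"classify_sentiment": handle_sentiment, "extract_facts": handle_facts}

def dispatch(result: dict) -> str:
    """Route a {'tool': ..., 'data': ...} dict to the matching handler."""
    tool = result.get("tool")
    if tool is None:
        # The model answered in plain text instead of calling a tool.
        return result.get("text", "")
    try:
        handler = HANDLERS[tool]
    except KeyError:
        raise ValueError(f"Unexpected tool: {tool}") from None
    return handler(result["data"])

print(dispatch({"tool": "classify_sentiment",
                "data": {"sentiment": "positive", "score": 0.8,
                         "explanation": "Enthusiastic wording."}}))
```

Raising on an unknown tool name keeps a renamed or newly added tool from being silently ignored.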

TypeScript Tool Use Example

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface ExtractedData {
  title: string;
  author?: string;            // not in "required", so the key may be absent
  publication_date?: string;  // not in "required", so the key may be absent
  key_points: string[];
}

async function extractDocumentMetadata(text: string): Promise<ExtractedData> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    tools: [
      {
        name: "save_metadata",
        description: "Save document metadata",
        input_schema: {
          type: "object" as const,
          required: ["title", "key_points"],
          properties: {
            title: { type: "string" },
            author: { type: "string" },
            publication_date: { type: "string" },
            key_points: {
              type: "array",
              items: { type: "string" },
            },
          },
        },
      },
    ],
    tool_choice: { type: "tool", name: "save_metadata" },
    messages: [{ role: "user", content: `Extract metadata from:\n\n${text}` }],
  });

  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (!toolUse || toolUse.type !== "tool_use") {
    throw new Error("No tool call in response");
  }

  return toolUse.input as ExtractedData;
}

6.5 Robust Parser Design

Regardless of which approach you choose, production parsers need graceful degradation.

Tolerant JSON Parser

from typing import Any

def robust_json_parse(text: str) -> Any:
    """
    Multi-strategy JSON parser that handles common model output issues.
    """
    # Strategy 1: direct parse
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Strategy 2: strip code fences
    cleaned = re.sub(r'^```(?:json)?\s*\n?', '', text.strip(), flags=re.MULTILINE)
    cleaned = re.sub(r'\n?```\s*$', '', cleaned, flags=re.MULTILINE).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Strategy 3: extract first JSON object or array
    m = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', cleaned)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass

    # Strategy 4: json5 for lenient parsing (single quotes, trailing commas, etc.)
    try:
        import json5  # pip install json5
        return json5.loads(cleaned)
    except Exception:  # json5 is not installed, or it could not parse the text either
        pass

    raise ValueError(f"Cannot parse as JSON: {text[:300]}")

Format Repair via LLM

When parsing fails entirely, ask Claude to fix the JSON—using a cheap Haiku call:

def parse_or_repair(raw: str, schema: dict) -> dict:
    try:
        return robust_json_parse(raw)
    except ValueError:
        repair_resp = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=2048,
            messages=[{"role": "user", "content": f"""The following text should be JSON
but failed to parse. Fix it and return only valid JSON that matches the schema.

Schema: {json.dumps(schema)}

Text to fix: {raw}"""}]
        )
        # The repair reply can itself carry code fences or stray text, so reuse the tolerant parser.
        return robust_json_parse(repair_resp.content[0].text)
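
`parse_or_repair` makes a single repair attempt and lets the exception propagate if the repaired text still fails to parse. A bounded retry loop is one way to make that explicit. In this sketch the repair step is injected as `repair_fn` (a hypothetical callable that would wrap the Haiku call in practice), so the loop can be exercised without the API.

```python
import json
from typing import Callable

def parse_with_repair(raw: str, repair_fn: Callable[[str, str], str],
                      max_attempts: int = 2) -> dict:
    """Parse raw; on failure, ask repair_fn for a fix, at most max_attempts times."""
    candidate = raw
    for attempt in range(max_attempts + 1):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError as err:
            if attempt == max_attempts:
                raise ValueError(
                    f"Still invalid after {max_attempts} repairs") from err
            # Pass the parser's error message along as a hint for the repair.
            candidate = repair_fn(candidate, str(err))

# Toy repair function standing in for a model call: swap quote styles.
def fake_repair(text: str, error: str) -> str:
    return text.replace("'", '"')

print(parse_with_repair("{'a': 1}", fake_repair))
```

Capping the attempts bounds both latency and cost, and forwarding the parser's error message gives the repair step something concrete to act on.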

6.6 Format Consistency Testing

Track format reliability in CI and production:

import jsonschema
from dataclasses import dataclass
from typing import Callable

@dataclass
class FormatTestReport:
    total: int
    parse_errors: int
    schema_errors: int
    success_rate: float
    error_samples: list[dict]

def test_format_consistency(
    call_fn: Callable[[str], str],
    test_inputs: list[str],
    schema: dict,
    sample_limit: int = 3,
) -> FormatTestReport:
    parse_errors, schema_errors = 0, 0
    error_samples: list[dict] = []

    for inp in test_inputs:
        raw = call_fn(inp)
        try:
            parsed = robust_json_parse(raw)
            jsonschema.validate(parsed, schema)
        except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
            parse_errors += 1
            if len(error_samples) < sample_limit:
                error_samples.append({"input": inp[:80], "output": raw[:150],
                                      "error": str(e), "type": "parse"})
        except jsonschema.ValidationError as e:
            schema_errors += 1
            if len(error_samples) < sample_limit:
                error_samples.append({"input": inp[:80], "output": raw[:150],
                                      "error": e.message, "type": "schema"})

    total = len(test_inputs)
    total_errors = parse_errors + schema_errors
    return FormatTestReport(
        total=total,
        parse_errors=parse_errors,
        schema_errors=schema_errors,
        success_rate=(total - total_errors) / total if total else 0.0,
        error_samples=error_samples,
    )
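
A success rate measured on a small sample overstates what you actually know. One standard way to account for sample size is the Wilson score interval; the sketch below computes its lower bound so a CI gate can compare that bound, rather than the raw rate, against a threshold. The function name and the choice of z = 1.96 (roughly 95% confidence) are assumptions for illustration.

```python
import math

def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success proportion."""
    if total == 0:
        return 0.0
    p = successes / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)

# 50 clean responses out of 50 looks perfect, but the sample is small.
bound = wilson_lower_bound(50, 50)
print(f"lower bound on success rate: {bound:.3f}")
```

With 50 passes out of 50, the bound is only about 0.93, so a perfect run on a small test set does not by itself justify a 99% reliability claim.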

6.7 Choosing the Right Approach

Comparison summary (the reliability figures are rough rules of thumb, not benchmark results):

                        Prompt instructions  XML tags    Tool Use
──────────────────────  ───────────────────  ──────────  ───────────────────
Implementation effort   Low                  Low         Medium
Format reliability      ~90%                 ~95%        ~99%+
Best for                Simple JSON          Multi-part  Critical pipelines
Data type support       JSON/text            Any text    Strongly typed JSON
Parse overhead          Need parser          Regex/XML   SDK returns dict
Token overhead          Lowest               Low         Higher (tool defs)

Decision guide:

Prototyping / simple extraction: prompt instructions + tolerant parser
Mixed text and code output: XML tags
Production critical path / zero-tolerance for format errors: Tool Use
High-volume / cost-sensitive batch processing: optimized prompt instructions + repair fallback

Summary

Reliable structured output is essential for integrating Claude into production pipelines. The three-level approach:

Prompt instructions + assistant prefill: simplest, ~90% reliable; use a robust parser with fallback repair for the remaining 10%
XML tags: natural for multi-section responses; regex extraction is fast and readable; best when output mixes code, text, and structured data
Tool Use with tool_choice: returns the arguments as parsed JSON shaped by the tool's input schema, giving the highest format compliance of the three; the right choice for any critical-path extraction task, with validation of the returned input still in place

Regardless of the approach, always build a tolerant parser, test format consistency on a representative sample, and have a repair strategy for the cases where the model's output doesn't parse cleanly.

The next chapter dives into the Messages API in full detail—every parameter, its semantics, version compatibility, and the gotchas that trip up developers in production.
