Every LLM feature that does something useful with model output eventually hits the same wall: the model returns beautiful prose, and your code needs a dictionary. You call json.loads() and get a JSONDecodeError. Or the JSON is valid but the keys are wrong. Or there is a paragraph of explanation before the opening brace.
Structured output is the discipline of making LLMs return machine-parseable data reliably enough to ship. This guide covers every approach (prompt-based, native API, and schema-enforced) and shows you which to pick and how to validate what comes back.
Why Structured Output Matters
Most real LLM use cases are pipelines, not chatbots. A pipeline takes some input, runs it through a model, parses the output, and does something with the result. The pipeline breaks if the output format is unpredictable.
Consider entity extraction: you feed news articles through a model to extract company names, locations, and dates. Each extraction feeds a downstream database insert. If one article causes the model to return "No entities found in this text." instead of {"entities": []}, your pipeline crashes. Structured output is what keeps pipelines from crashing at 2am.
Beyond reliability, structured output enables:
- Typed data models: parse directly into Pydantic models, eliminating manual field extraction
- Agent planning: agentic systems need structured intermediate state to pass between steps
- Chained prompts: output of one call becomes input to the next, so format must be predictable
- Parallel processing: batch structured outputs can be parsed and stored in bulk
Three Approaches to Getting Structured Output
Approach 1: Prompt-Based JSON Instruction
The simplest approach: tell the model to return JSON in the prompt. Works with any model, any API.
System: You are a data extraction assistant. Always respond with valid JSON
and nothing else: no prose before or after the JSON.
User: Extract the company name, founding year, and CEO from this text:
"Anthropic was founded in 2021 by Dario Amodei and Daniela Amodei.
Dario Amodei serves as CEO."
Expected output:
{
  "company": "Anthropic",
  "founding_year": 2021,
  "ceo": "Dario Amodei"
}
This works most of the time, but "most of the time" is not production-grade. The model may:
- Add "Here is the JSON:" before the opening brace
- Use single quotes instead of double quotes
- Include a trailing comma after the last field
- Hallucinate a field that was not in your implicit schema
Use prompt-based instruction only for prototyping or with a validation layer that catches and retries errors.
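To make these failure modes concrete, here is a small sketch that feeds one hypothetical output per failure mode to json.loads(). The sample strings and the try_parse helper are illustrative assumptions, not captured model output.

```python
import json

# Hypothetical raw model outputs, one per failure mode listed above.
samples = {
    "prose_prefix": 'Here is the JSON: {"company": "Anthropic"}',
    "single_quotes": "{'company': 'Anthropic'}",
    "trailing_comma": '{"company": "Anthropic",}',
    "extra_field": '{"company": "Anthropic", "mood": "optimistic"}',
}


def try_parse(raw: str):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


results = {name: try_parse(raw) for name, raw in samples.items()}
for name, parsed in results.items():
    print(f"{name}: {'parsed' if parsed is not None else 'rejected'}")
```

The first three samples are rejected by the parser. The fourth parses cleanly, which is why a syntax check alone is not enough: only a schema check will notice the invented field.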
Approach 2: Native JSON Mode (OpenAI)
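OpenAI's Chat Completions API supports response_format: { type: "json_object" }; the Responses API exposes the same option under text.format. When enabled, the model is constrained to return syntactically valid JSON, provided the response is not cut off by the token limit.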
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Extract company info and return JSON with keys: "
                       "company, founding_year, ceo"
        },
        {
            "role": "user",
            "content": "Anthropic was founded in 2021 by Dario Amodei..."
        }
    ]
)

# The message content is a JSON string; parse it into a dict
result = json.loads(response.choices[0].message.content)
Gotchas with JSON mode:
- JSON mode guarantees valid syntax but not correct schema: the model still chooses which keys to include
- You must mention JSON in your system or user message, otherwise the API returns an error
- JSON mode can make the model more verbose inside the JSON (longer string values)
Structured Outputs (strict mode) goes further: you provide a JSON Schema and the model is constrained to match it exactly. This is the most reliable approach available through the OpenAI API.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "company_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "founding_year": {"type": "integer"},
                    "ceo": {"type": "string"}
                },
                "required": ["company", "founding_year", "ceo"],
                "additionalProperties": False
            }
        }
    },
    messages=[...]
)
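Writing the response_format payload by hand gets repetitive once you have several schemas. The sketch below builds one from a dictionary of properties. The helper name strict_response_format is hypothetical, and it assumes a flat object in which every property is listed as required and extra keys are forbidden, which is what strict mode expects of an object schema.

```python
import json


def strict_response_format(name: str, properties: dict) -> dict:
    """Build a response_format payload shaped like the strict-mode example.

    Hypothetical helper: marks every property as required and forbids
    extra keys. Only handles flat (non-nested) objects.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


company_format = strict_response_format(
    "company_info",
    {
        "company": {"type": "string"},
        "founding_year": {"type": "integer"},
        # Optional data is expressed as a nullable type, not a missing key.
        "headquarters": {"type": ["string", "null"]},
    },
)
print(json.dumps(company_format, indent=2))
```

Nested objects would need the same treatment applied recursively; that is left out here to keep the sketch short.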
Approach 3: Tool/Function Calling with Schemas
The most portable approach across providers. Instead of asking the model to return JSON, you define a "tool" that the model must call, with the tool's parameters being exactly the structured data you want.
This works because tool calling is a first-class API feature: the model's output is constrained to a structured function call format, not free-form text.
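As a rough sketch of the pattern, the snippet below defines a tool in the shape used by OpenAI's Chat Completions API and parses the arguments of a tool call. The tool name record_company_info, the parse_tool_arguments helper, and the simulated arguments string are assumptions for illustration; no API call is made.

```python
import json

COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "founding_year": {"type": "integer"},
        "ceo": {"type": "string"},
    },
    "required": ["company", "founding_year", "ceo"],
}

# Tool definition: the parameters are exactly the structured data we want.
record_tool = {
    "type": "function",
    "function": {
        "name": "record_company_info",  # hypothetical tool name
        "description": "Record structured company information",
        "parameters": COMPANY_SCHEMA,
    },
}

# Passing this as tool_choice asks the model to call that one tool.
forced_choice = {"type": "function", "function": {"name": "record_company_info"}}


def parse_tool_arguments(arguments: str) -> dict:
    """Tool-call arguments arrive as a JSON string, so they still need parsing."""
    data = json.loads(arguments)
    missing = [key for key in COMPANY_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"Tool call is missing fields: {missing}")
    return data


# Simulated arguments string, standing in for
# response.choices[0].message.tool_calls[0].function.arguments
simulated = '{"company": "Anthropic", "founding_year": 2021, "ceo": "Dario Amodei"}'
print(parse_tool_arguments(simulated))
```

Other providers use a different envelope around the same idea, so keeping the schema in one constant (COMPANY_SCHEMA here) makes it easier to reuse across them.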
JSON Schema Basics for AI Prompting
Whether you use native JSON mode or prompt-based instructions, you need to specify your schema. JSON Schema is the standard format.
A complete schema for extracting information from a job posting:
{
  "type": "object",
  "properties": {
    "job_title": {
      "type": "string",
      "description": "The exact job title as listed"
    },
    "company": {
      "type": "string"
    },
    "location": {
      "type": "string",
      "description": "City, State or 'Remote'"
    },
    "employment_type": {
      "type": "string",
      "enum": ["full-time", "part-time", "contract", "internship"]
    },
    "salary_range": {
      "type": ["object", "null"],
      "properties": {
        "min": {"type": "integer"},
        "max": {"type": "integer"},
        "currency": {"type": "string", "default": "USD"}
      },
      "required": ["min", "max"]
    },
    "required_skills": {
      "type": "array",
      "items": {"type": "string"},
      "description": "List of required technical skills"
    },
    "years_experience_required": {
      "type": ["integer", "null"]
    }
  },
  "required": ["job_title", "company", "location", "employment_type",
               "required_skills"],
  "additionalProperties": false
}
Key schema concepts for LLM use:
- required array: lists the fields the model must always populate
- additionalProperties: false, which prevents the model from inventing extra fields
- enum: constrains a string to a fixed set of values (much more reliable than free text for categories)
- type: ["string", "null"], which allows null when the information is not present in the source
- description on properties: tells the model what the field means, critical for accuracy
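To show how these keywords behave in practice, here is a deliberately small checker written with the standard library only. The function name check_object and the trimmed job-posting schema are illustrative assumptions; it covers only required, additionalProperties, type, and enum on a flat object, so a full validator (the jsonschema package, or Pydantic) is the better choice in production.

```python
# Maps JSON Schema type names to Python types.
_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list, "null": type(None),
}


def _matches(value, type_name: str) -> bool:
    # In Python, bool is a subclass of int, so reject it for numeric types.
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES[type_name])


def check_object(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    if schema.get("additionalProperties") is False:
        for key in data:
            if key not in props:
                problems.append(f"unexpected field: {key}")
    for key, rule in props.items():
        if key not in data:
            continue
        value = data[key]
        allowed = rule.get("type", [])
        allowed = [allowed] if isinstance(allowed, str) else allowed
        if allowed and not any(_matches(value, t) for t in allowed):
            problems.append(f"wrong type for {key}: expected {allowed}")
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{key} not in enum: {value!r}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "job_title": {"type": "string"},
        "employment_type": {"type": "string", "enum": ["full-time", "contract"]},
        "years_experience_required": {"type": ["integer", "null"]},
    },
    "required": ["job_title", "employment_type"],
    "additionalProperties": False,
}

good = {"job_title": "Data Engineer", "employment_type": "contract",
        "years_experience_required": None}
bad = {"employment_type": "freelance", "perks": "snacks"}

print(check_object(good, schema))
print(check_object(bad, schema))
```

The second object trips three rules at once: a missing required field, an unexpected extra field, and a value outside the enum.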
Using Anthropic's Structured Output (Tool Use Pattern)
Claude's established route to structured output is tool use rather than a JSON mode like OpenAI's. You define a tool whose parameters match your desired output schema, then force the model to use that tool.
import anthropic
import json

client = anthropic.Anthropic()

# Define the output schema as a tool
extract_tool = {
    "name": "extract_company_info",
    "description": "Extract structured company information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "company": {
                "type": "string",
                "description": "Company name"
            },
            "founding_year": {
                "type": "integer",
                "description": "Year the company was founded"
            },
            "ceo": {
                "type": "string",
                "description": "Current CEO full name"
            },
            "headquarters": {
                "type": ["string", "null"],
                "description": "City and country of headquarters, or null if not mentioned"
            }
        },
        "required": ["company", "founding_year", "ceo", "headquarters"]
    }
}

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "extract_company_info"},
    messages=[
        {
            "role": "user",
            "content": "Anthropic was founded in 2021 by Dario Amodei and "
                       "Daniela Amodei. Dario serves as CEO. The company is "
                       "headquartered in San Francisco, USA."
        }
    ]
)

# Extract the structured result from the tool call
tool_use = next(
    block for block in response.content
    if block.type == "tool_use"
)
result = tool_use.input
print(result)
# {'company': 'Anthropic', 'founding_year': 2021,
#  'ceo': 'Dario Amodei', 'headquarters': 'San Francisco, USA'}
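Setting tool_choice to {"type": "tool", "name": "..."} forces Claude to call that specific tool, so the response contains a tool_use block instead of prose. Without tool_choice, Claude might respond in prose and not call the tool at all. The tool input normally follows the schema, but it is still worth validating before you use it.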
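The bare next(...) call in the example raises StopIteration if no tool_use block is present, which is hard to diagnose in a pipeline. A slightly more defensive sketch follows. The helper get_tool_input is hypothetical, and the SimpleNamespace objects are stand-ins that mimic only the attributes the helper reads (stop_reason, content, and each block's type, name, and input).

```python
from types import SimpleNamespace


def get_tool_input(response, tool_name: str) -> dict:
    """Return the input of the named tool call, or raise with a useful message."""
    if response.stop_reason == "max_tokens":
        raise ValueError(
            "Response was cut off by max_tokens; tool input may be incomplete"
        )
    for block in response.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"No tool_use block for {tool_name!r} in response")


# Stand-in objects that mimic the attributes read above; a real call
# would pass the object returned by client.messages.create(...).
fake_response = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(
            type="tool_use",
            name="extract_company_info",
            input={"company": "Anthropic", "founding_year": 2021,
                   "ceo": "Dario Amodei", "headquarters": None},
        )
    ],
)

print(get_tool_input(fake_response, "extract_company_info"))
```

Checking stop_reason first means a truncated response fails loudly instead of handing partial data to the next stage.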
Prompting Techniques That Improve JSON Accuracy
Even with native JSON mode, how you write the prompt affects accuracy. These techniques reduce error rates across all providers.
Include the Schema in the Prompt
Paste the schema (or a simplified version of it) directly into the user message or system prompt. Models follow an explicit schema more reliably than they infer structure from field names alone.
Extract information and return JSON matching this exact schema:
{
"company": string,
"founding_year": integer,
"ceo": string,
"headquarters": string or null
}
If a field is not mentioned in the text, use null.
Return ONLY the JSON object, no explanation before or after.
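If the schema already exists as a JSON Schema dictionary, the simplified outline can be generated rather than typed by hand, which keeps the prompt and the validator in sync. This is a minimal sketch: render_schema_hint is a hypothetical helper and it assumes a flat object schema.

```python
def render_schema_hint(schema: dict) -> str:
    """Render a flat JSON Schema object as a compact, prompt-friendly outline."""
    lines = ["{"]
    for name, rule in schema["properties"].items():
        types = rule.get("type", "any")
        types = [types] if isinstance(types, str) else list(types)
        line = f'  "{name}": ' + " or ".join(types)
        if "enum" in rule:
            line += " (one of: " + ", ".join(rule["enum"]) + ")"
        lines.append(line + ",")
    lines[-1] = lines[-1].rstrip(",")  # no comma after the last field
    lines.append("}")
    return "\n".join(lines)


schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "founding_year": {"type": "integer"},
        "ceo": {"type": "string"},
        "headquarters": {"type": ["string", "null"]},
    },
}

hint = render_schema_hint(schema)
prompt = (
    "Extract information and return JSON matching this exact schema:\n"
    + hint
    + "\nIf a field is not mentioned in the text, use null."
    + "\nReturn ONLY the JSON object, no explanation before or after."
)
print(prompt)
```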
Show an Example Output
Few-shot examples dramatically reduce format errors. Include one complete example in your prompt:
Extract company information from text.
Example input:
"Apple Inc. was founded in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
Tim Cook is the current CEO. The company is headquartered in Cupertino, California."
Example output:
{
"company": "Apple Inc.",
"founding_year": 1976,
"ceo": "Tim Cook",
"headquarters": "Cupertino, California"
}
Now extract from this text:
[user input]
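One way to assemble such a prompt in code is sketched below. The function build_few_shot_prompt is an illustrative assumption; the point is that the example output is produced by json.dumps, so the example the model imitates is guaranteed to be valid JSON, including null for missing values.

```python
import json


def build_few_shot_prompt(task: str, examples: list[tuple[str, dict]], text: str) -> str:
    """Assemble a task line, worked examples, and the new input into one prompt."""
    parts = [task, ""]
    for source, expected in examples:
        parts += [
            "Example input:",
            json.dumps(source),  # quotes and escapes the source text
            "Example output:",
            json.dumps(expected, indent=2),
            "",
        ]
    parts += ["Now extract from this text:", json.dumps(text)]
    return "\n".join(parts)


examples = [
    (
        "Apple Inc. was founded in 1976. Tim Cook is the current CEO.",
        {"company": "Apple Inc.", "founding_year": 1976,
         "ceo": "Tim Cook", "headquarters": None},
    )
]

prompt = build_few_shot_prompt(
    "Extract company information from text.",
    examples,
    "Anthropic was founded in 2021.",
)
print(prompt)
```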
Use "null" Explicitly, Not Omission
Tell the model to use null for missing fields rather than omitting them. Omitted fields cause KeyError in downstream code. Null values are explicit and handleable.
Bad instruction: "Include fields only if they are mentioned."
Good instruction: "Include all fields. Use null for any field not mentioned in the text."
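As a defensive complement on the code side, a parsed record can be normalized so that every expected field is present. This is a small sketch under the assumption that the field list is fixed; FIELDS and normalize_record are hypothetical names, and note that the helper drops any keys it does not expect.

```python
FIELDS = ("company", "founding_year", "ceo", "headquarters")


def normalize_record(data: dict) -> dict:
    """Return a dict with every expected field present, using None for gaps."""
    return {field: data.get(field) for field in FIELDS}


partial = {"company": "Anthropic", "founding_year": 2021, "ceo": "Dario Amodei"}
record = normalize_record(partial)
print(record["headquarters"])  # None instead of a KeyError
```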
Break Complex Schemas into Chains
A 20-field schema tends to produce more errors than two 10-field schemas called sequentially. For documents with many fields:
# Call 1: extract basic metadata
basic = extract_with_schema(doc, BASIC_SCHEMA) # 5 fields
# Call 2: extract technical details
technical = extract_with_schema(doc, TECHNICAL_SCHEMA) # 8 fields
# Call 3: extract relationships
relationships = extract_with_schema(doc, RELATIONSHIP_SCHEMA) # 6 fields
result = {**basic, **technical, **relationships}
Each call is focused and less error-prone than a single sprawling call.
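One caveat with the {**a, **b} merge is that a later dictionary silently overwrites any key it shares with an earlier one. A minimal sketch of a stricter merge follows; merge_disjoint is a hypothetical helper and the sample dictionaries are made up.

```python
def merge_disjoint(*parts: dict) -> dict:
    """Merge dicts, refusing to let one call's fields overwrite another's."""
    merged = {}
    for part in parts:
        overlap = merged.keys() & part.keys()
        if overlap:
            raise ValueError(
                f"Field returned by more than one call: {sorted(overlap)}"
            )
        merged.update(part)
    return merged


basic = {"title": "Q3 report", "author": "J. Doe"}
technical = {"format": "PDF", "pages": 12}
print(merge_disjoint(basic, technical))
```

Keeping the per-call schemas disjoint makes this check a cheap guard against two calls disagreeing about the same field.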
Ask the Model to Verify Its Own Output
For high-stakes extraction, add a verification step:
Step 1: Extract the information and return JSON.
Step 2: Before finalizing, re-read your JSON and check: does every field have
a value or null? Is the JSON syntactically valid? Fix any issues.
Step 3: Return the final, verified JSON.
This self-verification step catches about 30% of errors before they reach your validation layer.
Debugging Structured Output Failures
When your structured output pipeline breaks in production, the failure usually falls into one of these categories:
Syntax errors: The JSON is not valid because of trailing commas, unquoted keys, single quotes instead of double quotes, or truncated output. Fix with native JSON mode, or strip and repair common syntax mistakes before parsing.
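import re

# Matches a double-quoted string (with escapes) or a bare Python literal
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\b(None|True|False)\b')
_LITERALS = {"None": "null", "True": "true", "False": "false"}

def repair_json(raw: str) -> str:
    """Attempt basic JSON repair before parsing."""
    # Remove prose before the first { or [
    raw = re.sub(r'^[^{\[]*', '', raw.strip())
    # Remove prose after the last } or ]
    raw = re.sub(r'[^}\]]*$', '', raw)
    # Replace Python-style None/True/False with JSON equivalents,
    # leaving text inside quoted strings (e.g. "None of the above") untouched
    return _TOKEN.sub(lambda m: _LITERALS.get(m.group(1), m.group(0)), raw)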
Schema errors: The JSON is valid but does not match your schema: wrong field names, wrong types, missing required fields, extra fields. Fix with Pydantic validation + retry.
Semantic errors: The JSON is valid and matches the schema, but the values are wrong: the model extracted the wrong person as CEO, or confused two dates. Fix with better prompt instructions, few-shot examples targeting the failure case, or a more capable model.
Truncation: The model hit the token limit mid-JSON. Fix by increasing max_tokens, shortening the input, or splitting into multiple calls.
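The most direct truncation signal comes from the API itself: a finish_reason of "length" in OpenAI's Chat Completions, or a stop_reason of "max_tokens" in Anthropic's Messages API. When only the raw text is available, a bracket-balance check is a reasonable heuristic. The sketch below is an assumption-laden illustration, not a parser; looks_truncated is a hypothetical helper.

```python
def looks_truncated(raw: str) -> bool:
    """Heuristic: True if brackets are unbalanced or a string is left open."""
    stack = []
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote
        elif ch == '"':
            in_string = True         # opening quote
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            if stack:
                stack.pop()
    return in_string or bool(stack)


print(looks_truncated('{"company": "Anthropic", "ceo": "Dario Am'))
print(looks_truncated('{"company": "Anthropic"}'))
```

Brackets inside string values are ignored, so a value such as "}" does not confuse the count.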
Validating LLM JSON Output
Models produce invalid JSON more often than you expect. Never call json.loads() without a try/except. Better: use a validation library that checks both syntax and schema.
Pydantic Validation (Python)
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class CompanyInfo(BaseModel):
    company: str
    founding_year: int
    ceo: str
    headquarters: Optional[str] = None


def parse_and_validate(raw_output: str) -> CompanyInfo:
    """Parse LLM JSON output and validate against schema."""
    try:
        # Strip any prose wrapping the JSON
        json_start = raw_output.find('{')
        json_end = raw_output.rfind('}') + 1
        if json_start == -1:
            raise ValueError("No JSON object found in output")
        clean_json = raw_output[json_start:json_end]
        data = json.loads(clean_json)
        # model_validate (Pydantic v2) reports non-dict input as a
        # ValidationError instead of failing with a TypeError
        return CompanyInfo.model_validate(data)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON syntax: {e}") from e
    except ValidationError as e:
        raise ValueError(f"Schema validation failed: {e}") from e
Retry Pattern with Validation Feedback
When validation fails, feed the error back to the model. One retry resolves most issues.
def extract_with_retry(text: str, max_retries: int = 2) -> CompanyInfo:
    # Serialize the schema as real JSON; formatting the dict directly would
    # put Python repr (single quotes) into the prompt
    schema_json = json.dumps(CompanyInfo.model_json_schema())
    messages = [
        {
            "role": "user",
            "content": f"Extract company info from this text and return "
                       f"valid JSON matching this schema: "
                       f"{schema_json}\n\nText: {text}"
        }
    ]
    for attempt in range(max_retries + 1):
        # `llm` is a placeholder for your own client wrapper; it is assumed
        # to return an object whose .content is the raw response text
        response = llm.complete(messages)
        raw_output = response.content
        try:
            return parse_and_validate(raw_output)
        except ValueError as e:
            if attempt == max_retries:
                raise
            # Feed error back to model
            messages.append({"role": "assistant", "content": raw_output})
            messages.append({
                "role": "user",
                "content": f"Your output failed validation: {e}. "
                           f"Please fix it and return only valid JSON."
            })
    raise RuntimeError("Unreachable")
Common Structured Output Patterns
Classification with Confidence Scores
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string",
      "enum": ["positive", "negative", "neutral", "mixed"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence score between 0 and 1"
    },
    "reasoning": {
      "type": "string",
      "description": "One sentence explaining the classification"
    }
  },
  "required": ["category", "confidence", "reasoning"]
}
Entity Extraction
{
  "type": "object",
  "properties": {
    "people": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "role": {"type": ["string", "null"]},
          "organization": {"type": ["string", "null"]}
        },
        "required": ["name", "role", "organization"]
      }
    },
    "organizations": {
      "type": "array",
      "items": {"type": "string"}
    },
    "locations": {
      "type": "array",
      "items": {"type": "string"}
    },
    "dates": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "text": {"type": "string"},
          "normalized": {"type": ["string", "null"],
                         "description": "ISO 8601 format if determinable"}
        },
        "required": ["text", "normalized"]
      }
    }
  },
  "required": ["people", "organizations", "locations", "dates"]
}
Multi-Field Document Parsing
For long documents with many fields, split extraction into logical groups. Extracting 5 fields per call with high accuracy beats extracting 25 fields per call with frequent errors.
# Split into two focused calls instead of one large call
basic_info = extract_with_schema(document, schema=BASIC_INFO_SCHEMA)
financial_info = extract_with_schema(document, schema=FINANCIAL_SCHEMA)
# Merge results
result = {**basic_info, **financial_info}
Structured Output for Agentic Systems
Agents need structured output for their intermediate state, not just their final answer. An agent deciding what to do next should return a machine-parseable plan, not a prose paragraph.
A planning schema for a multi-step agent:
{
  "type": "object",
  "properties": {
    "goal_understood": {
      "type": "string",
      "description": "Restate the goal in one sentence to confirm understanding"
    },
    "steps": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "step_id": {"type": "integer"},
          "action": {"type": "string"},
          "tool": {"type": "string"},
          "parameters": {"type": "object"},
          "depends_on": {
            "type": "array",
            "items": {"type": "integer"},
            "description": "IDs of steps that must complete before this one"
          }
        },
        "required": ["step_id", "action", "tool", "parameters", "depends_on"]
      }
    },
    "can_parallelize": {
      "type": "boolean",
      "description": "True if any steps can run in parallel"
    }
  },
  "required": ["goal_understood", "steps", "can_parallelize"]
}
The orchestrating code parses this plan and executes steps in the right order, potentially in parallel. Without structured output, the agent's plan is prose: readable but not executable.
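A minimal sketch of that ordering step, using the standard library's graphlib module (Python 3.9+), might look like this. The function execution_batches and the sample plan are illustrative assumptions; the sketch only computes the order and leaves actually running the tools to the orchestrator.

```python
from graphlib import TopologicalSorter


def execution_batches(plan: dict) -> list[list[int]]:
    """Group step IDs into batches whose dependencies are already complete."""
    # Map each step to the set of steps it depends on.
    graph = {step["step_id"]: set(step["depends_on"]) for step in plan["steps"]}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)      # everything in one batch can run in parallel
        sorter.done(*ready)
    return batches


# Hypothetical plan matching the schema above.
plan = {
    "goal_understood": "Compare two vendors and draft a summary.",
    "steps": [
        {"step_id": 1, "action": "fetch vendor A pricing", "tool": "search",
         "parameters": {}, "depends_on": []},
        {"step_id": 2, "action": "fetch vendor B pricing", "tool": "search",
         "parameters": {}, "depends_on": []},
        {"step_id": 3, "action": "compare pricing", "tool": "analyze",
         "parameters": {}, "depends_on": [1, 2]},
        {"step_id": 4, "action": "draft summary", "tool": "write",
         "parameters": {}, "depends_on": [3]},
    ],
    "can_parallelize": True,
}

print(execution_batches(plan))  # [[1, 2], [3], [4]]
```

Because the plan is model-generated, it is worth checking that every ID in depends_on refers to a step that exists before executing anything.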
Performance Tips
Shorter schemas produce fewer errors. Each field is an opportunity for a mistake. If you only need 5 of 15 possible fields for a given task, define a schema with only 5 fields.
Use enum for categorical fields. Instead of "type": "string" for a sentiment field, use "enum": ["positive", "negative", "neutral"]. The model makes far fewer errors when it knows the exact set of valid values.
Add description to every non-obvious field. The model uses property descriptions to decide what to extract. "founding_year: year the company was incorporated" is much clearer than an undescribed integer field.
Validate types strictly. If a field should be an integer year, declare it as "type": "integer". If the model returns "2021" (a string), catch it in validation and include that in the retry prompt.
Test with edge cases. Run your extraction schema against documents where fields are missing, ambiguous, or expressed unusually. These are exactly the cases that break production pipelines and they almost never show up in your initial testing.
Cache schema-heavy prompts. If your schema and system prompt are long and constant across many requests, use prompt caching (available in both Anthropic and OpenAI APIs) to cut costs significantly.
Choosing the Right Approach
| Scenario | Recommended Approach |
|---|---|
| Prototyping, any model | Prompt-based with Pydantic validation |
| OpenAI API, exact schema required | Structured Outputs (strict mode) |
| OpenAI API, valid JSON sufficient | JSON mode |
| Anthropic Claude | Tool use with tool_choice forced |
| Production pipeline, any model | Tool/function calling + validation + retry |
| Simple 2-3 field extraction | Prompt-based usually sufficient |
| Complex nested schemas | Native structured output APIs |