Structured Outputs (Constrained Decoding)

Structured outputs force a language model's response to match a developer-supplied JSON Schema exactly by using constrained decoding, which makes invalid tokens impossible to sample.

What is Structured Outputs (Constrained Decoding)?

Structured Outputs is a model capability that guarantees a language model's response conforms to a developer-provided JSON Schema. Rather than asking the model in the prompt to please return JSON and hoping it complies, the API enforces the schema during token generation so the returned text always parses and always matches the declared field names, types, and required keys.

The underlying mechanism is constrained decoding, sometimes called constrained sampling. At each generation step the model produces a probability distribution over the vocabulary, and the decoder masks out any token that would violate the grammar derived from the schema. Because forbidden tokens are removed before sampling, the model literally cannot emit a string that breaks the structure.

Schema adherence is enforced at decode time, not requested in the prompt.
Output is valid JSON that maps directly to a typed object.
Differs from older JSON mode, which only guaranteed syntactic JSON, not a specific shape.
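
To make the difference from JSON mode concrete, the sketch below parses two hand-written strings. Both are valid JSON, but only one has the shape a caller expects. The matches_shape helper, the EXPECTED mapping, and the sample strings are illustrative assumptions, not part of any SDK, and the check covers only top-level keys and basic types.

```python
import json

# Hypothetical expected shape: field name -> Python type.
EXPECTED = {"title": str, "sentiment": str, "keywords": list}

def matches_shape(text: str) -> bool:
    """Return True if text is a JSON object with exactly the expected keys and types."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(EXPECTED):
        return False
    return all(isinstance(obj[key], typ) for key, typ in EXPECTED.items())

# Well-formed JSON, wrong shape: one key is missing and one has the wrong type.
json_mode_style = '{"title": "Great phone", "keywords": "battery, screen"}'
# Well-formed JSON that also matches the expected shape.
schema_style = '{"title": "Great phone", "sentiment": "positive", "keywords": ["battery", "screen"]}'

print(matches_shape(json_mode_style))  # False
print(matches_shape(schema_style))     # True
```

A check like this runs after generation and can only reject a bad response, whereas constrained decoding prevents the bad response from being produced in the first place.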

How constrained decoding works

The provider compiles the JSON Schema into a formal grammar, typically a context-free grammar or a finite-state machine. During inference the decoder tracks which tokens are legal given the partial output so far. After the model computes logits for the next token, tokens that the grammar disallows are set to negative infinity, so their probability after the softmax is effectively zero. Sampling then proceeds normally over the remaining valid tokens.

This approach changes the search space rather than the model weights. The model still chooses content based on its learned distribution, but the set of choices is restricted to those that keep the output on a path toward a complete, schema-valid document. The result is that field names, nested objects, enums, and required properties all appear exactly as specified.

Schema is translated into a grammar the decoder can check token by token.
Invalid next-tokens are masked before sampling, guaranteeing structural validity.
Content quality still depends on the model; only the shape is guaranteed.
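
The masking step can be sketched with a toy example. Everything here is an assumption made for illustration: a seven-token vocabulary, a hand-written finite-state grammar for one tiny schema, and a fake_logits function standing in for a real model. Production systems compile the grammar from the schema and work over a full tokenizer vocabulary, so treat this as a sketch of the idea rather than any provider's implementation.

```python
import json
import math
import random

VOCAB = ["{", "}", ":", '"sentiment"', '"positive"', '"negative"', "hello"]

# Toy finite-state grammar for {"sentiment": "positive" | "negative"}.
# Each state maps an allowed token to the next state.
GRAMMAR = {
    0: {"{": 1},
    1: {'"sentiment"': 2},
    2: {":": 3},
    3: {'"positive"': 4, '"negative"': 4},
    4: {"}": 5},
}
DONE = 5

def fake_logits(rng: random.Random) -> list[float]:
    """Stand-in for a model: random logits, biased toward the invalid token 'hello'."""
    return [rng.uniform(-1.0, 1.0) + (3.0 if tok == "hello" else 0.0) for tok in VOCAB]

def constrained_sample(rng: random.Random) -> str:
    state, out = 0, []
    while state != DONE:
        logits = fake_logits(rng)
        allowed = GRAMMAR[state]
        # Mask: disallowed tokens get -inf, so exp() gives them weight 0.
        masked = [logit if tok in allowed else -math.inf
                  for tok, logit in zip(VOCAB, logits)]
        peak = max(masked)  # finite, because every state allows at least one token
        weights = [math.exp(logit - peak) for logit in masked]
        tok = rng.choices(VOCAB, weights=weights, k=1)[0]
        out.append(tok)
        state = allowed[tok]
    return "".join(out)

text = constrained_sample(random.Random(0))
print(text)
print(json.loads(text)["sentiment"] in {"positive", "negative"})  # True
```

Even though the stand-in model strongly prefers "hello", that token never appears, because its weight is zero at every step. The grammar decides which tokens are legal; the logits only decide among the legal ones, such as "positive" versus "negative".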

Reliability and how it differs from JSON mode

On OpenAI's evaluation of complex JSON Schema following, the gpt-4o-2024-08-06 model with Structured Outputs scored 100 percent schema adherence, compared with under 40 percent reported for the older gpt-4-0613 model without the feature. The earlier JSON mode guaranteed only that the output was syntactically valid JSON; it did not guarantee that the JSON matched a particular schema, so a response could be well-formed yet missing required fields or using the wrong types.

Structured Outputs is exposed in two places: as a response_format of type json_schema for general completions, and as strict: true inside a function or tool definition for function calling. Setting strict to true makes the arguments the model produces conform to the function's parameter schema, which removes a common class of agent failures where a tool call cannot be parsed.

JSON mode: valid JSON syntax only.
Structured Outputs: valid JSON that matches a named schema, with required keys and types enforced.
Function calling gains the same guarantee through strict: true on the tool definition.
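
As an illustration of the function-calling form, the sketch below builds a tool definition in the Chat Completions tool format with strict set to True. The get_weather tool, its parameters, and the strict_ready helper are hypothetical and exist only for this example. The helper checks two conditions that strict mode is understood to expect of the top-level parameter schema: every property listed as required and additional properties disallowed.

```python
# Hypothetical tool for illustration; the name and parameters are assumptions.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}

def strict_ready(tool: dict) -> bool:
    """Check that all properties are required and extra properties are disallowed."""
    params = tool["function"]["parameters"]
    all_required = set(params["required"]) == set(params["properties"])
    return all_required and params.get("additionalProperties") is False

print(strict_ready(get_weather_tool))  # True
```

A definition like this would typically be passed as tools=[get_weather_tool] in a chat completion request, after which the arguments string on a returned tool call can be parsed with json.loads.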

Code example

The example below requests a response constrained to a schema using the OpenAI Python SDK. With strict set to true in the json_schema response format, the schema is expected to list every property under required and to set additionalProperties to false, as the one here does, so the response contains exactly those fields.

python

import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Every property is listed under "required" and extra properties are disallowed.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "keywords": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["title", "sentiment", "keywords"],
    "additionalProperties": False
}

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Summarize this review: 'Battery lasts all day, screen is gorgeous.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "review_summary", "schema": schema, "strict": True}
    }
)

# The message content is a JSON string that conforms to the schema.
parsed = json.loads(resp.choices[0].message.content)
print(parsed["sentiment"])  # always one of the enum values

Requesting a schema-constrained response with the OpenAI Python SDK.

When to use structured outputs

Structured outputs fit any pipeline where a downstream program consumes the model's response: extraction into a database, populating UI components, classification with a fixed label set, or tool and function calling inside an agent. By moving validation into the decoder, the application can skip retry-and-reparse loops that handle malformed JSON.

Limitations apply. The guarantee covers shape, not truth, so a model can still return a schema-valid object with hallucinated values. Very large or deeply recursive schemas can be rejected or can be slow to compile, and not every model snapshot supports the feature. The safe pattern is to treat the schema as a contract for structure and to validate semantic correctness separately.

Ideal for extraction, classification, and agent tool calls.
Removes parse-and-retry error handling for JSON.
Does not validate that values are factually correct.
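
Separate semantic validation might look like the sketch below, which runs after the schema-valid object has been parsed. The semantic_issues helper and its two rules (a non-empty title, and keywords that actually occur in the source text) are assumptions chosen for this example; real checks depend on the application.

```python
def semantic_issues(parsed: dict, source_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    issues = []
    if not parsed["title"].strip():
        issues.append("title is empty")
    lowered = source_text.lower()
    for keyword in parsed["keywords"]:
        # A keyword that never occurs in the source is a likely hallucination.
        if keyword.lower() not in lowered:
            issues.append(f"keyword not found in source: {keyword!r}")
    return issues

review = "Battery lasts all day, screen is gorgeous."
good = {"title": "Strong battery and screen", "sentiment": "positive",
        "keywords": ["battery", "screen"]}
bad = {"title": "Strong battery and screen", "sentiment": "positive",
       "keywords": ["battery", "camera"]}

print(semantic_issues(good, review))  # []
print(semantic_issues(bad, review))   # ["keyword not found in source: 'camera'"]
```

Both objects match the schema from the code example, so constrained decoding would accept either one. Only the application-level check notices that "camera" is not mentioned in the review.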

Key takeaways

Structured outputs force model responses to match a JSON Schema by masking invalid tokens during decoding.
Constrained decoding changes what the model can sample, not its weights, so structure is guaranteed and content is not.
It supersedes JSON mode, which guaranteed only valid JSON syntax rather than a specific schema.
On OpenAI's complex-schema evaluation, gpt-4o-2024-08-06 with structured outputs reached 100 percent schema adherence.
Use strict: true for function calling to make tool-call arguments reliably parseable in agents.

Frequently asked questions

How do structured outputs differ from JSON mode?

JSON mode only guarantees that the output is syntactically valid JSON. Structured outputs additionally guarantee that the JSON matches a specific schema, including required fields, types, and enums, by constraining decoding to that schema.

How does constrained decoding work?

The schema is compiled into a grammar. At each step the decoder masks any next-token that would violate the grammar, setting its probability to zero before sampling. The model can only choose tokens that keep the output on a path to a valid document.

Do structured outputs guarantee that the values are correct?

No. They guarantee structure, not truth. A response can match the schema perfectly while containing hallucinated or wrong values, so semantic validation should still happen in your application code.

Do structured outputs work with function calling?

Yes. Setting strict: true inside a function or tool definition makes the model produce arguments that conform to the function's parameter schema, which prevents tool calls that cannot be parsed by the agent.

Which models support structured outputs?

OpenAI introduced response_format json_schema support starting with gpt-4o-2024-08-06 and gpt-4o-mini snapshots, with later models also supporting it. Availability and schema complexity limits vary by provider and model version.
