Specification
Open Responses is an open-source specification and ecosystem for building multi-provider, interoperable LLM interfaces based on the OpenAI Responses API. It defines a shared schema, client library, and tooling layer that enable a unified experience for calling language models, streaming results, and composing agentic workflows—independent of provider.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Motivation and Overview
Modern LLM systems have converged on similar primitives—messages, function calls, tool usage, and multimodal inputs—but each provider encodes them differently. Open Responses standardizes these concepts, enabling:
- One spec, many providers: Describe inputs/outputs once; run on OpenAI, Anthropic, Gemini, or local models.
- Composable agentic loops: Unified streaming, tool invocation, and message orchestration.
- Easier evaluation and routing: Compare providers, route requests, and log results through a shared schema.
- Blueprints for provider APIs: Labs and model providers wanting to expose their APIs in a common format can easily do so.
Key Principles
Agentic Loop
All models, to some extent, exhibit agency — the ability to perceive input, reason, act through tools, and reflect on outcomes.
Open Responses at its core is designed to expose the power of this agentic loop to developers: a single request can let the model do multiple things before yielding back a result, whether through developer-hosted tool calls, where control is yielded back to the developer, or provider-hosted tools, where control stays with the model provider until the model signals an exit criterion.
Open Responses defines a common pattern for control flow in the agent loop, a set of item definitions for developer-controlled tools, and a pattern for defining provider- and router-hosted tools.
Items → Items
Items are the fundamental unit of context in Open Responses: they represent an atomic unit of model output, tool invocation, or reasoning state. Items are bidirectional: they can be provided as inputs to the model, or returned as outputs from the model.
Each item type has a defined schema that binds it and contains properties specific to its unique purpose.
Open Responses defines a common set of items supported by a quorum of model providers, and defines how provider-specific item types can be defined.
Semantic events
Streaming is modeled as a series of semantic events, not raw text or object deltas.
Events describe meaningful transitions. They either mark state transitions (response.in_progress, response.completed) or represent a delta from a previous state (response.output_item.added, response.output_text.delta).
Open Responses defines a common set of streaming events supported by a quorum of model providers, and defines how provider-specific streaming events can be defined.
Statefulness
By default, Open Responses integrations are stateless. In future versions, the specification will evolve to support stateful interactions.
Consistent core with provider-specific abstractions
The schema is designed to provide a minimal abstraction over common provider features, while defining how provider-specific items, parameters, and streaming events may be defined.
For example, items are common to all providers, as are popular tools like functions. But providers may have specific hosted tools they expose, or knobs that don’t generalize across providers. Open Responses supports this common core pattern.
curl -X POST "https://api.modelprovider.com/v1/responses" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ --data '{ "model": "gpt-5", "provider": "openai", "input": "Explain how photosynthesis works", "provider_options": [ { "type": "openai", "reasoning": { "effort": "high", "summary": "detailed" } }, { "type": "anthropic", "thinking": { "type": "enabled", "budget_tokens": 2048 } } ] }'When features with reasonably similar abstractions are sufficiently generalized in the ecosystem, they will belong in the core request body. Features specific to one or two providers will belong in the provider-specific block.
State machines
Objects in Open Responses are state machines: each lives in exactly one of a finite number of states, such as in_progress, completed, or failed. The spec defines the set of valid states for each state machine in the API.
Open Responses vs OpenAI’s Responses API
Open Responses differs slightly from OpenAI’s Responses API. While many of the concepts and principles were originally implemented there, Open Responses adopts a more generic approach, aiming to be able to serve models from many providers.
Passthrough Extensibility
Open Responses is designed to support published features from major model providers, but we also recognize the need to ship features that may not yet be part of the spec. Open Responses therefore supports passing through provider-specific params that are not part of the official spec.
```bash
curl -X POST "https://api.modelprovider.com/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "gpt-5",
    "provider": "openai",
    "input": "tell me a joke",
    "provider_options": [
      {
        "type": "anthropic",
        "inference_speed": "fast"
      }
    ]
  }'
```

Overview
HTTP Requests
All requests and responses MUST follow the HTTP protocol.
Headers
| Header | Description | Required |
|---|---|---|
| Authorization | Authorization token identifying the developer | Yes |
| Content-Type | Describes to the server what kind of data is in the body of the request and how to interpret it. | Yes |
HTTP Request Bodies
Clients MUST send request bodies encoded as application/json.
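For illustration, a minimal JSON request body might look like the following; the model name and input string are placeholder values reused from the examples above, not values mandated by the spec:

```json
{
  "model": "gpt-5",
  "input": "Explain how photosynthesis works"
}
```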
HTTP Responses
Headers
| Header | Description | Required |
|---|---|---|
| Content-Type | Describes to the client what kind of data is in the body of the response and how to interpret it. | Yes |
HTTP Response Bodies
When not streaming, servers MUST return data only as application/json.
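As a non-normative sketch, a non-streaming response body could look roughly like the following; the top-level field set shown here (id, status, output) is an assumption mirroring the OpenAI Responses API rather than a definition from this section, while the message item matches the item examples later in this document:

```json
{
  "id": "resp_abc123",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_01A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6",
      "role": "assistant",
      "status": "completed",
      "content": [
        { "type": "output_text", "text": "Hello! How can I assist you today?" }
      ]
    }
  ]
}
```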
Streaming HTTP Responses
When streaming, servers MUST return the header Content-Type: text/event-stream, with individual data objects returned as JSON-encoded strings. The data of the terminal event MUST be the literal string [DONE].
The SSE event field MUST match the type in the event body. Servers SHOULD NOT use the id field.
```
event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":10,"item_id":"msg_07315d23576898080068e95daa2e34819685fb0a98a0503f78","output_index":0,"content_index":0,"delta":" a","logprobs":[],"obfuscation":"Wd6S45xQ7SyQLT"}
```

Items
Items are the core unit of context in Open Responses. Open Responses defines a few item types that are common to all providers, such as message or function_call, and specifies how additional, provider-specific item types can be defined. There are some general principles that apply to all items:
Items are polymorphic
Item shapes can vary depending on what purpose they serve. A message item, for instance, has a different shape from a function_call, and so on. Items are discriminated based on the type field.
Example: Message

```json
{
  "type": "message",
  "id": "msg_01A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "Hello! How can I assist you today?"
    }
  ]
}
```

Example: Function Call

```json
{
  "type": "function_call",
  "id": "fc_00123xyzabc9876def0gh1ij2klmno345",
  "name": "sendEmail",
  "call_id": "call_987zyx654wvu321",
  "arguments": "{\"recipient\":\"jane.doe@example.com\",\"subject\":\"Meeting Reminder\",\"body\":\"Don't forget about our meeting tomorrow at 10am.\"}"
}
```
Items are state machines
In Open Responses, all items have a lifecycle. They can be in_progress if the model is sampling the current item, incomplete if the model exhausts its token budget before sampling finishes, or completed when they are fully done sampling.
It’s important to note that while items follow a state machine model—transitioning through these defined states—this does not necessarily mean they are stateful in the sense of being persisted to disk or stored long-term. The state machine describes the item’s status within the response flow, but persistence is a separate concern and is not implied by the item’s state transitions.
All item types should have these three basic statuses:
- in_progress: the model is currently emitting tokens belonging to this item.
- incomplete: the model has exhausted its token budget while emitting tokens belonging to this item. This is a terminal state. If an item ends in this state, it MUST be the last item emitted, and the containing response MUST also be in an incomplete state.
- completed: the model has finished emitting tokens belonging to this item, and/or a tool call has completed successfully. This is a terminal state; no updates may be made to the item after it has moved into this state.
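For example, a message item cut short by the token budget might look like the sketch below (the truncated text is purely illustrative); per the rule above, the response containing it would also end in an incomplete state:

```json
{
  "type": "message",
  "id": "msg_01A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6",
  "role": "assistant",
  "status": "incomplete",
  "content": [
    { "type": "output_text", "text": "Photosynthesis is the process by which" }
  ]
}
```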
Certain kinds of items, like hosted tools, MAY have additional statuses. Take openai:file_search_call for example:
- searching: the tool is currently searching documents.
- failed: the call to search documents failed. This is a terminal state.
Items are streamable
As items change state, or as values within an item change, Open Responses defines how those updates should be communicated in a response stream.
- The first event MUST always be response.output_item.added. The item is echoed in the payload with as much detail as is available at that time. For messages, this means at least the role, and for function_call, this is at least the name. All fields not marked nullable MUST have a value. Use zero values where applicable.
Example

```json
{
  "type": "response.output_item.added",
  "sequence_number": 11,
  "output_index": 3,
  "item": {
    "id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
    "type": "message",
    "status": "in_progress",
    "content": [],
    "role": "assistant"
  }
}
```
- Some items have streamable content. Usually this is text being emitted by the model, such as in the case of a message or openai:reasoning item. Such content MUST be backed by a content part. The first event in such cases MUST be response.content_part.added, followed by events representing the delta to the content part, e.g. response.<content_type>.delta. The delta events MAY be repeated many times, and end with response.<content_type>.done. The content part is then closed with response.content_part.done.
Example

```
{"type": "response.content_part.added","sequence_number": 12,"item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65","output_index": 3,"content_index": 0,"part": {"type": "output_text","annotations": [],"text": ""}}
{"type": "response.output_text.delta","sequence_number": 13,"item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65","output_index": 3,"content_index": 0,"delta": "Here"}
...
{"type": "response.output_text.done","sequence_number": 25,"item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65","output_index": 3,"content_index": 0,"text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "}
{"type": "response.content_part.done","sequence_number": 26,"item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65","output_index": 3,"content_index": 0,"part": {"type": "output_text","annotations": [],"text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "}}
```
- Items with content MAY emit multiple content parts, following the pattern above. When finished, the item is closed with response.output_item.done.
Example

```json
{
  "type": "response.output_item.done",
  "sequence_number": 27,
  "output_index": 3,
  "item": {
    "id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
    "type": "message",
    "status": "completed",
    "content": [
      {
        "type": "output_text",
        "annotations": [],
        "text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "
      }
    ],
    "role": "assistant"
  }
}
```
Items are extensible
Model providers may emit their own types of items not contained in the Open Responses spec. These items MUST be prefixed with the canonical provider slug, e.g.
{ "id": "ws_0df093a2d268cd7f0068efe79ac9408190b9833ec01e5d05ed", "type": "openai:web_search_call", "status": "completed", "action": { "type": "search", "query": "weather: San Francisco, CA" }}Content
In Open Responses, “content” represents the raw material exchanged between the user and the model within each message turn. It is deliberately modeled as a discriminated union rather than a single polymorphic type, allowing different shapes for user-authored input and model-generated output.
User Content vs Model Content
There are two top-level content unions:
- UserContent: structured data provided by the user or client application.
- ModelContent: structured data returned by the model.
This distinction reflects the asymmetric nature of conversation turns. User inputs can include multiple modalities (e.g. text, images, audio, video), while model outputs are usually restricted to text, at least in the base protocol. Keeping these unions separate allows model providers to evolve their output schemas independently (for example, adding output_image or output_tool_call in the future) without over-generalizing the user side.
Why User and Assistant Content Have Different Shapes
Although both are called “content,” user and assistant payloads serve very different roles:
- User content captures what the model is being asked to process. It can include one or more text segments, images, or other binary data that are meaningful inputs to the model’s inference step.
- Assistant (model) content captures what the model produced—its textual completion or reasoning output. These are usually serializable as UTF-8 text, optionally with metadata like token logprobs.
Because of this asymmetry:
- User content must support multiple data types (text, base64-encoded or URL-sourced images, potentially other future modalities).
- Model content is intentionally narrower to keep the protocol minimal and predictable across providers.
This separation simplifies validation, streaming, and logging. For example, a client can safely assume every model message contains output_text, while user messages may contain arbitrary mixtures of text and image inputs.
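To make the asymmetry concrete, here is a rough sketch of a user message mixing text and image input alongside an assistant reply; the input_text and input_image part types are an assumption borrowed from the OpenAI Responses API rather than a normative part of this section, and the URL and text values are illustrative only:

```json
[
  {
    "type": "message",
    "role": "user",
    "content": [
      { "type": "input_text", "text": "What is shown in this photo?" },
      { "type": "input_image", "image_url": "https://example.com/photo.jpg" }
    ]
  },
  {
    "type": "message",
    "role": "assistant",
    "status": "completed",
    "content": [
      { "type": "output_text", "text": "The photo shows a golden retriever playing on a beach." }
    ]
  }
]
```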
Errors
Error handling in Open Responses is designed to provide clear, actionable feedback to developers and users when requests fail or encounter issues.
When an error occurs, the API responds with a structured error object that includes:
- Type: The category of the error, such as server_error, model_error, invalid_request, or not_found. These generally, but not always, map to the status code of the response.
- Code: An optional error code providing additional detail about the specific problem. All common error codes are enumerated in the spec.
- Param: The input parameter related to the error, if applicable.
- Message: A human-readable explanation of what went wrong.
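Putting those fields together, a representative (non-normative) error object might look like this; the specific code value shown is an assumption for illustration, not one enumerated in this section:

```json
{
  "type": "invalid_request",
  "code": "missing_required_parameter",
  "param": "model",
  "message": "The 'model' parameter is required."
}
```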
Errors are also emitted as events in the streaming protocol, allowing clients to detect and respond to issues in real time while processing streamed responses. Any error incurred while streaming will be followed by a response.failed event.
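In a stream, this might surface roughly as follows; the error event name, the code value, and the exact payload nesting are assumptions sketched from the pattern described above rather than normative definitions:

```
event: error
data: {"type":"error","code":"model_unavailable","message":"The model provider is temporarily unavailable."}

event: response.failed
data: {"type":"response.failed","sequence_number":42,"response":{"id":"resp_abc123","status":"failed"}}
```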
It is recommended to check for error responses and handle them gracefully in your client implementation. Display user-friendly messages or trigger appropriate fallbacks where possible.
Error types can indicate whether an error is recoverable, allowing your application to determine whether it should retry, prompt the user, or halt the operation. The error's code provides additional detail, often including guidance or steps for recovery, which helps developers create more robust and responsive clients.
Streaming
Open Responses defines two major types of streaming events into which all events MUST fall.
Delta Events
A delta event represents a change to an object since its last update. This mechanism allows updates to be communicated incrementally as they happen, making streamed responses possible and efficient.
Common examples of delta events include:
- Adding a new item to a list (response.output_item.added), for example when a new message is generated and appended to a conversation.
- Appending more text to an ongoing output (response.output_text.delta or response.function_call_arguments.delta), such as when a language model is generating a reply word by word.
- Signaling that no further changes will occur to a content part (response.content_part.done), for example when the model has finished generating a sentence or paragraph.
All streamable objects must follow this delta event pattern. For instance, the flow of delta events on an output item could start with the item being added, followed by several text deltas as content is generated, and ending with a “done” event to indicate completion.
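Concretely, the sequence of events for a single text message runs roughly as follows, using the event names shown in the Items section:

```
response.output_item.added
response.content_part.added
response.output_text.delta      (repeated while tokens are generated)
response.output_text.done
response.content_part.done
response.output_item.done
```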
State Machine events
A state machine event represents a change in the status of an object as it progresses through its lifecycle. For example, when a response changes status from in_progress to completed, a response.completed event is triggered.
Other examples include:
- When a Response moves from queued to in_progress, a response.in_progress event is emitted.
- If a Response encounters an error and its status changes to failed, a response.failed event is generated.
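As a sketch, a terminal state machine event might look like the following; the assumption that the full response object is echoed in the payload mirrors the OpenAI Responses API and is not defined in this section:

```json
{
  "type": "response.completed",
  "sequence_number": 30,
  "response": {
    "id": "resp_abc123",
    "status": "completed",
    "output": []
  }
}
```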
Tools
Tool use is at the heart of modern LLMs' agentic capabilities. There are generally two types of tools that can be exposed in Open Responses:
Externally-hosted tools
Externally hosted tools are ones where the implementation lives outside of the model provider’s system. Functions are a great example, where the LLM is able to call the function, but the developer must run the function and return the output back to the LLM in a second request. MCP is another example, where external servers host the implementations, although control is not first yielded back to the developer.
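For instance, declaring a function tool in a request might look like the sketch below; the tool-definition fields shown (name, description, parameters) follow the OpenAI Responses API and should be treated as an assumption here, with the JSON Schema purely illustrative:

```json
{
  "model": "gpt-5",
  "input": "Email Jane a meeting reminder for tomorrow at 10am.",
  "tools": [
    {
      "type": "function",
      "name": "sendEmail",
      "description": "Send an email to a recipient.",
      "parameters": {
        "type": "object",
        "properties": {
          "recipient": { "type": "string" },
          "subject": { "type": "string" },
          "body": { "type": "string" }
        },
        "required": ["recipient", "subject", "body"]
      }
    }
  ]
}
```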
Internally-hosted tools
Internally hosted tools are ones where the implementation lives inside the model provider’s system. An example is OpenAI’s file search tool, where the model is able to call file search, and the tool can execute and return results to the LLM without yielding control back to the developer.
Implementing Open Responses as a routing layer
Build hosted tools at the routing layer
Routers act as intermediaries between clients and upstream model providers. They can expose their own hosted tools—distinct from standard function tools—by providing new capabilities directly within the router, such as custom search, retrieval, or workflow orchestration. When a user or model invokes a hosted tool, the router executes it internally, often integrating with services or logic outside the model provider, and returns the result without exposing the implementation details. This enables routers to add custom features and maintain a consistent tool interface for both users and models.
{ "model": "my-model", "messages": [ {"role": "user", "content": "Find relevant documents about climate change."} ], "tools": [ { "type": "custom_document_search", "documents": [ { "type": "external_file", "url": "https://arxiv.org/example.pdf" } ] } ]}In this example, the client requests the custom_document_search hosted tool exposed by the router. The router executes the tool directly, returning results to the client and abstracting away the internal workings.
Params for multi-provider routing
Routers MAY add two top-level params to the CreateResponseBody payload when routing to providers that do not support Open Responses:
- provider_options: an array of objects keyed with the model provider and their provider-specific API options, if they do not belong in the base spec
- provider: a string indicating which provider to target for a given request
curl -X POST "https://api.router.com/v1/responses" \-H "Authorization: Bearer YOUR_API_KEY" \-H "Content-Type: application/json" \-H "Open Responses-Version: latest" \--data '{ "model": "opus-4.5", "provider": "anthropic", "input": "tell me a joke", "provider_options": [ { "type": "anthropic", "inference_speed": "fast" } ]}'Model provider implementers of Open Responses may not add these params.
Implementing Open Responses as a model provider
The Agentic loop
The agentic loop is the core principle that enables Open Responses to interact intelligently with users, reason about tasks, and invoke tools as needed to complete complex workflows. In this loop, the language model analyzes the user’s request, determines whether it can answer directly or needs to use a tool, and issues tool calls as required.
When a tool needs to be used, the model emits a tool invocation event, such as a function call or a hosted tool call, containing the necessary parameters. For externally-hosted tools, the execution happens outside the model provider, and the result is returned in a follow-up request. For internally-hosted tools, the model provider executes the tool and immediately streams back the results.
This loop of reasoning, tool invocation, and response generation can repeat multiple times within a session, allowing the model to gather information, process data, or take actions through tools before producing a final answer for the user.
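As a rough, non-normative sketch of one turn of this loop, a stateless follow-up request might resend the prior items together with the developer's tool result; the function_call item is reused from the example earlier in this document, while the function_call_output shape is an assumption mirroring the OpenAI Responses API:

```json
{
  "model": "gpt-5",
  "input": [
    { "role": "user", "content": "Email Jane a reminder about tomorrow's 10am meeting." },
    {
      "type": "function_call",
      "id": "fc_00123xyzabc9876def0gh1ij2klmno345",
      "name": "sendEmail",
      "call_id": "call_987zyx654wvu321",
      "arguments": "{\"recipient\":\"jane.doe@example.com\",\"subject\":\"Meeting Reminder\",\"body\":\"Don't forget about our meeting tomorrow at 10am.\"}"
    },
    {
      "type": "function_call_output",
      "call_id": "call_987zyx654wvu321",
      "output": "{\"status\":\"sent\"}"
    }
  ]
}
```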
```mermaid
sequenceDiagram
    autonumber
    participant User
    participant API as API Server
    participant LLM
    participant Tool
    User->>API: user request
    loop agentic loop
        API->>LLM: sample from model
        LLM-->>API: Sampled message
        alt tool called by model
            API->>Tool: sampled tool call arguments
            Tool-->>API: tool result
        else
            Note right of API: loop finished
        end
    end
    API-->>User: output items
```