Specification

Open Responses is an open-source specification and ecosystem for building multi-provider, interoperable LLM interfaces based on the OpenAI Responses API. It defines a shared schema, client library, and tooling layer that enable a unified experience for calling language models, streaming results, and composing agentic workflows—independent of provider.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Motivation and Overview

Modern LLM systems have converged on similar primitives—messages, function calls, tool usage, and multimodal inputs—but each provider encodes them differently. Open Responses standardizes these concepts, enabling:

  • One spec, many providers: Describe inputs/outputs once; run on OpenAI, Anthropic, Gemini, or local models.
  • Composable agentic loops: Unified streaming, tool invocation, and message orchestration.
  • Easier evaluation and routing: Compare providers, route requests, and log results through a shared schema.
  • Blueprints for provider APIs: Labs and model providers wanting to expose their APIs in a common format can easily do so.

Key Principles

Agentic Loop

All models, to some extent, exhibit agency — the ability to perceive input, reason, act through tools, and reflect on outcomes.

At its core, Open Responses is designed to expose the power of this agentic loop to developers: a single request can allow the model to do multiple things before yielding back a result, whether through developer-hosted tool calls, where control is yielded back to the developer, or provider-hosted tools, where control is held by the model provider until the model signals an exit criterion.

Open Responses defines a common pattern for defining control flow in the agent loop, a set of item definitions for developer-controlled tools, and a pattern for defining provider- and router-hosted tools.

Items → Items

Items are the fundamental unit of context in Open Responses: they represent an atomic unit of model output, tool invocation, or reasoning state. Items are bidirectional: they can be provided as inputs to the model, or produced as outputs from the model.

Each item type has a defined schema that binds it and contains properties specific to its unique purpose.

Open Responses defines a common set of items supported by a quorum of model providers, and defines how provider-specific item types can be defined.
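
For example, items that came out of one response can be passed back as inputs to the next request. The sketch below is illustrative only: the item IDs and tool output are made up, and the function_call_output item type used for the developer-returned tool result is an assumption rather than a definition from this section.

{
  "model": "gpt-5",
  "input": [
    { "type": "message", "role": "user", "content": "What's the weather in Paris?" },
    {
      "type": "function_call",
      "id": "fc_123",
      "call_id": "call_456",
      "name": "get_weather",
      "arguments": "{\"city\":\"Paris\"}"
    },
    {
      "type": "function_call_output",
      "call_id": "call_456",
      "output": "{\"temperature_c\":18,\"conditions\":\"partly cloudy\"}"
    }
  ]
}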

Semantic events

Streaming is modeled as a series of semantic events, not raw text or object deltas.

Events describe meaningful transitions. They are either state transitions, such as response.in_progress and response.completed, or deltas from a previous state, such as response.output_item.added and response.output_text.delta.

Open Responses defines a common set of streaming events supported by a quorum of model providers, and defines how provider-specific streaming events can be defined.
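
For illustration, the two kinds of events might look like this on the wire (sequence numbers and IDs are made up, and the nested response object is abbreviated):

event: response.in_progress
data: { "type": "response.in_progress", "sequence_number": 1, "response": { "id": "resp_123", "status": "in_progress" } }

event: response.output_text.delta
data: { "type": "response.output_text.delta", "sequence_number": 13, "item_id": "msg_123", "output_index": 0, "content_index": 0, "delta": "Photo" }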

Statefulness

By default, Open Responses integrations are stateless. In future versions, the specification will evolve to support stateful interactions.

Consistent core with provider-specific abstractions

The schema is designed to provide a minimal abstraction over common provider features, while also defining how provider-specific items, parameters, and streaming events can be expressed.

For example, items are common to all providers, as are popular tools like functions. But providers may have specific hosted tools they expose, or knobs that don’t generalize across providers. Open Responses supports this common core pattern.

curl -X POST "https://api.modelprovider.com/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "gpt-5",
    "provider": "openai",
    "input": "Explain how photosynthesis works",
    "provider_options": [
      {
        "type": "openai",
        "reasoning": {
          "effort": "high",
          "summary": "detailed"
        }
      },
      {
        "type": "anthropic",
        "thinking": {
          "type": "enabled",
          "budget_tokens": 2048
        }
      }
    ]
  }'

When features with reasonably similar abstractions are sufficiently generalized in the ecosystem, they will belong in the core request body. Features specific to one or two providers will belong in the provider-specific block.

State machines

Objects in Open Responses are state machines: they can live in one of a finite number of states, such as in_progress, completed, or failed. The spec defines the set of valid states for each state machine in the API.

Open Responses vs OpenAI’s Responses API

Open Responses differs slightly from OpenAI’s Responses API. While many of the concepts and principles were originally implemented there, Open Responses adopts a more generic approach, aiming to be able to serve models from many providers.

Passthrough Extensibility

Open Responses is designed to cover published features supported by major model providers, but we also recognize the need to ship features that may not yet be part of the spec. Open Responses therefore supports passing through provider-specific params that are not part of the official spec.

curl -X POST "https://api.modelprovider.com/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "gpt-5",
    "provider": "openai",
    "input": "tell me a joke",
    "provider_options": [
      {
        "type": "anthropic",
        "inference_speed": "fast"
      }
    ]
  }'

Overview

HTTP Requests

All messages MUST follow the HTTP protocol.

Headers

Header        | Description                                                                                        | Required
------------- | -------------------------------------------------------------------------------------------------- | --------
Authorization | Authorization token identifying the developer                                                      | Yes
Content-Type  | Describes to the server what kind of data is in the body of the request and how to interpret it.  | Yes

HTTP Request Bodies

Clients MUST send request bodies encoded as application/json.

HTTP Responses

Headers

Header       | Description                                                                                         | Required
------------ | --------------------------------------------------------------------------------------------------- | --------
Content-Type | Describes to the client what kind of data is in the body of the response and how to interpret it.  | Yes

HTTP Response Bodies

When not streaming, servers MUST return data only as application/json.

Streaming HTTP Responses

When streaming, servers MUST return the header Content-Type: text/event-stream, with individual data objects returned as JSON-encoded strings. The terminal event's data MUST be the literal string [DONE].

The event field MUST match the type in the event body. Servers SHOULD NOT use the id field.

event: response.output_text.delta
data: { "type":"response.output_text.delta","sequence_number":10,"item_id":"msg_07315d23576898080068e95daa2e34819685fb0a98a0503f78","output_index":0,"content_index":0,"delta":" a","logprobs":[],"obfuscation":"Wd6S45xQ7SyQLT"}

Items

Items are the core unit of context in Open Responses. Open Responses defines a few types of items that are common to all providers, such as message or function_call, and specifies how additional, provider-specific item types can be defined. There are some general principles that apply to all items:

Items are polymorphic

Item shapes can vary depending on what purpose they serve. A message item, for instance, has a different shape from a function_call, and so on. Items are discriminated based on the type field.

  • Example: Message

    {
      "type": "message",
      "id": "msg_01A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! How can I assist you today?"
        }
      ]
    }
  • Example: Function Call

    {
      "type": "function_call",
      "id": "fc_00123xyzabc9876def0gh1ij2klmno345",
      "name": "sendEmail",
      "call_id": "call_987zyx654wvu321",
      "arguments": "{\"recipient\":\"jane.doe@example.com\",\"subject\":\"Meeting Reminder\",\"body\":\"Don't forget about our meeting tomorrow at 10am.\"}"
    }

Items are state machines

In Open Responses, all items have a lifecycle. They can be in_progress while the model is sampling the current item, incomplete if the model exhausts its token budget before sampling finishes, or completed when sampling is fully done.

It’s important to note that while items follow a state machine model—transitioning through these defined states—this does not necessarily mean they are stateful in the sense of being persisted to disk or stored long-term. The state machine describes the item’s status within the response flow, but persistence is a separate concern and is not implied by the item’s state transitions.

All item types should have these three basic statuses:

  • in_progress - the model is currently emitting tokens belonging to this item
  • incomplete - the model has exhausted its token budget while emitting tokens belonging to this item. This is a terminal state. If an item ends in the incomplete state, it MUST be the last item emitted, and the containing response MUST also be in an incomplete state.
  • completed - the model has finished emitting tokens belonging to this item, and/or a tool call has completed successfully. This is a terminal status and no updates may be made to the item after it has moved into this state.
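
As a sketch of the incomplete case (the ID and text are made up), a message cut off by the token budget might end up looking like this, with the containing response also ending incomplete:

{
  "type": "message",
  "id": "msg_abc123",
  "role": "assistant",
  "status": "incomplete",
  "content": [
    {
      "type": "output_text",
      "text": "The start of a longer answer that was cut off"
    }
  ]
}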

Certain kinds of items, like hosted tools, MAY have additional statuses. Take openai:file_search_call for example:

  • searching - the tool is currently searching documents
  • failed - the call to search documents failed. This is a terminal state.

Items are streamable

As items change state, or values within an item change, Open Responses defines how those updates should be communicated in a response stream.

  1. The first event MUST always be response.output_item.added. The item is echoed in the payload with as much detail as is available at that time. For messages, this means at least the role, and for function_call, this is at least the name. All fields not marked nullable MUST have a value. Use zero values where applicable.
  • Example

    {
      "type": "response.output_item.added",
      "sequence_number": 11,
      "output_index": 3,
      "item": {
        "id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
        "type": "message",
        "status": "in_progress",
        "content": [],
        "role": "assistant"
      }
    }
  2. Some items have streamable content. Usually this is text being emitted by the model, such as in the case of a message or openai:reasoning item. Such content MUST be backed by a content part. The first event in such cases MUST be response.content_part.added, followed by events representing the delta to the content part, e.g. response.<content_type>.delta. The delta events MAY be repeated many times, and end with response.<content_type>.done. The content part is then closed with response.content_part.done.
  • Example

    {
      "type": "response.content_part.added",
      "sequence_number": 12,
      "item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
      "output_index": 3,
      "content_index": 0,
      "part": {
        "type": "output_text",
        "annotations": [],
        "text": ""
      }
    }
    {
      "type": "response.output_text.delta",
      "sequence_number": 13,
      "item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
      "output_index": 3,
      "content_index": 0,
      "delta": "Here"
    }
    ...
    {
      "type": "response.output_text.done",
      "sequence_number": 25,
      "item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
      "output_index": 3,
      "content_index": 0,
      "text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "
    }
    {
      "type": "response.content_part.done",
      "sequence_number": 26,
      "item_id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
      "output_index": 3,
      "content_index": 0,
      "part": {
        "type": "output_text",
        "annotations": [],
        "text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "
      }
    }
  3. Items with content MAY emit multiple content parts, following the pattern above. When finished, the item is closed with response.output_item.done.
  • Example

    {
      "type": "response.output_item.done",
      "sequence_number": 27,
      "output_index": 3,
      "item": {
        "id": "msg_0f7983f1618f89d20068efe9b45d748191a5239d49c2971a65",
        "type": "message",
        "status": "completed",
        "content": [
          {
            "type": "output_text",
            "annotations": [],
            "text": "Here\u2019s the current weather for San Francisco, CA (as of Wednesday, October 15, 2025):\n\n- Current conditions: Cloudy, 58\u00b0F (14\u00b0C).\n- Today\u2019s outlook: A shower in spots late this morning; otherwise a cloudy start with sunshine returning later. High near 67\u00b0F (19\u00b0C), low around 51\u00b0F (11\u00b0C).\n\nIf you\u2019d like wind speed, humidity, or hourly updates, I can pull those too. "
          }
        ],
        "role": "assistant"
      }
    }

Items are extensible

Model providers may emit their own types of items not contained in the Open Responses spec. These items MUST be prefixed with the canonical provider slug, e.g.:

{
  "id": "ws_0df093a2d268cd7f0068efe79ac9408190b9833ec01e5d05ed",
  "type": "openai:web_search_call",
  "status": "completed",
  "action": {
    "type": "search",
    "query": "weather: San Francisco, CA"
  }
}

Content

In Open Responses, “content” represents the raw material exchanged between the user and the model within each message turn. It is deliberately modeled as a discriminated union rather than a single polymorphic type, allowing different shapes for user-authored input and model-generated output.

User Content vs Model Content

There are two top-level content unions:

  • UserContent: structured data provided by the user or client application.
  • ModelContent: structured data returned by the model.

This distinction reflects the asymmetric nature of conversation turns. User inputs can include multiple modalities (e.g. text, images, audio, video), while model outputs are usually restricted to text, at least in the base protocol. Keeping these unions separate allows model providers to evolve their output schemas independently (for example, adding output_image or output_tool_call in the future) without over-generalizing the user side.

Why User and Assistant Content Have Different Shapes

Although both are called “content,” user and assistant payloads serve very different roles:

  • User content captures what the model is being asked to process. It can include one or more text segments, images, or other binary data that are meaningful inputs to the model’s inference step.
  • Assistant (model) content captures what the model produced—its textual completion or reasoning output. These are usually serializable as UTF-8 text, optionally with metadata like token logprobs.

Because of this asymmetry:

  • User content must support multiple data types (text, base64-encoded or URL-sourced images, potentially other future modalities).
  • Model content is intentionally narrower to keep the protocol minimal and predictable across providers.

This separation simplifies validation, streaming, and logging. For example, a client can safely assume every model message contains output_text, while user messages may contain arbitrary mixtures of text and image inputs.
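
As a hedged illustration of this asymmetry, a user message might carry mixed input parts while an assistant message carries only text output. The part type names input_text and input_image and the image_url field below are assumptions for the user side; only output_text appears elsewhere in this document.

{
  "type": "message",
  "role": "user",
  "content": [
    { "type": "input_text", "text": "What is shown in this image?" },
    { "type": "input_image", "image_url": "https://example.com/photo.jpg" }
  ]
}
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    { "type": "output_text", "text": "The image shows a cable car climbing a hill." }
  ]
}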

Errors

Error handling in Open Responses is designed to provide clear, actionable feedback to developers and users when requests fail or encounter issues.

When an error occurs, the API responds with a structured error object that includes:

  • Type: The category of the error, such as server_error, model_error, invalid_request, or not_found. These generally, but not always, map to the status code of the response.
  • Code: An optional error code providing additional detail about the specific problem. All common error codes are enumerated in the spec.
  • Param: The input parameter related to the error, if applicable.
  • Message: A human-readable explanation of what went wrong.
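
For example, a failed request might return an error object along these lines (a sketch; the exact envelope and code values shown are illustrative, not normative):

{
  "error": {
    "type": "invalid_request",
    "code": "missing_required_parameter",
    "param": "model",
    "message": "The 'model' parameter is required."
  }
}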

Errors are also emitted as events in the streaming protocol, allowing clients to detect and respond to issues in real time while processing streamed responses. Any error incurred while streaming will be followed by a response.failed event.

It is recommended to check for error responses and handle them gracefully in your client implementation. Display user-friendly messages or trigger appropriate fallbacks where possible.

Error types can indicate whether an error is recoverable or not, allowing your application to determine if it should retry, prompt the user, or halt the operation. The error's code provides additional detail, often including guidance or steps for recovery, which helps developers create more robust and responsive clients.

Streaming

Open Responses defines two major types of streaming events into which all events MUST fall.

Delta Events

A delta event represents a change to an object since its last update. This mechanism allows updates to be communicated incrementally as they happen, making streamed responses possible and efficient.

Common examples of delta events include:

  • Adding a new item to a list (response.output_item.added)—for example, when a new message is generated and appended to a conversation.
  • Appending more text to an ongoing output (response.output_text.delta or response.function_call_arguments.delta)—such as when a language model is generating a reply word by word.
  • Signaling that no further changes will occur to a content part (response.content_part.done)—for example, when the model has finished generating a sentence or paragraph.

All streamable objects must follow this delta event pattern. For instance, the flow of delta events on an output item could start with the item being added, followed by several text deltas as content is generated, and ending with a “done” event to indicate completion.

[Figure: simple-streaming.svg — delta event flow for a single output item]
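
In text form, that flow roughly corresponds to the following event order (a condensed sketch; the content_part events apply only to items with streamable content):

event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta   (repeated as content is generated)
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done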

State Machine events

A state machine event represents a change in the status of an object as it progresses through its lifecycle. For example, when a response changes status from in_progress to completed, a response.completed event is triggered.

Other examples include:

  • When a Response moves from queued to in_progress, a response.in_progress event is emitted.
  • If a Response encounters an error and its status changes to failed, a response.failed event is generated.
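
A hedged sketch of such an event on the wire (the nested response object is abbreviated and the values are made up):

event: response.completed
data: { "type": "response.completed", "sequence_number": 30, "response": { "id": "resp_123", "status": "completed", "output": [] } }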

Tools

Tool use is at the heart of modern LLMs' agentic capabilities. There are generally two types of tools that can be exposed in Open Responses:

Externally-hosted tools

Externally hosted tools are ones where the implementation lives outside of the model provider’s system. Functions are a great example, where the LLM is able to call the function, but the developer must run the function and return the output back to the LLM in a second request. MCP is another example, where external servers host the implementations, although control is not first yielded back to the developer.

Internally-hosted tools

Internally hosted tools are ones where the implementation lives inside the model provider’s system. An example is OpenAI’s file search tool, where the model is able to call file search, and the tool can execute and return results to the LLM without yielding control back to the developer.
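
For example, a developer-hosted function tool can be declared on the request, and the model may respond with a function_call item like the one shown earlier. This is a sketch only: the exact tool definition shape is an assumption here, not a normative schema from this section.

{
  "model": "gpt-5",
  "input": "What's the weather in San Francisco?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Look up the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ]
}

The developer then executes get_weather locally and returns its output to the model as an input item in a follow-up request, at which point the loop continues.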

Implementing Open Responses as a routing layer

Build hosted tools at the routing layer

Routers act as intermediaries between clients and upstream model providers. They can expose their own hosted tools—distinct from standard function tools—by providing new capabilities directly within the router, such as custom search, retrieval, or workflow orchestration. When a user or model invokes a hosted tool, the router executes it internally, often integrating with services or logic outside the model provider, and returns the result without exposing the implementation details. This enables routers to add custom features and maintain a consistent tool interface for both users and models.

{
  "model": "my-model",
  "messages": [
    { "role": "user", "content": "Find relevant documents about climate change." }
  ],
  "tools": [
    {
      "type": "custom_document_search",
      "documents": [
        { "type": "external_file", "url": "https://arxiv.org/example.pdf" }
      ]
    }
  ]
}

In this example, the client requests the custom_document_search hosted tool exposed by the router. The router executes the tool directly, returning results to the client and abstracting away the internal workings.

Params for multi-provider routing

Routers MAY add two top-level params to the CreateResponseBody payload when routing to providers that do not support Open Responses:

  • provider_options – an array of objects keyed with the model provider and their provider-specific API options, if they do not belong in the base spec
  • provider – a string indicating which provider to target for a given request

curl -X POST "https://api.router.com/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Open Responses-Version: latest" \
  --data '{
    "model": "opus-4.5",
    "provider": "anthropic",
    "input": "tell me a joke",
    "provider_options": [
      {
        "type": "anthropic",
        "inference_speed": "fast"
      }
    ]
  }'

Model providers implementing Open Responses must not add these params; they are specific to routing layers.

Implementing Open Responses as a model provider

The Agentic loop

The agentic loop is the core principle that enables Open Responses to interact intelligently with users, reason about tasks, and invoke tools as needed to complete complex workflows. In this loop, the language model analyzes the user’s request, determines whether it can answer directly or needs to use a tool, and issues tool calls as required.

When a tool needs to be used, the model emits a tool invocation event, such as a function call or a hosted tool call, containing the necessary parameters. For externally-hosted tools, the execution happens outside the model provider, and the result is returned in a follow-up request. For internally-hosted tools, the model provider executes the tool and immediately streams back the results.

This loop of reasoning, tool invocation, and response generation can repeat multiple times within a session, allowing the model to gather information, process data, or take actions through tools before producing a final answer for the user.

sequenceDiagram
    autonumber
    participant User
    participant API as API Server
    participant LLM
    participant Tool

    User->>API: user request

    loop agentic loop
        API->>LLM: sample from model
        LLM-->>API: Sampled message
        alt tool called by model
            API->>Tool: sampled tool call arguments
            Tool-->>API: tool result
        else
            Note right of API: loop finished
        end
    end

    API-->>User: output items