Reference
Full OpenAPI Schema
Request Parameters
The Responses API allows you to generate text, execute tools, and handle multi-turn conversations. It supports both synchronous requests and streaming for real-time output.
Parameters may be supplied as either application/json or application/x-www-form-urlencoded.
| Header | Values | Required? |
|---|
Content-Type | application/json or application/x-www-form-urlencoded | Yes |
Authorization | Authorization token, usually in the form of Bearer [secret token] | Yes |
Body
The model to use for this request, e.g. 'gpt-5.2'.
Context to provide to the model for the scope of this request. May either be a string or an array of input items. If a string is provided, it is interpreted as a user message.
The ID of the response to use as the prior turn for this request.
Values
| reasoning.encrypted_content | includes encrypted reasoning content so that it may be rehydrated on a subsequent request. |
| message.output_text.logprobs | includes sampled logprobs in assistant messages. |
A list of tools that the model may call while generating the response.
Controls which tool the model should use, if any.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Configuration options for text output.
Sampling temperature to use, between 0 and 2. Higher values make the output more random.
Nucleus sampling parameter, between 0 and 1. The model considers only the tokens with the top cumulative probability.
Penalizes new tokens based on whether they appear in the text so far.
Penalizes new tokens based on their frequency in the text so far.
Whether the model may call multiple tools in parallel.
Whether to stream response events as server-sent events.
Options that control streamed response behavior.
Whether to run the request in the background and return immediately.
The maximum number of tokens the model may generate for this response.
The maximum number of tool calls the model may make while generating the response.
Configuration options for reasoning behavior.
A stable identifier used for safety monitoring and abuse detection.
A key to use when reading from or writing to the prompt cache.
Controls how the service truncates the input when it exceeds the model context window.
Values
| auto | Let the service decide how to truncate. |
| disabled | Disable service truncation. Context over the model's context limit will result in a 400 error. |
Additional instructions to guide the model for this request.
Whether to store the response so it can be retrieved later.
The service tier to use for this request.
Values
| auto | Choose a service tier automatically based on current account state. |
| default | Choose the default service tier. |
| flex | Choose the flex service tier. |
| priority | Choose the priority service tier. |
The number of most likely tokens to return at each position, along with their log probabilities.
Response Parameters
| Header | Values | Required? |
|---|
Content-Type | application/json or text/event-stream | Yes |
Body
The unique ID of the response that was created.
object
"response"
required
The object type, which was always response.
created_at integer
required
The Unix timestamp (in seconds) for when the response was created.
The Unix timestamp (in seconds) for when the response was completed, if it was completed.
The status that was set for the response.
incomplete_details
required
Details about why the response was incomplete, if applicable.
The model that generated this response.
previous_response_id
required
The ID of the previous response in the chain that was referenced, if any.
Additional instructions that were used to guide the model for this response.
The output items that were generated by the model.
The error that occurred, if the response failed.
The tools that were available to the model during response generation.
How the input was truncated by the service when it exceeded the model context window.
Values
| auto | Let the service decide how to truncate. |
| disabled | Disable service truncation. Context over the model's context limit will result in a 400 error. |
parallel_tool_calls boolean
required
Whether the model was allowed to call multiple tools in parallel.
Configuration options for text output that were used.
The nucleus sampling parameter that was used for this response.
presence_penalty number
required
The presence penalty that was used to penalize new tokens based on whether they appear in the text so far.
frequency_penalty number
required
The frequency penalty that was used to penalize new tokens based on their frequency in the text so far.
top_logprobs integer
required
The number of most likely tokens that were returned at each position, along with their log probabilities.
temperature number
required
The sampling temperature that was used for this response.
Reasoning configuration and outputs that were produced for this response.
Token usage statistics that were recorded for the response, if available.
max_output_tokens
required
The maximum number of tokens the model was allowed to generate for this response.
The maximum number of tool calls the model was allowed to make while generating the response.
Whether this response was stored so it can be retrieved later.
background boolean
required
Whether this request was run in the background.
service_tier string
required
The service tier that was used for this response.
metadata unknown
required
Developer-defined metadata that was associated with the response.
safety_identifier
required
A stable identifier that was used for safety monitoring and abuse detection.
prompt_cache_key
required
A key that was used to read from or write to the prompt cache.
WebSocket Mode
Open a WebSocket connection to /v1/responses and start each turn with a JSON response.create client message. The message uses the normal response creation body, except HTTP transport fields such as stream, stream_options, and background are not sent. Server progress messages use the same streaming event objects as text/event-stream responses, while WebSocket failures use an error envelope.
type
"response.create"
required
The client event type. Always response.create.
The model to use for this request, e.g. 'gpt-5.2'.
Context to provide to the model for the scope of this request. May either be a string or an array of input items. If a string is provided, it is interpreted as a user message.
The ID of the response to use as the prior turn for this request.
Values
| reasoning.encrypted_content | includes encrypted reasoning content so that it may be rehydrated on a subsequent request. |
| message.output_text.logprobs | includes sampled logprobs in assistant messages. |
A list of tools that the model may call while generating the response.
Controls which tool the model should use, if any.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Configuration options for text output.
Sampling temperature to use, between 0 and 2. Higher values make the output more random.
Nucleus sampling parameter, between 0 and 1. The model considers only the tokens with the top cumulative probability.
Penalizes new tokens based on whether they appear in the text so far.
Penalizes new tokens based on their frequency in the text so far.
Whether the model may call multiple tools in parallel.
The maximum number of tokens the model may generate for this response.
The maximum number of tool calls the model may make while generating the response.
Configuration options for reasoning behavior.
A stable identifier used for safety monitoring and abuse detection.
A key to use when reading from or writing to the prompt cache.
Controls how the service truncates the input when it exceeds the model context window.
Values
| auto | Let the service decide how to truncate. |
| disabled | Disable service truncation. Context over the model's context limit will result in a 400 error. |
Additional instructions to guide the model for this request.
Whether to store the response so it can be retrieved later.
The service tier to use for this request.
Values
| auto | Choose a service tier automatically based on current account state. |
| default | Choose the default service tier. |
| flex | Choose the flex service tier. |
| priority | Choose the priority service tier. |
The number of most likely tokens to return at each position, along with their log probabilities.
The event type. Always error.
The HTTP-style status code for the WebSocket error.
The WebSocket error payload.
Compaction Endpoint
POST
/v1/responses/compact
The compaction endpoint returns a compacted conversation state object that can be used to preserve long-running context without asserting provider-specific compression behavior.
Body
Model ID used to generate the response, like
gpt-5 or
o3. OpenAI offers a wide range of models with different capabilities, performance characteristics, and price points. Refer to the
model guide to browse and compare available models.
Text, image, or file inputs to the model, used to generate a response
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Learn more about
conversation state. Cannot be used in conjunction with
conversation.
A system (or developer) message inserted into the model's context.
When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
A key to use when reading from or writing to the prompt cache.
Response
The unique identifier for the compacted response.
object
"response.compaction"
required
The object type. Always response.compaction.
The compacted list of output items.
created_at integer
required
Unix timestamp (in seconds) when the compacted conversation was created.
Token accounting for the compaction pass, including cached, reasoning, and total tokens.
API Objects
Enums
FunctionCallOutputStatusEnum
Similar to FunctionCallStatus. All three options are allowed here for compatibility, but because in practice these items will be provided by developers, only completed should be used.
Values
| in_progress | - |
| completed | - |
| incomplete | - |
FunctionCallStatus
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
ImageDetail
Values
| low | Restricts the model to a lower-resolution version of the image. |
| high | Allows the model to "see" a higher-resolution version of the image, usually increasing input token costs. |
| auto | Choose the detail level automatically. |
IncludeEnum
Values
| reasoning.encrypted_content | includes encrypted reasoning content so that it may be rehydrated on a subsequent request. |
| message.output_text.logprobs | includes sampled logprobs in assistant messages. |
MessageRole
Values
| user | End‑user input in the conversation. |
| assistant | Model-generated content in the conversation. |
| system | System-level instructions that set global behavior. |
| developer | Developer-supplied guidance that shapes the assistant’s behavior. |
MessageStatus
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
ReasoningEffortEnum
Values
| none | Restrict the model from performing any reasoning before emitting a final answer. |
| low | Use a lower reasoning effort for faster responses. |
| medium | Use a balanced reasoning effort. |
| high | Use a higher reasoning effort to improve answer quality. |
| xhigh | Use the maximum reasoning effort available. |
ReasoningSummaryEnum
Values
| concise | Emit concise summaries of reasoning content. |
| detailed | Emit details summaries of reasoning content. |
| auto | Allow the model to decide when to summarize. |
ServiceTierEnum
Values
| auto | Choose a service tier automatically based on current account state. |
| default | Choose the default service tier. |
| flex | Choose the flex service tier. |
| priority | Choose the priority service tier. |
TruncationEnum
Values
| auto | Let the service decide how to truncate. |
| disabled | Disable service truncation. Context over the model's context limit will result in a 400 error. |
VerbosityEnum
Values
| low | Instruct the model to emit less verbose final responses. |
| medium | Use the model's default verbosity setting. |
| high | Instruct the model to emit more verbose final responses. |
Unions
Annotation
An annotation that applies to a span of output text.
ItemField
An item representing a message, tool call, tool output, reasoning, or other response element.
ItemParam
An internal identifier for an item to reference.
Tool
A tool that can be used to generate a response.
ToolChoiceParam
Controls which tool the model should use, if any.
Objects
AssistantMessageItemParam
The unique ID of this message item.
type
"message"
required
The item type. Always message.
role
"assistant"
required
The role of the message author. Always assistant.
The message content, as an array of content parts.
phaseenum
Labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer). For models like gpt-5.3-codex and beyond, when sending follow-up requests, preserve and resend phase on all assistant messages. Omitting it can degrade performance. Not used for user messages.
The status of the message item.
CompactionBody
A compaction item generated by the
v1/responses/compact API.
type
"compaction"
required
The type of the item. Always compaction.
idstring
required
The unique ID of the compaction item.
encrypted_contentstring
required
The encrypted content that was produced by compaction.
created_bystring
The identifier of the actor that created the item.
CompactionSummaryItemParam
A compaction item generated by the
v1/responses/compact API.
The ID of the compaction item.
type
"compaction"
required
The type of the item. Always compaction.
encrypted_contentstring
required
The encrypted content of the compaction summary.
CreateResponseBody
The model to use for this request, e.g. 'gpt-5.2'.
Context to provide to the model for the scope of this request. May either be a string or an array of input items. If a string is provided, it is interpreted as a user message.
previous_response_idstring
The ID of the response to use as the prior turn for this request.
includeenum[]
Values
| reasoning.encrypted_content | includes encrypted reasoning content so that it may be rehydrated on a subsequent request. |
| message.output_text.logprobs | includes sampled logprobs in assistant messages. |
A list of tools that the model may call while generating the response.
Controls which tool the model should use, if any.
Values
| none | Restrict the model from calling any tools. |
| auto | Let the model choose the tools from among the provided set. |
| required | Require the model to call a tool. |
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Configuration options for text output.
Sampling temperature to use, between 0 and 2. Higher values make the output more random.
Nucleus sampling parameter, between 0 and 1. The model considers only the tokens with the top cumulative probability.
Penalizes new tokens based on whether they appear in the text so far.
Penalizes new tokens based on their frequency in the text so far.
parallel_tool_callsboolean
Whether the model may call multiple tools in parallel.
streamboolean
Whether to stream response events as server-sent events.
Options that control streamed response behavior.
backgroundboolean
Whether to run the request in the background and return immediately.
The maximum number of tokens the model may generate for this response.
The maximum number of tool calls the model may make while generating the response.
Configuration options for reasoning behavior.
A stable identifier used for safety monitoring and abuse detection.
A key to use when reading from or writing to the prompt cache.
truncationenum
Controls how the service truncates the input when it exceeds the model context window.
Values
| auto | Let the service decide how to truncate. |
| disabled | Disable service truncation. Context over the model's context limit will result in a 400 error. |
Additional instructions to guide the model for this request.
storeboolean
Whether to store the response so it can be retrieved later.
service_tierenum
The service tier to use for this request.
Values
| auto | Choose a service tier automatically based on current account state. |
| default | Choose the default service tier. |
| flex | Choose the flex service tier. |
| priority | Choose the priority service tier. |
The number of most likely tokens to return at each position, along with their log probabilities.
DeveloperMessageItemParam
The unique ID of this message item.
type
"message"
required
The item type. Always message.
role
"developer"
required
The message role. Always developer.
The message content, as an array of content parts.
The status of the message item.
Error
An error that occurred while generating the response.
codestring
required
A machine-readable error code that was returned.
messagestring
required
A human-readable description of the error that was returned.
FunctionCall
A function tool call that was generated by the model.
type
"function_call"
required
The type of the item. Always function_call.
idstring
required
The unique ID of the function call item.
call_idstring
required
The unique ID of the function tool call that was generated.
namestring
required
The name of the function that was called.
argumentsstring
required
The arguments JSON string that was generated.
statusenum
required
The status of the function call item that was recorded.
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
FunctionCallItemParam
The unique ID of this function tool call.
call_idstring
required
The unique ID of the function tool call generated by the model.
type
"function_call"
required
The item type. Always function_call.
namestring
required
The name of the function to call.
argumentsstring
required
The function arguments as a JSON string.
The status of the function tool call.
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
FunctionCallOutput
A function tool call output that was returned by the tool.
type
"function_call_output"
required
The type of the function tool call output. Always function_call_output.
idstring
required
The unique ID of the function tool call output. Populated when this item is returned via API.
call_idstring
required
The unique ID of the function tool call generated by the model.
A JSON string of the output of the function tool call.
statusenum
required
The status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.
Values
| in_progress | - |
| completed | - |
| incomplete | - |
FunctionCallOutputItemParam
The output of a function tool call.
The unique ID of the function tool call output. Populated when this item is returned via API.
call_idstring
required
The unique ID of the function tool call generated by the model.
type
"function_call_output"
required
The type of the function tool call output. Always function_call_output.
Text, image, or file output of the function tool call.
The status of the item. One of in_progress, completed, or incomplete. Populated when items are returned via API.
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
IncompleteDetails
Details about why the response was incomplete.
reasonstring
required
The reason the response could not be completed.
InputFileContent
A file input to the model.
type
"input_file"
required
The type of the input item. Always input_file.
filenamestring
The name of the file to be sent to the model.
file_urlstring
The URL of the file to be sent to the model.
InputFileContentParam
A file input to the model.
type
"input_file"
required
The type of the input item. Always input_file.
The name of the file to be sent to the model.
The base64-encoded data of the file to be sent to the model.
The URL of the file to be sent to the model.
InputImageContent
An image input to the model. Learn about
image inputs.
type
"input_image"
required
The type of the input item. Always input_image.
The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
detailenum
required
The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
Values
| low | Restricts the model to a lower-resolution version of the image. |
| high | Allows the model to "see" a higher-resolution version of the image, usually increasing input token costs. |
| auto | Choose the detail level automatically. |
InputImageContentParamAutoParam
An image input to the model. Learn about
image inputstype
"input_image"
required
The type of the input item. Always input_image.
The URL of the image to be sent to the model. A fully qualified URL or base64 encoded image in a data URL.
The detail level of the image to be sent to the model. One of high, low, or auto. Defaults to auto.
Values
| low | Restricts the model to a lower-resolution version of the image. |
| high | Allows the model to "see" a higher-resolution version of the image, usually increasing input token costs. |
| auto | Choose the detail level automatically. |
InputTextContent
A text input to the model.
type
"input_text"
required
The type of the input item. Always input_text.
textstring
required
The text input to the model.
InputTextContentParam
A text input to the model.
type
"input_text"
required
The type of the input item. Always input_text.
textstring
required
The text input to the model.
InputVideoContent
A content block representing a video input to the model.
type
"input_video"
required
The type of the input content. Always input_video.
video_urlstring
required
A base64 or remote url that resolves to a video file.
ItemReferenceParam
An internal identifier for an item to reference.
type
"item_reference"
The type of item to reference. Always item_reference.
idstring
required
The ID of the item to reference.
JsonSchemaResponseFormatParam
LogProb
The log probability of a token.
top_logprobsTopLogProb[]
required
Message
A message to or from the model.
type
"message"
required
The type of the message. Always set to message.
idstring
required
The unique ID of the message.
statusenum
required
The status of item. One of in_progress, completed, or incomplete. Populated when items are returned via API.
Values
| in_progress | Model is currently sampling this item. |
| completed | Model has finished sampling this item. |
| incomplete | Model was interrupted from sampling this item partway through. This can occur, for example, if the model encounters a stop token or exhausts its output_token budget. |
roleenum
required
The role of the message. One of unknown, user, assistant, system, critic, discriminator, developer, or tool.
Values
| user | End‑user input in the conversation. |
| assistant | Model-generated content in the conversation. |
| system | System-level instructions that set global behavior. |
| developer | Developer-supplied guidance that shapes the assistant’s behavior. |
contentInputTextContent[]
required
The content of the message
phaseenum
Labels an assistant message as intermediate commentary (commentary) or the final answer (final_answer). For models like gpt-5.3-codex and beyond, when sending follow-up requests, preserve and resend phase on all assistant messages. Omitting it can degrade performance. Not used for user messages.
OutputTextContent
A text output from the model.
type
"output_text"
required
The type of the output text. Always output_text.
textstring
required
The text output from the model.
annotationsAnnotation[]
required
The annotations of the text output.
OutputTextContentParam
type
"output_text"
required
The content type. Always output_text.
textstring
required
The text content.
Citations associated with the text content.
OutputTokensDetails
A breakdown of output token usage that was recorded.
reasoning_tokensinteger
required
The number of output tokens that were attributed to reasoning.
Reasoning
Reasoning configuration and metadata that were used for the response.
The reasoning effort that was requested for the model, if specified.
Values
| none | Restrict the model from performing any reasoning before emitting a final answer. |
| low | Use a lower reasoning effort for faster responses. |
| medium | Use a balanced reasoning effort. |
| high | Use a higher reasoning effort to improve answer quality. |
| xhigh | Use the maximum reasoning effort available. |
A model-generated summary of its reasoning that was produced, if available.
Values
| concise | Emit concise summaries of reasoning content. |
| detailed | Emit details summaries of reasoning content. |
| auto | Allow the model to decide when to summarize. |
ReasoningBody
A reasoning item that was generated by the model.
type
"reasoning"
required
The type of the item. Always reasoning.
idstring
required
The unique ID of the reasoning item.
contentInputTextContent[]
The reasoning content that was generated.
summaryInputTextContent[]
required
The reasoning summary content that was generated.
encrypted_contentstring
The encrypted reasoning content that was generated.
ReasoningItemParam
The unique ID of this reasoning item.
type
"reasoning"
required
The item type. Always reasoning.
summaryReasoningSummaryContentParam[]
required
Reasoning summary content associated with this item.
An encrypted representation of the reasoning content.
ReasoningParam
gpt-5 and o-series models only Configuration options for
reasoning models.
Controls the level of reasoning effort the model should apply. Higher effort may increase latency and cost.
Values
| none | Restrict the model from performing any reasoning before emitting a final answer. |
| low | Use a lower reasoning effort for faster responses. |
| medium | Use a balanced reasoning effort. |
| high | Use a higher reasoning effort to improve answer quality. |
| xhigh | Use the maximum reasoning effort available. |
Controls whether the response includes a reasoning summary.
Values
| concise | Emit concise summaries of reasoning content. |
| detailed | Emit details summaries of reasoning content. |
| auto | Allow the model to decide when to summarize. |
ReasoningSummaryContentParam
type
"summary_text"
required
The content type. Always summary_text.
textstring
required
The reasoning summary text.
ReasoningTextContent
Reasoning text from the model.
type
"reasoning_text"
required
The type of the reasoning text. Always reasoning_text.
textstring
required
The reasoning text from the model.
RefusalContent
A refusal from the model.
type
"refusal"
required
The type of the refusal. Always refusal.
refusalstring
required
The refusal explanation from the model.
RefusalContentParam
type
"refusal"
required
The content type. Always refusal.
refusalstring
required
The refusal text.
SpecificFunctionParam
type
"function"
required
The tool to call. Always function.
namestring
required
The name of the function tool to call.
StreamOptionsParam
Options that control streamed response behavior.
include_obfuscationboolean
Whether to obfuscate sensitive information in streamed output. Defaults to true.
SummaryTextContent
A summary text from the model.
type
"summary_text"
required
The type of the object. Always summary_text.
textstring
required
A summary of the reasoning output from the model so far.
SystemMessageItemParam
The unique ID of this message item.
type
"message"
required
The item type. Always message.
role
"system"
required
The message role. Always system.
The message content, as an array of content parts.
The status of the message item.
TextField
verbosityenum
Values
| low | Instruct the model to emit less verbose final responses. |
| medium | Use the model's default verbosity setting. |
| high | Instruct the model to emit more verbose final responses. |
TextParam
The format configuration for text output.
verbosityenum
Controls the level of detail in generated text output.
Values
| low | Instruct the model to emit less verbose final responses. |
| medium | Use the model's default verbosity setting. |
| high | Instruct the model to emit more verbose final responses. |
TopLogProb
The top log probability of a token.
UrlCitationBody
A citation for a web resource used to generate a model response.
type
"url_citation"
required
The type of the URL citation. Always url_citation.
urlstring
required
The URL of the web resource.
start_indexinteger
required
The index of the first character of the URL citation in the message.
end_indexinteger
required
The index of the last character of the URL citation in the message.
titlestring
required
The title of the web resource.
UrlCitationParam
type
"url_citation"
required
The citation type. Always url_citation.
start_indexinteger
required
The index of the first character of the citation in the message.
end_indexinteger
required
The index of the last character of the citation in the message.
urlstring
required
The URL of the cited resource.
titlestring
required
The title of the cited resource.
Usage
Token usage statistics that were recorded for the response.
input_tokensinteger
required
The number of input tokens that were used to generate the response.
output_tokensinteger
required
The number of output tokens that were generated by the model.
total_tokensinteger
required
The total number of tokens that were used.
input_tokens_detailsInputTokensDetails
required
A breakdown of input token usage that was recorded.
output_tokens_detailsOutputTokensDetails
required
A breakdown of output token usage that was recorded.
UserMessageItemParam
The unique ID of this message item.
type
"message"
required
The item type. Always message.
role
"user"
required
The message role. Always user.
The message content, as an array of content parts.
The status of the message item.