/responses [Beta]

LiteLLM provides a BETA endpoint that follows the spec of OpenAI's /responses API.

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Load Balancing | ✅ | Works between supported models |
| Supported operations | Create a response, Get a response, Delete a response | |
| Supported LiteLLM versions | 1.63.8+ | |
| Supported LLM providers | All LiteLLM supported providers | openai, anthropic, bedrock, vertex_ai, gemini, azure, azure_ai, etc. |

Usage

LiteLLM Python SDK
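
LiteLLM reads provider credentials from environment variables, so set the relevant key before running the examples below. A minimal setup, assuming an OpenAI key (other providers use their own variables):

Set Provider Credentials
import os

# LiteLLM picks up provider API keys from the environment.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; use your real key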

Non-streaming

OpenAI Non-streaming Response
import litellm

# Non-streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)

Streaming

OpenAI Streaming Response
import litellm

# Streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
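
For async applications, LiteLLM also exposes an async variant, aresponses. A minimal sketch of async streaming, assuming you run it under an event loop (e.g. via asyncio.run); the model and prompt mirror the example above:

Async Streaming Response
import asyncio
import litellm

async def main():
    # Async streaming response
    response = await litellm.aresponses(
        model="openai/o1-pro",
        input="Tell me a three sentence bedtime story about a unicorn.",
        stream=True
    )
    async for event in response:
        print(event)

asyncio.run(main())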

GET a Response

Get Response by ID
import litellm

# First, create a response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

# Get the response ID
response_id = response.id

# Retrieve the response by ID
retrieved_response = litellm.get_responses(
    response_id=response_id
)

print(retrieved_response)

# For async usage
# retrieved_response = await litellm.aget_responses(response_id=response_id)

DELETE a Response

Delete Response by ID
import litellm

# First, create a response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

# Get the response ID
response_id = response.id

# Delete the response by ID
delete_response = litellm.delete_responses(
    response_id=response_id
)

print(delete_response)

# For async usage
# delete_response = await litellm.adelete_responses(response_id=response_id)

LiteLLM Proxy with OpenAI SDK

First, add this to your LiteLLM proxy config.yaml:

OpenAI Proxy Configuration
model_list:
  - model_name: openai/o1-pro
    litellm_params:
      model: openai/o1-pro
      api_key: os.environ/OPENAI_API_KEY

Then start your LiteLLM proxy server:

Start LiteLLM Proxy Server
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Non-streaming

OpenAI Proxy Non-streaming Response
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)

Streaming

OpenAI Proxy Streaming Response
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
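
Each streamed item is a typed event rather than a plain text chunk. A short sketch of printing only the generated text, assuming the stream follows OpenAI's event shape where response.output_text.delta events carry text in a delta field:

Print Only Text Deltas
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

stream = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in stream:
    # Other event types (response.created, response.completed, ...) mark lifecycle stages.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)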

GET a Response

Get Response by ID with OpenAI SDK
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# First, create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

# Get the response ID
response_id = response.id

# Retrieve the response by ID
retrieved_response = client.responses.retrieve(response_id)

print(retrieved_response)

DELETE a Response

Delete Response by ID with OpenAI SDK
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# First, create a response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

# Get the response ID
response_id = response.id

# Delete the response by ID
delete_response = client.responses.delete(response_id)

print(delete_response)
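
Retrieving or deleting an ID that does not exist (or was already deleted) fails with a 404. A minimal sketch of handling this, assuming the OpenAI SDK's standard NotFoundError is raised when the proxy forwards the error:

Handle a Missing Response ID
from openai import OpenAI, NotFoundError

client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

try:
    client.responses.retrieve("resp_nonexistent")  # hypothetical ID
except NotFoundError as e:
    # Unknown or already-deleted response IDs surface as 404s.
    print(f"Response not found: {e}")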

Supported Responses API Parameters

| Provider | Supported Parameters |
|---|---|
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See supported parameters here |
| bedrock | See supported parameters here |
| gemini | See supported parameters here |
| vertex_ai | See supported parameters here |
| azure_ai | See supported parameters here |
| All other LLM API providers | See supported parameters here |

Load Balancing with Session Continuity

When using the Responses API with multiple deployments of the same model (e.g., multiple Azure OpenAI endpoints), LiteLLM provides session continuity. This ensures that follow-up requests using a previous_response_id are routed to the same deployment that generated the original response.

Example Usage

Python SDK with Session Continuity
import litellm

# Set up router with multiple deployments of the same model
router = litellm.Router(
    model_list=[
        {
            "model_name": "azure-gpt4-turbo",
            "litellm_params": {
                "model": "azure/gpt-4-turbo",
                "api_key": "your-api-key-1",
                "api_version": "2024-06-01",
                "api_base": "https://endpoint1.openai.azure.com",
            },
        },
        {
            "model_name": "azure-gpt4-turbo",
            "litellm_params": {
                "model": "azure/gpt-4-turbo",
                "api_key": "your-api-key-2",
                "api_version": "2024-06-01",
                "api_base": "https://endpoint2.openai.azure.com",
            },
        },
    ],
    optional_pre_call_checks=["responses_api_deployment_check"],
)

# Initial request
response = await router.aresponses(
    model="azure-gpt4-turbo",
    input="Hello, who are you?",
    truncation="auto",
)

# Store the response ID
response_id = response.id

# Follow-up request - will be automatically routed to the same deployment
follow_up = await router.aresponses(
    model="azure-gpt4-turbo",
    input="Tell me more about yourself",
    truncation="auto",
    previous_response_id=response_id  # This ensures routing to the same deployment
)
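
If you run the same two deployments behind the LiteLLM proxy instead of the Python Router, the equivalent check can be enabled in config.yaml. A sketch, assuming router_settings accepts the same optional_pre_call_checks list; both deployments share one model_name:

Proxy Configuration with Session Continuity
model_list:
  - model_name: azure-gpt4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_key: your-api-key-1
      api_version: "2024-06-01"
      api_base: https://endpoint1.openai.azure.com
  - model_name: azure-gpt4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_key: your-api-key-2
      api_version: "2024-06-01"
      api_base: https://endpoint2.openai.azure.com

router_settings:
  optional_pre_call_checks: ["responses_api_deployment_check"]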

Session Management - Non-OpenAI Models

LiteLLM Proxy supports session management for non-OpenAI models. This allows you to store and fetch conversation history (state) in LiteLLM Proxy.

Usage

  1. Enable storing request / response content in the database

Set store_prompts_in_spend_logs: true in your proxy config.yaml. When this is enabled, LiteLLM will store the request and response content in the database.

general_settings:
  store_prompts_in_spend_logs: true

  2. Make request 1 with no previous_response_id (new session)

Start a new conversation by making a request without specifying a previous response ID.

curl http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "who is Michael Jordan"
  }'

Response:

{
  "id": "resp_123abc",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "Michael Jordan is widely considered one of the greatest basketball players of all time. He played for the Chicago Bulls (1984-1993, 1995-1998) and Washington Wizards (2001-2003), winning 6 NBA Championships with the Bulls."
    }]
  }]
}

  3. Make request 2 with previous_response_id (same session)

Continue the conversation by referencing the previous response ID to maintain conversation context.

curl http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him",
    "previous_response_id": "resp_123abc"
  }'

Response:

{
  "id": "resp_456def",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "Michael Jordan was born February 17, 1963. He attended University of North Carolina before being drafted 3rd overall by the Bulls in 1984. Beyond basketball, he built the Air Jordan brand with Nike and later became owner of the Charlotte Hornets."
    }]
  }]
}

  4. Make request 3 with no previous_response_id (new session)

Start a brand new conversation without referencing previous context to demonstrate how context is not maintained between sessions.

curl http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet-latest",
    "input": "can you tell me more about him"
  }'

Response:

{
  "id": "resp_789ghi",
  "model": "claude-3-5-sonnet-20241022",
  "output": [{
    "type": "message",
    "content": [{
      "type": "output_text",
      "text": "I don't see who you're referring to in our conversation. Could you let me know which person you'd like to learn more about?"
    }]
  }]
}
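
The same session flow works from Python by pointing the OpenAI SDK at the proxy. A minimal sketch chaining two requests, using the same model and proxy key as the cURL examples above:

Session Continuation with the OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="sk-1234"                  # Your proxy API key
)

# Request 1: no previous_response_id, so this starts a new session
first = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="who is Michael Jordan"
)

# Request 2: same session; the proxy loads the stored history for context
follow_up = client.responses.create(
    model="anthropic/claude-3-5-sonnet-latest",
    input="can you tell me more about him",
    previous_response_id=first.id
)

print(follow_up)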