
3 posts tagged with "responses_api"


Krrish Dholakia
Ishaan Jaffer

Deploy this version​

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.67.4-stable

Key Highlights​

  • Improved User Management: This release enables search and filtering across users, keys, teams, and models.
  • Responses API Load Balancing: Route requests across provider regions and ensure session continuity.
  • UI Session Logs: Group several requests to LiteLLM into a session.

Improved User Management​


This release makes it easier to manage users and keys on LiteLLM. You can now search and filter across users, keys, teams, and models, and control user settings more easily.

New features include:

  • Search for users by email, ID, role, or team.
  • See all of a user's models, teams, and keys in one place.
  • Change user roles and model access right from the Users Tab.

These changes help you spend less time on user setup and management on LiteLLM.

Responses API Load Balancing​


This release introduces load balancing for the Responses API, allowing you to route requests across provider regions and ensure session continuity. It works as follows (see the sketch after this list):

  • If a previous_response_id is provided, LiteLLM routes the request to the original deployment that generated the prior response, ensuring session continuity.
  • If no previous_response_id is provided, LiteLLM load-balances requests across your available deployments.
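
Here's a minimal client-side sketch using the OpenAI Python SDK pointed at a LiteLLM proxy; the base URL, key, and model name are placeholders, not values from this release:

from openai import OpenAI

# Placeholders: your proxy address and a LiteLLM virtual key.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# No previous_response_id: LiteLLM load-balances across deployments.
first = client.responses.create(
    model="gpt-4o",
    input="Remember this number: 42.",
)

# With previous_response_id: LiteLLM routes to the deployment that
# generated the prior response, preserving session continuity.
follow_up = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="What number did I ask you to remember?",
)
print(follow_up.output_text)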

Read more

UI Session Logs​


This release allows you to group requests to the LiteLLM proxy into a session. If you specify a litellm_session_id in your request, LiteLLM automatically groups all logs with the same session id, so you can easily track usage and request content per session.
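
A minimal sketch, assuming the session id is passed in the request body via the OpenAI SDK's extra_body; the proxy URL and key are placeholders:

import openai

client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

session_id = "session-abc-123"  # any stable identifier you choose

# Both requests carry the same litellm_session_id, so their logs are
# grouped into one session.
for question in ["What is LiteLLM?", "How do I deploy it?"]:
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
        extra_body={"litellm_session_id": session_id},
    )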

Read more

New Models / Updated Models​

  • OpenAI
    1. Added gpt-image-1 cost tracking Get Started
    2. Bug fix: added cost tracking for gpt-image-1 when quality is unspecified PR
  • Azure
    1. Fixed timestamp granularities passing to whisper in Azure Get Started
    2. Added azure/gpt-image-1 pricing Get Started, PR
    3. Added cost tracking for azure/computer-use-preview, azure/gpt-4o-audio-preview-2024-12-17, azure/gpt-4o-mini-audio-preview-2024-12-17 PR
  • Bedrock
    1. Added support for all compatible Bedrock parameters when model="arn:.." (Bedrock application inference profile models) Get started, PR
    2. Fixed wrong system prompt transformation PR
  • VertexAI / Google AI Studio
    1. Allow setting budget_tokens=0 for gemini-2.5-flash Get Started, PR
    2. Ensure returned usage includes thinking token usage PR
    3. Added cost tracking for gemini-2.5-pro-preview-03-25 PR
  • Cohere
    1. Added support for cohere command-a-03-2025 Get Started, PR
  • SageMaker
    1. Added support for max_completion_tokens parameter Get Started, PR
  • Responses API
    1. Added support for GET and DELETE operations - /v1/responses/{response_id} Get Started (see the sketch after this list)
    2. Added session management support for non-OpenAI models PR
    3. Added routing affinity to maintain model consistency within sessions Get Started, PR
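
As referenced above, a sketch of the new GET and DELETE operations through the OpenAI SDK against a LiteLLM proxy; URL, key, and model are placeholders:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

resp = client.responses.create(model="gpt-4o", input="Hello!")

# GET /v1/responses/{response_id}
fetched = client.responses.retrieve(resp.id)
print(fetched.status)

# DELETE /v1/responses/{response_id}
client.responses.delete(resp.id)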

Spend Tracking Improvements​

  • Bug Fix: Fixed spend tracking bug, ensuring default litellm params aren't modified in memory PR
  • Deprecation Dates: Added deprecation dates for Azure, VertexAI models PR

Management Endpoints / UI​

Users​

  • Filtering and Searching:

    • Filter users by user_id, role, team, sso_id
    • Search users by email

  • User Info Panel: Added a new user information pane PR

    • View teams, keys, models associated with User
    • Edit user role, model permissions

Teams​

  • Filtering and Searching:

    • Filter teams by Organization, Team ID PR
    • Search teams by Team Name PR

Keys​

  • Key Management:
    • Support for cross-filtering and filtering by key hash PR
    • Fixed key alias reset when resetting filters PR
    • Fixed table rendering on key creation PR

UI Logs Page​

  • Session Logs: View request logs grouped by session (see UI Session Logs above)

UI Authentication & Security​

  • Required Authentication: Authentication now required for all dashboard pages PR
  • SSO Fixes: Fixed SSO user login invalid token error PR
  • [BETA] Encrypted Tokens: Moved UI to encrypted token usage PR
  • Token Expiry: Support token refresh by re-routing to login page (fixes issue where expired token would show a blank page) PR

UI General fixes​

  • Fixed UI Flicker: Addressed UI flickering issues in Dashboard PR
  • Improved Terminology: Better loading and no-data states on Keys and Tools pages PR
  • Azure Model Support: Fixed editing Azure public model names and changing model names after creation PR
  • Team Model Selector: Bug fix for team model selection PR

Logging / Guardrail Integrations​

  • Datadog:
    1. Fixed Datadog LLM observability logging Get Started, PR
  • Prometheus / Grafana:
    1. Enable datasource selection on LiteLLM Grafana Template Get Started, PR
  • AgentOps:
    1. Added AgentOps Integration Get Started, PR
  • Arize:
    1. Added missing attributes for Arize & Phoenix Integration Get Started, PR

General Proxy Improvements​

  • Caching: Fixed caching to account for thinking or reasoning_effort when calculating cache key PR
  • Model Groups: Fixed handling for cases where user sets model_group inside model_info PR
  • Passthrough Endpoints: Ensured PassthroughStandardLoggingPayload is logged with method, URL, request/response body PR
  • Fix SQL Injection: Fixed potential SQL injection vulnerability in spend_management_endpoints.py PR

Helm​

  • Fixed serviceAccountName on migration job PR

Full Changelog​

The complete list of changes can be found in the GitHub release notes.

Krrish Dholakia
Ishaan Jaffer

These are the changes since v1.63.11-stable.

This release brings:

  • LLM Translation Improvements (MCP Support and Bedrock Application Profiles)
  • Perf improvements for Usage-based Routing
  • Streaming guardrail support via websockets
  • Azure OpenAI client perf fix (from previous release)

Docker Run LiteLLM Proxy​

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.63.14-stable.patch1

Demo Instance​

Here's a Demo Instance to test changes:

New Models / Updated Models​

  • Azure gpt-4o - fixed pricing to latest global pricing - PR
  • O1-Pro - add pricing + model information - PR
  • Azure AI - mistral 3.1 small pricing added - PR
  • Azure - gpt-4.5-preview pricing added - PR

LLM Translation​

  1. New LLM Features
  • Bedrock: Support bedrock application inference profiles Docs (see the sketch after this list)
    • Infer AWS region from the Bedrock application profile id - (arn:aws:bedrock:us-east-1:...)
  • Ollama - support calling via /v1/completions Get Started
  • Bedrock - support us.deepseek.r1-v1:0 model name Docs
  • OpenRouter - OPENROUTER_API_BASE env var support Docs
  • Azure - add audio model parameter support - Docs
  • OpenAI - PDF File support Docs
  • OpenAI - o1-pro Responses API streaming support Docs
  • [BETA] MCP - Use MCP Tools with LiteLLM SDK Docs
  2. Bug Fixes
  • Voyage: prompt token on embedding tracking fix - PR
  • Sagemaker - Fix 'Too little data for declared Content-Length' error - PR
  • OpenAI-compatible models - fix issue when calling openai-compatible models w/ custom_llm_provider set - PR
  • VertexAI - Embedding 'outputDimensionality' support - PR
  • Anthropic - return consistent json response format on streaming/non-streaming - PR
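
For the Bedrock application inference profile support above, a minimal sketch with the LiteLLM SDK; the ARN is made up, and AWS credentials are assumed to be configured in the environment:

import litellm

# LiteLLM infers the AWS region (us-east-1) from the profile ARN itself.
response = litellm.completion(
    model="bedrock/arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/example-profile-id",
    messages=[{"role": "user", "content": "Hello from Bedrock!"}],
)
print(response.choices[0].message.content)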

Spend Tracking Improvements​

  • litellm_proxy/ - support reading the LiteLLM response cost header from the proxy when using the client SDK (see the sketch after this list)
  • Reset Budget Job - fix budget reset error on keys/teams/users PR
  • Streaming - Prevents final chunk w/ usage from being ignored (impacted bedrock streaming + cost tracking) PR
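
A sketch of reading the response cost header client-side; the header name x-litellm-response-cost is an assumption here, and the URL/key are placeholders:

import openai

client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# with_raw_response exposes HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
)
print("cost:", raw.headers.get("x-litellm-response-cost"))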

UI​

  1. Users Page
    • Feature: Control default internal user settings PR
  2. Icons:
    • Feature: Replace external "artificialanalysis.ai" icons with local SVGs PR
  3. Sign In/Sign Out
    • Fix: Default login when default_user_id user does not exist in DB PR

Logging Integrations​

  • Support post-call guardrails for streaming responses Get Started
  • Arize Get Started
    • fix invalid package import PR
    • migrate to using standardloggingpayload for metadata, ensures spans land successfully PR
    • fix logging to just log the LLM I/O PR
    • Dynamic API Key/Space param support Get Started
  • StandardLoggingPayload - Log litellm_model_name in payload. Allows knowing what the model sent to API provider was Get Started
  • Prompt Management - Allow building custom prompt management integration Get Started

Performance / Reliability improvements​

  • Redis Caching - add 5s default timeout, prevents a hanging Redis connection from impacting LLM calls PR
  • Allow disabling all spend updates / writes to DB - patch to allow disabling all spend updates to DB with a flag PR
  • Azure OpenAI - correctly re-use azure openai client, fixes perf issue from previous Stable release PR
  • Azure OpenAI - uses litellm.ssl_verify on Azure/OpenAI clients PR
  • Usage-based routing - Wildcard model support Get Started (see the sketch after this list)
  • Usage-based routing - Support batch writing increments to redis - reduces latency to the same as 'simple-shuffle' PR
  • Router - show reason for model cooldown on 'no healthy deployments available' error PR
  • Caching - add max value limit for an item in the in-memory cache (1MB) - prevents OOM errors when large image URLs are sent through the proxy PR
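
For the wildcard model support above, a sketch of a Router using usage-based routing; the model names and API key are placeholders:

from litellm import Router

router = Router(
    model_list=[
        {
            # Wildcard group: matches any openai/<model> request.
            "model_name": "openai/*",
            "litellm_params": {"model": "openai/*", "api_key": "sk-..."},
        },
    ],
    routing_strategy="usage-based-routing-v2",
)

response = router.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
)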

General Improvements​

  • Passthrough Endpoints - support returning api-base on pass-through endpoints Response Headers Docs
  • SSL - support reading ssl security level from env var - Allows user to specify lower security settings Get Started
  • Credentials - only poll Credentials table when STORE_MODEL_IN_DB is True PR
  • Image URL Handling - new architecture doc on image url handling Docs
  • OpenAI - bump to pip install "openai==1.68.2" PR
  • Gunicorn - security fix - bump gunicorn==23.0.0 PR

Complete Git Diff​

Here's the complete git diff

Krrish Dholakia
Ishaan Jaffer

These are the changes since v1.63.2-stable.

This release is primarily focused on:

  • [Beta] Responses API Support
  • Snowflake Cortex Support, Amazon Nova Image Generation
  • UI - Credential Management, re-use credentials when adding new models
  • UI - Test Connection to LLM Provider before adding a model

Known Issues​

  • 🚨 Known issue on Azure OpenAI - We don't recommend upgrading if you use Azure OpenAI. This version failed our Azure OpenAI load test

Docker Run LiteLLM Proxy​

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.63.11-stable

Demo Instance​

Here's a Demo Instance to test changes:

New Models / Updated Models​

  • Image Generation support for Amazon Nova Canvas Getting Started (see the sketch after this list)
  • Add pricing for Jamba new models PR
  • Add pricing for Amazon EU models PR
  • Add Bedrock Deepseek R1 model pricing PR
  • Update Gemini pricing: Gemma 3, Flash 2 thinking update, LearnLM PR
  • Mark Cohere Embedding 3 models as Multimodal PR
  • Add Azure Data Zone pricing PR
    • LiteLLM Tracks cost for azure/eu and azure/us models
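
For the Amazon Nova Canvas support above, a minimal sketch with the LiteLLM SDK; AWS credentials are assumed to be configured in the environment:

import litellm

response = litellm.image_generation(
    model="bedrock/amazon.nova-canvas-v1:0",
    prompt="A watercolor painting of a lighthouse at dusk",
)
# The generated image is returned base64-encoded.
print(len(response.data[0].b64_json or ""))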

LLM Translation​

  1. New Endpoints
  2. New LLM Providers
  3. New LLM Features
  4. Bug Fixes
  • OpenAI: Return code, param and type on bad request error More information on litellm exceptions
  • Bedrock: Fix converse chunk parsing to only return empty dict on tool use PR
  • Bedrock: Support extra_headers PR
  • Azure: Fix Function Calling Bug & Update Default API Version to 2025-02-01-preview PR
  • Azure: Fix AI services URL PR
  • Vertex AI: Handle HTTP 201 status code in response PR
  • Perplexity: Fix incorrect streaming response PR
  • Triton: Fix streaming completions bug PR
  • Deepgram: Support bytes.IO when handling audio files for transcription PR
  • Ollama: Fixed the "system" role has become unacceptable error PR
  • All Providers (Streaming): Fixed the string "data:" being stripped from content in streamed responses PR

Spend Tracking Improvements​

  1. Support Bedrock converse cache token tracking Getting Started
  2. Cost Tracking for Responses API Getting Started
  3. Fix Azure Whisper cost tracking Getting Started

UI​

Re-Use Credentials on UI​

You can now onboard LLM provider credentials on the LiteLLM UI. Once these credentials are added, you can re-use them when adding new models Getting Started

Test Connections before adding models​

Before adding a model, you can test the connection to the LLM provider to verify you have set up your API Base + API Key correctly

General UI Improvements​

  1. Add Models Page
    • Allow adding Cerebras, Sambanova, Perplexity, Fireworks, OpenRouter, TogetherAI, and Text-Completion OpenAI models on the Admin UI
    • Allow adding EU OpenAI models
    • Fix: Instantly show edit + deletes to models
  2. Keys Page
    • Fix: Instantly show newly created keys on Admin UI (don't require refresh)
    • Fix: Allow clicking into Top Keys when showing users Top API Key
    • Fix: Allow Filter Keys by Team Alias, Key Alias and Org
    • UI Improvements: Show 100 Keys Per Page, Use full height, increase width of key alias
  3. Users Page
    • Fix: Show correct count of internal user keys on Users Page
    • Fix: Metadata not updating in Team UI
  4. Logs Page
    • UI Improvements: Keep expanded log in focus on LiteLLM UI
    • UI Improvements: Minor improvements to logs page
    • Fix: Allow internal user to query their own logs
    • Allow switching off storing Error Logs in DB Getting Started
  5. Sign In/Sign Out

Security​

  1. Support for Rotating Master Keys Getting Started
  2. Fix: Internal User Viewer Permissions, don't allow internal_user_viewer role to see Test Key Page or Create Key Button More information on role based access controls
  3. Emit audit logs on All user + model Create/Update/Delete endpoints Getting Started
  4. JWT
    • Support multiple JWT OIDC providers Getting Started
    • Fix JWT access with Groups not working when team is assigned All Proxy Models access
  5. Using K/V pairs in 1 AWS Secret Getting Started

Logging Integrations​

  1. Prometheus: Track Azure LLM API latency metric Getting Started
  2. Athina: Added tags, user_feedback and model_options to additional_keys which can be sent to Athina Getting Started

Performance / Reliability improvements​

  1. Redis + litellm router - Fix Redis cluster mode for litellm router PR

General Improvements​

  1. OpenWebUI Integration - display thinking tokens

Complete Git Diff​

Here's the complete git diff