DAY 0 Support: Gemini 3 Flash on LiteLLM
LiteLLM now supports `gemini-3-flash-preview` and all of the new API changes that ship with it.
Deploy this version​
Docker

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

Pip

```shell
pip install litellm==1.80.8.post1
```
What's New​
1. New Thinking Levels: thinkingLevel with MINIMAL & MEDIUM​
Gemini 3 Flash introduces granular thinking control with thinkingLevel instead of thinkingBudget.
- MINIMAL: Ultra-lightweight thinking for fast responses
- LOW: Lightweight thinking for simple instruction following
- MEDIUM: Balanced thinking for complex reasoning
- HIGH: Maximum reasoning depth
LiteLLM automatically maps the OpenAI reasoning_effort parameter to Gemini's thinkingLevel, so you can use familiar reasoning_effort values (minimal, low, medium, high) without changing your code!
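Under the hood, the mapping lands in Gemini's thinkingConfig. As a rough illustration (field names follow Google's published Gemini 3 request format; this is a sketch, not LiteLLM's exact internal payload), reasoning_effort="medium" translates to something like:

```python
# Illustrative only: approximate Gemini request body produced for
# reasoning_effort="medium". LiteLLM's exact payload may differ.
gemini_request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Solve this complex math problem: 25 * 4 + 10"}]}
    ],
    "generationConfig": {
        "thinkingConfig": {
            "thinkingLevel": "medium"  # mapped from reasoning_effort="medium"
        }
    },
}
```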
2. Thought Signatures​
Like gemini-3-pro, this model also includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.
Edge Case Handling: If thought signatures are missing from the request, LiteLLM adds a dummy signature, ensuring the API call doesn't break.
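To see where this matters in practice, here is a minimal multi-turn tool-calling sketch. The get_weather tool and its stubbed output are hypothetical; nothing signature-specific appears in user code, since LiteLLM extracts and re-embeds the signatures when the assistant message is passed back.

```python
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# First turn: the model decides to call the tool. Any thought signature on the
# tool call is captured by LiteLLM inside the returned message object.
first = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="medium",
)
tool_call = first.choices[0].message.tool_calls[0]

# Append the assistant message as-is so the signature travels with it,
# then append the tool result and continue the conversation.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": '{"temp_c": 18, "conditions": "cloudy"}',  # stubbed tool output
})

second = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="medium",
)
print(second.choices[0].message.content)
```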
Supported Endpoints​
LiteLLM provides full end-to-end support for Gemini 3 Flash on:
- ✅ /v1/chat/completions - OpenAI-compatible chat completions endpoint
- ✅ /v1/responses - OpenAI Responses API endpoint (streaming and non-streaming)
- ✅ /v1/messages - Anthropic-compatible messages endpoint
- ✅ /v1/generateContent - Google Gemini API compatible endpoint

All endpoints support:
- Streaming and non-streaming responses
- Function calling with thought signatures
- Multi-turn conversations
- All Gemini 3-specific features
- Conversion of provider-specific thinking params to thinkingLevel
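As an example of one of the alternate endpoints, here is a sketch of calling /v1/responses through the proxy with the official OpenAI SDK. It assumes a proxy running on localhost:4000 with the gemini-3-flash model configured as in the Quick Start below; whether the reasoning field is forwarded exactly as shown depends on your LiteLLM version.

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy (assumed to be running
# locally with a `gemini-3-flash` model configured as in the Quick Start below).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-<YOUR-LITELLM-KEY>")

response = client.responses.create(
    model="gemini-3-flash",
    input="Summarize the trade-offs between MINIMAL and HIGH thinking levels.",
    reasoning={"effort": "medium"},  # forwarded to Gemini's thinkingLevel
)
print(response.output_text)
```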
Quick Start​
SDK
Basic Usage with MEDIUM thinking (NEW)
```python
from litellm import completion

# No code changes needed - LiteLLM maps the OpenAI reasoning_effort param to thinkingLevel
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)

print(response.choices[0].message.content)
```
PROXY

1. Setup config.yaml
```yaml
model_list:
  - model_name: gemini-3-flash
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
```
2. Start proxy
```shell
litellm --config /path/to/config.yaml
```
3. Call with MEDIUM thinking
```shell
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Complex reasoning task"}],
    "reasoning_effort": "medium"
  }'
```
---
All reasoning_effort Levels
**MINIMAL**: Ultra-fast, minimal reasoning

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    reasoning_effort="minimal",
)
```

**LOW**: Simple instruction following

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    reasoning_effort="low",
)
```

**MEDIUM (NEW)**: Balanced reasoning for complex tasks ✨

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Analyze this dataset and find patterns"}],
    reasoning_effort="medium",  # NEW!
)
```

**HIGH**: Maximum reasoning depth

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Prove this mathematical theorem"}],
    reasoning_effort="high",
)
```
Key Features​
✅ Thinking Levels: MINIMAL, LOW, MEDIUM, HIGH
✅ Thought Signatures: Track reasoning with unique identifiers
✅ Seamless Integration: Works with existing OpenAI-compatible clients
✅ Backward Compatible: Gemini 2.5 models continue using thinkingBudget
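For instance, here is a sketch of the backward-compatible path: the same reasoning_effort call against a Gemini 2.5 model, where LiteLLM translates the value into a thinkingBudget instead of a thinkingLevel (the specific token budgets LiteLLM picks are not shown here).

```python
from litellm import completion

# Same OpenAI-style parameter, older model: for Gemini 2.5, LiteLLM maps
# reasoning_effort to a thinkingBudget (token budget) rather than thinkingLevel.
response = completion(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Plan a 3-day trip to Kyoto"}],
    reasoning_effort="medium",
)
print(response.choices[0].message.content)
```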
Installation​
```shell
pip install litellm --upgrade
```
```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)

print(response)
```
reasoning_effort Mapping for Gemini 3+​
| reasoning_effort | thinking_level |
|---|---|
| minimal | minimal |
| low | low |
| medium | medium |
| high | high |
| disable | minimal |
| none | minimal |
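Per the table above, "disable" and "none" are not rejected; they map to the lowest thinking level. A quick sketch:

```python
from litellm import completion

# reasoning_effort="disable" (and "none") map to the lowest Gemini 3
# thinking level ("minimal"), per the table above.
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Reply with just the word: pong"}],
    reasoning_effort="disable",
)
print(response.choices[0].message.content)
```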


