Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model,
optimized for low-latency use cases with high-volume, cost-sensitive LLM traffic.
It provides a significant quality increase over
Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite
models, matching Gemini 2.5 Flash performance across key capability areas:
- Improved response quality: Aims to match Gemini 2.5 Flash performance.
- Improved instruction following: Targeted improvements to serve as a reliable
migration path for complex chatbot and instruction-heavy workflows.
- Improved audio input: Better audio-input quality for tasks like automatic speech
recognition (ASR).
- Expanded thinking support: You can control how much reasoning the model
performs by choosing from minimal, low, medium, or high
thinking levels. This feature
lets you balance response quality and speed for your specific use case.
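
As a rough illustration of the thinking levels above, the following Python sketch builds a generation config for a request. It is not taken from the API reference: the helper `build_generation_config` is hypothetical, and the `thinkingConfig` and `thinkingLevel` key names are assumptions to verify before use. The numeric ranges are the ones listed under Parameter defaults on this page.

```python
import json

# Thinking levels named on this page.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def build_generation_config(thinking_level, temperature=1.0, top_p=0.95,
                            candidate_count=1):
    """Return a generationConfig-style dict, rejecting out-of-range values.

    Hypothetical helper. The "thinkingConfig" and "thinkingLevel" key names
    are assumptions; check them against the API reference.
    """
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {THINKING_LEVELS}")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if not 1 <= candidate_count <= 8:
        raise ValueError("candidate_count must be between 1 and 8")
    return {
        "temperature": temperature,
        "topP": top_p,
        "candidateCount": candidate_count,
        "thinkingConfig": {"thinkingLevel": thinking_level},
    }


if __name__ == "__main__":
    # Start with the lowest level for latency-sensitive traffic.
    print(json.dumps(build_generation_config("minimal"), indent=2))
```

A latency-sensitive workload might start at `minimal` and move up a level only if response quality is not sufficient; validating the level on the client side surfaces typos before a request is sent.
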
| Model ID |
gemini-3.1-flash-lite-preview |
| Supported inputs & outputs |
- Inputs: Text, Code, Images, Audio, Video, PDF
- Outputs:
|
| Token limits |
- Maximum input tokens: 1,048,576
- Maximum output tokens: 65,535 (default)
|
| Capabilities |
|
| Consumption options |
|
|
See Consumption options for more information.
|
| Technical specifications |
| Images
|
- Maximum images per prompt: 3,000
- Maximum file size per file for inline data or direct uploads through the console: 7 MB
- Maximum file size per file from Google Cloud Storage: 30 MB
- Maximum number of output images per prompt: 10
- Supported MIME types: image/png, image/jpeg, image/webp, image/heic, image/heif
|
| Documents
|
- Maximum number of files per prompt: 3,000
- Maximum number of pages per file: 1,000
- Maximum file size per file for the API or Cloud Storage imports: 50 MB (application/pdf) or 7 MB (text/plain)
- Maximum file size per file for direct uploads through the console: 7 MB
- Supported MIME types: application/pdf, text/plain
|
| Video
|
- Maximum video length (with audio): Approximately 45 minutes
- Maximum video length (without audio): Approximately 1 hour
- Maximum number of videos per prompt: 10
- Supported MIME types: video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp
|
| Audio
|
- Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
- Maximum number of audio files per prompt: 1
- Supported MIME types: audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
|
| Parameter defaults |
- Temperature: 0.0-2.0 (default 1.0)
- topP: 0.0-1.0 (default 0.95)
- topK: 64 (fixed)
- candidateCount: 1-8 (default 1)
|
| Supported regions |
|
Model availability
|
|
| See Deployments and endpoints for more information. |
| Knowledge cutoff date |
January 2025 |
| Versions |
gemini-3.1-flash-lite-preview
- Launch stage: Public preview
- Release date: March 3, 2026
|
| Supported languages |
See Supported languages. |
| Pricing |
See Pricing. |
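
The per-prompt limits in the technical specifications above can be checked on the client before a request is sent. The following Python sketch is one possible pre-flight check, not part of any SDK: `check_attachments` and its helpers are hypothetical names, "MB" is assumed to mean 2**20 bytes, and the image size cap used is the 7 MB inline-data limit rather than the 30 MB Cloud Storage limit.

```python
from collections import Counter

MB = 1024 * 1024  # assumption: the table's "MB" is treated as 2**20 bytes

# Supported MIME types per modality, copied from the table above.
MIME_TYPES = {
    "image": {"image/png", "image/jpeg", "image/webp", "image/heic",
              "image/heif"},
    "document": {"application/pdf", "text/plain"},
    "video": {"video/x-flv", "video/quicktime", "video/mpeg", "video/mpegs",
              "video/mpg", "video/mp4", "video/webm", "video/wmv",
              "video/3gpp"},
    "audio": {"audio/x-aac", "audio/flac", "audio/mp3", "audio/m4a",
              "audio/mpeg", "audio/mpga", "audio/mp4", "audio/ogg",
              "audio/pcm", "audio/wav", "audio/webm"},
}

# Maximum number of files per prompt for each modality.
MAX_FILES = {"image": 3000, "document": 3000, "video": 10, "audio": 1}


def modality_of(mime_type):
    """Return the modality for a supported MIME type, or None."""
    for modality, types in MIME_TYPES.items():
        if mime_type in types:
            return modality
    return None


def max_bytes_for(mime_type):
    """Return the per-file size cap in bytes, or None if the table gives none."""
    if mime_type in MIME_TYPES["image"]:
        return 7 * MB   # inline-data limit; Cloud Storage allows 30 MB
    if mime_type == "application/pdf":
        return 50 * MB
    if mime_type == "text/plain":
        return 7 * MB
    return None         # video and audio are limited by length, not size


def check_attachments(files):
    """Check (mime_type, size_bytes) pairs; return a list of problem strings."""
    problems = []
    counts = Counter()
    for index, (mime_type, size_bytes) in enumerate(files):
        modality = modality_of(mime_type)
        if modality is None:
            problems.append(f"file {index}: unsupported MIME type {mime_type}")
            continue
        counts[modality] += 1
        cap = max_bytes_for(mime_type)
        if cap is not None and size_bytes > cap:
            problems.append(
                f"file {index}: {size_bytes} bytes exceeds the "
                f"{cap // MB} MB limit for {mime_type}")
    for modality, count in counts.items():
        if count > MAX_FILES[modality]:
            problems.append(
                f"{count} {modality} files exceeds the limit of "
                f"{MAX_FILES[modality]} per prompt")
    return problems
```

For example, `check_attachments([("audio/wav", 1000), ("audio/mp3", 1000)])` returns a single problem, because the table allows only one audio file per prompt. Length-based limits (video duration, audio duration, page count) are not covered here, since checking them requires parsing the files.
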