Google has launched Gemini 3.1 Flash-Lite, a new AI model it describes as its fastest and most cost-efficient in the Gemini 3 series. It is now available in preview for developers and enterprises via Google AI Studio and Vertex AI, priced at $0.25 per million input tokens.
The model is designed for scalable tasks like translation and content moderation, as well as more complex workloads requiring reasoning, such as generating user interfaces. It features "thinking levels" to allow control over computational effort per task.
Early testers praise its efficiency and capability to handle complex inputs. Benchmark results show it outperforms previous models in its tier, achieving high scores on reasoning and multimodal understanding tests.
Google has introduced Gemini 3.1 Flash-Lite, which it says is its "fastest and most cost-efficient Gemini 3 series model."
"Starting today, 3.1 Flash-Lite is rolling out in a preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI," the company said in a blog post.
Priced at $0.25 per million input tokens and $1.50 per million output tokens, Flash-Lite is significantly cheaper on input than flagship models such as Gemini 3.1 Pro ($2.00 per million input tokens and $1.50 per million output tokens).
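Using the listed rates, the per-request cost is simple arithmetic; a minimal sketch (rates taken from the article, token counts in the example are hypothetical):

```python
# Preview rates for Gemini 3.1 Flash-Lite as reported in the article.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.50 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 500-token reply.
cost = request_cost(2_000, 500)  # 0.0005 + 0.00075 = 0.00125 USD
```

At these rates, output tokens dominate the bill for generation-heavy workloads, which is why the input-side discount matters most for high-volume classification-style tasks.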
Google claims it "outperforms 2.5 Flash with a 2.5X faster Time to First Answer Token and 45% increase in output speed, according to the Artificial Analysis benchmark, while maintaining similar or better quality."
What Gemini 3.1 Flash-Lite can do
The model comes with "thinking levels" in AI Studio and Vertex AI, giving developers the ability to control how much the model "thinks" for each task, which is important for managing high-frequency workloads.
"3.1 Flash-Lite can tackle tasks at scale, like high-volume translation and content moderation, where cost is a priority. And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions," the blog post said.
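In practice, a caller might route the workloads above to different thinking levels before invoking the model; a hypothetical sketch (the level names and routing rules are illustrative assumptions, not the documented AI Studio or Vertex AI API):

```python
# Hypothetical mapping from workload type to a "thinking level".
# Level names ("low"/"medium"/"high") and routing rules are assumptions
# for illustration, not the actual Gemini API parameter values.

def pick_thinking_level(task: str) -> str:
    """Choose how much the model should 'think' for a given workload."""
    high_volume = {"translation", "content_moderation"}       # cost-sensitive, at scale
    deep_reasoning = {"ui_generation", "dashboard", "simulation"}
    if task in high_volume:
        return "low"      # minimize per-request compute for high-frequency traffic
    if task in deep_reasoning:
        return "high"     # allow more in-depth reasoning
    return "medium"       # default for everything else
```

The design point is that the routing decision lives in application code: cheap, repetitive traffic never pays for reasoning it does not need, while complex requests opt into it explicitly.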
Early-access developers and companies, including Latitude, Cartwheel, and Whering, are already testing Flash-Lite for large-scale problem solving. Early testers highlighted 3.1 Flash-Lite's efficiency and reasoning capabilities, saying it can "handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence," according to the blog post.
Benchmarks and performance
Gemini 3.1 Flash-Lite scored 1432 Elo on the Arena.ai Leaderboard, outperforming other models in its tier on reasoning and multimodal understanding. It achieved 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing even larger Gemini models from previous generations, such as 2.5 Flash.
The model combines speed, cost efficiency, and flexible reasoning, making it suitable for both high-volume routine tasks and more complex AI workloads.