Gemini 3.1 Flash-Lite (Generally Available)

Google's low-cost, low-latency Gemini model is now generally available, with 95th-percentile (p95) response times under two seconds for full replies.

The p95 response time is around 1.8 seconds for full replies and under one second for classifiers and tool calls.
Early customer use is cited as showing roughly 60% lower cost than comparable thinking-tier models.
The model was announced on the Gemini Enterprise Agent Platform / Business Edition. The post does not confirm availability on the Gemini API, AI Studio, or Vertex AI.
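
For readers checking the latency figure against their own traffic, a p95 value is the latency that 95% of requests come in at or below. The snippet below is a minimal sketch of that calculation using the nearest-rank method. The function name `p95` and the sample values are illustrative assumptions, not data or code from the announcement.

```python
def p95(samples_s):
    """Nearest-rank 95th percentile of latencies given in seconds."""
    if not samples_s:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_s)
    n = len(ordered)
    # 1-based nearest-rank index: ceil(0.95 * n), done in integer math
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


# Hypothetical measurements (seconds), not figures from the post.
latencies = [0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 2.4]
print(f"p95 latency: {p95(latencies):.1f}s")
```

With only ten samples the nearest-rank p95 is the slowest request, so a meaningful comparison against the quoted 1.8 seconds would need a much larger sample.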