Gemini 1.5 Flash-8B will be the cheapest Gemini-powered AI model
Gemini 1.5 Flash-8B, the latest entrant in the Gemini family of artificial intelligence (AI) models, is now generally available for production use. On Thursday, Google announced the general availability of the model, highlighting that it is a smaller and faster version of the Gemini 1.5 Flash model introduced at Google I/O. Its smaller size enables lower-latency inference and more efficient output generation. More importantly, the tech giant stated that the Flash-8B AI model has the "lowest cost per intelligence of any Gemini model."
Gemini 1.5 Flash-8B now generally available
In a developer blog post, the Mountain View-based tech giant detailed the new AI model. The Gemini 1.5 Flash-8B is derived from the Gemini 1.5 Flash AI model, which focused on faster processing and more efficient output generation. The company says Google DeepMind has spent recent months developing this even smaller and faster version of the model.
Despite being a smaller model, the tech giant claims it "closely matches" the performance of the 1.5 Flash model on multiple benchmarks, including chat, transcription, and long-context language translation.
A major advantage of the AI model is its cost effectiveness. Google said the Gemini 1.5 Flash-8B will offer the lowest token prices in the Gemini family. Developers have to pay $0.0375 (approximately Rs. 3) per million input tokens, $0.15 (approximately Rs. 12.5) per million output tokens, and $0.01 (approximately Rs. 0.8) per million tokens on cached prompts.
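To see what these rates mean in practice, here is a minimal back-of-the-envelope cost estimator using the per-million-token prices quoted above; the token counts in the example call are hypothetical, not figures from Google.

```python
# Per-token prices in USD, derived from the quoted per-million-token rates.
INPUT_PRICE = 0.0375 / 1_000_000   # $0.0375 per 1M input tokens
OUTPUT_PRICE = 0.15 / 1_000_000    # $0.15 per 1M output tokens
CACHED_PRICE = 0.01 / 1_000_000    # $0.01 per 1M cached-prompt tokens

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE
            + output_tokens * OUTPUT_PRICE
            + cached_tokens * CACHED_PRICE)

# A hypothetical request with 10,000 input and 2,000 output tokens:
print(f"${estimate_cost(10_000, 2_000):.6f}")  # → $0.000675
```

At these rates, even a fairly large request costs a fraction of a cent, which is why Google positions the model for simple, high-volume workloads.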
Additionally, Google is doubling the rate limits of the 1.5 Flash-8B AI model. Developers can now send up to 4,000 requests per minute (RPM) with this model. Explaining the decision, the tech giant stated that the model is suitable for simple, high-volume tasks. Developers who want to try out the model can do so for free via Google AI Studio and the Gemini API.