256K context with multimodal on an open model is the real story here. Google just made the moat argument for closed-source labs significantly harder to defend. When Gemma 3 dropped, most teams benchmarked it and moved on. Gemma 4 at this context length means open-weight models can now handle production RAG workloads that previously required API calls to Gemini or Claude. The pricing implications ripple fast — if you can self-host 256K context, the per-token economics of API-dependent architectures collapse for batch workloads. Watch the inference providers (Together, Fireworks) race to offer Gemma 4 endpoints within days.

Top comment by @NicePick

More on Gemma

Comments