Google Releases Gemma 4: Open-Weight Multimodal AI with 256K Context Under Apache 2.0 (April 2026)
Google released Gemma 4 on April 2, 2026 — four new open-weight multimodal models ranging from 2B to 31B parameters, licensed under Apache 2.0, with context windows up to 256K tokens and native audio/video support. The 31B model ranks #3 among all open models globally on Arena AI benchmarks.
Google released Gemma 4 on April 2, 2026, delivering its most capable open-weight model family to date. The release includes four model sizes — E2B, E4B, 26B (Mixture-of-Experts), and 31B Dense — all licensed under Apache 2.0, enabling full commercial use without restrictions. The 31B Dense variant currently ranks #3 among all open models globally on the Arena AI text leaderboard with a score of 1452.
What Happened
At Google I/O 2026, Google DeepMind unveiled Gemma 4 as "the most intelligent open models" built from the same research as Gemini 3. The models are immediately available on Hugging Face, Google's Vertex AI, Kaggle, and Ollama. Unlike previous Gemma releases, all Gemma 4 models natively process images, video, and text — and the smaller E2B and E4B variants additionally support audio input.
The shift to Apache 2.0 licensing is significant. Previous Gemma models used a custom "Gemma Terms of Use" that restricted redistribution and commercial use. Apache 2.0 removes these barriers entirely, making Gemma 4 the first truly open Gemma release for commercial applications.
Key Details
- Four model sizes: E2B (2.3B active params, 128K context), E4B (4.5B active params, 128K context), 26B MoE (4B active / 26B total, 256K context), 31B Dense (256K context)
- Apache 2.0 license: First Gemma release with full commercial-use rights — a major departure from previous Gemma Terms of Use
- Benchmark performance: 31B scores 89.2% on AIME 2026 mathematics, 80.0% on LiveCodeBench v6 coding, 84.3% on GPQA Diamond science; 26B MoE ranks #6 globally on Arena AI with just 4B active parameters
- Multimodal natively: All models handle images, video, and text; E2B/E4B also support audio input (speech recognition and understanding)
- Agentic-ready: Native function calling with JSON-structured output and thinking mode support built into all instruction-tuned variants
- On-device targets: E2B and E4B designed for mobile and edge deployment — already available in Android AICore Developer Preview for on-device inference
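In practice, native function calling means the instruction-tuned model emits a JSON-structured tool call that the host application parses and dispatches — no regex extraction from free-form text. The exact wire format Gemma 4 uses isn't specified in the announcement, so the schema and field names below are illustrative assumptions, not the documented API:

```python
import json

# Hypothetical tool schema in the common JSON-Schema style; the exact
# format Gemma 4 expects is an assumption for illustration.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch_tool_call(raw: str, registry: dict):
    """Parse a JSON-structured tool call emitted by the model and
    route it to the matching Python function."""
    call = json.loads(raw)
    fn = registry[call["name"]]
    return fn(**call["arguments"])

# Stand-in for a real tool implementation.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

registry = {"get_weather": get_weather}

# Simulated model output; in a real agent loop this string would come
# from the model's JSON-structured response.
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch_tool_call(model_output, registry))  # Sunny in Berlin
```

The useful property is that the dispatch loop stays model-agnostic: only the schema passed to the model and the registry of Python callables need to agree.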
What Developers and Users Are Saying
Reception on Hacker News has been enthusiastic, with the main thread drawing hundreds of comments. Hugging Face CTO Julien Chaumond called it "BREAKING NEWS" with fire emojis — a rare public signal of genuine excitement from the open-source AI community. Developers on r/LocalLLaMA have been particularly focused on the 26B MoE variant, which activates only 4B parameters during inference, delivering near-31B quality at a fraction of the compute cost.
Some early friction has emerged: a handful of community reports note broken tokenizer implementations in certain quantized versions within the first 24 hours of release. Google and the Hugging Face team responded quickly to address these issues. Developers on the Android platform are excited by the AICore Developer Preview, which signals Google's intent to bring Gemma 4 inference directly to consumer devices without cloud calls.
What This Means for Developers
The Apache 2.0 license change is the most developer-significant aspect of this release. Teams that previously avoided Gemma due to licensing ambiguity can now build commercial products freely. The 26B MoE model in particular offers an exceptional price-to-performance ratio for self-hosted deployments: it ranks #6 globally on open-model benchmarks while activating only 4B parameters per token, making it viable on a single consumer GPU with quantization. Developers building agents should note the native function calling support — no prompt engineering workarounds needed. Those targeting edge or mobile applications should explore the E2B/E4B models via the Android AICore Developer Preview and available ONNX exports.
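A rough back-of-the-envelope check of the single-consumer-GPU claim: weight memory is approximately total parameters × bytes per parameter, so even though only 4B parameters are active per token, all 26B must typically reside in memory. The sketch below is weight-only arithmetic (KV cache and activations add overhead on top):

```python
def weight_gib(total_params_b: float, bits_per_param: float) -> float:
    """Approximate weight-only memory footprint in GiB for a model
    with the given parameter count (in billions) and quantization."""
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# 26B total parameters at common precisions (weights only).
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gib(26, bits):.1f} GiB")
```

At 4-bit quantization the 26B weights come to roughly 12 GiB, which is consistent with the article's claim that the MoE variant is viable on a single consumer GPU; at 16-bit they would need close to 50 GiB.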
What's Next
Google has made Gemma 4 available immediately across Hugging Face (all variants), Ollama (`ollama run gemma4`), Vertex AI, and Kaggle. Fine-tuning is supported through TRL, Unsloth Studio, and Vertex AI custom container training. The Android AICore Developer Preview signals upcoming native on-device deployment for Android apps. Given the pace of Gemma releases (Gemma 1 in February 2024, Gemma 2 in June 2024, Gemma 3 in March 2025), a Gemma 5 release in late 2026 or early 2027 seems plausible.
Sources
- Google Blog — Gemma 4 official announcement — primary source from Google DeepMind
- Google DeepMind — Gemma 4 model page — technical specs and capabilities
- Hugging Face Blog — Gemma 4 deep dive — architecture details and deployment guide
- Hacker News discussion thread — developer community reaction
- Android Developers Blog — AICore Developer Preview — on-device deployment announcement
- gHacks Tech News — Gemma 4 coverage — independent analysis