Google’s Gemma 4: Open-Source AI Finally Catches Up to Commercial Models

Google has released its newest open-weight AI model family, Gemma 4, under the Apache 2.0 license—a significant shift that could reshape how businesses adopt open-source AI. For years, Google’s Gemma models have offered strong performance but were hampered by restrictive licensing, pushing many organizations toward alternatives like Mistral or Alibaba’s Qwen. The new Apache 2.0 license removes those barriers, enabling wider commercial use without legal friction.

This timing is particularly noteworthy, as some Chinese AI labs (like Alibaba) are scaling back full open-source releases for their latest models. Google is moving in the opposite direction, opening up its most capable Gemma release yet while leveraging research from its proprietary Gemini 3.

Gemma 4: Models for Every Device

Gemma 4 comes in four models, split into workstation and edge tiers:

  • Workstation Tier: Includes a 31B-parameter dense model and a 26B A4B Mixture-of-Experts (MoE) model, both supporting text and image inputs with 256K-token context windows.
  • Edge Tier: Consists of the E2B and E4B models, designed for phones, embedded devices, and laptops, with support for text, image, and audio inputs and 128K-token context windows.

The naming convention is crucial: “E” denotes “effective parameters,” meaning the model runs with the memory footprint of a smaller model even though its total parameter count is higher, thanks to Google’s Per-Layer Embeddings (PLE). The “A” in A4B stands for “active parameters,” indicating that only a fraction of the model’s total parameters (roughly 4B of the 26B total) activate for each token during inference, delivering high intelligence with lower compute costs.

MoE Architecture: Performance with Efficiency

The 26B A4B MoE model uses 128 small “experts,” activating only eight per token plus one always-on expert. This results in performance comparable to dense models in the 27B–31B range, but with inference speeds similar to a 4B model. This means fewer GPUs, lower latency, and cheaper per-token inference for production workloads like coding assistants or document processing.
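The routing described above can be sketched in a few lines. This is a toy NumPy illustration of top-k expert routing with a shared always-on expert; the dimensions, router, and expert layers are invented for clarity and are not Gemma 4’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D = 128, 8, 64  # illustrative sizes, not Gemma 4's real dims

# Router: a linear layer scoring each expert per token (hypothetical shapes).
W_router = rng.normal(size=(D, N_EXPERTS))
# Each "expert" here is just a small linear map for illustration.
experts = rng.normal(size=(N_EXPERTS, D, D)) * 0.02
shared_expert = rng.normal(size=(D, D)) * 0.02  # the always-on expert

def moe_layer(x):
    """Route each token through its top-k experts plus the shared expert."""
    logits = x @ W_router                               # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]   # top-8 experts per token
    # Softmax over only the selected experts' scores.
    sel = np.take_along_axis(logits, top_idx, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out = x @ shared_expert                             # shared expert sees every token
    for t in range(x.shape[0]):
        for k in range(TOP_K):
            out[t] += w[t, k] * (x[t] @ experts[top_idx[t, k]])
    return out

tokens = rng.normal(size=(4, D))
y = moe_layer(tokens)

# Only 8 of 128 experts (plus the shared one) run for each token, so
# per-token compute scales with ~9 expert-sized matmuls, not 128.
active = (TOP_K + 1) / (N_EXPERTS + 1)
print(f"output shape: {y.shape}, fraction of experts active per token: {active:.3f}")
```

The key point the sketch makes visible: total parameter count (all 128 experts) determines model capacity, while per-token FLOPs are set by the handful of experts the router actually selects.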

Gemma 4 also employs a hybrid attention mechanism that combines local sliding window attention with full global attention, enabling long context windows (256K) without excessive memory consumption.
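The memory saving is easy to see from the attention masks themselves. Below is a minimal sketch contrasting a full causal mask with a sliding-window mask, using toy sizes (Gemma 4’s real window and layer interleaving are not public in this level of detail, so treat the numbers as illustrative):

```python
import numpy as np

SEQ, WINDOW = 16, 4  # tiny illustrative sizes, not Gemma 4's actual window

def causal_mask(seq):
    """Full global attention: each query attends to all earlier positions."""
    i = np.arange(seq)
    return i[:, None] >= i[None, :]

def sliding_window_mask(seq, window):
    """Local attention: each query attends to at most `window` recent keys."""
    i = np.arange(seq)
    causal = i[:, None] >= i[None, :]
    near = (i[:, None] - i[None, :]) < window
    return causal & near

global_m = causal_mask(SEQ)
local_m = sliding_window_mask(SEQ, WINDOW)

# Global layers need O(seq) keys per query (and an O(seq) KV cache);
# local layers cap this at O(window), which is what keeps 256K contexts
# from blowing up memory when most layers are local.
print("global attended pairs:", global_m.sum())
print("local attended pairs: ", local_m.sum())
```

Interleaving many local layers with a few global ones keeps the KV cache mostly window-sized while the global layers preserve long-range information flow.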

Native Multimodality: Vision, Audio, and Function Calling

Unlike previous open models that bolted on multimodality as an afterthought, Gemma 4 integrates vision, audio, and function calling at the architectural level:

  • Vision: Supports variable aspect-ratio images with configurable visual token budgets for tasks like OCR, document parsing, and fine-grained analysis.
  • Audio: Native audio processing (ASR and translation) on-device, compressed to 305 million parameters for responsiveness.
  • Function Calling: Built-in from the ground up, optimizing multi-turn agentic flows with multiple tools and reducing prompt engineering overhead.
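To make the agentic flow concrete, here is a minimal sketch of the tool loop an application typically builds around such a model. The JSON tool schema and the dispatch logic are generic illustrations in the style most open chat templates use, not Gemma 4’s documented format:

```python
import json

# Hypothetical tool declaration the application would hand to the model.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation for the sketch

REGISTRY = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = REGISTRY[call["name"]]
    return fn(**call["arguments"])

# Simulated model turn: instead of prose, the model emits a structured call,
# whose result is fed back as the next turn of the conversation.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}')
print(result)
```

A model trained for function calling from the ground up emits these structured calls reliably, which is what shrinks the prompt-engineering overhead the article mentions: the application only needs to declare tools and route results, not coax the model into a parseable format.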

Benchmarks and Performance

Gemma 4 benchmarks strongly:

  • 31B Dense: 89.2% on AIME 2026 (mathematical reasoning), 80.0% on LiveCodeBench v6 (coding), and a Codeforces Elo rating of 2,150.
  • 26B A4B MoE: 88.3% on AIME 2026, 77.1% on LiveCodeBench v6, and 82.3% on GPQA Diamond (science reasoning).
  • Edge Models: E4B (42.5% on AIME 2026) and E2B (37.5% on AIME 2026) outperform previous Gemma versions despite being smaller.

While Qwen, GLM, and Kimi compete in this parameter range, Gemma 4 stands out by combining strong performance with a truly permissive license and native multimodality.

What’s Next?

Google has released both pre-trained base models and instruction-tuned variants, encouraging custom fine-tuning. The serverless deployment option via Cloud Run with GPU support could significantly reduce the cost of deploying open models in production. Additional model sizes are likely to follow, but the current Gemma 4 family offers a complete open AI solution competitive with proprietary models. For enterprises hesitant to adopt open AI due to licensing concerns, Google has now removed that barrier.
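As a rough sketch, deploying a containerized Gemma 4 server on Cloud Run with an attached GPU might look like the following. The service and image names are placeholders, and the GPU flags reflect the current gcloud syntax for Cloud Run GPU support, which may change:

```shell
# Hypothetical deployment sketch: PROJECT_ID, repo, and image are placeholders.
gcloud beta run deploy gemma4-service \
  --image us-docker.pkg.dev/PROJECT_ID/repo/gemma4-server:latest \
  --region us-central1 \
  --gpu 1 --gpu-type nvidia-l4 \
  --no-cpu-throttling \
  --max-instances 1
```

The appeal of this path is scale-to-zero billing: the GPU is only paid for while requests are being served, which is what makes serverless open-model hosting cheaper than a always-on GPU VM for bursty workloads.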