IBM Granite 4.1 Released — 8B Dense Model Beats Granite 4.0 32B MoE on Most Benchmarks (April 29, 2026)
IBM Research released the Granite 4.1 family on April 29, 2026 — Apache 2.0 licensed dense language models in 3B, 8B, and 30B sizes, plus updated speech, vision, embedding, and Guardian variants. The flagship 8B instruct model consistently matches or outperforms IBM's previous-generation 32B Mixture-of-Experts model.
IBM Research released Granite 4.1, the company's most expansive open-weight model release to date — a family of dense decoder-only language models in 3B, 8B, and 30B sizes, plus refreshed Granite Speech, Granite Vision, Granite Embeddings, and Granite Guardian variants. The headline result is that the 8B instruct model matches or outperforms IBM's previous-generation Granite 4.0-H-Small (32B Mixture-of-Experts) across nearly every benchmark while running on roughly a quarter of the parameters.
What Happened
The release dropped on the IBM Research blog and Hugging Face, distributed under Apache 2.0. According to IBM, all Granite 4.1 language models share a single dense, decoder-only architecture and were trained on approximately 15 trillion tokens across multiple phases — broad pre-training followed by progressive annealing on higher-quality technical, scientific, and mathematical data, with later phases tuned for instruction following. Final training stages extended the models' context window to as much as 512K tokens.
IBM is positioning the family as enterprise-ready alternatives to Llama, Qwen, and Gemma in instruction following and tool calling. The launch went live simultaneously on Hugging Face, LM Studio, Ollama, OpenRouter, Replicate, Unsloth, AnythingLLM, watsonx, and Weights & Biases — one of the broadest day-zero distributions IBM has ever shipped.
Key Details
- Three dense language sizes — 3B, 8B, and 30B in both base and instruct configurations, all decoder-only and Apache 2.0 licensed.
- 8B beats the prior-gen 32B MoE — on BFCL V3 (function calling) the 8B scored 68.3 vs. 64.7 for Granite 4.0-H-Small, with similar or larger gains on IFEval, AlpacaEval, MMLU-Pro, BBH, GSM8K, DeepMind-Math, Evalplus, ArenaHard, and MBPP+.
- 15 trillion training tokens — multi-phase curriculum focused heavily on instruction-following and tool-calling for enterprise use.
- Up to 512K-token context across the language models — one of the longest open-weight context windows shipped to date.
- Multimodal companions — Granite Vision 4.1 ships alongside ChartNet, a million-scale chart-understanding dataset; Granite Speech 4.1 targets state-of-the-art transcription accuracy; Granite Guardian addresses harm detection.
- Day-zero ecosystem support — available on Hugging Face, Ollama, LM Studio, OpenRouter, Replicate, Unsloth, AnythingLLM, watsonx, and Weights & Biases out of the gate.
What Developers Are Saying
On Hacker News the announcement post crossed 260 points within hours, with the top comments split between excitement at the dense-beats-MoE story and skepticism that 8B-class scores can rival frontier closed models in real-world workloads. Early commenters on r/LocalLLaMA called the 8B's BFCL V3 score genuinely competitive with Mistral and Llama 3.1 8B-instruct on tool calling, with a meaningful edge in instruction following. Independent reviewer accounts on Hugging Face flagged that the simpler dense architecture — without the MoE routing tricks of Granite 4.0-H-Small — should make Granite 4.1 8B noticeably easier to fine-tune for downstream tasks, which IBM Research called out explicitly in the launch post.
The most cited critique is the same one every Apache-2.0 release faces: enterprise customers will still want to validate the published benchmarks against their own internal evals before retiring an existing 32B MoE deployment. A few commenters noted that Granite 4.0 itself only landed recently, so this dot-release pace is aggressive even by 2026 open-source standards.
What This Means for Developers
For teams running open-weight models behind firewalls, Granite 4.1 8B is the most interesting drop-in alternative to Llama 3.x 8B and Qwen3 8B announced this month — especially if instruction following or function calling is the bottleneck. The 30B size is positioned as a direct competitor to Gemma 3 27B and Qwen3 30B-A3B for multilingual enterprise workloads. The 3B size remains tuned for on-device and edge deployment.
Because all three sizes are dense and Apache 2.0, fine-tuning workflows that already work for Llama 3 should port over with minimal changes. The 512K context window is a real differentiator for retrieval-heavy enterprise use cases, where the prior-gen 128K became a constraint. IBM's broad day-zero ecosystem support means developers can pull the model into existing Ollama or Unsloth pipelines today without waiting for community quantizations.
What's Next
IBM said the Granite 4.1 collection is the foundation for the next watsonx assistant and agent updates, and the team explicitly invited the community to fine-tune and benchmark the models against larger MoE alternatives. Quantized GGUF builds for Ollama and LM Studio shipped on day one; FP4 and INT4 variants are expected within weeks. ChartNet, the chart-understanding dataset behind Granite Vision 4.1, was also released openly — useful for any team training their own document-AI stack.
Sources
- IBM Research blog — Introducing the Granite 4.1 family — primary release post
- Hugging Face — ibm-granite/granite-4.1-8b — model card and weights
- Hugging Face Blog — How Granite 4.1 LLMs are built — technical deep dive
- Hacker News discussion — community reaction
- IBM Granite product page — family overview and licensing
- Firethering — analysis of the 8B vs 32B MoE benchmarks — independent benchmark commentary