Mistral Medium 3.5 Released — 128B Dense Open-Weight Model With Vibe Remote Agents (April 29, 2026)
Mistral AI on April 29, 2026 released Mistral Medium 3.5, a 128B-parameter dense open-weight model that scores 77.6% on SWE-Bench Verified, alongside a new Vibe remote-agents CLI and Work mode in Le Chat. Reception on Hacker News was mixed: developers praised the self-hostable footprint but questioned the pricing next to Qwen 3.6.
French AI lab Mistral AI released Mistral Medium 3.5, a 128-billion-parameter dense language model with open weights, alongside a new Vibe remote-coding-agent CLI and a Work mode for its Le Chat assistant. The model scores 77.6% on SWE-Bench Verified and ships under a modified MIT-style license that allows self-hosting on as few as four GPUs.
What Happened
Mistral published the model card on Hugging Face at mistralai/Mistral-Medium-3.5-128B and announced the release on its official news page. The company describes Medium 3.5 as its “first flagship merged model” — a single dense 128B-parameter checkpoint that combines instruction-following, reasoning and coding capabilities that previously lived in separate Mistral releases.
Alongside the weights, Mistral shipped two product updates: Vibe remote agents, an asynchronous cloud-coding workflow launchable from the Vibe CLI or directly inside Le Chat; and Work mode (Preview) in Le Chat, a multi-step agent UI for tasks like email triage, research synthesis and cross-tool actions. Both surfaces are powered by Medium 3.5 for users on the Pro, Team and Enterprise tiers.
Key Details
- 128B dense parameters, 256k context — a single dense checkpoint rather than the mixture-of-experts route Meta and DeepSeek have favoured this year.
- SWE-Bench Verified: 77.6% — Mistral’s headline coding score, ahead of most open-weight models but trailing some smaller proprietary systems.
- τ³-Telecom: 91.4% — the agentic-tool-use benchmark Mistral leads on for telco-style multi-tool workflows.
- Open weights, modified MIT license — downloadable on Hugging Face and mirrored as mistral-medium-3.5 on Ollama.
- Self-host on 4 GPUs — the company says the model fits in 70 GB of VRAM at Q4 quantisation, putting it in reach of a fully loaded Mac Studio.
- Hosted API pricing: $1.50 input / $7.50 output per million tokens — the figure that drew the loudest criticism on launch day.
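The two numbers above are easy to sanity-check. A minimal sketch, using the pricing and parameter count quoted in the article; the 0.56 bytes-per-parameter figure for Q4 is an assumption (roughly 4.5 bits per weight once quantisation scales and zero-points are included), chosen only to illustrate why a 128B model lands near 70 GB:

```python
# Back-of-the-envelope checks on the release numbers. Pricing and the 128B
# parameter count come from the announcement; BYTES_PER_PARAM_Q4 is an
# assumed figure (~4.5 bits/param incl. quantisation overhead).

PARAMS = 128e9              # 128B dense parameters
BYTES_PER_PARAM_Q4 = 0.56   # assumed bytes per weight at Q4

def q4_footprint_gb(params: float = PARAMS) -> float:
    """Rough VRAM needed for the Q4 weights alone, in GB (excludes KV cache)."""
    return params * BYTES_PER_PARAM_Q4 / 1e9

def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_rate: float = 1.50, out_rate: float = 7.50) -> float:
    """Hosted-API cost at $1.50 input / $7.50 output per million tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(round(q4_footprint_gb(), 1))       # ~71.7 GB, close to the quoted 70 GB
print(api_cost_usd(2_000_000, 500_000))  # 6.75 (USD) for 2M in / 0.5M out
```

At these rates, a heavy agentic workload burning 2M input and 0.5M output tokens a day costs under $7/day hosted, which frames the self-hosting trade-off discussed below.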
What Developers and Users Are Saying
The reception on the Hacker News thread for the release was muted. Decrypt summarised the mood as a wall of online “meh” reactions, with the most-upvoted criticism being the price-to-performance ratio: Alibaba’s Qwen 3.6 27B scores 72.4% on SWE-Bench Verified at less than a quarter of the parameter count, and ships under Apache-2.0. University of Washington professor Pedro Domingos posted on X that “regular AI companies brag about how much better their model is on benchmarks; only Mistral brags about how much worse its one is,” a comment that was widely shared.
The defenders pushed back along two lines. First, the open weights are a durability play: a model anyone can download and self-host doesn’t have to win the leaderboard today to stay relevant in 2027. Second, the practical inference story is unusually good for a dense flagship — the Hugging Face thread highlighted that 70 GB at Q4 means a single $3,500 Mac Studio with 128 GB unified memory can run Medium 3.5 locally, something neither GPT-5 nor Claude Opus 4.6 will ever offer.
What This Means for Developers
If you build with open-weight models, Medium 3.5 is now the strongest dense Mistral checkpoint to evaluate against Llama 4 Maverick and Qwen 3.6. The Vibe remote-agents CLI is the most concrete piece for day-to-day developer use: you can kick off a long-running coding job from a terminal on your laptop, walk away, and pick up the output in Le Chat or back at the CLI — the same async-coding-agent ergonomics OpenAI shipped with Codex and Anthropic with Claude Code, now hosted in Europe with EU-jurisdiction data handling.
For self-hosters, the headline is the 4-GPU floor: previous Mistral flagships needed 8×H100 to serve. The hosted API at $1.50/$7.50 per million tokens is competitive with Claude Sonnet but materially more expensive than Qwen 3.6 hosted on the major inference clouds, so cost-sensitive teams will likely default to running the open weights themselves on Ollama or vLLM.
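For teams going the self-hosted route, a practical detail is that both Ollama and vLLM expose an OpenAI-compatible /v1/chat/completions endpoint, so existing client code can point at a local Medium 3.5 with no SDK change. A minimal sketch, assuming Ollama's default port (11434) and the mistral-medium-3.5 tag mentioned above:

```python
# Hedged sketch of calling a self-hosted model through the OpenAI-compatible
# endpoint that Ollama and vLLM both serve. The base URL (Ollama's default
# port) and the model tag are assumptions taken from the article.
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "mistral-medium-3.5",
                       base_url: str = "http://localhost:11434/v1"):
    """Return a (url, body) pair for an OpenAI-compatible chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return f"{base_url}/chat/completions", body

if __name__ == "__main__":
    url, body = build_chat_request("Summarise this diff in one sentence.")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    # Needs a local server first, e.g. `ollama run mistral-medium-3.5`:
    # print(urllib.request.urlopen(req).read().decode())
```

The same function works against a vLLM server by swapping the base URL, which is what makes the "default to running the open weights themselves" path low-friction.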
What’s Next
Mistral has flagged that Medium 3.5 will roll out on AWS Bedrock and Azure AI Foundry in May, and that Work mode in Le Chat will exit Preview “in the coming weeks.” The company is hiring aggressively for its Paris data-centre buildout announced in March 2026, suggesting Medium 3.5 is intended to anchor inference workloads on its own infrastructure rather than third-party clouds longer term.
Sources
- Mistral AI — official announcement — primary source from the company’s news page.
- Hugging Face model card — weights, license, technical specs.
- Mistral docs — Mistral Medium 3.5 model card — benchmark scores and capabilities.
- Hacker News discussion — developer reactions and benchmark scrutiny.
- Decrypt — “Internet Is Not Impressed” — critical reception summary.
- GN Crypto News — pricing criticism — competitive-pricing analysis vs. Qwen 3.6.