5/10 Model Release 1 Jun 2026, 16:01 UTC

JetBrains introduces Mellum2, a 12B Mixture-of-Experts model optimized for developer tools.

JetBrains' move to a 12B MoE architecture for Mellum2 signals a strategic shift toward high-efficiency, low-latency inference for IDEs. Because an MoE model activates only part of its parameters for each token, it can target strong code completion at lower per-token compute, which helps keep completion latency down. This puts pressure on GitHub Copilot by suggesting that purpose-built, smaller expert models can compete with massive general-purpose LLMs on coding tasks.

What Happened

JetBrains has announced Mellum2, a new 12-billion-parameter Mixture-of-Experts (MoE) model designed to power AI features across its suite of developer tools and IDEs.

Technical Details

Mellum2 uses a sparse Mixture-of-Experts architecture. The total parameter count is 12B, but a router activates only a subset of the expert sub-networks for each token during a forward pass. This reduces per-token compute, and the amount of weight data read per token, compared with a dense 12B model, although all 12B parameters must still be held in memory. Trained heavily on high-quality source code, API documentation, and developer-centric datasets, Mellum2 is optimized for IDE-specific tasks such as inline code completion, refactoring, and natural language-to-code generation. Sparse activation enables fast token generation, which is critical for real-time autocomplete features where latency directly affects the developer experience.
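The sparse activation described above can be illustrated with a minimal top-k routing sketch. This is not Mellum2's actual implementation: the expert count, the value of k, the toy "experts" (simple scaling functions), and every name below are assumptions chosen for illustration, since the announcement summarized here does not give those details.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize over the selected experts only, so the weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, router, experts, k):
    """Run one token vector through a toy MoE layer.

    router: one weight vector per expert (a linear scoring layer).
    experts: list of callables; only k of them are evaluated.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # the unselected experts are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]


def make_expert(scale):
    """Toy stand-in for an expert feed-forward network."""
    return lambda x: [scale * xi for xi in x]


NUM_EXPERTS = 8  # hypothetical; not stated in the announcement
TOP_K = 2        # hypothetical; not stated in the announcement

rng = random.Random(0)
router = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]

output, used = moe_forward([0.5, -1.0, 0.25, 2.0], router, experts, TOP_K)
print(f"experts evaluated: {len(used)} of {NUM_EXPERTS}")
```

In this sketch only `TOP_K` of the `NUM_EXPERTS` expert functions run per token, which is where the per-token compute saving comes from; the router and all expert weights still have to exist in memory.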

Why It Matters

From an engineering perspective, this release reflects a broader industry trend: moving away from massive, general-purpose LLMs toward specialized, more efficient architectures. JetBrains is prioritizing inference speed and cost-efficiency while aiming to preserve code quality. By owning its model stack rather than relying entirely on third-party APIs from providers such as OpenAI or Anthropic, JetBrains reduces vendor lock-in and can improve margins on its AI assistant subscriptions. More importantly, controlling the model lets JetBrains couple Mellum2 tightly with the Abstract Syntax Trees (ASTs) and static analysis tools built into its IDEs, which could yield suggestions that are more context-aware and syntactically sound.

What To Watch Next

Watch how JetBrains integrates Mellum2 into flagship tools like IntelliJ IDEA and PyCharm in upcoming release cycles. In particular, look for independent benchmarks comparing Mellum2's latency and suggestion acceptance rates against the models behind GitHub Copilot. Also monitor whether JetBrains offers quantized versions of Mellum2 for local, on-device inference, which would matter to enterprise developers operating under strict data privacy and compliance constraints.
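To see why quantization matters for on-device use, here is a rough back-of-the-envelope calculation of weight storage for a 12B-parameter model at different precisions. It assumes weights only (no activations, KV cache, or quantization metadata), and the function name is hypothetical; for an MoE model the total parameter count is the relevant figure, because every expert has to be available in memory even though only some run per token.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 12e9  # total parameter count reported for Mellum2

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions the weights take roughly 24 GB at 16-bit, 12 GB at 8-bit, and 6 GB at 4-bit precision, which is the difference between needing a workstation GPU and fitting on a typical developer laptop.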

Sources

https://huggingface.co/blog/JetBrains/mellum2-launch

jetbrains moe code-generation developer-tools llm