6/10 Model Release 27 May 2026, 09:00 UTC

MiniMax teases M3 with a 1M-token context window, while DeepSeek previews V4 and xAI readies Grok V9-Medium.

The real story here is MiniMax Sparse Attention (MSA), an architecture that MiniMax claims delivers a 15.6× decoding speedup at 1M-token context lengths; if that holds, it would substantially change the economics of long-context agents. Grok V9-Medium's 1.5T-parameter scale is notable, but MiniMax's and DeepSeek's continued focus on extreme inference efficiency will likely dictate the next wave of production API routing.

A flurry of model announcements on X signals an intensifying race toward both ultra-efficient long-context inference and massive scale. The most technically significant update comes from MiniMax, which is teasing its upcoming M3 model; DeepSeek and xAI posted previews of their own.

What Happened & Technical Details

MiniMax's M3 introduces a new architecture, MiniMax Sparse Attention (MSA), designed for context windows of up to 1M tokens. The claimed gains are large: 9.7× faster prefilling and 15.6× faster decoding compared with standard dense attention. DeepSeek, meanwhile, previewed V4, claiming it closes the gap with frontier models while keeping the company's signature low pricing. Finally, xAI announced that Grok V9-Medium, a 1.5T-parameter foundation model, has completed pretraining and entered the fine-tuning/RL phase, with an API release expected in 2–3 weeks.
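
Because a request spends time in both prefill and decode, neither claimed factor alone describes end-to-end latency. The sketch below combines the two by weighting each phase's baseline time; the baseline seconds are hypothetical placeholders, not measurements, and it assumes the 9.7× and 15.6× factors apply uniformly to their respective phases.

```python
def blended_speedup(prefill_s: float, decode_s: float,
                    prefill_factor: float = 9.7,
                    decode_factor: float = 15.6) -> float:
    """Overall request speedup when prefill and decode are each accelerated.

    prefill_s and decode_s are hypothetical baseline wall-clock times in
    seconds. The default factors are MiniMax's claimed M3 speedups, assumed
    here to apply uniformly to each phase.
    """
    baseline = prefill_s + decode_s
    accelerated = prefill_s / prefill_factor + decode_s / decode_factor
    return baseline / accelerated


# Hypothetical workloads: prefill-heavy (long document, short answer)
# versus decode-heavy (long agentic generation over the same context).
for name, prefill, decode in [("prefill-heavy", 90.0, 10.0),
                              ("decode-heavy", 10.0, 90.0)]:
    print(f"{name}: {blended_speedup(prefill, decode):.1f}x")
```

Whatever the split, the blended figure falls between the two claimed factors, so prefill-dominated workloads such as one-shot document analysis would see gains closer to 9.7× than to 15.6×.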

Why It Matters

From an engineering perspective, the MiniMax M3 update is the standout. Standard dense attention's compute grows quadratically with sequence length during prefill, and every decoded token must attend over the full key-value cache, which makes 1M-token contexts prohibitively slow and expensive for real-time agentic workflows. If MSA delivers a 15.6× decoding speedup without severe degradation in retrieval accuracy (the classic sparse-attention trade-off), it changes the cost calculus for RAG-heavy applications, complex coding assistants, and long-document analysis tools. DeepSeek V4's preview reinforces a market trend in which Chinese labs are aggressively driving down the cost per token of frontier-class models. Meanwhile, Grok V9-Medium's 1.5T parameters show that xAI is continuing to brute-force scale, though its ultimate utility for developers will depend heavily on its serving efficiency.
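
The announcement does not describe how MSA chooses which tokens to attend to, so the sketch below uses a generic fixed-budget sparse scheme (each query attends to at most `k` keys) purely as a stand-in, not as MSA itself. It counts query-key dot products, the term that grows quadratically, for dense causal attention versus the sparse variant, plus the per-token cost of one decode step; the budget `k` is a hypothetical parameter.

```python
from typing import Optional


def dense_causal_scores(n: int) -> int:
    """Query-key dot products for causal dense attention over n tokens.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n.
    """
    return n * (n + 1) // 2


def sparse_causal_scores(n: int, k: int) -> int:
    """Dot products when each token attends to at most k keys.

    Generic fixed-budget sparse attention (e.g. top-k or block-selected keys),
    used as an illustrative stand-in, NOT MiniMax's MSA design. It ignores the
    cost of selecting which k keys to attend to.
    """
    # Tokens 0..k-1 see fewer than k keys; every later token sees exactly k.
    first = min(n, k)
    return first * (first + 1) // 2 + max(0, n - k) * k


def decode_step_scores(context_len: int, k: Optional[int] = None) -> int:
    """Keys scored when generating one new token over a KV cache of context_len."""
    attended = context_len + 1  # cached tokens plus the new token itself
    return attended if k is None else min(attended, k)


n = 1_000_000  # 1M-token context, as in the M3 announcement
k = 4_096      # hypothetical per-query key budget
print("prefill ratio:", dense_causal_scores(n) / sparse_causal_scores(n, k))
print("decode ratio:", decode_step_scores(n) / decode_step_scores(n, k))
```

Raw dot-product ratios like these overstate wall-clock gains, because they leave out key selection, memory traffic, and the non-attention layers, so they should not be read as predicted speedups for M3.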

What to Watch Next

Engineers should look for independent "Needle In A Haystack" (NIAH) benchmarks of MiniMax M3 to verify whether MSA maintains high recall at the 1M-token limit. For Grok V9-Medium, the upcoming API pricing announcement will determine whether it can compete as a viable routing alternative to OpenAI and Anthropic models, or whether highly optimized models from DeepSeek and MiniMax will dominate the cost-effective frontier tier.
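
A basic NIAH check hides a known fact (the "needle") at different depths inside long filler text and measures how often the model retrieves it. The sketch below is a minimal, hypothetical harness: `build_haystack`, `niah_recall`, and the stub `toy_model` are illustrative names, not part of any published benchmark. A real run would replace `toy_model` with an API call to M3 and grow `n_filler` until the prompt reaches roughly 1M tokens as counted by the model's own tokenizer.

```python
import random
from typing import Callable, List


def build_haystack(n_filler: int, needle: str, depth: float, seed: int = 0) -> str:
    """Place `needle` at relative position `depth` (0.0 = start, 1.0 = end)
    among n_filler distractor sentences."""
    rng = random.Random(seed)
    topics = ["sky", "sea", "field", "city"]
    sentences = [f"Note {i}: the {rng.choice(topics)} was quiet today."
                 for i in range(n_filler)]
    sentences.insert(round(depth * n_filler), needle)
    return " ".join(sentences)


def niah_recall(ask: Callable[[str], str], n_filler: int,
                depths: List[float], secret: str) -> float:
    """Fraction of insertion depths at which the model's answer contains `secret`."""
    needle = f"The secret passcode is {secret}."
    hits = 0
    for depth in depths:
        prompt = (build_haystack(n_filler, needle, depth)
                  + "\n\nQuestion: What is the secret passcode?")
        if secret in ask(prompt):
            hits += 1
    return hits / len(depths)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call: it 'retrieves'
    the passcode by exact string search, so it always finds the needle."""
    marker = "The secret passcode is "
    start = prompt.find(marker)
    if start < 0:
        return "unknown"
    return prompt[start + len(marker):].split(".")[0]


print(niah_recall(toy_model, n_filler=1_000,
                  depths=[0.0, 0.25, 0.5, 0.75, 1.0], secret="7391"))
```

Sweeping several depths matters because retrieval failures in long-context models often depend on where the needle sits, and a single score at one position can hide them.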

minimax deepseek grok sparse-attention llm-inference