Signals
4/10 Model Release 5 Jun 2026, 03:00 UTC

NVIDIA releases 550B Nemotron 3 Ultra alongside new edge VLMs from Liquid AI and StepFun

NVIDIA's 550B-parameter Nemotron 3 Ultra brings a 1M-token context window and a latent MoE architecture to open weights, putting it in direct competition with proprietary models for complex agentic workflows. Meanwhile, Liquid AI's JSON-extraction VLMs offer a highly optimized option for parsing visual data on edge devices, designed to run locally on any device SoC. Together, this push toward large-scale agentic reasoning on one end and efficient edge vision on the other substantially lowers the barrier to local AI deployments.

What Happened
On June 5, 2026, the open-weights AI ecosystem saw three notable releases: NVIDIA's Nemotron 3 Ultra, Liquid AI's LFM2.5-VL series, and StepFun's Step 3.7 Flash. Together they span the range from large data-center reasoning models to compact, efficient edge vision models.

Technical Details
NVIDIA's Nemotron 3 Ultra is a 550-billion-parameter text generation model built on a latent Mixture-of-Experts (MoE) architecture, with 55B parameters active per token during inference. It offers a 1M-token context window and 5x faster inference than its predecessors, and it specifically targets agentic workflows and complex reasoning.
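
The release does not spell out how NVIDIA's latent MoE works internally, so the following is only a generic top-k Mixture-of-Experts router in plain Python, not NVIDIA's design. It illustrates why a 550B-parameter model can run with only 55B active parameters: a router scores every expert, but only the k highest-scoring experts are actually evaluated for each token. The toy experts, router logits, and value of k are assumptions for illustration.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and return (expert_index, gate_weight) pairs.

    Gate weights are renormalized over the selected experts only.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x: list[float], experts, router_logits: list[float], k: int) -> list[float]:
    """Weighted sum of the k routed experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy usage: 8 experts that each scale the input by their own index; route to k=2.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
print(top_k_route(logits, 2))  # experts 4 and 1 are selected
print(moe_layer([1.0, 2.0], experts, logits, 2))
```

In a real MoE layer the router logits typically come from a learned projection of the token's hidden state and k is fixed by the architecture; here both are hard-coded so the routing step is easy to follow.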

On the edge side, Liquid AI released two open-weights Vision-Language Models (VLMs): LFM2.5-VL-1.6B-Extract and LFM2.5-VL-450M-Extract. Both are purpose-built to take an image plus a list of fields to extract and return strictly structured JSON, and both are optimized to run locally on any device's System on Chip (SoC).
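
Liquid AI's exact input and output format is not described here. As a hedged illustration of what "strictly structured JSON" buys downstream code, this minimal Python sketch checks that a model's raw output parses and matches the requested field list exactly. The function name, the field-to-type mapping, the invoice fields, and the choice to accept null for fields the model could not find are all assumptions.

```python
import json


def validate_extraction(raw_output: str, fields: dict) -> dict:
    """Parse a model's raw text output and check it against the requested fields.

    `fields` maps each requested field name to the Python type expected after
    JSON decoding (str, int, float, bool, list or dict). Returns the parsed
    object, or raises ValueError describing the first problem found.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")

    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    extra = [name for name in data if name not in fields]
    if extra:
        raise ValueError(f"unexpected fields: {extra}")

    for name, expected in fields.items():
        value = data[name]
        if value is None:
            # Assumption: null means "not found in the image" and is accepted.
            continue
        if isinstance(value, bool) and expected is not bool:
            # bool is a subclass of int in Python, so reject it explicitly.
            raise ValueError(f"field {name!r}: expected {expected.__name__}, got bool")
        if expected is float and isinstance(value, int):
            # JSON does not distinguish 118 from 118.0.
            continue
        if not isinstance(value, expected):
            raise ValueError(
                f"field {name!r}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return data


# Hypothetical field list for an invoice image.
fields = {"invoice_number": str, "total": float, "line_items": list}
print(validate_extraction('{"invoice_number": "INV-0042", "total": 118, "line_items": []}', fields))
```

A check like this is cheap enough to run on the same device as the model, and a rejected output can simply trigger a retry.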

StepFun also released Step 3.7 Flash, a multimodal model with a 256k-token context window and inference speeds of up to 400 tokens per second.
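
To make these two figures concrete, a quick back-of-the-envelope sketch: the prompt and the generated output share one context window, and 400 tokens per second is a peak rate, so decode times computed from it are best cases. Reading "256k" as 262,144 tokens and the helper names are assumptions.

```python
# Assumption: "256k" is read as 256 * 1024 = 262,144 tokens; check the model card for the exact limit.
CONTEXT_WINDOW_TOKENS = 256 * 1024
# Quoted peak decode speed; real throughput depends on hardware, batching and prompt length.
PEAK_TOKENS_PER_SECOND = 400


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Prompt and generated tokens share the same context window."""
    return prompt_tokens + max_output_tokens <= window


def min_decode_seconds(output_tokens: int,
                       tokens_per_second: float = PEAK_TOKENS_PER_SECOND) -> float:
    """Lower bound on generation time at the peak rate; ignores prompt prefill."""
    return output_tokens / tokens_per_second


print(fits_in_context(200_000, 8_000))  # True
print(fits_in_context(260_000, 8_000))  # False
print(min_decode_seconds(2_000))        # 5.0
```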

Why It Matters
From an engineering standpoint, these releases address two distinct deployment bottlenecks. Nemotron 3 Ultra offers a viable, high-speed open-weights option for heavy agentic orchestration, a workload traditionally dominated by closed APIs. Its latent MoE design keeps per-token compute manageable despite the 550B total parameter count, because only 55B parameters are active for each token; memory requirements, however, still scale with all 550B parameters. Liquid AI's Extract models, by contrast, remove the need for brittle regex-based parsing or large cloud-hosted VLMs in standard visual data extraction tasks, moving reliable JSON generation directly onto edge devices.
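
The compute-versus-memory trade-off can be made concrete with simple arithmetic. This sketch uses the 550B total and 55B active figures from the release; the rule of thumb of about 2 FLOPs per active parameter per token for a forward pass and the bytes-per-parameter values for each precision are standard approximations, and the sketch ignores the KV cache, activations, and attention cost over long contexts.

```python
TOTAL_PARAMS = 550e9   # total parameters (from the release)
ACTIVE_PARAMS = 55e9   # parameters active per token (from the release)


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """Approximation: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params


# Memory scales with TOTAL parameters: every expert must be loaded.
for label, bytes_per_param in [("BF16", 2), ("FP8", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bytes_per_param):,.0f} GB of weights")

# Compute scales with ACTIVE parameters: only routed experts run.
ratio = forward_flops_per_token(TOTAL_PARAMS) / forward_flops_per_token(ACTIVE_PARAMS)
print(f"A dense 550B model would need {ratio:.0f}x the per-token compute")
```

Under these assumptions the BF16 weights alone come to about 1,100 GB (550 GB at FP8, 275 GB at 4-bit), while per-token compute is roughly a tenth of what a dense 550B model would require.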

What to Watch Next
Watch for community benchmarks comparing Nemotron 3 Ultra with proprietary models on long-context retrieval and multi-step agentic tasks. For Liquid AI, watch adoption in mobile and IoT applications, particularly how consistently the 450M model adheres to the requested JSON schema under tight edge compute constraints.
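
One common way to probe long-context retrieval is a "needle in a haystack" test: hide a single fact at a chosen depth inside a long filler context and ask the model to recall it. This is a minimal sketch of building such prompts and scoring answers; the function names, the sentence-based length measure, the substring scoring rule, and the `call_model` step are hypothetical, and published benchmarks differ in their details.

```python
def build_needle_prompt(filler_sentences, needle, question, depth, target_sentences):
    """Build a long context with `needle` hidden at relative `depth`.

    depth=0.0 places the needle first and depth=1.0 places it last. Length is
    measured in sentences here (an assumption); a real test would count tokens
    with the model's own tokenizer.
    """
    if not filler_sentences:
        raise ValueError("need at least one filler sentence")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    # Cycle through the filler until the haystack reaches the target length.
    haystack = [filler_sentences[i % len(filler_sentences)] for i in range(target_sentences)]
    position = round(depth * len(haystack))
    haystack.insert(position, needle)
    prompt = " ".join(haystack) + "\n\n" + question
    return prompt, position


def score_answer(model_answer: str, expected: str) -> bool:
    """Case-insensitive substring match: a deliberately simple scoring rule."""
    return expected.lower() in model_answer.lower()


# Sweep several depths to see where retrieval starts to fail.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt, pos = build_needle_prompt(
        ["The quarterly report was filed on time."],
        "The access code for the archive is 7391.",
        "What is the access code for the archive?",
        depth,
        target_sentences=5000,
    )
    # answer = call_model(prompt)  # hypothetical: send `prompt` to the model under test
    # print(depth, score_answer(answer, "7391"))
    print(depth, pos, len(prompt))
```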

model-releases open-weights nvidia edge-ai vlm