Signals
6/10 Research 6 May 2026, 12:03 UTC

OpenAI introduces Multipath Reliable Connection (MRC) via OCP to optimize large-scale AI training networks.

Traditional RoCEv2 struggles with load balancing and with rapid recovery from link failures at the 100k+ GPU scale required for frontier models. By open-sourcing MRC, OpenAI is pushing a crucial standard for multipath routing and sub-millisecond recovery in Ethernet-based AI fabrics. This accelerates the industry's shift away from proprietary interconnects like InfiniBand toward highly resilient, commodity Ethernet architectures.

What happened

OpenAI has introduced Multipath Reliable Connection (MRC), a new networking protocol designed to enhance the resilience and performance of massive AI training clusters. Released as an open specification through the Open Compute Project (OCP), MRC aims to solve the networking bottlenecks inherent in scaling clusters to tens or hundreds of thousands of GPUs.

Technical details

Modern AI training relies heavily on synchronous operations across thousands of GPUs, where a single dropped packet or failed link can stall the entire cluster. Currently, RDMA over Converged Ethernet (RoCEv2) is the standard transport for Ethernet AI fabrics, but it is typically paired with ECMP routing, which hashes each flow onto a single fixed path and is therefore prone to hash collisions, incast congestion, and slow failure recovery.
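To see why hash-based path selection causes trouble, consider a minimal sketch (illustrative only, not RoCEv2 or MRC code): ECMP pins each flow to one path by hashing its 5-tuple, so with only a handful of large "elephant" flows, two can collide on the same link while spare links sit idle. The path count and flow tuples below are made up for illustration.

```python
import hashlib

NUM_PATHS = 4  # assumed fabric with 4 equal-cost paths

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto="udp"):
    """Pick a single path index from a hash of the flow's 5-tuple.

    The flow is pinned to this path for its lifetime -- ECMP never
    rebalances it, even if the chosen link is congested or degraded.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PATHS

# Eight flows into four paths: by the pigeonhole principle at least two
# flows must share a path, halving their usable bandwidth even though
# aggregate capacity would suffice.
flows = [("10.0.0.1", "10.0.1.1", 49152 + i, 4791) for i in range(8)]
assignments = [ecmp_path(*f) for f in flows]
print(assignments)
```

In a real switch the hash is computed in hardware over the packet header, but the failure mode is the same: path choice depends only on the flow identity, not on current load or link health.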

MRC acts as an extension to the transport layer that natively supports multipathing. It enables dynamic per-packet or per-flowlet load balancing across multiple network paths, maximizing bisection bandwidth utilization. More importantly, MRC implements rapid, sub-millisecond path switching when a link degrades or fails. Instead of waiting for higher-level protocols or centralized SDN controllers to route around the failure—which causes expensive idle GPU time—MRC handles it transparently at the transport level.
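The two behaviors described above can be sketched as follows. This is a hypothetical illustration of the ideas, not the MRC specification: the class re-picks the least-loaded healthy path whenever an idle gap splits a flow into flowlets (safe reorder points), and it masks out a failed path immediately on loss detection, with no controller round-trip. The `FLOWLET_GAP` value and all names are assumptions for illustration.

```python
FLOWLET_GAP = 0.0005  # 500 us idle gap starts a new flowlet (assumed value)

class MultipathSender:
    """Toy transport-level multipath sender: flowlet balancing + fast failover."""

    def __init__(self, num_paths):
        self.load = [0] * num_paths       # bytes sent per path (proxy for load)
        self.alive = [True] * num_paths   # link health, updated from ACK/NACK feedback
        self.current = 0                  # path carrying the active flowlet
        self.last_send = 0.0

    def mark_failed(self, path):
        """Sub-millisecond reaction: stop using a path the moment loss is seen."""
        self.alive[path] = False

    def pick_path(self, now, nbytes):
        """Choose the path for the next packet of `nbytes` at time `now`."""
        new_flowlet = (now - self.last_send) > FLOWLET_GAP
        if new_flowlet or not self.alive[self.current]:
            # Rebalance onto the least-loaded healthy path. Reordering is
            # harmless at flowlet boundaries; on failure we switch anyway
            # and let the transport's reliability layer recover.
            healthy = [p for p in range(len(self.load)) if self.alive[p]]
            self.current = min(healthy, key=lambda p: self.load[p])
        self.load[self.current] += nbytes
        self.last_send = now
        return self.current
```

Usage: after `sender.mark_failed(p)`, the very next `pick_path` call returns a different path, which is the transparent, transport-level recovery the text contrasts with waiting on an SDN controller to reroute.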

Why it matters

As AI labs push toward 100k+ GPU clusters, network reliability becomes a primary constraint on cluster uptime and training efficiency (goodput). Proprietary solutions like NVIDIA's InfiniBand handle these issues well but lock operators into a single-vendor ecosystem. By contributing MRC to OCP, OpenAI is commoditizing high-performance AI networking. This gives hyperscalers and cloud providers the blueprint to build highly resilient, lossless Ethernet fabrics using multi-vendor hardware, directly challenging InfiniBand's dominance in the AI data center.

What to watch next

Monitor the adoption of MRC by major network silicon vendors (like Broadcom, Marvell, and Cisco) and its integration into next-generation SmartNICs and switches. The speed at which OCP members ratify and implement MRC in hardware will dictate how quickly Ethernet can achieve true parity with InfiniBand for frontier model training. Furthermore, watch for how this specification interacts with the Ultra Ethernet Consortium (UEC), as MRC's goals heavily overlap with UEC's mandate to overhaul Ethernet for AI workloads.

networking infrastructure open-source openai rocev2