News stories tagged with #low-latency
Nvidia Integrates Groq 3 LPU into Vera-Rubin Platform: A New Era of Low-Latency AI Inference Begins
At GTC 2026, Nvidia announced the integration of Groq’s 3rd-generation Language Processing Unit (LPU) into its new Vera-Rubin-NVL72 platform, aiming to dramatically boost AI inference throughput at ultra-low latency. Designed specifically for inference workloads, the LPU leverages large on-chip SRAM capacity and high internal bandwidth for rapid token processing. The technology complements Nvidia’s existing GPU ecosystem and is deployed in new LPX racks. Partners such as HPE and Giga Computing showcased next-generation AI factories and high-performance computing infrastructure built around these advancements at the event.
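Why on-chip SRAM bandwidth matters for token latency can be sketched with a simple back-of-envelope model: in memory-bandwidth-bound autoregressive decoding, the minimum time per generated token is roughly the model's weight footprint divided by the available memory bandwidth. The figures below (model size, HBM and SRAM bandwidths) are illustrative assumptions, not specifications of any product named above.

```python
# Back-of-envelope decode-rate model: when generation is memory-bandwidth
# bound, every weight byte must be streamed once per token, so
#   tokens/s  <=  memory bandwidth / model weight bytes.
# All numbers are hypothetical, chosen only to illustrate the ratio.

def tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode rate for a bandwidth-bound model."""
    return bandwidth_bytes_per_s / model_bytes

MODEL_BYTES = 70e9   # hypothetical 70B-parameter model at 1 byte per weight
HBM_BW      = 3e12   # ~3 TB/s, illustrative of an HBM-based accelerator
SRAM_BW     = 80e12  # hypothetical aggregate on-chip SRAM bandwidth

print(f"HBM-bound:  {tokens_per_second(MODEL_BYTES, HBM_BW):.0f} tokens/s")
print(f"SRAM-bound: {tokens_per_second(MODEL_BYTES, SRAM_BW):.0f} tokens/s")
```

The model ignores compute time, interconnect overhead, and batching, but it shows why an architecture that keeps weights in very high-bandwidth on-chip memory can cut per-token latency by an order of magnitude or more.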