As AI clusters scale from single nodes to thousand-GPU fabrics, connectivity is becoming the bottleneck that prevents all that compute from reaching its potential. NVIDIA’s Blackwell and Vera Rubin architectures have fundamentally redefined what a rack can do—but that density creates new challenges at every layer: scale-up, scale-out, memory, and storage. When any one of them lags, you’re leaving expensive GPU cycles on the table. The next phase of AI infrastructure will be won or lost in the interconnect layer.
At NVIDIA GTC 2026, I’ll be presenting on how Astera Labs is tackling each of these challenges head-on, and why purpose-built connectivity is no longer optional infrastructure: it’s the difference between a GPU that runs and a GPU that performs. Every one of these bottlenecks is a tax on GPU performance. Here’s how we’re eliminating them.
Challenge 1: Scale-Up Flexibility Without Sacrificing Performance
The future of scale-up is hybrid. PCIe and UALink-based scale-up fabrics deliver rapid iteration and component flexibility—the right choice for teams that need the freedom to swap, upgrade, and customize. Meanwhile, NVLink—NVIDIA’s high-speed, chip-to-chip interconnect fabric—delivers the ultra-high bandwidth and low latency that tightly coupled GPU workloads demand. Our Scorpio PCIe Smart Fabric Switch and signal conditioners enable open standards-based scale-up PODs, while our participation in the NVLink Fusion ecosystem means custom ASICs and third-party CPUs can now access NVLink’s performance advantages in semi-custom AI system designs.
Challenge 2: Seamless Interoperability with Scale-out Networks
In our recently expanded Cloud-Scale Interop Lab, we rigorously test Taurus Ethernet Smart Cable Modules for seamless interoperability with Spectrum-X switches and ConnectX SuperNICs, enabling larger AI clusters that span multiple racks. Taurus Smart Cable Modules enhance signal integrity and bandwidth utilization for 100G/lane Ethernet switch-to-switch and switch-to-server interconnects, extending reach up to 7 meters while delivering advanced fleet management and deep diagnostic capabilities via COSMOS, a unified software layer supporting PCIe, CXL, and Ethernet protocols. Beyond software and interop, we run NCCL (NVIDIA Collective Communications Library) performance tests to verify link reliability, performance, and RAS under the stress of real AI workloads. The result: deployment flexibility with plug-and-play AECs for popular switches and NICs.
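For readers who want a feel for what that kind of collective-communication stress looks like, here is a minimal sketch (not our actual test harness) that times NCCL all-reduce operations with PyTorch; the message size and iteration count are arbitrary illustration values:

```python
# Minimal NCCL all-reduce timing sketch (illustrative only, not Astera Labs' test harness).
# Launch with: torchrun --nproc_per_node=<gpus_per_node> nccl_allreduce_sketch.py
import os
import time
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")               # NCCL backend for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])             # set by torchrun
    torch.cuda.set_device(local_rank)

    size_mb = 256                                           # arbitrary message size
    tensor = torch.ones(size_mb * 1024 * 1024 // 4, device="cuda")  # fp32 buffer

    for _ in range(5):                                      # warm-up so init cost doesn't skew timing
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()
    elapsed = time.time() - start

    if dist.get_rank() == 0:
        gb = tensor.numel() * tensor.element_size() / 1e9
        print(f"all_reduce {size_mb} MB: {elapsed / iters * 1e3:.2f} ms/iter, "
              f"~{gb / (elapsed / iters):.1f} GB/s algorithmic bandwidth")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Sweeping this kind of measurement across message sizes and link configurations is how marginal links reveal themselves long before a production training job does.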
Challenge 3: Memory Capacity Is Choking AI Inference
KV Cache—the memory structure that stores key-value vectors from prior tokens to avoid recomputation—is one of the central bottlenecks in serving large language models at scale in production environments. It’s memory-hungry by design, and as context windows grow longer, so does the pressure on system memory. Traditional DRAM capacity limits are a hard ceiling that even the most capable accelerators can’t outrun. Our Leo CXL Memory Controller, deployed in the Penguin Solutions KV Cache Server with SMART Modular memory, delivers up to 3.6x more system memory with lower latency than SSD-based alternatives, directly accelerating inference performance for NVIDIA Dynamo-powered deployments.
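To see why long contexts hit that ceiling so quickly, here is a back-of-the-envelope sizing sketch; the model dimensions below are illustrative assumptions (roughly a 70B-class model with grouped-query attention), not measurements of any particular deployment:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Approximate KV-cache footprint: two tensors (K and V) per layer,
    each of shape [batch, kv_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Illustrative 70B-class configuration with grouped-query attention (assumed values, FP16 cache):
per_request = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                             seq_len=128_000, batch_size=1)
print(f"{per_request / 1e9:.1f} GB of KV cache for a single 128k-token request")
# ~42 GB per request: a handful of concurrent long-context users already outgrows a single
# accelerator's HBM, which is exactly the capacity pressure CXL-attached memory is meant to absorb.
```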
Challenge 4: Storage Can’t Keep GPUs Fed
Next-generation AI workloads like Graph Neural Networks demand storage that behaves more like memory. NVIDIA’s Storage-Next architecture addresses this by enabling GPUs to initiate and control storage I/O directly. Our Scorpio Smart Fabric Switch is central to such architectures—enabling disaggregated, NVMe-over-Fabrics storage that the GPU accesses with dramatically reduced latency compared to traditional CPU-mediated I/O, ensuring accelerators stay fully utilized.
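As a rough illustration of what GPU-initiated storage access looks like from the application side, the sketch below uses the kvikio Python bindings for NVIDIA’s cuFile (GPUDirect Storage) API; the file path and sizes are placeholders, and this is not the Storage-Next or Scorpio programming interface itself:

```python
# Illustrative GPU-direct read using the kvikio Python bindings for NVIDIA's cuFile
# (GPUDirect Storage) API. The path and sizes are placeholders; this is not the
# Storage-Next or Scorpio programming interface itself.
import cupy
import kvikio

def load_features_direct(path: str, num_floats: int) -> cupy.ndarray:
    """Read a binary file of float32 values straight into GPU memory,
    skipping the CPU bounce buffer when GPUDirect Storage is available."""
    buf = cupy.empty(num_floats, dtype=cupy.float32)       # destination lives in GPU memory
    with kvikio.CuFile(path, "r") as f:
        future = f.pread(buf)                               # asynchronous, GPU-direct read
        bytes_read = future.get()                           # block until the read completes
    assert bytes_read == buf.nbytes
    return buf

# Hypothetical GNN feature table streamed directly to the accelerator:
features = load_features_direct("/mnt/nvme/gnn_features.bin", num_floats=64_000_000)
print(f"Loaded {features.nbytes / 1e9:.2f} GB of features without a host copy")
```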
Ecosystem Collaboration as a Force Multiplier
Connectivity solutions are only as good as the broader ecosystem they operate in. To that end, Astera Labs’ Cloud-Scale Interop Lab is a core part of how we deliver value—continuously validating our portfolio across hundreds of XPUs, NICs, memory, and storage devices in partnership with a broad set of hardware and software collaborators. By replicating real-world customer topologies and workloads, we ensure our customers can deploy with confidence rather than spending months on integration validation themselves.
AI infrastructure at scale is a team sport. We’re proud to work alongside NVIDIA and our ecosystem partners to make sure that when you light up a rack, every component is pulling its weight.
Come See Us at GTC
I’ll be presenting session EX82087 on Tuesday, March 17 at 1pm PT. Visit the Astera Labs booth (#116) for a demo, and let’s talk connectivity.