As AI accelerators evolve, computing platforms are increasingly integrating multiple PCIe® generations, creating new challenges in platform design. In addition, hyperscalers and system integrators are deploying platforms at unprecedented speed, further driving the need to design for modularity and scale.
Astera Labs’ P-Series Scorpio Smart Fabric Switches are purpose-built to overcome these challenges by enabling mixed-traffic head-node connectivity across diverse PCIe generations and topologies.
This blog explores how to design for evolving server boards that must become increasingly modular to keep up with rapid advancements in AI infrastructure.
Changing server designs
The previous generation of AI server designs was based on the standardized Universal Baseboard (UBB) defined by the Open Compute Project.
These platforms feature an 8-GPU sled attached to a Host Interface Board (HIB) that provides scale-out and ingest and is cabled into a Host Board.
Figure 1: Traditional AI server design
Host Interface Boards have traditionally been built around a fixed CPU:GPU ratio, which worked well with the standardized Universal Baseboard. However, AI platforms are now evolving into different shapes and CPU:GPU ratios. Host Interface Boards with a fixed GPU:NIC:CPU ratio can’t scale across the new configurations required by hyperscalers.
Enter PCIe 6 and Scorpio: changing the way accelerator boards are designed
With GPUs and SSDs moving to PCIe 6 speeds while NICs and CPUs remain at PCIe 5, accelerator boards are evolving into three distinct form factors:
- 8-GPU UBB Board: A continuation of the original UBB architecture, with a reduced GPU operating point because of limited cooling capacity. The server has eight NICs and two CPUs.
- 2-GPU Board with x86 CPU: Dual-GPU baseboards, either water-cooled or air-cooled, connected to HIBs and Host Boards. The server has multiple dual-GPU boards, multiple NICs, and two CPUs.
- 2-GPU, 1-CPU Board: Two GPUs and one CPU are co-located on the same accelerator board. This configuration does not require a host board because the CPU is integrated with the GPUs. The server has multiple 2-GPU, 1-CPU boards matched with NICs.
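To put the generation gap in rough numbers, the sketch below computes raw, one-direction link rates from the per-lane signaling rates of PCIe 5 (32 GT/s) and PCIe 6 (64 GT/s). It ignores encoding and protocol overhead, and the function name `raw_link_gbps` is purely illustrative, so treat the results as upper bounds and not as delivered throughput.

```python
# Per-lane signaling rates in GT/s for the two generations discussed here.
GT_PER_SEC_PER_LANE = {5: 32, 6: 64}


def raw_link_gbps(generation: int, lanes: int) -> int:
    """Raw one-direction link rate in Gb/s (assumption: no encoding or protocol overhead)."""
    return GT_PER_SEC_PER_LANE[generation] * lanes


gen5_x16 = raw_link_gbps(5, 16)  # NIC- or CPU-side link
gen6_x16 = raw_link_gbps(6, 16)  # GPU- or SSD-side link

print(gen5_x16, gen6_x16, gen6_x16 // gen5_x16)  # 512 1024 2
```

Under these assumptions, a PCIe 6 x16 port carries as much raw bandwidth as two PCIe 5 x16 ports, which is one way to see why the boards connecting the two generations need to be rethought.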
Scorpio P-Series Fabric Switches are built around a modular architecture that enables reusable Host Interface Boards, which can be deployed across these different accelerator boards.
Figure 2: Modular switch ingest board for multiple PCIe 6 platforms
- 8-GPU UBB Board
  - For every two GPUs, there is one host interface board featuring two Scorpio PCIe switches and four NICs
  - This modular board provides connectivity for two 400 Gb/s NICs to each GPU connected over a PCIe 6 x16 link
- 2-GPU Board with x86 CPU
  - For each dual-GPU board, there is one host interface board featuring two Scorpio PCIe switches and four NICs
  - This modular board provides connectivity for two 400 Gb/s NICs to each GPU connected over a PCIe 6 x16 link
- 2-GPU, 1-CPU Board
  - For each dual-GPU, single-CPU board, there is one host interface board featuring two Scorpio PCIe switches and four NICs
  - This modular board provides connectivity for two 400 Gb/s NICs to each GPU connected over a PCIe 6 x16 link
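Because every platform uses the same ratio, the building block can be modeled in a few lines. The following is a minimal sketch, not Astera Labs tooling: the `HostInterfaceBoard` class, its field names, `boards_needed`, and `nic_traffic_fits` are hypothetical, and the PCIe 6 x16 figure is the raw 64 GT/s per-lane rate times 16 lanes with no overhead subtracted.

```python
import math
from dataclasses import dataclass

# Assumption: raw one-direction rate of a PCIe 6 x16 link, overhead ignored.
PCIE6_X16_RAW_GBPS = 64 * 16


@dataclass(frozen=True)
class HostInterfaceBoard:
    """One modular building block, using the ratios listed above."""
    gpus_served: int = 2
    switches: int = 2
    nics: int = 4
    nic_gbps: int = 400

    @property
    def nics_per_gpu(self) -> int:
        return self.nics // self.gpus_served

    @property
    def nic_gbps_per_gpu(self) -> int:
        return self.nics_per_gpu * self.nic_gbps


def boards_needed(gpu_count: int, hib: HostInterfaceBoard) -> int:
    """Number of host interface boards required to serve gpu_count GPUs."""
    return math.ceil(gpu_count / hib.gpus_served)


def nic_traffic_fits(hib: HostInterfaceBoard) -> bool:
    """True if the NIC bandwidth per GPU stays within the raw PCIe 6 x16 rate."""
    return hib.nic_gbps_per_gpu <= PCIE6_X16_RAW_GBPS


hib = HostInterfaceBoard()
print(hib.nics_per_gpu, hib.nic_gbps_per_gpu, nic_traffic_fits(hib))  # 2 800 True
print(boards_needed(2, hib))  # 1
```

The point of the sketch is that only `gpu_count` changes between platforms; the block itself stays the same.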
3 key benefits of Scorpio’s modular approach
Scorpio Smart Fabric Switches deliver three key benefits for server design.
- Design Once and Repeat. Build and test one building block using Scorpio, then replicate it across the platform, or even across multiple platforms.
- Easy Setup. Breaking the larger platform into smaller components simplifies both switch configuration and building block validation.
- Rapid Qualification. Reduce test complexity at the system level by leveraging test and validation of the building blocks.
Summary
Because link speeds vary across platforms depending on the PCIe generation implemented, designing for modularity and flexibility is key. By using Scorpio P-Series Fabric Switches in your design, you can use GPUs, NICs, CPUs, or SSDs of any PCIe generation and still achieve high-speed, reliable connectivity.
Want to dive deeper? Request our white paper “Migrating AI Server Designs to a Modular Scorpio Architecture” to learn how Scorpio is enabling modularity for next-gen AI infrastructure.