New challenges arise for cabling complexity, serviceability, and rack power
As new compute-intensive machine learning (ML) and artificial intelligence (AI) workloads drive servers to adopt faster PCI Express® 5.0 links, lower-latency cache-coherent protocols like Compute Express Link™ (CXL™), and a dizzying array of memory, storage, AI processor (AIP), smart NIC, FPGA, and GPU elements, heterogeneous computing is also pushing the need for blazing-fast networks to interconnect these resources. Distributed compute, memory, and storage nodes have spawned requirements for a high-bandwidth, low-latency, and, perhaps most importantly, scalable and serviceable network topology: one that can support the explosion of “east-west” traffic brought about by resource disaggregation.
100 GbE, based on 25 Gbps/lane technology, is the workhorse of today’s hyperscale datacenter Ethernet networks: connecting servers and storage to leaf switches, and leaf switches to spine switches. In a Clos network topology—common in hyperscale datacenters—scale is achieved by adding more servers under a new leaf switch and connecting this leaf switch to all next-level spine switches (Figure 1). Likewise, bandwidth may be scaled by adding additional spine switches (Figure 2).
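The two scaling knobs (more leaves for more servers, more spines for more bandwidth) can be made concrete with a few counts. Below is a minimal Python sketch of a two-tier leaf-spine fabric under the single rule that every leaf connects to every spine. The class `LeafSpineFabric` and its fields are hypothetical, and the model deliberately ignores oversubscription, switch radix limits, and failure domains.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LeafSpineFabric:
    """Hypothetical two-tier Clos model: every leaf has one uplink to every spine."""
    leaves: int
    spines: int
    servers_per_leaf: int
    uplink_gbps: int  # speed of each leaf-to-spine link

    def leaf_spine_links(self) -> int:
        # Full mesh between the leaf tier and the spine tier.
        return self.leaves * self.spines

    def servers(self) -> int:
        return self.leaves * self.servers_per_leaf

    def uplink_gbps_per_leaf(self) -> int:
        # One uplink per spine, so uplink bandwidth grows with the spine count.
        return self.spines * self.uplink_gbps

    def add_leaf(self) -> "LeafSpineFabric":
        # Scale out: a new leaf (with its servers) is cabled to every existing spine.
        return replace(self, leaves=self.leaves + 1)

    def add_spine(self) -> "LeafSpineFabric":
        # Scale bandwidth: every existing leaf gains one more uplink.
        return replace(self, spines=self.spines + 1)


if __name__ == "__main__":
    fabric = LeafSpineFabric(leaves=4, spines=2, servers_per_leaf=32, uplink_gbps=100)
    print(fabric.add_leaf().servers())                # 160
    print(fabric.add_leaf().leaf_spine_links())       # 10
    print(fabric.add_spine().uplink_gbps_per_leaf())  # 300
```

Note that each scaling step also multiplies the number of leaf-to-spine cables, which is why cabling practicality becomes a first-order concern as port speeds rise.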
An obvious bottleneck for both the latency of the network (how many “hops” separate one server from another) and the bandwidth of the network segments (port speed) is the leaf and spine switch, often the same physical piece of equipment. Switch silicon providers have been working hard on both fronts: increasing I/O speeds to 50 Gbps/lane today and moving to 100 Gbps/lane in 2021, and increasing the number of ports (the “radix”) to 64 eight-lane ports (e.g., QSFP-DD or OSFP form factors). All told, this amounts to 25.6 Tbps of switching bandwidth (64 ports × 8 lanes × 50 Gbps/lane), which is helping to enable faster and flatter network topologies based on 400-GbE and 800-GbE ports.
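The switching-bandwidth figure follows directly from radix, lanes per port, and per-lane rate. A minimal Python sketch of that arithmetic is below; the helper names are hypothetical, and the 100 Gbps/lane lines simply assume the same 64 × 8-lane radix for illustration rather than describing any particular switch.

```python
def switch_bandwidth_tbps(ports: int, lanes_per_port: int, gbps_per_lane: int) -> float:
    """Aggregate port bandwidth of a switch, in Tbps."""
    return ports * lanes_per_port * gbps_per_lane / 1000


def port_speed_gbps(lanes_per_port: int, gbps_per_lane: int) -> int:
    """Nominal Ethernet port speed from lane count and per-lane rate."""
    return lanes_per_port * gbps_per_lane


if __name__ == "__main__":
    # 64 eight-lane ports at 50 Gbps/lane, as described in the text.
    print(switch_bandwidth_tbps(64, 8, 50))   # 25.6
    print(port_speed_gbps(8, 50))             # 400 (a 400-GbE port)
    # Hypothetical: the same radix at 100 Gbps/lane.
    print(port_speed_gbps(8, 100))            # 800 (an 800-GbE port)
    print(switch_bandwidth_tbps(64, 8, 100))  # 51.2
```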
To further enable full use of 400-GbE and 800-GbE ports while lowering the overall total cost of ownership (TCO), we have developed the Taurus Smart Cable Module™ (Taurus SCM™).
With increases in port speed and port count come new challenges for these lowest layers of the data center network. The first key challenge is how to effectively distribute switch port bandwidth to servers. Server network interface cards (NICs) lag switch ports in bandwidth: while 400-GbE switch ports came to market in 2020, server NIC ports are generally at 200 GbE or less. More fundamentally, there is a disparity between the per-lane speeds of server NIC ports and switch ports. While switch ports now run at 50 Gbps/lane, NIC ports are (for the most part) still at 25 Gbps/lane. When 100 Gbps/lane switch ports arrive in 2021, NIC ports will just be making the transition to 50 Gbps/lane. Connecting a switch directly to a NIC comes at a price: because each lane can run only at the slower NIC rate, switch port bandwidth is underutilized by a factor of two! Resolving this disparity requires rate conversion on the NIC side, up to the higher 100 Gbps/lane switch speed, to fully utilize the switch port bandwidth.
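The factor-of-two penalty and the rate-conversion fix can be checked with simple lane arithmetic. The Python sketch below is a simplified model, assuming that each directly attached lane runs at the slower of the switch and NIC lane rates, and that a rate-converting gearbox maps a whole number of NIC lanes onto each switch lane. The function names are hypothetical and are not meant to describe the internals of the Taurus SCM.

```python
import math


def direct_attach_gbps(switch_lanes: int, switch_lane_gbps: int, nic_lane_gbps: int) -> int:
    """Port throughput when switch lanes are wired 1:1 to NIC lanes (no rate conversion).

    Simplified model: each lane trains to the slower of the two lane rates.
    """
    return switch_lanes * min(switch_lane_gbps, nic_lane_gbps)


def gearbox_nic_lanes(switch_lanes: int, switch_lane_gbps: int, nic_lane_gbps: int) -> int:
    """NIC-side lanes needed so a rate-converting gearbox can fill every switch lane."""
    return switch_lanes * math.ceil(switch_lane_gbps / nic_lane_gbps)


if __name__ == "__main__":
    port_gbps = 8 * 100  # eight-lane switch port at 100 Gbps/lane
    direct = direct_attach_gbps(8, 100, 50)
    print(direct, direct / port_gbps)     # 400 0.5  (half the port is idle)
    print(gearbox_nic_lanes(8, 100, 50))  # 16 NIC lanes at 50 Gbps fill 800 Gbps
```

The same ratio holds one generation earlier: a 50 Gbps/lane switch port wired directly to 25 Gbps/lane NIC lanes also reaches only half its capacity in this model.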
The most abundant (and cost-sensitive) interconnect in the data center is the top-of-rack (ToR) switch-to-server link. Painstaking efforts are made to keep this interface low-cost and easy to maintain. As such, passive direct attach copper (DAC) cables are traditionally used for these 1 to 3 meter links. As ToR switch ports move to 100 Gbps/lane, passive DAC reach over these distances becomes questionable, and cable thickness, bend radius, and weight become a concern.
The second key challenge is how to connect 400/800-GbE switch ports to one another in a practical manner. A simple Clos network topology connects every leaf switch to every spine switch; in turn, every spine switch is connected to some number of “exit leaf” switches (at least two), as shown in Figure 4.
The leaf and spine switches are generally not co-located (typically more than 10 meters apart). For example, the leaf switches may be at the top (or middle) of each server rack, whereas the spine switches may be at the end of a row or a cluster of rows. The spine and exit leaf switches, on the other hand, are often co-located (typically less than 3 meters apart), as shown in Figure 4.
Spine-to-exit-leaf interconnects (3 meters or less) can be serviced by copper cables, but those cables are not without their challenges.
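The split between long leaf-to-spine runs and short spine-to-exit-leaf runs can be expressed as a small link inventory. This Python sketch builds the Clos link list from the preceding paragraphs (every leaf to every spine, every spine to at least two exit leaves) and flags the links that fall within the 3-meter copper reach quoted in the text. The switch names and the 15-meter and 2-meter lengths are hypothetical example values, not measurements.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from itertools import product

COPPER_MAX_M = 3.0  # copper reach quoted in the text for spine-to-exit-leaf links


@dataclass(frozen=True)
class Link:
    a: str
    b: str
    length_m: float


def build_clos_links(leaves: Sequence[str], spines: Sequence[str],
                     exit_leaves: Sequence[str], leaf_spine_m: float,
                     spine_exit_m: float) -> list[Link]:
    """Every leaf connects to every spine; every spine connects to every exit leaf."""
    if len(exit_leaves) < 2:
        raise ValueError("the topology calls for at least two exit leaf switches")
    links = [Link(l, s, leaf_spine_m) for l, s in product(leaves, spines)]
    links += [Link(s, e, spine_exit_m) for s, e in product(spines, exit_leaves)]
    return links


def copper_candidates(links: Sequence[Link]) -> list[Link]:
    """Links short enough to be serviced by copper cables."""
    return [ln for ln in links if ln.length_m <= COPPER_MAX_M]


if __name__ == "__main__":
    links = build_clos_links(
        leaves=[f"leaf{i}" for i in range(4)],
        spines=["spine0", "spine1"],
        exit_leaves=["exit0", "exit1"],
        leaf_spine_m=15.0,  # hypothetical: leaf and spine not co-located
        spine_exit_m=2.0,   # hypothetical: spine and exit leaf co-located
    )
    print(len(links), len(copper_candidates(links)))  # 12 4
```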
Various solutions to these challenges have been proposed, some already in use at 50 Gbps/lane and many expected to be adopted at 100 Gbps/lane.
While heterogeneous computing trends have led to cramming more processing power and bandwidth into servers and other end-node boxes, these same trends have resulted in an explosion in “east-west” traffic: data moving between servers, GPU trays, storage, and other end-node systems in the network. Luckily, 100 Gbps/lane technology will help carry this increased bandwidth, but with it come new interconnect challenges and bottlenecks in switch-to-switch and switch-to-server connectivity. Smart Electrical Cables with Taurus Smart Cable Modules help remove these bottlenecks and offer built-in advanced cable and fleet management capabilities critical to ensuring high reliability and system uptime. Additionally, as a Smart Cable Module, Taurus can be designed into any cable OEM’s higher-bandwidth copper Ethernet cable solution, supporting a flexible supply chain of “active” plus “smart” cables for hyperscalers that will be key to ushering in a new age of intelligent networks.