
The Importance of Security Features in a CXL Memory Controller to Protect Mission-Critical Cloud Data

February 23, 2023 by Amy Thomas

The explosion of modern applications such as Artificial Intelligence, Machine Learning, and Deep Learning is changing the very nature of computing and transforming businesses. These applications have opened myriad ways for companies to improve their business development processes, operations, and security, and to provide better customer experiences. To support these applications, platforms are being designed around SoCs that can process large data sets in cloud data centers, provide specialized processing power for these use cases, enable customized solutions, and scale with the market. The market size of AI was valued at $65.48 billion in 2020 and is projected to reach $1,581.70 billion by 2030, growing at a CAGR of 38.0% from 2021 to 2030, according to a recent report by Allied Market Research.

With this exponential growth come rising concerns about the security of these platforms running mission-critical applications in emerging markets such as healthcare, automotive, and data analytics. Security is a major factor contributing to both the complexity and the cost of developing and maintaining these systems. In fact, per the 2022 report published by IBM Security, the global average total cost of a data breach reached an all-time high of USD 4.35 million in 2022. What’s even more concerning is that it took an average of 207 days to identify a breach and another 70 days to contain it.

The threats posed by malicious actors who can breach these platforms have grown significantly more sophisticated over the past decade, and they must be addressed with concrete security measures at the hardware, software, and protocol levels on platform SoCs.

 

Figure 1: Average total cost of a data breach in USD millions (Source: IBM Security, 2022)

 

Security Threats Plaguing Cloud-Centric SoCs

The challenge for chipmakers is not only to develop high performance SoCs for cloud applications, but also to have features that can counter the sophistication of the threat vectors to secure confidential and sensitive assets on the platforms. The big question companies now ask is how much security is enough and, even if a device starts out with its security intact, will it remain secure throughout its lifetime.

This became evident with the recently discovered Meltdown, Spectre, and Foreshadow vulnerabilities, which exploit speculative execution and branch prediction. Through such incidents, we have learned how attackers use sophisticated attack vectors to breach a system. These attack vectors include:

  • Attacks on data-in-transit, such as intrusion attacks using sniffing devices, which can lead to data leakage or alteration of code or data transmitted on high-speed links.
  • Attacks on data-in-use, such as side-channel attacks, including Electromagnetic (EM) attacks and Differential Power Analysis (DPA), where information leaked during code execution is exploited to alter device behavior.
  • Attacks on data-at-rest, such as availability-related threats, including disruption or Denial-of-Service (DoS) attacks against data stored in systems.

 

Figure 2: Security attack vectors on data

 

As these attacks become more sophisticated, next-generation interconnect standards such as Compute Express Link™ (CXL™) are continuously adapting to protect against them, defining stronger security protocols that provide confidentiality through Integrity and Data Encryption (IDE) mechanisms for data transiting a CXL link.

 

Security Features Needed in a CXL-based Memory Controller

Cloud-based applications such as AI and ML require SoCs that can increase memory bandwidth to unlock the performance required for next-generation data centers. Compute Express Link™ is an open standard developed to provide a high-speed, low-latency, cache-coherent interconnect for processors, accelerators, and memory expansion. CXL Type-3 memory controllers can provide a cost-effective, high-performance solution to expand memory bandwidth and capacity. Additionally, to protect against the attacks described earlier, a CXL Type-3 device also needs to implement security features using the cryptographic techniques defined in the CXL 2.0 specification, as well as other industry-standard data-encryption and authentication mechanisms. The following sections describe some of the important security features that a CXL-based memory controller needs to implement to protect sensitive assets in a data center.

 

CXL 2.0 IDE

Considering modern threat vectors, the CXL Consortium, in close collaboration with other industry-standard bodies such as PCI-SIG and the Distributed Management Task Force (DMTF), incorporated Integrity and Data Encryption (IDE) features into the CXL 2.0 specification. IDE is designed to provide confidentiality, integrity, and replay protection at the flit (flow control unit) level. It defines a Message Authentication Code (MAC) that protects against attacks such as interception of packets on point-to-point CXL links. While security is an essential requirement, system designers must also consider the performance needs of their systems when enabling IDE. To balance performance against security, the CXL 2.0 specification defines two IDE modes:

  • Containment Mode, where data is released for further processing only after the integrity check passes. This mode impacts both latency and bandwidth; the bandwidth impact comes from the fact that the integrity value is sent quite frequently.
  • Skid Mode, where data is released for further processing without waiting for the integrity value to be received and checked. This allows less frequent transmission of the integrity value, yielding near-zero latency impact and very low bandwidth overhead.
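The trade-off between the two modes comes down to how often integrity values consume link flits. A toy model makes this concrete; the MAC intervals below are illustrative assumptions, not values from the CXL 2.0 specification:

```python
# Toy model of the bandwidth overhead of IDE integrity (MAC) flits.
# The MAC-interval values are illustrative assumptions, not spec values.

def mac_overhead(flits_per_mac: int, mac_flits: int = 1) -> float:
    """Fraction of link flits consumed by integrity values when one MAC
    flit follows every `flits_per_mac` data flits."""
    return mac_flits / (flits_per_mac + mac_flits)

# Containment mode: MAC sent frequently (assumed every 5 data flits) so
# data can be held until each integrity check passes.
containment = mac_overhead(flits_per_mac=5)

# Skid mode: data is released immediately, so the MAC can be sent far
# less often (assumed every 128 data flits).
skid = mac_overhead(flits_per_mac=128)

print(f"containment-mode overhead: {containment:.1%}")  # ~16.7%
print(f"skid-mode overhead:        {skid:.1%}")         # ~0.8%
```

Even with these made-up intervals, the model shows why skid mode is the choice when bandwidth and latency dominate, and containment mode when strict data containment is required.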

 

Hardware-based Root of Trust (RoT)

An immutable hardware-based Root of Trust (RoT) provides an entity that can be trusted to always behave in the expected manner and is the foundation upon which all further security layers are built. To ensure that every layer involved in device operation is secure, it is imperative to extend the circle of trust from the hardware-based RoT to every component that stores firmware and configuration settings used by the device.

 

Secure Boot

Extending security from the RoT requires a secure boot mechanism that verifies the integrity of every piece of code loaded on the device before it is allowed to execute. The secure boot process uses an asymmetric private-public key pair: the private key is used with its corresponding public key in a cryptographic algorithm to compute and verify digital signatures. The private key is uniquely associated with the owner, is never made public, and is used to generate the digital signature of the data. The public key is used to verify a digital signature that was created with the corresponding private key; since the public key is not a device secret, it can be published. The immutable RoT enforces authentication of the next-stage mutable bootloader by checking that the code carries a valid signature from an approved signer. Secure boot succeeds if this integrity check passes and fails if it does not, and the process is repeated for every subsequent layer of firmware.
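The signature check at the heart of this chain can be sketched with a toy "textbook RSA" key. The tiny numbers and hand-rolled math below are purely illustrative; production secure boot uses hardware-backed, standardized algorithms (e.g. RSA-3072 or ECDSA) with proper padding, never code like this:

```python
import hashlib

# Toy "textbook RSA" sketch of a secure-boot signature check.
# Key values are illustrative only.
p, q = 61, 53
n = p * q          # modulus (3233)
e = 17             # public exponent (shipped with the device RoT)
d = 2753           # private exponent (held only by the firmware signer)

def digest(code: bytes) -> int:
    # Hash the firmware image, reduced into the RSA modulus range.
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % n

def sign(code: bytes) -> int:
    # Signer side: apply the private key to the digest.
    return pow(digest(code), d, n)

def verify(code: bytes, signature: int) -> bool:
    # Device RoT side: recover the digest with the public key, compare.
    return pow(signature, e, n) == digest(code)

firmware = b"stage-2 bootloader image"
sig = sign(firmware)

print(verify(firmware, sig))     # valid signature: boot proceeds
print(verify(b"tampered", sig))  # digest mismatch: boot halts
```

The same verify-then-execute step repeats at each firmware layer, which is how trust is chained outward from the immutable RoT.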

 

Memory Encryption

Memory encryption is an important feature for a CXL-based memory controller, since it interfaces with off-chip memory devices to enable memory expansion, pooling, and sharing. Encrypting memory is one of the most reliable techniques for preventing data from being accessed across different guests/domains/zones/realms.
AES-XTS is the de facto cryptographic algorithm for protecting the confidentiality of data-at-rest on storage devices. It is a standards-based symmetric algorithm defined by the NIST SP 800-38E and IEEE Std 1619-2018 specifications. Advanced memory encryption technologies also involve integrity and protocol-level anti-replay techniques for high-end use cases. DRAM inline cipher engines protect data-in-use for secure memory transactions at high data rates between hosts and attached memory. With memory encryption in place, even if any of the isolation techniques are compromised, the data being accessed is still protected by cryptography, which also prevents physical attacks such as hardware bus probing on the memory interface.
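A key property of XTS-style encryption in this setting is that the memory address acts as a tweak, so identical plaintext stored at two different addresses yields different ciphertext. The sketch below illustrates that property only; a SHA-256 keystream stands in for AES, and the key, addresses, and data are hypothetical. Real controllers use AES-XTS (NIST SP 800-38E) in dedicated inline cipher engines:

```python
import hashlib

# Conceptual sketch of *tweakable* memory encryption: the address is
# mixed into the keystream, so the same plaintext at two addresses
# encrypts differently. Toy construction, not AES-XTS itself.

def keystream(key: bytes, address: int, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + address.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]

def encrypt_block(key: bytes, address: int, plaintext: bytes) -> bytes:
    ks = keystream(key, address, len(plaintext))
    return bytes(a ^ b for a, b in zip(plaintext, ks))

key = b"device-unique-key"     # hypothetical per-device key
data = b"secret cache line"

c0 = encrypt_block(key, 0x1000, data)
c1 = encrypt_block(key, 0x2000, data)
assert c0 != c1                                # address tweak differs
assert encrypt_block(key, 0x1000, c0) == data  # XOR cipher: symmetric
```

This address-as-tweak behavior is what defeats block-relocation and known-plaintext comparisons across memory locations, even before integrity and anti-replay protections are layered on top.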

 

Conclusion: Leo Memory Connectivity Platform Provides End-to-End Security

Security is essential for high-performance CXL interconnects to protect private and sensitive user information transmitted on the links. The Leo Memory Connectivity Platform provides a complete set of end-to-end security features to protect mission-critical user data. Leo provides Integrity and Data Encryption (IDE) for data transiting a CXL link, along with additional security features to ensure modern cloud-based systems and valuable user data are protected. These security measures apply to a wide variety of use models, offer broad interoperability, and align with industry best practices.

 

 

To learn more about the Leo Memory Connectivity Platform, please visit www.AsteraLabs.com/Leo.

References

  • CXL 2.0: IDE for CXL.cache, CXL.mem protocols
  • Distributed Management Task Force
  • Open Compute Platform Specification


Connectivity Is Key to Harnessing the Data Reshaping Our World

November 10, 2021 by Susan Nayak

by Jitendra Mohan

We are surrounded by data, from images captured on our mobile phones and videos streamed online to elaborate sensor fusion data required for autonomous driving. As shown in Figure 1, it is estimated that in 2020, 500 hours of video were uploaded to YouTube every minute and >400,000 hours of video were streamed every minute from Netflix alone.

Figure 1: A snapshot of content generated in one minute on various websites and applications (Source: Visual Capitalist via Statista)

Every facet of our life, especially in this post-COVID-19 world, from shopping online to interacting with friends and family over social media, is made possible by data, and the amount of data we generate is only expected to increase dramatically. In fact, the total amount of data is expected to grow to 175 zettabytes (1 ZB = 1 trillion gigabytes) by 2025, up from 33 ZB in 2018 (Figure 2).

Indeed, our world is going through a digital transformation with the prevalence of Big Data as we monitor and digitize everything and systematically extract information from raw data to gain valuable insights.

Figure 2: Amount of data expected to be used through the year 2025 (Source: Data Age 2025, sponsored by Seagate with data from IDC Global DataSphere, Nov 2018)

Big Data Lives in the Cloud

Along with the explosion of data creation, there is an accompanying shift in how data is stored and analyzed. While data creation continues to happen at the endpoints, at the edge, and increasingly in the cloud, a disproportionate amount of this data is stored in the cloud. As shown in Figure 3, IDC estimates that by 2025, nearly 50% of worldwide data will be stored in public clouds. Public clouds will also account for nearly 60% of worldwide server deployments, providing the compute power necessary to process all the data stored there. Cloud Service Providers (CSPs) operate large data centers that excel at storing and managing big data while providing storage and computing on demand to end users. Gartner estimates that by next year (2022), public clouds will be essential for 90% of data and analytics innovation.

Figure 3: Data is increasingly being stored in Public Clouds (Source: Data Age 2025, sponsored by Seagate with data from IDC Global DataSphere, Nov 2018)

AI/ML Complexity Doubles Every 3.4 Months

Artificial Intelligence (AI) workloads running in the cloud bring together data storage and data analytics to address business problems that would otherwise be impossible to tackle.

Gartner estimates that by 2025, 75% of enterprises will operationalize AI to provide insights and predictions in complex business situations. This proliferation of AI requires that the underlying models be quick, reliable, and accurate. Machine Learning (ML), especially Deep Learning (DL), combines AI algorithms, big training data, and purpose-built heterogeneous compute hardware to handle extremely large and constantly evolving datasets.

While ML has been around for decades, over the last eight to ten years ML model complexity has far outpaced the Moore’s-law-bound advancements possible in a single compute node. Figure 4 depicts a historical compilation of the compute requirements for training AI systems, showing that ML complexity is doubling approximately every 3.4 months, compared with Moore’s law’s doubling of transistor count in ICs every two years!
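The gap compounds quickly. Using the doubling periods cited above, a two-year window yields roughly a 130x growth in ML training compute against Moore's law's 2x:

```python
# Compare compute growth implied by a 3.4-month doubling time (the
# OpenAI trend cited above) with Moore's-law doubling every 24 months,
# over a 2-year window.

months = 24
ml_growth = 2 ** (months / 3.4)      # ML training compute, ~133x
moore_growth = 2 ** (months / 24)    # transistor count, 2x

print(f"ML compute:  ~{ml_growth:.0f}x in {months} months")
print(f"Moore's law:  {moore_growth:.0f}x in {months} months")
```

This two-orders-of-magnitude-per-few-years mismatch is why single-node scaling cannot keep up, motivating the distributed approaches discussed next.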

Not surprisingly, the industry is investing considerable resources and effort in distributing ML workloads over multiple compute units within a server and across multiple servers in a large data center.

Figure 4: ML complexity growing exponentially in the Modern Era far outpacing Moore’s Law (Source: OpenAI) 

800G Ethernet and Compute Express Link (CXL) Enable Distributed ML

Distributing ML workloads across multiple compute nodes is a challenging problem. Running massively parallel distributed training over state-of-the-art interconnects from just a few years ago exposes the connectivity bottleneck: as much as 90% of wall time is spent in communication and synchronization overhead. The industry has responded with advances in interconnect technology, in terms of both increased data rates and reduced latencies, to drive resource disaggregation and move large amounts of data within and across multiple servers.

Ethernet is the dominant interconnect technology to connect various servers across a data center. A typical data center networking topology connects a variety of servers over copper interconnects for North-South data traffic patterns and optical interconnects for East-West data traffic patterns. While existing 200G (8x25G) Ethernet interconnects are based on 25Gbps NRZ signaling rates, upcoming deployments in 2023 are designed for 800G (8x100G) interconnects based on 100Gbps PAM-4 signaling to quadruple the available interconnect bandwidth across servers.
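The lane arithmetic behind that quadrupling is simple: NRZ carries one bit per symbol, PAM-4 carries two. The rates below are nominal and ignore FEC and framing overhead (the actual 100G/lane line rate is 53.125 GBd):

```python
# Nominal per-lane and aggregate rates for the NRZ and PAM-4 Ethernet
# generations discussed above. NRZ = 1 bit/symbol, PAM-4 = 2 bits/symbol.
# Nominal figures; FEC/framing overhead is ignored.

def lane_rate_gbps(baud_gbd: float, bits_per_symbol: int) -> float:
    """Line rate of one lane in Gb/s."""
    return baud_gbd * bits_per_symbol

nrz_lane = lane_rate_gbps(25.0, 1)    # 25 GBd NRZ   -> 25G per lane
pam4_lane = lane_rate_gbps(50.0, 2)   # 50 GBd PAM-4 -> 100G per lane

print(f"8 x {nrz_lane:g}G NRZ lanes    -> {8 * nrz_lane:g}G")    # 200G
print(f"8 x {pam4_lane:g}G PAM-4 lanes -> {8 * pam4_lane:g}G")   # 800G
```

Doubling the bits per symbol and doubling the symbol rate together deliver the 4x jump from 200G to 800G without adding lanes.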

Unlike Ethernet, there has not been a dominant industry standard cache coherent interface for connectivity within the server. Over the last five years, several standards like CCIX, Gen-Z, OpenCAPI, NVLink, and more recently, Compute Express Link™ (CXL™) have outlined cache coherent interconnects for distributing processing of ML workloads. Of these, NVLink is a proprietary standard used largely by Nvidia devices. CXL, on the other hand, has gained wide industry adoption in a short time to emerge as the unified cache coherent server interconnect standard.

CXL defines a scalable, high-bandwidth (16x32Gbps), low-latency (nanosecond-scale), cache-coherent interconnect fabric running over the ubiquitous PCI Express® (PCIe®) interconnect. The CXL protocol, first introduced by Intel in 2019, has now established itself as the industry standard for interconnecting the various processing and memory elements within a server, as well as for enabling disaggregated, composable architectures within a rack.

CXL 1.1 defines multiple device types that implement cxl.io, cxl.cache, and cxl.mem protocols. CXL 2.0, introduced in 2020 by the CXL Consortium, adds capabilities for a switching fabric for fanout and extended support for memory pooling for increased capacity and bandwidth. CXL 3.0, currently in development, will further enable peer-to-peer connectivity and even higher throughput when combined with PCIe 6.0.

Intelligent Cloud Connectivity Solutions

Over the last three years, Astera Labs has introduced a portfolio of CXL, PCIe, and Ethernet connectivity solutions to remove performance bottlenecks created by ML workloads in complex heterogeneous topologies. Astera’s industry-leading products address the challenges of connectivity and resource sharing within and across servers. These solutions, in the form of ICs and boards, are purpose-built for the cloud and offer the industry’s highest performance, broad interoperation, deep diagnostics, and cloud-scale fleet management features.

Connectivity Solutions

  • Aries Smart Retimers: First introduced in 2019 alongside the CXL 1.1 standard, the Aries portfolio of PCIe 4.0/5.0 and CXL 2.0 Smart Retimers overcomes challenging signal integrity issues while delivering sub-10ns latency and robust interoperation.
  • Taurus Smart Cable Module™ (Taurus SCM™): Taurus SCMs enable Ethernet connectivity at 200GbE/400GbE/800GbE over a thin copper cable while providing a robust supply chain necessary for cloud-scale deployments.

Memory Accelerator Solutions

  • Leo CXL Memory Connectivity Platform: The industry’s first CXL SoC solution implementing the cxl.mem protocol, the Leo CXL Memory Connectivity Platform allows a CPU to access and manage CXL-attached DRAM and persistent memory, enabling efficient utilization of centralized memory resources at scale without impacting performance.

Figure 5: CXL 1.1 helps implement CXL.io, CXL.cache, and CXL.mem protocols (Source: CXL Consortium via Venture Beat)

Conclusions

As our appetite for creating and consuming massive amounts of data continues to grow, so too will our need for increased cloud capacity to store and analyze this data. Additionally, the server connectivity backbone for data center infrastructure needs to evolve as complex AI and ML workloads become mainstream in the cloud. Astera Labs and our expanding portfolio of CXL, PCIe, and Ethernet connectivity solutions are essential to unlock the higher bandwidth, lower latencies, and deeper system insights needed to overcome performance bottlenecks holding back data-centric systems.

