Large server deployments for Artificial Intelligence (AI) and general-purpose computing in hyperscale data centers provide enormous benefits in terms of raw compute power, efficiency, and cost amortization. The on-demand nature and low up-front cost of cloud computing are attractive to an increasing number of enterprises. However, managing such a large fleet of systems presents complex challenges in observability, data collection, and fault isolation.
Astera Labs’ Intelligent Connectivity Platform, which includes the COnnectivity System Management and Optimization Software (COSMOS) suite, addresses these challenges by providing link management, fleet management, and reliability/availability/serviceability (RAS) features across the entire product portfolio.
Leveraging a software-defined architecture, the Intelligent Connectivity Platform consists of semiconductor-based high-speed connectivity integrated circuits (ICs) and the COSMOS suite operating in on-chip microcontrollers and system baseboard/system management controllers (BMCs/SMCs). Working in concert, these solutions provide an array of customizable diagnostics and telemetry features useful for managing and optimizing a large fleet of systems.
COSMOS: Extensive monitoring and diagnostics capabilities
The COnnectivity System Management and Optimization Software (COSMOS) suite comprises:
COSMOS provides three distinct capabilities:
Learn more: Request the white paper today
Advanced resource monitoring and fleet management have become critical in cloud environments that run AI workloads requiring parallel processing across thousands of servers. For these parallel workloads, overall efficiency may degrade to the level of the single server running at the lowest utilization, which has a significant impact on total cost of ownership (TCO) and return on investment (ROI) for the entire infrastructure.
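To make the cost of a single slow server concrete, here is a minimal sketch in Python. It assumes a simplified, fully synchronous workload in which every step waits for the slowest server. The function names and the example utilization figures are hypothetical illustrations and are not part of COSMOS.

```python
def fleet_efficiency(utilizations):
    """Effective efficiency of a synchronous parallel job.

    Assumption: every step waits for the slowest server, so the whole
    job runs at the lowest per-server utilization.
    """
    if not utilizations:
        raise ValueError("need at least one server")
    return min(utilizations)


def wasted_capacity(utilizations):
    """Average fraction of capacity left idle while waiting on the slowest server."""
    effective = fleet_efficiency(utilizations)
    return sum(u - effective for u in utilizations) / len(utilizations)


# Hypothetical fleet: 999 healthy servers and one degraded server.
fleet = [0.95] * 999 + [0.60]

print("effective efficiency:", fleet_efficiency(fleet))
print("average wasted capacity:", wasted_capacity(fleet))
```

Under these assumptions, one degraded server out of a thousand pulls the whole job down to its own utilization, which is why per-server telemetry matters at fleet scale.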
Request the "Cloud Infrastructure Fleet Management Made Easy with COSMOS" white paper today to learn how COSMOS plays an important role in monitoring AI data center infrastructure at cloud scale.