Building the Software Stack for AI Infrastructure 2.0: Why Standards-Based Connectivity Management Matters

Michael Ocampo, Ecosystem Alliance Manager

How our collaboration with ASPEED and Insyde on OpenBMC support advances the vision of open, interoperable AI rack infrastructure 

We’re at an inflection point in AI infrastructure. As I watch hyperscalers architect their next-generation AI racks, I see a fundamental shift happening—one that goes far beyond just faster GPUs and higher bandwidth connections. We’re witnessing the birth of AI Infrastructure 2.0: purpose-built, rack-scale platforms designed around open standards, diverse supply chains, and true vendor interoperability. 

But here’s what keeps me up at night: the software layer hasn’t kept pace. While we’ve made tremendous strides in silicon performance, the management and orchestration software still feels like it’s stuck in the previous era of vendor-locked, proprietary solutions. 

That’s why, today, we’re announcing strategic collaborations with ASPEED and Insyde that represent a critical step toward realizing this vision. By integrating our COSMOS software with OpenBMC, building on ASPEED’s BMC SoCs and Insyde’s system management expertise, we’re delivering something the industry desperately needs: truly standardized connectivity management that treats high-speed switches and retimers as first-class citizens in the rack management stack. 

The Challenge 

Picture a typical AI rack deployment today. Your team has carefully selected best-in-class components—CPUs from one vendor, GPUs from another, switches from a third, memory controllers from a fourth. Each component is exceptional in isolation, but when you try to manage them as a unified system, you hit painful realities: 

  • API fragmentation is killing productivity – Every ODM and CSP is dealing with a multitude of vendor-specific APIs, struggling to unify device-level software features into their BMC management stack. Teams spend months building and maintaining one-off integrations, diverting scarce engineering resources away from optimizing AI workloads.  
  • “Flying blind” on connectivity health – Data center operators lack granular telemetry on electrical margins, link training events, and critical connectivity issues through standard interfaces. When links degrade, teams can’t quickly pinpoint whether the issue is upstream, downstream, or within spec limits. 
  • Costly downtime from undiagnosed issues – Without proper RAS capabilities, correctable link errors cascade into GPU performance degradation or system failures, translating to thousands of dollars in lost compute cycles per hour. 

The Future We’re Building 

This is where our three-way collaboration becomes transformational. When we talk about “software intelligence that keeps pace with silicon innovation,” we’re describing a future where: 

  • Standards-based APIs eliminate integration churn – COSMOS integration with OpenBMC delivers standardized Redfish APIs that insulate operators from vendor-specific complexity. The D-Bus Interface Layer abstracts vendor specifics, enabling multiple software modules to integrate smoothly into a single, unified management stack. 
  • Real-time telemetry via open, standard interfaces – COSMOS delivers deep link visibility and performance metrics through the OpenBMC open-source framework, giving data center operators actionable insights that feed directly into mission control and automation systems. 
  • Proactive management prevents downtime – Granular link details, firmware logs, and event tracking from COSMOS enable corrective actions before minor errors cascade into failures, sustaining PCIe and other interconnect performance while maximizing AI infrastructure ROI (see the sketch below). 
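
To make this concrete, here is a minimal sketch of what polling link health through a BMC's standard Redfish interface could look like. The BMC address, resource path, property names, and error threshold are illustrative assumptions for this example, not the actual COSMOS or OpenBMC schema; consult the Redfish resources your BMC actually exposes.

```python
"""Illustrative sketch: polling link telemetry from a BMC's Redfish service.

The resource path, property names, and threshold below are hypothetical
placeholders -- check the Redfish schema exposed by your BMC.
"""
import requests

BMC = "https://bmc.example.local"          # hypothetical BMC address
SESSION = requests.Session()
SESSION.auth = ("operator", "password")    # use proper credential handling in practice
SESSION.verify = False                     # lab-only; validate certificates in production

# Hypothetical Redfish resource for a retimer's downstream port
PORT_URL = f"{BMC}/redfish/v1/Chassis/Retimer0/Ports/1"

CORRECTABLE_ERROR_LIMIT = 100              # illustrative threshold, not a spec value


def check_link_health() -> None:
    """Fetch port state and metrics, and flag links that may need corrective action."""
    port = SESSION.get(PORT_URL, timeout=10).json()
    metrics = SESSION.get(f"{PORT_URL}/Metrics", timeout=10).json()

    link_state = port.get("LinkState", "Unknown")
    errors = metrics.get("PCIeErrors", {}).get("CorrectableErrorCount", 0)

    if link_state != "Enabled" or errors > CORRECTABLE_ERROR_LIMIT:
        # In a real deployment this would feed alerting or fleet orchestration
        print(f"ATTENTION: link state={link_state}, correctable errors={errors}")
    else:
        print(f"OK: link state={link_state}, correctable errors={errors}")


if __name__ == "__main__":
    check_link_health()
```

Because the interface is standard Redfish over HTTPS rather than a vendor-specific SDK, the same pattern extends from a single retimer port to fleet-wide monitoring and automation.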

Building an Open Ecosystem 

Our collaboration with these industry leaders isn’t just about technical integration—it’s about proving that open ecosystems can deliver enterprise-grade reliability at hyperscale. 

“In collaboration with Astera Labs, we’ll be delivering a standards-based management and monitoring stack on OCP hardware that hyperscalers can seamlessly integrate into monitoring systems, extend across fleet orchestration layers, and leverage to maximize ROI from AI infrastructure,” said Chris Lin, chairman and president at ASPEED. “This relationship demonstrates our commitment to an open, interoperable ecosystem that accelerates deployment and ensures reliability at rack-scale.” 

“OpenBMC has emerged as the de facto standard for platform management, and our collaboration with Astera Labs extends this proven framework to next-generation connectivity fabrics,” said Jeffrey Wang, president of Insyde Software. “By enabling standardized API support for retimers and scale-up switches, we will empower our mutual customers to accelerate AI infrastructure deployments with confidence—knowing their management and monitoring stack is rooted in open standards, validated at hyperscale, and ready for the future.” 

Looking Ahead

These collaborations with ASPEED and Insyde are just the beginning. Over the next 18 months, you’ll see us extending this standards-based approach across our entire Intelligent Connectivity Platform. We’re working with ecosystem collaborators to ensure that as AI Infrastructure 2.0 emerges, the software foundation is ready from development to deployment—reducing churn for our customers and partners. 

The companies that will thrive in this new era are those that build for openness, interoperability, and choice. That’s the future we’re committed to delivering.

About Michael Ocampo, Ecosystem Alliance Manager

Michael is an evangelist for open ecosystems that accelerate hybrid cloud, enterprise, and AI solutions. With over a decade in x86 system integration, IaaS, PaaS, and SaaS, he offers valuable customer insights to cloud and system architects designing high-speed connectivity solutions for AI training, inferencing, cloud, and edge computing infrastructure. At Astera Labs, he leads ecosystem alliances and owns the Cloud-Scale Interop Lab, driving seamless interoperability of hardware and software solutions to optimize the TCO and performance of infrastructure services.
