AI Inferencing: Recommendation System
1.7x Boost in AI Inferencing Performance with CXL
Memory Bottleneck in CPU-Based Recommendation Systems
Without CXL
Challenge:
- DLRM model sizes continue to grow beyond the capacity of a single system
- Inefficient power utilization when sparse data is sharded across distributed systems
- Limited memory bandwidth for embedding table lookup operations
- Lack of software support for memory tiering
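The bandwidth pressure from embedding table lookups can be seen in a minimal sketch of a DLRM-style sparse feature path. The table size, batch shape, and pooling choice below are illustrative assumptions, not the actual model configuration; the point is that the workload is a random-access gather over a large table, so memory bandwidth, not compute, is the bottleneck.

```python
import numpy as np

# Hypothetical DLRM-style embedding lookup: each sparse feature indexes a
# large table, and the gathered rows are sum-pooled per sample. The random
# gathers touch memory far more than they compute.
rng = np.random.default_rng(0)

NUM_ROWS, EMB_DIM = 1_000_000, 64  # illustrative table size (assumption)
table = rng.standard_normal((NUM_ROWS, EMB_DIM)).astype(np.float32)

def embedding_lookup(table, indices):
    """Gather rows for a batch of sparse indices and sum-pool per sample."""
    # indices: (batch, lookups) -> gathered (batch, lookups, dim) -> (batch, dim)
    return table[indices].sum(axis=1)

batch = rng.integers(0, NUM_ROWS, size=(128, 32))  # 128 samples, 32 lookups each
pooled = embedding_lookup(table, batch)
print(pooled.shape)  # → (128, 64)
```

Each batch here gathers 128 × 32 rows of 64 float32 values (~1 MB) from scattered addresses, which is exactly the access pattern that saturates memory channels long before the CPU's arithmetic units.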
Memory Expansion for CPU-Based Recommendation Systems
With CXL
Solution:
- Increases memory capacity by 133% and memory bandwidth by 66%
- Boosts AI inferencing performance by 73%
- Leo adds dedicated CXL memory bandwidth to each AI inferencing system
- Lowers power consumption with consolidated server fleet
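The capacity and bandwidth gains above can be sanity-checked with a back-of-envelope calculation. The baseline DRAM and CXL figures below are purely illustrative assumptions chosen to match the stated ratios, not measured platform values.

```python
# Hypothetical baseline and CXL-attached figures (assumptions, not measurements)
BASE_DRAM_GB = 768    # e.g. native DRAM capacity per server
CXL_ADDED_GB = 1024   # capacity added via CXL expansion

BASE_BW_GBPS = 300    # native DRAM bandwidth
CXL_BW_GBPS = 200     # bandwidth added over CXL links

def pct_increase(base, added):
    """Percent increase contributed by the added resource."""
    return int(added / base * 100)

print(pct_increase(BASE_DRAM_GB, CXL_ADDED_GB))  # → 133 (capacity, %)
print(pct_increase(BASE_BW_GBPS, CXL_BW_GBPS))   # → 66 (bandwidth, %)
```

With any baseline, the same formula (added ÷ base × 100) shows how a CXL expander that contributes roughly 1.3× the native capacity and two-thirds of the native bandwidth yields the +133% and +66% figures quoted above.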