Unlocking Next-Generation AI: The Critical Role of Compute, Memory, and Interconnect
Last week, I joined a panel discussion exploring a topic that’s shaping the future of AI: the need to refocus on the interplay between compute, memory, and interconnect technologies. Too often, these elements are treated in isolation, yet their synergy, or lack thereof, defines how efficiently modern workloads run and how effectively organizations can extract value from their infrastructure. Aligning these three pillars is essential for removing the bottlenecks that hold back next-generation systems.
System Bottlenecks: Beyond Compute Alone
One key takeaway from the discussion was that bottlenecks are not one-size-fits-all. They vary significantly with installation type and workload. Enterprise on-premises environments face very different challenges from cloud service providers, but both share a common priority: extending the lifespan of their investments and reducing the total cost of running services.
AI workloads, in particular, are evolving rapidly. While compute performance often dominates the conversation, it’s only one piece of the puzzle. True performance comes from balancing compute power with memory capacity and interconnect bandwidth. As AI evolves from today’s solutions toward delivering its full value through reasoning and agentic systems, these bottlenecks will shift again. Flexibility and programmability across the system stack will be critical to staying ahead of these changes.
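A simple way to make that balance concrete is the roofline model: a workload’s attainable throughput is capped either by peak compute or by memory bandwidth, depending on how many operations it performs per byte moved. The sketch below walks through that arithmetic; the peak-compute and bandwidth figures are illustrative placeholders, not numbers for any specific product.

    #include <stdio.h>

    /* Roofline model: attainable throughput is the lower of the
     * compute roof and the bandwidth roof (arithmetic intensity x
     * memory bandwidth). Hardware figures are illustrative only. */
    int main(void) {
        const double peak_tflops = 200.0;  /* peak compute, TFLOP/s  */
        const double mem_bw_tbs  = 3.0;    /* memory bandwidth, TB/s */

        /* Arithmetic intensity, in FLOPs per byte moved. */
        const double intensities[] = { 1.0, 10.0, 66.7, 200.0 };

        for (int i = 0; i < 4; i++) {
            double ai   = intensities[i];
            double roof = ai * mem_bw_tbs;              /* bandwidth roof */
            if (roof > peak_tflops) roof = peak_tflops; /* compute roof   */
            printf("%6.1f FLOP/byte -> %6.1f TFLOP/s (%s-bound)\n",
                   ai, roof,
                   ai * mem_bw_tbs < peak_tflops ? "memory" : "compute");
        }
        return 0;
    }

At these placeholder figures, anything below roughly 67 FLOPs per byte leaves the compute units waiting on memory, which is the regime where much of LLM inference sits. Faster compute alone does nothing for those workloads; only more bandwidth, or less data movement, does.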
Performance vs. Power: Striking the Right Balance
At Rivos, our focus is on balancing compute elements for both performance and efficiency. Different tasks require different architectures: CPUs are well-suited for prompt management and query processing, while GPGPUs excel at database searches and high-speed LLM processing. Efficiency isn’t solely about benchmark numbers; it’s about sustaining performance on real workloads while managing system-level power consumption.
Key to this is coherency between the CPUs and GPGPUs, which allows data to be shared seamlessly across compute elements. This reduces unnecessary data movement, improving responsiveness while cutting power use. But compute alone isn’t enough: tightly integrating the right level and type of memory with compute avoids expensive, energy-intensive data transfers that slow system responses.
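To illustrate what shared, coherent memory buys in practice, here is a minimal CUDA sketch using managed (unified) memory, where the CPU and GPU operate on a single allocation with no explicit copies in either direction. This is a generic CUDA pattern shown for illustration, not a description of Rivos hardware or software.

    #include <cstdio>
    #include <cuda_runtime.h>

    // GPU kernel: scales the shared buffer in place.
    __global__ void scale(float *data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float *buf = nullptr;

        // One allocation visible to both CPU and GPU: no staging
        // buffers and no cudaMemcpy in either direction.
        cudaMallocManaged(&buf, n * sizeof(float));

        for (int i = 0; i < n; i++) buf[i] = 1.0f;      // CPU writes...
        scale<<<(n + 255) / 256, 256>>>(buf, n, 2.0f);  // ...GPU transforms...
        cudaDeviceSynchronize();
        printf("buf[0] = %f\n", buf[0]);                // ...CPU reads back.

        cudaFree(buf);
        return 0;
    }

In a software-managed scheme, the runtime still migrates whole pages behind the scenes; with hardware coherency, only the data actually shared needs to move, which is where the latency and power savings come from.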
Hardware Flexibility for Evolving Workloads
Programmable architectures play a vital role in enabling systems to flex as model requirements change. Running multiple model types on a single solution maximizes ROI, not just at the time of purchase but throughout the lifetime of the deployment. Many customers need infrastructure that can shift between inference, fine-tuning, and training without a full rebuild.
Flexible memory architectures and scalable compute options are equally important. Keeping a variety of memory types close to compute resources reduces recompute overhead and energy waste, while right-sized compute avoids overprovisioning that drives up both capital and operational costs. The balance between compute and memory is at the heart of future-proofing AI infrastructure.
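The recompute trade-off is easy to quantify with the key/value cache that LLM serving keeps resident to avoid recomputing attention state for every generated token. The back-of-the-envelope sketch below uses the standard KV-cache sizing formula; the model shape and batch size are illustrative assumptions, not figures for any particular deployment.

    #include <stdio.h>

    /* KV-cache footprint: the memory an LLM server keeps resident so
     * it never recomputes attention state for earlier tokens.
     * bytes = 2 (K and V) x layers x kv_heads x head_dim
     *         x seq_len x bytes_per_element.
     * The model shape below is an illustrative assumption. */
    int main(void) {
        const long long layers   = 80;
        const long long kv_heads = 8;     /* grouped-query attention */
        const long long head_dim = 128;
        const long long seq_len  = 8192;  /* context length, tokens  */
        const long long elem     = 2;     /* fp16 bytes per element  */
        const long long batch    = 32;    /* concurrent sequences    */

        long long per_seq = 2 * layers * kv_heads * head_dim * seq_len * elem;
        const double gib = 1024.0 * 1024.0 * 1024.0;
        printf("KV cache per sequence: %.2f GiB\n", per_seq / gib);
        printf("KV cache for batch:    %.2f GiB\n", per_seq * batch / gib);
        return 0;
    }

Under these assumptions, 32 concurrent 8K-token sequences consume roughly 80 GiB for cache alone, before weights or activations. Whether that cache sits in fast memory next to the compute, or must be refetched or recomputed, dominates both response time and energy per token.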
Chiplets: Reshaping SoC Design
Chiplets are reshaping hardware design by offering greater flexibility, improved yields, and heterogeneous integration. They allow us to build bigger, more cost-effective systems while mixing and matching technologies to fit specific use cases. At Rivos, this heterogeneity also lets us integrate multiple types of memory seamlessly, with minimal software overhead, enabling us to deliver scalable solutions.
Standards like Ultra Ethernet, PCIe, and UCIe are vital to making this kind of integration work. By standardizing key interfaces, we can select the best IP for each use case, push performance where it matters most, and build on a stable platform that benefits the whole industry. Similarly, RISC-V’s open architecture allows us to innovate while maintaining the compatibility and software support needed for broad adoption. Rivos has been a key contributor to server-class specifications in the RISC-V community, including driving the RVA23 profile.
The Shift Toward System-Level Optimization
Across industries, system-level optimization has consistently delivered greater gains than component-level improvements, and AI will be no exception. The winning solutions won’t simply have the fastest compute or the largest memory; they’ll be designed holistically to deliver balanced, efficient performance across the entire stack.
Achieving this requires open standards and collaborative engineering. As the industry diversifies to meet domain-specific needs and adapt to evolving enterprise requirements, co-design between hardware and software will be the key to unlocking the next generation of AI.