Multi-Chiplet SoCs
Rivos has designed a data center-class SoC that delivers standard and parallel compute by combining Rivos' high-performance, fully featured 64-bit RVA23 RISC-V CPU cores with a cache-coherent, Rivos-designed SIMT GPGPU. A tightly integrated memory subsystem includes on-chip HBM and high-capacity, server-class DDR5 RDIMMs, both directly accessible by the heterogeneous compute components. Given the importance of connectivity, multiple Ultra Ethernet NICs are integrated to provide high bandwidth over a standard interconnect. The Rivos SoC has been designed to deliver exceptional performance and energy efficiency on the most demanding AI and next-generation workloads.
Integrated high performance 64-bit RISC-V CPU
- Fully featured data center-class 64-bit CPU designed for sustained throughput, compliant with the RISC-V RVA23 application processor profile and the forthcoming RISC-V Server Platform Specification.
- Each CPU has large dedicated L1 caches, and each cluster of four CPUs shares a large L2 cache. The system provides a shared Last Level Cache in front of memory.
- Rivos SoCs can operate in a fully self-hosted configuration using the integrated CPUs as the host CPU, or can be connected as a PCIe device to a third-party host CPU. Rivos and Canonical have announced a partnership to support Enterprise Ubuntu Linux.
Learn more about Rivos and RISC-V
Scalable SIMT GPGPU
- Rivos GPGPU includes an array of Processing Elements (PEs) with large scratchpad memory, connected to an on-die Last Level Cache for HBM (LLCH) and on-chip HBM3e memory.
- Each PE includes a single-instruction multiple-thread (SIMT) core, L1 caches, and scratch memory, as well as dedicated fixed-function logic blocks, including a Matrix Multiplication (MatMul) engine and accumulator, a DMA engine, and decode and decompression engines.
- The PE is broadly equivalent to the SM/CU of other SIMT GPUs, and the MatMul engine is broadly equivalent to Tensor/Matrix cores.
- Rivos' GPGPU has been designed for easy adoption of existing frameworks and APIs without redesigning your code.
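To make the MatMul engine and accumulator described above concrete, here is a minimal sketch (plain Python; the tile size and loop structure are illustrative assumptions, not Rivos' actual programming interface) of a tiled matrix multiply that accumulates partial products into a separate accumulator, the pattern such fixed-function engines execute in hardware.

```python
# Sketch of what a MatMul engine + accumulator computes: C += A @ B,
# processed tile by tile. TILE is a hypothetical tile edge, not a
# Rivos hardware parameter.

TILE = 2

def matmul_accumulate(A, B, C):
    """Accumulate A @ B into C (square matrices, dim divisible by TILE)."""
    n = len(A)
    for i0 in range(0, n, TILE):
        for j0 in range(0, n, TILE):
            for k0 in range(0, n, TILE):
                # One "MatMul engine" invocation on a TILE x TILE tile;
                # partial products are added into the accumulator C,
                # never overwritten, so tiles can be processed in any order.
                for i in range(i0, i0 + TILE):
                    for j in range(j0, j0 + TILE):
                        acc = C[i][j]
                        for k in range(k0, k0 + TILE):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
C = [[0] * 4 for _ in range(4)]
matmul_accumulate(A, I, C)  # C accumulates A @ I, i.e. A itself
```

The accumulate-rather-than-overwrite semantics are what allow a deep reduction (the k dimension) to be split across many small engine invocations.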
Contact Rivos to find out more
Unified Shared Memory & Large Cache
The unified shared memory design provides fully coherent, high-capacity DDR5 and high-bandwidth HBM3e, directly addressable by both the GPGPU Processing Elements and the CPU cores, providing a number of key benefits:
- Higher HBM utilization and total memory capacity, as PEs can directly access both HBM & DDR5.
- Fewer copies, as pointers can be passed transparently between the GPGPU and CPU.
- Memory- and compute-level parallelism, allowing maximum cache and memory utilization.
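The zero-copy benefit above can be sketched in plain Python. The "CPU" and "PE" roles here are ordinary functions and the memoryview stands in for a shared pointer; this is a conceptual illustration of unified shared memory, not Rivos' API. On a unified-shared-memory SoC, both compute components would address the same physical HBM/DDR5 allocation, so only the pointer moves between them, never the data.

```python
# Conceptual sketch of zero-copy pointer passing under unified shared
# memory. One allocation is produced by the "CPU" role and consumed by
# the "GPGPU PE" role in place, with no staging copy.

def cpu_produce(buf: memoryview) -> None:
    # Host CPU fills the shared buffer in place.
    for i in range(len(buf)):
        buf[i] = i % 256

def pe_consume(buf: memoryview) -> int:
    # A GPGPU Processing Element reads the same memory directly,
    # rather than a copy staged into device-only memory.
    return sum(buf)

shared = bytearray(1024)       # one allocation, visible to both roles
view = memoryview(shared)      # the "pointer" handed from CPU to PE
cpu_produce(view)
total = pe_consume(view)       # sees the producer's writes in place
```

Because both sides dereference the same allocation, there is no device-copy step to schedule or synchronize, which is the source of the higher memory utilization claimed above.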
Integrated Ultra Ethernet Connectivity
- Multiple integrated 800G NICs are compliant with the Ultra Ethernet Consortium 1.0 Specification.
- They provide hardware-assisted messaging, remote memory access, and remote atomic operations using Ultra Ethernet Transport.
- Ultra Ethernet provides a high-bandwidth interface for both scale-up and scale-out: directly interconnecting small clusters, and using off-the-shelf Ethernet switches to build clusters of tens or hundreds of thousands of GPGPUs.
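The remote atomic operations mentioned above can be sketched conceptually. The `Node` class and its method below are illustrative stand-ins, not an Ultra Ethernet Transport API; on real hardware the target's NIC performs the atomic read-modify-write against registered memory without interrupting the target's CPU.

```python
# Conceptual sketch of a remote fetch-and-add of the kind a UE NIC
# offloads. The lock stands in for NIC-side atomicity; many remote
# initiators can race on one counter and the result stays correct.

import threading

class Node:
    def __init__(self, size: int):
        self.memory = [0] * size        # registered, remotely accessible
        self._lock = threading.Lock()   # stands in for NIC atomicity

    def remote_fetch_add(self, addr: int, value: int) -> int:
        # Executed on behalf of a remote initiator: atomically
        # read-modify-write one word of the target's memory.
        with self._lock:
            old = self.memory[addr]
            self.memory[addr] = old + value
            return old

target = Node(16)
threads = [threading.Thread(target=target.remote_fetch_add, args=(0, 1))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# target.memory[0] now holds 100: every increment landed atomically
```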
Learn more about Ultra Ethernet
