UnifabriX Uses CXL To Improve HPC Performance


CXL promises to remake the way computing systems are architected. It runs on PCIe and can extend the memory on individual CPUs, but its biggest promise is in providing network-arbitrated memory pools that can allocate higher-latency memory as needed to CPUs or to software-defined virtual machines. CXL-based products are starting to appear in the market in 2023.

CXL looks to remake data centers, but the advantage of higher-latency memory in high performance computing (HPC) applications was not evident, at least until UnifabriX demonstrated bandwidth and capacity gains with its CXL-based smart memory node at the 2022 Supercomputing Conference (SC22). A just-released video shows UnifabriX demonstrations of memory and storage HPC applications and the HPC advantages they provide.

UnifabriX says the product is based upon its Resource Processing Unit (RPU). The RPU is built into its CXL Smart Memory Node, shown below. This is a 2U rack-mounted server with serviceable EDSFF E3 media bays. The product provides up to 64TB of capacity in DDR5/DDR4 memory and NVMe SSDs.

The company says the product is compliant with CXL 1.1 and 2.0 and runs on PCIe Gen5. It also says the product is CXL 3.0 ready and supports both PCIe Gen5 and CXL expansion. In addition, it supports NVMe SSD access through CXL (SSD CXL over Memory). The product is meant for use in bare-metal and virtualized environments over a wide range of applications, including HPC, AI and databases.

As with other CXL products, the memory node offers expanded memory, but it can also provide higher performance. In particular, at SC22 the company ran the HPCG performance benchmark with and without help from the memory node. The results are shown below.

For the conventional HPCG benchmark, performance initially increases roughly linearly with the number of CPU cores processing the benchmark. However, by about 50 cores the performance flattens out and no longer improves as cores are added. By the time 100 cores are available, effectively only 50 are being used. This is because there is no additional memory bandwidth available.
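This flattening can be pictured with a simple roofline-style model: achieved performance is the lesser of the cores' aggregate compute rate and the rate at which memory bandwidth can feed them. The sketch below is illustrative only; the per-core throughput, bandwidth, and bytes-per-FLOP figures are hypothetical, chosen so that saturation lands near 50 cores as in the benchmark described above.

```python
# Illustrative roofline-style model of HPCG core scaling.
# All numbers are hypothetical, picked so the bandwidth ceiling
# is reached at 50 cores; they are not measured UnifabriX data.

def hpcg_perf(cores, per_core_gflops=2.0, mem_bw_gbs=400.0, bytes_per_flop=4.0):
    """Return modeled GFLOP/s: min of compute-bound and memory-bound rates."""
    compute_limit = cores * per_core_gflops        # what the cores could do
    bandwidth_limit = mem_bw_gbs / bytes_per_flop  # what memory can feed
    return min(compute_limit, bandwidth_limit)

for cores in (10, 50, 100):
    print(cores, "cores ->", hpcg_perf(cores), "GFLOP/s")
```

With these numbers, scaling is linear up to 50 cores (100 GFLOP/s) and completely flat beyond: doubling to 100 cores changes nothing, which matches the plateau behavior described above.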

If the memory node is added to provide additional CXL memory alongside the memory directly connected to the CPU cores, scaling of performance with cores can continue. The memory node improves overall HPCG performance by moving lower-priority data from the CPU near memory to the CXL far memory. This prevents saturating the near memory and allows continued scaling of performance with additional processor cores. As shown above, the memory node improved HPCG benchmark performance by more than 26%.
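One way to picture this data movement is as a priority-based placement policy: hot, high-priority buffers stay in the CPU's near memory, and lower-priority data spills to CXL far memory once near memory fills. This is only a sketch under that assumption; the function, its parameters, and the greedy policy are hypothetical illustrations, not UnifabriX's actual tiering algorithm.

```python
# Hypothetical sketch of priority-based memory tiering: place the
# hottest buffers in near (CPU-attached) memory first, and spill the
# rest to far (CXL-attached) memory. Not an actual UnifabriX API.

def place_buffers(buffers, near_capacity_gb):
    """buffers: list of (name, size_gb, priority); higher priority = hotter.

    Returns (near, far): buffer names placed in near and far memory.
    """
    near, far, used = [], [], 0
    # Greedily fill near memory with the highest-priority buffers.
    for name, size, prio in sorted(buffers, key=lambda b: -b[2]):
        if used + size <= near_capacity_gb:
            near.append(name)
            used += size
        else:
            far.append(name)  # spills to the CXL memory node
    return near, far

bufs = [("matrix", 4, 3), ("vectors", 4, 2), ("scratch", 4, 1)]
print(place_buffers(bufs, near_capacity_gb=8))
```

The effect is that near-memory bandwidth serves only the hottest data, while the far tier adds capacity and extra bandwidth for the rest, which is the mechanism behind the continued core scaling described above.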

The company has worked closely with Intel on its CXL solution, and Intel mentions these results, along with other third-party testing, in its recent product brief about its Infrastructure Processing Units (IPUs) (Intel Agilex FPGA Accelerators Bring Improved TCO, Performance and Flexibility to 4th Gen Intel Xeon Platforms).

In addition to providing memory capacity and bandwidth enhancements, the memory node can provide NVMe SSD access through CXL. The company says its plan is to unify memory, storage and networking through the CXL/PCIe interface, hence the name UnifabriX. With networking included, its boxes could replace top-of-rack (TOR) solutions while also providing memory and storage access.

The UnifabriX memory node, utilizing the company’s Resource Processing Unit, provides a path to overcome direct connect DRAM bandwidth limitations in HPC applications using shared CXL memory.
