Andromeda has more cores than Frontier system • The Register


Waferscale AI chip and systems maker Cerebras says its Andromeda supercomputer has more compute cores than Frontier – the world’s first and only publicly verified cluster to break the exascale barrier.

However, there is a big catch: Andromeda won’t be able to perform the wide range of high-performance computing work that’s possible on Oak Ridge National Lab’s Frontier, which achieved a peak performance of 1.1 exaflops on the HPC world’s standard Linpack benchmark earlier this year.

The catch is that “true” HPC requires double-precision (64-bit) floating point capabilities, which Andromeda lacks. Instead, coupling its CS-2 systems allows Andromeda to hit more than an exaflop of sparse 16-bit half-precision (FP16) compute and 120 petaflops of dense FP16 – the kind of math used to train deep neural networks, according to Cerebras.

Revealed today at the Supercomputing 2022 (SC22) event, Andromeda consists of 16 CS-2 systems, each of which is powered by Cerebras’s massive Wafer-Scale Engine 2 (WSE-2) chip and connected by the startup’s SwarmX interconnect fabric.

With each WSE-2 chip sporting 850,000 compute cores, Andromeda hits the 13.5 million core mark – more than the 8.7 million AMD CPU and GPU cores keeping Frontier running at the US Department of Energy’s Oak Ridge facility. Cerebras was keen to point this out, but there’s more to the issue than simple core count.

It’s far from an apples-to-apples comparison, given the architectural differences between the two core types and the kinds of workloads they’re optimized for. Whereas CPUs and GPUs can handle a broader range of HPC workloads, the WSE-2 chip only supports FP16 and 32-bit single precision (FP32) formats, which means it won’t help out with chunky 64-bit double precision (FP64) math.
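For a rough feel of what the missing FP64 support means, here is a minimal Python sketch that round-trips a number slightly above 1.0 through the three IEEE 754 widths using the standard library’s `struct` module. It is an illustration only and does not model the WSE-2’s arithmetic; the helper names `round_trip` and `survives` are our own.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Store value in the given IEEE 754 width and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def survives(value: float, fmt: str) -> bool:
    """True if the value comes back unchanged after the round trip."""
    return round_trip(value, fmt) == value

# struct format codes: "e" = FP16, "f" = FP32, "d" = FP64
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}

# 1 + 2**-k needs k fraction bits; FP16 has 10, FP32 has 23, FP64 has 52.
for k in (8, 20, 40):
    value = 1.0 + 2.0 ** -k
    kept = [name for name, fmt in FORMATS.items() if survives(value, fmt)]
    print(f"1 + 2^-{k} is preserved by: {', '.join(kept)}")
```

The smallest increment is kept only by FP64, which is why simulation codes that depend on that level of precision can’t simply be moved to FP16 or FP32 hardware.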

In an interview with The Register, Cerebras CEO Andrew Feldman readily acknowledged that Frontier is the more powerful machine for a wider span of applications.

“For supercomputer work, traditional supercomputer work, big simulations, trajectory analysis, it’s a better machine. It’s a bigger machine,” said Feldman, who previously tried to shepherd a new era of Arm-based server CPUs at AMD before leaving in 2014 to start Cerebras.

But he said the core comparison between Andromeda and Frontier is relevant because there are certain problems in the supercomputing world – like deep learning – that benefit from as many cores as possible, regardless of the core. And it’s no small feat getting that many cores to work together effectively.

“Our cores are smaller. Our cores are optimized for AI. Our cores don’t have 64-bit double precision. But at AI, they’re unparalleled. And 13 and a half million of them is really, really hard. And to get them to behave like a single machine on a single problem, and to be able to get access to them via a few lines on a scientific notebook, like a Jupyter Notebook, is unheard of,” he said.

And here’s the science

This is a claim apparently backed up by the DOE’s Argonne National Laboratory.

In a statement provided by Cerebras, Rick Stevens, Argonne’s associate lab director who has been the face of the much-delayed Aurora supercomputer, said Andromeda achieved “near-perfect linear scaling” when training the GPT3-XL large language model on the COVID-19 genome across one, two, four, eight, and 16 nodes.

“Linear scaling is amongst the most sought-after characteristics of a big cluster, and Cerebras Andromeda delivered 15.87x throughput across 16 CS-2 systems, compared to a single CS-2, and a reduction in training time to match. Andromeda sets a new bar for AI accelerator performance,” he said.
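To put the “near-perfect” claim in numbers, scaling efficiency is the measured speedup divided by the ideal speedup, which for 16 systems is 16x. The short Python sketch below runs that arithmetic on the quoted 15.87x figure; the function name is ours, and the calculation takes the quoted throughput at face value.

```python
def scaling_efficiency(speedup: float, systems: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / systems

# Figures quoted in the statement: 15.87x throughput on 16 CS-2 systems.
efficiency = scaling_efficiency(15.87, 16)
print(f"Scaling efficiency: {efficiency:.1%}")  # prints "Scaling efficiency: 99.2%"
```

In other words, the cluster gives up less than one percent of ideal throughput to the overhead of coordinating 16 systems.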

Feldman suggested that Argonne was unable to do the same work with its Polaris supercomputer, which is powered by 2,000 Nvidia A100 GPUs. He said this was because “the GPUs were unable to do the work because of GPU memory and memory bandwidth limitations.” While Argonne hasn’t come out and said this explicitly, it can be inferred from the lab’s research paper detailing its work, Feldman alleged.

“I think they were under a fair bit of political pressure. That makes Nvidia look bad. And big companies don’t like to be made to look bad,” he added.

What allows Cerebras to support models with trillions of parameters is the startup’s MemoryX technology, which, when combined with the SwarmX fabric, enables models to run in clusters of up to 192 CS-2 systems, according to Feldman.

While Cerebras has notched a good assortment of customers and made impressive performance claims, it will need to withstand the collective financial might of Nvidia, AMD, and Intel, and come out on the other side of this withering economy in which we now find ourselves. ®