Cerebras Unveils Andromeda, a 13.5 Million Core AI Supercomputer
Cerebras Systems unveiled Andromeda, a 13.5 million core AI supercomputer, now available and being used for commercial and academic work. Built with a cluster of 16 Cerebras CS-2 systems and leveraging Cerebras MemoryX and SwarmX technologies, Andromeda delivers more than 1 Exaflop of AI compute and 120 Petaflops of dense compute at 16-bit half precision. It is the only AI supercomputer to ever demonstrate near-perfect linear scaling on large language model workloads relying on simple data parallelism alone.
With more than 13.5 million AI-optimized compute cores, fed by 18,176 3rd Gen AMD EPYC processor cores, Andromeda features more cores than 1,953 Nvidia A100 GPUs and about 1.6 times as many cores as the largest supercomputer in the world, Frontier, which has 8.7 million cores. Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J, and GPT-NeoX.
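The core-count comparisons in this paragraph can be checked with quick arithmetic. The sketch below uses only figures quoted in the article (including the 284 64-core EPYC processors mentioned in the deployment details), plus one outside assumption: that each Nvidia A100 is counted as 6,912 CUDA cores. The variable names are illustrative.

```python
# Figures quoted in the article
EPYC_PROCESSORS = 284          # 64-core AMD 3rd Gen EPYC processors
CORES_PER_EPYC = 64
ANDROMEDA_CORES = 13_500_000   # the article says "more than 13.5 million"
FRONTIER_CORES = 8_700_000
A100_GPUS = 1_953

# Assumption (not stated in the article): 6,912 CUDA cores per A100
CUDA_CORES_PER_A100 = 6_912

# Total host CPU cores feeding the cluster
epyc_cores = EPYC_PROCESSORS * CORES_PER_EPYC

# Ratio of Andromeda's core count to Frontier's
frontier_ratio = ANDROMEDA_CORES / FRONTIER_CORES

# Combined CUDA cores of 1,953 A100 GPUs under the assumption above
a100_cores = A100_GPUS * CUDA_CORES_PER_A100

print(f"EPYC cores feeding Andromeda: {epyc_cores:,}")
print(f"Andromeda / Frontier core ratio: {frontier_ratio:.2f}")
print(f"Cores in {A100_GPUS:,} A100 GPUs: {a100_cores:,}")
```

Under these assumptions, the host core count works out to 18,176, the ratio to Frontier is roughly 1.55 (rounded to 1.6 in the article), and 1,953 A100s come in just under 13.5 million cores.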
Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near-perfect proportion. This includes large language models with very large sequence lengths, a task that is impossible to achieve on GPUs. In fact, this GPU-impossible work was demonstrated by one of Andromeda’s first users, who achieved near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequence lengths (a maximum sequence length, or MSL, of 10,240). That user attempted the same work on Polaris, a cluster of 2,000 Nvidia A100 GPUs, and the GPUs were unable to do it because of GPU memory and memory bandwidth limitations.
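One common way to quantify "near-perfect proportion" is scaling efficiency: the measured speedup divided by the number of systems, where 1.0 is perfectly linear. The sketch below is a generic illustration, not a Cerebras tool, and the timings in it are hypothetical placeholders rather than measurements from Andromeda.

```python
def speedup(t_single: float, t_cluster: float) -> float:
    """How many times faster the cluster run is than the single-system run."""
    return t_single / t_cluster


def scaling_efficiency(t_single: float, t_cluster: float, n_systems: int) -> float:
    """Speedup divided by system count; 1.0 means perfectly linear scaling."""
    return speedup(t_single, t_cluster) / n_systems


# Hypothetical timings for illustration only (not measured results)
t_one_system = 160.0                    # hours on a single system
n = 16
t_ideal = t_one_system / n              # perfectly linear: 10.0 hours
t_slightly_slower = 10.5                # a run with some scaling overhead

print(scaling_efficiency(t_one_system, t_ideal, n))
print(round(scaling_efficiency(t_one_system, t_slightly_slower, n), 3))
```

An efficiency close to 1.0 at 16 systems is what the article describes as near-perfect linear scaling.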
Andromeda’s near-perfect scaling across the largest natural language processing models is made possible by the second-generation Cerebras Wafer Scale Engine (WSE-2), the industry’s largest and most powerful processor, and by MemoryX and SwarmX technologies. MemoryX enables even a single CS-2 to support multi-trillion parameter models. SwarmX technology links MemoryX to a cluster of CS-2s. Together these industry-leading technologies enable Cerebras’ large clusters to avoid two of the major challenges plaguing traditional clusters used for modern AI work: the complexity of parallel programming and the performance degradation of distributed computing.
The 16 CS-2s powering Andromeda run in a strictly data parallel mode, enabling simple and easy model distribution, and single-keystroke scaling from 1 to 16 CS-2s. Sending AI jobs to Andromeda can be done quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes. Because the Cerebras WSE-2 processor, at the heart of its CS-2s, has 1,000 times more memory bandwidth than a GPU, Andromeda can harvest structured and unstructured sparsity as well as static and dynamic sparsity. These are things other hardware accelerators, including GPUs, simply can’t do. The result is that Cerebras can train models over 90% sparse to state-of-the-art accuracy.
Andromeda can be used simultaneously by multiple users. Within seconds, users can specify how many of Andromeda’s CS-2s they want to use. This means Andromeda can serve as a 16 CS-2 supercomputer cluster for a single user working on a single job, as 16 individual CS-2 systems for 16 distinct users with 16 distinct jobs, or any combination in between.
Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a leading high-performance data center. The 16 CS-2 systems, with a combined 13.5 million AI-optimized cores, are fed by 284 64-core AMD 3rd Gen EPYC processors. The SwarmX fabric, which links the MemoryX parameter storage solution to the 16 CS-2s, provides more than 96.8 terabits of bandwidth. Through gradient accumulation, Andromeda can support all batch sizes.
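For readers unfamiliar with gradient accumulation, the idea is that a large batch can be processed as several smaller micro-batches whose gradients are summed before a single weight update, giving the same gradient as the full batch. The sketch below illustrates this on a toy one-parameter model; it is a generic illustration with hypothetical function names and says nothing about how Cerebras implements the technique.

```python
def grad_sum(w, xs, ys):
    """Sum of per-example gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def full_batch_gradient(w, xs, ys):
    """Mean gradient computed over the whole batch at once."""
    return grad_sum(w, xs, ys) / len(xs)


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Mean gradient built up by accumulating over micro-batches."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        end = start + micro_batch_size
        total += grad_sum(w, xs[start:end], ys[start:end])
    # Divide by the full batch size once, after all micro-batches are summed
    return total / len(xs)


# Toy data: y = 2x, with the current weight deliberately wrong
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

g_full = full_batch_gradient(w, xs, ys)
g_accum = accumulated_gradient(w, xs, ys, micro_batch_size=2)
print(g_full, g_accum)
```

Because the accumulated gradient matches the full-batch gradient, the effective batch size can be chosen independently of how many examples fit in a single pass.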