Cerebras Unveils Andromeda, a 13.5 Million Core AI Supercomputer

Photo: Business Wire

Cerebras Systems unveiled Andromeda, a 13.5 million core AI supercomputer, now available and being used for commercial and academic work. Built with a cluster of 16 Cerebras CS-2 systems and leveraging Cerebras MemoryX and SwarmX technologies, Andromeda delivers more than 1 Exaflop of AI compute and 120 Petaflops of dense compute at 16-bit half precision. It is the only AI supercomputer to ever demonstrate near-perfect linear scaling on large language model workloads relying on simple data parallelism alone.

With more than 13.5 million AI-optimized compute cores, fed by 18,176 3rd Gen AMD EPYC processor cores, Andromeda features more cores than 1,953 Nvidia A100 GPUs and about 1.6 times as many cores as the largest supercomputer in the world, Frontier, which has 8.7 million cores. Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling via simple data parallelism across GPT-class large language models, including GPT-3, GPT-J, and GPT-NeoX.
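
The core-count comparisons in this paragraph can be checked with a few lines of arithmetic. The sketch below uses figures quoted in this article (13.5 million Andromeda cores, 8.7 million Frontier cores, 284 EPYC processors with 64 cores each) plus one outside assumption that the article does not state: 6,912 CUDA cores per Nvidia A100, taken from Nvidia's published specification. The function names are illustrative only.

```python
# Figures quoted in the article.
ANDROMEDA_CORES = 13_500_000
FRONTIER_CORES = 8_700_000
EPYC_PROCESSORS = 284
CORES_PER_EPYC = 64

# Assumption (not stated in the article): CUDA cores per Nvidia A100 GPU.
A100_CUDA_CORES = 6_912


def host_cpu_cores() -> int:
    """Total EPYC cores feeding the cluster."""
    return EPYC_PROCESSORS * CORES_PER_EPYC


def frontier_ratio() -> float:
    """Andromeda's core count as a multiple of Frontier's."""
    return ANDROMEDA_CORES / FRONTIER_CORES


def a100_equivalent() -> int:
    """Number of whole A100 GPUs whose combined CUDA cores Andromeda exceeds."""
    return ANDROMEDA_CORES // A100_CUDA_CORES


if __name__ == "__main__":
    print(host_cpu_cores())            # 284 processors x 64 cores each
    print(round(frontier_ratio(), 2))  # ratio of core counts
    print(a100_equivalent())           # whole A100s by CUDA-core count
```

Under these assumptions, the 18,176 figure corresponds to 284 processors with 64 cores each, and the Frontier ratio comes out slightly above 1.5, which the article rounds to 1.6. Note that a raw core count says nothing about per-core capability, so these comparisons are not performance comparisons.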

Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near-perfect proportion. This includes large language models with very large sequence lengths, a task that is impossible to achieve on GPUs. In fact, this GPU-impossible work was demonstrated by one of Andromeda’s first users, who achieved near-perfect scaling on GPT-J at 2.5 billion and 25 billion parameters with long sequence lengths (a maximum sequence length, or MSL, of 10,240). The users then attempted to do the same work on Polaris, a cluster of 2,000 Nvidia A100 GPUs, and the GPUs were unable to do the work because of GPU memory and memory bandwidth limitations.
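
To make "near-perfect proportion" concrete, scaling is commonly summarized as a speedup (time on one system divided by time on n systems) and an efficiency (speedup divided by n). The sketch below is a minimal illustration of those two definitions. The training times are hypothetical numbers invented for this example and are not Cerebras measurements.

```python
def speedup(time_one_system: float, time_n_systems: float) -> float:
    """How many times faster the job runs on n systems than on one."""
    return time_one_system / time_n_systems


def scaling_efficiency(time_one_system: float, time_n_systems: float,
                       n_systems: int) -> float:
    """Fraction of ideal linear speedup achieved; 1.0 means perfect scaling."""
    return speedup(time_one_system, time_n_systems) / n_systems


if __name__ == "__main__":
    # Hypothetical training times in hours, for illustration only.
    timings = {1: 160.0, 2: 80.5, 4: 40.6, 8: 20.5, 16: 10.4}
    for n, hours in timings.items():
        eff = scaling_efficiency(timings[1], hours, n)
        print(f"{n:2d} systems: {hours:6.1f} h, efficiency {eff:.1%}")
```

An efficiency close to 1.0 at every cluster size is what the article means by near-perfect linear scaling: doubling the number of systems roughly halves the training time.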

Andromeda’s near-perfect scaling across the largest natural language processing models is made possible by the second-generation Cerebras Wafer Scale Engine (WSE-2), the industry’s largest and most powerful processor, and by MemoryX and SwarmX technologies. MemoryX enables even a single CS-2 to support multi-trillion parameter models. SwarmX technology links MemoryX to a cluster of CS-2s. Together, these industry-leading technologies enable Cerebras’ large clusters to avoid two of the major challenges plaguing traditional clusters used for modern AI work: the complexity of parallel programming and the performance degradation of distributed computing.

The 16 CS-2s powering Andromeda run in a strictly data parallel mode, enabling simple and easy model distribution, and single-keystroke scaling from 1 to 16 CS-2s. Sending AI jobs to Andromeda can be done quickly and painlessly from a Jupyter notebook, and users can switch from one model to another with a few keystrokes. Because the Cerebras WSE-2 processor, at the heart of each CS-2, has 1,000 times more memory bandwidth than a GPU, Andromeda can harvest structured and unstructured sparsity as well as static and dynamic sparsity. These are things other hardware accelerators, including GPUs, simply can’t do. The result is that Cerebras can train models that are more than 90% sparse to state-of-the-art accuracy.

Andromeda can be used simultaneously by multiple users. Within seconds, users can specify how many of Andromeda’s CS-2s they want to use. This means Andromeda can be used as a 16 CS-2 supercomputer cluster for a single user working on a single job, as 16 individual CS-2 systems for sixteen distinct users with sixteen distinct jobs, or in any combination in between.

Andromeda is deployed in Santa Clara, California, in 16 racks at Colovore, a leading high-performance data center. The 16 CS-2 systems, with a combined 13.5 million AI-optimized cores, are fed by 284 64-core AMD 3rd Gen EPYC processors. The SwarmX fabric, which links the MemoryX parameter storage solution to the 16 CS-2s, provides more than 96.8 terabits of bandwidth. Through gradient accumulation, Andromeda can support all batch sizes.
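
Gradient accumulation, mentioned above, is a general training technique: a large batch is split into smaller micro-batches, the gradients from each micro-batch are summed, and a single weight update is applied at the end. The sketch below illustrates the idea on a toy one-parameter model with a squared-error loss. It is a minimal illustration of the general technique, not Cerebras's implementation, and the model, data, and function names are all assumptions made for this example.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_sum(w: float, samples: Sequence[Sample]) -> float:
    """Sum of per-sample gradients of (w*x - y)**2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in samples)


def full_batch_grad(w: float, samples: Sequence[Sample]) -> float:
    """Mean gradient computed over the whole batch in one pass."""
    return grad_sum(w, samples) / len(samples)


def accumulated_grad(w: float, samples: Sequence[Sample],
                     micro_batch_size: int) -> float:
    """Mean gradient computed by accumulating over micro-batches."""
    total = 0.0
    for start in range(0, len(samples), micro_batch_size):
        micro_batch = samples[start:start + micro_batch_size]
        total += grad_sum(w, micro_batch)  # accumulate, do not update yet
    # Dividing by the full batch size keeps the result exact even when
    # the last micro-batch is smaller than the others.
    return total / len(samples)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9),
            (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
    w = 1.5
    print(full_batch_grad(w, data))
    print(accumulated_grad(w, data, micro_batch_size=2))
    print(accumulated_grad(w, data, micro_batch_size=4))
```

Because the accumulated gradient matches the full-batch gradient up to floating-point rounding, the effective batch size can be chosen independently of how much data is processed in one pass, which is the sense in which accumulation supports arbitrary batch sizes.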
