Technology

Microsoft introduces AI accelerator for US Azure customers


Microsoft has announced that Azure’s US Central datacentre region is the first to receive a new artificial intelligence (AI) inference accelerator, Maia 200.

Microsoft describes Maia 200 as an inference powerhouse, built on TSMC’s 3nm process with native FP8/FP4 (floating point) tensor cores and a redesigned memory system that uses 216GB of the latest high-speed memory architecture (HBM3e), capable of moving data at 7TB per second. Maia also offers 272MB of on-chip memory plus data movement engines, which Microsoft said are used to keep large models fed, fast and highly utilised.
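
As a rough back-of-the-envelope illustration of what those figures imply for inference, the sketch below estimates how long a single pass over a model’s weights would take at the quoted 7TB/s of memory bandwidth. The model sizes are hypothetical; only the capacity, bandwidth and data-format numbers come from the article.

```python
# Back-of-the-envelope sketch: how quickly the quoted 7TB/s of HBM3e bandwidth
# could stream a large model's weights at the chip's native low-precision formats.
# Model sizes below are hypothetical examples, not Microsoft figures.

HBM_CAPACITY_GB = 216        # quoted HBM3e capacity
HBM_BANDWIDTH_TBS = 7.0      # quoted memory bandwidth, TB per second
ON_CHIP_SRAM_MB = 272        # quoted on-chip memory

BYTES_PER_PARAM = {"FP8": 1.0, "FP4": 0.5}  # native tensor-core formats

def weight_stream_time_ms(params_billion: float, fmt: str) -> float:
    """Time to read a model's weights from HBM once, in milliseconds."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[fmt]
    return weight_bytes / (HBM_BANDWIDTH_TBS * 1e12) * 1e3

if __name__ == "__main__":
    for fmt in ("FP8", "FP4"):
        for size in (70, 180):  # hypothetical model sizes, billions of parameters
            print(f"{size}B model in {fmt}: "
                  f"{weight_stream_time_ms(size, fmt):.1f} ms per full weight pass")
```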

According to the company, these hardware features mean Maia 200 is capable of delivering three times the FP4 performance of the third-generation Amazon Trainium, and FP8 performance above Google’s seventh-generation tensor processing unit. Microsoft said Maia 200 represents its best inference system yet, offering 30% better price-performance than current systems, but at the time of writing, it was unable to provide a date for when the product would be available outside of the US.
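
To make the price-performance claim concrete, here is a minimal worked example; the baseline throughput and cost figures are placeholders invented for illustration, and only the 30% uplift comes from Microsoft’s claim.

```python
# Illustrative arithmetic only: what "30% better price-performance" means relative
# to an existing system. The baseline figures are made-up placeholders.

baseline_tokens_per_sec = 1_000_000   # hypothetical fleet throughput
baseline_cost_per_hour = 100.0        # hypothetical cost, arbitrary units

baseline_perf_per_cost = baseline_tokens_per_sec / baseline_cost_per_hour
maia200_perf_per_cost = 1.30 * baseline_perf_per_cost  # the claimed 30% uplift

# At equal spend, the same budget buys roughly 30% more inference throughput...
print(f"Throughput at equal cost: {maia200_perf_per_cost * baseline_cost_per_hour:,.0f} tokens/s")
# ...or, equivalently, the same throughput costs about 23% less (1 - 1/1.3).
print(f"Cost at equal throughput: {baseline_cost_per_hour / 1.30:.1f} units/hour")
```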

Along with its US Central datacentre region, Microsoft also announced that its US West 3 datacentre region near Phoenix, Arizona will be the next to be updated with Maia 200.

In a blog post describing how Maia 200 is being deployed, Scott Guthrie, Microsoft executive vice-president for cloud and AI, said the setup comprises racks of trays configured with four Maia accelerators. Each tray is fully connected with direct, non-switched links, to keep high-bandwidth communication local for optimum inference efficiency.
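
The tray layout Guthrie describes amounts to a small full mesh. A minimal sketch of that topology follows, assuming simple point-to-point links and made-up device names rather than Microsoft’s actual configuration format.

```python
# Minimal sketch of the described tray topology: four accelerators per tray,
# fully connected with direct, non-switched links. Device names are illustrative.

from itertools import combinations

ACCELERATORS_PER_TRAY = 4

def tray_links(tray_id: int) -> list[tuple[str, str]]:
    """Return every direct accelerator-to-accelerator link within one tray."""
    devices = [f"tray{tray_id}-maia{i}" for i in range(ACCELERATORS_PER_TRAY)]
    # A full mesh of N devices needs N*(N-1)/2 point-to-point links (6 for N=4),
    # so every pair can talk without traversing a switch.
    return list(combinations(devices, 2))

if __name__ == "__main__":
    links = tray_links(0)
    print(f"{len(links)} direct links in one tray:")
    for a, b in links:
        print(f"  {a} <-> {b}")
```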

He said the same communication protocol is used for intra-rack and inter-rack networking, with the Maia AI transport protocol providing a way to scale clusters of Maia 200 accelerators with minimal network hops.

“This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale,” added Guthrie.

Guthrie said Maia 200 introduces a new kind of two-tier scale-up design built on standard Ethernet. “A custom transport layer and tightly integrated NIC [network interface card] unlocks performance, strong reliability and significant cost advantages without relying on proprietary fabrics,” he added.

In practice, this means each accelerator offers up to 1.4TB per second of dedicated scale-up bandwidth and, according to Guthrie, enables Microsoft to provide predictable, high-performance collective operations across clusters of up to 6,144 accelerators.
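
Those two figures allow a rough, bandwidth-only lower bound on a collective operation across such a cluster. The sketch below assumes a ring all-reduce with perfect overlap, neither of which Microsoft has specified; only the 1.4TB/s and 6,144-accelerator numbers come from the article.

```python
# Rough, assumption-laden estimate of one collective operation (a ring all-reduce)
# over a Maia 200 cluster, using only the figures quoted above. The ring algorithm
# and perfect-overlap assumptions are ours, not Microsoft's.

SCALE_UP_BW_TBS = 1.4          # per-accelerator scale-up bandwidth, TB/s
CLUSTER_SIZE = 6_144           # maximum cluster size quoted by Guthrie

def ring_allreduce_ms(payload_gb: float, n: int = CLUSTER_SIZE,
                      bw_tbs: float = SCALE_UP_BW_TBS) -> float:
    """Bandwidth-only lower bound for a ring all-reduce of `payload_gb` gigabytes."""
    # A ring all-reduce moves roughly 2*(n-1)/n of the payload through each link;
    # per-hop latency and protocol overhead are ignored in this sketch.
    bytes_moved = 2 * (n - 1) / n * payload_gb * 1e9
    return bytes_moved / (bw_tbs * 1e12) * 1e3

if __name__ == "__main__":
    for gb in (1, 8, 64):  # hypothetical payload sizes
        print(f"{gb:>3} GB all-reduce: >= {ring_allreduce_ms(gb):.2f} ms (bandwidth bound)")
```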

What this all means, at least from Guthrie’s perspective, is that the Maia 200 architecture is capable of delivering scalable performance for dense inference clusters while reducing power usage and overall total cost of ownership across Azure’s global fleet of datacentres.

On the software side, he said a sophisticated simulation pipeline was used to guide the Maia 200 architecture from its earliest stages. The pipeline involved modelling the computation and communication patterns of large language models with high fidelity.
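
The article gives no detail on that pipeline, but a toy analytical model of the same flavour, deciding whether a layer is compute-bound or communication-bound, might look like the sketch below. The accelerator throughput figure and the layer numbers are assumptions; only the 1.4TB/s interconnect figure is taken from the article.

```python
# Greatly simplified sketch of the kind of analytical modelling described:
# estimating whether a large-language-model layer is bound by compute or by
# communication. All formulas and figures are generic assumptions, not
# Microsoft's simulation pipeline.

from dataclasses import dataclass

@dataclass
class LayerEstimate:
    flops: float            # floating-point operations for the layer
    comm_bytes: float       # bytes exchanged between accelerators for the layer

def step_time_ms(layer: LayerEstimate,
                 peak_flops: float = 1e15,         # assumed low-precision throughput
                 interconnect_bps: float = 1.4e12  # quoted scale-up bandwidth
                 ) -> tuple[float, str]:
    compute_ms = layer.flops / peak_flops * 1e3
    comm_ms = layer.comm_bytes / interconnect_bps * 1e3
    bound = "compute-bound" if compute_ms >= comm_ms else "communication-bound"
    # With perfect overlap, the slower of the two dominates the step time.
    return max(compute_ms, comm_ms), bound

if __name__ == "__main__":
    # Hypothetical decode step for one transformer layer, tensor-parallel across a tray.
    layer = LayerEstimate(flops=2e12, comm_bytes=5e8)
    ms, bound = step_time_ms(layer)
    print(f"Estimated layer time: {ms:.3f} ms ({bound})")
```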

“This early co-development environment enabled us to optimise silicon, networking and system software as a unified whole – long before first silicon,” said Guthrie, adding that Microsoft also developed a large-scale emulation environment, which was used from low-level kernel validation all the way to full model execution and performance tuning.
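
As an illustration of what low-level kernel validation in such an emulation flow typically involves, the sketch below compares a stand-in “emulated” kernel against a trusted NumPy reference within a tolerance suited to low-precision arithmetic; no real Maia API is implied.

```python
# Illustrative kernel-validation sketch: run a candidate kernel and compare it
# against a trusted high-precision reference. The "emulated" function is a
# stand-in assumption, not Microsoft's emulation environment.

import numpy as np

def reference_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Trusted high-precision reference for the operation under test."""
    return a.astype(np.float64) @ b.astype(np.float64)

def emulated_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stand-in for the kernel running on the emulation environment (assumption)."""
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)

def validate_kernel(shape=(128, 128), rtol=1e-2, atol=1e-3) -> bool:
    rng = np.random.default_rng(0)
    a = rng.standard_normal(shape)
    b = rng.standard_normal(shape)
    return np.allclose(emulated_matmul(a, b), reference_matmul(a, b), rtol=rtol, atol=atol)

if __name__ == "__main__":
    print("kernel validation passed" if validate_kernel() else "kernel validation FAILED")
```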

As part of the roll-out, the company is offering AI developers a preview of the Maia 200 software development kit (SDK).