Microsoft’s DeepSpeed library can train trillion-parameter AI models with fewer GPUs

Microsoft has released an upgraded version of its DeepSpeed library for training very large AI models. The company claims that the new approach, known as 3D parallelism, adapts to the varying requirements of a workload to power extremely large models while maintaining scaling efficiency.

Massive AI models comprising billions of parameters have driven improvements across a range of domains. According to studies, such models perform well because they can absorb the nuances of language, grammar, concepts, and context, which lets them handle tasks ranging from summarizing speeches to generating code from GitHub.

However, training these models requires massive computational resources. A 2018 OpenAI analysis shows that between 2012 and 2018, the amount of computing power used in the largest AI training runs grew more than 300,000 times, which means it doubled almost every 3.5 months.

This is where DeepSpeed comes in: it is capable of training AI models with trillions of parameters. To do so, it combines three techniques: data-parallel training, model-parallel training, and pipeline-parallel training.
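To make the three techniques concrete, here is a minimal, self-contained sketch in plain NumPy (not DeepSpeed's actual API; all names and dimensions are illustrative). It simulates each strategy on a single process and checks that every decomposition reproduces the single-device result.

```python
# A minimal sketch of data, model, and pipeline parallelism, simulated with NumPy.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_hidden, d_out = 8, 4, 6, 3
x = rng.normal(size=(batch, d_in))
w1 = rng.normal(size=(d_in, d_hidden))   # first layer weights
w2 = rng.normal(size=(d_hidden, d_out))  # second layer weights

def layer(inp, w):
    """A linear layer followed by ReLU."""
    return np.maximum(inp @ w, 0.0)

# Reference: the full two-layer model on a single "device".
reference = layer(layer(x, w1), w2)

# 1) Data parallelism: every worker holds a full copy of the weights,
#    but each processes a different slice of the batch.
data_parallel = np.concatenate(
    [layer(layer(shard, w1), w2) for shard in np.split(x, 2, axis=0)], axis=0)

# 2) Model parallelism: one layer's weight matrix is split column-wise across
#    workers; each computes part of the hidden activations, which are then joined.
hidden = np.concatenate(
    [layer(x, shard) for shard in np.split(w1, 2, axis=1)], axis=1)
model_parallel = layer(hidden, w2)

# 3) Pipeline parallelism: whole layers live on different stages; activations
#    flow from stage 0 to stage 1.
stage0_out = layer(x, w1)          # stage 0 owns the first layer
pipeline = layer(stage0_out, w2)   # stage 1 owns the second layer

# All three decompositions reproduce the single-device result.
assert np.allclose(reference, data_parallel)
assert np.allclose(reference, model_parallel)
assert np.allclose(reference, pipeline)
print("data, model, and pipeline parallel results all match the reference")
```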

Conventionally, training a trillion-parameter model would require memory equivalent to that of at least 400 NVIDIA A100 GPUs, each with a 40GB capacity. Microsoft estimates that it would take 4,000 A100s running at 50% capacity around 100 days to complete the training of such a model.
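The 400-GPU figure can be reconstructed with a quick back-of-the-envelope calculation, under one assumption the article does not spell out: that mixed-precision Adam training keeps roughly 16 bytes of state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance), the accounting used in Microsoft's ZeRO work.

```python
# Back-of-the-envelope check of the 400-A100 memory figure.
# Assumption: ~16 bytes of training state per parameter for mixed-precision Adam
# (2 fp16 weights + 2 fp16 gradients + 4 fp32 master weights + 4 momentum + 4 variance).
params = 1_000_000_000_000            # one trillion parameters
bytes_per_param = 2 + 2 + 4 + 4 + 4   # = 16 bytes
total_bytes = params * bytes_per_param

a100_memory = 40e9                    # 40 GB per NVIDIA A100, in bytes
print(f"model + optimizer state: {total_bytes / 1e12:.0f} TB")                   # 16 TB
print(f"A100s needed just to hold that state: {total_bytes / a100_memory:.0f}")  # 400
```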

DeepSpeed makes this process far more manageable by splitting a large model three ways. The model's layers are divided among four pipeline stages, and the layers within each stage are further split among four "worker" processes, which do the actual training. Each pipeline is then replicated across two data-parallel instances, and the workers are mapped onto multi-GPU systems. These techniques, together with other performance improvements, allow DeepSpeed to train a trillion-parameter model across as few as 800 NVIDIA V100 GPUs.
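That arrangement multiplies out to 32 workers in total (4 pipeline stages x 4 model-parallel workers x 2 data-parallel replicas). The short sketch below shows one illustrative way to assign each worker a coordinate along the three axes; the rank ordering here is an assumption made for clarity, not DeepSpeed's actual process-group topology.

```python
# Illustrative mapping of worker ranks onto the three parallelism axes.
from itertools import product

PIPELINE_STAGES = 4   # layers split into four sequential pipeline stages
MODEL_WORKERS = 4     # layers within a stage split across four workers
DATA_REPLICAS = 2     # the whole pipeline replicated twice

coords = list(product(range(DATA_REPLICAS),
                      range(PIPELINE_STAGES),
                      range(MODEL_WORKERS)))

print(f"total workers: {len(coords)}")   # 2 * 4 * 4 = 32
for rank, (d, p, m) in enumerate(coords):
    print(f"rank {rank:2d} -> data replica {d}, pipeline stage {p}, model shard {m}")
```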

“These [new techniques in DeepSpeed] offer extreme compute, memory, and communication efficiency, and they power model training with billions to trillions of parameters,” Microsoft wrote in a blog post. “The technologies also allow for extremely long input sequences and power on hardware systems with a single GPU, high-end clusters with thousands of GPUs, or low-end clusters with very slow ethernet networks … We [continue] to innovate at a fast rate, pushing the boundaries of speed and scale for deep learning training.”

 
