
Microsoft’s DeepSpeed library can train trillion-parameter AI models with fewer GPUs

Microsoft has released an upgraded version of its DeepSpeed library for training highly complex AI models. The company claims that the new approach, known as 3D parallelism, adapts to varying workload requirements to power very large models while balancing scaling efficiency.

Massive AI models comprising billions of parameters have pushed the state of the art across a variety of domains. According to studies, such models perform so well because they can absorb the nuances of language, grammar, concepts, and context. This enables them to handle a range of tasks, from summarizing speeches to generating code by browsing GitHub.

However, training these models requires massive computational resources. A 2018 OpenAI analysis found that between 2012 and 2018, the amount of compute used in the largest AI training runs grew more than 300,000-fold, roughly doubling every 3.5 months.

This is where DeepSpeed comes in: it is capable of training AI models with up to trillions of parameters. To this end, it combines three techniques: data-parallel training, model-parallel training, and pipeline-parallel training.
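To make the three dimensions concrete, here is a minimal sketch in plain Python (our own illustration, not DeepSpeed's API) of how a pool of GPUs can be factored into data-parallel, pipeline-parallel, and model-parallel axes, with each GPU's global rank mapping to one coordinate along each axis; the rank ordering shown is an assumed layout chosen for clarity.

```python
# Illustrative sketch of 3D parallelism (not DeepSpeed's actual API).
# Every GPU ("rank") gets one coordinate along each of the three axes,
# which is how data, model, and pipeline parallelism compose.

def rank_to_3d_coord(rank, data_parallel, pipeline_stages, model_parallel):
    """Map a global GPU rank to (data, pipeline, model) coordinates.

    Assumes model parallelism is the innermost axis (adjacent ranks share
    a pipeline stage), pipeline is next, and data parallelism is outermost.
    This ordering is an assumption made for illustration.
    """
    total = data_parallel * pipeline_stages * model_parallel
    assert 0 <= rank < total, "rank falls outside the 3D grid"

    model_rank = rank % model_parallel
    pipeline_rank = (rank // model_parallel) % pipeline_stages
    data_rank = rank // (model_parallel * pipeline_stages)
    return data_rank, pipeline_rank, model_rank


if __name__ == "__main__":
    # A small 2 x 2 x 2 grid (8 GPUs) keeps the printout readable.
    dp, pp, mp = 2, 2, 2
    for rank in range(dp * pp * mp):
        d, p, m = rank_to_3d_coord(rank, dp, pp, mp)
        print(f"GPU rank {rank}: data replica {d}, pipeline stage {p}, model shard {m}")
```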

Conventionally, training a trillion-parameter model would require the combined memory of at least 400 NVIDIA A100 GPUs, each with a capacity of 40 GB. Microsoft estimates that 4,000 A100s running at 50% capacity would need about 100 days to complete such a training run.
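The 400-GPU figure follows from straightforward memory accounting. The sketch below reproduces it under one assumption of ours (not a number quoted by Microsoft): roughly 16 bytes of GPU memory per parameter for mixed-precision Adam training, covering fp16 weights and gradients plus fp32 optimizer states.

```python
# Back-of-the-envelope memory accounting for a trillion-parameter model.
# Assumption (ours, not Microsoft's): ~16 bytes of GPU memory per parameter
# under mixed-precision Adam training (2 B fp16 weights + 2 B fp16 gradients
# + 12 B fp32 optimizer states).

import math

PARAMS = 1_000_000_000_000       # one trillion parameters
BYTES_PER_PARAM = 16             # assumed mixed-precision Adam footprint
A100_MEMORY_BYTES = 40 * 10**9   # 40 GB per NVIDIA A100 (decimal GB)

total_bytes = PARAMS * BYTES_PER_PARAM
gpus_needed = math.ceil(total_bytes / A100_MEMORY_BYTES)

print(f"Model and optimizer state: {total_bytes / 10**12:.0f} TB")
print(f"A100s needed just to hold that state: {gpus_needed}")
# Prints 16 TB and 400 GPUs -- activations and other buffers only push
# the real requirement higher, hence "at least 400" in practice.
```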

DeepSpeed makes this far more tractable by splitting a large model into smaller pieces across four pipeline stages. The layers within each pipeline stage are further divided among four “worker” components, which do the actual training. Each pipeline is then replicated across two data-parallel instances, and the workers are mapped onto multi-GPU systems. These techniques, together with other performance improvements, allow DeepSpeed to train a trillion-parameter model on as few as 800 NVIDIA V100 GPUs.
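The layout described above can be written out explicitly. The following sketch, again our own plain-Python illustration rather than DeepSpeed code, assumes a 32-layer model for concreteness and assigns each GPU a data-parallel replica, a pipeline stage, a model-parallel shard, and the layers it is responsible for.

```python
# Illustration of the layout described above (our sketch, not DeepSpeed code).
# Assumed for concreteness: a 32-layer model, split into 4 pipeline stages,
# each stage sharded across 4 model-parallel workers, and the whole pipeline
# replicated across 2 data-parallel instances.

NUM_LAYERS = 32               # illustrative model depth
PIPELINE_STAGES = 4
MODEL_PARALLEL_WORKERS = 4
DATA_PARALLEL_REPLICAS = 2

layers_per_stage = NUM_LAYERS // PIPELINE_STAGES

gpu = 0
for replica in range(DATA_PARALLEL_REPLICAS):
    for stage in range(PIPELINE_STAGES):
        stage_layers = list(range(stage * layers_per_stage,
                                  (stage + 1) * layers_per_stage))
        for worker in range(MODEL_PARALLEL_WORKERS):
            # Each worker holds a 1/4 shard of every layer in its stage.
            print(f"GPU {gpu:2d}: replica {replica}, stage {stage}, "
                  f"shard {worker}, layers {stage_layers}")
            gpu += 1

print(f"Total GPUs in this example: {gpu}")   # 2 x 4 x 4 = 32
```

Scaling any of the three axes multiplies the GPU count, which is how the same scheme stretches to configurations such as the 800-GPU trillion-parameter run cited above.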

“These [new techniques in DeepSpeed] offer extreme compute, memory, and communication efficiency, and they power model training with billions to trillions of parameters,” Microsoft wrote in a blog post. “The technologies also allow for extremely long input sequences and power on hardware systems with a single GPU, high-end clusters with thousands of GPUs, or low-end clusters with very slow ethernet networks … We [continue] to innovate at a fast rate, pushing the boundaries of speed and scale for deep learning training.”

 
