DeepSpeed

{{short description|Microsoft open source library}}

{{Infobox software

| name = DeepSpeed

| logo = DeepSpeed logo.svg

| screenshot =

| screenshot size =

| caption =

| author = Microsoft Research

| developer = Microsoft

| released = {{Start date and age|2020|05|18}}

| latest release version = v0.16.5

| latest release date = {{Start date and age|2025|03|27}}

| repo = {{URL|https://github.com/microsoft/DeepSpeed}}

| programming language = Python, CUDA, C++

| operating system =

| genre = Software library

| license = Apache License 2.0

| website = {{URL|deepspeed.ai}}

}}

'''DeepSpeed''' is an open-source deep learning optimization library for PyTorch.{{Cite web|url=https://uk.pcmag.com/news-analysis/127085/microsoft-updates-windows-azure-tools-with-an-eye-on-the-future|title=Microsoft Updates Windows, Azure Tools with an Eye on The Future|date=May 22, 2020|website=PCMag UK}}

== Library ==

The library is designed to reduce the computing power and memory required to train large distributed models, and to achieve better parallelism on existing hardware.{{Cite web|url=https://www.infoworld.com/article/3526449/microsoft-speeds-up-pytorch-with-deepspeed.html|title=Microsoft speeds up PyTorch with DeepSpeed|first=Serdar|last=Yegulalp|date=February 10, 2020|website=InfoWorld}}{{Cite web|url=https://www.neowin.net/news/microsoft-unveils-fifth-most-powerful-supercomputer-in-the-world/|title=Microsoft unveils "fifth most powerful" supercomputer in the world|website=Neowin|date=18 June 2023 }} DeepSpeed is optimized for low-latency, high-throughput training. It includes the Zero Redundancy Optimizer (ZeRO) for training models with one trillion or more parameters.{{Cite web|url=https://venturebeat.com/2020/02/10/microsoft-trains-worlds-largest-transformer-language-model/|title=Microsoft trains world's largest Transformer language model|date=February 10, 2020}} Features include mixed-precision training; single-GPU, multi-GPU, and multi-node training; and custom model parallelism. The DeepSpeed source code is licensed under the Apache License 2.0 and is available on GitHub.{{Cite web|url=https://github.com/microsoft/DeepSpeed|title=microsoft/DeepSpeed|date=July 10, 2020|via=GitHub}}
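In typical use, an ordinary PyTorch model is wrapped with <code>deepspeed.initialize</code>, and features such as ZeRO partitioning and mixed precision are enabled through a JSON-style configuration. The following is a minimal sketch of that pattern; the toy model, data, and configuration values are illustrative placeholders rather than code from the library, and such scripts are normally launched with DeepSpeed's command-line launcher, which sets up the distributed environment.

<syntaxhighlight lang="python">
# Illustrative sketch of a DeepSpeed training loop.
# The model, data, and config values below are hypothetical placeholders.
import torch
import deepspeed

# Example configuration: ZeRO stage 2 partitioning plus fp16 mixed precision.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(1024, 10)  # stand-in for a much larger model

# deepspeed.initialize wraps the model in a DeepSpeed engine that manages
# distributed data parallelism, optimizer-state partitioning, and loss scaling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(100):
    # Synthetic half-precision batch placed on the engine's device.
    inputs = torch.randn(32, 1024, device=model_engine.device, dtype=torch.half)
    labels = torch.randint(0, 10, (32,), device=model_engine.device)
    loss = torch.nn.functional.cross_entropy(model_engine(inputs), labels)
    model_engine.backward(loss)  # gradient scaling and reduction
    model_engine.step()          # optimizer step and ZeRO bookkeeping
</syntaxhighlight>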

The DeepSpeed team has claimed up to 6.2x higher throughput, 2.8x faster convergence, and 4.6x less communication.{{Cite web|date=2021-05-24|title=DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression|url=https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/|access-date=2021-06-19|website=Microsoft Research|language=en-US}}

== References ==

{{Reflist}}

== Further reading ==

* {{cite arXiv|last1=Rajbhandari|first1=Samyam|last2=Rasley|first2=Jeff|last3=Ruwase|first3=Olatunji|last4=He|first4=Yuxiong|title=ZeRO: Memory Optimization Towards Training A Trillion Parameter Models|year=2019|class=cs.LG |eprint=1910.02054}}