Neural processing unit

{{Short description|Hardware acceleration unit for artificial intelligence tasks}}

{{Use American English|date=January 2019}}

{{Use mdy dates|date=October 2021}}

A neural processing unit (NPU), also known as an AI accelerator or deep learning processor, is a class of specialized hardware accelerator{{cite web |url=https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |title=Intel unveils Movidius Compute Stick USB AI Accelerator |date=July 21, 2017 |access-date=August 11, 2017 |archive-url=https://web.archive.org/web/20170811193632/https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |archive-date=August 11, 2017 }} or computer system{{cite web |url=https://insidehpc.com/2017/06/inspurs-unveils-gx4-ai-accelerator/ |title=Inspurs unveils GX4 AI Accelerator |date=June 21, 2017}}{{citation |title=Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors |last=Wiggers |first=Kyle |date=November 6, 2019 |url=https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |publication-date=November 6, 2019 |orig-date=2019 |archive-url=https://web.archive.org/web/20200306120524/https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |archive-date=March 6, 2020 |access-date=March 14, 2020}} designed to accelerate artificial intelligence (AI) and machine learning applications, including artificial neural networks and computer vision. NPUs are used either to execute already-trained AI models efficiently (inference) or to train AI models. Their applications include algorithms for robotics, the Internet of things, and other data-intensive or sensor-driven tasks. Google, for example, uses its own AI accelerators.{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} NPUs are often manycore designs and focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing. {{As of|2024}}, a typical AI integrated circuit chip contains tens of billions of MOSFETs.{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}
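
Low-precision arithmetic is central to such designs: weights and activations trained in 32-bit floating point are often quantized to 8-bit integers before deployment, trading a small amount of accuracy for much cheaper multiply–accumulate hardware. The following minimal sketch (plain Python with NumPy; the helper names are illustrative, not drawn from any accelerator's API) shows symmetric per-tensor int8 quantization of the kind NPUs commonly exploit:

<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 codes plus a scale."""
    scale = max(float(np.max(np.abs(x))) / 127.0, 1e-12)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.9, -0.3, 0.05, -1.2], dtype=np.float32)
q, s = quantize_int8(x)
print(q)                  # int8 codes, e.g. [  95  -32    5 -127]
print(dequantize(q, s))   # close to x, to within one quantization step
</syntaxhighlight>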

AI accelerators are used in mobile devices such as Apple iPhones and Huawei cellphones,{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}} and personal computers such as Intel laptops,{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }} AMD laptops{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}} and Apple silicon Macs.{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}} Accelerators are used in cloud computing servers, including tensor processing units (TPU) in Google Cloud Platform{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }} and Trainium and Inferentia chips in Amazon Web Services.{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}} Many vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.
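
Frameworks such as JAX make the dispatch to such cloud accelerators largely transparent: the same array program is compiled for whatever backend is available. A minimal sketch, assuming JAX is installed with a TPU or GPU backend (it falls back to the CPU otherwise):

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

# List the devices visible to the runtime, e.g. TPU cores on Google Cloud.
print(jax.devices())

@jax.jit  # compile once for the default backend
def predict(w, x):
    return jnp.tanh(w @ x)

# bfloat16 is a low-precision format widely supported by TPUs and recent GPUs.
w = jnp.ones((4, 4), dtype=jnp.bfloat16)
x = jnp.ones((4,), dtype=jnp.bfloat16)
print(predict(w, x))
</syntaxhighlight>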

Graphics processing units designed by companies such as Nvidia and AMD often include AI-specific hardware and are commonly used as AI accelerators, both for training and inference.{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}} All models of Intel's Meteor Lake processors include a built-in versatile processor unit (VPU) to accelerate inference for computer vision and deep learning workloads.{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}
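
In practice, frameworks expose these accelerators through a device abstraction, so the same model code runs for both training and inference on whatever hardware is present. A brief PyTorch sketch (both Nvidia CUDA and AMD ROCm builds expose the GPU through the same "cuda" device type; this is an illustration, not vendor documentation):

<syntaxhighlight lang="python">
import torch

# Use a GPU if one is visible to the framework, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(8, 16, device=device)

with torch.no_grad():        # inference: no gradient bookkeeping
    y = model(x)

loss = model(x).sum()        # a toy training step on the same device
loss.backward()              # gradients are computed on the accelerator
print(y.shape, model.weight.grad.shape)
</syntaxhighlight>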

== References ==

{{Reflist|32em}}