Manycore processor

{{Short description|Multi-core processor with a large number of cores}}

Manycore processors are a special kind of multi-core processor designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing.

Contrast with multicore architecture

{{Noref|section|date=December 2022}}

Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of higher latency and lower single-thread performance.

The broader category of multi-core processors, by contrast, is usually designed to run both parallel and serial code efficiently, and therefore places more emphasis on high single-thread performance (e.g. devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches) and on shared memory. These techniques devote runtime resources to extracting implicit parallelism from a single thread. Such processors are used in systems that have evolved continuously (with backward compatibility) from single-core processors. They usually have a 'few' cores (e.g. 2, 4, 8) and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.

Motivation

Cache coherency is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as message passing,{{cite web|url=https://cseweb.ucsd.edu/classes/fa12/cse291-c/talks/SCC-80-core-cern.pdf|title=The Future of Many Core Computing: A tale of two processors|last=Mattson|first=Tim|date=January 2010}} scratchpad memory, DMA,{{cite web|url=http://meseec.ce.rit.edu/756-projects/spring2006/d2/6/cell-architecture-final.pdf|title=IBM Cell Processor|last1=Hendry|first1=Gilbert|last2=Kretschmann|first2=Mark}} partitioned global address space,{{cite arXiv|title=Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany|eprint=1412.5538|last1=Olofsson|first1=Andreas|last2=Nordström|first2=Tomas|last3=Ul-Abdin|first3=Zain|class=cs.AR|year=2014}} or read-only/non-coherent caches. A manycore processor using a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for TrueNorth).{{cite web|url=https://www.youtube.com/watch?v=6O6igM4lMDc |archive-url=https://ghostarchive.org/varchive/youtube/20211221/6O6igM4lMDc |archive-date=2021-12-21 |url-status=live|title=IBM SyNAPSE Deep Dive Part 3|last=Amir|first=Arnon|date=June 11, 2015|publisher=IBM Research}}{{cbignore}}
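
The spatial-layout point can be illustrated with a toy model. The sketch below assumes a hypothetical 2D mesh network on a chip with dimension-ordered (XY) routing, so that the cost of a message is the Manhattan distance between the sending and receiving cores. The function names, the task graph and both placements are illustrative assumptions and are not taken from any of the cited architectures or their tooling.

```python
# Toy model: communication cost of a task placement on a 2D mesh
# network on a chip. Assumption: messages use XY routing, so the
# number of hops equals the Manhattan distance between two cores.

def hops(a, b):
    """Number of mesh links between cores at (x, y) coordinates a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def traffic_cost(placement, edges):
    """Total link traversals: sum over edges of messages * hops."""
    return sum(n * hops(placement[src], placement[dst]) for src, dst, n in edges)

# Hypothetical four-stage pipeline; each edge is (producer, consumer, messages).
edges = [("load", "filter", 100), ("filter", "fft", 100), ("fft", "store", 100)]

# Placement A scatters the stages across the corners of a 4x4 mesh.
scattered = {"load": (0, 0), "filter": (3, 3), "fft": (0, 3), "store": (3, 0)}
# Placement B puts communicating stages on neighbouring cores.
adjacent = {"load": (0, 0), "filter": (1, 0), "fft": (2, 0), "store": (3, 0)}

scattered_cost = traffic_cost(scattered, edges)
adjacent_cost = traffic_cost(adjacent, edges)
print(scattered_cost, adjacent_cost)  # 1500 300
```

In this model the adjacent placement needs a fifth of the link traversals of the scattered one, which is the kind of saving that explicit control over task layout makes available to software.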

Manycore processors may have more in common (conceptually) with technologies originating in high-performance computing such as clusters and vector processors.{{cite web|title=cell architecture|url=http://www.blachford.info/computer/Cell/Cell1_v2.html}} The cited source notes: "The Cell architecture is like nothing we have ever seen in commodity microprocessors, it is closer in design to multiprocessor vector supercomputers".

GPUs may be considered a form of manycore processor: they have multiple shader processing units and are suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).

Programming models

  • Message Passing Interface
  • OpenCL{{citation |url= http://www.eetimes.com/electronics-news/4217092/OEMs-show-systems-with-Intel-MIC-chips |title= OEMs show systems with Intel MIC chips |author= Rick Merritt |date= June 20, 2011 |publisher= EE Times |work= www.eetimes.com}} or other APIs supporting compute kernels
  • Partitioned global address space
  • Actor model
  • OpenMP{{cite conference |last1=Barker |first1=J |last2=Bowden |first2=J |year=2013 |title=Manycore Parallelism through OpenMP |conference=IWOMP |book-title=OpenMP in the Era of Low Power Devices and Accelerators |publisher=Springer |series=Lecture Notes in Computer Science, vol 8122|doi=10.1007/978-3-642-40698-0_4 }}
  • Dataflow
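
Of the models listed above, message passing is perhaps the easiest to sketch in a few lines. The example below is a minimal illustration in Python, not an MPI program: threads stand in for cores, a `queue.Queue` stands in for the on-chip interconnect, and the names `worker` and `parallel_sum` are hypothetical. The point is the communication pattern (no shared mutable data, results exchanged only as explicit messages), not any speed-up.

```python
# Sketch of message passing: workers share no mutable state; each one
# owns its data and communicates only through an explicit channel.
import queue
import threading

def worker(core_id, local_data, outbox):
    """Reduce this core's private data and send one message with the result."""
    outbox.put((core_id, sum(local_data)))

def parallel_sum(values, n_cores=4):
    """Split values across n_cores workers and combine their messages."""
    outbox = queue.Queue()
    # Strided slices stand in for per-core local memory (an assumption).
    chunks = [values[i::n_cores] for i in range(n_cores)]
    threads = [threading.Thread(target=worker, args=(i, chunk, outbox))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    # Receive exactly one message per worker, in whatever order they arrive.
    partials = dict(outbox.get() for _ in threads)
    for t in threads:
        t.join()
    return sum(partials.values())

total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

Because each worker touches only its own chunk, no locks or coherent caches are needed for the data itself; the queue is the only point of synchronization.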

Classes of manycore systems

Specific manycore architectures

  • ZettaScaler [https://en.wikichip.org/wiki/zettascaler], 2,048-core modules from the Japanese company PEZY Computing
  • Xeon Phi coprocessor, which uses the MIC (Many Integrated Core) architecture
  • Tilera
  • Adapteva Epiphany Architecture, a manycore chip using PGAS scratchpad memory
  • Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
  • Movidius Myriad 2, a manycore vision processing unit (VPU)
  • Kalray, a manycore PCI-e accelerator for data-intensive tasks
  • Teraflops Research Chip, a manycore processor using message passing
  • TrueNorth, an AI accelerator with a manycore network on a chip architecture
  • GreenArrays, a manycore processor using message passing aimed at low-power applications
  • Sunway SW26010, a 260-core manycore processor used in Sunway TaihuLight, which was the top-ranked supercomputer at the time
  • SW52020, an improved 520-core{{Cite web|last=Morgan|first=Timothy Prickett|date=2021-02-10|title=A First Peek At China's Sunway Exascale Supercomputer|url=http://www.nextplatform.com/2021/02/10/a-sneak-peek-at-chinas-sunway-exascale-supercomputer/|access-date=2021-11-18|website=The Next Platform|language=en-US}}{{Cite web|last=Hemsoth|first=Nicole|date=2021-04-19|title=China's Exascale Prototype Supercomputer Tests AI Workloads|url=http://www.nextplatform.com/2021/04/19/chinas-exascale-prototype-supercomputer-tests-ai-workloads/|access-date=2021-11-18|website=The Next Platform|language=en-US}} variant of the SW26010 with 512-bit SIMD (also adding support for half-precision), used in a prototype for a planned exascale system (and, in the future, a 10-exascale system); according to DatacenterDynamics, China is rumored to already operate two separate exascale systems in secret{{Citation needed|date=October 2023}}
  • Eyeriss, a manycore processor designed for running convolutional neural nets for embedded vision applications{{cite web|url=https://www.mit.edu/~sze/eyeriss.html|title=Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks|author1=Chen, Yu-Hsin |author2=Krishna, Tushar |author3=Emer, Joel |author4=Sze, Vivienne |work=IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers|year=2016|pages=262–263}}
  • Graphcore, a manycore AI accelerator

Specific manycore computers with 1M+ CPU cores

A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:

Specific computers with 5 million or more CPU cores

Quite a few supercomputers have over 5 million CPU cores. Cores in coprocessors (e.g. attached GPUs) are not included in these core counts; if they were, quite a few more computers would reach this threshold.

  • Sunway TaihuLight, a massively parallel Chinese supercomputer with about 10 million CPU cores, once one of the fastest supercomputers in the world, using a custom manycore architecture.{{Citation needed|date=December 2018}} As of November 2018, it was the world's third-fastest supercomputer (as ranked by the TOP500 list), obtaining its performance from 40,960 SW26010 manycore processors, each containing 260 cores (256 compute cores and 4 management cores).
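
The "10 million" figure can be checked with simple arithmetic. The sketch below takes the processor count from the entry above and assumes that each 260-core SW26010 (the per-chip figure given in the architecture list) is organised as 4 core groups of 64 compute cores plus one management core each; the constant names are illustrative.

```python
# Core-count arithmetic for Sunway TaihuLight.
PROCESSORS = 40_960            # SW26010 chips, as stated in the entry above
COMPUTE_CORES_PER_CHIP = 256   # assumption: 4 core groups x 64 compute cores
MGMT_CORES_PER_CHIP = 4        # assumption: one management core per core group

compute_cores = PROCESSORS * COMPUTE_CORES_PER_CHIP
total_cores = PROCESSORS * (COMPUTE_CORES_PER_CHIP + MGMT_CORES_PER_CHIP)
print(f"{compute_cores:,} compute cores, {total_cores:,} cores in total")
# 10,485,760 compute cores, 10,649,600 cores in total
```

Either way of counting gives a total slightly above 10 million cores.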

See also

References

{{Reflist}}