ARM Cortex-A72

{{Short description|Central processing unit}}

{{Infobox CPU

|name = ARM Cortex-A72

|pcode1 = Maya

|image =

|image_size =

|caption =

|produced-start = 2016

|produced-end =

|slowest =

|fastest =

|slow-unit =

|fast-unit =

|fsb-slowest =

|fsb-fastest =

|fsb-slow-unit =

|fsb-fast-unit =

|size-from = 16 nm

|size-to =

|soldby =

|designfirm = ARM Holdings

|manuf1 =

|core1 =

|sock1 =

|pack1 =

|brand1 =

|arch = ARMv8-A

|cpuid =

|code =

|numcores = 1–4 per cluster, multiple clusters

|l1cache = 80 KiB (48 KiB I-cache with parity, 32 KiB D-cache with ECC) per core

|l2cache = 512 KiB to 4 MiB

|l3cache = None

|application =

|predecessor = ARM Cortex-A57

|successor = ARM Cortex-A73

}}

The ARM Cortex-A72 is a central processing unit implementing the ARMv8-A 64-bit instruction set, designed by ARM Holdings' Austin design centre. The Cortex-A72 has a 3-way-decode, out-of-order superscalar pipeline.{{cite web | url=http://www.arm.com/products/processors/cortex-a/cortex-a72-processor.php | title=Cortex-A72 Processor | publisher=ARM Holdings | access-date=2014-02-02}} It is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC). The Cortex-A72 was announced in 2015 as the successor to the Cortex-A57, and was designed to use 20% less power or offer 90% greater performance.{{cite news|last1=Frumusanu|first1=Andrei|title=ARM Announces Cortex-A72, CCI-500, and Mali-T880|url=http://www.anandtech.com/show/8957/arm-announces-cortex-a72|access-date=29 March 2017|publisher=AnandTech|date=3 February 2015}}{{cite news|last1=Frumusanu|first1=Andrei|title=ARM Reveals Cortex-A72 Architecture Details|url=http://www.anandtech.com/show/9184/arm-reveals-cortex-a72-architecture-details|access-date=29 March 2017|publisher=AnandTech|date=23 April 2015}}

Overview

  • Pipelined processor with a deeply out-of-order, speculative-issue, 3-way superscalar execution pipeline
  • DSP and NEON SIMD extensions are mandatory per core
  • VFPv4 Floating Point Unit onboard (per core)
  • Hardware virtualization support
  • Thumb-2 instruction set encoding reduces the size of 32-bit programs with little impact on performance
  • TrustZone security extensions
  • Program Trace Macrocell and CoreSight Design Kit for unobtrusive tracing of instruction execution
  • 32 KiB data (2-way set-associative) + 48 KiB instruction (3-way set-associative) L1 cache per core
  • Integrated low-latency, 16-way set-associative level-2 cache controller; L2 cache size configurable from 512 KiB to 4 MiB per cluster
  • 48-entry fully associative L1 instruction translation lookaside buffer (TLB) with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
  • 32-entry fully associative L1 data TLB with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
  • 1024-entry, 4-way set-associative unified L2 TLB per core, with hit-under-miss support
  • Sophisticated branch prediction algorithm that significantly increases performance and reduces energy from misprediction and speculation
  • Early IC tag: 3-way L1 cache at direct-mapped power
  • Regionalized TLB and μBTB tagging
  • Small-offset branch-target optimizations
  • Suppression of superfluous branch predictor accesses
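
The cache and TLB figures in the list above can be related to each other with some simple arithmetic. The following is a minimal sketch, not taken from ARM documentation: the 64-byte cache line length is an assumption (the list does not state it), and the helper names `num_sets` and `tlb_reach` are hypothetical. It derives the number of sets in each set-associative structure and the amount of memory an L1 instruction TLB can map at a given page size.

```python
# Sketch only. LINE_BYTES is an assumption; the list above does not
# state the cache line length.
LINE_BYTES = 64
KIB = 1024
MIB = 1024 * KIB


def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a whole number of sets")
    return sets


def tlb_reach(entries, page_bytes):
    """Bytes of address space covered if every TLB entry maps one page."""
    return entries * page_bytes


# L1 caches per core, using the sizes and associativities from the list.
l1i_sets = num_sets(48 * KIB, ways=3)
l1d_sets = num_sets(32 * KIB, ways=2)

# L2 cache per cluster at the smallest and largest configurable sizes.
l2_sets_min = num_sets(512 * KIB, ways=16)
l2_sets_max = num_sets(4 * MIB, ways=16)

# Unified L2 TLB: 1024 entries organised as 4 ways.
l2_tlb_sets = 1024 // 4

# Reach of the 48-entry L1 instruction TLB with 4 KiB pages.
itlb_reach_4k = tlb_reach(48, 4 * KIB)

if __name__ == "__main__":
    print("L1 I-cache sets:", l1i_sets)
    print("L1 D-cache sets:", l1d_sets)
    print("L2 cache sets:", l2_sets_min, "to", l2_sets_max)
    print("L2 TLB sets:", l2_tlb_sets)
    print("L1 I-TLB reach (4 KiB pages):", itlb_reach_4k // KIB, "KiB")
```

Under the assumed line length, the 3-way 48 KiB instruction cache and the 2-way 32 KiB data cache come out with the same number of sets, because each way holds 16 KiB in both cases.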

Chips

  • Broadcom BCM2711 (used in Raspberry Pi 4{{Cite news|url=https://www.raspberrypi.org/blog/raspberry-pi-4-on-sale-now-from-35/|title=Raspberry Pi 4 on sale now from $35|date=2019-06-24|work=Raspberry Pi|access-date=2019-06-24|language=en-GB}})
  • Qualcomm Snapdragon 650, 652, and 653
  • NXP i.MX8, Layerscape LS1026A/LS1046A, LS2044A/LS2084A, LS2048A/LS2088A, LX2160A/LX2120A/LX2080A, LS1028A
  • Texas Instruments Jacinto 7 family of automotive and industrial SoC processors
  • Rockchip RK3399
  • AWS Graviton
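
Software can check whether a chip such as those listed above contains Cortex-A72 cores by reading the Main ID Register (MIDR). The following is a minimal sketch under stated assumptions: the field layout and the values 0x41 (ARM as implementer) and 0xD08 (Cortex-A72 part number) come from ARM's architecture documentation rather than from this article, the `Midr` type and function names are hypothetical, and the sample register values are illustrative only.

```python
from typing import NamedTuple

# Assumptions from ARM's documentation, not from this article.
ARM_IMPLEMENTER = 0x41   # implementer code for ARM
CORTEX_A72_PART = 0xD08  # primary part number of the Cortex-A72


class Midr(NamedTuple):
    """Decoded fields of a 32-bit Main ID Register value (hypothetical type)."""
    implementer: int   # bits [31:24]
    variant: int       # bits [23:20], major revision
    architecture: int  # bits [19:16]
    part: int          # bits [15:4]
    revision: int      # bits [3:0], minor revision


def decode_midr(value: int) -> Midr:
    """Split a raw MIDR value into its bit fields."""
    return Midr(
        implementer=(value >> 24) & 0xFF,
        variant=(value >> 20) & 0xF,
        architecture=(value >> 16) & 0xF,
        part=(value >> 4) & 0xFFF,
        revision=value & 0xF,
    )


def is_cortex_a72(value: int) -> bool:
    """True if the MIDR value identifies an ARM-designed Cortex-A72 core."""
    midr = decode_midr(value)
    return midr.implementer == ARM_IMPLEMENTER and midr.part == CORTEX_A72_PART


if __name__ == "__main__":
    sample = 0x410FD083  # illustrative value with part number 0xD08
    fields = decode_midr(sample)
    print(f"part=0x{fields.part:03X} r{fields.variant}p{fields.revision}")
    print("Cortex-A72:", is_cortex_a72(sample))
```

On Linux, the same fields are typically exposed in `/proc/cpuinfo` as `CPU implementer` and `CPU part`, so the comparison can also be made without reading the register directly.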

References

{{Reflist}}