ARM Cortex-A72

General information
Launched	2016
Designed by	ARM Holdings
Cache
L1 cache	80 KiB (48 KiB I-cache with parity, 32 KiB D-cache with ECC) per core
L2 cache	512 KiB to 4 MiB
L3 cache	None
Architecture and classification
Technology node	16 nm
Instruction set	ARMv8-A
Physical specifications
Cores	1–4 per cluster, multiple clusters
Products, models, variants
Product code name	Maya
History
Predecessor	ARM Cortex-A57
Successor	ARM Cortex-A73

The ARM Cortex-A72 is a central processing unit implementing the ARMv8-A 64-bit instruction set, designed by ARM Holdings' Austin design centre. The Cortex-A72 has a 3-way-decode, out-of-order superscalar pipeline.^[1] It is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) on a single die to form a system on a chip (SoC). The Cortex-A72 was announced in 2015 as the successor to the Cortex-A57, and was designed to use 20% less power or offer 90% greater performance.^[2]^[3]

Overview

Pipelined processor with deeply out-of-order, speculative issue 3-way superscalar execution pipeline
DSP and NEON SIMD extensions are mandatory per core
VFPv4 Floating Point Unit onboard (per core)
Hardware virtualization support
Thumb-2 instruction set encoding reduces the size of 32-bit programs with little impact on performance
TrustZone security extensions
Program Trace Macrocell and CoreSight Design Kit for unobtrusive tracing of instruction execution
32 KiB data (2-way set-associative) + 48 KiB instruction (3-way set-associative) L1 cache per core
Integrated low-latency level-2 (16-way set-associative) cache controller, 512 KiB to 4 MiB configurable size per cluster
48-entry fully associative L1 instruction translation lookaside buffer (TLB) with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
32-entry fully associative L1 data TLB with native support for 4 KiB, 64 KiB, and 1 MiB page sizes
1024-entry, 4-way set-associative unified L2 TLB per core, supports hit-under-miss
Sophisticated branch prediction algorithm that significantly increases performance and reduces energy from misprediction and speculation
Early IC tag: 3-way L1 cache at direct-mapped power
Regionalized TLB and μBTB tagging
Small-offset branch-target optimizations
Suppression of superfluous branch predictor accesses
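
The cache and TLB figures above can be turned into set counts and TLB reach with simple arithmetic. The following is a minimal sketch rather than anything taken from ARM documentation: it assumes 64-byte cache lines (the list does not state a line size), and the names `SetAssociative`, `cache`, and `tlb_reach_kib` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Assumption: 64-byte cache lines. The feature list gives sizes and
# associativity but no line size, so this value is illustrative.
LINE_BYTES = 64


@dataclass(frozen=True)
class SetAssociative:
    """A set-associative structure described by total entries and ways."""
    name: str
    entries: int  # cache lines for a cache, translations for a TLB
    ways: int

    @property
    def sets(self) -> int:
        # Each set holds exactly `ways` entries.
        if self.entries % self.ways:
            raise ValueError(f"{self.name}: entries not divisible by ways")
        return self.entries // self.ways


def cache(name: str, size_kib: int, ways: int,
          line_bytes: int = LINE_BYTES) -> SetAssociative:
    """Build a cache description from its capacity in KiB."""
    return SetAssociative(name, size_kib * 1024 // line_bytes, ways)


def tlb_reach_kib(entries: int, page_kib: int) -> int:
    """Memory covered when every TLB entry maps one page of page_kib KiB."""
    return entries * page_kib


# Figures taken from the list above.
l1d = cache("L1 data", 32, 2)
l1i = cache("L1 instruction", 48, 3)
l2_min = cache("L2 (512 KiB)", 512, 16)
l2_max = cache("L2 (4 MiB)", 4096, 16)
l2_tlb = SetAssociative("L2 TLB", 1024, 4)

for s in (l1d, l1i, l2_min, l2_max, l2_tlb):
    print(f"{s.name}: {s.entries} entries, {s.ways}-way, {s.sets} sets")

for page_kib in (4, 64, 1024):
    print(f"L1 data TLB reach with {page_kib} KiB pages: "
          f"{tlb_reach_kib(32, page_kib)} KiB")
```

Under the 64-byte line assumption, both L1 caches come out at 256 sets (16 KiB per way), and the 32-entry L1 data TLB covers 128 KiB with 4 KiB pages or 32 MiB with 1 MiB pages.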

Chips

Broadcom BCM2711 (used in the Raspberry Pi 4)^[4]
Qualcomm Snapdragon 650, 652, and 653
NXP i.MX8, Layerscape LS1026A/LS1046A, LS2044A/LS2084A, LS2048A/LS2088A, LX2160A/LX2120A/LX2080A, LS1028A
Texas Instruments Jacinto 7 family of automotive and industrial SoC processors
Rockchip RK3399
AWS Graviton

References

1. "Cortex-A72 Processor". ARM Holdings. Retrieved 2 February 2014.
2. Frumusanu, Andrei (3 February 2015). "ARM Announces Cortex-A72, CCI-500, and Mali-T880". AnandTech. Retrieved 29 March 2017.
3. Frumusanu, Andrei (23 April 2015). "ARM Reveals Cortex-A72 Architecture Details". AnandTech. Retrieved 29 March 2017.
4. "Raspberry Pi 4 on sale now from $35". Raspberry Pi. 24 June 2019. Retrieved 24 June 2019.