Ampere (microarchitecture)
{{Short description|GPU microarchitecture by Nvidia}}
{{Use American English|date=December 2022}}
{{Use mdy dates|date=September 2018}}
{{Infobox GPU microarchitecture
| name = Ampere
| image =
| image_size =
| caption =
| alt =
| launched = {{Start date and age|2020|05|14}}
| discontinued =
| soldby =
| designfirm = Nvidia
| manuf1 = TSMC
| manuf2 = Samsung
| process = TSMC N7 {{small|(professional)}}
{{Nowrap|Samsung 8N {{small|(consumer)}}}}
| codename = GA10x
| products-desktop1 = GeForce RTX 30 series
| products-hedt1 = RTX A series
| products-server1 = A100
| directx-version = DirectX 12 Ultimate (Feature Level 12_2)
| direct3d-version = Direct3D 12.0
| shadermodel-version = Shader Model 6.8
| opencl-version = OpenCL 3.0
| opengl-version = OpenGL 4.6
| opengles-version =
| cuda-version = Compute Capability 8.6
| optix-version =
| mantle-api =
| vulkan-api = Vulkan 1.3
| opengl-compute-version =
| cuda-compute-version =
| directcompute-version =
| compute =
| slowest =
| slow-unit =
| fastest =
| fast-unit =
| shader-clock =
| l0-cache =
| l1-cache = 192{{nbsp}}KB per SM {{small|(professional)}}
128{{nbsp}}KB per SM {{small|(consumer)}}
| l2-cache = 2{{nbsp}}MB to 6{{nbsp}}MB
| l3-cache =
| memory-support = {{ubl |GDDR6 |GDDR6X |HBM2}}
| memory-clock =
| pcie-support = PCIe 4.0
| encode-codec = {{hlist |H.264 |H.265}}
| decode-codec = {{hlist |H.264 |H.265 |AV1}}
| color-depth = {{hlist |8-bit |10-bit}}
| encoders = NVENC
| display-outputs = {{ubl |DisplayPort 1.4a |HDMI 2.1}}
| predecessor = Turing {{small|(consumer)}}
Volta {{small|(professional)}}
| successor = Ada Lovelace {{small|(consumer)}}
Hopper {{small|(datacenter)}}
| support status = Supported
}}
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020, and is named after French mathematician and physicist André-Marie Ampère.{{Cite web|url=http://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production|title=NVIDIA's New Ampere Data Center GPU in Full Production|website=NVIDIA News|date=May 14, 2020}}{{Cite web|url=https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/|title=NVIDIA Ampere Architecture In-Depth|date=May 14, 2020|website=NVIDIA Developer Blog|first1=Ronny|last1=Krashinsky|first2=Olivier|last2=Giroux|first3=Stephen|last3= Jones|first4=Nick|last4=Stam|first5=Sridhar|last5=Ramaswamy}}
Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020.{{Cite web |title=NVIDIA Delivers Greatest-Ever Generational Leap with GeForce RTX 30 Series GPUs |url=http://nvidianews.nvidia.com/news/nvidia-delivers-greatest-ever-generational-leap-in-performance-with-geforce-rtx-30-series-gpus |website=Nvidia Newsroom |language=en-US |date=September 1, 2020 |access-date=April 9, 2023}}{{Cite web |title=NVIDIA GeForce Ultimate Countdown |url=https://www.nvidia.com/en-us/geforce/special-event/ |website=Nvidia |language=en-US}} Nvidia announced the A100 80 GB GPU at SC20 on November 16, 2020.{{Cite web |title=NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing |url=https://nvidianews.nvidia.com/news/nvidia-doubles-down-announces-a100-80gb-gpu-supercharging-worlds-most-powerful-gpu-for-ai-supercomputing |website=Nvidia Newsroom |language=en-US |date=November 16, 2020 |access-date=April 9, 2023}} Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.{{Cite web|url=https://www.nvidia.com/en-us/geforce/special-event/|title=NVIDIA GeForce Beyond at CES 2023|website=NVIDIA}}
Nvidia announced Ampere's successor, Hopper, at GTC 2022. Hopper's own successor, referred to at the time as "Ampere Next Next" (later named Blackwell), had been placed on the roadmap for a 2024 release at GTC 2021.
Details
Architectural improvements of the Ampere architecture include the following:
- CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series{{Cite web |title=I.7. Compute Capability 8.x |url=https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-8-x |website=Nvidia |language=en-US |access-date=September 23, 2020}}
- TSMC's 7 nm FinFET process for A100
- Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series{{Cite web |last=Bosnjak |first=Dominik |date=September 1, 2020 |title=Samsung's old 8nm tech at the heart of NVIDIA's monstrous Ampere cards |url=https://www.sammobile.com/news/samsung-8nm-process-nvidia-geforce-rtx-30-ampere |website=SamMobile |language=en-US|access-date=September 19, 2020}}
- Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration. On GA100, each Tensor Core performs 256 FP16 FMA operations per clock, four times the throughput of a Tensor Core in previous generations (twice the throughput on GA10x); the Tensor Core count is reduced to one per SM partition (four per SM).
- Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
- High Bandwidth Memory 2 (HBM2) on A100 40 GB and A100 80 GB
- GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
- Twice as many FP32 cores per SM on GA10x GPUs
- NVLink 3.0 with a throughput of 50 Gbit/s per pair
- PCI Express 4.0 with SR-IOV support (SR-IOV is available only on A100)
- Multi-instance GPU (MIG) virtualization and GPU partitioning feature in A100 supporting up to seven instances
- PureVideo feature set K hardware video decoding with AV1 hardware decoding{{Cite web |last=Delgado |first=Gerardo |date=September 1, 2020 |title=GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode |url=https://www.nvidia.com/en-us/geforce/news/rtx-30-series-av1-decoding/ |website=Nvidia |language=en-US |access-date=April 9, 2023}} for the GeForce 30 series and feature set J for A100
- Five NVDEC units on A100
- New hardware-based 5-core JPEG decoder (NVJPG) supporting YUV420, YUV422, YUV444, YUV400, and RGBA formats; not to be confused with nvJPEG, Nvidia's GPU-accelerated library for JPEG encoding and decoding
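The Tensor Core figure in the list above can be turned into a rough peak-throughput estimate. The following Python sketch does this for the A100. Only the 256 FMA per clock value comes from the list; the count of four Tensor Cores per SM, the 108 enabled SMs, and the 1410 MHz boost clock are assumptions drawn from Nvidia's published A100 specifications, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of dense FP16 Tensor Core throughput.
# Assumed inputs (not stated in the list above): 4 Tensor Cores per SM,
# 108 enabled SMs on the A100, and a 1410 MHz boost clock.


def tensor_fp16_tflops(fma_per_clock_per_core, cores_per_sm, sm_count, clock_hz):
    """Return peak dense FP16 throughput in teraFLOPS.

    One fused multiply-add (FMA) is counted as two floating-point operations.
    """
    flops_per_clock = 2 * fma_per_clock_per_core * cores_per_sm * sm_count
    return flops_per_clock * clock_hz / 1e12


a100_tflops = tensor_fp16_tflops(256, 4, 108, 1.41e9)
print(f"A100 dense FP16 Tensor Core peak: {a100_tflops:.0f} TFLOPS")
```

Under these assumptions the estimate comes to roughly 312 TFLOPS. It is a theoretical peak and says nothing about sustained throughput in real workloads.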
= Chips =
- GA100{{Cite web |last=Morgan |first=Timothy Prickett |date=May 29, 2020 |title=Diving Deep Into The Nvidia Ampere GPU Architecture |url=https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/ |website=The Next Platform |language=en-US |access-date=March 24, 2022}}
- GA102
- GA103
- GA104
- GA106
- GA107
- GA10B
Comparison of Compute Capability: GP100 vs GV100 vs GA100{{cite web |title=NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale |url=https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf |website=Nvidia |language=en-US |access-date=September 18, 2020}}
{| class="wikitable" style="text-align:left;"
|-
! GPU features !! Nvidia Tesla P100 !! Nvidia Tesla V100 !! Nvidia A100
|-
! GPU codename
| GP100 || GV100 || GA100
|-
! GPU architecture
| Pascal || Volta || Ampere
|-
! Compute capability
| 6.0 || 7.0 || 8.0
|-
! Threads / warp
| 32 || 32 || 32
|-
! Max warps / SM
| 64 || 64 || 64
|-
! Max threads / SM
| 2048 || 2048 || 2048
|-
! Max thread blocks / SM
| 32 || 32 || 32
|-
! Max 32-bit registers / SM
| 65536 || 65536 || 65536
|-
! Max registers / block
| 65536 || 65536 || 65536
|-
! Max registers / thread
| 255 || 255 || 255
|-
! Max thread block size
| 1024 || 1024 || 1024
|-
! FP32 cores / SM
| 64 || 64 || 64
|-
! Ratio of SM registers to FP32 cores
| 1024 || 1024 || 1024
|-
! Shared memory size / SM
| 64 KB || Configurable up to 96 KB || Configurable up to 164 KB
|}
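Several rows of the table above follow from one another. The Python sketch below recomputes the derived rows for GA100 and adds a deliberately simplified estimate of how register usage limits the number of resident threads. The helper `resident_threads` is hypothetical and ignores real allocation granularity, shared memory, and per-block limits, so treat it as an illustration rather than an occupancy calculator.

```python
# Values for GA100 (compute capability 8.0), copied from the table above.
THREADS_PER_WARP = 32
MAX_WARPS_PER_SM = 64
MAX_REGISTERS_PER_SM = 65536
FP32_CORES_PER_SM = 64

# Derived rows of the table.
max_threads_per_sm = THREADS_PER_WARP * MAX_WARPS_PER_SM
registers_per_fp32_core = MAX_REGISTERS_PER_SM // FP32_CORES_PER_SM


def resident_threads(registers_per_thread):
    """Upper bound on resident threads per SM from the register file alone.

    Simplified model (an assumption): registers are divided evenly among
    threads and the result is rounded down to a whole number of warps.
    """
    by_registers = MAX_REGISTERS_PER_SM // registers_per_thread
    by_registers -= by_registers % THREADS_PER_WARP  # keep whole warps only
    return min(max_threads_per_sm, by_registers)


print(max_threads_per_sm, registers_per_fp32_core)
print(resident_threads(32), resident_threads(255))
```

In this model a kernel using 32 registers per thread can still fill all 2048 thread slots, while one using the maximum of 255 registers per thread is limited to 256 resident threads.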
Comparison of Precision Support Matrix{{Cite web|url=https://www.nvidia.com/en-us/data-center/tensor-cores/|title=NVIDIA Tensor Cores: Versatility for HPC & AI|website=NVIDIA}}{{Cite web|url=https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html|title=Abstract|website=docs.nvidia.com}}
{| class="wikitable" style="text-align:center;"
|-
! rowspan="2" |
! colspan="8" | Supported CUDA Core Precisions
! colspan="8" | Supported Tensor Core Precisions
|-
! FP16 !! FP32 !! FP64 !! INT1 !! INT4 !! INT8 !! TF32 !! BF16 !! FP16 !! FP32 !! FP64 !! INT1 !! INT4 !! INT8 !! TF32 !! BF16
|-
! Nvidia Tesla P4
| {{no}} || {{yes}} || {{yes}} || {{no}} || {{no}} || {{yes}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}}
|-
! Nvidia P100
| {{yes}} || {{yes}} || {{yes}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}}
|-
! Nvidia Volta
| {{yes}} || {{yes}} || {{yes}} || {{no}} || {{no}} || {{yes}} || {{no}} || {{no}} || {{yes}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}}
|-
! Nvidia Turing
| {{yes}} || {{yes}} || {{yes}} || {{no}} || {{no}} || {{no}} || {{no}} || {{no}} || {{yes}} || {{no}} || {{no}} || {{yes}} || {{yes}} || {{yes}} || {{no}} || {{no}}
|-
! Nvidia A100
| {{yes}} || {{yes}} || {{yes}} || {{no}} || {{no}} || {{yes}} || {{no}} || {{yes}} || {{yes}} || {{no}} || {{yes}} || {{yes}} || {{yes}} || {{yes}} || {{yes}} || {{yes}}
|}
Legend:
- FPnn: floating point with nn bits
- INTn: integer with n bits
- INT1: binary
- TF32: TensorFloat32
- BF16: bfloat16
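To make the difference between the reduced-precision formats in the legend concrete, the Python sketch below emulates TF32 and BF16 by discarding low-order mantissa bits of a 32-bit float. It relies on the standard format definitions (FP32 has 23 mantissa bits, TF32 keeps 10, BF16 keeps 7, and all three share an 8-bit exponent). Truncation is used here for simplicity; the rounding behaviour of actual hardware is not described in this article, and the function names are hypothetical.

```python
import struct


def truncate_float32(value, mantissa_bits):
    """Keep only the top `mantissa_bits` of a float32's 23 mantissa bits.

    The low-order bits are zeroed (truncation), a simplifying assumption.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = 23 - mantissa_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result


def to_tf32(value):
    """Emulate TF32: 8-bit exponent, 10-bit mantissa."""
    return truncate_float32(value, 10)


def to_bf16(value):
    """Emulate BF16: 8-bit exponent, 7-bit mantissa."""
    return truncate_float32(value, 7)


x = 1 + 2**-10  # needs 10 mantissa bits to be represented exactly
print(to_tf32(x) == x, to_bf16(x) == 1.0)
```

The value `1 + 2**-10` survives conversion to TF32 but collapses to `1.0` in BF16. Both formats nevertheless cover the same numeric range as FP32 because the exponent width is unchanged.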
Comparison of Decode Performance
{| class="wikitable" style="text-align:left;"
|-
! Concurrent streams !! H.264 decode (1080p30) !! H.265 (HEVC) decode (1080p30) !! VP9 decode (1080p30)
|-
! V100
| 16 || 22 || 22
|-
! A100
| 75 || 157 || 108
|}
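The generational change in the table above is easier to read as a ratio. The short Python sketch below computes it from the table's own numbers; no other data is assumed.

```python
# Concurrent 1080p30 decode streams, copied from the table above.
streams = {
    "H.264": {"V100": 16, "A100": 75},
    "H.265 (HEVC)": {"V100": 22, "A100": 157},
    "VP9": {"V100": 22, "A100": 108},
}

# Ratio of A100 streams to V100 streams for each codec.
ratios = {codec: counts["A100"] / counts["V100"] for codec, counts in streams.items()}

for codec, ratio in ratios.items():
    print(f"{codec}: A100 handles {ratio:.1f}x as many streams as V100")
```

By these figures the A100 decodes roughly five times as many H.264 and VP9 streams as the V100 and roughly seven times as many H.265 streams.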
Ampere dies
{| class="wikitable" style="font-size:100%; text-align:center;"
|-
! Die !! GA100 !! GA102 !! GA103 !! GA104 !! GA106 !! GA107 !! GA10B !! GA10F
|-
! Die size
| 826{{nbsp}}mm<sup>2</sup> || 628{{nbsp}}mm<sup>2</sup> || 496{{nbsp}}mm<sup>2</sup> || 392{{nbsp}}mm<sup>2</sup> || 276{{nbsp}}mm<sup>2</sup> || 200{{nbsp}}mm<sup>2</sup> || 448{{nbsp}}mm<sup>2</sup> || ?
|-
! Transistors
| 54.2B || 28.3B || 22B || 17.4B || 12B || 8.7B || 21B || ?
|-
! Transistor density
| 65.6 MTr/mm<sup>2</sup> || 45.1 MTr/mm<sup>2</sup> || 44.4 MTr/mm<sup>2</sup> || 44.4 MTr/mm<sup>2</sup> || 43.5 MTr/mm<sup>2</sup> || 43.5 MTr/mm<sup>2</sup> || 46.9 MTr/mm<sup>2</sup> || ?
|-
! Graphics processing clusters
| 8 || 7 || 6 || 6 || 3 || 2 || 2 || 1
|-
! Streaming multiprocessors
| 128 || 84 || 60 || 48 || 30 || 20 || 16 || 12
|-
! CUDA cores
| 12288 || 10752 || 7680 || 6144 || 3840 || 2560 || 2048 || 1536
|-
! Texture mapping units
| 512 || 336 || 240 || 192 || 120 || 80 || 64 || 48
|-
! Render output units
| 192 || 112 || 96 || 96 || 48 || 32 || 32 || 16
|-
! Tensor cores
| 512 || 336 || 240 || 192 || 120 || 80 || 64 || 48
|-
! RT cores
| N/A || 84 || 60 || 48 || 30 || 20 || 8 || 12
|-
! rowspan="2" | L1 cache
| 24{{nbsp}}MB || 10.5{{nbsp}}MB || 7.5{{nbsp}}MB || 6{{nbsp}}MB || 3{{nbsp}}MB || 2.5{{nbsp}}MB || 3{{nbsp}}MB || 1.5{{nbsp}}MB
|-
| 192{{nbsp}}KB per SM || colspan="5" | 128{{nbsp}}KB per SM || 192{{nbsp}}KB || 128{{nbsp}}KB
|-
! L2 cache
| 40{{nbsp}}MB || 6{{nbsp}}MB || 4{{nbsp}}MB || 4{{nbsp}}MB || 3{{nbsp}}MB || 2{{nbsp}}MB || 4{{nbsp}}MB || ?
|}
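Two rows of the table above are derived from others: transistor density is the transistor count divided by the die area, and total L1 cache is the per-SM figure multiplied by the SM count. The Python sketch below recomputes both for a few dies. It assumes the table's columns follow the order of the chip list (GA100 first), and the helper names are hypothetical.

```python
# (transistors in billions, die size in mm^2, SM count, L1 cache per SM in KB),
# copied from the table above for a subset of dies.
dies = {
    "GA100": (54.2, 826, 128, 192),
    "GA102": (28.3, 628, 84, 128),
    "GA104": (17.4, 392, 48, 128),
    "GA107": (8.7, 200, 20, 128),
}


def density_mtr_per_mm2(transistors_billion, die_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000 / die_mm2


def total_l1_mb(sm_count, l1_kb_per_sm):
    """Aggregate L1 cache across all SMs, in MB (1 MB = 1024 KB)."""
    return sm_count * l1_kb_per_sm / 1024


for name, (transistors, area, sms, l1_kb) in dies.items():
    density = density_mtr_per_mm2(transistors, area)
    print(f"{name}: {density:.1f} MTr/mm2, {total_l1_mb(sms, l1_kb):g} MB L1")
```

The recomputed values agree with the table. The noticeably higher density of GA100 is consistent with its TSMC 7 nm process, as opposed to the Samsung 8N process used for the GA10x dies.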
A100 accelerator and DGX A100
The Ampere-based A100 accelerator was announced and released on May 14, 2020. The A100 features 19.5 teraflops of FP32 performance, 6912 FP32/INT32 CUDA cores, 3456 FP64 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.{{cite news|url=https://www.theverge.com/2020/5/14/21258419/nvidia-ampere-gpu-ai-data-centers-specs-a100-dgx-supercomputer|title=Nvidia's first Ampere GPU is designed for data centers and AI, not your PC|author1=Tom Warren|author2=James Vincent|date=May 14, 2020|publisher=The Verge}} The A100 accelerator was initially available only in the third generation of the DGX server, the DGX A100, which includes eight A100s.{{cite news|author=Smith|first=Ryan|date=May 14, 2020|title=NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator|publisher=AnandTech|url=https://www.anandtech.com/show/15801/nvidia-announces-ampere-architecture-and-a100-products}} The DGX A100 also includes 15 TB of PCIe Gen 4 NVMe storage, two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.
{{NvidiaDgxAccelerators}}
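The FP32 figure quoted for the A100 above can be reproduced from its core count. The Python sketch below does so and also totals the GPU memory of a DGX A100 built from the 40 GB model. The 1410 MHz boost clock is an assumption taken from Nvidia's published A100 specifications rather than from this section, and the function name is hypothetical.

```python
def peak_fp32_tflops(cuda_cores, clock_hz):
    """Peak FP32 throughput in teraFLOPS, counting one FMA as two operations."""
    return 2 * cuda_cores * clock_hz / 1e12


# 6912 FP32 cores from the text above; 1410 MHz boost clock is an assumption.
a100_fp32 = peak_fp32_tflops(6912, 1.41e9)

# Eight A100 accelerators with 40 GB each, as described for the DGX A100.
dgx_total_memory_gb = 8 * 40

print(f"A100 peak FP32: {a100_fp32:.1f} TFLOPS")
print(f"DGX A100 GPU memory: {dgx_total_memory_gb} GB")
```

With the assumed clock, the result rounds to the 19.5 teraflops stated above, and the eight accelerators together provide 320 GB of GPU memory.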
Products using Ampere
- GeForce MX series
  - GeForce MX570 (mobile) (GA107)
- GeForce 20 series
  - GeForce RTX 2050 (mobile) (GA107)
- GeForce 30 series
  - GeForce RTX 3050 Laptop GPU (GA107)
  - GeForce RTX 3050 (GA106 or GA107){{cite web |last1=Igor |first1=Wallossek |date=February 13, 2022|title=The two faces of the GeForce RTX 3050 8GB |url=https://www.igorslab.de/en/the-two-faces-of-geforce-rtx-3050-8gb-different-chips-and-different-thirsts/ |website=Igor's Lab |access-date=February 23, 2022 |ref=igor-ga107}}
  - GeForce RTX 3050 Ti Laptop GPU (GA107)
  - GeForce RTX 3060 Laptop GPU (GA106)
  - GeForce RTX 3060 (GA106 or GA104){{Cite web|last=Shilov|first=Anton|date=September 25, 2021|title=Gainward and Galax List GeForce RTX 3060 Cards With GA104 GPU|url=https://www.tomshardware.com/news/ga104-based-geforce-rtx-3060-listed|website=Tom's Hardware|access-date=September 23, 2022}}
  - GeForce RTX 3060 Ti (GA104 or GA103){{Cite news|last=Tyson|first=Mark|date=February 23, 2022|title=Zotac Debuts First RTX 3060 Ti Desktop Cards With GA103 GPU|url=https://www.tomshardware.com/news/zotac-geforce-rtx-3060-ti-ga103|website=Tom's Hardware|access-date=September 23, 2022}}
  - GeForce RTX 3070 Laptop GPU (GA104)
  - GeForce RTX 3070 (GA104)
  - GeForce RTX 3070 Ti Laptop GPU (GA104)
  - GeForce RTX 3070 Ti (GA104 or GA102){{Cite web |author=WhyCry |date=October 26, 2022 |title=ZOTAC launches GeForce RTX 3070 Ti with GA102-150 GPU |url=https://videocardz.com/newz/zotac-launches-geforce-rtx-3070-ti-with-ga102-150-gpu |website=VideoCardz |language=en-US |access-date=May 21, 2023}}
  - GeForce RTX 3080 Laptop GPU (GA104)
  - GeForce RTX 3080 (GA102)
  - GeForce RTX 3080 12 GB (GA102)
  - GeForce RTX 3080 Ti Laptop GPU (GA103)
  - GeForce RTX 3080 Ti (GA102)
  - GeForce RTX 3090 (GA102)
  - GeForce RTX 3090 Ti (GA102)
- Nvidia Workstation GPUs (formerly Quadro)
  - RTX A1000 (mobile) (GA107)
  - RTX A2000 (mobile) (GA106)
  - RTX A2000 (GA106)
  - RTX A3000 (mobile) (GA104)
  - RTX A4000 (mobile) (GA104)
  - RTX A4000 (GA104)
  - RTX A5000 (mobile) (GA104)
  - RTX A5500 (mobile) (GA103)
  - RTX A4500 (GA102)
  - RTX A5000 (GA102)
  - RTX A5500 (GA102)
  - RTX A6000 (GA102)
  - A800 Active
- Nvidia Data Center GPUs (formerly Tesla)
  - Nvidia A2 (GA107)
  - Nvidia A10 (GA102)
  - Nvidia A16 (4 × GA107)
  - Nvidia A30 (GA100)
  - Nvidia A40 (GA102)
  - Nvidia A100 (GA100)
  - Nvidia A100 80 GB (GA100)
  - Nvidia A100X
  - Nvidia A30X
- Tegra SoCs
  - AGX Orin (GA10B)
  - Orin NX (GA10B)
  - Orin Nano (GA10B)
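{| class="wikitable" style="font-size:85%; text-align:left;"
|+ Products using Ampere (per chip)
|-
! Type !! GA10B !! GA107 !! GA106 !! GA104 !! GA103 !! GA102 !! GA100
|-
! GeForce MX series
| {{N/a}} || GeForce MX570 (mobile) || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}}
|-
! GeForce 20 series
| {{N/a}} || GeForce RTX 2050 (mobile) || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}}
|-
! GeForce 30 series
| {{N/a}}
| GeForce RTX 3050 Laptop<br />GeForce RTX 3050<br />GeForce RTX 3050 Ti Laptop
| GeForce RTX 3050<br />GeForce RTX 3060 Laptop<br />GeForce RTX 3060
| GeForce RTX 3060<br />GeForce RTX 3060 Ti<br />GeForce RTX 3070 Laptop<br />GeForce RTX 3070<br />GeForce RTX 3070 Ti Laptop<br />GeForce RTX 3070 Ti<br />GeForce RTX 3080 Laptop
| GeForce RTX 3060 Ti<br />GeForce RTX 3080 Ti Laptop
| GeForce RTX 3070 Ti<br />GeForce RTX 3080<br />GeForce RTX 3080 12 GB<br />GeForce RTX 3080 Ti<br />GeForce RTX 3090<br />GeForce RTX 3090 Ti
| {{N/a}}
|-
! Nvidia Workstation GPUs
| {{N/a}}
| RTX A1000 (mobile)
| RTX A2000 (mobile)<br />RTX A2000
| RTX A3000 (mobile)<br />RTX A4000 (mobile)<br />RTX A4000<br />RTX A5000 (mobile)
| RTX A5500 (mobile)
| RTX A4500<br />RTX A5000<br />RTX A5500<br />RTX A6000
| {{N/a}}
|-
! Nvidia Data Center GPUs
| {{N/a}}
| Nvidia A2<br />Nvidia A16
| {{N/a}}
| {{N/a}}
| {{N/a}}
| Nvidia A10<br />Nvidia A40
| Nvidia A30<br />Nvidia A100<br />Nvidia A100 80 GB
|-
! Tegra SoCs
| AGX Orin<br />Orin NX<br />Orin Nano
| {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}} || {{N/a}}
|}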
References
{{reflist}}
External links
- [https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf Nvidia A100 Tensor Core GPU Architecture whitepaper]
- [https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf Nvidia Ampere GA102 GPU Architecture whitepaper]
- [https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/ Nvidia Ampere Architecture]
- [https://www.nvidia.com/en-us/data-center/a100/ Nvidia A100 Tensor Core GPU]
- [https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/ Nvidia Ampere Architecture In-Depth]
{{Nvidia}}