Project Denver

{{short description|Microarchitecture by Nvidia}}

{{Infobox CPU

| name = Nvidia Denver 1/2

| image =

| image_size =

| caption =

| produced-start = 2014 (Denver)
2016 (Denver 2)

| produced-end =

| slowest =

| fastest =

| slow-unit =

| fast-unit =

| fsb-slowest =

| fsb-fastest =

| fsb-slow-unit =

| fsb-fast-unit =

| size-from = 28 nm (Denver 1)

| size-to = 16 nm (Denver 2)

| soldby =

| designfirm = Nvidia

| manuf1 =

| core1 =

| sock1 =

| pack1 =

| brand1 =

| arch = ARMv8-A

| microarch =

| cpuid =

| code =

| numcores = 2

| l1cache = 192 KiB per core
(128 KiB I-cache with parity, 64 KiB D-cache with ECC)

| l2cache = 2 MiB @ 2 cores

| l3cache =

| application =

}}

{{Infobox CPU

| name = Nvidia Carmel

| image =

| image_size =

| caption =

| produced-start = 2018

| produced-end =

| slowest =

| fastest = 2.3 GHz

| slow-unit =

| fast-unit =

| fsb-slowest =

| fsb-fastest =

| fsb-slow-unit =

| fsb-fast-unit =

| size-from = 12 nm

| size-to =

| soldby =

| designfirm = Nvidia

| manuf1 =

| core1 =

| sock1 =

| pack1 =

| brand1 =

| arch = ARMv8.2-A

| microarch =

| cpuid =

| code =

| numcores = 2

| l1cache = 192 KiB per core
(128 KiB I-cache with parity, 64 KiB D-cache with ECC)

| l2cache = 2 MiB @ 2 cores

| l3cache = (4 MiB @ 8 cores, T194[https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/ NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics] by Dustin Franklin (Nvidia development team for Jetson), December 12, 2018)

| application =

}}

Project Denver is the codename of a central processing unit designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of a simple hardware decoder and software-based binary translation (dynamic recompilation), where "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory".{{Cite news|last=Wasson |first=Scott |url=http://techreport.com/news/26906/nvidia-claims-haswell-class-performance-for-denver-cpu-core |title=Nvidia claims Haswell-class performance for Denver CPU core |newspaper=The Tech Report |date=August 11, 2014 |access-date=August 14, 2014}} Denver has a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor, etc.) into one die constituting a system on a chip (SoC).
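The split between a hardware decoder and a cache of already translated code can be illustrated with a toy model. The following is a minimal sketch and not Nvidia's implementation: the class name `TranslationCache`, the `HOT_THRESHOLD` value, and the way a block becomes "hot" after a fixed number of executions are all assumptions made for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as "hot"


class TranslationCache:
    """Toy model: hot blocks are translated once and reused afterwards."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.exec_counts = Counter()  # how often each block address was run
        self.optimized = {}           # block address -> translated code
        self.log = []                 # which path served each execution

    def translate(self, block):
        # Stand-in for the software optimizer; here it only tags the block.
        return ("optimized", tuple(block))

    def run(self, address, block):
        if address in self.optimized:
            self.log.append("cache")  # reuse the stored translation
            return self.optimized[address]
        self.exec_counts[address] += 1
        if self.exec_counts[address] >= self.threshold:
            # The block is now hot: translate it for later executions.
            self.optimized[address] = self.translate(block)
        self.log.append("decoder")    # simple hardware-decoder path
        return ("decoded", tuple(block))


cache = TranslationCache()
loop_body = ["ldr x0, [x1]", "add x0, x0, #1", "str x0, [x1]"]
for _ in range(5):
    cache.run(0x1000, loop_body)
print(cache.log)
```

In this sketch the first three executions of the loop body go through the decoder path, and later executions are served from the cache of translated code.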

Project Denver is targeted at mobile computers, personal computers, servers, as well as supercomputers.{{cite web|url=http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/|title="PROJECT DENVER" PROCESSOR TO USHER IN NEW ERA OF COMPUTING|last=Dally|first=Bill|publisher=Official Nvidia blog|date=January 5, 2011}} Denver cores have been integrated into the Tegra SoC series from Nvidia. Denver cores were initially designed for the 28 nm process node (Tegra model T132, also known as "Tegra K1"). Denver 2 was an improved design built for the smaller, more efficient 16 nm node (Tegra model T186, also known as "Tegra X2").

In 2018, Nvidia released an improved design, codenamed "Carmel", based on ARMv8 (64-bit; variant: ARMv8.2[https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/ NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics] by Dustin Franklin (Nvidia development team for Jetson), December 12, 2018) with 10-way superscalar execution, functional safety, dual execution, and parity and ECC protection. It was integrated into the Tegra Xavier SoC, which offers a total of 8 cores (4 dual-core pairs).[https://wccftech.com/nvidia-drive-xavier-soc-detailed/ NVIDIA Drive Xavier SOC Detailed] by Hassan Mujtaba on Jan 8, 2018 via WccfTech{{Failed verification|date=February 2019}} The Carmel CPU core supports full Advanced SIMD (ARM NEON), VFP (Vector Floating Point), and ARMv8.2-FP16.[https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/ NVIDIA Jetson AGX Xavier Delivers 32 TeraOps for New Era of AI in Robotics] by Dustin Franklin (Nvidia development team for Jetson), December 12, 2018 The first published third-party tests of Carmel cores, run on the Jetson AGX development kit in September 2018, indicated noticeably higher performance than predecessor systems, although the brevity of the test setup leaves some doubt about how far the results generalize.{{Cite web|url=https://www.phoronix.com/scan.php?page=article&item=nvidia-carmel-quick&num=1|title = A Quick Test of NVIDIA's "Carmel" CPU Performance}} The Carmel design is used in the Tegra model T194 ("Tegra Xavier"), which is manufactured on a 12 nm process.

Overview

  • Pipelined processor with 7-way superscalar execution pipeline
  • 128 KiB instruction + 64 KiB data L1 cache per core (both 4-way), 2 MiB L2 cache (16-way shared){{cite web |url=http://www.pcworld.com/article/2463900/nvidia-reveals-pc-like-performance-for-denver-tegra-k1.html |title=Nvidia reveals PC-like performance for 'Denver' Tegra K1 |first=Mark |last=Hachman |publisher=PC World |date=August 11, 2014 |access-date=September 19, 2014}}
  • Denver also sets aside 128 MiB of main memory as an interpretation cache, which is inaccessible to the main operating system.
  • Running at up to 2.5 GHz{{cite news|url=http://www.extremetech.com/computing/174023-tegra-k1-64-bit-denver-core-analysis-are-nvidias-x86-efforts-hidden-within|title=Tegra K1 64-bit Denver core analysis: Are Nvidia's x86 efforts hidden within?|last= Anthony|first=Sebastian |date= January 6, 2014|publisher=ExtremeTech|access-date=January 7, 2014}}

  • ARM code is translated either by a hardware translator or through software emulation to an instruction set that is internal to Project Denver. ARM instructions can be reordered, removed if they do not contribute to the end result, or otherwise optimized if software emulation is used.
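
Removing instructions that do not contribute to the end result can be sketched as a backward pass over a block of code. The example below is a simplified illustration under stated assumptions, not Denver's actual optimizer: the tuple-based instruction format, the register names, and the function `remove_dead_instructions` are hypothetical, and every instruction is assumed to have no side effects other than writing its destination register.

```python
def remove_dead_instructions(block, live_out):
    """Drop instructions whose result is never read.

    block: list of (dest, op, sources) tuples in program order.
    live_out: registers that are still needed after the block.
    Assumes instructions have no side effects besides writing dest.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so that each instruction sees what later code reads.
    for dest, op, sources in reversed(block):
        if dest in live:
            kept.append((dest, op, sources))
            live.discard(dest)    # this write satisfies the later reads
            live.update(sources)  # its inputs are now needed
        # Otherwise the result is never used and the instruction is dropped.
    kept.reverse()
    return kept


block = [
    ("r1", "mov", ()),            # r1 = constant
    ("r2", "add", ("r1", "r1")),  # r2 = r1 + r1
    ("r3", "mul", ("r1", "r1")),  # r3 is never read afterwards
    ("r0", "add", ("r2", "r1")),  # r0 is the block's result
]
optimized = remove_dead_instructions(block, live_out={"r0"})
for instruction in optimized:
    print(instruction)
```

Only the write to `r3` is removed here, because no later instruction and no code after the block reads it. A real translator would also have to account for memory accesses, flags, and exceptions, which this sketch ignores.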
Chips

    A dual-core Denver CPU was paired with a Kepler-based GPU to form the Tegra K1; the dual-core 2.3 GHz Denver-based K1 was first used in the HTC Nexus 9 tablet, released November 3, 2014.{{Cite web|url=http://www.phonearena.com/news/Nexus-9-storms-through-Geekbench-Tegra-K1-outperforms-Apple-iPhone-6s-A8_id61825|title=Nexus 9 storms through Geekbench, Tegra K1 outperforms Apple iPhone 6's A8|date=16 October 2014 }}{{cite web|url=http://www.anandtech.com/show/7620/nvidia-announces-tegra-k1-soc-project-logan-cortex-a15-kepler|title=NVIDIA Announces Tegra K1 SoC with Optional Denver CPU Cores|last=Shimpi|first=Anand |date=January 5, 2014|publisher=Anandtech|access-date=January 6, 2014}} The quad-core Tegra K1, however, shares the name but is not based on Denver.

    The Nvidia Tegra X2 has two Denver 2 (ARMv8 64-bit) cores and four Cortex-A57 (ARMv8 64-bit) cores in a coherent heterogeneous multi-processing (HMP) arrangement.[http://wccftech.com/nvidia-tegra-parker-soc-hot-chips/ NVIDIA Unveils Tegra Parker SOC at Hot Chips – Built on 16nm TSMC Process, Features Pascal and Denver 2 Duo Architecture], August 22, 2016 The SoC, codenamed "Parker", pairs these CPU cores with a Pascal-based GPU.

    The Tegra Xavier pairs an Nvidia Volta GPU and several special-purpose accelerators with 8 CPU cores of the Carmel design. In this design, 4 Carmel ASIC macro blocks of 2 cores each are connected to each other through an additional crossbar and share 4 MiB of L3 cache.

    History

    The existence of Project Denver was revealed at the 2011 Consumer Electronics Show.[http://www.nvidia.com/object/ces2011.html Nvidia's press conference webcast] In a March 4, 2011 Q&A article, CEO Jen-Hsun Huang revealed that Project Denver is a five-year 64-bit ARMv8-A architecture CPU development on which hundreds of engineers had already worked for three and a half years, and which also has 32-bit ARM instruction set (ARMv7) backward compatibility.{{cite web|url=https://venturebeat.com/2011/03/04/qa-nvidia-chief-explains-his-strategy-for-winning-in-mobile-computing/|title=Q&A: Nvidia chief explains his strategy for winning in mobile computing|first=Dean|last=Takahashi|date=March 4, 2011}} Project Denver was started at Stexar Company (Colorado) as an x86-compatible processor using binary translation, similar to projects by Transmeta. Stexar was acquired by Nvidia in 2006.{{cite web|url=http://vr-zone.com/articles/nvidia-project-denver-lost-in-rockies-to-debut-in-2014-15/14204.html|title=NVIDIA Project Denver "Lost in Rockies", to Debut in 2014-15|first=Theo|last=Valich|date=December 12, 2011}}{{cite news|url=https://www.engadget.com/2006/10/19/nvidia-has-x86-cpu-in-the-works/|title=NVIDIA has x86 CPU in the works?|last=Miller|first=Paul|date=October 19, 2006|publisher=Engadget|access-date=October 19, 2013}}{{cite web|url=http://www.brightsideofnews.com/print/2013/3/20/new-tegra-roadmap-reveals-logan2c-parker-and-kayla-cuda-strategy.aspx|title=New Tegra Roadmap Reveals Logan, Parker and Kayla CUDA Strategy|first=Theo|last=Valich|date=March 20, 2013}}

    According to Tom's Hardware, there are engineers from Intel, AMD, HP, Sun and Transmeta on the Denver team, and they have extensive experience designing superscalar CPUs with out-of-order execution, very long instruction words (VLIW) and simultaneous multithreading (SMT).{{cite news|url=http://www.tomshardware.com/news/vmware-veeam-management-pack-vcenter-fault-tolerance,24643.html|title=64-bit Nvidia Tegra 6 "Parker" Chip May Arrive in 2014. Devices with a 64-bit Tegra 6 could launch before the end of 2014.|last=Parrish|first=Kevin|date=October 14, 2013|publisher=Tom's Hardware & ExtremeTech|access-date=October 19, 2013}}

    According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU.{{cite web|url=http://www.semiaccurate.com/2011/08/05/what-is-project-denver-based-on/|title=What is Project Denver based on?|first=Charlie|last=Demerjian|publisher=Semiaccurate|date=August 5, 2011}} Also according to Demerjian, Project Denver was originally intended to support both ARM and x86 code using code morphing technology from Transmeta, but was changed to the ARMv8-A 64-bit instruction set because Nvidia could not obtain a license to Intel's patents.

    The first consumer device shipping with Denver CPU cores, Google's Nexus 9, was announced on October 15, 2014. The tablet was manufactured by HTC and features the dual-core Tegra K1 SoC. The Nexus 9 was the first 64-bit Android device available to consumers.{{cite news|url=https://arstechnica.com/gadgets/2014/10/google-announces-the-nexus-6-nexus-9-and-android-5-0-lollipop/|title=Google announces Nexus 6, Nexus 9, Nexus Player, and Android 5.0 Lollipop|date=October 15, 2014|first=Ron|last=Amadeo}}

    See also

    References

    {{reflist}}