Hyper-threading

{{short description|Proprietary simultaneous multithreading implementation by Intel}}

{{Use dmy dates|date=August 2020}}

[[File:Hyper-threaded CPU.png|thumb|High-level depiction of hyper-threading: instructions are fetched, decoded and reordered by the front end (white boxes represent pipeline bubbles), and passed to the execution core capable of executing instructions from two different programs during the same clock cycle.{{cite web

| url = https://arstechnica.com/features/2002/10/hyperthreading/1/

| title = Introduction to Multithreading, Superthreading and Hyperthreading

| date = 2002-10-03 | access-date = 2015-09-30

| first = Jon | last = Stokes | website = Ars Technica

| pages = 2–3

}}{{cite web

| url = http://www.cs.sfu.ca/~fedorova/Teaching/CMPT886/Spring2007/papers/hyper-threading.pdf

| title = Hyper-Threading Technology Architecture and Microarchitecture

| date = 2006-12-12

| access-date = 2015-09-30

| author1 = Deborah T. Marr

| author2 = Frank Binns

| author3 = David L. Hill

| author4 = Glenn Hinton

| author5 = David A. Koufaty

| author6 = J. Alan Miller

| author7 = Michael Upton

| website = cs.sfu.ca

| archive-url = https://web.archive.org/web/20150923211343/http://www.cs.sfu.ca/~fedorova/Teaching/CMPT886/Spring2007/papers/hyper-threading.pdf

| archive-date = 23 September 2015

| url-status = dead

}}{{cite web

| url = http://www.anandtech.com/show/6355/intels-haswell-architecture/6

| title = The Haswell Front End – Intel's Haswell Architecture Analyzed

| date = 2012-10-05 | access-date = 2015-09-30

| author = Anand Lal Shimpi | publisher = AnandTech

}}]]

Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors. It was introduced on Xeon server processors in February 2002 and on Pentium 4 desktop processors in November 2002.{{cite web |url=http://www.xbitlabs.com/articles/cpu/display/pentium4-3066.html |title=Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone.. |publisher=X-bit labs |access-date=2014-06-04 |url-status=dead |archive-url=https://web.archive.org/web/20140531105602/http://www.xbitlabs.com/articles/cpu/display/pentium4-3066.html |archive-date=31 May 2014}} Since then, Intel has included this technology in Itanium, Atom, and Core 'i' Series CPUs, among others.{{cite web

|url=https://www.intel.co.uk/content/www/uk/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html

|title=Intel® Hyper-Threading Technology (Intel® HT Technology)

|publisher=Intel

|access-date=2021-10-24}}

For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With HTT, one physical core appears as two processors to the operating system, allowing concurrent scheduling of two processes per core. In addition, two or more processes can use the same resources: if the resources needed by one process are not available, another process can continue as long as its own resources are available.

In addition to requiring simultaneous multithreading support in the operating system, hyper-threading can be properly utilized only with an operating system specifically optimized for it.{{cite web |url=http://software.intel.com/en-us/articles/required-components-interchangeability-list-for-the-intel-pentiumr-4-processor-with-ht-technology |title=Intel Required Components Interchangeability List for the Intel Pentium 4 Processor with HT Technology |publisher=Intel}} Intel's list of operating systems that include optimizations for Hyper-Threading Technology comprises Windows XP Professional 64, Windows XP MCE, Windows XP Home, Windows XP Professional, and some versions of Linux, such as COSIX Linux 4.0, Red Hat Linux 9 (Professional and Personal versions), Red Flag Linux Desktop 4.0 and SuSE Linux 8.2 (Professional and Personal versions).

Overview

[[File:KL Intel Pentium 4 Northwood.jpg|thumb|Intel Pentium 4 (Northwood)]]

Hyper-Threading Technology is a form of simultaneous multithreading technology introduced by Intel, while the concept behind the technology has been patented by Sun Microsystems. Architecturally, a processor with Hyper-Threading Technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.{{cite web

|url = http://sc.tamu.edu/systems/eos/nehalem.pdf

|title = The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms

|date = 2011-03-17

|access-date = 2014-03-21

|first = Michael E. |last=Thomadakis

|publisher = Texas A&M University

|page = 23

|url-status = dead

|archive-url = https://web.archive.org/web/20140811023120/http://sc.tamu.edu/systems/eos/nehalem.pdf

|archive-date = 11 August 2014}}

Unlike a traditional dual-processor configuration that uses two separate physical processors, the logical processors in a hyper-threaded core share the execution resources. These resources include the execution engine, caches, and system bus interface. Sharing them lets the two logical processors use the core more efficiently, and lets one logical processor borrow resources from a stalled logical processor on the same physical core. A processor stalls when it must wait for data it has requested before it can continue processing the current thread. The degree of benefit seen when using a hyper-threaded or multi-core processor depends on the needs of the software, and on how well the software and the operating system are written to manage the processor efficiently.

Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as the usual "physical" processor plus an extra "logical" processor to the host operating system (HTT-unaware operating systems see two "physical" processors), allowing the operating system to schedule two threads or processes simultaneously and appropriately. When execution resources in a hyper-threaded processor are not in use by the current task, and especially when the processor is stalled, those execution resources can be used to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.){{Cite book |title=Computer Architecture: A Quantitative Approach |last1=Hennessy | first1=John L. |last2=Patterson |first2=David A. |others=Asanović, Krste, Bakos, Jason D., Colwell, Robert P., Bhattacharjee, Abhishek, 1984-, Conte, Thomas M., 1964-|date=7 December 2017 |isbn=978-0128119051|edition= Sixth|location=Cambridge, MA|oclc=983459758}}
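
The way a second logical processor fills otherwise idle execution slots can be illustrated with a deliberately simplified model. The sketch below is not a description of any real Intel pipeline: it assumes a core with a single execution unit, and the trace notation (<code>X</code> for a cycle that needs the execution unit, <code>S</code> for a cycle spent stalled, for example on a cache miss) and both function names are invented for illustration.

```python
def run_back_to_back(trace_a, trace_b):
    """Cycles needed when the two threads run one after the other.

    Every trace entry, whether work ("X") or stall ("S"), costs one cycle,
    and stall cycles leave the execution unit idle.
    """
    return len(trace_a) + len(trace_b)


def run_smt(trace_a, trace_b):
    """Cycles needed when both threads share one execution unit.

    Each cycle, a stalled thread simply waits (its stall overlaps with the
    other thread's work), while at most one thread may use the execution
    unit. Priority alternates between the threads every cycle.
    """
    traces = [trace_a, trace_b]
    pos = [0, 0]          # next trace entry for each thread
    prefer = 0            # thread that gets first pick this cycle
    cycles = 0
    while pos[0] < len(trace_a) or pos[1] < len(trace_b):
        cycles += 1
        unit_free = True
        for t in (prefer, 1 - prefer):
            if pos[t] >= len(traces[t]):
                continue                  # this thread has finished
            if traces[t][pos[t]] == "S":
                pos[t] += 1               # stall cycle elapses
            elif unit_free:
                pos[t] += 1               # thread issues one instruction
                unit_free = False
            # otherwise the thread waits for the execution unit
        prefer = 1 - prefer
    return cycles


trace_a = "XSSXSSX"   # hypothetical stall-heavy thread
trace_b = "XXSXXSX"   # hypothetical thread with fewer stalls
print(run_back_to_back(trace_a, trace_b), run_smt(trace_a, trace_b))
```

In this toy model the two stall-heavy traces finish in far fewer cycles when interleaved, whereas two traces consisting only of <code>X</code> entries gain nothing, because the single execution unit is already fully occupied.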

This technology is transparent to operating systems and programs. The minimum that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, since the logical processors appear no different to the operating system than physical processors.
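
Because logical processors are presented like physical ones, software that wants to distinguish them has to consult topology information supplied by the operating system. As a minimal sketch, the function below counts logical processors and distinct physical cores in text laid out like <code>/proc/cpuinfo</code> on x86 Linux, using its <code>processor</code>, <code>physical id</code> and <code>core id</code> fields. The function name and the sample text are made up for illustration, and other platforms expose this information differently.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) for cpuinfo-style text.

    Entries are separated by blank lines; each line has the form
    "key : value". A physical core is identified by the pair
    (physical id, core id).
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue                      # not a per-CPU entry
        logical += 1
        package = fields.get("physical id", "0")
        # Without a core id, treat the logical CPU as its own core.
        core = fields.get("core id", fields["processor"])
        cores.add((package, core))
    return logical, len(cores)


# Made-up sample: one package, two cores, two logical processors per core.
SAMPLE = "\n\n".join(
    f"processor\t: {n}\nphysical id\t: 0\ncore id\t\t: {n // 2}"
    for n in range(4)
)

print(count_cpus(SAMPLE))
```

On a Linux machine the same function could be applied to the contents of <code>/proc/cpuinfo</code>. Python's <code>os.cpu_count()</code> reports logical processors only, so it cannot by itself reveal whether hyper-threading is active.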

It is possible to optimize operating system behavior on multi-processor, hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four logical processors the same. If only two threads are eligible to run, it might choose to schedule those threads on the two logical processors that happen to belong to the same physical processor. That processor would be extremely busy, and would share execution resources, while the other processor would remain idle, leading to poorer performance than if the threads were scheduled on different physical processors. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors, which is, in a sense, a limited form of the scheduler changes required for NUMA systems.
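
The scheduling example above can be sketched in a few lines. This is only an illustration of the placement idea, not the algorithm of any actual operating system scheduler; the function name and the dictionary-based topology format are assumptions made for the example.

```python
def place_threads(topology, n_threads):
    """Pick logical CPUs for n_threads, spreading across physical processors.

    topology maps each logical CPU id to the id of the physical processor
    it belongs to. The first sibling of every physical processor is used
    before any second sibling is handed out.
    """
    siblings = {}
    for cpu in sorted(topology):
        siblings.setdefault(topology[cpu], []).append(cpu)

    order = []
    depth = 0
    while len(order) < len(topology):
        for physical in sorted(siblings):
            if depth < len(siblings[physical]):
                order.append(siblings[physical][depth])
        depth += 1
    return order[:n_threads]


# Two physical processors, each exposing two logical processors.
topology = {0: 0, 1: 0, 2: 1, 3: 1}

naive = sorted(topology)[:2]           # a topology-unaware choice
aware = place_threads(topology, 2)     # one thread per physical processor
print(naive, aware)
```

With two runnable threads, the unaware choice places both on logical processors 0 and 1, which share physical processor 0, while the aware placement uses one logical processor on each physical processor.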

History

The first published paper describing what is now known as hyper-threading in a general purpose computer was written by Edward S. Davidson and Leonard E. Shar in 1973.{{cite journal |last1=Shar |first1=Leonard |last2=Davidson |first2=Edward |title=A multiminiprocessor system implemented through pipelining |journal=IEEE Computer |volume=7 |date=February 1974 |pages=42–51 |url=https://www.computer.org/csdl/magazine/co/1974/02/4251/13rRUyoyhIt}}

Denelcor, Inc. introduced multi-threading with the Heterogeneous Element Processor (HEP) in 1982. The HEP pipeline could not hold multiple instructions from the same process. Only one instruction from a given process was allowed to be present in the pipeline at any point in time. Should an instruction from a given process block the pipe, instructions from other processes would continue after the pipeline drained.

A US patent for the technology behind hyper-threading was granted to Kenneth Okin at Sun Microsystems in November 1994. At that time, CMOS process technology was not advanced enough to allow for a cost-effective implementation.{{Citation|last=Okin |first=Kenneth |title=United States Patent: 5361337 - Method and apparatus for rapidly switching processes in a computer system |date=1 November 1994 |url=http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%252Fnetahtml%252FPTO%252Fsrchnum.htm&r=1&f=G&l=50&s1=5361337.PN.&OS=PN/5361337&RS=PN/5361337 |access-date=2016-05-24 |url-status=dead |archive-url=https://web.archive.org/web/20150921143211/http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1 |archive-date=21 September 2015}}

Intel implemented hyper-threading on an x86 architecture processor in 2002 with the Foster MP-based Xeon. It was also included on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then remained a feature of every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor. The Intel Core and Core 2 processor lines (2006) that succeeded the Pentium 4 model line did not use hyper-threading, because the Core microarchitecture was a descendant of the older P6 microarchitecture. The P6 microarchitecture had been used in earlier iterations of Pentium processors, namely the Pentium Pro, Pentium II and Pentium III (plus their Celeron and Xeon derivatives at the time). Windows 2000 SP3 and Windows XP SP1 added support for hyper-threading.

Intel released the Nehalem microarchitecture (Core i7) in November 2008, in which hyper-threading made a return. The first-generation Nehalem processors contained four physical cores and effectively scaled to eight threads. Since then, both two- and six-core models have been released, scaling to four and twelve threads, respectively.

{{cite web

|url=https://www.intel.com/consumer/learn/desktop/corei7-extreme-detail.htm

|title=Extreme Gaming with the Intel® Core™ i7 Processor Extreme Edition

|archive-url=https://web.archive.org/web/20081201122911/http://www.intel.com/Consumer/Learn/Desktop/corei7-extreme-detail.htm

|archive-date=1 December 2008

|url-status=dead

|website=www.intel.com}}

Earlier Intel Atom cores were in-order processors, sometimes with hyper-threading ability, for low power mobile PCs and low-price desktop PCs.{{cite web|url=http://www.intel.com/technology/atom/microarchitecture.htm |title=Intel® Atom™ Processor Microarchitecture |publisher=Intel.com |date=2011-03-18 |access-date=2011-04-05}} The Itanium 9300 launched with eight threads per processor (two threads per core) through enhanced hyper-threading technology. The next model, the Itanium 9500 (Poulson), features a 12-wide issue architecture, with eight CPU cores with support for eight more virtual cores via hyper-threading.{{cite web|url=http://www.tomshardware.com/news/intel-itanium-poulson-dual-domain-hyper-threading,13279.html |title=Intel Discloses New Itanium Poulson Features |date=24 August 2011 |publisher=Tomshardware.com |access-date=2017-07-02}} The Intel Xeon 5500 server chips also utilize two-way hyper-threading.{{cite web|url=http://www.intel.com/p/en_US/products/server/processor |title=Server Processor Index Page |publisher=Intel.com |date=2011-03-18 |access-date=2011-04-05}}{{cite web|url=http://www.intel.com/business/resources/demos/xeon5500/performance/demo.htm |title=Intel Xeon Processor 5500 Series |publisher=Intel.com |access-date=2011-04-05}}

Performance claims

According to Intel, the first hyper-threading implementation used only 5% more die area than the comparable non-hyperthreaded processor, but the performance was 15–30% better.

{{cite journal

|url=http://www.intel.com/technology/itj/2002/volume06issue01/vol6iss1_hyper_threading_technology.pdf

|title=Hyper-Threading Technology

|journal=Intel Technology Journal

|volume=06

|issue=1

|date=14 February 2002

|issn=1535-766X

|archive-url=https://web.archive.org/web/20121019025809/http://www.intel.com/technology/itj/2002/volume06issue01/vol6iss1_hyper_threading_technology.pdf

|url-status=dead

|archive-date=19 October 2012

}}

{{cite web

|url=https://software.intel.com/en-us/articles/how-to-determine-the-effectiveness-of-hyper-threading-technology-with-an-application

|title=How to Determine the Effectiveness of Hyper-Threading Technology with an Application

|date=28 April 2011

|url-status=dead

|archive-url=https://web.archive.org/web/20100202034723/http://software.intel.com/en-us/articles/how-to-determine-the-effectiveness-of-hyper-threading-technology-with-an-application

|archive-date=2 February 2010

|website=software.intel.com}} Intel claims up to a 30% performance improvement compared with an otherwise identical, non-simultaneous multithreading Pentium 4. Tom's Hardware states: "In some cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz with HT turned off."{{cite web|url=http://www.tomshardware.com/reviews/single-cpu-dual-operation,549-25.html |title=Summary: In Some Cases The P4 3.0HT Can Even Beat The 3.6 GHz Version : Single CPU in Dual Operation: P4 3.06 GHz with Hyper-Threading Technology |publisher=Tomshardware.com |date=2002-11-14 |access-date=2011-04-05}} Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial-intelligence algorithms.
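
The figures quoted in this section can be related with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: it assumes that performance per unit of die area is a meaningful ratio of the two Intel figures, and that throughput without hyper-threading scales linearly with clock frequency, which is a simplification. The function name is invented for the example.

```python
def perf_per_area(speedup, area_growth):
    """Relative performance per unit die area after adding hyper-threading.

    speedup and area_growth are fractions, e.g. 0.15 for 15%.
    A result above 1.0 means performance grew faster than die area.
    """
    return (1 + speedup) / (1 + area_growth)


# Intel's figures: about 5% more die area for 15-30% more performance.
low = perf_per_area(0.15, 0.05)
high = perf_per_area(0.30, 0.05)

# For a 3.0 GHz part with HT to match a 3.6 GHz part without it, HT would
# need to add at least this fraction of throughput (linear-scaling assumption).
break_even = 3.6 / 3.0 - 1

print(f"{low:.3f} {high:.3f} {break_even:.2f}")
```

Under these assumptions, the Tom's Hardware observation corresponds to a gain of about 20%, which lies within the 15–30% range claimed by Intel.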

Overall the performance history of hyper-threading was a mixed one in the beginning. As one commentary on high-performance computing from November 2002 notes:{{cite web|title=A Study of Hyper-Threading in High-Performance Computing Clusters|url=https://ahelp.com/wp-content/uploads/2025/03/4q02-Len.pdf|publisher=Dell|access-date=12 November 2012|author=Tau Leng|author2=Rizwan Ali |author3=Jenwei Hsieh |author4=Christopher Stanton |page=4|date=November 2002}}

{{Blockquote|Hyper-Threading can improve the performance of some MPI applications, but not all. Depending on the cluster configuration and, most importantly, the nature of the application running on the cluster, performance gains can vary or even be negative. The next step is to use performance tools to understand what areas contribute to performance gains and what areas contribute to performance degradation.}}

As a result, performance improvements are very application-dependent;{{cite web

| url = http://www.extremetech.com/computing/133121-maximized-performance-comparing-the-effects-of-hyper-threading-software-updates

| title = Maximized performance: Comparing the effects of Hyper-Threading, software updates

| date = 24 July 2012 | access-date = 2 March 2015

| author = Joel Hruska | website = extremetech.com

}} however, when running two programs that each require the full attention of the processor, one or both of them can appear to slow down slightly when Hyper-Threading Technology is turned on.{{cite web|url=http://users.telenet.be/nicvroom/performanceP4.htm|title=CPU Performance Evaluation - Benchmark - Pentium 4 2.8 and 3.0|website=users.telenet.be|access-date=12 April 2011|archive-date=24 February 2021|archive-url=https://web.archive.org/web/20210224131422/http://users.telenet.be/nicvroom/performanceP4.htm|url-status=dead}} This is due to the replay system of the Pentium 4 tying up valuable execution resources, equalizing the processor resources between the two programs, which adds a varying amount of execution time. The Pentium 4 "Prescott" and the Xeon "Nocona" processors received a replay queue that reduces the execution time needed for the replay system and completely overcomes the performance penalty.{{cite web|title=Replay: Unknown Features of the NetBurst Core. Page 15|url=http://www.xbitlabs.com/articles/cpu/display/replay_15.html#sect0|website=Replay: Unknown Features of the NetBurst Core.|publisher=Xbitlabs|access-date=24 April 2011|url-status=dead|archive-url=https://web.archive.org/web/20110514180659/http://www.xbitlabs.com/articles/cpu/display/replay_15.html#sect0|archive-date=14 May 2011}}

According to a November 2009 analysis by Intel, hyper-threading increases overall latency when running the threads together does not yield significant overall throughput gains, and those gains vary by application. In other words, hyper-threading can significantly increase overall processing latency, with the negative effect shrinking as more simultaneous threads are able to make effective use of the additional hardware resource utilization that hyper-threading provides.{{cite web

|url = https://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology

|title = Performance Insights to Intel Hyper-Threading Technology

|date = 20 November 2009

|access-date = 26 February 2015

|first = Antonio |last = Valles

|publisher = Intel

|url-status = dead

|archive-url = https://web.archive.org/web/20150217050949/https://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/

|archive-date = 17 February 2015}} A similar performance analysis is available for the effects of hyper-threading when used to handle tasks related to managing network traffic, such as for processing interrupt requests generated by network interface controllers (NICs).{{cite web

| url = https://calomel.org/network_performance.html

| title = Network Tuning and Performance

| date = 12 November 2013 | access-date = 26 February 2015

| website = calomel.org

}} The Linux kernel documentation reports that hyper-threading showed no benefit for interrupt handling in initial tests.{{cite web

| url = https://www.kernel.org/doc/Documentation/networking/scaling.txt

| title = Linux kernel documentation: Scaling in the Linux Networking Stack

| date = 1 December 2014 | access-date = 2 March 2015

| publisher = kernel.org

| quote = Per-cpu load can be observed using the mpstat utility, but note that on processors with hyperthreading (HT), each hyperthread is represented as a separate CPU. For interrupt handling, HT has shown no benefit in initial tests, so limit the number of queues to the number of CPU cores in the system.

}}

Drawbacks

{{Anchor|Drawback}}

When the first HT processors were released, many operating systems were not optimized for hyper-threading technology (e.g. Windows 2000 and Linux older than 2.4).{{cite web|url=http://www.intel.com/support/processors/pentium4/sb/cs-017371.htm#1c|title=Hyper-Threading Technology – Operating systems that include optimizations for Hyper-Threading Technology |publisher=Intel.com |date=2011-09-19 |access-date=2012-02-29}}

In 2006, hyper-threading was criticised for energy inefficiency.{{cite book|title=Sustainable Practices: Concepts, Methodologies, Tools and Applications|isbn=9781466648524|page=666|publisher=Information Resources Management Association|date=December 2013}} For example, ARM, a company specializing in low-power CPU designs, stated that simultaneous multithreading can use up to 46% more power than ordinary dual-core designs. Furthermore, it claimed that SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.{{cite web|url=http://www.theinquirer.net/inquirer/news/1037948/arm-fan-hyperthreading |archive-url=https://web.archive.org/web/20090906005322/http://www.theinquirer.net/inquirer/news/1037948/arm-fan-hyperthreading |url-status=unfit |archive-date=6 September 2009 |title=ARM is no fan of HyperThreading |publisher=theinquirer.net |date=2006-08-02 |access-date=2012-02-29}}

In 2010, ARM said it might include simultaneous multithreading in its future chips;{{cite web|first=Tom|last=Jermoluk |url=http://www.top500.org/blog/2010/10/13/about_mips_and_mips |title=About MIPS and MIPS {{!}} TOP500 Supercomputing Sites |website=Top500.org |date=2010-10-13 |access-date=2011-04-05 |url-status=dead |archive-url=https://web.archive.org/web/20110613203023/http://www.top500.org/blog/2010/10/13/about_mips_and_mips |archive-date=13 June 2011}} however, this was rejected in favor of its 2012 64-bit design.{{cite web|url=http://www.techdesignforums.com/blog/2012/10/30/arm-64bit-cortex-a53-a57-launch/|title=ARM launches first 64bit processor core for servers and smartphones|date=30 October 2012|website=Tech Design Forum}} ARM produced SMT cores in 2018.{{Cite web |title=Arm launches first SMT-capable Cortex core {{!}} bit-tech.net |url=https://bit-tech.net/news/arm-launches-first-smt-capable-cortex-core/1/ |access-date=2023-12-02 |website=bit-tech.net |language=en}}

In 2013, Intel dropped SMT in favor of out-of-order execution for its Silvermont processor cores, as they found this gave better performance with better power efficiency than a lower number of cores with SMT.{{Cite news |title= Deep inside Intel's first viable mobile processor: Silvermont |author= Rik Myslewski |work= The Register |date= 8 May 2013 |url= https://www.theregister.co.uk/2013/05/08/intel_silvermont_microarchitecture/ |access-date= 13 January 2014 }}

In 2017, it was revealed that Intel's Skylake and Kaby Lake processors had a bug in their implementation of hyper-threading that could cause data loss.{{Cite news|url=https://www.theregister.co.uk/2017/06/25/intel_skylake_kaby_lake_hyperthreading/|title=Intel's Skylake and Kaby Lake CPUs have nasty hyper-threading bug|first=Richard|last=Chirgwin|date=25 June 2017|access-date=4 July 2017|work=The Register}} Microcode updates were later released to address the issue.{{cite news|url=https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/|title=Skylake, Kaby Lake Chips Have a Crash Bug with Hyperthreading Enabled|work=Ars Technica|access-date=25 November 2017|date=26 June 2017}}

In 2019, with Coffee Lake, Intel temporarily stopped including hyper-threading in mainstream Core i7 desktop processors, retaining it only in the highest-end Core i9 parts and in Pentium Gold CPUs.{{cite news|url=https://www.anandtech.com/show/14256/intel-9th-gen-core-processors-all-the-desktop-and-mobile-45w-cpus-announced|title=Intel 9th Gen Core Processors: All the Desktop and Mobile 45W CPUs Announced|first=Ian|last=Cutress|date=23 April 2019|work=AnandTech}} Intel also began to recommend disabling hyper-threading after new CPU vulnerabilities were revealed whose attacks could be mitigated by disabling it.

{{cite news

|url=https://www.tomshardware.co.uk/intel-disable-hyper-threading-spectre-attack,news-60647.html

|title=Intel's New Spectre-Like Flaw Affects Chips Made Since 2008

|first=Lucian|last=Armasu

|date=14 May 2019

|url-status=dead

|archive-url=https://web.archive.org/web/20190804172710/https://www.tomshardware.co.uk/intel-disable-hyper-threading-spectre-attack,news-60647.html

|archive-date=4 August 2019

|work=Tom's Hardware}}

Security

In May 2005, Colin Percival demonstrated that a malicious thread on a Pentium 4 can use a timing-based side-channel attack to monitor the memory access patterns of another thread with which it shares a cache, allowing the theft of cryptographic information. Strictly speaking, this is not a timing attack on the victim, as the malicious thread measures only the time of its own execution. Potential solutions include the processor changing its cache eviction strategy, or the operating system preventing threads with different privileges from executing simultaneously on the same physical core.{{cite web

| url = http://www.daemonology.net/papers/htt.pdf

| title = Cache Missing for Fun and Profit

| date = 2005-05-14 | access-date = 2016-06-14

| first = Colin | last = Percival | website = Daemonology.net

}} In 2018, the OpenBSD operating system disabled hyper-threading "in order to avoid data potentially leaking from applications to other software" caused by the Foreshadow/L1TF vulnerabilities.{{Cite news|url=https://www.theregister.co.uk/2018/06/20/openbsd_disables_intels_hyperthreading/|title=OpenBSD disables Intel's hyper-threading over CPU data leak fears|access-date=2018-08-24}}{{Cite web|url=https://marc.info/?l=openbsd-tech&m=153504937925732&w=2|title='Disable SMT/Hyperthreading in all Intel BIOSes' - MARC|website=marc.info|access-date=2018-08-24}} In 2019, a set of vulnerabilities led security experts to recommend disabling hyper-threading on all devices.{{cite news|author1-first=Andy|author1-last=Greenberg|url=https://www.wired.com/story/intel-mds-attack-speculative-execution-buffer/|title=Meltdown Redux: Intel Flaw Lets Hackers Siphon Secrets from Millions of PCs|newspaper=WIRED|date=14 May 2019|access-date=14 May 2019}}
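
The principle behind a shared-cache side channel of the kind described above can be shown with a purely simulated cache. The sketch below is a toy model and not Percival's actual attack: it assumes a direct-mapped cache with one line per set, reports hits and misses directly instead of inferring them from access times, and uses invented names (<code>SharedCache</code>, <code>spy_on</code>, <code>VICTIM_TABLE</code>) throughout.

```python
class SharedCache:
    """Toy direct-mapped cache shared by two logical processors."""

    def __init__(self, n_sets):
        self.n_sets = n_sets
        self.lines = [None] * n_sets      # address currently held by each set

    def access(self, address):
        """Return True on a hit; on a miss, evict the set's previous line."""
        index = address % self.n_sets
        hit = self.lines[index] == address
        self.lines[index] = address
        return hit


def spy_on(cache, victim_step):
    """Fill every set, let the victim run, then report which sets it evicted."""
    spy_addresses = list(range(cache.n_sets))   # one spy line per set
    for address in spy_addresses:
        cache.access(address)                   # fill the cache with spy data
    victim_step(cache)                          # victim shares the same cache
    # A miss on the spy's own data means the victim touched that set.
    # A real attacker would detect the miss by timing its own access.
    return [a for a in spy_addresses if not cache.access(a)]


VICTIM_TABLE = 1024       # hypothetical table base, aligned to the cache size
secret = 11               # hypothetical secret-dependent table index


def victim_lookup(cache):
    cache.access(VICTIM_TABLE + secret)         # secret-dependent memory access


cache = SharedCache(16)
print(spy_on(cache, victim_lookup))             # sets evicted by the victim
print(spy_on(cache, lambda c: None))            # victim not co-scheduled
```

The second call models the operating-system mitigation mentioned above: if the victim never runs alongside the spy on the same physical core, the spy observes no evictions.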

References

{{Reflist}}