Native Command Queuing

In computing, Native Command Queuing (NCQ) is an extension of the Serial ATA protocol allowing hard disk drives to internally optimize the order in which received read and write commands are executed. This can reduce the amount of unnecessary drive head movement, resulting in increased performance (and slightly decreased wear of the drive) for workloads where multiple simultaneous read/write requests are outstanding, most often occurring in server-type applications.
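The kind of reordering NCQ performs can be illustrated with a toy elevator-style scheduler: of several outstanding requests, service the one closest to the head's current logical position first. This is only a sketch of the idea; real drive firmware also accounts for rotational latency, and its actual algorithms are not publicly documented.

```python
def elevator_order(head, requests):
    """Greedy nearest-first ordering: a rough stand-in for NCQ's
    seek-minimizing dispatch (real firmware is more sophisticated)."""
    pending = list(requests)
    order = []
    pos = head
    while pending:
        nxt = min(pending, key=lambda lba: abs(lba - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

# FIFO would sweep the head back and forth; nearest-first groups nearby LBAs.
print(elevator_order(500, [900, 100, 520, 950, 80]))  # → [520, 900, 950, 100, 80]
```

With the requests served in this order, the head travels far less than it would servicing them first-come-first-served, which is where the throughput gain under multiple outstanding requests comes from.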

History

Native Command Queuing was preceded by Parallel ATA's version of Tagged Command Queuing (TCQ). ATA's attempt at integrating TCQ was constrained by the requirement that ATA host bus adapters use ISA bus device protocols to interact with the operating system. The resulting high CPU overhead and negligible performance gain contributed to a lack of market acceptance for ATA TCQ.

NCQ differs from TCQ in two ways. First, all NCQ commands have equal priority. Second, the NCQ host bus adapter programs its own first-party DMA engine with DMA parameters supplied by the CPU when the command is issued, whereas TCQ interrupts the CPU during command queuing and requires it to drive the ATA host bus adapter's third-party DMA engine. NCQ's approach is preferable because the drive has more accurate knowledge of its own performance characteristics and can account for its rotational position. Both NCQ and TCQ have a maximum queue length of 32 outstanding commands.[https://web.archive.org/web/20101218131508/http://seagate.com/content/pdf/whitepaper/D2c_tech_paper_intc-stx_sata_ncq.pdf PDF white paper on NCQ from Intel and Seagate]{{Cite web |url=http://www.t13.org/Documents/UploadedDocuments/docs2004/d1532v1r4b-ATA-ATAPI-7.pdf |title=Volume 1 of the final draft of the ATA-7 standard |access-date=2007-01-02 |archive-date=2012-02-06 |archive-url=https://web.archive.org/web/20120206032939/http://www.t13.org/Documents/UploadedDocuments/docs2004/d1532v1r4b-ATA-ATAPI-7.pdf |url-status=dead }} Because ATA TCQ is rarely used, Parallel ATA (and the IDE mode of some chipsets) usually supports only one outstanding command per port.

For NCQ to be enabled, it must be supported and enabled in the SATA host bus adapter and in the hard drive itself. The appropriate driver must be loaded into the operating system to enable NCQ on the host bus adapter.[http://download.intel.com/support/chipsets/imsm/sb/sata2_ncq_overview.pdf "SATA II Native Command Queuing Overview", Intel Whitepaper, April 2003. ]
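On Linux, whether NCQ ended up active for a given disk can be inferred from the queue depth the kernel negotiated, exposed through the standard libata sysfs attribute. A sketch (the sysfs path exists only on Linux, so the value parsing is separated out for illustration):

```python
from pathlib import Path

def ncq_active(queue_depth: str) -> bool:
    """libata reports a queue depth of 1 when NCQ is off or unsupported;
    drives with NCQ enabled negotiate a depth of up to 31/32."""
    return int(queue_depth.strip()) > 1

def check_disk(name="sda"):
    # Standard libata sysfs attribute; present only on Linux systems.
    p = Path(f"/sys/block/{name}/device/queue_depth")
    return ncq_active(p.read_text()) if p.exists() else None

print(ncq_active("31\n"))  # sample value from an NCQ-enabled drive → True
```

Writing `1` to the same attribute (as root) effectively disables NCQ for that device, which is a common way to test its performance impact.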

Many newer chipsets support the Advanced Host Controller Interface (AHCI), which allows operating systems to control them through a standard interface and enable NCQ. DragonFly BSD has supported AHCI with NCQ since version 2.3 in 2009.{{cite web |author= Matthew Dillon |author-link= Matthew Dillon |date= 2009-06-04 |url= http://www.dragonflybsd.org/mailarchive/kernel/2009-06/msg00004.html |title= "Re: DragonFly-2.3.1.165.g25822 master sys/dev/disk/ahci Makefile TODO ahci.c ahci.h ahci_attach.c ahci_cam.c ahci_dragonfly.c ahci_dragonfly.h atascsi.h" }}{{cite web |author= Matthew Dillon |author-link= Matthew Dillon |date= 2009 |url= http://bxr.su/d/share/man/man4/ahci.4 |title= ahci(4) — Advanced Host Controller Interface for Serial ATA |website= BSD Cross Reference |publisher= DragonFly BSD}}{{cite book |section=ahci - Advanced Host Controller Interface for Serial ATA |title=DragonFly On-Line Manual Pages |url=http://mdoc.su/d/ahci.4 }}

Linux kernels have supported AHCI natively since version 2.6.19, and FreeBSD has fully supported it since version 8.0. Windows Vista and Windows 7 also support AHCI natively, but their AHCI support (via the msahci service) must be enabled manually through registry editing if controller support was not present during the initial installation. Windows 7's AHCI support enables not only NCQ but also TRIM on SSDs (with supporting firmware). Older operating systems such as Windows XP require the installation of a vendor-specific driver (similar to installing a RAID or SCSI controller) even if AHCI is present on the host bus adapter, which makes initial setup more tedious and conversion of existing installations relatively difficult, as most controllers cannot operate their ports in a mixed AHCI–SATA/IDE/legacy mode.

Hard disk drives

= Performance =

A 2004 test of a first-generation NCQ drive (Seagate Barracuda 7200.7 NCQ) found that while NCQ increased IOMeter performance, desktop application performance decreased.{{cite web|url=http://techreport.com/review/7750/seagate-barracuda-7200-7-ncq-hard-drive/13 |title=Seagate's Barracuda 7200.7 NCQ hard drive - The Tech Report - Page 13 |date=17 December 2004 |publisher=The Tech Report |access-date=2014-01-11}} A 2005 review found improvements on the order of 9% on average with NCQ enabled in a series of Windows multitasking tests.{{cite web|url=http://techreport.com/review/8624/multitasking-with-native-command-queuing/5 |title=Multitasking with Native Command Queuing - The Tech Report - Page 5 |date=3 August 2005 |publisher=The Tech Report |access-date=2014-01-11}}

NCQ can interfere negatively with the operating system's I/O scheduler, decreasing performance;{{Cite journal |last1=Yu |first1=Y. J. |last2=Shin |first2=D. I. |last3=Eom |first3=H. |last4=Yeom |first4=H. Y. |year=2010 |title=NCQ vs. I/O scheduler |journal=ACM Transactions on Storage |volume=6 |pages=1–37 |doi=10.1145/1714454.1714456 |s2cid=14414608}} [http://www.cs.albany.edu/~sdc/CSI500/Fal10/DiskArmSchedulingPapers/a2-yu.pdf] this has been observed in practice on Linux with RAID-5.{{cite web|url=http://serverfault.com/questions/305890/poor-linux-software-raid-5-performance-with-ncq |title=hard drive - Poor Linux software RAID 5 performance with NCQ |publisher=Server Fault |access-date=2014-01-11}} NCQ provides no mechanism for the host to attach any sort of deadline to an I/O, such as a limit on how many times a request may be passed over in favor of others. In theory, a queued request can therefore be delayed by the drive for an arbitrarily long time while it serves other (possibly newer) requests under I/O pressure. Since the algorithms drive firmware uses for NCQ dispatch ordering are generally not public, this introduces another level of uncertainty for hardware/firmware performance. Tests at Google around 2008 showed that NCQ can delay an I/O for up to 1–2 seconds. A proposed workaround is for the operating system to artificially starve the NCQ queue sooner in order to serve low-latency applications in a timely manner.Gwendal Grignou, NCQ Emulation, FLS'08 [https://www.usenix.org/legacy/publications/login/2008-06/openpdfs/lsf08reports.pdf talk summary (p. 109)] [https://www.usenix.org/legacy/event/lsf08/tech/IO_grignou.pdf slides]
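The starvation problem, and the proposed deadline-style workaround, can be simulated. In the sketch below, a greedy nearest-first dispatcher keeps serving newly arriving nearby requests while a distant request waits indefinitely; adding a cap on how many times a request may be passed over bounds its latency. The dispatch logic and the `deadline` parameter are invented for illustration, since actual NCQ firmware behavior is not publicly specified.

```python
def dispatch(head, queue, incoming, deadline=None):
    """Greedy nearest-first dispatch with an optional starvation cap:
    a request passed over `deadline` times is served next, mimicking
    the OS-side workaround of starving the NCQ queue early."""
    waits = {lba: 0 for lba in queue}
    order, feed = [], list(incoming)
    while waits:
        overdue = [r for r, w in waits.items()
                   if deadline is not None and w >= deadline]
        pool = overdue or list(waits)
        nxt = min(pool, key=lambda lba: abs(lba - head))
        order.append(nxt)
        head = nxt
        del waits[nxt]
        for r in waits:
            waits[r] += 1             # everyone else was passed over once
        if feed:
            waits[feed.pop(0)] = 0    # a new nearby request keeps arriving

    return order

greedy = dispatch(100, [110, 5000], [120, 130, 140, 150])
capped = dispatch(100, [110, 5000], [120, 130, 140, 150], deadline=2)
print("without cap:", greedy)   # the far request (LBA 5000) is served last
print("with cap:   ", capped)   # it is served after being passed over twice
```

Under a continuous stream of nearby requests, the uncapped policy would delay the distant request without bound; the cap trades a little throughput for a latency guarantee.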

On some drives' firmware, such as the WD Raptor circa 2007, read-ahead is disabled when NCQ is enabled, resulting in slower sequential performance.{{cite web|url=https://lkml.org/lkml/2007/4/3/159 |title=Mark Lord: Re: Lower HD transfer rate with NCQ enabled? |publisher=LKML |date=2007-04-03 |access-date=2014-01-11}}

SATA solid-state drives benefit significantly from being able to queue multiple commands for parallel workloads. PCIe-based NVMe SSDs go much further: the standard allows up to 65,535 I/O queues with up to 65,536 commands each.

= Safety (FUA) =

{{See also|Disk buffer#Force Unit Access (FUA)}}

One lesser-known feature of NCQ is that, unlike its ATA TCQ predecessor, it allows the host to specify whether it wants to be notified when the data reaches the disk's platters, or when it reaches the disk's buffer (on-board cache). Assuming a correct hardware implementation, this feature allows data consistency to be guaranteed when the disk's on-board cache is used in conjunction with system calls like fsync.{{cite web|author=Marshall Kirk McKusick|author-link=Marshall Kirk McKusick|url=http://queue.acm.org/detail.cfm?id=2367378 |title=Disks from the Perspective of a File System - ACM Queue |publisher=Queue.acm.org |access-date=2014-01-11}} The associated write flag, which is also borrowed from SCSI, is called Force Unit Access (FUA).{{cite book|author=Gregory Smith|title=PostgreSQL 9.0: High Performance|url=https://archive.org/details/postgresqlhighpe00smit|url-access=limited|year=2010|publisher=Packt Publishing Ltd|isbn=978-1-84951-031-8|page=[https://archive.org/details/postgresqlhighpe00smit/page/n98 78]}}http://www.seagate.com/docs/pdf/whitepaper/D2c_tech_paper_intc-stx_sata_ncq.pdf {{Bare URL PDF|date=March 2022}}{{cite web |url=https://lwn.net/Articles/400541/ |title=The end of block barriers |date=2010-08-18 |access-date=2015-06-27 |author=Jonathan Corbet |publisher=LWN.net}}
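At the application level, the usual way to request this durability guarantee on POSIX systems is `fsync` or the `O_DSYNC` open flag; depending on the device and kernel, these are satisfied underneath with FUA writes or cache flushes. A minimal sketch (the file is a temporary file created only for illustration):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# O_DSYNC: each write() returns only once the data is on stable storage
# (the kernel may issue FUA writes or cache flushes underneath).
fd = os.open(path, os.O_WRONLY | os.O_DSYNC)
os.write(fd, b"committed record\n")
os.close(fd)

# Equivalent pattern without O_DSYNC: write normally, then fsync before
# trusting that the data has left the drive's volatile cache.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"second record\n")
os.fsync(fd)
os.close(fd)

with open(path) as f:
    content = f.read()
print(content)
os.unlink(path)
```

Without either mechanism, a write that has only reached the drive's volatile cache can be lost on power failure even though the write call already returned success.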

Solid-state drives

NCQ is also used in newer solid-state drives, where it is the drive that waits on the host rather than the other way around. For example, Intel's X25-E Extreme solid-state drive uses NCQ to ensure that the drive has commands to process while the host system is busy with CPU tasks.{{cite web|url=http://techreport.com/articles.x/15931/1|title=Intel's X25-E Extreme solid-state drive - Now with single-level cell flash memory|first=Geoff|last=Gasior|publisher=Tech Report|date=November 23, 2008}}

NCQ also enables the SSD controller to complete commands concurrently (or partly concurrently, for example using internal pipelining) where the internal organization of the device permits such processing.

The NVM Express (NVMe) standard also supports command queuing, in a form optimized for SSDs.{{cite web |url=https://www.sata-io.org/sites/default/files/documents/NVMe%20and%20AHCI%20as%20SATA%20Express%20Interface%20Options%20-%20Whitepaper_.pdf |title=AHCI and NVMe as Interfaces for SATA Express Devices – Overview |date=2013-08-09 |access-date=2013-10-02 |author=Dave Landsman |publisher=SATA-IO}} NVMe allows multiple queues for a single controller and device, while also permitting much greater depth for each queue, which more closely matches how the underlying SSD hardware works.{{cite web|url=http://www.nvmexpress.org/about/nvm-express-overview/ |title=NVM Express Overview |website=nvmexpress.org |access-date=2014-11-26}}
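The difference in queue geometry can be sketched directly: AHCI exposes a single queue of depth 32 shared by all CPUs, whereas NVMe allows up to 65,535 I/O queues of up to 65,536 entries, typically one per CPU core. The routing function below is a hypothetical illustration of that geometry, not an API of either standard:

```python
class SubmissionQueue:
    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def submit(self, cmd):
        if len(self.entries) >= self.depth:
            raise BufferError("queue full")        # host must back off
        self.entries.append(cmd)

ahci = [SubmissionQueue(32)]                       # one shared queue (AHCI/NCQ)
nvme = [SubmissionQueue(65536) for _ in range(8)]  # e.g. one queue per core

def submit(queues, cpu, cmd):
    # Hypothetical per-CPU routing: each core submits to its own queue,
    # avoiding lock contention on a single shared structure.
    queues[cpu % len(queues)].submit(cmd)

for cpu in range(8):
    for i in range(40):                            # try 320 commands in flight
        try:
            submit(ahci, cpu, (cpu, i))
        except BufferError:
            pass                                   # shared queue saturates at 32
        submit(nvme, cpu, (cpu, i))

ahci_total = len(ahci[0].entries)
nvme_total = sum(len(q.entries) for q in nvme)
print(ahci_total, nvme_total)  # → 32 320
```

The per-core queues also let interrupt handling and completion processing stay local to the submitting CPU, which is part of why the NVMe geometry scales better on parallel workloads.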

References

{{Reflist|30em}}