Operational taxonomic unit

{{Short description|Classification by similarity of DNA}}

{{Use dmy dates|date=April 2017}}

File:Operational Taxonomic Units.png

An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied.{{cite book | last1 = Sokal | first1 = Robert R. | last2 = Sneath | first2 = Peter H. A. | title = Principles of Numerical Taxonomy | location = San Francisco | publisher = W.H. Freeman | year = 1963 }} In this sense, an OTU is a pragmatic definition for grouping individuals by similarity, equivalent to, but not necessarily in line with, classical Linnaean taxonomy or modern evolutionary taxonomy.

Nowadays, however, the term "OTU" is commonly used in a different context and refers to clusters of (uncultivated or unknown) organisms, grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU; molecular OTU).{{Cite journal | last1 = Blaxter | first1 = M. | last2 = Mann | first2 = J. | last3 = Chapman | first3 = T. | last4 = Thomas | first4 = F. | last5 = Whitton | first5 = C. | last6 = Floyd | first6 = R. | last7 = Abebe | first7 = E. | title = Defining operational taxonomic units using DNA barcode data. | journal = Philos Trans R Soc Lond B Biol Sci | volume = 360 | issue = 1462 | pages = 1935–43 |date=October 2005 | doi = 10.1098/rstb.2005.1725 | pmid = 16214751 | pmc=1609233}} In other words, OTUs are pragmatic proxies for "species" (microbial or metazoan) at different taxonomic levels, in the absence of traditional systems of biological classification as are available for macroscopic organisms. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S rRNA (for eukaryotes{{Cite journal|last1=Sommer|first1=Stephanie A.|last2=Woudenberg|first2=Lauren Van|last3=Lenz|first3=Petra H.|last4=Cepeda|first4=Georgina|last5=Goetze|first5=Erica|date=2017|title=Vertical gradients in species richness and community composition across the twilight zone in the North Pacific Subtropical Gyre|journal=Molecular Ecology|language=en|volume=26|issue=21|pages=6136–6156|doi=10.1111/mec.14286|pmid=28792641|issn=1365-294X|hdl=11336/53966|doi-access=free|bibcode=2017MolEc..26.6136S |hdl-access=free}}) marker gene sequence datasets.

Sequences can be clustered according to their similarity to one another, and operational taxonomic units are defined by a similarity threshold set by the researcher (usually 97%; a threshold of 100% is also common, in which case the resulting units are known as single variants{{Cite journal|last1=Porter|first1=Teresita M.|last2=Hajibabaei|first2=Mehrdad|date=2018|title=Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis|journal=Molecular Ecology|language=en|volume=27|issue=2|pages=313–338|doi=10.1111/mec.14478|pmid=29292539|issn=1365-294X|doi-access=free|bibcode=2018MolEc..27..313P }}). It remains debatable how well this commonly used method recapitulates true microbial species phylogeny or ecology. Although different algorithms or thresholds can yield different OTUs, Schmidt et al. (2014) demonstrated that microbial OTUs were generally ecologically consistent across habitats and across several OTU clustering approaches.{{cite journal|last1=Schmidt|first1=Thomas S. B.|last2=Rodrigues|first2=João F. Matias|last3=von Mering|first3=Christian|title=Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale|journal=PLOS Comput Biol|date=24 April 2014|volume=10|issue=4|pages=e1003594|doi=10.1371/journal.pcbi.1003594|pmid=24763141|pmc=3998914|bibcode=2014PLSCB..10E3594S|issn=1553-7358 |doi-access=free }} The number of OTUs may be inflated by errors in DNA sequencing.{{Cite journal | last1 = Kunin | first1 = V. | last2 = Engelbrektson | first2 = A. | last3 = Ochman | first3 = H. | last4 = Hugenholtz | first4 = P. | title = Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. | journal = Environ Microbiol | volume = 12 | issue = 1 | pages = 118–23 |date=Jan 2010 | doi = 10.1111/j.1462-2920.2009.02051.x | pmid = 19725865 | bibcode = 2010EnvMi..12..118K | url = https://digital.library.unt.edu/ark:/67531/metadc932456/ }}
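The effect of the similarity threshold can be illustrated with a minimal sketch. It assumes that the sequences are already aligned to equal length and uses a simple greedy scheme in which each sequence joins the first existing cluster whose seed it matches at or above the threshold. The function names `identity` and `cluster_greedy` are hypothetical, and the scheme is a simplification for illustration, not the algorithm of any particular tool.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def cluster_greedy(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy clustering: the first member of each cluster acts as its seed."""
    clusters: List[List[str]] = []
    for seq in seqs:
        for cluster in clusters:
            # Compare against the seed (first member) only.
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No seed was similar enough, so this sequence starts a new OTU.
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25              # toy 100-base sequence
    near = "T" + base[1:]           # 1 mismatch, 99% identical to base
    far = "TTTTT" + base[5:]        # 4 mismatches, 96% identical to base
    print(len(cluster_greedy([base, near, far], 0.97)))  # 2 OTUs at 97%
    print(len(cluster_greedy([base, near, far], 1.0)))   # 3 OTUs at 100%
```

In this toy input, raising the threshold from 97% to 100% splits one OTU into two, which also shows how a single sequencing error can create an extra unit at strict thresholds.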

OTU clustering approaches

There are three main approaches to clustering OTUs:{{cite journal | vauthors = Kopylova E, Navas-Molina JA, Mercier C, Xu ZZ, Mahé F, He Y, Zhou HW, Rognes T, Caporaso JG, Knight R | display-authors = 6 | title = Open-Source Sequence Clustering Methods Improve the State Of the Art | journal = mSystems | volume = 1 | issue = 1 | pages = e00003–15 | date = 2016-02-23 | pmid = 27822515 | pmc = 5069751 | doi = 10.1128/mSystems.00003-15 | editor-first = Nicola | editor-last = Segata }}

  • De novo, for which the clustering is based on similarities between sequencing reads.
  • Closed-reference, for which the clustering is performed against a reference database of sequences.
  • Open-reference, where clustering is first performed against a reference database of sequences, then any remaining sequences that could not be mapped to the reference are clustered de novo.
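The three approaches listed above can be sketched as follows. This is a simplified illustration, assuming pre-aligned sequences of equal length and a fixed 97% threshold; the function names, the `denovo` label prefix, and the choice to return unmatched reads from the closed-reference step are all assumptions made for the example, not the behaviour of any specific pipeline.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(
    reads: List[str], reference: Dict[str, str], threshold: float = 0.97
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each read to its best-matching reference; collect the rest."""
    otus: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for read in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            otus.setdefault(best_id, []).append(read)
        else:
            unmatched.append(read)
    return otus, unmatched


def de_novo(reads: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Cluster reads against each other; the first read of a cluster is its seed."""
    otus: Dict[str, List[str]] = {}
    for read in reads:
        for members in otus.values():
            if identity(read, members[0]) >= threshold:
                members.append(read)
                break
        else:
            otus[f"denovo{len(otus)}"] = [read]
    return otus


def open_reference(
    reads: List[str], reference: Dict[str, str], threshold: float = 0.97
) -> Dict[str, List[str]]:
    """Closed-reference first, then de novo clustering of the unmatched reads."""
    otus, unmatched = closed_reference(reads, reference, threshold)
    otus.update(de_novo(unmatched, threshold))
    return otus
```

With a reference containing one sequence and a read set that includes two similar novel reads, `closed_reference` leaves the novel reads unassigned, whereas `open_reference` places them in a new `denovo0` cluster.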

OTU clustering algorithms

  • Hierarchical clustering algorithms (HCA): uclust,{{cite journal|last1=Edgar|first1=Robert C.|title=Search and clustering orders of magnitude faster than BLAST|journal=Bioinformatics|date=1 October 2010|volume=26|issue=19|pages=2460–2461|doi=10.1093/bioinformatics/btq461|pmid=20709691|language=en|issn=1367-4803|doi-access=free}} cd-hit{{cite journal |last1=Fu |first1=Limin |last2=Niu |first2=Beifang |last3=Zhu |first3=Zhengwei |last4=Wu |first4=Sitao |last5=Li |first5=Weizhong |title=CD-HIT: accelerated for clustering the next-generation sequencing data |journal=Bioinformatics |date=1 December 2012 |volume=28 |issue=23 |pages=3150–3152 |doi=10.1093/bioinformatics/bts565 |pmid=23060610 |pmc=3516142 }} and ESPRIT
  • Bayesian clustering: CROP{{cite journal | last1 = Hao | first1 = X. | last2 = Jiang | first2 = R. | last3 = Chen | first3 = T. | year = 2011| title = Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering. | journal = Bioinformatics | volume = 27 | issue = 5| pages = 611–618 | doi = 10.1093/bioinformatics/btq725 | pmid = 21233169 | pmc = 3042185 }}
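As a generic illustration of hierarchical clustering applied to OTU picking, the following sketch performs agglomerative clustering with complete linkage and stops merging once the closest pair of clusters is further apart than a distance cutoff (3% distance corresponds to 97% similarity). It assumes pre-aligned sequences of equal length, and the function names are hypothetical. It is not the implementation used by any of the tools listed above, which rely on much faster heuristics.

```python
from itertools import combinations
from typing import List


def distance(a: str, b: str) -> float:
    """Fraction of mismatching positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage(c1: List[str], c2: List[str]) -> float:
    """Distance between clusters = largest pairwise distance of their members."""
    return max(distance(a, b) for a in c1 for b in c2)


def hierarchical_otus(seqs: List[str], cutoff: float = 0.03) -> List[List[str]]:
    """Merge the two closest clusters repeatedly until none is within the cutoff."""
    clusters = [[s] for s in seqs]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (
                ((i, j), complete_linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if d > cutoff:
            break
        # i < j is guaranteed by combinations, so deleting j keeps i valid.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Complete linkage guarantees that every pair of sequences within an OTU is within the cutoff, so a chain of sequences that each differ by 2% from their neighbour is not merged into a single 97% OTU if its ends differ by 4%.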

References

{{Reflist|2}}

Further reading

  • {{cite journal | last1 = Chen | first1 = W. | last2 = Zhang | first2 = C. K. | last3 = Cheng | first3 = Y. | last4 = Zhang | first4 = S. | last5 = Zhao | first5 = H. | year = 2013| title = A comparison of methods for clustering 16S rRNA sequences into OTUs. | journal = PLOS ONE | volume = 8 | issue = 8| page = e70837 | doi = 10.1371/journal.pone.0070837 | pmid = 23967117 | pmc = 3742672 | bibcode = 2013PLoSO...870837C | doi-access = free }}

Category:Genomics

Category:Classification systems

Category:Classification algorithms

Category:Taxonomy (biology)