Phylogenomics
{{Short description|Intersection of the fields of evolution and genomics}}
Phylogenomics is the intersection of the fields of evolution and genomics.{{cite journal|title=Overview of the First Phylogenomics Conference|journal=BMC Ecol. Evol.|vauthors=Philippe H, Blanchette M|date=2007-02-08|doi=10.1186/1471-2148-7-S1-S1|volume=7|page=S1|doi-access=free |pmid=17288567 |bibcode=2007BMCEE...7S...1P |hdl=1866/709|hdl-access=free}} The term has been used in multiple ways to refer to analysis that involves genome data and evolutionary reconstructions.{{cite journal | vauthors = Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K | title = Statistics and truth in phylogenomics | journal = Molecular Biology and Evolution | volume = 29 | issue = 2 | pages = 457–472 | date = February 2012 | pmid = 21873298 | pmc = 3258035 | doi = 10.1093/molbev/msr202}} It is a group of techniques within the larger fields of phylogenetics and genomics. Phylogenomics draws information by comparing entire genomes, or at least large portions of genomes.{{cite journal | vauthors = Pennisi E | author-link=Elizabeth Pennisi | title = Evolution. Building the tree of life, genome by genome | journal = Science | volume = 320 | issue = 5884 | pages = 1716–1717 | date = June 2008 | pmid = 18583591 | doi = 10.1126/science.320.5884.1716 | s2cid = 206580993}} Phylogenetics compares and analyzes the sequences of single genes, or a small number of genes, as well as many other types of data. Four major areas fall under phylogenomics:
- Prediction of gene function
- Establishment and clarification of evolutionary relationships
- Gene family evolution
- Prediction and retracing lateral gene transfer.
The ultimate goal of phylogenomics is to reconstruct the evolutionary history of species through their genomes. This history is usually inferred from a set of genomes by using a model of genome evolution and standard statistical inference methods (e.g. Bayesian inference or maximum likelihood estimation).{{Cite book| vauthors = Simion P, Delsuc F, Philippe H |title=Phylogenetics in the Genomic Era|year=2020|url=https://hal.inria.fr/PGE|pages=2.1.1–2.1.34|chapter=2.1 To What Extent Current Limits of Phylogenomics Can Be Overcome?}}
Prediction of gene function
When Jonathan Eisen originally coined the term phylogenomics, it applied to the prediction of gene function. Before the use of phylogenomic techniques, gene function was predicted primarily by comparing the gene sequence with the sequences of genes with known functions. When several genes with similar sequences but differing functions are involved, this method alone is ineffective in determining function. A specific example is presented in the paper "Gastrogenomic delights: a movable feast".{{cite journal | vauthors = Eisen JA, Kaiser D, Myers RM | title = Gastrogenomic delights: a movable feast | journal = Nature Medicine | volume = 3 | issue = 10 | pages = 1076–1078 | date = October 1997 | pmid = 9334711 | pmc = 3155951 | doi = 10.1038/nm1097-1076 }} Gene predictions based on sequence similarity alone had been used to predict that Helicobacter pylori can repair mismatched DNA.{{cite journal | vauthors = Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA, Nelson K, Quackenbush J, Zhou L, Kirkness EF, Peterson S, Loftus B, Richardson D, Dodson R, Khalak HG, Glodek A, McKenney K, Fitzegerald LM, Lee N, Adams MD, Hickey EK, Berg DE, Gocayne JD, Utterback TR, Peterson JD, Kelley JM, Cotton MD, Weidman JM, Fujii C, Bowman C, Watthey L, Wallin E, Hayes WS, Borodovsky M, Karp PD, Smith HO, Fraser CM, Venter JC | display-authors = 6 | title = The complete genome sequence of the gastric pathogen Helicobacter pylori | journal = Nature | volume = 388 | issue = 6642 | pages = 539–547 | date = August 1997 | pmid = 9252185 | doi = 10.1038/41483 | doi-access = free | bibcode = 1997Natur.388..539T }} This prediction was based on the fact that the organism has a gene whose sequence is highly similar to genes from other species in the "MutS" gene family, many of which are known to be involved in mismatch repair. However, Eisen noted that H. pylori lacks other genes thought to be essential for this function (specifically, members of the MutL family). Eisen suggested a solution to this apparent discrepancy: phylogenetic trees of genes in the MutS family revealed that the gene found in H. pylori was not in the same subfamily as those known to be involved in mismatch repair. Furthermore, he suggested that this "phylogenomic" approach could be used as a general method for predicting the functions of genes. This approach was formally described in 1998.{{cite journal | vauthors = Eisen JA | title = Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis | journal = Genome Research | volume = 8 | issue = 3 | pages = 163–167 | date = March 1998 | pmid = 9521918 | doi = 10.1101/gr.8.3.163 | doi-access = free }} For reviews of this aspect of phylogenomics, see Brown and Sjölander (2006) and Sjölander (2004).{{cite journal | vauthors = Brown D, Sjölander K | title = Functional classification using phylogenomic inference | journal = PLOS Computational Biology | volume = 2 | issue = 6 | pages = e77 | date = June 2006 | pmid = 16846248 | pmc = 1484587 | doi = 10.1371/journal.pcbi.0020077 | bibcode = 2006PLSCB...2...77B | doi-access = free }}{{cite journal | vauthors = Sjölander K | title = Phylogenomic inference of protein molecular function: advances and challenges | journal = Bioinformatics | volume = 20 | issue = 2 | pages = 170–179 | date = January 2004 | pmid = 14734307 | doi = 10.1093/bioinformatics/bth021 | doi-access = free }}
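The contrast between similarity-based and tree-based function prediction can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Eisen's published procedure: the gene names, the similarity scores, the function labels and the helper names (`predict_by_top_hit`, `predict_by_tree`) are all hypothetical, and the gene tree is assumed to be already inferred and rooted.

```python
# Toy gene tree as nested tuples. All gene names, scores and annotations
# below are invented for illustration only.
TREE = (
    (("a1", "a2"), "a3"),        # hypothetical subfamily I
    (("b1", "b2"), "query"),     # hypothetical subfamily II, contains the query
)
KNOWN_FUNCTION = {
    "a1": "mismatch repair", "a2": "mismatch repair", "a3": "mismatch repair",
    "b1": "other function", "b2": "other function",
}
# Hypothetical sequence similarity of the query to each annotated gene.
SIMILARITY_TO_QUERY = {"a1": 0.61, "a2": 0.55, "a3": 0.52, "b1": 0.58, "b2": 0.57}


def leaves(node):
    """Return the leaf names below a node of the nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def predict_by_top_hit(similarity, known):
    """Similarity-only prediction: copy the function of the best-scoring gene."""
    best = max(similarity, key=similarity.get)
    return known[best]


def predict_by_tree(tree, query, known):
    """Tree-based prediction: functions found in the smallest clade that
    contains the query and at least one annotated gene."""
    if query not in leaves(tree):
        raise ValueError("query gene is not in the tree")
    node, prediction = tree, set()
    while not isinstance(node, str):
        functions = {known[g] for g in leaves(node) if g in known}
        if functions:
            prediction = functions
        # Step down into the child clade that still contains the query.
        node = next(child for child in node if query in leaves(child))
    return prediction


print(predict_by_top_hit(SIMILARITY_TO_QUERY, KNOWN_FUNCTION))
print(predict_by_tree(TREE, "query", KNOWN_FUNCTION))
```

In this toy data the best similarity hit belongs to subfamily I, while the tree places the query in subfamily II, so the two approaches disagree in the way described above.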
Prediction and retracing lateral gene transfer
Traditional phylogenetic techniques have difficulty distinguishing genes that are similar because of lateral gene transfer from those that are similar because the organisms share an ancestor. By comparing large numbers of genes or entire genomes among many species, it is possible to identify transferred genes, since these sequences behave differently from what is expected given the taxonomy of the organism. Using these methods, researchers were able to identify over 2,000 metabolic enzymes acquired by various eukaryotic parasites through lateral gene transfer.{{cite journal | vauthors = Whitaker JW, McConkey GA, Westhead DR | title = The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes | journal = Genome Biology | volume = 10 | issue = 4 | pages = R36 | year = 2009 | pmid = 19368726 | pmc = 2688927 | doi = 10.1186/gb-2009-10-4-r36 | doi-access = free }}
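The idea that transferred genes "behave differently from what is expected given the taxonomy" can be sketched as a crude screen in Python. This is only an illustrative heuristic, not the method of the cited study: the function name `flag_candidate_transfers`, the 0.8 threshold, the gene names and the neighbour lists are all assumptions, and real analyses compare gene trees against a species tree rather than counting nearest neighbours.

```python
def flag_candidate_transfers(nearest_neighbours, expected_group, min_fraction=0.8):
    """Flag genes whose closest relatives mostly lie outside the expected group.

    nearest_neighbours maps a gene name to the taxonomic groups of its
    closest homologues (for example, taken from a gene tree).
    Returns {gene: fraction of neighbours outside expected_group}.
    """
    flagged = {}
    for gene, groups in nearest_neighbours.items():
        if not groups:
            continue  # no homologues found, nothing to judge
        outside = sum(1 for group in groups if group != expected_group)
        fraction = outside / len(groups)
        if fraction >= min_fraction:
            flagged[gene] = fraction
    return flagged


# Hypothetical genes from a eukaryotic genome and the groups of their
# closest homologues; the data are invented for illustration.
NEIGHBOURS = {
    "gene_A": ["eukaryote"] * 5,
    "gene_B": ["bacteria", "bacteria", "bacteria", "bacteria", "eukaryote"],
    "gene_C": ["bacteria", "eukaryote", "eukaryote", "eukaryote"],
}

print(flag_candidate_transfers(NEIGHBOURS, expected_group="eukaryote"))
```

Only `gene_B` is flagged here, because four of its five nearest homologues come from outside the organism's own group.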
Gene family evolution
Comparing the complete gene sets of a group of organisms allows the identification of events in gene evolution such as gene duplication or gene deletion. Such events are often evolutionarily relevant. For example, multiple duplications of genes encoding degradative enzymes of certain families are a common adaptation in microbes to new nutrient sources. Conversely, loss of genes is important in reductive evolution, such as in intracellular parasites or symbionts. Whole genome duplication events, which potentially duplicate all the genes in a genome at once, are drastic evolutionary events with great relevance in the evolution of many clades, and their signal can be traced with phylogenomic methods.
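A first step in such comparisons is often to tabulate how many members of each gene family each genome contains. The Python sketch below assumes that genes have already been assigned to families; the family names, the copy numbers and the function `compare_family_sizes` are hypothetical. Copy-number differences alone do not show on which lineage a duplication or loss occurred; that usually requires reconciling gene trees with a species tree.

```python
def compare_family_sizes(reference, focal):
    """Compare gene-family copy numbers between two genomes.

    Both arguments map a family name to its copy number. Returns
    {family: (reference_copies, focal_copies, status)}.
    """
    report = {}
    for family in sorted(set(reference) | set(focal)):
        ref_n = reference.get(family, 0)
        foc_n = focal.get(family, 0)
        if foc_n == ref_n:
            status = "unchanged"
        elif ref_n == 0:
            status = "gained"
        elif foc_n == 0:
            status = "lost"
        elif foc_n > ref_n:
            status = "expanded"
        else:
            status = "reduced"
        report[family] = (ref_n, foc_n, status)
    return report


# Invented copy numbers for a free-living relative and a symbiont.
FREE_LIVING = {"family_1": 3, "family_2": 12, "family_3": 50}
SYMBIONT = {"family_2": 4, "family_3": 50, "family_4": 2}

for family, row in compare_family_sizes(FREE_LIVING, SYMBIONT).items():
    print(family, row)
```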
Establishment of evolutionary relationships
Traditional single-gene studies are effective in establishing phylogenetic trees among closely related organisms, but have drawbacks when comparing more distantly related organisms or microorganisms. This is because of lateral gene transfer, convergence, and varying rates of evolution for different genes. When entire genomes are used in these comparisons, the anomalies created by these factors are overwhelmed by the pattern of evolution indicated by the majority of the data.{{cite journal | vauthors = Delsuc F, Brinkmann H, Philippe H | title = Phylogenomics and the reconstruction of the tree of life | journal = Nature Reviews. Genetics | volume = 6 | issue = 5 | pages = 361–375 | date = May 2005 | pmid = 15861208 | doi = 10.1038/nrg1603 | s2cid = 16379422 | citeseerx = 10.1.1.333.1615 }}{{cite journal | vauthors = Philippe H, Snell EA, Bapteste E, Lopez P, Holland PW, Casane D | title = Phylogenomics of eukaryotes: impact of missing data on large alignments | journal = Molecular Biology and Evolution | volume = 21 | issue = 9 | pages = 1740–1752 | date = September 2004 }}{{cite journal | vauthors = Jeffroy O, Brinkmann H, Delsuc F, Philippe H | title = Phylogenomics: the beginning of incongruence? | journal = Trends in Genetics | volume = 22 | issue = 4 | pages = 225–231 | date = April 2006 | pmid = 16490279 | doi = 10.1016/j.tig.2006.02.003 | url = https://hal.archives-ouvertes.fr/halsde-00315496/file/Jeffroy-TrendsGenet06_HAL.pdf }} Using this method, it is theoretically possible to create fully resolved phylogenetic trees, and divergence times can be estimated more accurately.{{cite journal | vauthors = dos Reis M, Inoue J, Hasegawa M, Asher RJ, Donoghue PC, Yang Z | title = Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny | journal = Proceedings. Biological Sciences | volume = 279 | issue = 1742 | pages = 3491–3500 | date = September 2012 | pmid = 22628470 | pmc = 3396900 | doi = 10.1098/rspb.2012.0683 }}{{cite journal | vauthors = Kober KM, Bernardi G | title = Phylogenomics of strongylocentrotid sea urchins | journal = BMC Evolutionary Biology | volume = 13 | pages = 88 | date = April 2013 | issue = 1 | pmid = 23617542 | pmc = 3637829 | doi = 10.1186/1471-2148-13-88 | doi-access = free | bibcode = 2013BMCEE..13...88K }} In practice, however, full resolution is not always achieved. When the data are insufficient, the same data set can support different trees depending on the analysis method used.{{cite journal|last=Philippe|first=Hervé|author2=Delsuc, Frederic |author3=Brinkmann, Henner |author4= Lartillot, Nicolas |title=Phylogenomics|journal=Annual Review of Ecology, Evolution, and Systematics|year=2005|volume=36|pages=541–562|doi=10.1146/annurev.ecolsys.35.112202.130205}}
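The notion that the majority signal outweighs gene-specific anomalies can be illustrated by counting how often each clade appears across a set of gene trees and keeping those found in more than half of them. The Python sketch below is a simplification under stated assumptions: the trees are invented, they are treated as rooted nested tuples, and the helper names `clades` and `majority_clades` are hypothetical. Real analyses typically work with unrooted bipartitions and explicit statistical models.

```python
from collections import Counter


def clades(tree):
    """Return the set of non-trivial clades (as frozensets of leaf names)
    in a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the clade of all taxa carries no information
    return found


def majority_clades(gene_trees):
    """Return clades present in more than half of the gene trees."""
    counts = Counter(clade for tree in gene_trees for clade in clades(tree))
    return {clade for clade, n in counts.items() if n > len(gene_trees) / 2}


# Four invented gene trees for species A-D; the third and fourth disagree
# with the first two, for example because of transfer or rate variation.
GENE_TREES = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
]

for clade in sorted(majority_clades(GENE_TREES), key=len):
    print(sorted(clade))
```

With these toy trees the groupings A+B and A+B+C each occur in three of four trees and are retained, while the conflicting groupings A+C and C+D occur only once and are discarded.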
Notable results of phylogenomics (in the sense of massive multigene phylogenies):
- Using 135 genes from 65 different species of photosynthetic organisms, a study found that most photosynthetic eukaryotes are linked and possibly share a single ancestor. These include plants, alveolates, rhizarians, haptophytes and cryptomonads. This grouping has been referred to as the Plants+HC+SAR megagroup. The study concatenated the genes in what is called a "supermatrix" approach.{{cite journal | vauthors = Burki F, Shalchian-Tabrizi K, Pawlowski J | title = Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes | journal = Biology Letters | volume = 4 | issue = 4 | pages = 366–369 | date = August 2008 | pmid = 18522922 | pmc = 2610160 | doi = 10.1098/rsbl.2008.0224 }}
- The root of the bacterial tree of life and the extent of horizontal gene transfer were determined by tracing the evolution of 11,272 gene families. This is a "supertree" approach.{{cite journal |last1=Coleman |first1=Gareth A. |last2=Davín |first2=Adrián A. |last3=Mahendrarajah |first3=Tara A. |last4=Szánthó |first4=Lénárd L. |last5=Spang |first5=Anja |last6=Hugenholtz |first6=Philip |last7=Szöllősi |first7=Gergely J. |last8=Williams |first8=Tom A. |title=A rooted phylogeny resolves early bacterial evolution |journal=Science |date=7 May 2021 |volume=372 |issue=6542 |doi=10.1126/science.abe0511}}
- The root of the archaeal tree of life was determined using a 45-protein supermatrix analysis and a 3242-protein supertree analysis. The 31,236 gene families in archaea were then mapped onto the tree to infer which genes the ancestral archaea may have had.{{cite journal |last1=Williams |first1=Tom A. |last2=Szöllősi |first2=Gergely J. |last3=Spang |first3=Anja |last4=Foster |first4=Peter G. |last5=Heaps |first5=Sarah E. |last6=Boussau |first6=Bastien |last7=Ettema |first7=Thijs J. G. |last8=Embley |first8=T. Martin |title=Integrative modeling of gene and genome evolution roots the archaeal tree of life |journal=Proceedings of the National Academy of Sciences |date=6 June 2017 |volume=114 |issue=23 |doi=10.1073/pnas.1618463114}}
- Using 120 proteins from bacteria or 53 proteins from archaea (supermatrix), the Genome Taxonomy Database generates a taxonomy of all bacteria and archaea with high-quality sequenced genomes.{{cite journal |last1=Parks |first1=DH |last2=Chuvochina |first2=M |last3=Chaumeil |first3=PA |last4=Rinke |first4=C |last5=Mussig |first5=AJ |last6=Hugenholtz |first6=P |title=A complete domain-to-species taxonomy for Bacteria and Archaea. |journal=Nature Biotechnology |date=September 2020 |volume=38 |issue=9 |pages=1079–1086 |doi=10.1038/s41587-020-0501-8 |pmid=32341564 |url=https://www.researchgate.net/publication/340954053 |biorxiv=10.1101/771964|s2cid=216560589 }}
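The "supermatrix" approach mentioned in the list above amounts to joining per-gene alignments end to end, one row per taxon, and padding taxa that lack a gene with gap characters. The following Python sketch shows that bookkeeping only; it assumes each gene is already aligned, and the function name `build_supermatrix`, the taxon names and the sequences are hypothetical. It is not the pipeline of any of the cited studies.

```python
def build_supermatrix(gene_alignments, missing="-"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where supermatrix maps each taxon
    to its concatenated sequence and partitions lists
    (gene, first_column, last_column) using 1-based inclusive columns.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 0
    for gene in sorted(gene_alignments):
        alignment = gene_alignments[gene]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa missing this gene are filled with gap characters.
            supermatrix[taxon] += alignment.get(taxon, missing * length)
        partitions.append((gene, start + 1, start + length))
        start += length
    return supermatrix, partitions


# Invented toy alignments; sp2 lacks gene_B.
ALIGNMENTS = {
    "gene_A": {"sp1": "ATGC", "sp2": "ATGA", "sp3": "ATGG"},
    "gene_B": {"sp1": "GGA", "sp3": "GGT"},
}

matrix, parts = build_supermatrix(ALIGNMENTS)
for taxon, sequence in matrix.items():
    print(taxon, sequence)
print(parts)
```

Keeping the partition boundaries makes it possible to apply a separate substitution model to each gene when the concatenated alignment is later analyzed. A supertree approach, by contrast, infers a tree per gene first and combines the trees rather than the alignments.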
See also
- Archaeopteryx (phylogenomics software)
- Microbial phylogenetics
- Phylogenetics
- Sequence alignment
- Supertree
References
{{reflist|25em}}
Further reading
{{refbegin}}
- {{cite journal |last1=Williams |first1=Tom A |last2=Davin |first2=Adrian A |last3=Szánthó |first3=Lénárd L |last4=Stamatakis |first4=Alexandros |last5=Wahl |first5=Noah A |last6=Woodcroft |first6=Ben J |last7=Soo |first7=Rochelle M |last8=Eme |first8=Laura |last9=Sheridan |first9=Paul O |last10=Gubry-Rangin |first10=Cecile |last11=Spang |first11=Anja |last12=Hugenholtz |first12=Philip |last13=Szöllősi |first13=Gergely J |title=Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution |journal=The ISME Journal |date=8 January 2024 |volume=18 |issue=1 |doi=10.1093/ismejo/wrae129}}
- {{cite journal |last1=Zhou |first1=Xiaofan |last2=Shen |first2=Xing-Xing |last3=Hittinger |first3=Chris Todd |last4=Rokas |first4=Antonis |title=Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets |journal=Molecular Biology and Evolution |date=1 February 2018 |volume=35 |issue=2 |pages=486–503 |doi=10.1093/molbev/msx302}} (compares RAxML/ExaML, PhyML, IQ-TREE, and FastTree)
{{refend}}
{{Phylogenetics}}