supertree
{{Short description|Phylogenetic tree combining multiple sub-trees}}
A supertree is a single phylogenetic tree assembled from a combination of smaller phylogenetic trees, which may have been assembled using different datasets (e.g. morphological and molecular) or a different selection of taxa.{{Cite journal| last1 = Bansal | first1 = M.| last2 = Burleigh | first2 = J.| last3 = Eulenstein | first3 = O.| last4 = Fernández-Baca | first4 = D.| title = Robinson-Foulds supertrees| journal = Algorithms for Molecular Biology| volume = 5| pages = 18| year = 2010| pmid = 20181274| pmc = 2846952| doi = 10.1186/1748-7188-5-18| doi-access = free}} Supertree algorithms can highlight areas where additional data would most usefully resolve any ambiguities.{{cite web|url=http://genome.cs.iastate.edu/supertree/introduction/intro_content.html|title=Supertree: Introduction|publisher=genome.cs.iastate.edu}} The input trees of a supertree should behave as samples from the larger tree.{{Cite journal| last1 = Gordon | first1 = A.| title = Consensus supertrees: the synthesis of rooted trees containing overlapping sets of labeled leaves| journal = Journal of Classification| volume = 3| issue = 2| pages = 335–348| year = 1986| doi = 10.1007/BF01894195| s2cid = 122146129}}
Construction methods
The number of possible supertrees grows faster than exponentially with the number of taxa included, so for a tree of any reasonable size it is not possible to examine every candidate supertree and score how well it combines the input information. Heuristic methods are therefore essential, although they may be unreliable: the result is often biased or affected by irrelevant characteristics of the input data.
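The growth in the search space can be made concrete with the standard count of rooted, fully resolved (binary) tree topologies on n labelled taxa, the double factorial (2n − 3)!!. The short sketch below computes it; the function name is illustrative only.

```python
def count_rooted_binary_trees(n: int) -> int:
    """Number of distinct rooted, fully resolved (binary) tree topologies
    on n labelled taxa, given by the double factorial (2n - 3)!!."""
    if n < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


for n in (4, 10, 20, 50):
    print(n, count_rooted_binary_trees(n))
```

For 10 taxa this already gives 34,459,425 topologies, and each additional taxon multiplies the count by the next odd number, which is why exhaustive search is abandoned in favour of heuristics.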
The best-known method for supertree construction is Matrix Representation with Parsimony (MRP). Each input source tree is encoded as a matrix of 0s, 1s and ?s: every edge in a source tree defines a bipartition of its leaf set into two disjoint parts, the leaves on one side are coded 0, the leaves on the other side 1, and taxa missing from that source tree ?. The matrices are then concatenated and analyzed using heuristics for maximum parsimony.{{Cite journal|title = Phylogenetic inference based on matrix representation of trees| journal = Molecular Phylogenetics and Evolution | author = Mark A. Ragan |volume = 1|
number = 1 |
pages = 53–58 |
year = 1992|
issn = 1055-7903 |
doi = 10.1016/1055-7903(92)90035-F| pmid = 1342924 }}
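A minimal sketch of the matrix encoding, assuming rooted source trees written as nested Python tuples of taxon names. For a rooted tree, each internal edge is coded by the clade below it (1 inside the clade, 0 for the tree's other taxa, ? for taxa absent from that tree); edges leading to single leaves are skipped because they carry no grouping information. Some MRP implementations also add an all-zero outgroup row to root the result, which is omitted here. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Union

# A rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Set of taxa at the tips of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every internal edge (all internal nodes except the root)."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return  # edge to a single leaf: uninformative, skipped
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def mrp_matrix(trees: Sequence[Tree]) -> Dict[str, str]:
    """Concatenate the 0/1/? coding of every source tree into one matrix."""
    taxa = sorted(frozenset().union(*(leaves(t) for t in trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):  # one character (column) per internal edge
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in present:
                    rows[taxon].append("0")
                else:  # taxon absent from this source tree
                    rows[taxon].append("?")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


source_trees = [((("A", "B"), "C"), "D"), (("C", "E"), "D")]
for taxon, row in mrp_matrix(source_trees).items():
    print(taxon, row)
```

For these two source trees the rows are A `11?`, B `11?`, C `101`, D `000` and E `??1`: the first two columns come from the first tree and the last from the second, and E is coded ? wherever its source tree did not include it.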
Another approach is a maximum likelihood version of MRP called "MRL" (matrix representation with likelihood), which analyzes the same MRP matrix but uses heuristics for maximum likelihood instead of maximum parsimony to construct the supertree.
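The parsimony score that MRP heuristics try to minimize (and that MRL replaces with a likelihood) can be illustrated with Fitch's small-parsimony algorithm applied column by column to an MRP matrix. This is a sketch under stated assumptions: the candidate supertree is binary and rooted, written as nested tuples; the matrix is a dict of equal-length strings; and ? is treated as "either state". The tree search itself (for example branch swapping) is not shown, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, tuple]  # leaf name or a 2-tuple of subtrees (binary, rooted)


def fitch_changes(tree: Tree, column: Dict[str, str]) -> int:
    """Minimum number of 0<->1 changes for one character (Fitch's algorithm).
    A '?' at a leaf is treated as compatible with either state."""

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            state = column[node]
            return (frozenset("01") if state == "?" else frozenset(state)), 0
        left, right = node  # assumes a fully resolved (binary) tree
        left_states, left_cost = visit(left)
        right_states, right_cost = visit(right)
        common = left_states & right_states
        if common:
            return common, left_cost + right_cost
        # No shared state: take the union and count one change.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum of Fitch changes over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Example MRP matrix coding the source trees (((A,B),C),D) and ((C,E),D).
matrix = {"A": "11?", "B": "11?", "C": "101", "D": "000", "E": "??1"}
print(parsimony_score(((("A", "B"), ("C", "E")), "D"), matrix))
print(parsimony_score(((("A", "C"), ("B", "E")), "D"), matrix))
```

The first candidate agrees with both source trees and needs only one change per character (score 3); the second splits A from B and pays an extra change (score 4), so a parsimony search would prefer the first.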
The Robinson-Foulds distance is the most popular of many ways of measuring how similar a supertree is to the input trees. It counts the clades (or bipartitions) that are present in one tree but not in the other. Robinson-Foulds optimization methods search for a (binary) supertree that minimizes the total (summed) Robinson-Foulds distance between each input tree and the supertree restricted to that input tree's taxa. The supertree can therefore be viewed as a median of the input trees under the Robinson-Foulds distance. Alternative approaches infer median supertrees under other measures, e.g. ones based on triplet or quartet decompositions of the trees.{{Cite journal |last1=Ranwez |first1=Vincent |last2=Criscuolo |first2=Alexis |last3=Douzery |first3=Emmanuel J.P. |date=2010-06-15 |title=SuperTriplets: a triplet-based supertree approach to phylogenomics |url=https://academic.oup.com/bioinformatics/article/26/12/i115/283753 |journal=Bioinformatics |language=en |volume=26 |issue=12 |pages=i115–i123 |doi=10.1093/bioinformatics/btq196 |issn=1367-4811 |pmc=2881381 |pmid=20529895}}
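The objective can be sketched for rooted trees by comparing clusters (the leaf sets below internal nodes) after restricting the supertree to each input tree's taxa; for unrooted trees one would compare bipartitions instead. This is an illustrative sketch with hypothetical names that only evaluates the summed distance for a given candidate; the search for the minimizing supertree is not shown.

```python
from typing import FrozenSet, Sequence, Set, Union

Tree = Union[str, tuple]  # leaf name or tuple of subtrees (rooted)


def leaves(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree: Tree, keep: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Non-trivial clusters of the tree restricted to the taxa in keep.
    Singletons and the full restricted leaf set are excluded."""
    total = len(leaves(tree) & keep)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node]) & keep  # pruned leaves become empty
        below = frozenset().union(*(walk(child) for child in node))
        if 1 < len(below) < total:
            found.add(below)  # duplicates from pruned nodes collapse in the set
        return below

    walk(tree)
    return found


def rf_distance(tree_a: Tree, tree_b: Tree, keep: FrozenSet[str]) -> int:
    """Rooted RF distance: clusters present in exactly one of the two trees."""
    return len(clusters(tree_a, keep) ^ clusters(tree_b, keep))


def total_rf(supertree: Tree, inputs: Sequence[Tree]) -> int:
    """Sum over input trees of RF(input, supertree restricted to its taxa)."""
    return sum(rf_distance(supertree, t, leaves(t)) for t in inputs)


inputs = [((("A", "B"), "C"), "D"), (("C", "E"), "D")]
print(total_rf(((("A", "B"), ("C", "E")), "D"), inputs))
print(total_rf(((("A", "C"), ("B", "E")), "D"), inputs))
```

The first candidate displays both input trees exactly, so its summed distance is 0; the second replaces the input clade {A, B} with {A, C} and scores 2, so a Robinson-Foulds supertree method would prefer the first.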
A more recent innovation is the construction of maximum likelihood supertrees, together with the use of "input-tree-wise" likelihood scores to perform tests comparing two supertrees.{{Cite journal|title = L.U.St: a tool for approximated maximum likelihood supertree reconstruction|journal = BMC Bioinformatics|date = 2014-06-12|issn = 1471-2105|pmc = 4073192|pmid = 24925766|pages = 183|volume = 15|issue = 1|doi = 10.1186/1471-2105-15-183|first1 = Wasiu A.|last1 = Akanni|first2 = Christopher J.|last2 = Creevey|first3 = Mark|last3 = Wilkinson|first4 = Davide|last4 = Pisani | doi-access=free }}
Additional methods include the Min Cut Supertree approach,{{Cite journal| last1 = Semple | first1 = C.| title = A supertree method for rooted trees| journal = Discrete Applied Mathematics| volume = 105| issue = 1–3| pages = 147–158| year = 2000| doi = 10.1016/S0166-218X(00)00202-X| doi-access = free| citeseerx = 10.1.1.24.6784}} as well as Most Similar Supertree Analysis (MSSA), Distance Fit (DFIT) and Quartet Fit (QFIT), which are implemented in the software Clann.{{Cite journal|title = Clann: investigating phylogenetic information through supertree analyses|url = http://bioinformatics.oxfordjournals.org/content/21/3/390|journal = Bioinformatics|date = 2005-02-01|issn = 1367-4803|pmid = 15374874|pages = 390–392|volume = 21|issue = 3|doi = 10.1093/bioinformatics/bti020|first1 = C. J.|last1 = Creevey|first2 = J. O.|last2 = McInerney|doi-access = free}}{{Cite book|journal = |volume = 537|pages = 139–161|publisher = Humana Press|date = 2009-01-01|isbn = 978-1-58829-910-9|series = Methods in Molecular Biology|doi = 10.1007/978-1-59745-251-9_7|pmid = 19378143|editor-first = David|editor-last = Posada|chapter = Trees from Trees: Construction of Phylogenetic Supertrees Using Clann|last1 = Creevey|first1 = C. J.|last2 = McInerney|first2 = J. O.| title=Bioinformatics for DNA Sequence Analysis |chapter-url = http://pure.aber.ac.uk/ws/files/2544220/Creevey_McInerney_2009_Trees_from_trees_construction_of_phylogenetic_supertrees_using_clann.pdf}}
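The Min Cut Supertree method is a modification of the BUILD algorithm of Aho et al.: BUILD links taxa that share a proper cluster in some input tree, splits the taxa into the connected components of that graph, and recurses, failing when the graph is connected. Min Cut Supertree instead deletes edges of minimum cuts at that point, a step this sketch does not implement. The sketch below covers only plain BUILD, assuming rooted input trees as nested tuples; all names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Union

Tree = Union[str, tuple]  # leaf name or tuple of subtrees (rooted)


def leaves(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def top_clusters(tree: Tree, keep: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Leaf sets of the root's children in the tree restricted to keep."""
    node = tree
    while not isinstance(node, str):
        parts = [leaves(child) & keep for child in node]
        children = [c for c, p in zip(node, parts) if p]
        parts = [p for p in parts if p]
        if len(parts) != 1:
            return parts
        node = children[0]  # only one child survives restriction: descend
    return []


def components(taxa: FrozenSet[str], adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Connected components, ordered by their smallest taxon name."""
    seen: Set[str] = set()
    comps: List[FrozenSet[str]] = []
    for start in sorted(taxa):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(frozenset(comp))
    return comps


def build(taxa: Iterable[str], trees: List[Tree]) -> Optional[Tree]:
    """BUILD: a tree displaying every input tree, or None if none exists."""
    taxa = frozenset(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    adj: Dict[str, Set[str]] = {t: set() for t in taxa}
    for tree in trees:
        for cluster in top_clusters(tree, taxa):
            first, *rest = sorted(cluster)
            for other in rest:  # a star is enough to connect the cluster
                adj[first].add(other)
                adj[other].add(first)
    comps = components(taxa, adj)
    if len(comps) == 1:
        return None  # plain BUILD fails here; Min Cut Supertree would cut edges
    subtrees = [build(c, trees) for c in comps]
    if any(s is None for s in subtrees):
        return None
    return tuple(subtrees)


inputs = [((("A", "B"), "C"), "D"), (("C", "E"), "D")]
print(build({"A", "B", "C", "D", "E"}, inputs))
```

For these compatible inputs BUILD returns the least-resolved supertree ((A, B), C, E) joined with D, which displays both input trees; for conflicting inputs such as ((A, B), C) and ((A, C), B) it returns None, which is the case Min Cut Supertree is designed to handle.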
Application
Supertrees have been applied to produce phylogenies of many groups, notably the angiosperms,{{Cite journal
| last1 = Davies | first1 = T.
| last2 = Barraclough | first2 = T.
| last3 = Chase | first3 = M. |author3-link=Mark Wayne Chase
| last4 = Soltis | first4 = P. |author4-link=Pamela S. Soltis
| last5 = Soltis | first5 = D. |author5-link=Douglas E. Soltis
| last6 = Savolainen | first6 = V. |author6-link=Vincent Savolainen
| title = Darwin's abominable mystery: Insights from a supertree of the angiosperms
| journal = Proceedings of the National Academy of Sciences of the United States of America
| volume = 101
| issue = 7
| pages = 1904–1909
| year = 2004
| pmid = 14766971
| pmc = 357025
| doi = 10.1073/pnas.0308127100
|bibcode = 2004PNAS..101.1904D | doi-access = free
}} eukaryotes{{Cite journal
| last1 = Pisani | first1 = D.
| last2 = Cotton | first2 = J.
| last3 = McInerney | first3 = J.
| title = Supertrees disentangle the chimerical origin of eukaryotic genomes
| journal = Molecular Biology and Evolution
| volume = 24
| issue = 8
| pages = 1752–1760
| year = 2007
| pmid = 17504772
| doi = 10.1093/molbev/msm095
| doi-access = free
}} and mammals.{{Cite journal
| last1 = Bininda-Emonds | first1 = O.
| last2 = Cardillo | first2 = M.
| last3 = Jones | first3 = K.
| last4 = MacPhee | first4 = R.
| last5 = Beck | first5 = R.
| last6 = Grenyer | first6 = R.
| last7 = Price | first7 = S.
| last8 = Vos | first8 = R.
| last9 = Gittleman | first9 = J.
| last10 = Purvis | first10 = A.
| title = The delayed rise of present-day mammals
| journal = Nature
| volume = 446
| issue = 7135
| pages = 507–512
| year = 2007
| pmid = 17392779
| doi = 10.1038/nature05634
|bibcode = 2007Natur.446..507B | s2cid = 4314965
}} They have also been applied to larger-scale problems such as the origins of diversity, vulnerability to extinction,{{Cite journal| last1 = Davies | first1 = T.| last2 = Fritz | first2 = S.| last3 = Grenyer | first3 = R.| last4 = Orme | first4 = C.| last5 = Bielby | first5 = J.| last6 = Bininda-Emonds | first6 = O.| last7 = Cardillo | first7 = M.| last8 = Jones | first8 = K.| last9 = Gittleman | first9 = J.| last10 = Mace | first10 = G. M.| last11 = Purvis | first11 = A.| title = Phylogenetic trees and the future of mammalian biodiversity| journal = Proceedings of the National Academy of Sciences of the United States of America| volume = 105 Suppl 1| issue = Supplement_1| pages = 11556–11563| year = 2008| pmid = 18695230| pmc = 2556418| doi = 10.1073/pnas.0801917105|bibcode = 2008PNAS..10511556D | doi-access = free}} and evolutionary models of ecological structure.{{Cite journal| last1 = Webb | first1 = C. O.| last2 = Ackerly | first2 = D. D.| last3 = McPeek | first3 = M. A.| last4 = Donoghue | first4 = M. J.| title = Phylogenies and Community Ecology| journal = Annual Review of Ecology and Systematics| volume = 33| pages = 475–505| year = 2002| doi = 10.1146/annurev.ecolsys.33.010802.150448| s2cid = 535590}}
Further reading
- {{cite book
| url = https://books.google.com/books?id=8w8__RqKneQC&q=supertree&pg=PR13
| title = Phylogenetic supertrees: combining information to reveal the tree of life
| isbn = 978-1-4020-2328-6
| author1 = Bininda-Emonds, O. R. P
| year = 2004| publisher = Springer
}}
- {{Cite journal| last1 = Bininda-Emonds | first1 = O. R. P.| last2 = Gittleman | first2 = J. L.| last3 = Steel | first3 = M. A. | author3-link=Mike Steel (mathematician)| title = The (Super)Tree of Life: Procedures, Problems, and Prospects | jstor = 3069263 | journal = Annual Review of Ecology and Systematics | volume = 33 | pages = 265–289 | year = 2002 | doi = 10.1146/annurev.ecolsys.33.010802.150511}}