minimum evolution

{{Multiple issues|

{{lead rewrite|date=June 2020|reason=Context is a slow and non standard.}}

{{COI|date=June 2020}}

}}

Minimum evolution is a distance method employed in phylogenetics modeling. It shares with maximum parsimony the aspect of searching for the phylogeny that has the shortest total sum of branch lengths.{{cite book | last=Catanzaro | first= Daniele | title = Estimating phylogenies from molecular data, in Mathematical approaches to polymer sequence analysis and related problems | publisher=Springer, New York | date = 2010 }}{{cite journal | vauthors = Catanzaro D | title = The minimum evolution problem: Overview and classification | journal = Networks | volume = 53 | issue = 2 | pages = 112–125 | date = 2009 | doi = 10.1002/net.20280 | s2cid = 6018514 }}

The theoretical foundations of the minimum evolution (ME) criterion lay in the seminal works of both Kidd and Sgaramella-Zonta (1971){{cite journal | vauthors = Kidd KK, Sgaramella-Zonta LA | title = Phylogenetic analysis: Concepts and methods | journal = American Journal of Human Genetics | volume = 23 | pages = 235–252 | date = 1971 | issue = 3 | pmid = 5089842 | pmc = 1706731 }} and Rzhetsky and Nei (1993).{{cite journal | vauthors = Rzhetsky A, Nei M | title = Theoretical foundations of the minimum evolution method of phylogenetic inference | journal = Molecular Biology and Evolution | volume = 10 | pages = 21073–1095 | date = 1993 }} In these frameworks, the molecular sequences from taxa are replaced by a set of measures of their dissimilarity (i.e., the so-called "evolutionary distances") and a fundamental result states that if such distances were unbiased estimates of the true evolutionary distances from taxa (i.e., the distances that one would obtain if all the molecular data from taxa were available), then the true phylogeny of taxa would have an expected length shorter than any other possible phylogeny T compatible with those distances.

Relationships with and comparison with other methods

= Maximum parsimony =

It is worth noting here a subtle difference between the maximum-parsimony criterion and the ME criterion: while maximum-parsimony is based on an abductive heuristic, i.e., the plausibility of the simplest evolutionary hypothesis of taxa with respect to the more complex ones, the ME criterion is based on Kidd and Sgaramella-Zonta's conjectures that were proven true 22 years later by Rzhetsky and Nei. These mathematical results set the ME criterion free from the Occam's razor principle and confer it a solid theoretical and quantitative basis.

Maximum-parsimony criterion, which uses Hamming distance branch lengths, was shown to be statistically inconsistent in 1978. This led to an interest in statistically consistent alternatives such as ME.

= Neighbor joining =

Neighbor joining may be viewed as a greedy heuristic for the balanced minimum evolution (BME) criterion. Saito and Nei's 1987 NJ algorithm far predates the BME criterion of 2000. For two decades, researchers used NJ without a firm theoretical basis for why it works.{{cite journal |author=Gascuel O, Steel M |title=Neighbor-joining revealed |journal=Mol Biol Evol |volume=23 |issue=11 |pages=1997–2000 |year=2006 |doi=10.1093/molbev/msl072 |pmid=16877499|url=https://hal-lirmm.ccsd.cnrs.fr/lirmm-00136653/document |doi-access=free }}

Statistical consistency

The ME criterion is known to be statistically consistent whenever the branch lengths are estimated via the Ordinary Least-Squares (OLS) or via linear programming.{{cite book | vauthors = Desper R, Gascuel O | title = The minimum evolution distance-based approach to phylogenetic inference in Mathematics of Evolution and Phylogeny | publisher=Oxford University Press, New York | date = 2005 }}{{cite journal | vauthors = Catanzaro D, Aringhieri R, Di Summa M, Pesenti R | title = A branch-price-and-cut algorithm for the minimum evolution problem | journal = European Journal of Operational Research | volume = 244 | issue = 3 | pages = 753–765 | date = 2015 | doi = 10.1016/j.ejor.2015.02.019 | s2cid = 1549028 }}

However, as observed in Rzhetsky & Nei's article, the phylogeny having the minimum length under the OLS branch length estimation model may be characterized, in some circumstance, by negative branch lengths, which unfortunately are empty of biological meaning.

To solve this drawback, Pauplin{{cite journal | vauthors = Pauplin Y | s2cid = 8619412 | title = Direct calculation of a tree length using a distance matrix | journal = Journal of Molecular Evolution | volume = 51 | pages = 41–47 | date = 2000| issue = 1 | doi = 10.1007/s002390010065 | pmid = 10903371 | bibcode = 2000JMolE..51...41P }} proposed to replace OLS with a new particular branch length estimation model, known as balanced basic evolution (BME). Richard Desper and Olivier Gascuel{{cite journal | vauthors = Desper R, Gascuel O | title = Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting | journal = Molecular Biology and Evolution | volume = 21 | issue = 3 | pages = 587–98 | date = March 2004 | pmid = 14694080 | doi = 10.1093/molbev/msh049 | doi-access = free }} showed that the BME branch length estimation model ensures the general statistical consistency of the minimum length phylogeny as well as the non-negativity of its branch lengths, whenever the estimated evolutionary distances from taxa satisfy the triangle inequality.

Le Sy Vinh and Arndt von Haeseler{{cite journal | vauthors = Vihn LS, von Haeseler A | title = Shortest triplet clustering: Reconstructing large phylogenies using representative sets | journal = BMC Bioinformatics | volume = 6 | pages = 1–14 | date = 2005 | doi = 10.1186/1471-2105-6-92 | pmid = 15819989 | pmc = 1097715 | doi-access = free }} have shown, by means of massive and systematic simulation experiments, that the accuracy of the ME criterion under the BME branch length estimation model is by far the highest in distance methods and not inferior to those of alternative criteria based e.g., on Maximum Likelihood or Bayesian Inference. Moreover, as shown by Daniele Catanzaro, Martin Frohn and Raffaele Pesenti,{{cite journal | vauthors = Catanzaro D, Frohn M, Pesenti R | title = An information theory perspective on the Balanced Minimum Evolution Problem | journal = Operations Research Letters | volume = 48 | issue = 3 | pages = 362–367 | date = 2020 | doi = 10.1016/j.orl.2020.04.010 | s2cid = 218998400 }} the minimum length phylogeny under the BME branch length estimation model can be interpreted as the (Pareto optimal) consensus tree between concurrent minimum entropy processes encoded by a forest of n phylogenies rooted on the n analyzed taxa. This particular information theory-based interpretation is conjectured to be shared by all distance methods in phylogenetics.

Algorithmic aspects

The "minimum evolution problem" (MEP), in which a minimum-summed-length phylogeny is derived from a set of sequences under the ME criterion, is said to be NP-hard.{{cite journal | vauthors = Catanzaro D, Labbé M, Pesenti R, Salazar-González JJ | title = Mathematical models to reconstruct phylogenetic trees under the minimum evolution criterion | journal = Networks | volume = 53 | issue = 2 | pages = 126–140 | date = 2009 | doi = 10.1002/net.20281 | s2cid = 17792339 }}{{cite journal | vauthors = Catanzaro D, Aringhieri R, Di Summa M, Pesenti R | title = A branch-price-and-cut algorithm for the minimum evolution problem | journal = European Journal of Operational Research | volume = 244 | issue = 3 | pages = 753–765 | date = 2015 | doi = 10.1016/j.ejor.2015.02.019 | s2cid = 1549028 }} The "balanced minimum evolution problem" (BMEP), which uses the newer BME criterion, is APX-hard.{{cite journal |last1=Catanzaro |first1=Daniele |last2=Frohn |first2=Martin |last3=Gascuel |first3=Olivier |last4=Pesenti |first4=Raffaele |title=A tutorial on the balanced minimum evolution problem |journal=European Journal of Operational Research |date=July 2022 |volume=300 |issue=1 |pages=1–19 |doi=10.1016/j.ejor.2021.08.004|doi-access=free |hdl=10278/3742270 |hdl-access=free }}

A number of exact algorithms solving BMEP have been described.{{cite journal | vauthors = Aringhieri R, Catanzaro D, Di Summa M | title = Optimal solutions for the balanced minimum evolution problem | journal = Computers and Operations Research | volume = 38 | issue = 12 | pages = 1845–1854 | date = 2011 | doi = 10.1016/j.cor.2011.02.020 | hdl = 2318/86826 | s2cid = 9514013 | hdl-access = free }}{{cite journal | vauthors = Catanzaro D, Labbé M, Pesenti R, Salazar-González JJ | title = The balanced minimum evolution problem | journal = INFORMS Journal on Computing | volume = 24 | issue = 2 | pages = 276–294 | date = 2012 | doi = 10.1287/ijoc.1110.0455 }}{{cite journal | vauthors = Catanzaro D, Labbé M, Pesenti R | title = The balanced minimum evolution problem under uncertain data | journal = Discrete Applied Mathematics | volume = 161 | issue = 13–14 | pages = 1789–1804 | date = 2013 | doi = 10.1016/j.dam.2013.03.012 | doi-access = free }}{{cite journal | vauthors = Catanzaro D, Pesenti R | title = Enumerating Vertices of the Balanced Minimum Evolution Polytope | journal = Computers and Operations Research | volume = 109 | pages = 209–217 | date = 2019 | doi = 10.1016/j.cor.2019.05.001 | s2cid = 164835227 }} The best known exact algorithm{{cite journal | vauthors = Catanzaro D, Pesenti R, Wolsey L | title = On the Balanced Minimum Evolution Polytope | journal = Discrete Optimization | volume = 36 | date = 2020 | page = 100570 | doi = 10.1016/j.disopt.2020.100570 | s2cid = 213389485 }} remains impractical for more than a dozen taxa, even with multiprocessing. There is only one approximation algorithm with proven error bounds, published in 2012.

In practical use, BMEP is overwhelmingly implemented by heuristic search. The basic, aforementioned neighbor-joining algorithm implements a greedy version of BMEP. FastME, the "state-of-the-art", starts with a rough tree then improves it using a set of topological moves such as Nearest Neighbor Interchanges (NNI). Compared to NJ, it is about as fast and more accurate.{{cite web |title=ATGC: FastME |url=http://www.atgc-montpellier.fr/fastme/ |website=www.atgc-montpellier.fr}} Metaheuristics have also been used.{{cite journal | vauthors = Catanzaro D, Pesenti R, Milinkovitch MC | title = An ant colony optimization algorithm for phylogenetic estimation under the minimum evolution principle | journal = BMC Evolutionary Biology | volume = 7 | pages = 228 | date = 2007| doi = 10.1186/1471-2148-7-228 | pmid = 18005416 | pmc = 2211314 | doi-access = free }}

See also

References

{{reflist}}

Further reading

  • {{cite journal | vauthors = Catanzaro D, Pesenti R, Wolsey L | title = On the Balanced Minimum Evolution Polytope | journal = Discrete Optimization | volume = 36 | date = 2020| page = 100570 | doi = 10.1016/j.disopt.2020.100570 | s2cid = 213389485 }}

{{Phylogenetics}}

{{DEFAULTSORT:Minimum Evolution}}

Category:Phylogenetics

Category:Computational phylogenetics