Topological data analysis

{{short description|Analysis of datasets using techniques from topology}}

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.{{citation needed|date=September 2022}}

The initial motivation is to study the shape of data. TDA has combined algebraic topology and other tools from pure mathematics to allow mathematically rigorous study of "shape". The main tool is persistent homology, an adaptation of homology to point cloud data. Persistent homology has been applied to many types of data across many fields. Moreover, its mathematical foundation is also of theoretical importance. The unique features of TDA make it a promising bridge between topology and geometry.{{citation needed|date=September 2022}}

Basic theory

= Intuition =

TDA is premised on the idea that the shape of data sets contains relevant information. Real high-dimensional data is typically sparse, and tends to have relevant low dimensional features. One task of TDA is to provide a precise characterization of this fact. For example, the trajectory of a simple predator-prey system governed by the Lotka–Volterra equations{{Cite journal|title = Topological data analysis|journal = Inverse Problems|date = 2011-12-01|volume = 27|issue = 12|doi = 10.1088/0266-5611/27/12/120201|first1 = Charles|last1 = Epstein|author1-link=Charles Epstein (mathematician)|first2 = Gunnar|last2 = Carlsson|author2-link=Gunnar Carlsson|first3 = Herbert|last3 = Edelsbrunner|author3-link=Herbert Edelsbrunner|pages=120201|arxiv = 1609.08227|bibcode = 2011InvPr..27a0101E| s2cid=250913810 }} forms a closed circle in state space. TDA provides tools to detect and quantify such recurrent motion.{{Cite web|title=diva-portal.org/smash/record.jsf?pid=diva2%253A575329&dswid=4297 |url=http://www.diva-portal.org/smash/record.jsf?pid=diva2%253A575329&dswid=4297 |website=www.diva-portal.org |access-date=2015-11-05 |url-status=dead |archive-url=https://web.archive.org/web/20151119021029/http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A575329 |archive-date=November 19, 2015 }}

Many algorithms for data analysis, including those used in TDA, require setting various parameters. Without prior domain knowledge, the correct collection of parameters for a data set is difficult to choose. The main insight of persistent homology is to use the information obtained from all parameter values by encoding this huge amount of information into an understandable and easy-to-represent form. With TDA, there is a mathematical interpretation when the information is a homology group. In general, the assumption is that features that persist for a wide range of parameters are "true" features. Features persisting for only a narrow range of parameters are presumed to be noise, although the theoretical justification for this is unclear.

= Early history =

Precursors to the full concept of persistent homology appeared gradually over time.{{cite book |last1=Edelsbrunner |first1=H. |first2=D. |last2=Morozov |chapter=Persistent Homology |chapter-url=https://www.taylorfrancis.com/chapters/edit/10.1201/9781315119601-24/persistent-homology-herbert-edelsbrunner-dmitriy-morozov |title=Handbook of Discrete and Computational Geometry |publisher=CRC |edition=3rd |date=2017 |isbn=9781315119601 |doi=10.1201/9781315119601 |url=https://www.taylorfrancis.com/books/edit/10.1201/9781315119601/handbook-discrete-computational-geometry-jacob-goodman-joseph-rourke-csaba-toth |editor1=Csaba D. Toth|editor2= Joseph O'Rourke|editor3= Jacob E. Goodman }} In 1990, Patrizio Frosini introduced a pseudo-distance between submanifolds, and later the size function, which on one-dimensional curves is equivalent to the 0th persistent homology.{{Cite journal|title = A distance for similarity classes of submanifolds of a Euclidean space|journal = Bulletin of the Australian Mathematical Society|date = 1990-12-01|issn = 1755-1633|pages = 407–415|volume = 42|issue = 3|doi = 10.1017/S0004972700028574|first = Patrizio|last = Frosini|doi-access = free}}{{Cite journal|title = Measuring shapes by size functions|first = Patrizio|last = Frosini| editor-first1=David P. | editor-last1=Casasent | journal = Proc. SPIE, Intelligent Robots and Computer Vision X: Algorithms and Techniques| series=Intelligent Robots and Computer Vision X: Algorithms and Techniques |doi = 10.1117/12.57059|volume = 1607 |pages=122–133|date =1992| bibcode=1992SPIE.1607..122F | s2cid=121295508 }} Nearly a decade later, Vanessa Robins studied the images of homomorphisms induced by inclusion.{{cite journal |last=Robins |first=V. |title=Towards computing homology from finite approximations |journal=Topology Proceedings |volume=24 |issue=1 |pages=503–532 |date=1999}} Shortly thereafter, Herbert Edelsbrunner et al. introduced the concept of persistent homology together with an efficient algorithm and its visualization as a persistence diagram.{{Cite journal|title = Topological Persistence and Simplification|journal = Discrete & Computational Geometry|date = 2002-11-01|issn = 0179-5376|pages = 511–533|volume = 28|issue = 4|doi = 10.1007/s00454-002-2885-2|last1 = Edelsbrunner|last2 = Letscher|last3 = Zomorodian|doi-access = free}} Gunnar Carlsson et al. reformulated the initial definition and gave an equivalent visualization method called persistence barcodes,{{Cite journal|title = Persistence barcodes for shapes|journal = International Journal of Shape Modeling|date = 2005-12-01|issn = 0218-6543|pages = 149–187|volume = 11|issue = 2|doi = 10.1142/S0218654305000761|first1 = Gunnar|last1 = Carlsson|first2 = Afra|last2 = Zomorodian|first3 = Anne|last3 = Collins|first4 = Leonidas J.|last4 = Guibas|citeseerx = 10.1.1.5.2718}} interpreting persistence in the language of commutative algebra.

In algebraic topology, persistent homology emerged through the work of Sergey Barannikov on Morse theory. He showed that the set of critical values of a smooth Morse function is canonically partitioned into "birth–death" pairs, and he classified filtered complexes. Their invariants, equivalent to persistence diagrams and persistence barcodes, were described by Barannikov in 1994 under the name of canonical forms, together with an efficient algorithm for their calculation.{{Cite journal|title = Framed Morse complex and its invariants |url = https://www.researchgate.net/publication/267672645 |journal = Advances in Soviet Mathematics | date = 1994|pages = 93–115|volume = 21|first = Sergey|last = Barannikov |series = ADVSOV |doi=10.1090/advsov/021/03|isbn = 9780821802373 |s2cid = 125829976 }}{{Cite web |url=https://events.berkeley.edu/?event_ID=121726&date=2018-11-29&tab=academic |title=UC Berkeley Mathematics Department Colloquium: Persistent homology and applications from PDE to symplectic topology |publisher=events.berkeley.edu |access-date=2021-03-27 |archive-date=2021-04-18 |archive-url=https://web.archive.org/web/20210418211352/https://events.berkeley.edu/?event_ID=121726&date=2018-11-29&tab=academic |url-status=dead }}

= Concepts =

Some widely used concepts are introduced below. Note that some definitions may vary from author to author.

A point cloud is often defined as a finite set of points in some Euclidean space, but may be taken to be any finite metric space.

The Čech complex of a point cloud is the nerve of the cover of balls of a fixed radius around each point in the cloud.

A persistence module $\mathbb{U}$ indexed by $\Z$ is a vector space $U_t$ for each $t \in \Z$ , and a linear map $u_t^s\colon U_s \to U_t$ whenever $s \leq t$ , such that $u_t^t=1$ for all $t$ and $u_t^su_s^r=u_t^r$ whenever $r \leq s \leq t.$ {{Cite book|publisher = ACM|date = 2009-01-01 |isbn = 978-1-60558-501-7|pages = 237–246|series = SCG '09|doi = 10.1145/1542362.1542407|first1 = Frédéric|last1 = Chazal|first2 = David|last2 = Cohen-Steiner|first3 = Marc|last3 = Glisse|first4 = Leonidas J.|last4 = Guibas|first5 = Steve Y.|last5 = Oudot| title=Proceedings of the twenty-fifth annual symposium on Computational geometry | chapter=Proximity of persistence modules and their diagrams |s2cid = 840484|citeseerx = 10.1.1.473.2112}} An equivalent definition is a functor from $\mathbb{Z}$ considered as a partially ordered set to the category of vector spaces.

The persistent homology group $PH$ of a point cloud is the persistence module defined as $PH_k(X)= \prod H_k(X_r)$ , where $X_r$ is the Čech complex of radius $r$ of the point cloud $X$ and $H_k$ is the homology group.

A persistence barcode is a multiset of intervals in $\R$ , and a persistence diagram is a multiset of points in $\Delta$ ( $:= \{(u,v) \in \R^2\mid u,v \geq 0, u \leq v\}$ ).

The Wasserstein distance between two persistence diagrams $X$ and $Y$ is defined as $W_p[L_q](X,Y):= \inf_{\varphi: X \to Y} \left[ \sum_{x \in X} (\Vert x-\varphi(x)\Vert _q)^p\right]^{1/p}$ where $1 \leq p,q \leq \infty$ and $\varphi$ ranges over bijections between $X$ and $Y$ . See figure 3.1 in Munch{{cite thesis |last=Munch |first=E. |title=Applications of persistent homology to time varying systems |publisher=Duke University |date=2013 |isbn=9781303019128 |hdl=10161/7180}} for an illustration.

The bottleneck distance between $X$ and $Y$ is $W_{\infty}[L_q](X,Y):= \inf_{\varphi: X \to Y} \sup_{x \in X} \Vert x-\varphi(x)\Vert _q.$ This is a special case of Wasserstein distance, letting $p=\infty$ .
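Both distances can be evaluated by brute force on very small diagrams. The Python sketch below is illustrative only: the function name `diagram_distance` is hypothetical, the ground norm is fixed to $q=\infty$, all points are assumed to be finite, and each diagram is padded with points on the diagonal (a common convention, assumed here) so that a bijection exists even when the diagrams have different sizes. It enumerates every bijection, so it is practical only for a handful of points.

```python
import math
from itertools import permutations

def diagram_distance(X, Y, p=math.inf):
    """Brute-force W_p[L_inf] between two small diagrams of finite (birth, death) points."""
    # None stands for "a point on the diagonal"; padding makes both sides equal in size.
    A = list(X) + [None] * len(Y)
    B = list(Y) + [None] * len(X)

    def cost(a, b):
        if a is None and b is None:
            return 0.0                      # diagonal matched to diagonal is free
        if a is None or b is None:
            birth, death = b if a is None else a
            return (death - birth) / 2.0    # L_inf distance from a point to the diagonal
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    best = math.inf
    for perm in permutations(range(len(B))):
        costs = [cost(A[i], B[j]) for i, j in enumerate(perm)]
        if math.isinf(p):
            total = max(costs, default=0.0)             # bottleneck case
        else:
            total = sum(c ** p for c in costs) ** (1.0 / p)
        best = min(best, total)
    return best
```

For instance, `diagram_distance([(0, 4)], [(0, 5)])` evaluates to 1: matching the two points to each other costs 1, which is cheaper than sending both to the diagonal.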

= Basic properties =

== Structure theorem ==

The first classification theorem for persistent homology appeared in 1994 via Barannikov's canonical forms. The classification theorem interpreting persistence in the language of commutative algebra appeared in 2005:{{Cite journal|title = Computing Persistent Homology|journal = Discrete & Computational Geometry|date = 2004-11-19|issn = 0179-5376|pages = 249–274|volume = 33|issue = 2|doi = 10.1007/s00454-004-1146-y|first1 = Afra|last1 = Zomorodian|first2 = Gunnar|last2 = Carlsson|doi-access = free}} for a finitely generated persistence module $C$ with field $F$ coefficients,

$H(C;F) \simeq \bigoplus_i x^{t_i} \cdot F[x] \oplus \left(\bigoplus_j x^{r_j} \cdot (F[x]/(x^{s_j}\cdot F[x]))\right).$

Intuitively, the free parts correspond to the homology generators that appear at filtration level $t_i$ and never disappear, while the torsion parts correspond to those that appear at filtration level $r_j$ and last for $s_j$ steps of the filtration (or equivalently, disappear at filtration level $s_j+r_j$ ).
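The correspondence between the summands of the structure theorem and the bars of a barcode can be written out directly. The following Python sketch assumes the decomposition is already known and is given as a list of free degrees $t_i$ and a list of torsion pairs $(r_j, s_j)$; the function name `barcode_from_decomposition` is hypothetical and the bars are treated as half-open intervals.

```python
import math

def barcode_from_decomposition(free_degrees, torsion_parts):
    """Turn the summands of the structure theorem into a sorted list of bars.

    free_degrees:  list of t_i (classes that appear at t_i and never disappear)
    torsion_parts: list of (r_j, s_j) (classes that appear at r_j and last s_j steps)
    """
    bars = [(t, math.inf) for t in free_degrees]        # free part  -> [t_i, infinity)
    bars += [(r, r + s) for r, s in torsion_parts]      # torsion    -> [r_j, r_j + s_j)
    return sorted(bars)
```

With one free summand in degree 0 and torsion summands $(1,2)$ and $(0,3)$, the result is the bars $[0,3)$, $[0,\infty)$ and $[1,3)$.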

Persistent homology is visualized through a barcode or persistence diagram. The barcode has its root in abstract mathematics. Namely, the category of finite filtered complexes over a field is semi-simple. Any filtered complex is isomorphic to its canonical form, a direct sum of one- and two-dimensional simple filtered complexes.

== Stability ==

Stability is desirable because it provides robustness against noise. If $X$ is any space which is homeomorphic to a simplicial complex, and $f,g:X\to \mathbb{R}$ are continuous tame{{cite book|last1=Shikhman|first1=Vladimir|title=Topological Aspects of Nonsmooth Optimization|date=2011|publisher=Springer |isbn=9781461418979|pages=169–170|url=https://books.google.com/books?id=_D4FL3i4vIQC&q=%22tame+set%22+semialgebraic&pg=PA169|access-date=22 November 2017|language=en}} functions, then the persistence vector spaces $\{H_k(f^{-1}([0,r]))\}$ and $\{H_k(g^{-1}([0,r]))\}$ are finitely presented, and $W_{\infty}(D(f),D(g)) \leq \lVert f-g \rVert_{\infty}$ , where $W_{\infty}$ refers to the bottleneck distance{{Cite journal|title = Stability of Persistence Diagrams|journal = Discrete & Computational Geometry|date = 2006-12-12|issn = 0179-5376|pages = 103–120|volume = 37|issue = 1|doi = 10.1007/s00454-006-1276-5|first1 = David|last1 = Cohen-Steiner|first2 = Herbert|last2 = Edelsbrunner|author2-link=Herbert Edelsbrunner|first3 = John|last3 = Harer|doi-access = free}} and $D$ is the map taking a continuous tame function to the persistence diagram of its $k$ -th homology.

= Workflow =

The basic workflow in TDA is:{{Cite journal|title = Barcodes: The persistent topology of data|journal = Bulletin of the American Mathematical Society|date = 2008-01-01|issn = 0273-0979|pages = 61–75|volume = 45|issue = 1|doi = 10.1090/S0273-0979-07-01191-3|first = Robert|last = Ghrist|doi-access = free}}

{| class="wikitable"
|-
| point cloud || $\to$ || nested complexes || $\to$ || persistence module || $\to$ || barcode or diagram
|}

If $X$ is a point cloud, replace $X$ with a nested family of simplicial complexes $X_{r}$ (such as the Čech or Vietoris–Rips complex). This process converts the point cloud into a filtration of simplicial complexes. Taking the homology of each complex in this filtration gives a persistence module $H_i(X_{r_0})\to H_i(X_{r_1}) \to H_i(X_{r_2}) \to \cdots$.
Apply the structure theorem to obtain the persistent Betti numbers, persistence diagram, or equivalently, barcode.

Figure: illustration of a typical workflow in TDA (File:Illustration of Typical Workflow in TDA.jpeg).
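In degree 0 the whole workflow can be carried out with a few lines of code, because the degree-0 homology of a Vietoris–Rips filtration only tracks how connected components merge as the scale grows. The Python sketch below is a minimal illustration under stated assumptions, not a reference implementation: the function name `h0_barcode` is hypothetical, points are tuples in Euclidean space, and an edge is taken to enter the filtration at a scale equal to the distance between its endpoints (conventions differing by a factor of 2 also exist).

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Degree-0 persistence bars of the Vietoris-Rips filtration of a point cloud."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))             # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Edges sorted by length give the order in which components can merge.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                    # two components merge: one of them dies
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))        # the last component never dies
    return bars
```

For the three collinear points `(0, 0)`, `(1, 0)`, `(5, 0)` the bars are $[0,1)$, $[0,4)$ and $[0,\infty)$: the long finite bar reflects the gap separating the third point from the other two.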

Computation

The first algorithm for persistent homology over all fields, in the algebraic topology setting, was described by Barannikov through reduction to the canonical form by upper-triangular matrices. An algorithm for persistent homology over $F_2$ was given by Edelsbrunner et al. Afra Zomorodian and Carlsson gave a practical algorithm to compute persistent homology over all fields. Edelsbrunner and Harer's book gives general guidance on computational topology.{{harvnb|Edelsbrunner|Harer|2010}}
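A common way to present persistence computation over $F_2$ is as a left-to-right column reduction of the boundary matrix of the filtered complex. The Python sketch below illustrates that standard scheme; it is not claimed to reproduce the presentation of any of the papers cited above. The function name `reduce_boundary_matrix` is hypothetical, simplices are given as vertex tuples, and the input list is assumed to be in filtration order with every face listed before its cofaces.

```python
def reduce_boundary_matrix(simplices):
    """Column reduction over F_2.

    Returns (pairs, essential): pairs is a list of (birth_index, death_index),
    essential lists the indices of simplices whose class never dies.
    """
    index = {tuple(sorted(s)): i for i, s in enumerate(simplices)}
    # Column j holds the row indices of the codimension-1 faces of simplex j.
    columns = []
    for s in simplices:
        s = tuple(sorted(s))
        if len(s) == 1:
            columns.append(set())       # a vertex has empty boundary
        else:
            columns.append({index[s[:k] + s[k + 1:]] for k in range(len(s))})

    low_to_col = {}                     # lowest row index -> column that owns it
    pairs = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unique or the column is zero.
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    paired = {i for pair in pairs for i in pair}
    essential = [i for i in range(len(simplices)) if i not in paired]
    return pairs, essential
```

For a filled triangle listed as three vertices, then the edges `(0, 1)`, `(1, 2)`, `(0, 2)`, then the 2-simplex, the pairs are `(1, 3)`, `(2, 4)` and `(5, 6)`, and only vertex 0 is essential: two components are killed by the first two edges, and the loop created by the third edge is filled by the triangle.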

One issue that arises in computation is the choice of complex. The Čech complex and the Vietoris–Rips complex are most natural at first glance; however, their size grows rapidly with the number of data points. The Vietoris–Rips complex is preferred over the Čech complex because its definition is simpler and the Čech complex requires extra effort to define in a general finite metric space. Efficient ways to lower the computational cost of homology have been studied. For example, the α-complex and witness complex are used to reduce the dimension and size of complexes.{{Cite book|contribution = Topological estimation using witness complexes|publisher = Eurographics Association|title = SPBG'04 Symposium on Point - Based Graphics 2004|date = 2004-01-01|location = Aire-la-Ville, Switzerland, Switzerland|isbn = 978-3-905673-09-8|pages = 157–166|doi = 10.2312/SPBG/SPBG04/157-166|first1 = Vin|last1 = De Silva|first2 = Gunnar|last2 = Carlsson| s2cid=2928987 }}

Discrete Morse theory has shown promise for computational homology because it can reduce a given simplicial complex to a much smaller cellular complex that is homotopy equivalent to the original one.{{Cite journal|title = Morse Theory for Filtrations and Efficient Computation of Persistent Homology|journal = Discrete & Computational Geometry|date = 2013-07-27|issn = 0179-5376|pages = 330–353|volume = 50|issue = 2|doi = 10.1007/s00454-013-9529-6|first1 = Konstantin|last1 = Mischaikow|first2 = Vidit|last2 = Nanda|doi-access = free}} This reduction can in fact be performed as the complex is constructed by using matroid theory, leading to further performance increases.{{cite arXiv|last1=Henselman|first1=Gregory|last2=Ghrist|first2=Robert|author2-link=Robert Ghrist|title=Matroid Filtrations and Computational Persistent Homology|date=2016|class=math.AT |eprint=1606.00199}} Another algorithm saves time by ignoring the homology classes with low persistence.{{Cite journal|title = An output-sensitive algorithm for persistent homology|journal = Computational Geometry|date = 2013-05-01|pages = 435–447|volume = 46|series = 27th Annual Symposium on Computational Geometry (SoCG 2011)|issue = 4|doi = 10.1016/j.comgeo.2012.02.010|first1 = Chao|last1 = Chen|first2 = Michael|last2 = Kerber|doi-access = free}}

Various software packages are available, such as [http://appliedtopology.github.io/javaplex/ javaPlex], [http://www.mrzv.org/software/dionysus/ Dionysus], [http://www.sas.upenn.edu/~vnanda/perseus/index.html Perseus], [https://archive.today/20130629100858/http://phat.googlecode.com/ PHAT], [https://github.com/DIPHA/dipha/ DIPHA], [https://project.inria.fr/gudhi/software/ GUDHI], [https://github.com/Ripser/ripser Ripser], and [https://CRAN.R-project.org/package=TDAstats TDAstats]. A comparison between these tools is done by Otter et al.{{cite journal|title = A roadmap for the computation of persistent homology|journal= EPJ Data Science|volume= 6|arxiv= 1506.08903|date = 2015-06-29|first1 = Nina|last1 = Otter|first2 = Mason A.|last2 = Porter|first3 = Ulrike|last3 = Tillmann|first4 = Peter|last4 = Grindrod|first5 = Heather A.|last5 = Harrington|issue= 1|pages= 17|author5-link=Heather Harrington|doi= 10.1140/epjds/s13688-017-0109-5|pmid= 32025466|pmc= 6979512|bibcode= 2015arXiv150608903O}} [https://github.com/giotto-ai/giotto-tda Giotto-tda] is a Python package dedicated to integrating TDA in the machine learning workflow by means of a scikit-learn [https://scikit-learn.org/stable/] API. An R package [https://cran.r-project.org/web/packages/TDA/index.html TDA] is capable of calculating recently invented concepts like landscape and the kernel distance estimator.{{cite arXiv|title = Introduction to the R package TDA|eprint= 1411.1830|date = 2014-11-07|first1 = Brittany Terese|last1 = Fasy|first2 = Jisu|last2 = Kim|first3 = Fabrizio|last3 = Lecci|first4 = Clément|last4 = Maria|class= cs.MS}} The [https://topology-tool-kit.github.io/ Topology ToolKit] is specialized for continuous data defined on manifolds of low dimension (1, 2 or 3), as typically found in scientific visualization. [https://bitbucket.org/hubwag/cubicle Cubicle] is optimized for large (gigabyte-scale) grayscale image data in dimension 1, 2 or 3 using cubical complexes and discrete Morse theory. 
Another R package, [https://CRAN.R-project.org/package=TDAstats TDAstats], uses the Ripser library to calculate persistent homology.{{Cite journal|title = TDAstats: R pipeline for computing persistent homology in topological data analysis|journal = Journal of Open Source Software|date = 2018|pages=860|volume = 3|issue = 28|doi = 10.21105/joss.00860|first1 = Raoul|last1 = Wadhwa|first2 = Drew|last2 = Williamson|first3 = Andrew|last3 = Dhawan|first4 = Jacob|last4 = Scott|pmid = 33381678|pmc = 7771879|bibcode = 2018JOSS....3..860R|doi-access = free}}

Visualization

High-dimensional data is impossible to visualize directly. Many methods have been invented to extract a low-dimensional structure from the data set, such as principal component analysis and multidimensional scaling.{{cite journal |last1=Liu |first1=S. |last2=Maljovec |first2=D. |last3=Wang |first3=B. |last4=Bremer |first4=P.T. |last5=Pascucci |first5=V. |title=Visualizing high-dimensional data: Advances in the past decade |journal=IEEE Transactions on Visualization and Computer Graphics |volume=23 |issue=3 |pages=1249–68 |date=2016 |doi=10.1109/TVCG.2016.2640960 |pmid=28113321|s2cid=745262 |doi-access=free }} However, the problem itself is ill-posed, since many different topological features can be found in the same data set. Thus, the study of visualization of high-dimensional spaces is of central importance to TDA, although it does not necessarily involve the use of persistent homology. Nonetheless, attempts have been made to use persistent homology in data visualization.

Carlsson et al. have proposed a general method called MAPPER.{{cite book |first1=G. |last1=Singh |first2=F. |last2=Mémoli |first3=G. |last3=Carlsson |chapter=Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition |chapter-url=https://research.math.osu.edu/tgda/mapperPBG.pdf |title=Point-based graphics 2007 : Eurographics/IEEE VGTC symposium proceedings |date=2007 |isbn=9781568813660 |doi=10.2312/SPBG/SPBG07/091-100 |s2cid=5703368 }} It inherits the idea of Jean-Pierre Serre that a covering preserves homotopy.{{Cite book |title=Differential Forms in Algebraic Topology |url=https://books.google.com/books?id=COuPBAAAQBAJ |publisher=Springer |date=2013-04-17 |isbn=978-1-4757-3951-0 |first1=Raoul |last1=Bott |author1-link=Raoul Bott|first2=Loring W. |last2=Tu|author2-link=Loring W. Tu}} A generalized formulation of MAPPER is as follows:

Let $X$ and $Z$ be topological spaces and let $f\colon X\to Z$ be a continuous map. Let $\mathbb{U} = \{U_{\alpha}\}_{\alpha \in A}$ be a finite open covering of $Z$ . The output of MAPPER is the nerve of the pullback cover $M(\mathbb{U},f):=N(f^{-1}(\mathbb{U}))$ , where each preimage is split into its connected components.{{cite arXiv |title=Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps |eprint=1504.03763 |date=2015-04-14 |first1=Tamal K. |last1=Dey|author1-link=Tamal Dey |first2=Facundo |last2=Memoli |first3=Yusu |last3=Wang |author3-link=Yusu Wang |class=cs.CG}} This is a very general concept, of which the Reeb graph{{Cite journal |title=Robust on-line computation of Reeb graphs: simplicity and speed. |journal=ACM Transactions on Graphics |date=2007 |pages=58.1–58.9 |volume=33 |doi=10.1145/1275808.1276449 |first1=Valerio |last1=Pascucci |first2=Giorgio |last2=Scorzelli |first3=Peer-Timo |last3=Bremer |first4=Ajith |last4=Mascarenhas |doi-access=free}} and merge trees are special cases.

This is not quite the original definition. Carlsson et al. choose $Z$ to be $\R$ or $\R^2$ , and cover it with open sets such that at most two intersect. This restriction means that the output is in the form of a complex network. Because the topology of a finite point cloud is trivial, clustering methods (such as single linkage) are used to produce the analogue of connected sets in the preimage $f^{-1}(U)$ when MAPPER is applied to actual data.
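The construction for actual data can be sketched in a few lines. The Python code below is a simplified illustration, not the original implementation: the names `single_linkage_clusters` and `mapper_graph`, the use of open intervals as the cover, and the fixed clustering threshold `eps` are all assumptions made for this example. Each cover element is pulled back to the data, the preimage is split by single-linkage clustering, and two clusters are joined by an edge when they share a data point.

```python
import math
from itertools import combinations

def single_linkage_clusters(indices, points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_graph(points, filter_values, intervals, eps):
    """Nodes are clusters of preimages of cover intervals; edges join overlapping clusters."""
    nodes = []
    for lo, hi in intervals:
        preimage = [i for i, v in enumerate(filter_values) if lo < v < hi]
        nodes.extend(single_linkage_clusters(preimage, points, eps))
    edges = {
        (a, b)
        for a, b in combinations(range(len(nodes)), 2)
        if nodes[a] & nodes[b]          # clusters share at least one data point
    }
    return nodes, edges
```

As a usage example, take 12 equally spaced points on the unit circle, the $x$-coordinate as the filter, the cover `[(-1.1, -0.4), (-0.6, 0.6), (0.4, 1.1)]` and `eps = 0.6`. The middle interval pulls back to two separate arcs, so the output has four nodes joined in a cycle by four edges, which recovers the loop of the circle.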

Mathematically speaking, MAPPER is a variation of the Reeb graph. If $M(\mathbb{U},f)$ is at most one-dimensional, then for each $i \geq 0$ , $H_i(X) \simeq H_0(N(\mathbb{U});\hat{F}_i) \oplus H_1(N(\mathbb{U});\hat{F}_{i-1}).$ {{cite arXiv |title=Sheaves, Cosheaves and Applications |eprint=1303.3255 |date=2013-03-13 |first=Justin |last=Curry |class=math.AT}} The added flexibility also has disadvantages. One problem is instability, in that a change in the choice of the cover can lead to a major change in the output of the algorithm.{{Cite journal |title=A fast algorithm for constructing topological structure in large data |url=http://projecteuclid.org/euclid.hha/1355321072 |journal=Homology, Homotopy and Applications |date=2012-01-01 |issn=1532-0073 |pages=221–238 |volume=14 |issue=1 |first1=Xu |last1=Liu |first2=Zheng |last2=Xie |first3=Dongyun |last3=Yi|doi=10.4310/hha.2012.v14.n1.a11 |doi-access=free}} Work has been done to overcome this problem.

Three successful applications of MAPPER can be found in Carlsson et al.{{Cite journal |title=Extracting insights from the shape of complex data using topology |journal=Scientific Reports |date=2013-02-07 |pmc=3566620 |pmid=23393618 |volume=3 |pages=1236 |doi=10.1038/srep01236 |first1=P. Y. |last1=Lum |first2=G. |last2=Singh |first3=A. |last3=Lehman |first4=T. |last4=Ishkanov |first5=M. |last5=Vejdemo-Johansson |first6=M. |last6=Alagappan |first7=J. |last7=Carlsson |first8=G. |last8=Carlsson |bibcode=2013NatSR...3.1236L}} A comment on the applications in this paper by J. Curry is that "a common feature of interest in applications is the presence of flares or tendrils".{{cite arXiv |title=Topological Data Analysis and Cosheaves |eprint=1411.0613 |date=2014-11-03 |first=Justin |last=Curry |class=math.AT}}

A free implementation of MAPPER written by Daniel Müllner and Aravindakshan Babu is available [http://danifold.net/mapper/ online]. MAPPER also forms the basis of Ayasdi's AI platform.

Multidimensional persistence

Multidimensional persistence is important to TDA. The concept arises in both theory and practice. The first investigation of multidimensional persistence was early in the development of TDA.{{cite journal |last1=Frosini |first1=P |last2=Mulazzani |first2=M. |title=Size homotopy groups for computation of natural size distances |journal=Bulletin of the Belgian Mathematical Society, Simon Stevin |volume=6 |issue=3 |pages=455–464 |date=1999 |doi=10.36045/bbms/1103065863 |doi-access=free }} Carlsson and Zomorodian introduced the theory of multidimensional persistence{{cite book |last1=Carlsson |first1=G. |last2=Zomorodian |first2=A. |title=Algorithms and Computation |chapter=Computing Multidimensional Persistence |series=Lecture Notes in Computer Science |chapter-url=https://jocg.org/index.php/jocg/article/view/2973 |publisher=Springer |volume=42 |date=2009 |issue=1 |isbn=978-3-642-10631-6 |pages=71–93 |doi=10.1007/978-3-642-10631-6_74 }} and, in collaboration with Singh,{{cite journal |last1=Carlsson |first1=G. |last2=Singh |first2=A. |last3=Zomorodian |first3=A. |title=Computing multidimensional persistence |journal=Journal of Computational Geometry |volume=1 |issue= |pages=72–100 |date=2010 |doi=10.20382/jocg.v1i1a6 |s2cid=15529723 }} introduced the use of tools from symbolic algebra (Gröbner basis methods) to compute multidimensional persistent homology (MPH) modules. Their definition presents multidimensional persistence with n parameters as a $\mathbb{Z}^n$ graded module over a polynomial ring in n variables. Tools from commutative and homological algebra are applied to the study of multidimensional persistence in the work of Harrington, Otter, Schenck, and Tillmann.{{cite journal |last1=Harrington |first1=H. |last2=Otter |first2=N. |last3=Schenck |first3=H. |last4=Tillmann |first4=U.|author4-link=Ulrike Tillman |title=Stratifying multiparameter persistent homology |journal=SIAM Journal on Applied Algebra and Geometry |volume= 3|issue=3 |pages=439–471 |date=2019 |doi=10.1137/18M1224350 |arxiv=1708.07390|s2cid=119689059 }} The first application to appear in the literature is a method for shape comparison, much as in the early development of TDA.{{Cite journal |title=Multidimensional Size Functions for Shape Comparison |journal=Journal of Mathematical Imaging and Vision |date=2008-05-17 |issn=0924-9907 |pages=161–179 |volume=32 |issue=2 |doi=10.1007/s10851-008-0096-z |first1=S. |last1=Biasotti |first2=A. |last2=Cerri |first3=P. |last3=Frosini |first4=D. |last4=Giorgi |first5=C. |last5=Landi|bibcode=2008JMIV...32..161B |s2cid=13372132 }}

An n-dimensional persistence module in $\R^n$ is defined by the following data:

A vector space $V_s$ is assigned to each point $s=(s_1,\ldots ,s_n)$ .
A linear map $\rho_s^t\colon V_s \to V_t$ is assigned whenever $s\leq t$ (that is, $s_i \leq t_i$ for $i=1,\ldots ,n$ ).
The maps satisfy $\rho_r^t=\rho_s^t \circ \rho_r^s$ for all $r \leq s \leq t$ .

The definition of multidimensional persistence is not fully settled, and there is some controversy over it in the literature.

One of the advantages of one-dimensional persistence is its representability by a diagram or barcode. However, discrete complete invariants of multidimensional persistence modules do not exist.{{Cite journal |title=The Theory of Multidimensional Persistence |journal=Discrete & Computational Geometry |date=2009-04-24 |issn=0179-5376 |pages=71–93 |volume=42 |issue=1 |doi=10.1007/s00454-009-9176-0 |first1=Gunnar |last1=Carlsson |first2=Afra |last2=Zomorodian |doi-access=free}} The main reason for this is that the structure of the collection of indecomposables is extremely complicated by Gabriel's theorem in the theory of quiver representations,{{cite journal |last1=Derksen |first1=Harm |last2=Weyman |first2=Jerzy|author-link=Jerzy Weyman |title=Quiver representations |journal=Notices of the American Mathematical Society |volume=52 |issue=2 |pages=200–6 |date=2005 |doi= |url=https://www.ams.org/journals/notices/200502/fea-weyman.pdf}} although a finitely generated n-dim persistence module can be uniquely decomposed into a direct sum of indecomposables due to the Krull-Schmidt theorem.{{cite journal |first=Michael F. |last=Atiyah |author-link=Michael Atiyah|title=On the Krull-Schmidt theorem with application to sheaves |journal=Bulletin de la Société Mathématique de France |volume=84 |issue= |pages=307–317 |date=1956 |doi= 10.24033/bsmf.1475|url=http://www.numdam.org/item/10.24033/bsmf.1475.pdf}}

Nonetheless, many results have been established. Carlsson and Zomorodian introduced the rank invariant $\rho_M(u,v)$ , defined for $u \leq v$ as $\rho_M(u,v)=\mathrm{rank}(x^{v-u}\colon M_u\to M_v)$ , in which $M$ is a finitely generated n-graded module. In one dimension, it is equivalent to the barcode. In the literature, the rank invariant is often referred to as the persistent Betti numbers (PBNs). In many theoretical works, authors have used a more restricted definition, an analogue of sublevel set persistence. Specifically, the persistent Betti numbers of a function $f:X\to \R^k$ are given by the function $\beta_f\colon \Delta^{+} \to \mathbb{N}$ , taking each $(u,v) \in \Delta^{+}$ to $\beta_f(u,v):= \mathrm{rank} (H(X(f\leq u))\to H(X(f\leq v)))$ , where $\Delta^{+} := \{(u,v)\in \R^k\times\R^k : u\leq v\}$ and $X(f\leq u):=\{x\in X:f(x)\leq u\}$ .
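In the one-parameter case, the equivalence between the rank invariant and the barcode can be made concrete: the rank of the map from index $u$ to index $v$ counts the bars that contain both $u$ and $v$. The Python sketch below assumes half-open bars $[b,d)$ given as `(birth, death)` pairs; the function name `persistent_betti` is hypothetical.

```python
import math

def persistent_betti(barcode, u, v):
    """One-parameter rank invariant: number of bars [b, d) containing both u and v."""
    if u > v:
        raise ValueError("the rank invariant is only defined for u <= v")
    # A class contributes to the rank exactly when it is born by u and still alive at v.
    return sum(1 for b, d in barcode if b <= u and v < d)
```

For the barcode `[(0, math.inf), (1, 3), (2, 5)]`, `persistent_betti(barcode, 2, 2)` is 3, while `persistent_betti(barcode, 1, 3)` is 1, since only the infinite bar spans the whole range from 1 to 3.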

Some basic properties include monotonicity and diagonal jump.{{cite arXiv |title=Multidimensional persistent homology is stable |eprint=0908.0064 |date=2009 |last1=Cerri |first1=A. |last2=Di Fabio |first2=B. |last3=Ferri |first3=M. |display-authors=etal}} Persistent Betti numbers are finite if $X$ is a compact and locally contractible subspace of $\mathbb{R}^n$ .{{Cite journal |title=Finiteness of rank invariants of multidimensional persistent homology groups |journal=Applied Mathematics Letters |date=2011-04-01 |pages=516–8 |volume=24 |issue=4 |doi=10.1016/j.aml.2010.11.004 |first1=Francesca |last1=Cagliari |first2=Claudia |last2=Landi |s2cid=14337220 |arxiv=1001.0358}}

Using a foliation method, the k-dimensional PBNs can be decomposed into a family of one-dimensional PBNs by dimensionality reduction.{{Cite journal |title=One-dimensional reduction of multidimensional persistent homology |journal=Proceedings of the American Mathematical Society |date=2010-01-01 |issn=0002-9939 |pages=3003–17 |volume=138 |issue=8 |doi=10.1090/S0002-9939-10-10312-8 |first1=Francesca |last1=Cagliari |first2=Barbara |last2=Di Fabio |first3=Massimo |last3=Ferri |s2cid=18284958 |arxiv=math/0702713}} This method has also led to a proof that multidimensional PBNs are stable.{{Cite journal |title=Betti numbers in multidimensional persistent homology are stable functions |journal=Mathematical Methods in the Applied Sciences |date=2013-08-01 |issn=1099-1476 |pages=1543–57 |volume=36 |issue=12 |doi=10.1002/mma.2704 |first1=Andrea |last1=Cerri |first2=Barbara |last2=Di Fabio |first3=Massimo |last3=Ferri |first4=Patrizio |last4=Frosini |first5=Claudia |last5=Landi |bibcode=2013MMAS...36.1543C |s2cid=9938133 |url=http://amsacta.unibo.it/2923/}} The discontinuities of PBNs only occur at points $(u,v)$ with $u\leq v$ where either $u$ is a point of discontinuity of $\rho_M (\star,v)$ or $v$ is a point of discontinuity of $\rho_M (u,\star)$ , under the assumption that $f\in C^0(X,\mathbb{R}^k)$ and $X$ is a compact, triangulable topological space.{{Cite journal |title=Necessary conditions for discontinuities of multidimensional persistent Betti numbers |journal=Mathematical Methods in the Applied Sciences |date=2015-03-15 |issn=1099-1476 |pages=617–629 |volume=38 |issue=4 |doi=10.1002/mma.3093 |first1=Andrea |last1=Cerri |first2=Patrizio |last2=Frosini |bibcode=2015MMAS...38..617C|s2cid=5537858 }}

The persistence space, a generalization of the persistence diagram, is defined as the multiset of all points with multiplicity larger than 0, together with the diagonal.{{Cite book |publisher=Springer|location= Berlin, Heidelberg |date=2013-03-20 |isbn=978-3-642-37066-3 |pages=180–191 |series=Lecture Notes in Computer Science |doi=10.1007/978-3-642-37067-0_16 |first1=Andrea |last1=Cerri |first2=Claudia |last2=Landi |title=Discrete Geometry for Computer Imagery |chapter=The Persistence Space in Multidimensional Persistent Homology |volume=7749 |editor-first=Rocio |editor-last=Gonzalez-Diaz |editor-first2=Maria-Jose |editor-last2=Jimenez |editor-first3=Belen |editor-last3=Medrano}} It provides a stable and complete representation of PBNs. Ongoing work by Carlsson et al. seeks to give a geometric interpretation of persistent homology, which might provide insight into how to combine machine learning theory with topological data analysis.{{cite arXiv |title=Numeric Invariants from Multidimensional Persistence |eprint=1411.4022 |date=2014-11-14 |first1=Jacek |last1=Skryzalin |first2=Gunnar |last2=Carlsson |class=cs.CG}}

The first practical algorithm to compute multidimensional persistence was proposed early in the development of the theory.{{Cite book |publisher=Springer Berlin Heidelberg |date=2009-12-16 |isbn=978-3-642-10630-9 |pages=730–9 |series=Lecture Notes in Computer Science |doi=10.1007/978-3-642-10631-6_74 |first1=Gunnar |last1=Carlsson |first2=Gurjeet |last2=Singh |first3=Afra |last3=Zomorodian |title=Algorithms and Computation |chapter=Computing Multidimensional Persistence |volume=5878 |s2cid=15529723 |editor-first=Yingfei |editor-last=Dong |editor-first2=Ding-Zhu |editor-last2=Du |editor-first3=Oscar |editor-last3=Ibarra |citeseerx=10.1.1.313.7004}} Since then, many other algorithms have been proposed, based on such concepts as discrete Morse theory{{cite arXiv |title=Reducing complexes in multidimensional persistent homology theory |eprint=1310.8089 |date=2013-10-30 |first1=Madjid |last1=Allili |first2=Tomasz |last2=Kaczynski |first3=Claudia |last3=Landi |class=cs.CG}} and finite sample estimation.{{cite journal |last1=Cavazza |first1=N. |last2=Ferri |first2=M. |last3=Landi |first3=C. |title=Estimating multidimensional persistent homology through a finite sampling |journal= International Journal of Computational Geometry and Applications|volume=25 |issue=3 |pages=187–205 |date=2010 |doi= 10.1142/S0218195915500119|url=https://www.worldscientific.com/doi/abs/10.1142/S0218195915500119 |arxiv=1507.05277|s2cid=4803380 }}

Other persistences

The standard paradigm in TDA is often referred to as sublevel persistence. Apart from multidimensional persistence, much work has been done to extend this special case.

= Zigzag persistence =

The nonzero maps in a persistence module are restricted by the preorder relation in the category. However, mathematicians have found that a uniform direction of the maps is not essential to many results. "The philosophical point is that the decomposition theory of graph representations is somewhat independent of the orientation of the graph edges".{{Cite journal |title=Zigzag Persistence |journal=Foundations of Computational Mathematics |date=2010-04-21 |issn=1615-3375 |pages=367–405 |volume=10 |issue=4 |doi=10.1007/s10208-010-9066-0 |first1=Gunnar |last1=Carlsson |author1-link=Gunnar Carlsson| first2=Vin de |last2=Silva |doi-access=free}} Zigzag persistence is important on the theoretical side. The examples given in Carlsson's review paper to illustrate the importance of functoriality all share some of its features.

= Extended persistence and levelset persistence =

There have been some attempts to loosen the strict restrictions on the function.{{Cite journal |title=Extending Persistence Using Poincaré and Lefschetz Duality |journal=Foundations of Computational Mathematics |date=2008-04-04 |issn=1615-3375 |pages=79–103 |volume=9 |issue=1 |doi=10.1007/s10208-008-9027-z |first1=David |last1=Cohen-Steiner |first2=Herbert |last2=Edelsbrunner |first3=John |last3=Harer|s2cid=33297537 }} See the Categorification and cosheaves and Impact on mathematics sections for more information.

It is natural to extend persistent homology to other basic concepts in algebraic topology, such as cohomology and relative homology/cohomology.{{Cite journal |title=Dualities in persistent (co)homology |journal=Inverse Problems |volume=27 |issue=12 |doi=10.1088/0266-5611/27/12/124003 |first1=Vin |last1=de Silva |first2=Dmitriy |last2=Morozov |first3=Mikael |last3=Vejdemo-Johansson|s2cid=5706682 |pages=124003 |arxiv=1107.5665 |bibcode=2011InvPr..27l4003D |year=2011}} An interesting application is the computation of circular coordinates for a data set via the first persistent cohomology group.{{Cite journal |title=Persistent Cohomology and Circular Coordinates |journal=Discrete & Computational Geometry |date=2011-03-30 |issn=0179-5376 |pages=737–759 |volume=45 |issue=4 |doi=10.1007/s00454-011-9344-x |first1=Vin de |last1=Silva |first2=Dmitriy |last2=Morozov |first3=Mikael |last3=Vejdemo-Johansson |s2cid=31480083 |arxiv=0905.4887}}

= Circular persistence =

Ordinary persistent homology studies real-valued functions. Circle-valued maps may also be useful: "persistence theory for circle-valued maps promises to play the role for some vector fields as does the standard persistence theory for scalar fields", as commented by Dan Burghelea et al.{{Cite journal |title=Topological Persistence for Circle-Valued Maps |journal=Discrete & Computational Geometry |date=2013-04-09 |issn=0179-5376 |pages=69–98 |volume=50 |issue=1 |doi=10.1007/s00454-013-9497-x |first1=Dan |last1=Burghelea |author1-link=Dan Burghelea|first2=Tamal K. |last2=Dey |s2cid=17407953 |arxiv=1104.5646}} The main difference is that Jordan cells (very similar in format to the Jordan blocks in linear algebra) are nontrivial for circle-valued functions, whereas they would be zero in the real-valued case; combined with barcodes, they give the invariants of a tame map, under moderate conditions.

Two techniques they use are Morse–Novikov theory (Sergey P. Novikov, Quasiperiodic structures in topology, in: Topological methods in modern mathematics, Proceedings of the symposium in honor of John Milnor's sixtieth birthday held at the State University of New York, Stony Brook, New York, 1991: 223–233) and graph representation theory.{{Cite book |title=Handbook of Graph Theory |url=https://books.google.com/books?id=mKkIGIea_BkC |publisher=CRC Press |date=2004-06-02 |isbn=978-0-203-49020-4 |first1=Jonathan L. |last1=Gross |first2=Jay |last2=Yellen}} More recent results can be found in D. Burghelea et al.{{cite arXiv |title=Topology of angle valued maps, bar codes and Jordan blocks |eprint=1303.4328 |date=2015-06-04 |first1=Dan |last1=Burghelea|author1-link=Dan Burghelea |first2=Stefan |last2=Haller |class=math.AT}} For example, the tameness requirement can be replaced by the much weaker condition of continuity.

= Persistence with torsion =

The proof of the structure theorem relies on the base domain being a field, so not many attempts have been made on persistent homology with torsion. Frosini defined a pseudometric on this specific module and proved its stability.{{Cite journal |title=Stable Comparison of Multidimensional Persistent Homology Groups with Torsion |journal=Acta Applicandae Mathematicae |date=2012-06-23 |issn=0167-8019 |pages=43–54 |volume=124 |issue=1 |doi=10.1007/s10440-012-9769-0 |first=Patrizio |last=Frosini |s2cid=4809929 |arxiv=1012.4169}} One of its novelties is that it does not depend on a classification theory to define the metric.

Categorification and cosheaves

One advantage of category theory is its ability to lift concrete results to a higher level, showing relationships between seemingly unconnected objects. Peter Bubenik et al.{{Cite journal |title=Categorification of Persistent Homology |journal=Discrete & Computational Geometry |date=2014-01-28 |issn=0179-5376 |pages=600–627 |volume=51 |issue=3 |doi=10.1007/s00454-014-9573-x |first1=Peter |last1=Bubenik |first2=Jonathan A. |last2=Scott |s2cid=11056619 |arxiv=1205.3669}} offer a short introduction to category theory tailored to TDA.

Category theory is the language of modern algebra, and has been widely used in the study of algebraic geometry and topology. It has been noted that "the key observation of is that the persistence diagram produced by depends only on the algebraic structure carried by this diagram."{{Cite journal |title=Metrics for Generalized Persistence Modules |journal=Foundations of Computational Mathematics |date=2014-10-09 |issn=1615-3375 |pages=1501–31 |volume=15 |issue=6 |doi=10.1007/s10208-014-9229-5 |first1=Peter |last1=Bubenik |first2=Vin de |last2=Silva |first3=Jonathan |last3=Scott |s2cid=16351674 |citeseerx=10.1.1.748.3101}} The use of category theory in TDA has proved to be fruitful.

Following the notations made in Bubenik et al., the indexing category $P$ is any preordered set (not necessarily $\N$ or $\R$ ), the target category $D$ is any category (instead of the commonly used $\mathrm{Vect}_{\mathbb{F}}$ ), and functors $P \to D$ are called generalized persistence modules in $D$ , over $P$ .

One advantage of using category theory in TDA is a clearer understanding of concepts and the discovery of new relationships between proofs. Two examples illustrate this. First, understanding the correspondence between interleaving and matching is of great importance, since matching was the method used at the beginning (modified from Morse theory). A summary of works can be found in Vin de Silva et al.{{Cite book |publisher=ACM |date=2013-01-01 |location=New York, NY, USA |isbn=978-1-4503-2031-3 |pages=397–404 |series=SoCG '13 |doi=10.1145/2462356.2462402 |first1=Vin |last1=de Silva |first2=Vidit |last2=Nanda|title=Proceedings of the twenty-ninth annual symposium on Computational geometry |chapter=Geometry in the space of persistence modules |s2cid=16326608 }} Many theorems can be proved much more easily in a more intuitive setting. The second example is the relationship between the constructions of different complexes from point clouds. It has long been noticed that Čech and Vietoris–Rips complexes are related. Specifically, $V_r(X) \subset C_{\sqrt{2}r}(X) \subset V_{2r}(X)$ . The essential relationship between Čech and Rips complexes can be seen much more clearly in categorical language.

The language of category theory also helps cast results in terms recognizable to the broader mathematical community. Bottleneck distance is widely used in TDA because of the results on stability with respect to the bottleneck distance. In fact, the interleaving distance is the terminal object in a poset category of stable metrics on multidimensional persistence modules in a prime field.{{Cite journal |title=The Theory of the Interleaving Distance on Multidimensional Persistence Modules |journal=Foundations of Computational Mathematics |date=2015-03-24 |issn=1615-3375 |pages=613–650 |volume=15 |issue=3 |doi=10.1007/s10208-015-9255-y |first=Michael |last=Lesnick |s2cid=17184609 |arxiv=1106.5305}}{{Cite journal |title=Natural Pseudo-Distance and Optimal Matching between Reduced Size Functions |journal=Acta Applicandae Mathematicae |date=2008-10-14 |issn=0167-8019 |pages=527–554 |volume=109 |issue=2 |doi=10.1007/s10440-008-9332-1 |first1=Michele |last1=d'Amico |first2=Patrizio |last2=Frosini |first3=Claudia |last3=Landi |s2cid=1704971 |arxiv=0804.3500 |bibcode=2008arXiv0804.3500D}}

Sheaves, a central concept in modern algebraic geometry, are intrinsically related to category theory. Roughly speaking, sheaves are the mathematical tool for understanding how local information determines global information. Justin Curry regards level set persistence as the study of fibers of continuous functions. The objects that he studies are very similar to those by MAPPER, but with sheaf theory as the theoretical foundation. Although no breakthrough in the theory of TDA has yet used sheaf theory, it is promising since there are many beautiful theorems in algebraic geometry relating to sheaf theory. For example, a natural theoretical question is whether different filtration methods result in the same output.{{Cite journal |title=Filtrations induced by continuous functions |journal=Topology and Its Applications |date=2013-08-01 |pages=1413–22 |volume=160 |issue=12 |doi=10.1016/j.topol.2013.05.013 |first1=B. |last1=Di Fabio |first2=P. |last2=Frosini |s2cid=13971804 |arxiv=1304.1268 |bibcode=2013arXiv1304.1268D}}

Stability

Stability is of central importance to data analysis, since real data carry noise. Using category theory, Bubenik et al. have distinguished between soft and hard stability theorems, and proved that soft cases are formal. Specifically, the general workflow of TDA is

$\text{data} \stackrel{F}{\longrightarrow} \text{topological persistence module} \stackrel{H}{\longrightarrow} \text{algebraic persistence module} \stackrel{J}{\longrightarrow} \text{discrete invariant}$

The soft stability theorem asserts that $HF$ is Lipschitz continuous, and the hard stability theorem asserts that $J$ is Lipschitz continuous.

Bottleneck distance is widely used in TDA. The isometry theorem asserts that the interleaving distance $d_I$ is equal to the bottleneck distance. Bubenik et al. have abstracted the definition to that between functors $F,G\colon P\to D$ when $P$ is equipped with a sublinear projection or superlinear family, in which case it still remains a pseudometric. Given the favorable properties of the interleaving distance,{{cite arXiv |title=Multidimensional Interleavings and Applications to Topological Inference |eprint=1206.1365 |date=2012-06-06 |first=Michael |last=Lesnick |class=math.AT}} we introduce here the general definition of interleaving distance (instead of the one first introduced): Let $\Gamma, K \in \mathrm{Trans}_P$ , where a translation is a function from $P$ to $P$ which is monotone and satisfies $x \leq \Gamma(x)$ for all $x\in P$ . A $(\Gamma, K)$-interleaving between $F$ and $G$ consists of natural transformations $\varphi\colon F \Rightarrow G\Gamma$ and $\psi\colon G \Rightarrow FK$ , such that $(\psi\Gamma)\varphi=F\eta_{K\Gamma}$ and $(\varphi K)\psi=G\eta_{\Gamma K}$ , where $\eta_{\Gamma}\colon \mathrm{id}_P \Rightarrow \Gamma$ denotes the natural transformation induced by $x \leq \Gamma(x)$ .
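As a sanity check on the general definition, it may help to specialize it to the classical case. The sketch below assumes $P=(\mathbb{R},\leq)$ and takes both translations to be the shift by $\varepsilon$; the symbol $T_\varepsilon$ and the componentwise notation $\varphi_t,\psi_t$ are introduced here for illustration only and are not taken from the cited papers.

```latex
\begin{align*}
  &\Gamma = K = T_\varepsilon, \qquad T_\varepsilon(x) = x + \varepsilon, \qquad \varepsilon \ge 0,\\
  &\varphi_t \colon F(t) \to G(t+\varepsilon), \qquad
   \psi_t \colon G(t) \to F(t+\varepsilon),\\
  &\psi_{t+\varepsilon} \circ \varphi_t = F(t \le t + 2\varepsilon), \qquad
   \varphi_{t+\varepsilon} \circ \psi_t = G(t \le t + 2\varepsilon),\\
  &d_I(F,G) = \inf\{\varepsilon \ge 0 \mid F \text{ and } G \text{ are } \varepsilon\text{-interleaved}\}.
\end{align*}
```

In this special case the two interleaving conditions say that going from $F$ to $G$ and back (or from $G$ to $F$ and back) agrees with the internal map of the module over a shift of $2\varepsilon$.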

The two main results are

Let $P$ be a preordered set with a sublinear projection or superlinear family. Let $H:D \to E$ be a functor between arbitrary categories $D,E$ . Then for any two functors $F,G\colon P\to D$ , we have $d_I(HF,HG) \leq d_I(F,G)$ .
Let $P$ be a poset of a metric space $Y$ and let $X$ be a topological space. Let $f,g\colon X\to Y$ be functions (not necessarily continuous), and let $F,G$ be the corresponding persistence diagrams. Then $d_I(F,G) \leq d_{\infty}(f,g):=\sup_{x\in X}d_Y(f(x),g(x))$ .

These two results summarize many results on stability of different models of persistence.
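One way to read the two statements together, assuming the hypotheses of both hold for the same $P$, $F$ and $G$, is as a single chain of inequalities. The choice of $H$ as a homology functor in the comment is an illustrative assumption, not a requirement of the first result.

```latex
\begin{align*}
  d_I(HF, HG) \;\le\; d_I(F, G) \;\le\; d_\infty(f, g)
  \;=\; \sup_{x \in X} d_Y\bigl(f(x), g(x)\bigr)
  % e.g. H = homology with field coefficients (an assumed choice)
\end{align*}
```

The first inequality requires no computation beyond functoriality, while the second compares the modules to the functions that generate them; a perturbation of $f$ by at most $\delta$ in the sup norm therefore moves $HF$ by at most $\delta$ in interleaving distance.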

For the stability theorem of multidimensional persistence, see the subsection on multidimensional persistence.

Structure theorem

The structure theorem is of central importance to TDA; as commented by G. Carlsson, "what makes homology useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely generated abelian groups".{{Cite journal |title=Topology and data |journal=Bulletin of the American Mathematical Society |date=2009-01-01 |issn=0273-0979 |pages=255–308 |volume=46 |issue=2 |doi=10.1090/S0273-0979-09-01249-X |first=Gunnar |last=Carlsson |doi-access=free}} (see the fundamental theorem of finitely generated abelian groups).

The main argument used in the proof of the original structure theorem is the standard structure theorem for finitely generated modules over a principal ideal domain. However, this argument fails if the indexing set is $(\mathbb{R},\leq)$ .

In general, not every persistence module can be decomposed into intervals.{{cite arXiv |title=The structure and stability of persistence modules |eprint=1207.3674 |date=2012-07-16 |first1=Frederic |last1=Chazal |first2=Vin |last2=de Silva |first3=Marc |last3=Glisse |first4=Steve |last4=Oudot |class=math.AT}} Many attempts have been made at relaxing the restrictions of the original structure theorem.{{Clarify|date=December 2015}} The case of pointwise finite-dimensional persistence modules indexed by a locally finite subset of $\mathbb{R}$ is solved based on the work of Webb.{{Cite journal |title=Decomposition of graded modules |journal=Proceedings of the American Mathematical Society |date=1985-01-01 |issn=0002-9939 |pages=565–571 |volume=94 |issue=4 |doi=10.1090/S0002-9939-1985-0792261-6 |first=Cary |last=Webb |doi-access=free}} The most notable result is due to Crawley-Boevey, who solved the case of $\mathbb{R}$ . Crawley-Boevey's theorem states that any pointwise finite-dimensional persistence module is a direct sum of interval modules.{{Cite journal |title=Decomposition of pointwise finite-dimensional persistence modules |journal=Journal of Algebra and Its Applications |volume=14 |issue=5 |doi=10.1142/s0219498815500668 |first=William |last=Crawley-Boevey|s2cid=119635797 |pages=1550066 |year=2015 |arxiv=1210.0819}}

To understand the statement of the theorem, some concepts need to be introduced. An interval in $(\R,\leq)$ is defined as a subset $I \subset \R$ having the property that if $r, t \in I$ and if there is an $s \in \R$ such that $r \leq s \leq t$ , then $s\in I$ as well. An interval module $k_I$ assigns to each element $s\in I$ the vector space $k$ and assigns the zero vector space to elements in $\R \setminus I$ . All maps $\rho_s^t$ are the zero map, unless $s,t \in I$ and $s\leq t$ , in which case $\rho_s^t$ is the identity map. Interval modules are indecomposable.{{cite arXiv |title=The observable structure of persistence modules |eprint=1405.5644 |date=2014-05-22 |first1=Frederic |last1=Chazal |first2=William |last2=Crawley-Boevey |first3=Vin |last3=de Silva |class=math.RT}}
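The definition of an interval module translates directly into a rank computation. The following Python sketch is illustrative only: it assumes half-open intervals $[b, d)$, and the names `Interval`, `rank` and `persistent_betti` are hypothetical helpers, not part of any TDA library. For a direct sum of interval modules, the rank of $\rho_s^t$ is the number of intervals containing both $s$ and $t$.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Half-open interval [birth, death) in the real line (an assumed convention)."""
    birth: float
    death: float

    def __contains__(self, s: float) -> bool:
        return self.birth <= s < self.death


def dim_at(interval: Interval, s: float) -> int:
    """Dimension of the interval module k_I at index s: 1 inside I, 0 outside."""
    return 1 if s in interval else 0


def rank(interval: Interval, s: float, t: float) -> int:
    """Rank of rho_s^t in k_I: the identity (rank 1) if s, t are in I, else zero."""
    if s > t:
        raise ValueError("rho_s^t is only defined for s <= t")
    return 1 if (s in interval and t in interval) else 0


def persistent_betti(bars: List[Interval], s: float, t: float) -> int:
    """Rank of rho_s^t for a direct sum of interval modules (ranks add up)."""
    return sum(rank(bar, s, t) for bar in bars)


# Hypothetical barcode with three bars, one of them infinite.
bars = [Interval(0.0, 3.0), Interval(1.0, 2.0), Interval(2.5, float("inf"))]
print(persistent_betti(bars, 1.0, 1.5))
print(persistent_betti(bars, 1.0, 2.5))
```

Taking $s = t$ recovers the pointwise dimension of the module, which is finite at every index for any finite barcode.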

Although the result of Crawley-Boevey is a very powerful theorem, it still does not extend to the q-tame case. A persistence module is q-tame if the rank of $\rho_s^t$ is finite for all $s< t$ . There are examples of q-tame persistence modules that fail to be pointwise finite.{{cite arXiv |title=A subset of Euclidean space with large Vietoris-Rips homology |eprint=1210.4097 |date=2012-10-15 |first=Jean-Marie |last=Droz |class=math.GT}} However, it turns out that a similar structure theorem still holds if the features that exist only at one index value are removed. This holds because the infinite dimensional parts at each index value do not persist, due to the finite-rank condition.{{cite journal |first=Shmuel |last=Weinberger |author-link=Shmuel Weinberger|title=What is... persistent homology? |journal=Notices of the American Mathematical Society |volume=58 |issue=1 |pages=36–39 |date=2011 |doi= |url=https://www.ams.org/journals/notices/201101/rtx110100036p.pdf }} Formally, the observable category $\mathrm{Ob}$ is defined as $\mathrm{Pers}/\mathrm{Eph}$ , in which $\mathrm{Eph}$ denotes the full subcategory of $\mathrm{Pers}$ whose objects are the ephemeral modules ( $\rho^t_s=0$ whenever $s < t$ ).

Note that the extended results listed here do not apply to zigzag persistence, since the analogue of a zigzag persistence module over $\R$ is not immediately obvious.

Statistics

Real data is always finite, and so its study requires us to take stochasticity into account. Statistical analysis gives us the ability to separate true features of the data from artifacts introduced by random noise. Persistent homology has no inherent mechanism to distinguish between low-probability features and high-probability features.

One way to apply statistics to topological data analysis is to study the statistical properties of topological features of point clouds. The study of random simplicial complexes offers some insight into statistical topology. Katharine Turner et al.{{Cite journal |title=Fréchet Means for Distributions of Persistence Diagrams |journal=Discrete & Computational Geometry |date=2014-07-12 |issn=0179-5376 |pages=44–70 |volume=52 |issue=1 |doi=10.1007/s00454-014-9604-7 |first1=Katharine |last1=Turner |first2=Yuriy |last2=Mileyko |first3=Sayan |last3=Mukherjee |first4=John |last4=Harer |s2cid=14293062 |arxiv=1206.2790}} offer a summary of work in this vein.

A second way is to study probability distributions on the persistence space. The persistence space $B_\infty$ is $\coprod_n B_{n}/ {\backsim}$ , where $B_n$ is the space of all barcodes containing exactly $n$ intervals and the equivalences are $\{[x_1,y_1],[x_2,y_2],\ldots,[x_n,y_n]\} \backsim \{[x_1,y_1],[x_2,y_2],\ldots,[x_{n-1},y_{n-1}]\}$ if $x_n = y_n$ .{{Cite journal |title=Topological pattern recognition for point cloud data |journal=Acta Numerica |date=2014-05-01 |issn=1474-0508 |pages=289–368 |volume=23 |doi=10.1017/S0962492914000051 |first=Gunnar |last=Carlsson |author-link=Gunnar Carlsson|doi-access=free}} This space is fairly complicated; for example, it is not complete under the bottleneck metric. The first attempt made to study it is by Yuriy Mileyko et al.{{Cite journal |title=Probability measures on the space of persistence diagrams |journal=Inverse Problems |date=2011-11-10 |issn=0266-5611 |pages=124007 |volume=27 |issue=12 |doi=10.1088/0266-5611/27/12/124007 |first1=Yuriy |last1=Mileyko |first2=Sayan |last2=Mukherjee |first3=John |last3=Harer |s2cid=250676 |bibcode=2011InvPr..27l4007M }} The space of persistence diagrams $D_p$ in their paper is defined as $D_p := \left\{d \mid \sum_{x\in d}\left(2\inf_{y \in \Delta}\lVert x-y \rVert\right)^p < \infty \right\}$ where $\Delta$ is the diagonal line in $\mathbb{R}^2$ . A nice property is that $D_p$ is complete and separable in the Wasserstein metric $W_p(u,v)=\left(\inf_{\gamma\in \Gamma(u,v)}\int_{\mathbb{X}\times \mathbb{X}} \rho^p(x,y) \, \mathrm{d}\gamma(x,y)\right)^{1/p}$ . Expectation, variance, and conditional probability can be defined in the Fréchet sense. This allows many statistical tools to be ported to TDA. 
Notable steps include work on null hypothesis significance testing,{{cite arXiv |title=Hypothesis Testing for Topological Data Analysis |eprint=1310.7467 |date=2013-10-28 |first1=Andrew |last1=Robinson |first2=Katharine |last2=Turner |class=stat.AP}} confidence intervals,{{Cite journal |title=Confidence sets for persistence diagrams |journal=The Annals of Statistics |date=2014-12-01 |issn=0090-5364 |pages=2301–39 |volume=42 |issue=6 |doi=10.1214/14-AOS1252 |first1=Brittany Terese |last1=Fasy |first2=Fabrizio |last2=Lecci |first3=Alessandro |last3=Rinaldo |first4=Larry |last4=Wasserman |first5=Sivaraman |last5=Balakrishnan |first6=Aarti |last6=Singh |doi-access=free|arxiv=1303.7117 }} and robust estimates.{{Cite journal |title=Robust Statistics, Hypothesis Testing, and Confidence Intervals for Persistent Homology on Metric Measure Spaces |journal=Foundations of Computational Mathematics |date=2014-05-15 |issn=1615-3375 |pages=745–789 |volume=14 |issue=4 |doi=10.1007/s10208-014-9201-4 |first1=Andrew J. |last1=Blumberg |first2=Itamar |last2=Gal |first3=Michael A. |last3=Mandell |first4=Matthew |last4=Pancia |s2cid=17150103 |arxiv=1206.4581}}
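For finite diagrams, the Wasserstein distance is commonly computed as an optimal matching in which unmatched points are sent to the diagonal. The Python sketch below is a brute-force illustration under stated assumptions: it uses the $\ell^\infty$ norm on $\mathbb{R}^2$ as ground metric (the text above leaves the norm unspecified), enumerates all matchings and is therefore only usable for very small diagrams, and the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Point = Tuple[float, float]  # (birth, death)


def linf(a: Point, b: Point) -> float:
    """l-infinity distance between two diagram points (assumed ground metric)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def to_diagonal(a: Point) -> float:
    """l-infinity distance from (b, d) to the diagonal, attained at the midpoint."""
    return abs(a[1] - a[0]) / 2.0


def total_persistence(diagram: List[Point], p: float) -> float:
    """Sum of (2 * distance to diagonal)^p, the quantity that must be finite in D_p."""
    return sum((2.0 * to_diagonal(x)) ** p for x in diagram)


def wasserstein(d1: List[Point], d2: List[Point], p: float = 2.0) -> float:
    """p-Wasserstein distance between two small finite diagrams, by brute force."""
    n, m = len(d1), len(d2)
    size = n + m  # each diagram is padded with diagonal slots for the other's points

    def cost(i: int, j: int) -> float:
        if i < n and j < m:
            return linf(d1[i], d2[j]) ** p      # point matched to point
        if i < n:
            return to_diagonal(d1[i]) ** p      # point of d1 sent to the diagonal
        if j < m:
            return to_diagonal(d2[j]) ** p      # point of d2 sent to the diagonal
        return 0.0                              # diagonal matched to diagonal

    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


print(wasserstein([(0.0, 4.0)], [(0.0, 5.0)], p=2.0))
```

Practical implementations replace the enumeration of permutations with an assignment solver; the brute-force version is kept here only because it mirrors the definition most directly.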

A third way is to consider the cohomology of probability spaces or statistical systems directly, called information structures and consisting essentially of the triple ( $\Omega,\Pi,P$ ): sample space, random variables and probability laws.{{Cite journal |title=The Homological Nature of Entropy |journal=Entropy |date=2015 |pages=3253–3318 |volume=17 |issue=5 |doi=10.3390/e17053253 |first1=Pierre |last1=Baudot |first2=Daniel |last2=Bennequin |bibcode=2015Entrp..17.3253B |doi-access=free}}{{Cite thesis |title=Topology of Statistical Systems: A Cohomological Approach to Information Theory |url=https://webusers.imj-prg.fr/~juan-pablo.vigneaux/these.pdf |type=PhD |date=2019 |first=Juan-Pablo |last=Vigneaux |publisher=Université Sorbonne Paris Cité |id=tel-02951504}} Random variables are considered as partitions of the n atomic probabilities (seen as a probability (n-1)-simplex, $|\Omega|=n$ ) on the lattice of partitions ( $\Pi_n$ ). The random variables or modules of measurable functions provide the cochain complexes, while the coboundary is taken from the general homological algebra first discovered by Gerhard Hochschild, with a left action implementing the action of conditioning. The first cocycle condition corresponds to the chain rule of entropy, which allows Shannon entropy to be derived, uniquely up to a multiplicative constant, as the first cohomology class. The consideration of a deformed left action generalises the framework to Tsallis entropies. The information cohomology is an example of a ringed topos. 
Multivariate $k$-mutual informations appear in coboundary expressions, and their vanishing, related to the cocycle condition, gives equivalent conditions for statistical independence.{{Cite journal |title=Topological Information Data Analysis |journal=Entropy |date=2019 |pages=881 |volume=21 |issue=9 |doi=10.3390/e21090881 |first1=Pierre |last1=Baudot |first2=Monica |last2=Tapia |first3=Daniel |last3=Bennequin |first4=Jean-Marc |last4=Goaillard |doi-access=free |bibcode=2019Entrp..21..881B|pmc=7515411 }} Minima of mutual informations, also called synergy, give rise to interesting independence configurations analogous to homotopical links. Because of its combinatorial complexity, only the simplicial subcase of the cohomology and of the information structure has been investigated on data. Applied to data, these cohomological tools quantify statistical dependences and independences, including Markov chains and conditional independence, in the multivariate case.{{Cite journal |title=Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons |journal=Scientific Reports |volume=8 |pages=13637 |date=2018 |doi=10.1038/s41598-018-31765-z |pmid=30206240 |first1=Monica |last1=Tapia |first2=et |last2=al. |issue=1 |pmc=6134142 |bibcode=2018NatSR...813637T}} Notably, mutual informations generalize the correlation coefficient and covariance to non-linear statistical dependences. These approaches were developed independently and are only indirectly related to persistence methods, but may be roughly understood in the simplicial case using the Hu Kuo Tin theorem, which establishes a one-to-one correspondence between mutual information functions and finite measurable functions of a set with the intersection operator, to construct the Čech complex skeleton. 
Information cohomology offers some direct interpretation and application in terms of neuroscience (neural assembly theory and qualitative cognition {{Cite journal |title=Elements of qualitative cognition: an Information Topology Perspective |journal=Physics of Life Reviews |date=2019 |pages=263–275 |volume=31 |doi=10.1016/j.plrev.2019.10.003 |pmid=31679788 |first=Pierre |last=Baudot |arxiv=1807.04520|bibcode=2019PhLRv..31..263B |s2cid=207897618 }}), statistical physics, and deep neural networks for which the structure and learning algorithm are imposed by the complex of random variables and the information chain rule.{{Cite journal |title=The Poincaré-Shannon Machine: Statistical Physics and Machine Learning Aspects of Information Cohomology |journal=Entropy |date=2019 |pages=881 |volume=21 |issue=9 |doi=10.3390/e21090881 |first=Pierre |last=Baudot |bibcode=2019Entrp..21..881B |doi-access=free|pmc=7515411 }}
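The two numerical identities that this approach builds on, the chain rule of entropy and the vanishing of mutual information under independence, can be checked directly on a small joint distribution. The Python sketch below verifies only these identities for two variables; it does not implement the cohomological construction, and the function names and example distributions are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], float]  # joint law of (X, Y) on a finite set


def entropy(dist: Dict) -> float:
    """Shannon entropy in bits of a finite probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint: Joint, axis: int) -> Dict[int, float]:
    """Marginal law of X (axis=0) or Y (axis=1)."""
    out: Dict[int, float] = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


def conditional_entropy(joint: Joint) -> float:
    """H(Y | X) = sum over x of p(x) * H(Y | X = x)."""
    total = 0.0
    for x, p_x in marginal(joint, 0).items():
        if p_x == 0:
            continue
        cond = {y: p / p_x for (x2, y), p in joint.items() if x2 == x}
        total += p_x * entropy(cond)
    return total


def mutual_information(joint: Joint) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


# A dependent pair (X, Y): the chain rule H(X, Y) = H(X) + H(Y | X) should hold.
dependent = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
chain_rule_gap = entropy(dependent) - (
    entropy(marginal(dependent, 0)) + conditional_entropy(dependent)
)

# An independent pair: the joint law is a product, so I(X; Y) should vanish.
independent = {(x, y): px * py
               for x, px in {0: 0.5, 1: 0.5}.items()
               for y, py in {0: 0.25, 1: 0.75}.items()}

print(abs(chain_rule_gap) < 1e-12, abs(mutual_information(independent)) < 1e-12)
```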

Persistence landscapes, introduced by Peter Bubenik, are a different way to represent barcodes, more amenable to statistical analysis.{{cite arXiv |title=Statistical topological data analysis using persistence landscapes |eprint=1207.6437 |date=2012-07-26 |first=Peter |last=Bubenik |class=math.AT}} The persistence landscape of a persistence module $M$ is defined as a function $\lambda:\mathbb{N}\times\mathbb{R}\to \bar{\mathbb{R}}$ , $\lambda(k,t):=\sup(m\geq 0\mid\beta^{t-m,t+m}\geq k)$ , where $\bar{\mathbb{R}}$ denotes the extended real line and $\beta^{a,b}=\mathrm{dim}(\mathrm{im}(M(a\leq b)))$ . The space of persistence landscapes is well behaved: it inherits the good properties of the barcode representation (stability, easy representation, etc.), while statistical quantities can be readily defined, and some problems in the work of Y. Mileyko et al., such as the non-uniqueness of expectations, can be overcome. Effective algorithms for computation with persistence landscapes are available.{{cite journal |title=A persistence landscapes toolbox for topological statistics |arxiv=1501.00179 |date=2014-12-31 |first1=Peter |last1=Bubenik |first2=Pawel |last2=Dlotko|s2cid=9789489 |doi=10.1016/j.jsc.2016.03.009|volume=78|journal=Journal of Symbolic Computation|pages=91–114 |bibcode=2015arXiv150100179B}} Another approach is to use revised persistence, that is, image, kernel and cokernel persistence.{{Cite book |pages=1011–20 |doi=10.1137/1.9781611973068.110 |first1=David |last1=Cohen-Steiner |first2=Herbert |last2=Edelsbrunner |author2-link=Herbert Edelsbrunner| first3=John |last3=Harer |first4=Dmitriy |last4=Morozov |title=Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms |year=2009 |isbn=978-0-89871-680-1 |citeseerx=10.1.1.179.3236}}
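For a finite barcode, a persistence landscape can be evaluated without computing ranks explicitly: $\lambda(k,t)$ is the $k$-th largest value of $\max(0, \min(t-b, d-t))$ over the bars $(b,d)$, since a bar contributes to the rank of $M(t-m \leq t+m)$ exactly when it contains both endpoints. The Python sketch below uses this description; the function name `landscape` and the example barcode are illustrative assumptions, and this is not the algorithm of the toolbox cited above.

```python
from typing import List, Tuple

Bar = Tuple[float, float]  # (birth, death)


def landscape(bars: List[Bar], k: int, t: float) -> float:
    """Value lambda(k, t) of the k-th landscape function (k = 1, 2, ...) at t."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Height of the 'tent' over each bar at t; zero outside the bar.
    heights = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in bars),
        reverse=True,
    )
    # Fewer than k bars means the rank can never reach k, so the value is 0.
    return heights[k - 1] if k <= len(heights) else 0.0


# Hypothetical barcode with two nested bars.
bars = [(0.0, 4.0), (1.0, 3.0)]
print([landscape(bars, k, 2.0) for k in (1, 2, 3)])
```

Because each $\lambda(k,\cdot)$ is an ordinary real-valued function, averages of landscapes over a sample can be taken pointwise, which is what makes the representation convenient for statistics.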

Applications

= Classification of applications =

More than one way exists to classify the applications of TDA. Perhaps the most natural way is by field. A very incomplete list of successful applications includes {{Cite journal | title = A one-dimensional Homologically Persistent Skeleton of an unstructured point cloud in any metric space | url = http://kurlin.org/projects/persistent-skeleton.pdf|journal = Computer Graphics Forum | volume = 34 | issue = 5 | pages = 253–262 | date = 2015 | doi = 10.1111/cgf.12713 | first = V. | last = Kurlin| s2cid = 10610111}} data skeletonization,{{Cite book|url = http://kurlin.org/projects/counting-holes-in-noisy-clouds.pdf|pages = 1458–1463| date = 2014 | doi = 10.1109/CVPR.2014.189 | first = V. | last = Kurlin | chapter=A Fast and Robust Algorithm to Count Topologically Persistent Holes in Noisy Clouds |s2cid = 10118087|title = 2014 IEEE Conference on Computer Vision and Pattern Recognition|isbn = 978-1-4799-5118-5|arxiv = 1312.1492}} shape study,{{Cite conference | chapter = A Homologically Persistent Skeleton is a fast and robust descriptor of interest points in 2D images | contribution-url = http://kurlin.org/projects/persistent-skeleton-dim2.pdf | series = Lecture Notes in Computer Science |title=Proceedings of CAIP: Computer Analysis of Images and Patterns | volume = 9256 | pages = 606–617 | date = 2015 | doi = 10.1007/978-3-319-23192-1_51 | first = V. 
| last = Kurlin | isbn = 978-3-319-23191-4 }} graph reconstruction,{{Cite journal|title = Retrieval of trademark images by means of size functions|journal = Graphical Models|date = 2006-09-01|pages = 451–471|volume = 68|series = Special Issue on the Vision, Video and Graphics Conference 2005|issue = 5–6|doi = 10.1016/j.gmod.2006.07.001|first1 = A.|last1 = Cerri|first2 = M.|last2 = Ferri|first3 = D.|last3 = Giorgi}}{{Cite journal|title = Gromov-Hausdorff Stable Signatures for Shapes using Persistence|journal = Computer Graphics Forum|date = 2009-07-01|issn = 1467-8659|pages = 1393–1403|volume = 28|issue = 5|doi = 10.1111/j.1467-8659.2009.01516.x|first1 = Frédéric|last1 = Chazal|first2 = David|last2 = Cohen-Steiner|first3 = Leonidas J.|last3 = Guibas|first4 = Facundo|last4 = Mémoli|first5 = Steve Y.|last5 = Oudot|citeseerx = 10.1.1.161.9103|s2cid = 8173320}}{{Cite journal|title = Size functions for comparing 3D models|journal = Pattern Recognition|date = 2008-09-01|pages = 2855–2873|volume = 41|issue = 9|doi = 10.1016/j.patcog.2008.02.003|first1 = S.|last1 = Biasotti|first2 = D.|last2 = Giorgi|first3 = M.|last3 = Spagnuolo|first4 = B.|last4 = Falcidieno|author4-link=Bianca Falcidieno|bibcode = 2008PatRe..41.2855B| url=https://zenodo.org/record/8151838 }}{{Cite book |chapter = Persistence-based Structural Recognition|url = http://www.lix.polytechnique.fr/~maks/papers/li-CVPR-14.pdf |title = IEEE Conference on Computer Vision and Pattern Recognition |first1 = C.|last1 = Li|first2 = M.|last2 = Ovsjanikov |first3 = F.|last3 = Chazal |date=2014 |isbn=978-1-4799-5118-5 |pages=2003–10 |doi=10.1109/CVPR.2014.257|s2cid = 17787875 }}

{{Cite journal|title = Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons|journal = Scientific Reports| volume = 8 | pages = 13637 | date = 2018 | doi = 10.1038/s41598-018-31765-z|pmid = 30206240|first1 = Monica |last1 = Tapia |first2 = et |last2 = al.|issue = 1|pmc = 6134142|bibcode = 2018NatSR...813637T}}

image analysis,

{{Cite journal|title = Computing Robustness and Persistence for Images|journal = IEEE Transactions on Visualization and Computer Graphics|date = 2010-11-01|issn = 1077-2626|pages = 1251–1260|volume = 16|issue = 6|doi = 10.1109/TVCG.2010.139|pmid = 20975165|first1 = P.|last1 = Bendich|first2 = H.|last2 = Edelsbrunner|first3 = M.|last3 = Kerber|s2cid = 8589124|citeseerx = 10.1.1.185.523}}{{Cite journal|title = On the Local Behavior of Spaces of Natural Images|journal = International Journal of Computer Vision|date = 2007-06-30|issn = 0920-5691|pages = 1–12|volume = 76|issue = 1|doi = 10.1007/s11263-007-0056-x|first1 = Gunnar|last1 = Carlsson|first2 = Tigran|last2 = Ishkhanov|first3 = Vin de|last3 = Silva|first4 = Afra|last4 = Zomorodian|s2cid = 207252002|citeseerx = 10.1.1.463.7101}} material,{{Cite journal |last1=Hiraoka |first1=Yasuaki |last2=Nakamura |first2=Takenobu |last3=Hirata |first3=Akihiko |last4=Escolar |first4=Emerson G. |last5=Matsue |first5=Kaname |last6=Nishiura |first6=Yasumasa |date=2016-06-28 |title=Hierarchical structures of amorphous solids characterized by persistent homology |journal=Proceedings of the National Academy of Sciences of the United States of America |language=en |volume=113 |issue=26 |pages=7035–40 |doi=10.1073/pnas.1520877113 |issn=0027-8424 |pmc=4932931 |pmid=27298351|arxiv=1501.03611 |bibcode=2016PNAS..113.7035H |doi-access=free }}{{cite journal|title = Persistent Homology and Many-Body Atomic Structure for Medium-Range Order in the Glass|journal= Nanotechnology|volume= 26|issue= 30|pages= 304001|arxiv= 1502.07445|date = 2015-02-26|first1 = Takenobu|last1 = Nakamura|first2 = Yasuaki|last2 = Hiraoka|first3 = Akihiko|last3 = Hirata|first4 = Emerson G.|last4 = Escolar|first5 = Yasumasa|last5 = Nishiura|s2cid= 7298655|bibcode= 2015Nanot..26D4001N|doi= 10.1088/0957-4484/26/30/304001|pmid= 26150288}} progression analysis of disease,{{Cite journal|title = Topology based data analysis identifies a subgroup of breast cancers with a unique 
mutational profile and excellent survival|journal = Proceedings of the National Academy of Sciences of the United States of America|date = 2011-04-26|issn = 0027-8424|pmc = 3084136|pmid = 21482760|pages = 7265–7270|volume = 108|issue = 17|doi = 10.1073/pnas.1102826108|first1 = Monica|last1 = Nicolau|first2 = Arnold J.|last2 = Levine|first3 = Gunnar|last3 = Carlsson|bibcode = 2011PNAS..108.7265N|doi-access = free}}{{Cite book|publisher = Springer |location=New York|date = 2011-01-01|isbn = 978-1-4419-7414-3|pages = 433–455|series = AAPS Advances in the Pharmaceutical Sciences Series|first1 = Stephan|last1 = Schmidt|first2 = Teun M.|last2 = Post|first3 = Massoud A.|last3 = Boroujerdi|first4 = Charlotte van|last4 = Kesteren|first5 = Bart A.|last5 = Ploeger|first6 = Oscar E. Della|last6 = Pasqua|first7 = Meindert|last7 = Danhof| title=Clinical Trial Simulations | chapter=Disease Progression Analysis: Towards Mechanism-Based Models | volume=1 |editor-first = Holly H. C.|editor-last = Kimko|editor-first2 = Carl C.|editor-last2 = Peck|doi = 10.1007/978-1-4419-7415-0_19}} sensor network,{{cite journal |last1=De Silva |first1=V. |last2=Ghrist |first2=R. 
|title=Coverage in sensor networks via persistent homology |journal=Algebraic & Geometric Topology |volume=7 |issue=1 |pages=339–358 |date=2007 |doi=10.2140/agt.2007.7.339 |url=https://msp.org/agt/2007/7-1/p16.xhtml|doi-access=free }} signal analysis,{{Cite journal|title = Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis|journal = Foundations of Computational Mathematics|date = 2014-05-29|issn = 1615-3375|pages = 799–838|volume = 15|issue = 3|doi = 10.1007/s10208-014-9206-z|first1 = Jose A.|last1 = Perea|first2 = John|last2 = Harer|s2cid = 592832|citeseerx = 10.1.1.357.6648}} cosmic web,{{Cite book|title = Transactions on Computational Science XIV|url = http://dl.acm.org/citation.cfm?id=2172419.2172422|publisher = Springer-Verlag|date = 2011-01-01|location = Berlin, Heidelberg|isbn = 978-3-642-25248-8|pages = 60–101|first1 = Rien|last1 = van de Weygaert|first2 = Gert|last2 = Vegter|first3 = Herbert|last3 = Edelsbrunner|first4 = Bernard J. T.|last4 = Jones|first5 = Pratyush|last5 = Pranav|first6 = Changbom|last6 = Park|first7 = Wojciech A.|last7 = Hellwing|first8 = Bob|last8 = Eldering|first9 = Nico|last9 = Kruithof|editor-first = Marina L.|editor-last = Gavrilova|editor-link=Marina Gavrilova|editor-first2 = C. Kenneth|editor-last2 = Tan|editor-first3 = Mir Abolfazl|editor-last3 = Mostafavi}} complex network,{{Cite journal|title = Persistent homology of complex networks - IOPscience|date = 2009-03-01|doi = 10.1088/1742-5468/2009/03/p03034 |first1 = Danijela|last1 = Horak|first2 = Slobodan|last2 = Maletić|first3 = Milan|last3 = Rajković|s2cid = 15592802|volume=2009|issue = 3|journal=Journal of Statistical Mechanics: Theory and Experiment|pages=03034|arxiv = 0811.2203|bibcode = 2009JSMTE..03..034H}}{{Cite journal|title = Persistent Homology of Collaboration Networks|journal = Mathematical Problems in Engineering|date = 2013-06-04|pages = 1–7|volume = 2013|doi = 10.1155/2013/815035|first1 = C. J.|last1 = Carstens|first2 = K. 
J.|last2 = Horadam|author2-link=Kathy Horadam|doi-access = free}}{{Cite journal|title = Persistent Brain Network Homology From the Perspective of Dendrogram|journal = IEEE Transactions on Medical Imaging|date = 2012-12-01|issn = 0278-0062|pages = 2267–2277|volume = 31|issue = 12|doi = 10.1109/TMI.2012.2219590|pmid = 23008247|first1 = Hyekyoung|last1 = Lee|first2 = Hyejin|last2 = Kang|first3 = M.K.|last3 = Chung|first4 = Bung-Nyun|last4 = Kim|first5 = Dong Soo|last5 = Lee|s2cid = 858022|citeseerx = 10.1.1.259.2692}}{{Cite journal|title = Homological scaffolds of brain functional networks|journal = Journal of the Royal Society Interface|date = 2014-12-06|issn = 1742-5689|pmc = 4223908|pmid = 25401177|pages = 20140873|volume = 11|issue = 101|doi = 10.1098/rsif.2014.0873|first1 = G.|last1 = Petri|first2 = P.|last2 = Expert|first3 = F.|last3 = Turkheimer|first4 = R.|last4 = Carhart-Harris|first5 = D.|last5 = Nutt|first6 = P. J.|last6 = Hellyer|first7 = F.|last7 = Vaccarino}} fractal geometry,{{Cite journal|title = Measuring shape with topology|journal = Journal of Mathematical Physics|date = 2012-07-01|issn = 0022-2488|pages = 073516|volume = 53|issue = 7|doi = 10.1063/1.4737391|first1 = Robert|last1 = MacPherson|first2 = Benjamin|last2 = Schweinhart|s2cid = 17423075|bibcode = 2012JMP....53g3516M|arxiv = 1011.2258}} viral evolution,{{Cite journal|title = Topology of viral evolution|journal = Proceedings of the National Academy of Sciences|date = 2013-11-12|issn = 0027-8424|pmc = 3831954|pmid = 24170857|pages = 18566–18571|volume = 110|issue = 46|doi = 10.1073/pnas.1313480110|first1 = Joseph Minhow|last1 = Chan|first2 = Gunnar|last2 = Carlsson|first3 = Raul|last3 = Rabadan|bibcode = 2013PNAS..11018566C|doi-access = free}} propagation of contagions on networks,{{Cite journal|title = Topological data analysis of contagion maps for examining spreading processes on networks|journal = Nature Communications|date = 2015-08-21|issn = 2041-1723 |pages = 7723|issue = 6|doi = 
10.1038/ncomms8723|pmid = 26194875|pmc = 4566922|first1 = D.|last1 = Taylor|first2 = et.|last2 = al|volume=6|arxiv = 1408.1168|bibcode = 2015NatCo...6.7723T}} bacteria classification using molecular spectroscopy,{{Cite journal | title = Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry | journal = Analytica Chimica Acta | volume = 910 | pages = 1–11 | date = 2016 | doi = 10.1016/j.aca.2015.12.037 | pmid = 26873463 | first = M. | last = Offroy| bibcode = 2016AcAC..910....1O }} super-resolution microscopy,{{Cite journal | title = Advanced image-free analysis of the nano-organization of chromatin and other biomolecules by Single Molecule Localization Microscopy (SMLM) | doi = 10.1016/j.csbj.2023.03.009 |journal = Computational and Structural Biotechnology Journal | volume = 21 | pages = 2018–2034| date = 2023 |publisher = Elsevier | first1 = Jonas | last1 = Weidner | first2 = Charlotte | last2 = Neitzel | first3 = Martin | last3 = Gote | first4 = Jeanette | last4 = Deck | first5 = Kim | last5 = Küntzelmann | first6 = Götz | last6 = Pilarczyk | first7 = Martin | last7 = Falk| first8 = Michael | last8 = Hausmann| pmid = 36968017 | pmc = 10030913 }} hyperspectral imaging in physical-chemistry,{{Cite journal | title = Exploring hyperspectral imaging data sets with topological data analysis | journal = Analytica Chimica Acta | volume = 1000 | pages = 123–131 | date = 2018 | doi = 10.1016/j.aca.2017.11.029 | pmid = 29289301 | first = L. | last = Duponchel| bibcode = 2018AcAC.1000..123D }} remote sensing,{{Cite journal | title = When remote sensing meets topological data analysis | journal = Journal of Spectral Imaging | volume = 7 | pages = a1 | date = 2018 | doi = 10.1255/jsi.2018.a1 | first = L. 
| last = Duponchel| doi-broken-date = 2024-11-11 | doi-access = free }} feature selection,{{Cite journal | volume = 34 | pages = 4747–4754 | date = 2020 | doi = 10.1609/aaai.v34i04.5908 | first1 = Xiaoyun | last1 = Li| first2 = Chenxi | last2 = Wu | first3 = Ping | last3 = Li | title = IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation | journal = Proceedings of the AAAI Conference on Artificial Intelligence | issue = 4 | doi-access = free | arxiv = 2004.01299 }} and early warning signs of financial crashes.{{cite journal | last1=Gidea | first1=Marian | last2=Katz | first2=Yuri | title=Topological data analysis of financial time series: Landscapes of crashes | journal=Physica A: Statistical Mechanics and Its Applications | publisher=Elsevier BV | volume=491 | year=2018 | issn=0378-4371 | doi=10.1016/j.physa.2017.09.028 | pages=820–834| arxiv=1703.04385 | bibcode=2018PhyA..491..820G | s2cid=85550367 }}

Another way is to follow G. Carlsson in distinguishing two kinds of technique:{{Blockquote|text = one being the study of homological invariants of data on individual data sets, and the other is the use of homological invariants in the study of databases where the data points themselves have geometric structure.|sign = |source = }}

= Impact on mathematics =

Topological data analysis and persistent homology have influenced Morse theory. Adams, H., Atanasov, A., & Carlsson, G. (2011, October 6). [https://www.math.colostate.edu/~adams/talks/MorseTheoryInTdaSlides.pdf Morse Theory in Topological Data Analysis]. Presented at the SIAM Conference on Applied Algebraic Geometry. Accessed 28 October 2023. In turn, Morse theory has played an important role in the theory of TDA, including its computational aspects. Some work in persistent homology has extended results about Morse functions to tame functions or even to continuous functions.{{Citation needed|date=April 2022}} A forgotten result of R. Deheuvels, obtained long before the invention of persistent homology, extends Morse theory to all continuous functions.{{Cite journal|title = Topologie D'Une Fonctionnelle|journal = Annals of Mathematics|date = 1955-01-01|pages = 13–72|volume = 61|series = Second Series|issue = 1|doi = 10.2307/1969619|first = René|last = Deheuvels|jstor=1969619}}
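
As a rough illustration of how persistence relates to Morse-type critical points, the following Python sketch computes 0-dimensional sublevel-set persistence for a function sampled along a line. The function name <code>sublevel_persistence_0d</code> and the sample values are illustrative assumptions and are not taken from the cited sources. Each finite pair matches a local minimum, where a connected component is born, with the local maximum at which that component merges into an older one.

```python
def sublevel_persistence_0d(values):
    """0-dimensional sublevel-set persistence of a function sampled on a path.

    Returns sorted (birth, death) pairs; the component of the global
    minimum never dies and is reported with death = inf.
    """
    n = len(values)
    if n == 0:
        return []
    parent = [None] * n   # None marks a vertex not yet in the sublevel set
    birth = {}            # vertex that started a component -> its birth value

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upwards: vertices enter in order of function value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this merge.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:   # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((birth[find(0)], float("inf")))
    return sorted(pairs)


# Hypothetical sample: minima at 0, 1, 2 separated by maxima at 3, 4.
print(sublevel_persistence_0d([0, 3, 1, 4, 2, 5]))
# → [(0, inf), (1, 3), (2, 4)]
```

In this sample the minimum with value 1 is paired with the maximum of value 3, and the minimum with value 2 with the maximum of value 4, while the global minimum persists indefinitely.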

One recent result is that the category of Reeb graphs is equivalent to a particular class of cosheaves.{{Cite journal|title = Categorified Reeb graphs|journal=Discrete & Computational Geometry|volume=55| issue=4|pages=854–906|first1=Vin|last1=de Silva| first2=Elizabeth|last2=Munch|first3=Amit|last3 = Patel|s2cid=7111141|doi=10.1007/s00454-016-9763-9|date=2016-04-13|arxiv=1501.04147}} This result is motivated by theoretical work in TDA, since the Reeb graph is related to Morse theory and MAPPER is derived from it. The proof of this theorem relies on the interleaving distance.

Persistent homology is closely related to spectral sequences.{{Cite book|title = Surveys on Discrete and Computational Geometry: Twenty Years Later : AMS-IMS-SIAM Joint Summer Research Conference, June 18-22, 2006, Snowbird, Utah|url = https://books.google.com/books?id=ecUbCAAAQBAJ|publisher = American Mathematical Society|date = 2008-01-01|isbn = 9780821842393|first = Jacob E.|last = Goodman}}{{cite book |chapter=Persistent homology — a survey |chapter-url=http://www.ams.org/books/conm/453/8802 |volume=453|last1=Edelsbrunner |first1=Herbert |last2=Harer |first2=John |pages=15–18 |doi=10.1090/conm/453/08802 |quote=Section 5 |citeseerx=10.1.1.87.7764 |title=Surveys on Discrete and Computational Geometry: Twenty Years Later |series=Contemporary Mathematics|publisher=AMS|year=2008|isbn=9780821842393 |url=http://www.ams.org/books/conm/453/}} In particular, the algorithm that brings a filtered complex to its canonical form permits much faster calculation of spectral sequences than the standard procedure of calculating the $E^r_{p, q}$ groups page by page. Zigzag persistence may turn out to be of theoretical importance to spectral sequences.
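
To make the idea of reducing a filtered complex concrete, the following Python sketch performs the standard left-to-right column reduction of a boundary matrix over the field with two elements, which is the usual way persistence pairs are computed. It is offered as an assumption-laden illustration rather than as the exact canonical-form algorithm of the cited sources; the function name <code>reduce_boundary_matrix</code> and the small triangle example are hypothetical.

```python
def reduce_boundary_matrix(columns):
    """Column-reduce a mod-2 boundary matrix given in filtration order.

    columns[j] is the set of row indices of the nonzero entries in the
    boundary of the j-th simplex. Returns (pairs, essential), where
    pairs lists (creator, destroyer) simplex indices and essential
    lists simplices that create a class which is never destroyed.
    """
    reduced = [set(c) for c in columns]
    low_to_col = {}   # lowest nonzero row index -> column holding it
    pairs = []
    for j, col in enumerate(reduced):
        # Add earlier columns (mod 2) until the lowest entry is unique.
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]
        if col:
            low = max(col)
            low_to_col[low] = j
            pairs.append((low, j))
    paired = {i for pair in pairs for i in pair}
    essential = [j for j in range(len(columns)) if j not in paired]
    return pairs, essential


# Hypothetical filtration of a filled triangle:
# vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2}; triangle 6.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
print(reduce_boundary_matrix(boundary))
# → ([(1, 3), (2, 4), (5, 6)], [0])
```

Here edges 3 and 4 merge vertices 1 and 2 into the component of vertex 0, edge 5 creates a loop that the triangle 6 fills, and vertex 0 represents the one connected component that survives.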

= DONUT: A Database of TDA Applications =

The [https://donut.topology.rocks/ Database of Original & Non-Theoretical Uses of Topology (DONUT)] is a database of scholarly articles featuring practical applications of topological data analysis to various areas of science. DONUT was started in 2017 by Barbara Giunti, Janis Lazovskis, and Bastian Rieck. Giunti, B., Lazovskis, J., & Rieck, B. (2023, April 24). DONUT -- Creation, Development, and Opportunities of a Database. arXiv. http://arxiv.org/abs/2304.12417. Accessed 28 October 2023. As of October 2023, it contained 447 articles. Barbara Giunti, Janis Lazovskis, and Bastian Rieck, Zotero database of real-world applications of Topological Data Analysis, 2020. [https://www.zotero.org/groups/tda-applications https://www.zotero.org/groups/tda-applications.] DONUT was featured in the November 2023 issue of the Notices of the American Mathematical Society. Giunti, B., Lazovskis, J., & Rieck, B. (2023). [https://www.ams.org/journals/notices/202310/rnoti-p1640.pdf DONUT: Creation, Development, and Opportunities of a Database]. Notices of the American Mathematical Society, 70(10), 1640–1644. https://doi.org/10.1090/noti2798

= Applications to Adversarial ML =

The stability of topological features under small perturbations has been applied to make graph neural networks robust against adversaries. Arafat et al.{{cite arXiv | last1=Arafat | first1=Naheed Anjum | last2=Basu | first2=Debabrota | last3=Gel | first3=Yulia | last4=Chen | first4=Yuzhou | date=2025 |title=When Witnesses Defend: A Witness Graph Topological Layer for Adversarial Graph Learning |eprint=2409.14161 |class=cs.LG}} proposed a robustness framework that systematically integrates both local and global topological graph feature representations, the impact of which is controlled by a robust regularized topological loss. Given the attacker's budget, they derived stability guarantees on the node representations, establishing a connection between [https://en.wikipedia.org/wiki/Topological_data_analysis#Stability_2 topological stability] and adversarial ML.

References

Further reading

= Brief Introductions =

{{cite web |first=Michael |last=Lesnick |title=Studying the Shape of Data Using Topology |date=2013 |publisher=Institute for Advanced Study |url=https://www.ias.edu/about/publications/ias-letter/articles/2013-summer/lesnick-topological-data-analysis }}
[http://appliedtopology.org/source-material-for-topological-data-analysis/ Source Material for Topological Data Analysis] by Mikael Vejdemo-Johansson

= Monograph =

{{cite book |first=Steve Y. |last=Oudot |title=Persistence Theory: From Quiver Representations to Data Analysis |publisher=American Mathematical Society |date=2015 |isbn=978-1-4704-2545-6 |url=https://books.google.com/books?id=if8dCwAAQBAJ}}

= Textbooks on Topology =

{{cite book |first=Allen |last=Hatcher |title=Algebraic Topology |publisher=Cambridge University Press |date=2002 |isbn=0-521-79540-0 |url=https://pi.math.cornell.edu/~hatcher/AT/ATpage.html}} Available for download.
{{Cite book |title = Computational Topology: An Introduction |url = https://books.google.com/books?id=MDXa6gFRZuIC |publisher = American Mathematical Society |date = 2010 |isbn = 9780821849255 |first1 = Herbert |last1 = Edelsbrunner |first2 = John |last2 = Harer}}
[http://www.math.upenn.edu/~ghrist/notes.html Elementary Applied Topology], by Robert Ghrist

External links

[https://donut.topology.rocks/ Database of Original & Non-Theoretical Uses of Topology (DONUT)]

= Video Lectures =

[https://www.youtube.com/watch?v=2PSqWBIrn90 Introduction to Persistent Homology] and [https://www.youtube.com/watch?v=fUvl-B2lx5Q Topology for Data Analysis], by Matthew Wright
[https://www.youtube.com/watch?v=iOxLgbnl1u4 The Shape of Data], by Gunnar Carlsson

= Other Resources of TDA =

[http://appliedtopology.org/source-material-for-topological-data-analysis/ Applied Topology], by Stanford
[https://www.ima.umn.edu/topology/ Applied algebraic topology research network] {{Webarchive|url=https://web.archive.org/web/20160131140959/http://www.ima.umn.edu/topology/ |date=2016-01-31 }}, by the Institute for Mathematics and its Applications

Category:Computational topology

Category:Data analysis

Category:Homology theory

Category:Applied mathematics

Category:Articles with example R code
