persistence barcode

{{Short description|Technique in topological data analysis}}

In topological data analysis, a persistence barcode, sometimes shortened to barcode, is an algebraic invariant associated with a filtered chain complex or a persistence module that characterizes the stability of topological features throughout a growing family of spaces.{{Cite journal |last=Ghrist |first=Robert |date=2007-10-26 |title=Barcodes: The persistent topology of data |url=http://www.ams.org/journal-getitem?pii=S0273-0979-07-01191-3 |journal=Bulletin of the American Mathematical Society |language=en |volume=45 |issue=1 |pages=61–76 |doi=10.1090/S0273-0979-07-01191-3 |issn=0273-0979|doi-access=free }} Formally, a persistence barcode consists of a multiset of intervals in the extended real line, where the length of each interval corresponds to the lifetime of a topological feature in a filtration, usually built on a point cloud, a graph, a function, or, more generally, a simplicial complex or a chain complex. Generally, longer intervals in a barcode correspond to more robust features, whereas shorter intervals are more likely to be noise in the data. A persistence barcode is a complete invariant that captures all the topological information in a filtration.{{Cite journal|title = Framed Morse complex and its invariants |journal = Advances in Soviet Mathematics | date = 1994|pages = 93–115|volume = 21|first = Sergey|last = Barannikov |series = ADVSOV |doi=10.1090/advsov/021/03|isbn = 9780821802373 |s2cid = 125829976 |url = https://hal.archives-ouvertes.fr/hal-01745109/file/GeneralizedMorse.pdf }} In algebraic topology, the persistence barcodes were first introduced by Sergey Barannikov in 1994 as the "canonical forms" invariants consisting of a multiset of line segments with ends on two parallel lines, and later, in geometry processing, by Gunnar Carlsson et al. in 2004.{{Cite book |last1=Carlsson |first1=Gunnar |last2=Zomorodian |first2=Afra |last3=Collins |first3=Anne |last4=Guibas |first4=Leonidas |title=Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing |chapter=Persistence barcodes for shapes |date=2004-07-08 |chapter-url=https://dl.acm.org/doi/10.1145/1057432.1057449 |language=en |location=Nice France |publisher=ACM |pages=124–135 |doi=10.1145/1057432.1057449 |isbn=978-3-905673-13-5|s2cid=456712 }}

Definition

Let \mathbb F be a fixed field. Consider a real-valued function on a chain complex f:K \rightarrow \mathbb{R} compatible with the differential, so that f(\sigma_i) \leq f(\tau) whenever \partial\tau=\sum_i\sigma_i in K. Then for every a \in \mathbb{R} the sublevel set K_a=f^{-1}((-\infty, a]) is a subcomplex of K, and the values of f on the generators in K define a filtration (which is in practice always finite):

: \emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n = K .

Then, the filtered complexes classification theorem states that for any filtered chain complex over \mathbb F, there exists a linear transformation that preserves the filtration and brings the filtered complex into so called canonical form, a canonically defined direct sum of filtered complexes of two types: two-dimensional complexes with trivial homology d(e_{a_j})=e_{a_i} and one-dimensional complexes with trivial differential d(e_{a'_i})=0.{{Cite journal|title = Framed Morse complex and its invariants |journal = Advances in Soviet Mathematics | date = 1994|pages = 93–115|volume = 21|first = Sergey|last = Barannikov |series = ADVSOV |doi=10.1090/advsov/021/03|isbn = 9780821802373 |s2cid = 125829976 |url = https://hal.archives-ouvertes.fr/hal-01745109/file/GeneralizedMorse.pdf }} The multiset \mathcal B_f of the intervals [a_i, a_j) or [a_i', \infty) describing the canonical form, is called the barcode, and it is the complete invariant of the filtered chain complex.

The concept of a persistence module is intimately linked to the notion of a filtered chain complex. A persistence module M indexed over \mathbb R consists of a family of \mathbb F-vector spaces \{ M_t \}_{t \in \mathbb R} and linear maps \varphi_{s,t} : M_s \to M_t for each s \leq t such that \varphi_{s,t} \circ \varphi_{r,s} = \varphi_{r,t} for all r \leq s \leq t.{{Cite journal |last1=Zomorodian |first1=Afra |last2=Carlsson |first2=Gunnar |date=2005 |title=Computing Persistent Homology |journal=Discrete & Computational Geometry |language=en |volume=33 |issue=2 |pages=249–274 |doi=10.1007/s00454-004-1146-y |issn=0179-5376|doi-access=free }} This construction is not specific to \mathbb R; indeed, it works identically with any totally-ordered set.

File:An example of the 0-dimensional persistence barcode of a filtered complex.png

A persistence module M is said to be of finite type if it contains a finite number of unique finite-dimensional vector spaces. The latter condition is sometimes referred to as pointwise finite-dimensional.{{Cite journal |last=Crawley-Boevey |first=William |date=2015 |title=Decomposition of pointwise finite-dimensional persistence modules |url=https://www.worldscientific.com/doi/abs/10.1142/S0219498815500668 |journal=Journal of Algebra and Its Applications |language=en |volume=14 |issue=5 |pages=1550066 |doi=10.1142/S0219498815500668 |arxiv=1210.0819 |s2cid=119635797 |issn=0219-4988}}

Let I be an interval in \mathbb R. Define a persistence module Q(I) via Q(I_s)=

\begin{cases}

0, & \text{if } s\notin I;\\

\mathbb F, & \text{otherwise}

\end{cases}, where the linear maps are the identity map inside the interval. The module Q(I) is sometimes referred to as an interval module.{{Cite book |last1=Chazal |first1=Fréderic |url=https://www.worldcat.org/oclc/960458101 |title=The structure and stability of persistence modules |last2=de Silva |first2=Vin |last3=Glisse |first3=Marc |last4=Oudot |first4=Steve |date=2016 |isbn=978-3-319-42545-0 |location=Switzerland |oclc=960458101}}

Then for any \mathbb R-indexed persistence module M of finite type, there exists a multiset \mathcal B_M of intervals such that M \cong \bigoplus_{I \in \mathcal B_M}Q(I), where the direct sum of persistence modules is carried out index-wise. The multiset \mathcal B_M is called the barcode of M, and it is unique up to a reordering of the intervals.

This result was extended to the case of pointwise finite-dimensional persistence modules indexed over an arbitrary totally-ordered set by William Crawley-Boevey and Magnus Botnan in 2020,Botnan, Magnus, and William Crawley-Boevey. "[https://www.ams.org/journals/proc/2020-148-11/S0002-9939-2020-14790-9/S0002-9939-2020-14790-9.pdf Decomposition of persistence modules]." Proceedings of the American Mathematical Society 148, no. 11 (2020): 4581-4596. building upon known results from the structure theorem for finitely generated modules over a PID, as well as the work of Cary Webb for the case of the integers.Webb, Cary. "[https://www.ams.org/journals/proc/1985-094-04/S0002-9939-1985-0792261-6/S0002-9939-1985-0792261-6.pdf Decomposition of graded modules]." Proceedings of the American Mathematical Society 94, no. 4 (1985): 565-571.

References