Graph canonization
In graph theory, a branch of mathematics, graph canonization is the problem of finding a canonical form of a given graph G. A canonical form is a labeled graph Canon(G) that is isomorphic to G, such that every graph that is isomorphic to G has the same canonical form as G. Thus, from a solution to the graph canonization problem, one could also solve the problem of graph isomorphism: to test whether two graphs G and H are isomorphic, compute their canonical forms Canon(G) and Canon(H), and test whether these two canonical forms are identical.
The canonical form of a graph is an example of a complete graph invariant: every two isomorphic graphs have the same canonical form, and every two non-isomorphic graphs have different canonical forms.{{citation
| last1 = Arvind | first1 = Vikraman
| last2 = Das | first2 = Bireswar
| last3 = Köbler | first3 = Johannes
| contribution = A logspace algorithm for partial 2-tree canonization
| doi = 10.1007/978-3-540-79709-8_8
| mr = 2475148
| pages = 40–51
| publisher = Springer, Berlin
| series = Lecture Notes in Comput. Sci.
| title = Computer Science – Theory and Applications: Third International Computer Science Symposium in Russia, CSR 2008 Moscow, Russia, June 7-12, 2008, Proceedings
| volume = 5010
| year = 2008| isbn = 978-3-540-79708-1
}}.{{citation
| last1 = Arvind | first1 = V.
| last2 = Das | first2 = Bireswar
| last3 = Köbler | first3 = Johannes
| contribution = The space complexity of k-tree isomorphism
| doi = 10.1007/978-3-540-77120-3_71
| mr = 2472661
| pages = 822–833
| publisher = Springer, Berlin
| series = Lecture Notes in Comput. Sci.
| title = Algorithms and Computation: 18th International Symposium, ISAAC 2007, Sendai, Japan, December 17-19, 2007, Proceedings
| volume = 4835
| year = 2007| doi-access = free
| isbn = 978-3-540-77118-0
}}. Conversely, every complete invariant of graphs may be used to construct a canonical form.{{citation
| last = Gurevich | first = Yuri
| issue = 63
| journal = Bulletin of the European Association for Theoretical Computer Science
| mr = 1621595
| pages = 115–119
| title = From invariants to canonization
| url = http://research.microsoft.com/en-us/um/people/gurevich/opera/131.pdf
| year = 1997}}. The vertex set of an n-vertex graph may be identified with the integers from 1 to n, and using such an identification a canonical form of a graph may also be described as a permutation of its vertices. Canonical forms of a graph are also called canonical labelings,{{citation
| last1 = Babai | first1 = László | author1-link = László Babai
| last2 = Luks | first2 = Eugene | author2-link = Eugene Luks
| contribution = Canonical labeling of graphs
| doi = 10.1145/800061.808746
| pages = 171–183
| title = Proc. 15th ACM Symposium on Theory of Computing
| year = 1983| doi-access = free
| isbn = 0-89791-099-0 }}. and graph canonization is also sometimes known as graph canonicalization.
Computational complexity
{{Short description|Task in computational graph theory}}
{{unsolved|computer science|Is graph canonization polynomial-time equivalent to the graph isomorphism problem?}}
The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic. Clearly, the graph canonization problem is at least as computationally hard as the graph isomorphism problem. In fact, graph isomorphism is even AC0-reducible to graph canonization. However, it is still an open question whether the two problems are polynomial-time equivalent.
In 2019, László Babai announced a quasi-polynomial-time algorithm for graph canonization, that is, one with running time for some fixed .{{citation|last=Babai|first= László|url=https://par.nsf.gov/servlets/purl/10179675|title=Canonical Form for Graphs in Quasipolynomial Time|date=June 23, 2019}} While the existence of (deterministic) polynomial-time algorithms for graph isomorphism is still an open problem in computational complexity theory, in 1977 László Babai reported that with probability at least 1 − exp(−O(n)), a simple vertex-classification algorithm produces a canonical labeling of a graph chosen uniformly at random from the set of all n-vertex graphs after only two refinement steps. Small modifications and an added depth-first search step produce canonical labelings of such uniformly-chosen random graphs in linear expected time. This result sheds some light on the question of why many reported graph isomorphism algorithms behave well in practice.{{citation|last1 = Babai | first1 = László | authorlink = László Babai|title=On the Isomorphism Problem|publisher=unpublished manuscript|year =1977}}.{{citation
| last1 = Babai | first1 = László | author1-link = László Babai
| last2 = Kucera | first2 = L.
| contribution = Canonical labeling of graphs in linear average time
| doi = 10.1109/SFCS.1979.8
| pages = 39–46
| title = Proc. 20th Annual IEEE Symposium on Foundations of Computer Science
| year = 1979| s2cid = 14697933 }}. This was an important breakthrough in probabilistic complexity theory, which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.
A commonly known canonical form is the lexicographically smallest graph within the isomorphism class, which is the graph of the class with lexicographically smallest adjacency matrix considered as a linear string. However, the computation of the lexicographically smallest graph is NP-hard. {{ citation
| last1 = Babai | first1 = László | author1-link = László Babai
| last2 = Luks | first2 = E.
| contribution = Canonical labeling of graphs
| title = Proc. 15th ACM Symposium on Theory of Computing
| year = 1983
| pages = 171–183}}
For trees, a concise polynomial-time canonization algorithm requiring O(n) space is presented by {{harvtxt|Read|1972}}.{{citation
| last = Read | first = Ronald C.
| contribution = The coding of various kinds of unlabeled trees
| mr = 0344150
| pages = 153–182
| publisher = Academic Press, New York
| title = Graph Theory and Computing
| year = 1972}}. Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x, remove the leading 0 and trailing 1 from x
Applications
Graph canonization is the essence of many graph isomorphism algorithms. One of the leading tools is Nauty.{{citation
| last1 = McKay | first1 = Brendan D.
| last2 = Piperno | first2 = Adolfo
| contribution = Journal of Symbolic Computation
| issn = 0747-7171
| pages = 94–112
| title = Practical graph isomorphism, II
| url = http://pallini.di.uniroma1.it
| year = 2014| volume = 60
| doi = 10.1016/j.jsc.2013.09.003
| s2cid = 17930927
| doi-access = free
| arxiv = 1301.1493
}}.
A common application of graph canonization is in graphical data mining, in particular in chemical database applications.{{citation
| last1 = Cook | first1 = Diane J. | author1-link = Diane J. Cook
| last2 = Holder | first2 = Lawrence B.
| contribution = 6.2.1. Canonical Labeling
| isbn = 978-0-470-07303-2
| pages = 120–122
| title = Mining Graph Data
| url = https://books.google.com/books?id=bHGy0_H0g8QC&pg=PA120
| year = 2007| publisher = John Wiley & Sons }}.
A number of identifiers for chemical substances, such as SMILES and InChI, use canonization steps in their computation, which is essentially the canonization of the graph which represents the molecule.{{cite journal
|last1=Weininger
|first1=David
|last2=Weininger
|first2=Arthur
|last3=Weininger
|first3=Joseph L.
|title=SMILES. 2. Algorithm for generation of unique SMILES notation
|journal=Journal of Chemical Information and Modeling
|volume=29
|issue=2
|pages=97–101
|date=May 1989
|doi=10.1021/ci00062a008
|s2cid=6621315
|last1=Kelley
|first1=Brian
|title=Graph Canonicalization
|journal=Dr. Dobb's Journal
|url=http://www.drdobbs.com/graph-canonicalization/184405341
|date=May 2003
|last1=Scheider
|first1=Nadine
|last2=Sayle
|first2=Roger A.
|last3=Landrum
|first3=Gregory A.
|title=Get Your Atoms in Order — An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm
|journal=Journal of Chemical Information and Modeling
|volume=55
|issue=10
|pages=2111–2120
|date=October 2015
|pmid=26441310
|doi=10.1021/acs.jcim.5b00543
}} These identifiers are designed to provide a standard (and sometimes human-readable) way to encode molecular information and to facilitate the search for such information in databases and on the web.
References
{{reflist}}