De Bruijn graph

{{short description|Directed graph representing overlaps between sequences of symbols}}

In graph theory, an {{mvar|n}}-dimensional De Bruijn graph of {{mvar|m}} symbols is a directed graph representing overlaps between sequences of symbols. It has {{mvar|m{{sup|n}}}} vertices, consisting of all possible {{nowrap|length-{{mvar|n}}}} sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a set of {{mvar|m}} symbols {{math|1=S = {s{{sub|1}}, …, s{{sub|m}}},}} the set of vertices is:

:V=S^n=\{(s_1,\dots,s_1,s_1),(s_1,\dots,s_1,s_2),\dots,(s_1,\dots,s_1,s_m),(s_1,\dots,s_2,s_1),\dots,(s_m,\dots,s_m,s_m)\}.

If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs (that is, directed edges) is

:E=\{((t_1,t_2,\dots,t_n),(t_2,\dots,t_n,s_j)) : t_i \in S, 1\le i\le n, 1\le j\le m \}.

Although De Bruijn graphs are named after Nicolaas Govert de Bruijn, they were invented independently by both de Bruijn and I. J. Good. Much earlier, Camille Flye Sainte-Marie implicitly used their properties.

Properties

  • If {{math|1=n = 1}}, then the condition for any two vertices forming an edge holds vacuously, and hence all the vertices are connected, forming a total of {{math|m{{sup|2}}}} edges.
  • Each vertex has exactly {{mvar|m}} incoming and {{mvar|m}} outgoing edges.
  • Each {{mvar|n}}-dimensional De Bruijn graph is the line digraph of the {{nowrap|{{math|(n – 1)}}-dimensional}} De Bruijn graph with the same set of symbols.
  • Each De Bruijn graph is Eulerian and Hamiltonian. The Euler cycles and Hamiltonian cycles of these graphs (equivalent to each other via the line graph construction) are De Bruijn sequences.

The line graph construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the {{mvar|n}}-dimensional De Bruijn graph corresponds to an edge of the {{nowrap|{{math|(n – 1)}}-dimensional}} De Bruijn graph, and each edge in the {{mvar|n}}-dimensional De Bruijn graph corresponds to a two-edge path in the {{nowrap|{{math|(n – 1)}}-dimensional}} De Bruijn graph.

{{Wide image|DeBruijn-as-line-digraph.svg|600px|alt=Image showing multiple De Bruijn graphs, each the line graph of the previous one|align=center|border=no}}

Dynamical systems

Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of dynamical systems, such as the Lorenz attractor:

{{multiple image|total_width=600px|image1=DeBruijn-3-2.svg|caption1=Binary De Bruijn graph|image2=Lorenz attractor2.svg|caption2=Lorenz attractor|align=center}}

This analogy can be made rigorous: the {{mvar|n}}-dimensional {{mvar|m}}-symbol De Bruijn graph is a model of the Bernoulli map

:x\mapsto mx\ \bmod\ 1.

The Bernoulli map (also called the {{math|2x mod 1}} map for {{math|1=m = 2}}) is an ergodic dynamical system, which can be understood to be a single shift of a p-adic. The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real {{mvar|x}} in the interval {{math|[0,1)}} to the vertex corresponding to the first {{mvar|n}} digits in the base-{{mvar|m}} representation of {{mvar|x}}. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided subshift of finite type.

File:de_Bruijn_binary_graph.svgs and a {{math|B(2,4)}} sequence. In {{math|B(2,3)}}, each vertex is visited once, whereas in {{math|B(2,4)}}, each edge is traversed once.]]

Embeddings resembling this one can be used to show that the binary De Bruijn graphs have queue number 2 and that they have book thickness at most 5.

Uses

See also

References

{{reflist|refs=

{{cite journal

|last=de Bruijn |first=N. G. | authorlink = Nicolaas Govert de Bruijn

| title = A combinatorial problem

| journal = Indagationes Mathematicae

| volume = 49

| pages = 758–764

| year = 1946

| mr = 0018142}}

{{cite journal

| last = Flye Sainte-Marie | first = C.

| title = Question 48

| journal = L'Intermédiaire des Mathématiciens

| volume = 1

| year = 1894

| pages = 107–110}}

{{cite journal

| last = Good | first = I. J.

| authorlink = I. J. Good

| title = Normal recurring decimals

| journal = Journal of the London Mathematical Society

| volume = 21

| issue = 3

| year = 1946

| pages = 167–169

| doi = 10.1112/jlms/s1-21.3.167}}

{{cite journal

| last1 = Heath | first1 = Lenwood S.

| last2 = Rosenberg | first2 = Arnold L. | author2-link = Arnold L. Rosenberg

| doi = 10.1137/0221055

| issue = 5

| journal = SIAM Journal on Computing

| mr = 1181408

| pages = 927–958

| title = Laying out graphs using queues

| volume = 21

| year = 1992}}

{{cite journal

| last = Leroux | first = Philippe

| arxiv = quant-ph/0209100

| doi = 10.1155/IJMMS.2005.3979

| issue = 24

| journal = International Journal of Mathematics and Mathematical Sciences

| mr = 2204471

| pages = 3979–3996

| title = Coassociative grammar, periodic orbits, and quantum random walk over \Z

| year = 2005| volume = 2005

| doi-access = free

}}

{{cite journal

|last1=Zhang | first1 = Fu Ji

|last2=Lin | first2 = Guo Ning

| title = On the de Bruijn–Good graphs

| journal = Acta Mathematica Sinica

| volume = 30

| year = 1987

| issue = 2

| pages = 195–205}}

{{cite journal

|last1=Pevzner | first1 = Pavel A. | author1-link= Pavel A. Pevzner

|last2=Tang|first2= Haixu

|last3=Waterman|first3= Michael S. | author3-link = Michael Waterman

| title = An Eulerian path approach to DNA fragment assembly

| journal = Proceedings of the National Academy of Sciences

| volume = 98

| issue = 17

| year = 2001

| pages = 9748–9753

| pmid = 11504945

| pmc = 55524

| doi = 10.1073/pnas.171285098

| bibcode = 2001PNAS...98.9748P| doi-access = free }}

{{cite journal

|last1=Pevzner | first1 = Pavel A. | author1-link= Pavel A. Pevzner

|last2=Tang|first2= Haixu

| title = Fragment Assembly with Double-Barreled Data

| journal = Bioinformatics

| volume = 17

| issue=Suppl 1

| year = 2001

| pages = S225–S233

| doi = 10.1093/bioinformatics/17.suppl_1.S225| pmid=11473013

| doi-access = free

}}

{{cite journal

|last1=Zerbino|first1= Daniel R. |last2=Birney|first2= Ewan | title = Velvet: algorithms for de novo short read assembly using de Bruijn graphs

| journal = Genome Research

| volume = 18

| issue = 5

| year = 2008

| pages = 821–829

| doi = 10.1101/gr.074492.107

| pmid = 18349386

| pmc = 2336801}}

{{cite journal

| last1 = Chikhi | first1 = R.

| last2 = Limasset | first2 = A.

| last3 = Jackman | first3 = S.

| last4 = Simpson | first4 = J.

| last5= Medvedev | first5 = P.

| title = On the representation of de Bruijn graphs

| journal = Journal of Computational Biology

| year = 2014

| volume = 22

| issue = 5

| pages = 336–52

| doi = 10.1089/cmb.2014.0160

| pmid = 25629448

| arxiv = 1401.5383

| s2cid = 9502231

}}

{{cite journal

| last = Obrenić | first = Bojana

| doi = 10.1137/0406049

| issue = 4

| journal = SIAM Journal on Discrete Mathematics

| mr = 1241401

| pages = 642–654

| title = Embedding de Bruijn and shuffle-exchange graphs in five pages

| volume = 6

| year = 1993}}

{{cite journal

|last1= Iqbal | first1 = Zamin

|last2= Caccamo | first2 = Mario

|last3= Turner | first3 = Isaac

|last4= Flicek | first4 = Paul

|last5=McVean | first5 = Gil

| title = De novo assembly and genotyping of variants using colored de Bruijn graphs

| journal = Nature Genetics

| volume = 44

| issue = 2

| year = 2012

| pages = 226–32

| doi = 10.1038/ng.1028

| pmc = 3272472

| pmid = 22231483}}

}}