configuration model
File:Degree sequence and different realizations in the configuration model.jpg
In network science, the Configuration Model is a family of random graph models designed to generate networks from a given degree sequence. Unlike simpler models such as the Erdős–Rényi model, Configuration Models preserve the degree of each vertex as a pre-defined property. This flexibility allows the modeler to construct networks with arbitrary degree distributions, making it widely used as a reference model for real-life networks, particularly in social, biological, and technological domains. The concept of "Configuration Model" was first introduced by Béla Bollobás, who laid the foundation for its application in graph theory and network science.Bollobás, B. (1980). A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4), 311-316.
{{Network science}}
Rationale for the model
The key advantage of the Configuration Model lies in its ability to decouple the degree sequence from specific edge generation processes.{{Cite book|url=http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199206650.001.0001/acprof-9780199206650|title=Networks: An Introduction – Oxford Scholarship|last=Newman|first=Mark|language=en|doi=10.1093/acprof:oso/9780199206650.001.0001|isbn=9780191594175|publisher=Oxford University Press|date=2010-03-25}} This makes it suitable for analyzing networks with heterogeneous degree distributions, such as scale-free networks, which exhibit heavy-tailed degree distributions. By preserving these structural properties, Configuration Models provide a null hypothesis for understanding the role of degree distributions in shaping network properties.
Configuration Models can be specified for different types of graphs:
- Simple graphs: Graphs without self-loops or multi-edges.
- Multi-edge graphs: Graphs allowing multiple edges between the same pair of nodes.
- Loopy graphs: Graphs that include self-loops (edges connecting a node to itself).
- Directed graphs: Models with specified in-degrees and out-degrees for each node.
- Undirected graphs: Models that consider the total degree of each node.
Configuration Models
The family of Configuration Models can be categorized into two main groups:
- Microcanonical Configuration Models: Fix the degree sequence exactly.
- Canonical Configuration Models: Fix the degree sequence in expectation.
= Micro-Canonical Configuration Model =
The Micro-Canonical Configuration Model is the most common variation of the configuration model. It exactly preserves the degree sequence of a given graph by assigning stubs (half-edges) to nodes based on their degrees and then randomly pairing the stubs to form edges.
The preservation of the degree sequence is exact in the sense that all realizations of the model result in graphs with the same predefined degree distribution. This variation is referred to as micro-canonical because it defines a uniform probability space where all possible graphs consistent with the given degree sequence are equally likely to be sampled. This approach ensures that the degree sequence is strictly maintained.
The model can be formalized as stub-labelled model, where stubs are explicitly labelled and paired; alternatively, it can be formalised as a vertex-labelled models, where edges between vertices are indistinguishable.Fosdick, Bailey K., et al. "Configuring random graph models with fixed degree sequences." Siam Review 60.2 (2018): 315-355.
= Canonical Configuration Models =
Canonical Configuration Models relax the exact degree sequence constraint and preserve it only in expectation. These models define probability distributions over the edges of the network.
Canonical configuration models are often referred to as soft configuration models.
== Chung-Lu Configuration Model ==
The Chung-Lu Configuration Model{{Cite journal|last1=Chung|first1=Fan|last2=Lu|first2=Linyuan|date=2002-12-10|title=The average distances in random graphs with given expected degrees|journal=Proceedings of the National Academy of Sciences|language=en|volume=99|issue=25|pages=15879–15882|doi=10.1073/pnas.252631999|issn=0027-8424|pmid=12466502|doi-access=free|pmc=138532|bibcode=2002PNAS...9915879C}} is a canonical configuration model variant for simple graphs (i.e., without multi-edges). It assumes that edges between nodes are independent from each other, differently from the micro-canonical model. The probability of an edge between nodes and is proportional to the product of their degrees and, in the case of a loopy undirected graph, is given by:
where and are the degrees of nodes and , and is the total number of edges in the graph. In this formulation, the expected degree sequence matches the input degrees, but the actual degree sequence in any realization may vary slightly due to the probabilistic nature of edge formation.
=== Application: Modularity Calculation ===
The Chung-Lu configuration model, provides the benchmark in the calculation of network modularity. Modularity measures how well a network is divided into modules or communities, comparing the observed structure against a null model: the configuration model. The modularity is computed as:
where:
- is the adjacency matrix of the observed network, with if there is an edge between and , and otherwise.
- represents the expected probability of an edge between nodes and under the configuration model.
- is the Kronecker delta, equal to 1 if nodes and are in the same module and 0 otherwise.
This equation compares the observed network to the configuration model, where edges are distributed based solely on the degree sequence. Modularity scores provide insights into the quality of community structures in the network, with higher scores indicating stronger modularity.
For further details, refer to the page on modularity.{{Cite journal|last=Newman|first=M. E. J.|date=2004|title=Finding and evaluating community structure in networks|journal=Physical Review E|volume=69|issue=2|pages=026113|doi=10.1103/physreve.69.026113|pmid=14995526|arxiv=cond-mat/0308217|bibcode=2004PhRvE..69b6113N|s2cid=197314}}
== Norros-Reittu Configuration Model ==
The Norros-Reittu Configuration ModelNorros, I., & Reittu, H. (2006). On a conditionally Poissonian graph process. Advances in Applied Probability, 38(1), 59-75. extends the Chung-Lu configuration model by allowing for multi-edges between nodes. In this model, the number of edges between nodes and follows a Poisson distribution, given by:
where is the expected number of edges between nodes and , defined as:
Here:
and are the degrees of nodes and , respectively.
is the total number of edges in the graph.
As for Chung-Lu configuration model, nodes pairs are assumed to be independent from each other.
=== Connection to Degree-Corrected Stochastic Block Model ===
The Norros-Reittu Configuration Model is particularly important as it provides the degree correction for the Degree-Corrected Stochastic Block Model (DC-SBM).Karrer, Brian, and Mark EJ Newman. "Stochastic blockmodels and community structure in networks." Physical Review E—Statistical, Nonlinear, and Soft Matter Physics 83.1 (2011): 016107. By incorporating the degree sequence into the edge probability, this model allows the DC-SBM to distinguish between structural patterns in communities and variations driven by node degrees. This degree correction ensures that the observed degree heterogeneity in networks does not bias the identification of community structures.
== Casiraghi-Nanumyan Hypergeometric Configuration Model ==
The Casiraghi-Nanumyan Hypergeometric Configuration ModelCasiraghi, Giona, and Vahan Nanumyan. "Configuration models as an urn problem." Scientific reports 11.1 (2021): 13416. extends canonical configuration models by accounting for dependencies between edges. Unlike models assuming independent edge generation, this model uses a multivariate hypergeometric distribution to represent the probability of an entire graph configuration. This approach preserves the degree sequence while modeling competition for resources.
The probability of observing a specific graph with adjacency matrix is given by:
P(A) = \frac{\prod_{i where is the number of edges between nodes and , represents the propensity for interaction between nodes and based on their degrees, is the total number of edges in the graph and . This model captures resource competition by enforcing that the sum of interactions across all node pairs is fixed. Forming an edge between two nodes reduces the availability of resources for other connections, reflecting realistic constraints in systems like social or ecological networks. The model naturally generates multi-edges for any node pair and where .
== Configuration Model from Maximum Entropy Principles ==
Starting from the maximum entropy principle, one can generate another canonical configuration model, referred to as the soft configuration model. This model belongs to the family of exponential random graph models (ERGMs) and generates random graphs by constraining the expected degree sequence while maximizing the entropy of the graph model.
The probability of a graph is given by:
where
are Lagrange multipliers ensuring the expected degree of node matches the target degree,
is the degree of node in the graph .
Generating from the Micro-Canonical Configuration Model
The following algorithm describes the generation of the model:
- Take a degree sequence, i. e. assign a degree to each vertex. The degrees of the vertices are represented as half-links or stubs. The sum of stubs must be even in order to be able to construct a graph (). The degree sequence can be drawn from a theoretical distribution or it can represent a real network (determined from the adjacency matrix of the network).
- Choose two stubs uniformly at random and connect them to form an edge. Choose another pair from the remaining stubs and connect them. Continue until you run out of stubs. The result is a network with the pre-defined degree sequence. The realization of the network changes with the order in which the stubs are chosen, they might include cycles (b), self-loops (c) or multi-links (d) (Figure 1).
= Self-loops, multi-edges and implications =
The algorithm described above matches any stubs with the same probability. The uniform distribution of the matching is an essential property in calculating other features of the generated networks. The network generation process does not exclude the development of a self-loop or a multi-link. If we designed the process where self-loops and multi-edges are not allowed, the matching of the stubs would not follow a uniform distribution.
The expected total number of multi-links in a configuration model network would be:
where is the n-th moment of the degree distribution. Therefore, the average number of self-loops and multi-links is a constant for some large networks, and the density of self-loops and multi-links, meaning the number per node, goes to zero as as long as is constant and finite. For some power-law degree distributions where the second moment diverges, density of multi-links may not vanish or may do so more slowly than .
A further consequence of self-loops and multi-edges is that not all possible networks are generated with the same probability. In general, all possible realizations can be generated by permuting the stubs of all vertices in every possible way. The number of permutation of the stubs of node is , so the number of realizations of a degree sequence is . This would mean that each realization occurs with the same probability. However, self-loops and multi-edges can change the number of realizations, since permuting self-edges can result an unchanged realization. Given that the number of self-loops and multi-links vanishes as , the variation in probabilities of different realization will be small but present.
Properties
= Edge probability =
A stub of node can be connected to other stubs (there are
stubs altogether, and we have to exclude the one we are currently observing). The vertex has
stubs to which node can be connected with the same probability (because of the uniform distribution). The probability of a stub of node being connected to one of these
stubs is
. Since node has
stubs, the probability of being connected to is
(
for sufficiently large
). Note that this formula can only be viewed as a probability if , and more precisely it describes the expected number of edges between nodes and . Note that this formula does not apply to the case of self-edges.
Given a configuration model with a degree distribution
, the probability of a randomly chosen node
having degree
is
. But if we took one of the vertices to which we can arrive following one of edges of i, the probability of having degree k is
. (The probability of reaching a node with degree k is
, and there are
such nodes.) This fraction depends on
as opposed to the degree of the typical node with
. Thus, a neighbor of a typical node is expected to have higher degree than the typical node itself. This feature of the configuration model describes well the phenomenon of "my friends having more friends than I do".
= Clustering coefficient =
(the average probability that the neighbors of a node are connected) is computed approximately as follows:
:
where
denotes the probability that a random edge reaches a degree- vertex, and the factors of the form "" rather than "" appear because one stub has been accounted for by the fact that these are neighbors of a common vertex. Evaluating the above results in
:
Using and , with denoting the degree distribution, denoting the average degree, and denoting the number of vertices, the above becomes
:
with denoting the second moment of the degree distribution. Assuming that and are constant, the above behaves as
:
where the constant depends on . Thus, the clustering coefficient becomes small in the limit.
= Giant component =
In the configuration model, a giant component (GC) exists if
:
where and are the first and second moments of the degree distribution. That means that, the critical threshold solely depends on quantities which are uniquely determined by the degree distribution .
Configuration model generates locally tree-like networks, meaning that any local neighborhood in such a network takes the form of a tree. More precisely, if you start at any node in the network and form the set of all nodes at distance or less from that starting node, the set will, with probability tending to 1 as n → ∞, take the form of a tree.{{Cite book|last=Newman|first=Mark|url=http://www.oxfordscholarship.com/view/10.1093/oso/9780198805090.001.0001/oso-9780198805090|title=Networks|date=2018-10-18|publisher=Oxford University Press|isbn=978-0-19-880509-0|volume=1|language=en|doi=10.1093/oso/9780198805090.001.0001}} In tree-like structures, the number of second neighbors averaged over the whole network, , is:
Then, in general, the average number at distance can be written as:
:
Which implies that if the ratio of is larger than one, then the network can have a giant component. This is famous as the Molloy-Reed criterion.{{Cite journal|last1=Molloy|first1=Michael|last2=Reed|first2=Bruce|date=1995-03-01|title=A critical point for random graphs with a given degree sequence|journal=Random Structures & Algorithms|language=en|volume=6|issue=2–3|pages=161–180|citeseerx=10.1.1.24.6195|doi=10.1002/rsa.3240060204|issn=1098-2418}} The intuition behind this criterion is that if the giant component (GC) exists, then the average degree of a randomly chosen vertex in a connected component should be at least 2. Molloy-Reed criterion can also be expressed as:
which implies that, although the size of the GC may depend on and , the number of nodes of degree 0 and 2 have no contribution in the existence of the giant component.
= Diameter =
Configuration model can assume any degree distribution and shows the small-world effect, since to leading order the diameter of the configuration model is just .{{Cite journal|last1=Chung|first1=Fan|last2=Lu|first2=Linyuan|date=2002-12-10|title=The average distances in random graphs with given expected degrees|journal=Proceedings of the National Academy of Sciences|language=en|volume=99|issue=25|pages=15879–15882|doi=10.1073/pnas.252631999|issn=0027-8424|pmid=12466502|doi-access=free|pmc=138532|bibcode=2002PNAS...9915879C}}
= Components of finite size =
As total number of vertices
tends to infinity, the probability to find two giant components is vanishing. This means that in the sparse regime, the model consist of one giant component (if any) and multiple connected components of finite size. The sizes of the connected components are characterized by their size distribution
- the probability that a randomly sampled vertex belongs to a connected component of size
There is a correspondence between the degree distribution and the size distribution
When total number of vertices tends to infinity, , the following relation takes place:{{Cite journal|last=Kryven|first=I|title=General expression for the component size distribution in infinite configuration networks|journal=Physical Review E|volume=95|issue=5|pages=052303|doi=10.1103/PhysRevE.95.052303|year=2017|url=http://dare.uva.nl/personal/pure/en/publications/general-expression-for-the-component-size-distribution-in-infinite-configuration-networks(fa1b270b-61a5-4f20-b496-ddf446fdfe80).html|arxiv=1703.05413|pmid=28618550|bibcode=2017PhRvE..95e2303K|hdl=11245.1/fa1b270b-61a5-4f20-b496-ddf446fdfe80|s2cid=8421307}}
:
\frac{\langle k \rangle}{n-1} u_1^{*n}(n-2),& n>1, \\
p_0 & n=1,
\end{cases}
where and denotes the -fold convolution power. Moreover, explicit asymptotes for
are known when and is close to zero. The analytical expressions for these asymptotes depend on finiteness of the moments of the degree distribution tail exponent (when features a heavy tail), and the sign of Molloy–Reed criterion. The following table summarises these relationships (the constants are provided in).
class="wikitable"
!Moments of !Tail of ! ! |
rowspan="5" |
| rowspan="2" |light tail | -1 or 1 | |
0
| |
rowspan="3" |heavy tail,
| -1 | |
0
| |
1
| |
rowspan="6" |
| rowspan="3" |heavy tail, | -1 | |
0
| |
1
| |
rowspan="3" |heavy tail,
| -1 | |
0
| |
1
| |
rowspan="2" |
|heavy tail, |1 | |
heavy tail,
|1 | |
Modelling
= Comparison to real-world networks =
Three general properties of complex networks are heterogeneous degree distribution, short average path length and high clustering.{{Cite journal|last1=Barabási|first1=Albert-László|last2=Albert|first2=Réka|date=1999-10-15|title=Emergence of Scaling in Random Networks|journal=Science|language=en|volume=286|issue=5439|pages=509–512|doi=10.1126/science.286.5439.509|issn=0036-8075|pmid=10521342|arxiv=cond-mat/9910332|bibcode=1999Sci...286..509B|s2cid=524106}}{{Cite journal|last1=Watts|first1=Duncan J.|last2=Strogatz|first2=Steven H.|date=1998|title=Collective dynamics of 'small-world' networks|journal=Nature|language=En|volume=393|issue=6684|pages=440–442|doi=10.1038/30918|pmid=9623998|issn=1476-4687|bibcode=1998Natur.393..440W|s2cid=4429113}}{{Cite web |title=Configuration Models and Complex Networks |url=https://danlarremore.com/configurationmodels/ |access-date=2024-11-28 |website=danlarremore.com}} Having the opportunity to define any arbitrary degree sequence, the first condition can be satisfied by design. Still, as shown above, the global clustering coefficient is an inverse function of the network size, so for large configuration networks, clustering tends to be small. This feature of the baseline model contradicts the known properties of empirical networks, but extensions of the model can solve this issue. (see {{Cite journal|last=Newman|first=M. E. J.|date=2009|title=Random Graphs with Clustering|journal=Physical Review Letters|volume=103|issue=5|pages=058701|doi=10.1103/physrevlett.103.058701|pmid=19792540|arxiv=0903.4009|bibcode=2009PhRvL.103e8701N|s2cid=28214709}}). All the networks generated by this model are locally tree-like provided the average of the excess degree distribution is either constant or grows slower than the square root of number of links, . In other words, this model prevents forming substructures such as loops in the large size limit. Vanishing of clustering coefficient, is a special case of this more general result. While the tree-like property makes the model not very realistic, so many calculations, such as the generating function methods, are possible for the configuration model thanks to this feature.
Directed configuration model
In the DCM (directed configuration model),{{cite journal |last1=COOPER |first1=COLIN |last2=FRIEZE |first2=ALAN |title=The Size of the Largest Strongly Connected Component of a Random Digraph with a Given Degree Sequence |journal=Combinatorics, Probability and Computing |date=May 2004 |volume=13 |issue=3 |pages=319–337 |doi=10.1017/S096354830400611X |s2cid=27511938 |url=https://www.cambridge.org/core/journals/combinatorics-probability-and-computing/article/size-of-the-largest-strongly-connected-component-of-a-random-digraph-with-a-given-degree-sequence/4E5122A8E9F95AFBF6632791AC792793 |language=en |issn=1469-2163|url-access=subscription }} each node is given a number of half-edges called tails and heads. Then tails and heads are matched uniformly at random to form directed edges. The size of the giant component,{{cite arXiv |last1=Cai |first1=Xing Shi |last2=Perarnau |first2=Guillem |title=The giant component of the directed configuration model revisited |date=10 April 2020|class=math.PR |eprint=2004.04998 }} the typical distance,{{cite journal |last1=van der Hoorn |first1=Pim |last2=Olvera-Cravioto |first2=Mariana |title=Typical distances in the directed configuration model |journal=The Annals of Applied Probability |date=June 2018 |volume=28 |issue=3 |pages=1739–1792 |doi=10.1214/17-AAP1342|arxiv=1511.04553 |s2cid=13683470 }} and the diameter{{cite arXiv |last1=Cai |first1=Xing Shi |last2=Perarnau |first2=Guillem |title=The diameter of the directed configuration model |date=10 March 2020|class=math.PR |eprint=2003.04965 }} of DCM have been studied mathematically.
There also has been extensive research on random walks on DCM.{{cite journal |last1=Bordenave |first1=Charles |last2=Caputo |first2=Pietro |last3=Salez |first3=Justin |title=Random walk on sparse random digraphs |journal=Probability Theory and Related Fields |date=1 April 2018 |volume=170 |issue=3 |pages=933–960 |doi=10.1007/s00440-017-0796-7 |arxiv=1508.06600 |s2cid=55211047 |url=https://doi.org/10.1007/s00440-017-0796-7 |language=en |issn=1432-2064}}{{cite journal |last1=Caputo |first1=Pietro |last2=Quattropani |first2=Matteo |title=Stationary distribution and cover time of sparse directed configuration models |journal=Probability Theory and Related Fields |date=1 December 2020 |volume=178 |issue=3 |pages=1011–1066 |doi=10.1007/s00440-020-00995-6 |s2cid=202565916 |language=en |issn=1432-2064|doi-access=free |hdl=11385/196435 |hdl-access=free |arxiv=1909.05752 }}{{cite arXiv |last1=Cai |first1=Xing Shi |last2=Perarnau |first2=Guillem |title=Minimum stationary values of sparse random directed graphs |date=14 October 2020 |class=math.PR |eprint=2010.07246}}
Some real-world complex networks have been modelled by DCM, such as neural networks,{{cite journal |last1=Amini |first1=Hamed |title=Bootstrap Percolation in Living Neural Networks |journal=Journal of Statistical Physics |date=1 November 2010 |volume=141 |issue=3 |pages=459–475 |arxiv=0910.0627 |bibcode=2010JSP...141..459A |s2cid=7601022 |doi=10.1007/s10955-010-0056-z |language=en |issn=1572-9613}} finance{{cite book |last1=Amini |first1=Hamed |last2=Minca |first2=Andreea |title=Advances in Network Analysis and its Applications |chapter=Mathematical Modeling of Systemic Risk |series=Mathematics in Industry |date=2013 |volume=18 |pages=3–26 |doi=10.1007/978-3-642-30904-5_1 |publisher=Springer |isbn=978-3-642-30903-8 |s2cid=166867930 |language=en}} and social networks.{{cite book |last1=Li |first1=Hui |title=2018 37th Chinese Control Conference (CCC) |chapter=Attack Vulnerability of Online Social Networks |date=July 2018 |pages=1051–1056 |isbn=978-988-15639-5-8 |s2cid=52933445 |doi=10.23919/ChiCC.2018.8482277}}
References
{{Reflist}}