Google matrix

{{Short description|Stochastic matrix representing links between entities}}

{{Multiple issues|

{{context |date= March 2019}}

{{citation style|date=December 2014}}

}}

[[Image:Googlematrixwikipedia2009.jpg|thumb|240px|Fig.1. Google matrix of the Wikipedia articles network, written in the basis of the PageRank index; a fragment of the top 200 × 200 matrix elements is shown, total size N=3282257 (from

{{Cite journal

|arxiv=1106.6215

|title=Towards two-dimensional search engines

|journal=Journal of Physics A

|volume=45

|issue=27

|pages=275101

|year=2011

|bibcode=2012JPhA...45A5101E

|doi=10.1088/1751-8113/45/27/275101

|last1=Ermann

|first1=L.

|last2=Chepelianskii

|first2=A. D.

|last3=Shepelyansky

|first3=D. L.

|s2cid=14827486

}})]]

A '''Google matrix''' is a particular stochastic matrix that is used by Google's PageRank algorithm. The matrix represents a graph whose edges represent links between pages. The PageRank of each page can then be generated iteratively from the Google matrix using the power method. However, for the power method to converge, the matrix must be stochastic, irreducible and aperiodic.

== Adjacency matrix ''A'' and Markov matrix ''S'' ==

In order to generate the Google matrix ''G'', we must first generate an adjacency matrix ''A'' which represents the relations between pages or nodes.

Assuming there are ''N'' pages, ''A'' can be filled out as follows:

  1. A matrix element <math>A_{i,j}</math> is set to 1 if node ''j'' has a link to node ''i'', and to 0 otherwise; this is the adjacency matrix of links.
  2. A related matrix <math>S</math>, corresponding to the transitions in a Markov chain of the given network, is constructed from <math>A</math> by dividing the elements of column ''j'' by <math>k_j=\sum_{i=1}^N A_{i,j}</math>, where <math>k_j</math> is the total number of outgoing links from node ''j'' to all other nodes. Columns containing only zero elements, corresponding to dangling nodes, are replaced by the constant value <math>1/N</math>. This procedure adds a link from every sink (dangling state) to every other node.
  3. By construction the sum of all elements in any column of <math>S</math> is equal to unity, so the matrix <math>S</math> is mathematically well defined and belongs to the class of Markov chains and of Perron–Frobenius operators. This makes <math>S</math> suitable for the PageRank algorithm.
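The three steps above can be sketched in Python (a minimal illustration, not Google's implementation; the small adjacency matrix `A` is a made-up example):

```python
import numpy as np

# Toy adjacency matrix: A[i, j] = 1 if node j links to node i.
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)
N = A.shape[0]

# Build the Markov matrix S column by column.
S = A.copy()
for j in range(N):
    k_j = A[:, j].sum()      # number of outgoing links of node j
    if k_j == 0:             # dangling node: replace column by 1/N
        S[:, j] = 1.0 / N
    else:
        S[:, j] /= k_j

# Every column of S now sums to unity, as required for a Markov chain.
```

Here node 1 has no outgoing links, so its column is replaced by the uniform value 1/N.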

== Construction of Google matrix ''G'' ==

[[Image:Googlematrixcambridge2006.jpg|thumb|280px|Fig.2. Google matrix of the University of Cambridge network (2006)]]

The final Google matrix ''G'' can then be expressed via ''S'' as:

:<math>G_{ij} = \alpha S_{ij} + (1-\alpha) \frac{1}{N}\,, \;\;\;\;\;\;\;\;\;\;\; (1)</math>

By construction the sum of all non-negative elements inside each matrix column is equal to unity. The numerical coefficient <math>\alpha</math> is known as a damping factor.
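Eq.(1) can be applied directly (a minimal sketch; the 3-node column-stochastic matrix `S` is a made-up example, and <math>\alpha = 0.85</math> is the commonly quoted damping factor):

```python
import numpy as np

alpha = 0.85                        # damping factor
# Example column-stochastic matrix S for a 3-node network.
S = np.array([[0.0, 1/3, 0.5],
              [0.5, 1/3, 0.5],
              [0.5, 1/3, 0.0]])
N = S.shape[0]

# Eq.(1): the second term is a uniform rank-one "teleportation" contribution.
G = alpha * S + (1 - alpha) / N

# Every column of G still sums to unity, and every element is positive.
```

Because the second term makes every element of ''G'' strictly positive, the resulting Markov chain is irreducible and aperiodic.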

Usually ''S'' is a sparse matrix, and for modern directed networks it has only about ten nonzero elements per row or column; thus only about 10''N'' multiplications are needed to multiply a vector by the matrix ''G''.{{cite book

|last1=Langville |first1=Amy N. | author1-link= Amy Langville

|last2=Meyer |first2=Carl

|title=Google's PageRank and Beyond

|publisher=Princeton University Press

|year=2006

|isbn=978-0-691-12202-1

}}{{Cite web

|last=Austin |first=David

|publisher=AMS Feature Columns

|title=How Google Finds Your Needle in the Web's Haystack

|url=http://www.ams.org/samplings/feature-column/fcarc-pagerank

|year=2008

}}

== Examples of Google matrix ==

An example of the matrix ''S'' construction via Eq.(1) for a simple network is given in the article [[CheiRank]].

For the actual matrix, Google uses a damping factor <math>\alpha</math> around 0.85.{{cite web

|title=PageRank Lecture 12

|date=2008-10-09

|last=Law |first=Edith

|url=https://www.cs.cmu.edu/~elaw/pagerank.pdf

}} The term <math>(1-\alpha)</math> gives the probability for a surfer to jump randomly to any page. The matrix ''G'' belongs to the class of Perron–Frobenius operators of Markov chains. Examples of the Google matrix structure are shown in Fig.1 for the Wikipedia articles hyperlink network in 2009 at small scale and in Fig.2 for the University of Cambridge network in 2006 at large scale.

== Spectrum and eigenstates of ''G'' matrix ==

[[Image:Googlematrixcambridge2006spectrum.gif|thumb|280px| Fig.3. The spectrum of eigenvalues of the Google matrix of the University of Cambridge from Fig.2 at

<math>\alpha=1</math>; blue points show eigenvalues of isolated subspaces, red points show eigenvalues of the core component (from {{Cite journal

|journal=Journal of Physics A

|volume=44 |issue=46 |page=465101

|arxiv=1105.1062

|doi=10.1088/1751-8113/44/46/465101

|title=Universal emergence of PageRank

|date=2011-11-01

|bibcode=2011JPhA...44T5101F

|last1=Frahm |first1=K. M. |last2=Georgeot |first2=B. |last3=Shepelyansky |first3=D. L. |s2cid=16292743 }})]]

For <math>0 < \alpha < 1</math> there is only one maximal eigenvalue <math>\lambda = 1</math>, with a corresponding right eigenvector whose non-negative elements <math>P_i</math> can be viewed as a stationary probability distribution. These probabilities, ordered by decreasing value, give the PageRank vector <math>P_i</math> with the PageRank index <math>K_i</math> used by Google search to rank webpages. Usually one has for the World Wide Web that <math>P \propto 1/K^{\beta}</math> with <math>\beta \approx 0.9</math>. The number of nodes with a given PageRank value scales as <math>N_P \propto 1/P^\nu</math> with the exponent <math>\nu = 1+1/\beta \approx 2.1</math>.{{cite journal

|last1=Donato |first1=Debora

|last2=Laura |first2=Luigi

|last3=Leonardi |first3=Stefano

|last4=Millozzi |first4=Stefano

|journal=European Physical Journal B

|volume=38 |issue=2

|pages=239–243

|title=Large scale properties of the Webgraph

|date=2004-03-30

|doi=10.1140/epjb/e2004-00056-6

|url=http://www.cosinproject.eu/publications/donato-epjb38-2004.pdf

|citeseerx = 10.1.1.104.2136

|bibcode=2004EPJB...38..239D

|s2cid=10640375

}}{{Cite journal

|last1=Pandurangan |first1=Gopal

|last2=Ranghavan |first2=Prabhakar

|last3=Upfal |first3=Eli

|journal=Internet Mathematics

|volume=3 |issue=1 |pages=1–20

|title=Using PageRank to Characterize Web Structure

|url=http://cs.brown.edu/research/pubs/pdfs/2005/Pandurangan-2005-UPC.pdf

|year=2005 |doi=10.1080/15427951.2006.10129114

|s2cid=101281

|doi-access=free

}} The left eigenvector at <math>\lambda = 1</math> has constant matrix elements. For <math>0 < \alpha < 1</math> all eigenvalues move as <math>\lambda_i \rightarrow \alpha \lambda_i</math> except the maximal eigenvalue <math>\lambda = 1</math>, which remains unchanged. The PageRank vector varies with <math>\alpha</math>, but the other eigenvectors with <math>\lambda_i < 1</math> remain unchanged due to their orthogonality to the constant left vector at <math>\lambda = 1</math>. The gap between <math>\lambda = 1</math> and the other eigenvalues, <math>1 - \alpha \approx 0.15</math>, gives rapid convergence of a random initial vector to the PageRank after about 50 multiplications by the matrix ''G''.
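The convergence described above can be demonstrated with the power method (a hedged sketch; the 3-node matrix `S` is a made-up example, not web data):

```python
import numpy as np

alpha = 0.85
# Example column-stochastic matrix S for a 3-node network.
S = np.array([[0.0, 1/3, 0.5],
              [0.5, 1/3, 0.5],
              [0.5, 1/3, 0.0]])
N = S.shape[0]
G = alpha * S + (1 - alpha) / N     # Eq.(1)

# Power method: repeated multiplication by G drives a uniform starting
# vector to the PageRank eigenvector (eigenvalue 1). The spectral gap
# 1 - alpha guarantees fast convergence; ~50 iterations are ample here.
p = np.full(N, 1.0 / N)
for _ in range(50):
    p = G @ p

# p is now (to numerical precision) the stationary PageRank vector.
```

Since each multiplication contracts the non-leading components by at least a factor <math>\alpha</math>, the residual after 50 steps is bounded by <math>\alpha^{50} \approx 3 \times 10^{-4}</math> (and is far smaller here).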

[[Image:Googlematrix1.jpg|thumb|350px| Fig.4. Distribution of eigenvalues <math>\lambda_i</math> of Google matrices in the complex plane at <math>\alpha = 1</math> for dictionary networks: Roget (A, N=1022), ODLIS (B, N=2909) and FOLDOC (C, N=13356); UK university WWW networks: University of Wales (Cardiff) (D, N=2778), Birmingham City University (E, N=10631), Keele University (Staffordshire) (F, N=11437), Nottingham Trent University (G, N=12660), Liverpool John Moores University (H, N=13578) (data for universities are for 2002) (from {{Cite journal

|journal=Physical Review E

|volume=81

|issue=5

|page=056109

|title=Spectral properties of the Google matrix of the World Wide Web and other directed networks

|date=2010-05-25

|doi=10.1103/PhysRevE.81.056109

|pmid=20866299

|arxiv=1002.3342

|bibcode=2010PhRvE..81e6109G

|last1=Georgeot

|first1=Bertrand

|last2=Giraud

|first2=Olivier

|last3=Shepelyansky

|first3=Dima L.

|s2cid=14490804

}})]]

At <math>\alpha=1</math> the matrix ''G'' generally has many degenerate eigenvalues <math>\lambda = 1</math> (see e.g. [6]). Examples of the eigenvalue spectrum of the Google matrix of various directed networks are shown in Fig.3 and Fig.4.

The Google matrix can also be constructed for the Ulam networks generated by the Ulam method [8] for dynamical maps. The spectral properties of such matrices are discussed in [9,10,11,12,13,15]. In a number of cases the spectrum is described by the fractal Weyl law [10,12].

[[Image:Googlematrix2.jpg|thumb|280px| Fig.5. Distribution of eigenvalues <math>\lambda</math> in the complex plane for the Google matrix ''G'' of the Linux Kernel version 2.6.32 with matrix size N=285509 at <math>\alpha=0.85</math>; the unit circle is shown by the solid curve (from {{Cite journal

|journal=European Physical Journal B

|volume=79

|issue=1

|pages=115–120

|arxiv=1005.1395

|doi=10.1140/epjb/e2010-10774-7

|title=Fractal Weyl law for Linux Kernel Architecture

|year=2011

|bibcode=2011EPJB...79..115E

|last1=Ermann

|first1=L.

|last2=Chepelianskii

|first2=A. D.

|last3=Shepelyansky

|first3=D. L.

|s2cid=445348

}})]]

[[Image:Googlematrix3.gif|thumb|280px|Fig.6. Eigenstates of the Google matrix of the Linux Kernel network (see text)]]

The Google matrix can also be constructed for other directed networks, e.g. for the procedure call network of the Linux Kernel software introduced in [15]. In this case the spectrum of <math>\lambda</math> is described by the fractal Weyl law with the fractal dimension <math>d \approx 1.3</math> (see Fig.5). Numerical analysis shows that the eigenstates of the matrix ''G'' are localized (see Fig.6). The Arnoldi iteration method allows one to compute many eigenvalues and eigenvectors of matrices of rather large size [13].
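For large sparse networks the Arnoldi method can be applied without ever forming ''G'' densely, by representing Eq.(1) as a linear operator. A hedged sketch using SciPy's ARPACK wrapper (the random sparse network is synthetic, not a real software or web graph):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
N = 1000
# Synthetic sparse directed network with ~10 links per node.
A = sp.random(N, N, density=0.01, random_state=rng, format="csc")
A.data[:] = 1.0

k_out = np.asarray(A.sum(axis=0)).ravel()   # out-degrees (column sums)
dangling = (k_out == 0)
inv_k = np.zeros(N)
inv_k[~dangling] = 1.0 / k_out[~dangling]
S = A @ sp.diags(inv_k)                     # column-normalized, dangling cols zero

alpha = 0.85

def matvec(v):
    # G v = alpha S v + uniform redistribution of dangling weight
    #     + (1 - alpha)/N teleportation term, i.e. Eq.(1) applied implicitly.
    return alpha * (S @ v) + (alpha * v[dangling].sum() + (1 - alpha) * v.sum()) / N

G_op = spla.LinearOperator((N, N), matvec=matvec, dtype=float)

# Arnoldi iterations (ARPACK) for the six largest-magnitude eigenvalues.
vals = spla.eigs(G_op, k=6, return_eigenvectors=False)
```

Since ''G'' is exactly column-stochastic and strictly positive here, its leading eigenvalue is 1 and all others lie inside the unit circle.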

Other examples of the ''G'' matrix include the Google matrix of the brain [17] and of business process management [18]. Applications of Google matrix analysis to DNA sequences are described in [20]. Such a Google matrix approach also allows one to analyze the entanglement of cultures via the ranking of multilingual Wikipedia articles about persons [21].

== Historical notes ==

The Google matrix with damping factor was described by Sergey Brin and Larry Page in 1998 [22]; see also the articles on PageRank history [23],[24].

== See also ==

== References ==

{{Reflist}}

{{refbegin}}

  • {{Cite journal

|last=Serra-Capizzano

|first=Stefano

|journal=SIAM J. Matrix Anal. Appl.

|volume=27 |page=305

|title=Jordan Canonical Form of the Google Matrix: a Potential Contribution to the PageRank Computation

|year=2005

|issue=2

|doi=10.1137/s0895479804441407

|hdl=11383/1494937

|hdl-access=free

}}

  • {{Cite book

|last=Ulam

|first=Stanislaw

|publisher=Interscience |location=New York |page=73

|title=A Collection of Mathematical Problems

|series=Interscience Tracts in Pure and Applied Mathematics

|year=1960

}}

  • {{Cite journal

|author1=Froyland G.

|author2=Padberg K.

|journal=Physica D

|volume=238 |page=1507

|title=Almost-invariant sets and invariant manifolds—Connecting probabilistic and geometric descriptions of coherent structures in flows

|year=2009

|issue=16

|doi=10.1016/j.physd.2009.03.002

|bibcode=2009PhyD..238.1507F

}}

  • {{Cite journal

|author1=Shepelyansky D.L.

|author2=Zhirov O.V.

|journal=Phys. Rev. E

|volume=81 |page=036213

|title=Google matrix, dynamical attractors and Ulam networks

|year=2010

|issue=3

|doi=10.1103/physreve.81.036213

|pmid=20365838

|arxiv=0905.4162

|bibcode=2010PhRvE..81c6213S

|s2cid=15874766

}}

  • {{Cite journal

|author1=Ermann L.

|author2=Shepelyansky D.L.

|journal=Phys. Rev. E |volume=81 |page=036221

|title=Google matrix and Ulam networks of intermittency maps

|year=2010

|issue=3

|doi=10.1103/physreve.81.036221

|pmid=20365846

|arxiv=0911.3823

|bibcode=2010PhRvE..81c6221E

|s2cid=388806

}}

  • {{Cite journal

|author1=Ermann L.

|author2=Shepelyansky D.L.

|journal=Eur. Phys. J. B

|volume=75 |title=Ulam method and fractal Weyl law for Perron-Frobenius operators

|year=2010

|issue=3

|pages=299–304

|doi=10.1140/epjb/e2010-00144-0

|arxiv=0912.5083

|bibcode=2010EPJB...75..299E

|s2cid=54899977

}}

  • {{Cite journal

|author1=Frahm K.M.

|author2=Shepelyansky D.L.

|journal=Eur. Phys. J. B

|volume=76 |title=Ulam method for the Chirikov standard map

|year=2010

|issue=1

|pages=57–68

|doi=10.1140/epjb/e2010-00190-6

|arxiv=1004.1349

|bibcode=2010EPJB...76...57F

|s2cid=55539783

}}

  • {{Cite arXiv

|last=Chepelianskii

|first=Alexei D.

|eprint=1003.5455

|title=Towards physical laws for software architecture

|year=2010

|class=cs.SE

}}

  • {{Cite journal

|author1=Shepelyansky D.L.

|author2=Zhirov O.V.

|journal=Phys. Lett. A

|volume=374 |page=3206

|title=Towards Google matrix of brain

|year=2010

|issue=31–32

|doi=10.1016/j.physleta.2010.06.007

|arxiv=1002.4583

|bibcode=2010PhLA..374.3206S

}}

  • {{Cite journal

|author1=Abel M.

|author2=Shepelyansky D.L.

|journal=Eur. Phys. J. B

|volume=84 |issue=4

|page=493

|arxiv=1009.2631

|title=Google matrix of business process management

|year=2011

|bibcode=2011EPJB...84..493A

|doi=10.1140/epjb/e2010-10710-y

|s2cid=15510734

}}

  • {{Cite journal

|doi=10.1371/journal.pone.0061519

|pmid = 23671568|pmc = 3650020|title = Google Matrix Analysis of DNA Sequences|journal = PLOS ONE|volume = 8|issue = 5|pages = e61519|year = 2013|last1 = Kandiah|first1 = Vivek|last2 = Shepelyansky|first2 = Dima L.|arxiv = 1301.1626|bibcode = 2013PLoSO...861519K|doi-access = free}}

  • {{Cite journal

|doi=10.1371/journal.pone.0074554

|pmid = 24098338|pmc = 3789750|title = Highlighting Entanglement of Cultures via Ranking of Multilingual Wikipedia Articles|journal = PLOS ONE|volume = 8|issue = 10|pages = e74554|year = 2013|last1 = Eom|first1 = Young-Ho|last2 = Shepelyansky|first2 = Dima L.|arxiv = 1306.6259|bibcode = 2013PLoSO...874554E|doi-access = free}}

  • {{Cite journal

|author1= Brin S.

|author2=Page L.

|journal=Computer Networks and ISDN Systems

|volume=30 |page=107

|title=The anatomy of a large-scale hypertextual Web search engine

|year=1998

|issue=1–7

|doi=10.1016/s0169-7552(98)00110-x

|s2cid=7587743

}}

  • {{Cite arXiv

|eprint=1002.2858

|last1=Massimo

|first1=Franceschet

|title=PageRank: Standing on the shoulders of giants

|class=cs.IR

|year=2010

}}

  • {{Cite web

|last=Vigna |first=Sebastiano

|url=http://vigna.di.unimi.it/ftp/papers/SpectralRanking.pdf

|title=Spectral Ranking

|year=2010

}}

{{refend}}