Webgraph

{{Short description|Graph of connected web pages}}

The webgraph describes the directed links between pages of the World Wide Web. A graph, in general, consists of several vertices, some pairs connected by edges. In a directed graph, edges are directed lines or arcs. The webgraph is a directed graph, whose vertices correspond to the pages of the WWW, and a directed edge connects page X to page Y if there exists a hyperlink on page X, referring to page Y.{{cite book |contribution-url=https://nlp.stanford.edu/IR-book/html/htmledition/the-web-graph-1.html|contribution=The web graph|title=Introduction to Information Retrieval|first1=Christopher D.|last1=Manning|first2=Prabhakar|last2=Raghavan|first3=Hinrich|last3=Schütze|publisher=Cambridge University Press|year=2008}}

Properties

  • The degree distribution of the webgraph strongly differs from the degree distribution of the classical random graph model, the Erdős–Rényi model:{{Cite journal |last=Erdős |first=Paul |author-link=Paul Erdős |last2=Rényi |first2=Alfréd |author-link2=Alfréd Rényi |date=1960 |title=On the evolution of random graphs |url=https://static.renyi.hu/~p_erdos/1960-10.pdf |journal=Publication of the Mathematical Institute of the Hungarian Academy of Sciences |volume=5 |pages=17-61}} in the Erdős–Rényi model, there are very few large degree nodes, relative to the webgraph's degree distribution. The precise distribution is unclear,{{cite journal | last1 = Meusel | first1 = R. | last2 = Vigna | first2 = S. | last3 = Lehmberg | first3 = O. | last4 = Bizer | first4 = C. | year = 2015 | title = The Graph Structure in the Web - Analyzed on Different Aggregation Levels | url = https://air.unimi.it/bitstream/2434/372411/2/Vigna_JWebScience_2015.pdf| journal = Journal of Web Science | volume = 1 | issue = 1| pages = 33–47 | doi = 10.1561/106.00000003 | hdl = 2434/372411 | hdl-access = free }} however: it is relatively well described by a lognormal distribution, as well as the Barabási–Albert model for power laws.{{ cite journal|last1=Clauset|first1=A. |last2=Shalizi|first2=C. R.|last3=Newman|first3=M. E. J.|year=2009|title=Power-law distributions in empirical data|journal=SIAM Rev.|volume=51|pages=661–703|doi=10.1137/070710111|issue=4|arxiv=0706.1062|bibcode=2009SIAMR..51..661C|s2cid=9155618 }}

{{cite journal|first1=Albert-László|last1=Barabási|first2=Réka|last2= Albert|date=October 1999|title=Emergence of scaling in random networks|journal=Science|volume=286|issue=5439|pages=509–512|doi=10.1126/science.286.5439.509|url= http://www.nd.edu/~networks/Publication%20Categories/03%20Journal%20Articles/Physics/EmergenceRandom_Science%20286,%20509-512%20(1999).pdf|pmid=10521342|arxiv=cond-mat/9910332|bibcode=1999Sci...286..509B|s2cid=524106 }}.

Applications

The webgraph is used for:

  • computing the PageRank{{Cite journal |last=Brin |first=Sergey |author-link=Sergey Brin |last2=Page |first2=Lawrence |author-link2=Larry Page |date=1998-04-01 |title=The anatomy of a large-scale hypertextual Web search engine |url=http://infolab.stanford.edu/~backrub/google.html |journal=Computer Networks and ISDN Systems |series=Proceedings of the Seventh International World Wide Web Conference |volume=30 |issue=1 |pages=107–117 |doi=10.1016/S0169-7552(98)00110-X |issn=0169-7552}} of the world wide web's pages;
  • computing the personalized PageRank;Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th international conference on World Wide Web (WWW '03). ACM, New York, NY, USA, 271–279. {{doi|10.1145/775152.775191}}
  • detecting webpages of similar topics, through graph-theoretical properties only, like co-citation;{{cite journal | last1 = Kumar | first1 = Ravi | last2 = Raghavan | first2 = Prabhakar | last3 = Rajagopalan | first3 = Sridhar | last4 = Tomkins | first4 = Andrew | year = 1999 | title = Trawling the Web for emerging cyber-communities | journal = Computer Networks | volume = 31 | issue = 11–16| pages = 1481–1493 | doi = 10.1016/S1389-1286(99)00040-7 | citeseerx = 10.1.1.89.4025 | s2cid = 7069190 }}
  • and identifying hubs and authorities in the web for HITS algorithm.

References

{{Reflist}}