Distributed search engine
A distributed search engine is a search engine with no central server. Unlike traditional centralized search engines, it distributes work such as crawling, data mining, indexing, and query processing among many peers in a decentralized manner, with no single point of control.
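The following Python sketch shows one common way to split indexing and query processing among peers: a term-partitioned inverted index. Each term is assigned to a responsible peer by hashing it. A multi-term query is routed to the peers that own its terms, and their posting lists are intersected. This scheme is an assumption chosen for illustration and is not presented as the design of any particular engine. `Peer`, `owner`, `index_document` and `search` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


class Peer:
    """Hypothetical peer holding the posting lists for the terms it owns."""

    def __init__(self, name: str) -> None:
        self.name = name
        # term -> set of document IDs containing that term
        self.postings: defaultdict[str, set[str]] = defaultdict(set)


def owner(term: str, peers: list[Peer]) -> Peer:
    # Every peer can compute the same term -> peer mapping from a hash,
    # so no central server is needed to know where a term is indexed.
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def index_document(doc_id: str, text: str, peers: list[Peer]) -> None:
    # Send each distinct term of the document to the peer that owns it.
    for term in set(text.lower().split()):
        owner(term, peers).postings[term].add(doc_id)


def search(query: str, peers: list[Peer]) -> set[str]:
    # Route each query term to its owning peer and intersect the
    # returned posting lists (a hit must contain every query term).
    result: set[str] | None = None
    for term in query.lower().split():
        docs = owner(term, peers).postings.get(term, set())
        result = set(docs) if result is None else result & docs
    return result if result is not None else set()


if __name__ == "__main__":
    peers = [Peer(f"peer{i}") for i in range(4)]
    index_document("doc-a", "open source search engine", peers)
    index_document("doc-b", "distributed search", peers)
    print(sorted(search("search engine", peers)))  # ['doc-a']
```

Taking the hash modulo the number of peers keeps the example short, but it reassigns most terms whenever a peer joins or leaves. Distributed hash tables use consistent hashing to avoid this. Real systems also need replication and ranking.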
History
{{cleanup|section|reason=Sub-section order is not chronological, but potentially promotional|date=February 2025}}
= Presearch =
{{Main|Presearch (search engine)}}
Started in 2017, Presearch is a search engine with its own ERC-20 token (PRE). It is powered by a distributed network of community-operated nodes that aggregate results from a variety of sources, and this network serves the searches at presearch.com. It is planned as a precursor to a global decentralized index on which every node collaborates.
{{cite web
|url=https://www.presearch.io/
|title=Presearch is a Decentralized Search Engine
}}
Presearch averages 5 million searches per day and has 2.2 million registered users. On September 1, 2021, Presearch was added to the list of default search engine options offered on Android devices in the EU.{{Cite web|date=2021-09-01|title=Google Adds Presearch As A Default Option on Android Devices in EU|url=https://www.searchenginejournal.com/google-adds-presearch-as-a-default-option-on-android-in-eu/418184/|access-date=2021-11-10|website=Search Engine Journal|language=en}} On May 27, 2022, Presearch officially transitioned from its testnet to its mainnet. As a result, all search traffic through the service now runs over Presearch's decentralized network of volunteer-run nodes.{{Cite web|last=Kan|first=Michael|date=2022-05-26|title=The Next Google? Decentralized Search Engine 'Presearch' Exits Testing Phase|url=https://www.pcmag.com/news/the-next-google-decentralized-search-engine-presearch-exits-testing-phase|website=PC Magazine|language=en}}
= [[YaCy]] =
On December 15, 2003, Michael Christen announced development of a P2P-based search engine, eventually named YaCy, on the heise online forums.
{{cite web
| title = YaCy: News
| archiveurl = https://web.archive.org/web/20051124084140/http://www.yacy.net/yacy/News.html
| archivedate = 2005-11-24
| url= http://www.yacy.net/yacy/News.html
}}
{{cite web
| url = http://www.heise.de/newsticker/foren/S-Ich-entwickle-eine-P2P-basierende-Suchmaschine-Wer-macht-mit/forum-50682/msg-4744034/read/
| title = Ich entwickle eine P2P-basierende Suchmaschine. Wer macht mit?
| author = Michael Christen
| publisher = heise online
}}
= [[Seeks]] =
Seeks was an open-source web search proxy and a collaborative, distributed tool for web search. By 2016 it no longer had a usable release.
= InfraSearch =
In April 2000, several programmers, including Gene Kan and Steve Waterhouse, built InfraSearch, a prototype P2P web search engine based on Gnutella. The technology was later acquired by Sun Microsystems and incorporated into the JXTA project.
{{cite web
|url=http://www.redherring.com/Home/9528
|title=Can peer-to-peer grow up?
|author=Justin Hibbard
|publisher=Red Herring
}}{{dead link|date=January 2017 |bot=InternetArchiveBot |fix-attempted=yes }}
It was designed to run inside the databases of participating websites. Together these formed a P2P network that could be queried through the InfraSearch website.
{{cite web
| title = Move Over Yahoo, Here Comes InfraSearch
| author = Simon Foust
|website= Dmusic
| archiveurl = https://web.archive.org/web/20001013141235/http://www.dmusic.com/news/news.php?id=2614
| archivedate = 2000-10-13
| url= http://www.dmusic.com/news/news.php?id=2614
}}
{{cite magazine
| title = Peer-to-peer networking is poised to revolutionize the Internet once again
| author = Sean M. Dugan
| magazine = InfoWorld
| archiveurl = https://web.archive.org/web/20001018022633/http://www.infoworld.com/articles/op/xml/00/07/17/000717opprophet.xml
| archivedate = 2000-10-18
| url= http://www.infoworld.com/articles/op/xml/00/07/17/000717opprophet.xml
}}
{{cite web
| url = http://news.cnet.com/2100-1023-241223.html
| title = Napster-like technology takes Web search to new level
| author = John Borland
| publisher = Cnet
}}
= OpenCOLA =
On May 31, 2000, Steelbridge Inc. announced the development of OpenCOLA, a collaborative, distributed, open-source search engine.
{{cite news
| title = Software launched with a little pop
| author = David Akin
| author-link = David Akin
| newspaper = Financial Post
| url= https://nationalpost.com/financialpost.asp?f=000531/303636.html
}}{{dead link|date=September 2016|bot=medic}}{{cbignore|bot=medic}}
It runs on the user's computer and crawls the web pages and links that the user places in their OpenCOLA folder. It then shares the resulting index over its P2P network.
{{cite magazine
| url = http://www.techreview.com/web/12360/?a=f
| title = OpenCola-Have Some Code and a Smile
| author = Paul Heltzel
| magazine = Technology Review
}}
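OpenCOLA's actual protocol is not reproduced here. The following is a hypothetical Python sketch of the general pattern described above, in which crawling and indexing happen locally and results are shared over a P2P network. Each peer builds an inverted index of its own pages, and a query is answered by asking the peers and merging their hits. This arrangement is often called a document-partitioned index. `LocalNode` and `network_search` are illustrative names, not OpenCOLA's API.

```python
from __future__ import annotations

from collections import defaultdict


class LocalNode:
    """Hypothetical peer that indexes only the pages its own user shared."""

    def __init__(self) -> None:
        # term -> URLs of locally crawled pages containing that term
        self.index: defaultdict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, text: str) -> None:
        # Indexing happens on the user's computer, not on a central server.
        for term in set(text.lower().split()):
            self.index[term].add(url)

    def lookup(self, term: str) -> set[str]:
        # .get() avoids inserting empty entries for unknown terms.
        return set(self.index.get(term.lower(), set()))


def network_search(term: str, nodes: list[LocalNode]) -> set[str]:
    # Ask every reachable peer and merge (union) their local hits.
    hits: set[str] = set()
    for node in nodes:
        hits |= node.lookup(term)
    return hits


if __name__ == "__main__":
    alice, bob = LocalNode(), LocalNode()
    alice.add_page("http://example.com/p2p", "peer to peer search")
    bob.add_page("http://example.org/engines", "search engines compared")
    print(sorted(network_search("search", [alice, bob])))
    # ['http://example.com/p2p', 'http://example.org/engines']
```

Because every peer must be asked, this layout is simple to build but costs one message per peer per query. A real network would also need peer discovery, ranking, and some defence against peers returning bad data.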
= Faroo =
In February 2001, Wolf Garbe published the concept of a peer-to-peer search engine,
{{cite web
|url = http://www.pubzone.org/dblp/journals/wi/Garbe01
|title = BINGOOO - Die Transformation des World Wide Web zur virtuellen Datenbank
|author = Wolf Garbe
|publisher = Wirtschaftsinformatik
|language = German
|quote = ... Wir setzen dem das Konzept einer verteilten Peer-to-Peer-Suchmaschine entgegen [We counter with the concept of a distributed peer-to-peer search engine] ...
|access-date = 2010-12-21
|archive-url = https://web.archive.org/web/20140202093532/http://www.pubzone.org/dblp/journals/wi/Garbe01
|archive-date = 2014-02-02
|url-status = dead
}}
and started the Faroo prototype in 2004.
{{cite web
|url = http://www.readwriteweb.com/start/2009/12/technical-qa-with-faroo-founder.php
|title = Technical Q&A With FAROO Founder
|author = Bernard Lunn
|publisher = ReadWriteWeb
|quote = ... When I started to work on the first prototype in 2004 ...
|url-status = dead
|archiveurl = https://web.archive.org/web/20110214194656/http://www.readwriteweb.com/start/2009/12/technical-qa-with-faroo-founder.php
|archivedate = 2011-02-14
}}
{{cite web
| title = FAROO: History
| archiveurl = https://web.archive.org/web/20080322000927/http://www.faroo.com/english/download/history.html
| archivedate = 2008-03-22
| url= http://www.faroo.com/english/download/history.html
}}
{{cite web
| url = http://blog.faroo.com/2010/01/03/revisited-deriving-crawler-start-points-from-visited-pages-by-monitoring-http-traffic/
| title = Revisited: Deriving crawler start points from visited pages by monitoring HTTP traffic
| publisher = Faroo
}}
Goals
{{Unreferenced|section|date=February 2025}}
The goals of building a distributed search engine include:
1. to create an independent search engine powered by the community;
2. to make the search operation open and transparent by relying on open-source software;
3. to distribute the advertising revenue to node maintainers, which may help create more robust web infrastructure;
4. to allow researchers to contribute to the development of open-source, publicly maintainable ranking algorithms and to oversee how their parameters are trained.
Challenges
{{Unreferenced|section|date=February 2025}}
1. The amount of data to be processed is enormous: the size of the visible web has been estimated at 5 PB spread across some 10 billion pages.
2. Distributed query processing must achieve latency competitive with that of commercial search engines.
3. Mechanisms are needed to prevent malicious users from corrupting the distributed data structures or manipulating the ranking.
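The following calculation takes the estimate in item 1 at face value and uses decimal units (1 PB = 10^15 bytes). It gives the implied average volume per page. It also gives the share each node would hold if the data were spread evenly, without replication, over a hypothetical 10,000 nodes.

```latex
\begin{aligned}
\frac{5\ \text{PB}}{10^{10}\ \text{pages}}
  &= \frac{5 \times 10^{15}\ \text{B}}{10^{10}\ \text{pages}}
   = 5 \times 10^{5}\ \text{B/page} = 500\ \text{kB/page} \\
\frac{5 \times 10^{15}\ \text{B}}{10^{4}\ \text{nodes}}
  &= 5 \times 10^{11}\ \text{B/node} = 500\ \text{GB/node}
\end{aligned}
```

Even this idealized split leaves each node with hundreds of gigabytes, before adding the replication needed to tolerate nodes going offline.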
See also
- {{section link|List of search engines|P2P search engines}}
- Distributed processing