YaCy
{{Short description|Peer-to-peer search engine}}
{{multiple issues|
{{promotional|date=January 2019}}
{{More citations needed|date=May 2014}}
}}
{{Infobox software
| name = YaCy
| logo = YaCy logo.png
| screenshot = Yacy-buscador.png
| screenshot size = 250px
| caption =
| author = Michael Christen
| developer = YaCy community
| released = {{Start date and age|2003|df=yes}}{{cite web|url=http://www.heise.de/newsticker/foren/go.shtml?read=1&msg_id=4744034&forum_id=50682 |title=Ich entwickle eine P2P-basierende Suchmaschine. Wer macht mit? |work=Heise Online |language=de |date= 2003-12-15 |access-date=2018-05-09}}
| latest release version = 1.940_202412022212
| latest release date = {{start date and age|df=yes|2024|12|02}}{{cite web|url=https://download.yacy.net/?C=M;O=D |title=Apache Server at download.yacy.net Port 443 |date= 2024-06-01 |access-date=2024-08-27}}{{cite web|url=https://release.yacy.net/?C=M;O=D |title=Apache Server at release.yacy.net Port 443 |date= 2024-10-07 |access-date=2024-10-07}}
| repo = {{URL|https://github.com/yacy/yacy_search_server}}
| programming language = Java
| operating system = Cross-platform
| size = 104-113 MB
| genre = Overlay network, Search engine
| license = GPL-2.0-or-later
| website = {{URL|https://yacy.net/en/}}
}}
YaCy (pronounced “ya see”) is a free distributed search engine built on the principles of peer-to-peer (P2P) networks, created by Michael Christen in 2003.{{cite web|url=https://www.theregister.co.uk/2011/11/29/yacy_google_open_source_engine/ |title=YaCy takes on Google with open source search engine |work=The Register |date= 2011-11-29 |access-date=2012-04-16}}{{cite web|url=https://www.pcworld.com/article/245414/yacy_its_about_freedom_not_beating_google.html |title=YaCy: It's About Freedom, Not Beating Google |work=PC World |date= 2011-12-03 |access-date=2012-04-16}} The engine is written in Java and, {{As of|2006|9|lc=on}},{{Update inline|date=June 2024}} ran on several hundred computers, known as YaCy-peers.
Each YaCy-peer independently crawls the Internet, analyzes and indexes the web pages it finds, and stores the results in a common database (the index), which is shared with other YaCy-peers over the peer-to-peer network. This decentralized approach is intended to protect privacy and removes the need for a central server.{{Cite web |title=Home - YaCy |url=https://yacy.net/ |access-date=2024-07-01 |website=yacy.net}}
Unlike semi-distributed search engines, the YaCy network has a fully distributed architecture: all YaCy-peers are equal and no central server exists. A peer can be run either in crawling mode or as a local proxy server that indexes the web pages visited by the person running YaCy on their computer. Several mechanisms are provided to protect the user's privacy. Search functions are accessed through a locally run web server, which provides a search box for entering search terms and returns results in a format similar to that of popular search engines.{{Cite web |title=FAQ - YaCy |url=https://yacy.net/faq/ |access-date=2024-07-04 |website=yacy.net}}
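Because searches go through a web server on the user's own machine, a query can in principle be expressed as an ordinary HTTP request to that server. The following Java fragment is a minimal sketch that only builds such a request URL. The port <code>8090</code>, the path <code>yacysearch.json</code>, and the parameter name <code>query</code> are assumptions about a default installation and should be checked against the peer's own API documentation; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper; not part of YaCy itself. */
class LocalSearchUrlSketch {

    /**
     * Builds a search URL for a locally running peer.
     * The path and parameter name are assumptions, not confirmed by the article.
     */
    static URI searchUri(String host, int port, String terms) {
        // URLEncoder produces form encoding, so spaces become '+'.
        String encoded = URLEncoder.encode(terms, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/yacysearch.json?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed default port of a local peer.
        System.out.println(searchUri("localhost", 8090, "peer to peer"));
    }
}
```

The sketch stops at constructing the URL; sending the request and parsing the response depend on the installed version and its configuration.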
System components
YaCy search engine is based on four elements:{{cite web |url=http://yacy.net/Technology.html |title=YaCy Technology Architecture |publisher=YaCy.net |access-date=2012-02-14 |archive-date=2012-02-05 |archive-url=https://web.archive.org/web/20120205182630/http://yacy.net/Technology.html |url-status=dead }}
;Crawler{{Cite web |title=Demo - YaCy |url=https://yacy.net/demonstration_tutorial_screenshot/ |access-date=2024-08-12 |website=yacy.net}}: A search robot that traverses web pages and analyzes their content.{{Citation|title=GitHub: YaCy Grid Crawler|date=2021-02-28|url=https://github.com/yacy/yacy_grid_crawler|pages=yacy / yacy_grid_crawler|publisher=YaCy Search Engine|access-date=2021-03-11}} The crawler fetches web pages from the Internet, and each peer in the YaCy network can crawl and index websites. The crawling process involves:
:* Discovery: Finding new web pages to index by following links.
:* Fetching: Downloading the content of web pages.
:* Parsing: Extracting relevant information such as text, metadata, and links from the downloaded pages.
;Indexer: Creates a reverse word index (RWI), in which each word has a list of relevant URLs and ranking information. Words are stored as word hashes.{{Citation|title=GitHub: YaCy Grid Parser|date=2021-02-28|url=https://github.com/yacy/yacy_grid_parser|pages=The YaCy Grid is the second-generation implementation of YaCy|publisher=YaCy Search Engine|access-date=2021-03-11}}
;Search and administration interface: A web interface provided by a local HTTP servlet running in a servlet engine.{{Citation|title=GitHub: YaCY Search|date=2021-02-28|url=https://github.com/yacy/yacy-search|pages=yacy / yacy-search forked from cream/yacy-search|publisher=YaCy Search Engine|access-date=2021-03-11}}
;Data storage: Stores the reverse word index database using a distributed hash table.
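The indexer and data storage components can be illustrated with a small Java sketch of a reverse word index keyed by word hashes. It is a simplified model under stated assumptions, not YaCy's implementation: the hash function (a truncated SHA-256), the use of an occurrence count as "ranking information", and the ring-style choice of the responsible peer are all stand-ins chosen for illustration, and every class and method name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative reverse word index; not YaCy's actual classes or hash format. */
class ReverseWordIndexSketch {
    // word hash -> (URL -> number of occurrences, a stand-in for ranking information)
    private final Map<String, Map<String, Integer>> index = new TreeMap<>();

    /** Hypothetical word hash: first 12 Base64 characters of the SHA-256 of the lower-cased word. */
    static String wordHash(String word) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(word.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest).substring(0, 12);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Splits the text into words and counts the URL under each word's hash. */
    void addDocument(String url, String text) {
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue; // a leading separator yields an empty token
            index.computeIfAbsent(wordHash(word), k -> new TreeMap<>()).merge(url, 1, Integer::sum);
        }
    }

    /** Returns URL -> occurrence count for a word, or an empty map if the word is unknown. */
    Map<String, Integer> lookup(String word) {
        return index.getOrDefault(wordHash(word), Map.of());
    }

    /**
     * Generic hash-ring rule (an assumption, not YaCy's scheme): the responsible peer is the
     * first peer hash at or after the word hash, wrapping to the smallest peer hash.
     * The set of peer hashes must not be empty.
     */
    static String responsiblePeer(String wordHash, TreeSet<String> peerHashes) {
        String peer = peerHashes.ceiling(wordHash);
        return peer != null ? peer : peerHashes.first();
    }

    public static void main(String[] args) {
        ReverseWordIndexSketch rwi = new ReverseWordIndexSketch();
        rwi.addDocument("https://example.org/a", "peer to peer search");
        rwi.addDocument("https://example.org/b", "Search engine");
        System.out.println(rwi.lookup("search")); // both example URLs, one occurrence each
    }
}
```

Keying the index by hash rather than by the word itself gives every entry a fixed-length key, which is what makes it straightforward to assign ranges of keys to different peers in a distributed hash table.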
Search-engine technology
File:YaCy Network Freeworld 01-12-2011.png
- YaCy is a complete search appliance with user interface, index, administration, and monitoring.
- YaCy harvests web pages with a web crawler. Documents are then parsed and indexed, and the search index is stored locally. If the peer is part of a peer network, its local search index is also merged into the shared index for that network.
- When a search is started, results from the local index are combined with results from the global search index held by peers in the YaCy search network.
- The YaCy Grid is a second-generation implementation of the YaCy peer-to-peer search. A YaCy Grid installation comprises microservices that communicate using the Master Connect Program (MCP).
- The YaCy Parser is a microservice that can be deployed using Docker. When the Parser component is started, it searches for an MCP and connects to it. By default it looks for an MCP on the local host, but a different MCP can be configured.
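The combination of local and remote results described in the list above can be sketched as a simple merge step. The Java example below is an assumption-laden illustration rather than YaCy's ranking code: the <code>Hit</code> type, the numeric score, and the rule "keep the best score per URL, then sort by score" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative merge of local and remote search results; not YaCy's actual ranking. */
class ResultMergeSketch {

    /** A search hit with a hypothetical relevance score (higher is better). */
    static final class Hit {
        final String url;
        final double score;

        Hit(String url, double score) {
            this.url = url;
            this.score = score;
        }
    }

    /** Merges local hits with hits from remote peers, keeping the best score per URL. */
    static List<Hit> merge(List<Hit> local, List<List<Hit>> remote) {
        List<List<Hit>> sources = new ArrayList<>();
        sources.add(local);
        sources.addAll(remote);

        // URL -> best hit seen so far; duplicates across peers collapse to one entry.
        Map<String, Hit> best = new LinkedHashMap<>();
        for (List<Hit> hits : sources) {
            for (Hit h : hits) {
                best.merge(h.url, h, (a, b) -> a.score >= b.score ? a : b);
            }
        }

        List<Hit> merged = new ArrayList<>(best.values());
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> local = List.of(new Hit("https://example.org/a", 0.5));
        List<List<Hit>> remote = List.of(List.of(new Hit("https://example.org/a", 0.7),
                                                 new Hit("https://example.org/c", 0.1)));
        for (Hit h : merge(local, remote)) {
            System.out.println(h.url + " " + h.score);
        }
    }
}
```

Deduplicating by URL matters in a peer-to-peer setting because several peers may have indexed the same page independently.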
YaCy platform architecture
YaCy combines several techniques for networking, administering, and maintaining the search index, including blacklisting, moderation, and communication with the community. These functions are grouped as follows:
- Community components
- # Web forum{{cite web|title=forum.yacy.de|url=http://forum.yacy-websuche.de/|access-date=6 June 2017}}
- # Statistics
- # XML API
- Maintenance
- # Web Server
- # Indexing
- # Crawler with Balancer
- # Peer-to-Peer Server Communication
- Content organization
- # Blacklisting and Filtering
- # Search interface
- # Bookmarks
- # Monitoring search results
Distribution
YaCy is available as packages for Linux, Windows, and Macintosh, and as a Docker image; it can also be installed on other operating systems either by building it manually or by using a tarball. YaCy requires Java 11; Temurin 11 is recommended.{{Cite web|title=Download - YaCy|url=https://yacy.net/download_installation/|access-date=2025-04-21|website=yacy.net}}
The Debian package can be installed from a repository hosted on a subdomain of the project's website,{{cite web| url=https://wiki.yacy.net/index.php/En:DebianInstall |title=En:DebianInstall |website=YaCyWiki |access-date=6 October 2019}}{{cite web|title=Dev:TaskSharing|url=https://wiki.yacy.net/index.php/Dev:TaskSharing|access-date=6 October 2019|website=YaCyWiki}} but is not yet maintained in the official Debian package repository.{{cite web| url=https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=452422 |title=#452422 - RFP: yacy -- distributed web crawler and search engine |website=Debian Bug report logs |access-date=2 May 2020}}
See also
{{Portal|Free and open-source software}}
- Dooble – an open-source web browser with an integrated YaCy Search Engine Tool Widget
- List of search engines
- Comparison of search engines
- Seeks
References
{{reflist}}
{{Commons category|YaCy}}
Further reading
- [https://linuxreviews.org/YaCy YaCy at LinuxReviews]
External links
- {{Official website|https://yacy.net/en}}
{{Distributed search engines}}
{{Web search engines}}
{{DEFAULTSORT:Yacy}}
Category:Distributed data storage
Category:Free search engine software
Category:Internet properties established in 2003
Category:Internet search engines
Category:Java platform software
Category:Cross-platform software
Category:Software using the GNU General Public License