Apache ZooKeeper

{{Short description|System for distributed coordination}}

{{Infobox software

| name = Apache ZooKeeper

| logo = Apache ZooKeeper logo.svg

| screenshot =

| caption =

| developer = Apache Software Foundation

| latest release version = 3.8.1

| latest release date = {{Start date and age|2023|01|30}}{{cite web|url=https://zookeeper.apache.org/releases.html|title=Apache ZooKeeper - Releases|access-date=12 February 2023}}

| latest preview version =

| latest preview date =

| operating system = Cross-platform

| platform =

| repo = {{URL|https://gitbox.apache.org/repos/asf?p{{=}}zookeeper.git|ZooKeeper Repository}}

| programming language = Java

| genre = Distributed computing

| license = Apache License 2.0

| website = {{URL|https://zookeeper.apache.org}}

}}

Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications.{{cite web |title=Apache ZooKeeper |url=https://zookeeper.apache.org/ |access-date=31 January 2021}} It is a project of the Apache Software Foundation.

ZooKeeper is a service for distributed systems that offers a hierarchical key-value store. The store is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems (see Use cases).{{Cite web|url=https://cwiki.apache.org/confluence/display/ZOOKEEPER/|title=Index - Apache ZooKeeper - Apache Software Foundation|website=cwiki.apache.org|access-date=2016-08-26}} ZooKeeper was a sub-project of Hadoop but is now a top-level Apache project in its own right.

Overview

ZooKeeper's architecture supports high availability through redundant services: if the server a client is connected to fails to answer, the client can connect to another ZooKeeper server instead. ZooKeeper nodes store their data in a hierarchical namespace, much like a file system or a tree data structure. Clients can read from and write to the nodes, and in this way share a configuration service. ZooKeeper can be viewed as an atomic broadcast system through which updates are totally ordered. The ZooKeeper Atomic Broadcast (ZAB) protocol is the core of the system.{{cite web |url=https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription|title=Zookeeper Overview}}
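The hierarchical namespace and totally ordered updates can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not ZooKeeper's implementation or client API: the names `ToyNamespace`, `ToyZNode` and `last_txid` are hypothetical, there is no replication or networking, and the single counter only mimics the idea that every update receives a position in one global order. The `expected_version` argument imitates a conditional write, with -1 meaning "any version".

```python
from dataclasses import dataclass, field


@dataclass
class ToyZNode:
    """Hypothetical, simplified znode: data, a version counter and children."""
    data: bytes = b""
    version: int = 0
    children: dict = field(default_factory=dict)


class ToyNamespace:
    """Toy in-memory model of a hierarchical namespace (not ZooKeeper's API).

    Every successful write takes the next value of one shared counter,
    mimicking the way updates are applied in a single total order.
    """

    def __init__(self):
        self.root = ToyZNode()
        self.last_txid = 0  # hypothetical stand-in for a transaction id

    def _parent_and_name(self, path):
        # Split "/a/b/c" into the existing parent node and the final name.
        parts = [p for p in path.split("/") if p]
        if not path.startswith("/") or not parts:
            raise ValueError(f"invalid path: {path!r}")
        node = self.root
        for part in parts[:-1]:
            if part not in node.children:
                raise KeyError(f"parent missing for {path!r}")
            node = node.children[part]
        return node, parts[-1]

    def _lookup(self, path):
        parent, name = self._parent_and_name(path)
        if name not in parent.children:
            raise KeyError(path)
        return parent.children[name]

    def create(self, path, data=b""):
        """Create a node; its parent must already exist."""
        parent, name = self._parent_and_name(path)
        if name in parent.children:
            raise KeyError(f"node exists: {path!r}")
        parent.children[name] = ToyZNode(data=data)
        self.last_txid += 1
        return self.last_txid

    def get(self, path):
        """Return (data, version) for a node."""
        node = self._lookup(path)
        return node.data, node.version

    def set(self, path, data, expected_version=-1):
        """Conditional write: fails if the node's version has moved on."""
        node = self._lookup(path)
        if expected_version != -1 and expected_version != node.version:
            raise ValueError("version mismatch")
        node.data = data
        node.version += 1
        self.last_txid += 1
        return self.last_txid


ns = ToyNamespace()
ns.create("/app")
ns.create("/app/config", b"v1")
ns.set("/app/config", b"v2", expected_version=0)
print(ns.get("/app/config"))  # (b'v2', 1)
```

The version check shows why a shared tree is useful for configuration: two clients that both read version 0 cannot both overwrite the node; the second conditional write is rejected.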

ZooKeeper is used by companies including Yelp, Rackspace, Yahoo!,{{cite web |url=http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy |title=ZooKeeper/Powered By |access-date=2012-01-25 |archive-url=https://web.archive.org/web/20131209063307/http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy |archive-date=2013-12-09 |url-status=dead }} Odnoklassniki, Reddit,{{cite web|url=https://www.reddit.com/r/announcements/comments/4y0m56/why_reddit_was_down_on_aug_11/|title=Why Reddit was down on Aug 11|date=16 August 2016 }} NetApp SolidFire,{{Cite news|url=https://newsroom.netapp.com/blogs/5-big-daas-challenges-and-how-to-overcome-them/|title=5 Big DaaS Challenges and How to Overcome Them {{!}} NetApp Newsroom|date=2016-06-20|work=NetApp Newsroom|access-date=2017-05-24|language=en-US}}{{Dead link|date=October 2019 |bot=InternetArchiveBot |fix-attempted=yes }} Meta,{{Cite news|url=https://code.fb.com/data-infrastructure/location-aware-distribution-configuring-servers-at-scale/|title=Location-Aware Distribution: Configuring servers at scale|date=2018-07-19|work=Facebook Code|access-date=2018-07-20|language=en-US}} Twitter,{{Cite news|url=https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html|title=ZooKeeper at Twitter|date=2018-10-11|work=Twitter Engineering Blog|access-date=2018-12-08|language=en-US}} and eBay, as well as by open-source enterprise search systems such as Solr and distributed database systems such as Apache Pinot.{{cite web |url=https://cwiki.apache.org/confluence/display/solr/SolrCloud|title=SolrCloud}}{{cite web|url=https://docs.pinot.apache.org/basics/architecture|title=Apache Pinot: Architecture}}

ZooKeeper is modeled after Google's Chubby lock service{{Cite journal|last=Burrows|first=Mike|date=2006|title=The Chubby lock service for loosely-coupled distributed systems|url=https://research.google/pubs/pub27897/|journal=7th USENIX Symposium on Operating Systems Design and Implementation (OSDI)}}{{Cite web|url=https://research.google/pubs/pub33002/|title=Paxos Made Live - An Engineering Perspective (2006 Invited Talk)|last1=Chandra|first1=Tushar Deepak|last2=Griesemer|first2=Robert|date=2007|website=Google Research|language=en|access-date=2020-03-03|last3=Redstone|first3=Joshua}} and was originally developed at Yahoo! to streamline the processes running on big-data clusters by storing their status in local log files on the ZooKeeper servers. These servers communicate with the client machines to provide them with this information. ZooKeeper was developed to avoid the coordination bugs that occurred when deploying distributed big-data applications.

Key features of Apache ZooKeeper include:

  • Reliable System: the service keeps working as long as a majority of the servers in the ensemble are available, so it tolerates the failure of some nodes.
  • Simple Architecture: processes coordinate through a single shared hierarchical namespace.
  • Fast Processing: ZooKeeper is especially fast in "read-dominant" workloads, in which reads are much more common than writes, because each server answers reads from its own in-memory copy of the data.
  • Scalable: read capacity can be increased by adding servers, although every additional voting server also takes part in agreeing on writes; non-voting "observer" servers can be added to scale reads without that cost.
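The "Reliable System" property rests on quorums. With ZooKeeper's default majority-based quorums, an ensemble can keep committing updates only while a strict majority of its voting servers can communicate. The arithmetic is small enough to sketch directly; the function names below are hypothetical and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the voting servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Columns: ensemble size, quorum size, tolerated failures
for n in (1, 3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

The table shows why odd ensemble sizes are the usual recommendation: a four-server ensemble tolerates no more failures than a three-server one, yet every write must be acknowledged by a larger quorum.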

Architecture

Common terms used to describe the ZooKeeper architecture:

  • Node: a system installed on the cluster
  • ZNode: a data node in ZooKeeper's hierarchical namespace, holding status information that other nodes in the cluster update and read
  • Client applications: the programs that connect to the ZooKeeper service to read and update znodes on behalf of distributed applications
  • Server applications: the ZooKeeper servers, which let client applications interact through a common interface

The ZooKeeper service is replicated over a set of servers called an "ensemble". Each server maintains an in-memory database containing the entire data tree, together with a transaction log and snapshots stored persistently. Multiple client applications can connect to the same server; each client maintains a TCP connection through which it sends requests and heartbeats and receives responses and watch events that notify it of changes.{{cite web|url=https://zookeeper.apache.org/doc/current/zookeeperOver.html|title=Apache Zookeeper 3.9 Documentation}}
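Watch events can be pictured with a small model of standard watches, which behave as one-time triggers: a client registers interest in a path while reading it, and the next change to that path notifies it once. The sketch below is hypothetical and greatly simplified (flat paths, no sessions, no network, callbacks invoked directly); `ToyWatchRegistry` is not a ZooKeeper class, and real ZooKeeper delivers these notifications over the client's TCP connection.

```python
from collections import defaultdict
from typing import Callable, Optional


class ToyWatchRegistry:
    """Hypothetical model of one-time data watches (not ZooKeeper's API)."""

    def __init__(self):
        self.data = {}                     # path -> current value
        self.watches = defaultdict(list)   # path -> pending callbacks

    def get(self, path: str, watcher: Optional[Callable[[str], None]] = None):
        # Reading a path may also register a watch on it.
        if watcher is not None:
            self.watches[path].append(watcher)
        return self.data.get(path)

    def set(self, path: str, value: bytes) -> None:
        self.data[path] = value
        # Fire and remove: each registered watch triggers at most once.
        for watcher in self.watches.pop(path, []):
            watcher(path)


events = []
reg = ToyWatchRegistry()
reg.set("/config", b"v1")
reg.get("/config", watcher=events.append)
reg.set("/config", b"v2")  # fires the watch
reg.set("/config", b"v3")  # no watch is registered any more
print(events)  # ['/config']
```

Because a standard watch fires only once, a client that wants to keep following a node typically re-reads it and registers a new watch inside its callback.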

Use cases

Client libraries

In addition to the client libraries included with the ZooKeeper distribution, a number of third-party libraries, such as Apache Curator (for Java) and Kazoo (for Python), make ZooKeeper easier to use, add higher-level functionality, or support additional programming languages.

Apache projects using ZooKeeper

See also

{{Portal|Computer programming|Free and open-source software}}

References

{{Reflist}}