Apache Storm

{{Short description|Open-source distributed stream processing}}

{{Use dmy dates|date=December 2014}}

{{Infobox software

| name = Apache Storm

| logo = Apache Storm logo.svg

| developer = Backtype, Twitter

| latest release version = 2.8.0

| latest release date = {{Start date and age|df=yes|2025|01|25}}{{cite web|url=https://storm.apache.org/2025/01/25/storm280-released.html|title=Apache Storm 2.8.0 Released|access-date=27 February 2025}}

| operating system = Cross-platform

| size =

| repo = {{URL|https://gitbox.apache.org/repos/asf?p{{=}}storm.git|Storm Repository}}

| programming language = Clojure & Java

| genre = Distributed stream processing

| license = Apache License 2.0

| website = {{URL|https://storm.apache.org/}}

| logo caption = Distributed and fault-tolerant realtime computation

}}

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz{{cite web|last=Marz|first=Nathan|title=About Nathan Marz|url=http://nathanmarz.com/about/|publisher=Nathan Marz|access-date=28 March 2013}} and team at BackType,{{cite web|title=BackType Website (defunct)|url=http://www.backtype.com/|publisher=BackType|access-date=28 March 2013}} the project was open-sourced after BackType was acquired by Twitter.{{cite web|title=A Storm is coming: more details and plans for release|url=https://blog.twitter.com/2011/storm-coming-more-details-and-plans-release|work=Engineering Blog|publisher=Twitter Inc|access-date=29 July 2015}} It uses custom-created "spouts" and "bolts" to define information sources and manipulations, allowing batch, distributed processing of streaming data. The initial release was on 17 September 2011.{{cite web|title=Storm Codebase|url=https://github.com/nathanmarz/storm/commit/9d91adbdbde22e91779b91eb40805f598da5b004|publisher=Github|access-date=8 February 2013}}

A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.{{cite web|title=Tutorial - Components of a Storm cluster|url=http://storm.apache.org/documentation/Tutorial.html|work=Documentation|publisher=Apache Storm|access-date=29 July 2015}}
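The spout-and-bolt pipeline described above can be illustrated with a minimal, library-free sketch in Python. It does not use Storm's actual API: the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical, and the input is a short finite list only so that the example terminates, whereas a real topology consumes an unbounded stream.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[tuple]:
    """Spout (hypothetical): a stream source that emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Bolt (hypothetical): consumes sentence tuples, emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream: Iterable[tuple]) -> Iterator[tuple]:
    """Bolt (hypothetical): keeps running counts, emits (word, count) per input."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences: Iterable[str]) -> dict:
    """Wire spout -> split -> count and keep the latest count seen per word."""
    stream = count_bolt(split_bolt(sentence_spout(sentences)))
    latest = {}
    for word, count in stream:
        latest[word] = count
    return latest


print(run_topology(["the cow jumped", "the moon"]))
```

Because the stages are chained generators, each tuple flows through the whole pipeline as soon as the spout emits it, rather than waiting for a complete batch of input.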

Storm became an Apache Top-Level Project in September 2014{{cite web|title=Apache Storm Graduates to a Top-Level Project|url=http://hortonworks.com/blog/apache-storm-graduates-top-level-project/}} and was previously in incubation since September 2013.{{cite web|title=Storm Project Incubation Status|url=http://incubator.apache.org/projects/storm.html|publisher=Apache Software Foundation|access-date=29 October 2013}}{{cite web|title=Storm Proposal|url=http://wiki.apache.org/incubator/StormProposal|publisher=Apache Software Foundation|access-date=29 October 2013}}

Development

Apache Storm is developed under the Apache License, making it available for most companies to use.{{cite web|title=Powered By Storm|url=http://storm.apache.org/documentation/Powered-By.html|work=Documentation|publisher=Apache Storm|access-date=29 July 2015}} Git is used for version control and Atlassian JIRA for issue tracking, under the Apache Incubator program.

{| class="wikitable" border="1"
|+ Major Releases{{Cite web|url=http://storm.apache.org/|title=Apache Storm|website=storm.apache.org|access-date=2017-08-18}}
|-
! Version !! Release Date
|-
| 2.5.0 || 4 August 2023
|-
| 2.4.0 || 25 March 2022
|-
| 2.3.0 || 27 September 2021
|-
| 2.2.0 || 30 June 2020
|-
| 2.1.0 || 6 September 2019
|-
| 1.2.3 || 18 July 2019
|-
| 2.0.0 || 30 May 2019
|-
| 1.1.4 || 8 January 2019
|-
| 1.2.2 || rowspan="2" | 4 June 2018
|-
| 1.1.3
|-
| 1.0.7 || 3 May 2018
|-
| 1.2.1 || 19 February 2018
|-
| 1.2.0 || rowspan="2" | 15 February 2018
|-
| 1.1.2
|-
| 1.0.6 || 14 February 2018
|-
| 1.0.5 || 15 September 2017
|-
| 1.1.1 || 1 August 2017
|-
| 1.0.4 || 28 July 2017
|-
| 1.1.0 || 29 March 2017
|-
| 1.0.3 || 14 February 2017
|-
| 0.10.2 || 14 September 2016
|-
| 0.9.7 || 7 September 2016
|-
| 1.0.2 || 10 August 2016
|-
| 1.0.1 || 6 May 2016
|-
| 0.10.1 || 5 May 2016
|-
| 1.0.0 || 12 April 2016
|-
| 0.10.0 || rowspan="2" | 5 November 2015
|-
| 0.9.6
|-
| 0.9.5 || 4 June 2015
|-
| 0.9.4 || 25 March 2015
|-
| 0.9.3 || 25 November 2014
|-
| 0.9.2 || 25 June 2014
|-
| 0.9.1 || 10 February 2014
|-
! Historical (non-Apache) Version !! Release Date
|-
| 0.9.0 || 8 December 2013
|-
| 0.8.2 || 11 January 2013
|-
| 0.8.1 || 6 September 2012
|-
| 0.8.0 || 2 August 2012
|-
| 0.7.0 || 28 February 2012
|-
| 0.6.0 || 15 December 2011
|-
| 0.5.0 || 19 September 2011
|}

Apache Storm architecture

An Apache Storm cluster comprises the following critical components:

  • Nodes: There are two types of nodes: Master Nodes and Worker Nodes. A Master Node runs a daemon called Nimbus, which assigns tasks to machines and monitors them for failures. A Worker Node runs a daemon called Supervisor, which listens for work assigned to its machine and starts and stops worker processes as needed. Nimbus and the Supervisors do not coordinate with each other directly; Storm relies on ZooKeeper to connect Nimbus with the Supervisors and to track the state and health of the cluster.
  • Components: Storm has three critical components: Topology, Stream, and Spout. A Topology is a network made of Streams and Spouts. A Stream is an unbounded sequence of tuples, and a Spout is the source of the data streams: it converts incoming data into streams of tuples and sends them to the bolts to be processed.{{Cite web |title=STREAM PROCESSING BIG DATA PROCESSING |url=http://webprojects.eecs.qmul.ac.uk/ag316/notesSite/BDP_slides/Week8%20%7C%20Stream%20Processing/ECS640-12-StreamProcessing.pdf}}
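The division of labour between Nimbus, the Supervisors and ZooKeeper can be sketched as a toy model in Python. This is an illustration only and not Storm code: the classes `CoordinationStore`, `Nimbus` and `Supervisor` are hypothetical stand-ins, an in-memory object takes the place of ZooKeeper, and round-robin placement is an assumption made for the sketch rather than Storm's actual scheduling policy.

```python
from collections import defaultdict


class CoordinationStore:
    """Stand-in for ZooKeeper: shared state that both daemons read and write."""

    def __init__(self):
        self.supervisors = []                 # registered worker-node ids
        self.assignments = defaultdict(list)  # node id -> assigned task names


class Nimbus:
    """Master daemon (toy): decides which node should run which task."""

    def __init__(self, store):
        self.store = store

    def assign(self, tasks):
        nodes = self.store.supervisors
        if not nodes:
            raise RuntimeError("no supervisors registered")
        # Round-robin placement is an assumption made for this sketch.
        for i, task in enumerate(tasks):
            self.store.assignments[nodes[i % len(nodes)]].append(task)


class Supervisor:
    """Worker-node daemon (toy): runs whatever the store assigns to its node."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store
        self.running = []
        store.supervisors.append(node_id)  # register with the shared store

    def sync(self):
        # Start any assigned task that is not yet running on this node.
        for task in self.store.assignments[self.node_id]:
            if task not in self.running:
                self.running.append(task)
        return self.running


store = CoordinationStore()
node_a = Supervisor("node-a", store)
node_b = Supervisor("node-b", store)
Nimbus(store).assign(["spout-1", "bolt-1", "bolt-2"])
print(node_a.sync(), node_b.sync())
```

In this model Nimbus never calls a Supervisor directly; both sides only read and write the shared store, which mirrors the role the text gives ZooKeeper.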

Peer platforms

Storm is one of dozens of stream processing engines; for a more complete list, see Stream processing. On 2 June 2015, Twitter announced Heron,{{cite web|title=Flying faster with Twitter Heron|url=https://blog.twitter.com/2015/flying-faster-with-twitter-heron|work=Engineering Blog|publisher=Twitter Inc|access-date=3 June 2015}} which is API-compatible with Storm. Other comparable streaming data engines include Spark Streaming and Flink.{{cite book |chapter= Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming |date=May 2016 |publisher=IEEE |doi=10.1109/IPDPSW.2016.138 |title=2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |pages=1789–1792 |last1=Chintapalli |first1=Sanket |last2=Dagit |first2=Derek |last3=Evans |first3=Bobby |last4=Farivar |first4=Reza |last5=Graves |first5=Thomas |last6=Holderbaugh |first6=Mark |last7=Liu |first7=Zhuo |last8=Nusbaum |first8=Kyle |last9=Patil |first9=Kishorkumar |last10=Peng |first10=Boyang Jerry |last11=Poulosky |first11=Paul |isbn=978-1-5090-3682-0 |s2cid=2180634 }}

See also

References

{{Reflist}}