Apache Samza

{{Short description|Open-source distributed stream processing}}

{{About-distinguish|Apache Samza|Samba (software)}}

{{Infobox software

| name = Apache Samza

| logo = Apache Samza logo.svg

| caption =

| author = LinkedIn

| developer = Apache Software Foundation

| latest release version = 1.8.0

| latest release date = {{Start date and age|2023|01|17|df=yes}}{{cite web|url=https://samza.apache.org/blog/2023-01-17-announcing-the-release-of-apache-samza--1.8.0|title=Announcing the release of Apache Samza 1.8.0|access-date=28 March 2024}}

| repo = {{URL|https://gitbox.apache.org/repos/asf?p%3Dsamza.git|Samza Repository}}

| programming language = Scala, Java

| operating system = Cross-platform

| genre = Distributed stream processing

| license = Apache License 2.0

| website = {{URL|https://samza.apache.org/}}

}}

Apache Samza is an open-source, near-real-time, asynchronous computational framework for stream processing, written in Scala and Java and developed by the Apache Software Foundation. It was developed in conjunction with Apache Kafka, and both were originally created at LinkedIn.{{Cite web|url=https://www.infoq.com/articles/linkedin-samza|title=How LinkedIn Uses Apache Samza|website=InfoQ|access-date=2016-09-28}}

Overview

Samza allows users to build stateful applications that process data in real time from multiple sources, including Apache Kafka.

Samza provides fault tolerance, isolation and stateful processing. Unlike batch systems such as Apache Hadoop or Apache Spark, it provides continuous computation and output, which results in sub-second{{Cite web|url=http://www.vldb.org/pvldb/vol10/p1634-noghabi.pdf|title=Samza: Stateful Scalable Stream Processing at LinkedIn}} response times.
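The idea of stateful, continuous processing can be illustrated with a minimal sketch in plain Java. It deliberately does not use Samza's API: the class name `RunningCounter`, the `process` method and the sample keys are hypothetical, and the state is held in an in-memory map with none of the fault tolerance or isolation a real framework would add. The sketch only shows that each incoming message updates per-key state and produces an updated result immediately, rather than waiting for a batch to complete.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only: plain Java, not Samza's API.
 * Keeps a running count per key (the "state") and returns
 * an updated result for every incoming message.
 */
class RunningCounter {
    // Local state: number of messages seen so far for each key.
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one message and returns the updated count for its key. */
    long process(String key) {
        // merge() inserts 1 for a new key, otherwise adds 1 to the stored count.
        return counts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        RunningCounter counter = new RunningCounter();
        // A hypothetical stream of events keyed by member name.
        for (String member : List.of("alice", "bob", "alice")) {
            // Output is produced per message, not at the end of a batch.
            System.out.println(member + " -> " + counter.process(member));
        }
    }
}
```

In a batch system the counts would typically be computed once over a complete data set; here the count for a key is available as soon as each message has been processed.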

Samza is one of several mature frameworks in the field of real-time stream processing.{{Cite web|url=https://www.linkedin.com/pulse/spark-streaming-vs-flink-storm-kafka-streams-samza-choose-prakash|title=Spark Streaming vs Flink vs Storm vs Kafka Streams vs Samza : Choose Your Stream Processing Framework|website=www.linkedin.com|language=en|access-date=2019-07-23}}{{Cite web|url=https://blog.scottlogic.com/2018/07/06/comparing-streaming-frameworks-pt1.html|title=Comparing Apache Spark, Storm, Flink and Samza stream processing engines - Part 1|website=Scott Logic|access-date=2019-07-23}}{{Cite web|url=https://www.digitalocean.com/community/tutorials/hadoop-storm-samza-spark-and-flink-big-data-frameworks-compared|title=Hadoop, Storm, Samza, Spark, and Flink: Big Data Frameworks Compared|website=DigitalOcean|access-date=2019-07-23}} It was contributed to the Apache Software Foundation in 2013.{{Cite web|url=https://blogs.apache.org/samza/|archive-url=https://web.archive.org/web/20131115063058/http://blogs.apache.org/samza/|url-status=dead|archive-date=November 15, 2013|title=Apache Samza|website=blogs.apache.org|access-date=2019-07-23}}

Samza is used by multiple companies.{{Cite web|url=https://samza.apache.org/powered-by/|title=Samza - Powered By|website=samza.apache.org|access-date=2019-07-23}} The largest installation is at LinkedIn.

References

{{Reflist}}