Apache Mahout
{{Short description|Open-source machine learning algorithms}}
{{Multiple issues|
{{COI|date=February 2021}}
{{Primary sources|date=February 2021}}
}}
{{Infobox software
| name = Apache Mahout
| logo = Apache Mahout logo.svg
| screenshot =
| caption =
| collapsible = yes
| developer = Apache Software Foundation
| latest release version = 14.1
| latest release date = {{Start date and age|2020|10|07|df=yes}}{{Cite web|url=https://mahout.apache.org/|title=Apache Mahout: Scalable machine learning and data mining|access-date=6 March 2019}}
| latest preview version =
| latest preview date =
| operating system = Cross-platform
| size =
| repo = {{URL|https://gitbox.apache.org/repos/asf/mahout.git|Mahout Repository}}
| programming language = Java, Scala
| genre = Machine Learning
| license = Apache License 2.0
| website = {{URL|//mahout.apache.org}}
| released = {{Start date and age|2009|04|7|df=yes}}{{Cite web|url=http://mail-archives.apache.org/mod_mbox/www-announce/200904.mbox/%3C7EDF8CB8-388C-4A44-974E-6977E7AEB396@apache.org%3E|title=Apache Mahout: First release 0.1 released}}
| discontinued = No
}}
{{Portal|Free and open-source software}}
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations used the Apache Hadoop platform; today, however, the project focuses primarily on Apache Spark.{{cite web |url= http://www.ibm.com/developerworks/java/library/j-mahout/ |title=Introducing Apache Mahout |work=ibm.com |year=2011 |access-date=13 September 2011}}{{cite web |url= http://www.infoq.com/news/2009/04/mahout |title=InfoQ: Apache Mahout: Highly Scalable Machine Learning Algorithms |work=infoq.com |year=2011 |access-date=13 September 2011}} Mahout also provides Java/Scala libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.{{cite web|url=http://mahout.apache.org/users/basics/algorithms.html|title=Algorithms - Apache Mahout - Apache Software Foundation|work=cwiki.apache.org|year=2011|access-date=13 September 2011|archive-date=22 December 2013|archive-url=https://web.archive.org/web/20131222013730/http://mahout.apache.org/users/basics/algorithms.html|url-status=dead}}
Features
= Samsara =
Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that lets users write in an R-like syntax rather than traditional Scala-like syntax. This allows users to express algorithms concisely and clearly.
val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
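How such R-like operators can be expressed in plain Scala is sketched below. This is an illustrative mini-DSL, not Mahout's implementation: the `Mat` class, its methods, and the sample values are all assumptions made for the example, and only the first three terms of the expression above are reproduced.

```scala
// Hypothetical mini-DSL, not Mahout's implementation: a small dense matrix
// whose R-like operators are ordinary Scala methods.
final case class Mat(rows: Vector[Vector[Double]]) {
  require(rows.nonEmpty && rows.forall(_.length == rows.head.length), "ragged matrix")

  def nrow: Int = rows.length
  def ncol: Int = rows.head.length

  // Transpose, written `a.t`.
  def t: Mat = Mat(rows.transpose)

  // Matrix product, written `a %*% b` as in R.
  def %*%(that: Mat): Mat = {
    require(ncol == that.nrow, "dimension mismatch")
    val cols = that.rows.transpose
    Mat(rows.map(r => cols.map(c => r.zip(c).map { case (x, y) => x * y }.sum)))
  }

  // Element-wise subtraction and addition.
  def -(that: Mat): Mat = zipWith(that)(_ - _)
  def +(that: Mat): Mat = zipWith(that)(_ + _)

  private def zipWith(that: Mat)(f: (Double, Double) => Double): Mat = {
    require(nrow == that.nrow && ncol == that.ncol, "dimension mismatch")
    Mat(rows.zip(that.rows).map { case (r, s) => r.zip(s).map { case (x, y) => f(x, y) } })
  }
}

object SamsaraStyleDemo {
  // Sample values chosen only for illustration.
  val B: Mat = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
  val C: Mat = Mat(Vector(Vector(0.0, 1.0), Vector(0.0, 0.0)))

  // Parsed as ((B %*% B.t) - C) - C.t, because Scala gives operators that
  // start with '%' the same precedence as '*' and '/'.
  val G: Mat = B %*% B.t - C - C.t
}
```

Because Scala derives operator precedence from an operator's first character, `%*%` binds more tightly than `-` and `+`, which is what lets the expression read like its mathematical counterpart without extra parentheses.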
= Backend agnostic =
Apache Mahout's code abstracts the domain-specific language from the engine on which the code runs. While active development is done with the Apache Spark engine, users are free to implement any engine they choose; {{Proper name|H2O}} and Apache Flink have been implemented in the past, and examples exist in the code base.
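The separation between a logical expression and the engine that evaluates it can be sketched as follows. None of the names below (`Expr`, `Engine`, `LocalEngine`, `ChunkedEngine`) come from Mahout; they are hypothetical and only illustrate the general pattern of running one program on interchangeable backends.

```scala
// Hypothetical sketch: none of these names are Mahout's API.
// A logical expression tree describes WHAT to compute ...
sealed trait Expr
final case class Lit(values: Vector[Double]) extends Expr
final case class Plus(a: Expr, b: Expr) extends Expr
final case class Scale(a: Expr, k: Double) extends Expr

// ... and an engine decides HOW to compute it.
trait Engine {
  def name: String
  def eval(e: Expr): Vector[Double]
}

// Engine 1: straightforward recursive evaluation of whole vectors.
object LocalEngine extends Engine {
  val name = "local"
  def eval(e: Expr): Vector[Double] = e match {
    case Lit(v)      => v
    case Plus(a, b)  => eval(a).zip(eval(b)).map { case (x, y) => x + y }
    case Scale(a, k) => eval(a).map(_ * k)
  }
}

// Engine 2: evaluates the result in index chunks of two elements, standing in
// for a backend that partitions work. Assumes both operands of Plus have the
// same length.
object ChunkedEngine extends Engine {
  val name = "chunked"

  private def size(e: Expr): Int = e match {
    case Lit(v)      => v.length
    case Plus(a, _)  => size(a)
    case Scale(a, _) => size(a)
  }

  private def at(e: Expr, i: Int): Double = e match {
    case Lit(v)      => v(i)
    case Plus(a, b)  => at(a, i) + at(b, i)
    case Scale(a, k) => at(a, i) * k
  }

  def eval(e: Expr): Vector[Double] =
    (0 until size(e)).grouped(2).flatMap(chunk => chunk.map(i => at(e, i))).toVector
}

object BackendDemo {
  // The program is written once, independently of any engine.
  val program: Expr =
    Scale(Plus(Lit(Vector(1.0, 2.0, 3.0)), Lit(Vector(10.0, 20.0, 30.0))), 2.0)

  def run(engine: Engine): Vector[Double] = engine.eval(program)
}
```

Calling `BackendDemo.run(LocalEngine)` and `BackendDemo.run(ChunkedEngine)` evaluates the same expression with two different strategies and yields the same vector.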
= GPU/CPU accelerators =
Numerical computation on the JVM is notoriously slow. To improve speed, "native solvers" were added, which move in-core and, by extension, distributed BLAS operations out of the JVM, offloading them to off-heap or GPU memory for processing on multiple CPUs or CPU cores, or on GPUs when built against the [https://viennacl.sourceforge.net ViennaCL] library.{{cite web|url=https://on-demand.gputechconf.com/gtc/2017/video/s7572-extending-mahout-samsara-linear-algebra-dsl-to-support-gpu-clusters.mp4|title=Extending Mahout Samsara to GPU Clusters|access-date=29 October 2020|archive-date=3 November 2020|archive-url=https://web.archive.org/web/20201103182841/https://on-demand.gputechconf.com/gtc/2017/video/s7572-extending-mahout-samsara-linear-algebra-dsl-to-support-gpu-clusters.mp4|url-status=dead}} ViennaCL is a highly optimized C++ library with BLAS operations implemented in OpenMP and OpenCL. As of release 14.1, the OpenMP build is considered stable, while the OpenCL build is still in an experimental proof-of-concept (POC) phase.
= Recommenders =
Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated Co-Occurrence, a unique-to-Mahout recommender algorithm that extends co-occurrence to be used on multiple dimensions of data.
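The basic co-occurrence idea can be illustrated with a minimal sketch in plain Scala. It counts raw item-item co-occurrences (the off-diagonal entries of the product of the transposed interaction matrix with itself) and scores items for one user. The data, the object name, and the function names are assumptions made for the example; the sketch applies no statistical weighting or filtering and does not use Mahout's API.

```scala
object CooccurrenceSketch {
  // Rows are users, columns are items; 1 means the user interacted with the item.
  // Sample data chosen only for illustration.
  val interactions: Vector[Vector[Int]] = Vector(
    Vector(1, 1, 0),
    Vector(1, 1, 1),
    Vector(0, 1, 1)
  )

  // Item-item co-occurrence counts: entry (i, j) is the number of users who
  // interacted with both item i and item j. The diagonal is set to zero.
  def cooccurrence(a: Vector[Vector[Int]]): Vector[Vector[Int]] = {
    val items = a.transpose
    items.indices.toVector.map { i =>
      items.indices.toVector.map { j =>
        if (i == j) 0
        else items(i).zip(items(j)).map { case (x, y) => x * y }.sum
      }
    }
  }

  // Score each item for one user by summing its co-occurrence counts with the
  // items in that user's history.
  def scores(c: Vector[Vector[Int]], history: Vector[Int]): Vector[Int] =
    c.map(row => row.zip(history).map { case (x, y) => x * y }.sum)
}
```

For the first user, whose history is items 0 and 1, `scores(cooccurrence(interactions), interactions(0))` gives the highest score to item 2, the only item that user has not yet interacted with.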
History
= Transition from MapReduce to Apache Spark =
While Mahout's core algorithms for clustering, classification and batch-based collaborative filtering were implemented on top of Apache Hadoop using the MapReduce paradigm, the project did not restrict contributions to Hadoop-based implementations. Contributions that ran on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop.
Starting with release 0.10.0, the project shifted its focus to building a backend-independent programming environment, code-named "Samsara".{{Cite web
| url = http://mahout.apache.org/users/environment/in-core-reference.html
| title = Mahout-Samsara's In-Core Linear Algebra DSL Reference
| access-date = 29 February 2016
| archive-date = 2 August 2016
| archive-url = https://web.archive.org/web/20160802233841/https://mahout.apache.org/users/environment/in-core-reference.html
| url-status = dead
}}{{Cite web
| url = http://mahout.apache.org/users/environment/out-of-core-reference.html
| title = Mahout-Samsara's Distributed Linear Algebra DSL Reference
| access-date = 29 February 2016
| archive-date = 2 August 2016
| archive-url = https://web.archive.org/web/20160802233829/https://mahout.apache.org/users/environment/out-of-core-reference.html
| url-status = dead
}}{{Cite web
| url = http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html
| title = Mahout 0.10.x: first Mahout release as a programming environment
| website = www.weatheringthroughtechdays.com
| access-date = 2016-02-29
| archive-url = https://web.archive.org/web/20161009224405/http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html
| archive-date = 9 October 2016
| url-status = dead
}} The environment consists of an algebraic backend-independent optimizer and an algebraic Scala DSL unifying in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, {{Proper name|H2O}}, and Apache Flink.{{citation needed|date=August 2019}} Support for MapReduce algorithms started being gradually phased out in 2014.{{Cite web
| url = https://issues.apache.org/jira/browse/MAHOUT-1510
| title = MAHOUT-1510 ("Good-bye MapReduce")
}}
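What an algebraic, backend-independent optimizer does can be sketched in general terms as a rewrite over a logical plan, performed before any engine runs it. The node types and the two rewrite rules below are hypothetical and chosen for illustration; they are not taken from Mahout's code base.

```scala
// Hypothetical logical plan nodes; the names are illustrative, not Mahout's.
sealed trait Plan
final case class Sym(name: String) extends Plan        // a named matrix
final case class Transpose(a: Plan) extends Plan       // a.t
final case class Times(a: Plan, b: Plan) extends Plan  // a %*% b
final case class Gram(a: Plan) extends Plan            // fused a.t %*% a

object PlanRewriter {
  // Bottom-up rewrite using two algebraic identities.
  def optimize(p: Plan): Plan = p match {
    case Transpose(a) =>
      optimize(a) match {
        case Transpose(inner) => inner            // (a.t).t == a
        case other            => Transpose(other)
      }
    case Times(a, b) =>
      (optimize(a), optimize(b)) match {
        case (Transpose(x), y) if x == y => Gram(x) // a.t %*% a becomes one operator
        case (x, y)                      => Times(x, y)
      }
    case Gram(a) => Gram(optimize(a))
    case s: Sym  => s
  }
}
```

Because the rewrite works only on the plan, any engine that can execute the resulting nodes benefits from it without engine-specific changes.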
= Release history =
{| class="wikitable"
|+ Release history
! Version !! Release date !! Notes
|-
| 0.1 || 2009-04-07 ||
|-
| 0.2 || 2009-11-18 ||
|-
| 0.3 || 2010-03-17 ||
|-
| 0.4 || 2010-10-31 ||
|-
| 0.5 || 2011-05-27 ||
|-
| 0.6 || 2012-02-06 ||
|-
| 0.7 || 2012-05-16 ||
|-
| 0.8 || 2013-07-25 ||
|-
| 0.9 || 2014-02-01 ||
|-
| 0.10.0 || 2015-04-11 || Samsara DSL
|-
| 0.10.1 || 2015-05-31 ||
|-
| 0.10.2 || 2015-08-06 ||
|-
| 0.11.0 || 2015-08-07 ||
|-
| 0.11.1 || 2015-11-06 ||
|-
| 0.11.2 || 2016-03-11 ||
|-
| 0.12.0 || 2016-04-11 || Added Apache Flink engine
|-
| 0.12.1 || 2016-05-19 ||
|-
| 0.12.2 || 2016-06-13 ||
|-
| 0.13.0 || 2017-04-17 ||
|-
| 0.14.0 || 2019-03-07 || Source only (no binaries)
|-
| 14.1 || 2020-10-07 ||
|}
= Developers =
Apache Mahout is developed by a community. The project is managed by a group called the "Project Management Committee" (PMC). The current PMC consists of Andrew Musselman, Andrew Palumbo, Drew Farris, Isabel Drost-Fromm, Jake Mannix, Pat Ferrel, Paritosh Ranjan, Trevor Grant, Robin Anil, Sebastian Schelter, and Stevo Slavić.{{Cite web|url=https://projects.apache.org/committee.html?mahout|title = Apache Committee Information}}
References
{{Reflist}}
External links
- {{Official website}}
{{Apache Software Foundation}}
{{Use dmy dates|date=June 2019}}
{{DEFAULTSORT:Mahout}}
Category:Apache Software Foundation projects