scikit-multiflow

{{short description|Machine learning library for data streams in Python}}

{{lowercase title}}

{{Infobox software

| name = scikit-multiflow

| logo = Scikit-multiflow-logo.png

| screenshot =

| caption =

| collapsible =

| author = Jacob Montiel, Jesse Read, Albert Bifet, Talel Abdessalem

| developer = The scikit-multiflow development team and the open research community

| released = {{Start date and age|2018|01|df=yes}}

| latest release version = 0.5.3

| latest release date = {{Start date and age|2020|06|17|df=yes}}{{cite web |title=scikit-multiflow Version 0.5.3 |url=https://scikit-multiflow.readthedocs.io/en/stable/whats_new.html#version-0-5-0}}{{cite web |title=scikit-multiflow 0.5.3 |url=https://pypi.org/project/scikit-multiflow/0.5.3/|website=Python Package Index}}

| latest preview version =

| latest preview date =

| repo = https://github.com/scikit-multiflow/scikit-multiflow

| programming language = Python, Cython

| operating system = Linux, macOS, Windows

| platform =

| size =

| language =

| genre = Library for machine learning

| license = BSD 3-clause license

| website = {{URL|https://scikit-multiflow.github.io/}}

}}

scikit-multiflow (also known as skmultiflow) is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.{{Cite journal|last1=Montiel|first1=Jacob|last2=Read|first2=Jesse|last3=Bifet|first3=Albert|last4=Abdessalem|first4=Talel|date=2018|title=Scikit-Multiflow: A Multi-output Streaming Framework|url=http://jmlr.org/papers/v19/18-251.html|journal=Journal of Machine Learning Research|volume=19|issue=72|pages=1–5|issn=1533-7928}}

Overview

scikit-multiflow allows users to design and run experiments and to extend existing stream learning algorithms. It features a collection of classification, regression, concept drift detection and anomaly detection algorithms. It also includes a set of data stream generators and evaluators. scikit-multiflow is designed to interoperate with Python's numerical and scientific libraries NumPy and SciPy, and is compatible with Jupyter Notebooks.

Implementation

The scikit-multiflow library is developed following open research principles and is distributed under the BSD 3-clause license. It is mainly written in Python, with some core elements written in Cython for performance. scikit-multiflow integrates with other Python libraries: Matplotlib for plotting, scikit-learn for incremental learning methods{{Cite web|url=https://scikit-learn.org/stable/modules/computing.html?highlight=incremental#incremental-learning|title=scikit-learn — Incremental learning|last=|first=|date=|website=scikit-learn.org|archive-url=|archive-date=|access-date=2020-04-08}} compatible with the stream learning setting, Pandas for data manipulation, and NumPy and SciPy for numerical computation.

Components

scikit-multiflow is composed of the following sub-packages:

  • anomaly_detection: anomaly detection methods.
  • data: data stream methods including methods for batch-to-stream conversion and generators.
  • drift_detection: methods for concept drift detection.
  • evaluation: evaluation methods for stream learning.
  • lazy: methods in which generalisation of the training data is delayed until a query is received, i.e., neighbours-based methods such as kNN.
  • meta: meta learning (also known as ensemble) methods.
  • neural_networks: methods based on neural networks.
  • prototype: prototype-based learning methods.
  • rules: rule-based learning methods.
  • transform: methods for data transformation.
  • trees: tree-based methods, e.g. Hoeffding trees which are a type of decision tree for data streams.
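
As an illustration of how the data, lazy and evaluation components fit together in the stream learning setting, the following is a minimal sketch in plain Python. It does not use the scikit-multiflow API: the names synthetic_stream, SlidingWindowKNN and prequential_accuracy are hypothetical and introduced only for this example. It assumes a test-then-train (prequential) evaluation scheme, in which each sample is first used to test the model and then to update it, and a neighbours-based learner that keeps only a sliding window of recent samples.

```python
import random
from collections import Counter, deque


def synthetic_stream(n_samples, seed=0):
    """Yield (x, y) pairs one at a time (hypothetical stand-in for a stream generator)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = (rng.random(), rng.random())
        y = 1 if x[0] + x[1] > 1.0 else 0  # simple, fixed concept
        yield x, y


class SlidingWindowKNN:
    """Toy lazy learner (hypothetical): stores recent samples, votes at query time."""

    def __init__(self, k=5, window_size=200):
        self.k = k
        self.window = deque(maxlen=window_size)  # oldest samples are dropped

    def partial_fit(self, x, y):
        # "Training" only stores the sample; generalisation is delayed.
        self.window.append((x, y))
        return self

    def predict(self, x):
        if not self.window:
            return 0  # arbitrary default before any sample has been seen

        def squared_distance(item):
            return sum((a - b) ** 2 for a, b in zip(item[0], x))

        nearest = sorted(self.window, key=squared_distance)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def prequential_accuracy(stream, model):
    """Test-then-train: predict on each sample first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.partial_fit(x, y)
    return correct / total if total else 0.0


if __name__ == "__main__":
    accuracy = prequential_accuracy(synthetic_stream(1000), SlidingWindowKNN())
    print(f"Prequential accuracy: {accuracy:.3f}")
```

Because every sample is seen exactly once and the window is bounded, memory use stays constant regardless of stream length, which is the main constraint that distinguishes stream learning from batch learning.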

History

scikit-multiflow started as a collaboration between researchers at Télécom Paris (Institut Polytechnique de Paris{{Cite web|url=https://www.ip-paris.fr/en/home-en/|title=Institut Polytechnique de Paris|last=|first=|date=|language=en-GB|archive-url=|archive-date=|access-date=2020-04-08}}) and École Polytechnique. Development is currently carried out by the University of Waikato, Télécom Paris, École Polytechnique and the open research community.

See also

{{Portal|Free and open-source software}}

  • Massive Online Analysis (MOA){{Cite journal|last1=Bifet|first1=Albert|last2=Holmes|first2=Geoff|last3=Kirkby|first3=Richard|last4=Pfahringer|first4=Bernhard|date=2010|title=MOA: Massive Online Analysis|url=http://jmlr.org/papers/v11/bifet10a.html|journal=Journal of Machine Learning Research|volume=11|issue=52|pages=1601–1604|issn=1533-7928}}
  • MEKA{{Cite journal|last1=Read|first1=Jesse|last2=Reutemann|first2=Peter|last3=Pfahringer|first3=Bernhard|last4=Holmes|first4=Geoff|date=2016|title=MEKA: A Multi-label/Multi-target Extension to WEKA|url=http://jmlr.org/papers/v17/12-164.html|journal=Journal of Machine Learning Research|volume=17|issue=21|pages=1–5|issn=1533-7928}}

References

{{Reflist|30em}}