CatBoost
{{Short description|Open-source software library developed by Yandex}}
{{Infobox software
| name = CatBoost
| logo = CatBoostLogo.png
| screenshot =
| caption =
| author = Andrey Gulin{{Cite web|url=https://research.yandex.com/people/102811|title=Andrey Gulin - People - Research at Yandex|website=research.yandex.com}} / Yandex
| developer = Yandex and CatBoost Contributors{{Cite web|url=https://github.com/catboost/catboost|title=catboost/catboost|website=GitHub}}
| released = {{Start date and age|2017|07|18}}{{Cite web|title=Yandex open sources CatBoost, a gradient boosting machine learning library|url=https://techcrunch.com/2017/07/18/yandex-open-sources-catboost-a-gradient-boosting-machine-learning-librar/|access-date=2020-08-30|website=TechCrunch|date=18 July 2017 |language=en-US}}{{Cite web|last=Yegulalp|first=Serdar|date=2017-07-18|title=Yandex open sources CatBoost machine learning library|url=https://www.infoworld.com/article/3209124/yandex-open-sources-catboost-machine-learning-library.html|access-date=2020-08-30|website=InfoWorld|language=en}}
| latest release version = 1.2.3{{Cite web|title=Releases · catboost/catboost|url=https://github.com/catboost/catboost/releases|access-date=2024-03-14|website=GitHub|language=en}}
| latest release date = {{Start date and age|2024|02|23}}
| operating system = Linux, macOS, Windows
| programming language = Python, R, C++, Java
| genre = Machine learning
| license = Apache License 2.0
| website = {{URL|https://catboost.ai/}}
}}
CatBoost{{Cite web|url=https://github.com/catboost/catboost|title=catboost/catboost|date=August 30, 2020|via=GitHub}} is an open-source software library developed by Yandex. It provides a gradient boosting framework which, among other features, handles categorical features using a permutation-driven alternative to the classical gradient boosting algorithm.{{cite arXiv|last1=Prokhorenkova|first1=Liudmila|last2=Gusev|first2=Gleb|last3=Vorobev|first3=Aleksandr|last4=Dorogush|first4=Anna Veronika|last5=Gulin|first5=Andrey|date=2019-01-20|title=CatBoost: unbiased boosting with categorical features|class=cs.LG|eprint=1706.09516}} It runs on Linux, Windows, and macOS and is available in Python{{Cite web|url=https://pypi.python.org/pypi/catboost/|title=Python Package Index PYPI: catboost|access-date=2020-08-20}} and R.{{Cite web|url=https://anaconda.org/conda-forge/r-catboost|title=Conda forge package catboost-r|access-date=2020-08-30}} Models built using CatBoost can be used for predictions in C++, Java,{{Cite web|title=Maven Repository: ai.catboost » catboost-prediction|url=https://mvnrepository.com/artifact/ai.catboost/catboost-prediction|access-date=2020-08-30|website=mvnrepository.com}} C#, Rust, Core ML, ONNX, and PMML. The source code is licensed under the Apache License and is available on GitHub.
In 2017, InfoWorld magazine named the library one of "The best machine learning tools", along with TensorFlow, PyTorch, XGBoost and 8 other libraries.{{Cite web|url=https://www.infoworld.com/article/3228224/bossie-awards-2017-the-best-machine-learning-tools.html|title=Bossie Awards 2017: The best machine learning tools|first=InfoWorld|last=staff|date=27 September 2017|website=InfoWorld}}
Kaggle listed CatBoost as one of the most frequently used machine learning (ML) frameworks in the world. It was listed among the eight most frequently used ML frameworks in the 2020 survey{{Cite web|url=https://www.kaggle.com/kaggle-survey-2020|title=State of Data Science and Machine Learning 2020}} and among the seven most frequently used ML frameworks in the 2021 survey.{{Cite web|url=https://www.kaggle.com/kaggle-survey-2021|title=State of Data Science and Machine Learning 2021}}
As of April 2022, CatBoost was installed about 100,000 times per day from the PyPI repository.{{Cite web|title=PyPI Stats catboost|url=https://pypistats.org/packages/catboost|website=PyPI Stats|language=en-US}}
Features
CatBoost has gained popularity compared to other gradient boosting algorithms primarily due to the following features:{{Cite web|last=Joseph|first=Manu|date=2020-02-29|title=The Gradient Boosters V: CatBoost|url=https://deep-and-shallow.com/2020/02/29/the-gradient-boosters-v-catboost/|access-date=2020-08-30|website=Deep & Shallow|language=en}}
- Native handling for categorical features{{cite arXiv|last1=Dorogush|first1=Anna Veronika|last2=Ershov|first2=Vasily|last3=Gulin|first3=Andrey|date=2018-10-24|title=CatBoost: gradient boosting with categorical features support|class=cs.LG|eprint=1810.11363}}
- Fast GPU training{{Cite web|date=2018-12-13|title=CatBoost Enables Fast Gradient Boosting on Decision Trees Using GPUs|url=https://developer.nvidia.com/blog/catboost-fast-gradient-boosting-decision-trees/|access-date=2020-08-30|website=NVIDIA Developer Blog|language=en-US}}
- Visualizations and tools for model and feature analysis
- Oblivious (symmetric) trees for faster execution
- Ordered boosting to reduce overfitting
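
Two of the ideas listed above can be illustrated with a short, simplified sketch in plain Python. It is not CatBoost's implementation: the function names `ordered_target_statistics` and `oblivious_tree_predict`, the `prior_weight` parameter, and the toy data are all hypothetical. The first function assumes the permutation-based encoding of a categorical column described in the cited CatBoost papers, where each row is encoded using only the labels of rows that precede it in a random permutation. The second assumes the usual definition of an oblivious tree, in which every node on a level shares the same split, so a leaf can be found by building a bit index.

```python
import random


def ordered_target_statistics(categories, targets, prior_weight=1.0, seed=0):
    """Encode one categorical column with ordered target statistics (sketch).

    A row's encoding uses only the targets of rows that come before it in a
    random permutation, so the row's own label never leaks into its feature.
    """
    n = len(categories)
    prior = sum(targets) / n              # global target mean, used as prior
    order = list(range(n))
    random.Random(seed).shuffle(order)    # the random permutation of rows

    sums, counts = {}, {}                 # running per-category statistics
    encoded = [0.0] * n
    for i in order:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded[i] = (s + prior_weight * prior) / (k + prior_weight)
        sums[c] = s + targets[i]
        counts[c] = k + 1
    return encoded


def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate one oblivious (symmetric) tree (sketch).

    splits: one (feature_index, threshold) pair per tree level.
    leaf_values: 2 ** len(splits) values, indexed by the comparison bits.
    """
    index = 0
    for feature, threshold in splits:
        # Each level contributes one bit; no per-node branching is needed.
        index = (index << 1) | int(x[feature] > threshold)
    return leaf_values[index]


# Toy data (hypothetical): a categorical column and a binary target.
colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 0]
encoded = ordered_target_statistics(colors, clicked)

# A depth-2 tree: level 1 tests feature 0, level 2 tests feature 1.
tree_splits = [(0, 0.5), (1, 2.0)]
tree_leaves = [10, 20, 30, 40]
prediction = oblivious_tree_predict([1.0, 1.0], tree_splits, tree_leaves)  # 30
```

In this sketch the first row of the permutation has no history and therefore receives the prior alone, which is why a smoothing term is needed. The real library handles many details this sketch omits, and its exact split and leaf-indexing conventions may differ.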
History
In 2009, Andrey Gulin developed MatrixNet, a proprietary gradient boosting library that was used at Yandex to rank search results.
Since 2009, MatrixNet has been used in various projects at Yandex, including recommendation systems and weather prediction.
In 2014–2015, Andrey Gulin and a team of researchers started a new project called Tensornet, aimed at solving the problem of "how to work with categorical data". It resulted in several proprietary gradient boosting libraries with different approaches to handling categorical data.
In 2016, the Machine Learning Infrastructure team led by Anna Dorogush started working on gradient boosting at Yandex, including MatrixNet and Tensornet. They implemented and open-sourced the next version of the gradient boosting library, called CatBoost, which supports categorical and text data, GPU training, model analysis, and visualization tools.
CatBoost was open-sourced in July 2017 and is under active development by Yandex and the open-source community.
Application
- JetBrains uses CatBoost for code completion{{Cite web|date=2021-08-20|title=Code Completion, Episode 4: Model Training|url=https://blog.jetbrains.com/blog/2021/08/20/code-completion-episode-4-model-training/|website=JetBrains Developer Blog|language=en-US}}
- Cloudflare uses CatBoost for bot detection{{Cite web|date=2019-02-20|title=Stop the Bots: Practical Lessons in Machine Learning|url=https://blog.cloudflare.com/stop-the-bots-practical-lessons-in-machine-learning/|website=The Cloudflare Blog|language=en-US}}
- Careem uses CatBoost to predict the future destinations of rides{{Cite web|date=2019-02-19|title=How Careem's Destination Prediction Service speeds up your ride|url=https://blog.careem.com/en/careems-destination-prediction-service/|website=Careem|language=en-US}}
See also
{{Portal|Free and open-source software}}
References
{{Reflist}}
External links
- [https://catboost.ai/ CatBoost]
- [https://github.com/catboost/catboost GitHub - catboost/catboost]
- [https://tech.yandex.com/catboost/ CatBoost - Yandex Technology]
Category:Free and open-source software
Category:Open-source artificial intelligence
Category:Software using the Apache license
Category:Python (programming language) scientific libraries
Category:Applied machine learning
Category:Data mining and machine learning software
Category:Free data analysis software