MLOps

{{Short description|Approach to machine learning lifecycle management}}

File:ML Ops Venn Diagram.svg

MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. It bridges the gap between machine learning development and production operations, with the aim of keeping models robust, scalable, and aligned with business goals. The word is a compound of "machine learning" and "DevOps", the software practice built around continuous integration and continuous delivery (CI/CD). Machine learning models are developed and tested in isolated experimental systems. When an algorithm is ready to be launched, data scientists, DevOps engineers, and machine learning engineers practice MLOps to transition it to production systems.{{cite web |last1=Talagala |first1=Nisha |title=Why MLOps (and not just ML) is your Business' New Competitive Frontier |url=https://aitrends.com/machine-learning/mlops-not-just-ml-business-new-competitive-frontier/ |website=AITrends |accessdate=30 January 2018}} Similar to DevOps or DataOps approaches, MLOps seeks to increase automation and improve the quality of production models, while also focusing on business and regulatory requirements. While MLOps started as a set of best practices, it is slowly evolving into an independent approach to ML lifecycle management. MLOps applies to the entire lifecycle, from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.

Definition

MLOps is a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products. Most of all, it is an engineering practice that leverages three contributing disciplines: machine learning, software engineering (especially DevOps), and data engineering. MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops). Essentially, MLOps aims to facilitate the creation of machine learning products by leveraging these principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model, and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops.{{Cite journal |last1=Kreuzberger |first1=Dominik |last2=Kühl |first2=Niklas |last3=Hirschl |first3=Sebastian |date=2023 |title=Machine Learning Operations (MLOps): Overview, Definition, and Architecture |url=https://ieeexplore.ieee.org/document/10081336 |journal=IEEE Access |volume=11 |pages=31866–31879 |doi=10.1109/ACCESS.2023.3262138 |issn=2169-3536|arxiv=2205.02302 |bibcode=2023IEEEA..1131866K |s2cid=248524628 }}
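
Two of the principles listed above, versioning of data, model, and code, and ML metadata tracking, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the interface of any particular MLOps tool: the names `RunRecord`, `record_run`, and `sha256_hex` are hypothetical, and content hashing is only one possible way to version inputs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical metadata record tying one training run to its inputs."""

    data_hash: str   # version of the training data (content hash)
    code_hash: str   # version of the training code (content hash)
    params: dict     # hyperparameters used for the run
    metrics: dict    # evaluation results logged for the run

    def run_id(self) -> str:
        # Metrics are excluded on purpose: the id names the inputs of the
        # run, so identical data, code, and parameters give the same id.
        inputs = {
            "data": self.data_hash,
            "code": self.code_hash,
            "params": self.params,
        }
        encoded = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return sha256_hex(encoded)[:12]

    def to_json(self) -> str:
        # Serialized form that could be appended to a metadata log.
        return json.dumps({"run_id": self.run_id(), **asdict(self)}, sort_keys=True)


def record_run(data: bytes, code: bytes, params: dict, metrics: dict) -> RunRecord:
    """Build a record from raw data and code bytes plus parameters and metrics."""
    return RunRecord(sha256_hex(data), sha256_hex(code), dict(params), dict(metrics))
```

Under these assumptions, two runs share a `run_id` only when their data, code, and parameters are byte-for-byte identical, which makes it possible to check whether a reported result came from the same inputs.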

History

The challenges of the ongoing use of machine learning in applications were highlighted in a 2015 paper.{{cite journal |last1=Sculley |first1=D. |last2=Holt |first2=Gary |last3=Golovin |first3=Daniel |last4=Davydov |first4=Eugene |last5=Phillips |first5=Todd |last6=Ebner |first6=Dietmar |last7=Chaudhary |first7=Vinay |last8=Young |first8=Michael |last9=Crespo |first9=Jean-Francois |last10=Dennison |first10=Dan |title=Hidden Technical Debt in Machine Learning Systems |journal=NIPS Proceedings |date=7 December 2015 |issue=2015 |url=https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf |accessdate=14 November 2017}} The predicted growth in machine learning included an estimated doubling of ML pilots and implementations from 2017 to 2018, and again from 2018 to 2020.{{cite web |last1=Sallomi |first1=Paul |last2=Lee |first2=Paul |title=Deloitte Technology, Media and Telecommunications Predictions 2018 |url=https://www2.deloitte.com/content/dam/Deloitte/global/Images/infographics/technologymediatelecommunications/gx-deloitte-tmt-2018-predictions-full-report.pdf |website=Deloitte |accessdate=13 October 2017}} MLOps rapidly began to gain traction among AI/ML experts, companies, and technology journalists as a solution that can address the complexity and growth of machine learning in businesses. https://www.meetup.com/MLOps-Silicon-Valley/?_cookie-check=o1SkbKRfUlSuQoT3

Reports show that a majority (up to 88%) of corporate machine learning initiatives struggle to move beyond test stages. However, organizations that did put machine learning into production saw profit margin increases of 3–15%.{{cite web |last1=Bughin |first1=Jacques |last2=Hazan |first2=Eric |last3=Ramaswamy |first3=Sree |last4=Chui |first4=Michael |last5=Allas |first5=Tera |last6=Dahlström |first6=Peter |last7=Henke |first7=Nicolaus |last8=Trench |first8=Monica |title=Artificial Intelligence The Next Digital Frontier? |url=https://www.mckinsey.com/~/media/McKinsey/Industries/Advanced%20Electronics/Our%20Insights/How%20artificial%20intelligence%20can%20deliver%20real%20value%20to%20companies/MGI-Artificial-Intelligence-Discussion-paper.ashx |website=McKinsey |publisher=McKinsey Global Institute |accessdate=1 June 2017}} The MLOps market was estimated at $23.2 billion in 2019 and is projected to reach $126 billion by 2025 due to rapid adoption.

Architecture

Machine learning systems can be divided into eight categories: data collection, data processing, feature engineering, data labeling, model design, model training and optimization, endpoint deployment, and endpoint monitoring. Each step in the machine learning lifecycle is built in its own system, but the systems must be interconnected. These are the minimum systems that enterprises need to scale machine learning within their organization.
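
The idea of separate but interconnected systems can be sketched in Python as an ordered list of stages, each handled by its own function and linked through a shared context. This is a simplified illustration: the stage names come from the list above, while `run_lifecycle`, the handler signature, and the `completed` key are assumptions made for the example.

```python
from typing import Callable, Dict, List

# The eight categories named above, in lifecycle order.
LIFECYCLE_STAGES: List[str] = [
    "data collection",
    "data processing",
    "feature engineering",
    "data labeling",
    "model design",
    "model training and optimization",
    "endpoint deployment",
    "endpoint monitoring",
]

# Hypothetical handler type: takes the shared context, returns an updated one.
Stage = Callable[[dict], dict]


def run_lifecycle(handlers: Dict[str, Stage], context: dict) -> dict:
    """Run one handler per stage, in order, passing the context along."""
    missing = [name for name in LIFECYCLE_STAGES if name not in handlers]
    if missing:
        raise ValueError(f"no handler for stages: {missing}")

    completed: List[str] = []
    for name in LIFECYCLE_STAGES:
        # Each stage is its own system; the context is the interconnection.
        context = handlers[name](context)
        completed.append(name)
    return {**context, "completed": completed}
```

In a real deployment each handler would typically call out to a separate service; here they are plain functions so that the ordering and the hand-off between stages stay visible.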

Goals

Enterprises pursue a number of goals through MLOps systems in order to implement ML successfully across the organization, including:{{cite web |last1=Walsh |first1=Nick |title=The Rise of Quant-Oriented Devs & The Need for Standardized MLOps |url=http://slides.com/walsh/standards-in-ml-ops#/ |website=Slides |publisher=Nick Walsh |accessdate=1 January 2018}}

  • Deployment and automation{{Cite web|date=2021-02-03|title=Code to production-ready machine learning in 4 steps|url=https://dagshub.com/blog/code-to-production-ready-machine-learning-in-4-steps/|access-date=2021-02-19|website=DAGsHub Blog|language=en}}
  • Reproducibility of models and predictions{{cite web |last1=Warden |first1=Pete |title=The Machine Learning Reproducibility Crisis |url=https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/ |website=Pete Warden's Blog |publisher=Pete Warden |accessdate=19 March 2018}}
  • Diagnostics
  • Governance and regulatory compliance{{cite web |last1=Vaughan |first1=Jack |title=Machine learning algorithms meet data governance |url=https://searchdatamanagement.techtarget.com/feature/Machine-learning-algorithms-meet-data-governance |website=SearchDataManagement |publisher=TechTarget |accessdate=1 September 2017}}
  • Scalability{{cite web |last1=Lorica |first1=Ben |title=How to train and deploy deep learning at scale |url=https://www.oreilly.com/ideas/how-to-train-and-deploy-deep-learning-at-scale |website=O'Reilly |accessdate=15 March 2018}}
  • Collaboration{{cite web |last1=Garda |first1=Natalie |title=IoT and Machine Learning: Why Collaboration is Key |url=https://www.iottechexpo.com/2017/10/ai/iot-machine-learning-ml-ai-why-collaboration-key/ |website=IoT Tech Expo |publisher=Encore Media Group |accessdate=12 October 2017}}
  • Business uses{{cite web |last1=Manyika |first1=James |title=What's now and next in analytics, AI, and automation |url=https://www.mckinsey.com/featured-insights/digital-disruption/whats-now-and-next-in-analytics-ai-and-automation |website=McKinsey |publisher=McKinsey Global Institute |accessdate=1 May 2017}}
  • Monitoring and management{{cite web |last1=Haviv |first1=Yaron |title=MLOps Challenges, Solutions and Future Trends |url=https://www.iguazio.com/blog/mlops-challenges-solutions-future-trends/ |website=Iguazio |accessdate=19 February 2020}}

A standard practice, such as MLOps, takes into account each of the aforementioned areas, which can help enterprises optimize workflows and avoid issues during implementation.

A common architecture of an MLOps system would include data science platforms where models are constructed and the analytical engines where computations are performed, with the MLOps tool orchestrating the movement of machine learning models, data and outcomes between the systems.
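
The architecture described above can be sketched in Python with three roles: a data science platform that constructs a model, an analytical engine that performs the computation, and an orchestrator that moves the model, data, and outcomes between them. All class and method names below (`DataSciencePlatform`, `AnalyticalEngine`, `Orchestrator`, `ThresholdPlatform`, `LocalEngine`) are hypothetical, and the threshold "model" is a toy stand-in for a trained model.

```python
from typing import Callable, List, Protocol

Model = Callable[[float], int]


class DataSciencePlatform(Protocol):
    def build_model(self) -> Model: ...


class AnalyticalEngine(Protocol):
    def run(self, model: Model, data: List[float]) -> List[int]: ...


class ThresholdPlatform:
    """Toy platform: the 'model' is a fixed threshold classifier."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def build_model(self) -> Model:
        threshold = self.threshold
        return lambda x: int(x >= threshold)


class LocalEngine:
    """Toy engine: applies the model to each input in-process."""

    def run(self, model: Model, data: List[float]) -> List[int]:
        return [model(x) for x in data]


class Orchestrator:
    """Moves the model, the data, and the outcomes between the two systems."""

    def __init__(self, platform: DataSciencePlatform, engine: AnalyticalEngine) -> None:
        self.platform = platform
        self.engine = engine
        self.outcomes: List[List[int]] = []  # kept for later diagnostics

    def deploy_and_score(self, data: List[float]) -> List[int]:
        model = self.platform.build_model()     # model leaves the platform
        results = self.engine.run(model, data)  # engine performs the computation
        self.outcomes.append(results)           # outcomes flow back for tracking
        return results
```

For example, `Orchestrator(ThresholdPlatform(0.5), LocalEngine()).deploy_and_score([0.2, 0.8])` scores both inputs and records the outcome on the orchestrator. Keeping the platform and the engine behind small interfaces is one way to let either be replaced without changing the orchestration logic.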

See also

  • ModelOps: according to Gartner, MLOps is a subset of ModelOps. MLOps focuses on the operationalization of ML models, while ModelOps covers the operationalization of all types of AI models.
  • AIOps, a similarly named but different concept: the use of AI (ML) in IT operations.

References