DataOps

{{Short description|Aspect of data analytics}}

DataOps is a set of practices, processes and technologies that combines an integrated and process-oriented perspective on data with automation and methods from agile software engineering to improve quality, speed, and collaboration and promote a culture of continuous improvement in the area of data analytics.{{cite journal |last1=Ereth |first1=Julian |title=DataOps-Towards a Definition. |journal=Proceedings of LWDA 2018 |date=2018 |page=109 |url=http://ceur-ws.org/Vol-2191/paper13.pdf}} While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics.{{Cite web|url=https://www.datasciencecentral.com/profiles/blogs/dataops-it-s-a-secret|title=DataOps – It’s a Secret|website=www.datasciencecentral.com|language=en|access-date=2017-04-05}} DataOps applies to the entire data lifecycle{{Cite news|url=http://searchdatamanagement.techtarget.com/definition/DataOps|title=What is DataOps (data operations)? - Definition from WhatIs.com|work=SearchDataManagement|access-date=2017-04-05|language=en-US}} from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.

DataOps incorporates the Agile methodology to shorten the cycle time of analytics development in alignment with business goals.

DevOps focuses on continuous delivery by leveraging on-demand IT resources and by automating the testing and deployment of software. This merging of software development and IT operations has improved the velocity, quality, predictability and scale of software engineering and deployment. Borrowing methods from DevOps, DataOps seeks to bring these same improvements to data analytics.

DataOps uses statistical process control (SPC) to monitor and control the data analytics pipeline. With SPC in place, the data flowing through an operational system is continuously monitored to verify that the pipeline is working as expected. If an anomaly occurs, the data analytics team can be notified through an automated alert.{{Cite web|url=https://medium.com/data-ops/lean-manufacturing-secrets-that-you-can-apply-to-data-analytics-31d1a319cbf0|title=Lean Manufacturing Secrets that You Can Apply to Data Analytics|last=DataKitchen|date=2017-03-07|website=Medium|access-date=2017-08-24}}
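As a minimal sketch of such a check, the following Python example derives control limits from baseline measurements and flags later values that fall outside them. The three-sigma limits, the choice of row counts as the monitored metric, the sample numbers, and all function and variable names are illustrative assumptions, not part of any particular DataOps tool.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from baseline measurements.

    Assumption: limits are the baseline mean plus or minus `sigmas`
    sample standard deviations, as in a basic Shewhart-style chart.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(values, limits):
    """Return (index, value) pairs for values outside the control limits."""
    lower, upper = limits
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]


# Hypothetical metric: number of rows loaded by each past pipeline run.
baseline_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
limits = control_limits(baseline_row_counts)

# Hypothetical new runs; the third one is far outside the usual range.
new_row_counts = [1001, 997, 1250, 1003]
alerts = out_of_control(new_row_counts, limits)

for index, value in alerts:
    # A real pipeline might send an email or chat message here instead.
    print(f"ALERT: run {index} row count {value} is outside control limits")
```

In practice the same pattern could be applied to any measurable property of the pipeline, such as null rates or processing times, with the alert routed to the data analytics team.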

DataOps is not tied to a particular technology, architecture, tool, language or framework. Tools that support DataOps promote collaboration, orchestration, quality, security, access and ease of use.{{Cite web|url=https://www.nexla.com/define-dataops/|title=What is DataOps? {{!}} Nexla: Scalable Data Operations Platform for the Machine Learning Age|website=www.nexla.com|language=en-US|access-date=2017-09-07}}

History

The term DataOps was first introduced by Lenny Liebmann, contributing editor at InformationWeek, in a blog post on the IBM Big Data & Analytics Hub titled "3 reasons why DataOps is essential for big data success" on June 19, 2014.{{Cite news|url=https://web.archive.org/web/20180810105123/http://www.ibmbigdatahub.com/blog/3-reasons-why-dataops-essential-big-data-success|title=3 reasons why DataOps is essential for big data success|work=IBM Big Data & Analytics Hub|access-date=2018-08-10|language=en}} The term was later popularized by Andy Palmer of Tamr and Steph Locke.{{Citation|title=Mango Solutions: #DataOps - it's a thing (honest)|url=https://www.youtube.com/watch?v=64PIa9gcuh0|language=en|access-date=2021-06-28}}{{Cite news|url=https://www.tamr.com/from-devops-to-dataops-by-andy-palmer/|title=From DevOps to DataOps, By Andy Palmer - Tamr Inc.|date=2015-05-07|work=Tamr Inc.|access-date=2017-03-21|language=en-US|archive-date=2018-07-12|archive-url=https://web.archive.org/web/20180712103947/https://www.tamr.com/from-devops-to-dataops-by-andy-palmer/|url-status=dead}} The name is a contraction of "data operations". The year 2017 was significant for DataOps, with ecosystem development, analyst coverage, increased keyword searches, surveys, publications, and open source projects.{{Cite web|url=https://medium.com/data-ops/2017-the-year-of-dataops-b2023c17d2af|title=2017: The Year of DataOps|last=DataKitchen|date=2017-12-19|website=data-ops|access-date=2018-01-24}} Gartner included DataOps in its Hype Cycle for Data Management in 2018.{{Cite web|url=https://www.gartner.com/en/newsroom/press-releases/2018-09-11-gartner-hype-cycle-for-data-management-positions-three-technologies-in-the-innovation-trigger-phase-in-2018|title=Gartner Hype Cycle for Data Management Positions Three Technologies in the Innovation Trigger Phase in 2018|website=Gartner|language=en|access-date=2019-07-19}}

Goals and philosophy

According to IDC, the volume of data is forecast to grow at a compound annual growth rate (CAGR) of 32% to 180 zettabytes by 2025. DataOps seeks to provide the tools, processes, and organizational structures to cope with this significant increase in data. Automation streamlines data preboarding, ingestion, and the management of large integrated databases, freeing the data team to develop new analytics more efficiently and effectively.{{Cite news|url=http://www.ciodive.com/news/5-trends-driving-big-data-in-2017/445239/|title=5 trends driving Big Data in 2017|work=CIO Dive|access-date=2017-09-07|language=en-US}} DataOps seeks to increase the velocity, reliability, and quality of data analytics.{{Cite news|url=http://www.dbta.com/Editorial/News-Flashes/Unravel-Data-Advances-Application-Performance-Management-for-Big-Data-116893.aspx|title=Unravel Data Advances Application Performance Management for Big Data|date=2017-03-10|work=Database Trends and Applications|access-date=2017-09-07|language=en-US}} It emphasizes communication, collaboration, integration, automation, measurement and cooperation between data scientists, analysts, data/ETL (extract, transform, load) engineers, information technology (IT), and quality assurance/governance.

Implementation

Toph Whitmore at Blue Hill Research offers these DataOps leadership principles for the information technology department:

  • “Establish progress and performance measurements at every stage of the data flow. Where possible, benchmark data-flow cycle times.
  • Define rules for an abstracted semantic layer. Ensure everyone is “speaking the same language” and agrees upon what the data (and metadata) is and is not.
  • Validate with the “eyeball test”: Include continuous-improvement-oriented human feedback loops. Consumers must be able to trust the data, and that can only come with incremental validation.
  • Automate as many stages of the data flow as possible including BI, data science, and analytics.
  • Using benchmarked performance information, identify bottlenecks and then optimize for them. This may require investment in commodity hardware, or automation of a formerly-human-delivered data-science step in the process.
  • Establish governance discipline, with a particular focus on two-way data control, data ownership, transparency, and comprehensive data lineage tracking through the entire workflow.
  • Design process for growth and extensibility. The data flow model must be designed to accommodate volume and variety of data. Ensure enabling technologies are priced affordably to scale with that enterprise data growth.”
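
The first and fifth principles above (measuring each stage of the data flow, then using the measurements to find bottlenecks) can be illustrated with a short Python sketch. The stage names, the `timed_stage` and `bottleneck` helpers, and the simulated workload are hypothetical and chosen only for illustration; a production pipeline would more likely record such timings in its orchestration or monitoring tool.

```python
import time
from contextlib import contextmanager

# Maps stage name -> elapsed seconds for the most recent run.
stage_seconds = {}


@contextmanager
def timed_stage(name):
    """Record the wall-clock duration of one data-flow stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[name] = time.perf_counter() - start


def bottleneck(timings):
    """Return the name of the stage with the longest recorded duration."""
    return max(timings, key=timings.get)


# Hypothetical three-stage flow: ingest, transform, report.
with timed_stage("ingest"):
    rows = [{"id": i, "value": i * 2} for i in range(1000)]

with timed_stage("transform"):
    time.sleep(0.05)  # stands in for an expensive transformation step
    total = sum(row["value"] for row in rows)

with timed_stage("report"):
    summary = f"{len(rows)} rows, total {total}"

slowest = bottleneck(stage_seconds)
print(f"Slowest stage: {slowest} ({stage_seconds[slowest]:.3f} s)")
```

Collecting the same timings on every run gives the benchmark of data-flow cycle times that the first principle calls for, and the slowest stage is the natural candidate for optimization or automation.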

Events

  • DataOpticon{{Cite web|title=DataOpticon - YouTube|url=https://www.youtube.com/channel/UCAnJYf1L59gVp2SAeHMce1A/featured|access-date=2021-06-28|website=www.youtube.com}}
  • DataOps Summit{{Cite web|title=DataOps Summit|url=https://www.dataopssummit-sf.com/about/|access-date=2021-06-28|website=www.dataopssummit-sf.com|archive-date=2021-07-02|archive-url=https://web.archive.org/web/20210702063917/https://www.dataopssummit-sf.com/about/|url-status=dead}}
  • DataOps Champions Online{{Cite web|last=Intelligence|first=Corinium Global|title=DataOps Champions Online 2021 {{!}} Corinium|url=https://dco-dataops.coriniumintelligence.com/|access-date=2021-06-28|website=dco-dataops.coriniumintelligence.com|language=en}}

References