Data product

In data management and product management, a data product is a reusable, active, and standardized data asset designed to deliver measurable value to its internal or external users by applying the principles of product thinking and product management. It comprises one or more data artifacts (e.g., datasets, models, pipelines) and is enriched with metadata, including governance policies, data quality rules, data contracts, and, where applicable, a software bill of materials (SBOM) that documents its dependencies and components. Ownership of a data product is aligned to a specific domain or use case, which ensures accountability, stewardship, and continuous evolution throughout its lifecycle. Following the FAIR principles (findable, accessible, interoperable, and reusable), a data product is designed to be discoverable and scalable and to align with both business and regulatory standards.
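
The composition described above can be illustrated with a minimal sketch of a data product descriptor. It is an illustration only and not part of any cited definition: the class names `DataArtifact` and `DataProduct`, their field names, and the validation rules are hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataArtifact:
    """One artifact inside a data product (hypothetical structure)."""
    name: str
    kind: str  # for example "dataset", "model", or "pipeline"


@dataclass
class DataProduct:
    """Illustrative descriptor: artifacts plus the metadata named in the definition."""
    name: str
    owner_domain: str                      # domain or use case that owns the product
    artifacts: list[DataArtifact]          # one or more data artifacts
    governance_policies: list[str] = field(default_factory=list)
    quality_rules: list[str] = field(default_factory=list)
    data_contracts: list[str] = field(default_factory=list)
    sbom: list[str] | None = None          # optional: an SBOM applies only in some cases

    def __post_init__(self) -> None:
        # Assumed rules for this sketch: at least one artifact and a named owner.
        if not self.artifacts:
            raise ValueError("a data product needs at least one data artifact")
        if not self.owner_domain:
            raise ValueError("a data product needs an owning domain")


# Example instance with made-up values.
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    artifacts=[DataArtifact(name="orders_daily", kind="dataset")],
    quality_rules=["order_id is unique"],
)
```

In this sketch the SBOM is optional, mirroring the "where applicable" wording, while an owner and at least one artifact are required.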

History

In 2012, DJ Patil proposed the first documented definition: a data product is a product that facilitates an end goal through the use of data.{{cite web |last1=Patil |first1=DJ |title=Data Jujitsu: The Art of Turning Data into Product |url=https://www.oreilly.com/content/data-jujitsu-the-art-of-turning-data-into-product/ |publisher=O'Reilly |access-date=30 January 2025 |date=July 16, 2012}}

In 2019, Zhamak Dehghani introduced Data Mesh, with a strong focus on domain-oriented data products.{{cite web |last1=Dehghani |first1=Zhamak |title=How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh |url=https://martinfowler.com/articles/data-monolith-to-mesh.html |website=martinfowler.com |access-date=30 January 2025 |date=2019-05-20}} In 2020, she consolidated Data Mesh around four principles, one of which is Data as a Product. Under that principle she defined a data product as the node on the mesh that encapsulates the three structural components required for its function (code, data and metadata, and infrastructure), providing access to the domain's analytical data as a product.{{cite web |last1=Dehghani |first1=Zhamak |title=Data Mesh Principles and Logical Architecture |url=https://martinfowler.com/articles/data-mesh-principles.html |website=martinfowler.com |access-date=30 January 2025 |date=2020-12-03}}

In 2024, Andrea Gioia published one of the first books devoted specifically to data products since the introduction of Data Mesh. In the book, Gioia defines the concept of a pure data product.{{cite book |last1=Gioia |first1=Andrea |title=Managing Data as a Product: Design and build data-product-centered socio-technical architectures |date=2024-11-29 |publisher=Packt |isbn=9781835468531 |url=https://www.packtpub.com/en-us/product/managing-data-as-a-product-9781835468531 |access-date=30 January 2025}}

In 2025, at the Data Day Texas conference, Jean-Georges Perrin and a group of product managers and data engineers drafted the current definition and released it into the public domain.{{cite web |last1=Perrin |first1=Jean-Georges |last2=Hawker |first2=Malcolm |last3=Lyons |first3=Bethany |last4=Dolley |first4=Ryan |last5=Cao |first5=Lisa N. |last6=Reis |first6=Joe |last7=Sequeda |first7=Juan |last8=Benoit |first8=Yoann |title=Defining Data Products: A Community Effort |url=https://medium.com/p/77363611e5c5 |access-date=30 January 2025 |date=2025-01-28}}

See also

References