Data blending

Data blending is a process whereby big data from multiple sources[https://blog.ventanaresearch.com/2014/05/30/alteryx-analytics-brings-power-of-predictive-and-big-data-to-market Alteryx Analytics Brings Power of Predictive and Big Data to Market] are merged into a single data warehouse or data set.[https://altair.com/what-is-data-blending Data blending is the process of combining data from multiple sources into a functioning data set]

Data blending helps business analysts cope with the growing volume of data they need in order to make critical business decisions based on good-quality business intelligence.{{Cite web| url=https://www.trifacta.com/data-blending/| title=Data Blending| date=24 August 2017| publisher=Trifacta.com}} Data blending has been distinguished from data integration because analysts need to merge sources very quickly, too quickly for data scientists to intervene in any practical way.[https://www.softwareadvice.com/resources/what-is-data-blending-tool/ What Is Data Blending, and Which Tools Make It Easier?] A 2015 study by Forrester Consulting found that 52 percent of companies were blending 50 or more data sources and 12 percent were blending more than 1,000.{{Cite web |title=Data Mashups for Analytics |url=http://www.pentaho.com/data-mashups-for-analytics |publisher=Pentaho}}

Extract, transform, load

Data blending is similar to extract, transform, load (ETL): both take data from various sources and combine them. However, ETL is used to merge and structure data into a target database,{{Cite web |title=How ETL Works |url=https://databricks.com/de/glossary/extract-transform-load |access-date=2021-02-27 |publisher=Databricks |language=de-DE}} often a data warehouse, whereas data blending joins data for a specific use case at a specific time.{{Cite web |date=2016-08-25 |title=What Is Data Blending, and Which Tools Make It Easier? |url=https://www.softwareadvice.com/resources/what-is-data-blending-tool/ |access-date=2021-02-27 |publisher=Software Advice}} Some software, such as Google Data Studio, does not write the blended data into a database at all, which is a marked difference from ETL.{{Cite web |title=Google Data Studio Overview |url=https://datastudio.google.com/overview |access-date=2021-02-27 |publisher=datastudio.google.com}}
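The contrast between the two approaches can be sketched with Python's standard library. In this minimal sketch, an in-memory SQLite database stands in for the data warehouse, and the source rows, the table `region_performance` and the functions `etl_load` and `blend` are all hypothetical. Both paths combine the same two sources; only the ETL path persists the result.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical extracts from two sources; names and values are illustrative.
CRM_ROWS = [("EU", 120.0), ("US", 200.0)]     # (region, revenue)
BUDGET_ROWS = [("EU", 100.0), ("US", 250.0)]  # (region, target)

BlendedRow = Tuple[str, float, Optional[float]]


def combine() -> List[BlendedRow]:
    """Join revenue to target by region (the same logic serves both styles)."""
    targets = dict(BUDGET_ROWS)
    return [(region, revenue, targets.get(region)) for region, revenue in CRM_ROWS]


def etl_load(conn: sqlite3.Connection) -> None:
    """ETL style: write the combined rows into a target table for later reuse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_performance "
        "(region TEXT PRIMARY KEY, revenue REAL, target REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region_performance VALUES (?, ?, ?)", combine()
    )
    conn.commit()


def blend() -> List[BlendedRow]:
    """Blending style: combine for one analysis and hand back the rows;
    nothing is written to a database."""
    return combine()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stands in for a data warehouse
    etl_load(warehouse)
    print(warehouse.execute("SELECT COUNT(*) FROM region_performance").fetchone())  # (2,)
    print(blend())
```

The design choice is deliberately small: the join logic is shared, so the only difference the sketch exposes is whether the combined data outlives the analysis.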

Software products

Reflecting the increased demand from analysts to combine data sources, several software companies have grown rapidly and raised millions of dollars,{{Cite web|title=Incorta raises $30M Series C for ETL-free data processing solution|url=https://techcrunch.com/2019/08/15/incorta-raises-30m-series-c-for-etl-free-data-processing-solution/|access-date=2021-02-27| publisher=TechCrunch }} with some early entrants into the market now public companies.{{Cite web|title=Alteryx Announces Pricing of Initial Public Offering|url=https://www.alteryx.com/press-releases/2017-03-23-alteryx-announces-pricing-initial-public-offering|access-date=2021-02-27|website=Alteryx|language=en}} Examples include AWS, Alteryx, Microsoft Power Query,{{Cite web|publisher=Microsoft|title=Microsoft Power Query|url=https://powerquery.microsoft.com/en-us/|access-date=2021-02-27|website=powerquery.microsoft.com }} and Incorta,{{Cite web|title=Direct Data Analytics Software | publisher= Incorta| url=https://www.incorta.com/|access-date=2021-02-27 }} whose products can combine data from many kinds of source, such as text files, databases, XML, JSON and other forms of structured and semi-structured data.{{Cite web|title=Data Sources|url=https://docs.incorta.com/4.4/data-sources/|access-date=2021-02-27|website=docs.incorta.com|language=en}}{{Cite web| last=davidiseminger| title=Shape and combine data from multiple sources using Power Query| url=https://docs.microsoft.com/en-us/power-query/power-query-tutorial-shape-combine| access-date=2021-02-27| publisher=docs.microsoft.com }}{{Cite web|title=Supported Data Sources: Amazon QuickSight|url=https://docs.aws.amazon.com/quicksight/latest/user/supported-data-sources.html|access-date=2021-02-27| publisher=docs.aws.amazon.com}}{{Cite web|title=Data Sources | publisher=Alteryx Help| url=https://help.alteryx.com/current/designer/data-sources| access-date=2021-02-27 }}
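Blending heterogeneous formats mostly comes down to normalising each source to a common row shape and a shared key before joining. The following minimal sketch uses only Python's standard library and is not how any of the products above work internally. The inline CSV, JSON and XML samples, the `store_id` key and the function `blend_sources` are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical inline samples standing in for a text file, a JSON feed and an XML export.
CSV_TEXT = "store_id,city\n1,Leeds\n2,York\n"
JSON_TEXT = '[{"store_id": 1, "sales": 500}, {"store_id": 2, "sales": 320}]'
XML_TEXT = "<stores><store id='1' staff='12'/><store id='2' staff='7'/></stores>"


def blend_sources() -> list:
    """Normalise three formats to dicts keyed by an integer store_id, then join them."""
    cities = {int(r["store_id"]): r["city"] for r in csv.DictReader(io.StringIO(CSV_TEXT))}
    sales = {int(r["store_id"]): r["sales"] for r in json.loads(JSON_TEXT)}
    staff = {
        int(e.get("id")): int(e.get("staff"))
        for e in ET.fromstring(XML_TEXT).iter("store")
    }
    # The CSV source drives the row set; a key missing elsewhere yields None.
    return [
        {"store_id": sid, "city": city, "sales": sales.get(sid), "staff": staff.get(sid)}
        for sid, city in sorted(cities.items())
    ]


if __name__ == "__main__":
    for row in blend_sources():
        print(row)
```

Converting every key to `int` up front matters: CSV and XML attributes arrive as strings while JSON numbers arrive as integers, so an unconverted join would silently find no matches.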

= Tableau =

In Tableau software, data blending is a technique for combining data from multiple data sources within a data visualization.{{Cite web| title=Blend Your Data| url=https://help.tableau.com/current/pro/desktop/en-us/multiple_connections.htm| access-date=2021-02-27| publisher=help.tableau.com }} A key differentiator is the granularity at which the data are joined. Combining data into a single data set would typically use a SQL database join, which usually joins at the most granular level, using an ID field where possible.{{Cite web| title=SQL Joins Explained| url=http://www.sql-join.com/| access-date=2021-02-27| publisher=SQL Joins Explained}} A data blend in Tableau, by contrast, should happen at the least granular level.{{Cite web| last=TAR Solutions|date=2021-01-20| title=Data Blending in Tableau| url=https://tarsolutions.co.uk/blog/data-blending-in-tableau/|access-date=2021-02-27| publisher=TAR Solutions }}
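Why the join granularity matters can be shown with a small conceptual sketch in Python. It does not model Tableau's engine; it assumes a hypothetical order-level source and a hypothetical region-level target source linked on `region`. Joining at the finest grain copies each region's target onto every order, so summing targets afterwards double-counts. Aggregating each source to the linking field first, as a blend does, avoids that.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical primary source at order level (fine grain).
ORDERS = [
    {"order_id": 1, "region": "North", "amount": 100},
    {"order_id": 2, "region": "North", "amount": 50},
    {"order_id": 3, "region": "South", "amount": 80},
]
# Hypothetical secondary source at region level (coarse grain).
TARGETS = [
    {"region": "North", "target": 120},
    {"region": "South", "target": 90},
]


def sum_by(rows: List[dict], key: str, field: str) -> Dict[str, float]:
    """Aggregate `field` to one total per value of `key`."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


def row_level_join(orders: List[dict], targets: List[dict]) -> List[dict]:
    """SQL-style join at the finest grain: each order row copies its region's target."""
    target_by_region = {t["region"]: t["target"] for t in targets}
    return [dict(o, target=target_by_region.get(o["region"])) for o in orders]


def blend_by_region(
    orders: List[dict], targets: List[dict]
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Blend-style: aggregate each source to the linking field first, then join."""
    sales = sum_by(orders, "region", "amount")
    goals = sum_by(targets, "region", "target")
    return {region: (total, goals.get(region)) for region, total in sales.items()}


if __name__ == "__main__":
    joined = row_level_join(ORDERS, TARGETS)
    # North has two orders, so its target is counted twice here.
    print(sum_by(joined, "region", "target"))  # {'North': 240.0, 'South': 90.0}
    print(blend_by_region(ORDERS, TARGETS))    # {'North': (150.0, 120.0), 'South': (80.0, 90.0)}
```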

= Looker Studio =

In Google's Looker Studio (formerly Google Data Studio), data sources are combined by joining the records of one data source with the records of up to four other data sources.

As in Tableau, the blend happens only in the reporting layer: the blended data is never stored as a separate combined data source.{{Cite web |title=About data blending - Data Studio Help |url=https://support.google.com/datastudio/answer/9061420|access-date=2021-02-27 |publisher=support.google.com}}
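The behaviour described above, one data source joined with up to four others and the result never stored, can be sketched in Python. This is a minimal sketch under stated assumptions: it uses a left outer join on a single key, requires each secondary source to have at most one row per key, and the function `blend_left_outer` and the sample fields are hypothetical rather than part of any Looker Studio API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

MAX_SOURCES = 5  # one primary source plus up to four others


def blend_left_outer(primary: List[Row], others: List[List[Row]], key: str) -> List[Row]:
    """Left-outer-join `primary` with each table in `others` on `key`.

    The blended rows are built in memory and returned; nothing is stored,
    mirroring a blend that exists only in the reporting layer.
    """
    if 1 + len(others) > MAX_SOURCES:
        raise ValueError(f"at most {MAX_SOURCES} data sources can be blended")
    result = [dict(row) for row in primary]  # copy so the inputs stay untouched
    for table in others:
        # Simplifying assumption: at most one row per key in each secondary source.
        index: Dict[Any, Row] = {}
        for row in table:
            if row[key] in index:
                raise ValueError(f"duplicate join key {row[key]!r}")
            index[row[key]] = row
        # Collect the secondary source's columns in first-seen order.
        columns = list(dict.fromkeys(f for row in table for f in row if f != key))
        for row in result:
            match = index.get(row[key], {})
            for field in columns:
                row[field] = match.get(field)  # None when the primary row has no match
    return result


if __name__ == "__main__":
    sessions = [{"date": "2024-01-01", "sessions": 40}, {"date": "2024-01-02", "sessions": 55}]
    ad_spend = [{"date": "2024-01-01", "spend": 12.5}]
    print(blend_left_outer(sessions, [ad_spend], key="date"))
```

A left outer join keeps every row of the primary source, which suits reporting: a date with no ad spend still appears, with an empty value, rather than disappearing from the chart.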

Challenges with data blending

The most common custom metadata question asked about a dataset is: "How can this dataset blend with (join or union to) my other datasets?"{{Cite book| title=Principles of Data Wrangling| last1=Heer| first1=Jeffrey| last2=Hellerstein| first2=Joseph| last3=Kandel| first3=Sean| last4=Rattenbury| first4=Tye| publisher=O'Reilly Media| date=July 2017}}
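The question names the two basic ways of combining datasets. A union stacks rows from datasets that share the same columns, while a join widens rows by matching them on a key. The following minimal Python sketch illustrates the difference. The functions `union` and `inner_join` and the `customer_id` field are hypothetical, and the join is a plain inner join in which right-hand values overwrite any shared non-key column.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def union(a: List[Row], b: List[Row]) -> List[Row]:
    """Stack the rows of two datasets that share exactly the same columns."""
    rows = a + b
    if rows and any(set(r) != set(rows[0]) for r in rows):
        raise ValueError("union requires both datasets to have the same columns")
    return rows


def inner_join(a: List[Row], b: List[Row], key: str) -> List[Row]:
    """Widen rows of `a` with the columns of every row in `b` that shares `key`."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in b:
        index[row[key]].append(row)
    return [{**left, **right} for left in a for right in index.get(left[key], [])]


if __name__ == "__main__":
    q1 = [{"customer_id": 1, "amount": 10}]
    q2 = [{"customer_id": 2, "amount": 20}]
    customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
    orders = union(q1, q2)                               # more rows, same columns
    print(inner_join(orders, customers, "customer_id"))  # same rows, more columns
```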

See also

References