lossless join decomposition

In database design, a lossless join decomposition is a decomposition of a relation r into relations r_1, r_2 such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.{{cite journal |last1=Pohler |first1=K |title=Lossless-Join Decomposition: applications in quantitative computing metrics |journal=International Journal of Applied Computer Science |date=2015 |volume=21 |issue=4 |pages=190–212}} Lossless join can also be called non-additive.{{cite book |last1=Elmasri |first1=Ramez |title=Fundamentals of database systems |date=2016 |publisher=Pearson |isbn=978-0133970777 |edition=Seventh |location=Hoboken, NJ |page=461}}

Definition

A relation r on schema R decomposes losslessly onto schemas R_1 and R_2 if \pi_{R_1}(r) \bowtie \pi_{R_2}(r) = r, that is r is the natural join of its projections onto the smaller schemas.

A pair (R_1, R_2) is a lossless-join decomposition of R or said to have a lossless join with respect to a set of functional dependencies F if any relation r(R) that satisfies F decomposes losslessly onto R_1 and R_2.{{cite book |last1=Maier |first1=David |title=The theory of relational databases |date=1983 |publisher=Computer Science Press |isbn=0-914894-42-0 |pages=101 |url=http://web.cecs.pdx.edu/~maier/TheoryBook/MAIER/C06.pdf |access-date=16 August 2024}}

Decompositions into more than two schemas can be defined in the same way.

Criteria

A decomposition R = R_1 \cup R_2 has a lossless join with respect to F if and only if the closure of R_1 \cap R_2 includes R_1 \setminus R_2 or R_2 \setminus R_1. In other words, one of the following must hold:{{cite book |last1=Ullman |first1=Jeffrey D. |title=Principles of Database and Knowledge-base Systems |date=1988 |publisher=Computer Science Press |isbn=0-88175188-X |page=397 |edition=1 |url=http://www.lsv.fr/~goubault/BD/ullman-principles-of-database-and-knowledge-base-systems-volume-1.pdf |access-date=16 August 2024}}

  • (R_1 \cap R_2) \to (R_1 \setminus R_2) \in F^+
  • (R_1 \cap R_2) \to (R_2 \setminus R_1) \in F^+

= Criteria for multiple sub-schemas =

Multiple sub-schemas R_1,R_2,...,R_n have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the schemas have been joined into a single schema. Once we have a new sub-schema made from a lossless join, we are not allowed to use any of its isolated sub-schema to join with any of the other schemas. For example, if we can do a lossless join on a pair of schemas R_i, R_j to form a new schema R_{i,j}, we use this new schema (rather than R_i or R_j) to form a lossless join with another schema R_k (which may already be joined (e.g., R_{k,l})).{{Vague|date=August 2024}}

Example

  • Let R = \{A, B, C, D\} be the relation schema, with attributes {{mvar|A}}, {{mvar|B}}, {{mvar|C}} and {{mvar|D}}.
  • Let F = \{ A \rightarrow BC \} be the set of functional dependencies.
  • Decomposition into R_1 = \{A, B, C\} and R_2 = \{A, D\} is lossless under {{mvar|F}} because R_1 \cap R_2 = A and we have a functional dependency A \rightarrow BC. In other words, we have proven that (R_1 \cap R_2 \rightarrow R_1 \setminus R_2) \in F^+.{{Cite web|title = Lossless-Join Decomposition|url = http://www.cs.sfu.ca/CourseCentral/354/zaiane/material/notes/Chapter7/node7.html|website =Cs.sfu.ca|access-date = 2016-02-07}}{{Cite web |url=http://www.data-e-education.com/E121_Lossless_Join_Decomposition.html |title=www.data-e-education.com - Lossless Join Decomposition |access-date=2014-02-12 |archive-url=https://web.archive.org/web/20140221081148/http://www.data-e-education.com/E121_Lossless_Join_Decomposition.html |archive-date=2014-02-21 |url-status=dead}}

References