lossless join decomposition

In database design, a lossless join decomposition is a decomposition of a relation $r$ into relations $r_1, r_2$ such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.{{cite journal |last1=Pohler |first1=K |title=Lossless-Join Decomposition: applications in quantitative computing metrics |journal=International Journal of Applied Computer Science |date=2015 |volume=21 |issue=4 |pages=190–212}} Lossless join can also be called non-additive.{{cite book |last1=Elmasri |first1=Ramez |title=Fundamentals of database systems |date=2016 |publisher=Pearson |isbn=978-0133970777 |edition=Seventh |location=Hoboken, NJ |page=461}}

Definition

A relation $r$ on schema $R$ decomposes losslessly onto schemas $R_1$ and $R_2$ if $\pi_{R_1}(r) \bowtie \pi_{R_2}(r) = r$ , that is $r$ is the natural join of its projections onto the smaller schemas.

A pair $(R_1, R_2)$ is a lossless-join decomposition of $R$ or said to have a lossless join with respect to a set of functional dependencies $F$ if any relation $r(R)$ that satisfies $F$ decomposes losslessly onto $R_1$ and $R_2$ .{{cite book |last1=Maier |first1=David |title=The theory of relational databases |date=1983 |publisher=Computer Science Press |isbn=0-914894-42-0 |pages=101 |url=http://web.cecs.pdx.edu/~maier/TheoryBook/MAIER/C06.pdf |access-date=16 August 2024}}

Decompositions into more than two schemas can be defined in the same way.

Criteria

A decomposition $R = R_1 \cup R_2$ has a lossless join with respect to $F$ if and only if the closure of $R_1 \cap R_2$ includes $R_1 \setminus R_2$ or $R_2 \setminus R_1$ . In other words, one of the following must hold:{{cite book |last1=Ullman |first1=Jeffrey D. |title=Principles of Database and Knowledge-base Systems |date=1988 |publisher=Computer Science Press |isbn=0-88175188-X |page=397 |edition=1 |url=http://www.lsv.fr/~goubault/BD/ullman-principles-of-database-and-knowledge-base-systems-volume-1.pdf |access-date=16 August 2024}}

$(R_1 \cap R_2) \to (R_1 \setminus R_2) \in F^+$
$(R_1 \cap R_2) \to (R_2 \setminus R_1) \in F^+$

= Criteria for multiple sub-schemas =

Multiple sub-schemas $R_1,R_2,...,R_n$ have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the schemas have been joined into a single schema. Once we have a new sub-schema made from a lossless join, we are not allowed to use any of its isolated sub-schema to join with any of the other schemas. For example, if we can do a lossless join on a pair of schemas $R_i, R_j$ to form a new schema $R_{i,j}$ , we use this new schema (rather than $R_i$ or $R_j$ ) to form a lossless join with another schema $R_k$ (which may already be joined (e.g., $R_{k,l}$ )).{{Vague|date=August 2024}}

Example

Let $R = \{A, B, C, D\}$ be the relation schema, with attributes {{mvar|A}}, {{mvar|B}}, {{mvar|C}} and {{mvar|D}}.
Let $F = \{ A \rightarrow BC \}$ be the set of functional dependencies.
Decomposition into $R_1 = \{A, B, C\}$ and $R_2 = \{A, D\}$ is lossless under {{mvar|F}} because $R_1 \cap R_2 = A$ and we have a functional dependency $A \rightarrow BC$ . In other words, we have proven that $(R_1 \cap R_2 \rightarrow R_1 \setminus R_2) \in F^+$ .{{Cite web|title = Lossless-Join Decomposition|url = http://www.cs.sfu.ca/CourseCentral/354/zaiane/material/notes/Chapter7/node7.html|website =Cs.sfu.ca|access-date = 2016-02-07}}{{Cite web |url=http://www.data-e-education.com/E121_Lossless_Join_Decomposition.html |title=www.data-e-education.com - Lossless Join Decomposition |access-date=2014-02-12 |archive-url=https://web.archive.org/web/20140221081148/http://www.data-e-education.com/E121_Lossless_Join_Decomposition.html |archive-date=2014-02-21 |url-status=dead}}

References

Category:Databases

Category:Data modeling

Category:Database constraints

Category:Database normalization

Category:Relational algebra