Quality of Data
{{Orphan|date=January 2016}}
{{context|date=May 2018}}
Quality of Data (QoD) is a designation, coined by L. Veiga, that specifies and describes the required Quality of Service of a distributed storage system from the point of view of the consistency of its data. It can be used to support big data management frameworks, workflow management, and HPC systems (mainly for data replication and consistency). It takes data semantics into account, namely the time interval of data freshness, the sequence of tolerable outstanding versions of the data read before a refresh, and the value divergence allowed before the data is displayed. It was initially based on a model from existing research work on vector-field consistency,{{cite conference |author1=Nuno Santos |author2=Luís Veiga |author3=Paulo Ferreira | year=2007 | title=Vector-Field Consistency for Adhoc Gaming| book-title = ACM/IFIP/Usenix Middleware Conference 2007 | url=http://www.gsd.inesc-id.pt/~lveiga/msc-08-09/vfc-middleware-07.pdf }} which was awarded the best-paper prize at the ACM/IFIP/Usenix Middleware Conference 2007 and was later enhanced for increased scalability and fault tolerance.{{cite conference |author1=Luís Veiga |author2=André Negrão |author3=Nuno Santos |author4=Paulo Ferreira | year=2010 | title=Unifying Divergence Bounding and Locality Awareness in Replicated Systems with Vector-Field Consistency | book-title = JISA, Journal of Internet Services and Applications, Volume 1, Number 2, 95-115, Springer, 2010 | url=http://www.gsd.inesc-id.pt/~lveiga/vfc-JISA-2010.pdf }}
This consistency model has been successfully applied and proven in the big data key/value store Apache HBase,{{Cite web |title=Apache HBase – Apache HBase™ Home |url=https://hbase.apache.org/ |access-date=2022-10-15 |website=hbase.apache.org}} initially designed as a middleware{{cite conference |author1=Sergio Estéves |author2=João Silva |author3=Luís Veiga |name-list-style=amp | year=2013 | title=Quality-of-service for consistency of data geo-replication in cloud computing | book-title = Euro-Par 2012 Parallel Processing. Springer Berlin Heidelberg, 2012. 285-297 | url=http://www.gsd.inesc-id.pt/~sesteves/papers/vfc3-europar12.pdf }} module sitting between clusters in separate data centres. The HBase-QoD coupling{{cite conference |author1=Álvaro García-Recuero |author2=Sergio Estéves |author3=Luís Veiga | year=2013 | title=Quality-of-Data for Consistency Levels in Geo-replicated Cloud Data Stores | book-title = IEEE CloudCom 2013 | url=http://www.inesc-id.pt/ficheiros/publicacoes/9253.pdf }} minimises bandwidth usage and optimises resource allocation during replication, achieving the desired consistency at a finer granularity.
QoD is defined by the three dimensions of the vector k = (θ, σ, ν), but with a broader view of the issue, also applicable to large-scale data management techniques with regard to their timely delivery.[http://www-01.ibm.com/software/data/quality/ Data Quality] Published by IBM
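The three dimensions of the vector can be read as bounds on staleness: θ limits the time since the last refresh, σ limits the number of outstanding (unseen) versions, and ν limits the tolerated value divergence. A minimal sketch of how such a bound might be enforced on a read, in Python, is shown below; the names `QoDBound` and `must_refresh` are illustrative assumptions, not part of any of the cited systems:

```python
import time
from dataclasses import dataclass


@dataclass
class QoDBound:
    """Illustrative QoD consistency vector k = (theta, sigma, nu).

    theta: maximum staleness in seconds (time dimension)
    sigma: maximum number of outstanding versions (sequence dimension)
    nu:    maximum tolerated value divergence (value dimension)
    """
    theta: float
    sigma: int
    nu: float


def must_refresh(bound, last_sync_time, pending_versions, divergence, now=None):
    """Return True when any dimension of the bound is exceeded,
    i.e. the replica should be refreshed before serving the read."""
    now = time.time() if now is None else now
    return (now - last_sync_time > bound.theta      # time bound violated
            or pending_versions > bound.sigma       # sequence bound violated
            or abs(divergence) > bound.nu)          # value bound violated
```

For example, with a bound of k = (5 s, 3 versions, 0.1), a replica that has 4 unseen versions must refresh even if it synchronised one second ago, while one with 2 unseen versions and small divergence may still serve the read.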
== Other descriptions ==
Quality of Data should not be confused with other definitions of data quality, such as completeness, validity, and accuracy.{{cite conference |author1=Richard Y. Wang | year= 1992 | title=Toward quality data : an attribute-based approach | book-title=Decision Support Systems 13, MIT | url=http://web.mit.edu/tdqm/www/tdqmpub/Toward%20Quality%20Data.pdf }} {{cite conference |author1=George A. Mihaila |author2=Louiqa Raschid |author2-link=Louiqa Raschid |author3=María-Esther Vidal |author3-link=María-Esther Vidal | year= 2000 | title=Using Quality of Data Metadata for Source Selection and Ranking |citeseerx=10.1.1.34.9361}}
== References ==
{{Reflist}}