quasi-identifier

Quasi-identifiers are pieces of information that are not of themselves unique identifiers, but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier.{{cite web|url=http://stats.oecd.org/glossary/detail.asp?ID=6961|title=Glossary of Statistical Terms: Quasi-identifier|publisher=OECD|date=November 10, 2005|access-date=29 September 2013}}

Quasi-identifiers can thus, when combined, become personally identifying information. This process is called re-identification. As an example, Latanya Sweeney has shown that although gender, birth date and postal code do not each uniquely identify an individual, the combination of all three is sufficient to identify 87% of individuals in the United States. Sweeney, Latanya. Simple demographics often identify people uniquely. Carnegie Mellon University, 2000. http://dataprivacylab.org/projects/identifiability/paper1.pdf
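The effect of combining attributes can be illustrated with a minimal sketch. The records below are hypothetical toy data, not drawn from any real dataset, and the helper name `unique_fraction` is an assumption made for this illustration. The function reports the share of records whose values on a chosen set of attributes occur exactly once.

```python
from collections import Counter

# Hypothetical toy records; none of these values come from a real dataset.
records = [
    {"gender": "F", "birth_date": "1970-01-15", "postal_code": "02138"},
    {"gender": "F", "birth_date": "1970-01-15", "postal_code": "02139"},
    {"gender": "M", "birth_date": "1970-01-15", "postal_code": "02138"},
    {"gender": "M", "birth_date": "1982-06-30", "postal_code": "02138"},
    {"gender": "F", "birth_date": "1982-06-30", "postal_code": "02139"},
    {"gender": "M", "birth_date": "1982-06-30", "postal_code": "02139"},
]


def unique_fraction(rows, attributes):
    """Return the share of rows whose values on `attributes` occur exactly once."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


if __name__ == "__main__":
    for attrs in (["gender"], ["birth_date"], ["postal_code"],
                  ["gender", "birth_date", "postal_code"]):
        print(attrs, unique_fraction(records, attrs))
```

In this toy data no single attribute singles out any record, while the three attributes taken together distinguish every record. Real populations behave less cleanly, which is why the measured figure is a percentage rather than all individuals.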

The term was introduced by Tore Dalenius in 1986. Dalenius, Tore. Finding a Needle In a Haystack or Identifying Anonymous Census Records. Journal of Official Statistics, Vol. 2, No. 3, 1986. pp. 329–336. http://www.jos.nu/Articles/abstract.asp?article=23329 {{Webarchive|url=https://web.archive.org/web/20170808033912/http://www.jos.nu/Articles/abstract.asp?article=23329 |date=2017-08-08 }} Since then, quasi-identifiers have been the basis of several attacks on released data. For instance, Sweeney linked health records to publicly available information to locate the hospital records of the then-governor of Massachusetts using uniquely identifying quasi-identifiers. Anderson, Nate. Anonymized data really isn’t—and here’s why not. Ars Technica, 2009. https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/ Barth-Jones, Daniel C. The 're-identification' of Governor William Weld's medical information: a critical re-examination of health data identification risks and privacy protections, then and now. June 4, 2012. Similarly, Sweeney, Abu and Winn used public voter records to re-identify participants in the Personal Genome Project. Sweeney, Latanya, Akua Abu, and Julia Winn. "Identifying participants in the personal genome project by name." Available at SSRN 2257732 (2013). Additionally, Arvind Narayanan and Vitaly Shmatikov used quasi-identifiers to describe statistical conditions for de-anonymizing data released by Netflix. Narayanan, Arvind and Shmatikov, Vitaly. Robust De-anonymization of Large Sparse Datasets. The University of Texas at Austin, 2008. https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
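The linkage attacks described above follow a common pattern: a release with names removed is joined to a public, named list on the quasi-identifiers the two share. The following is a minimal sketch of that pattern only, not a reconstruction of any of the cited studies. All records, the placeholder names, the choice of attributes and the function `link` are hypothetical.

```python
from collections import defaultdict

# Assumed set of shared attributes, chosen for this illustration.
QUASI_IDENTIFIERS = ("gender", "birth_date", "postal_code")

# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
health_records = [
    {"gender": "M", "birth_date": "1950-03-02", "postal_code": "02138", "diagnosis": "A"},
    {"gender": "F", "birth_date": "1961-11-20", "postal_code": "02139", "diagnosis": "B"},
    {"gender": "F", "birth_date": "1975-07-08", "postal_code": "02140", "diagnosis": "C"},
]

# Hypothetical public list that carries names alongside the same attributes.
voter_list = [
    {"name": "Person One", "gender": "M", "birth_date": "1950-03-02", "postal_code": "02138"},
    {"name": "Person Two", "gender": "F", "birth_date": "1961-11-20", "postal_code": "02139"},
    {"name": "Person Three", "gender": "F", "birth_date": "1961-11-20", "postal_code": "02139"},
    {"name": "Person Four", "gender": "M", "birth_date": "1988-01-01", "postal_code": "02140"},
]


def link(released, public, keys=QUASI_IDENTIFIERS):
    """Pair a released row with a name when exactly one public row shares its key values."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # an ambiguous or missing match re-identifies no one
            matches.append((candidates[0], row["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(link(health_records, voter_list))
```

Only the first released record is linked in this toy data: the second matches two people on the public list and the third matches none, so neither is re-identified.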

Motwani and Xu warn that the publication of large volumes of government and business data containing quasi-identifiers could enable privacy breaches.{{cite conference |title=Efficient Algorithms for Masking and Finding Quasi-Identifiers |author=Rajeev Motwani and Ying Xu |conference=Proceedings of SDM’08 International Workshop on Practical Privacy-Preserving Data Mining |url=https://www.csee.umbc.edu/~kunliu1/p3dm08/proceedings/2.pdf |year=2008}}

References