big data ethics
{{Short description|Ethics of mass data analytics}}
{{essay|date=December 2019}}
{{Use American English|date = January 2019}}
{{Use mdy dates|date = January 2019}}
Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data.{{Cite book|url=https://books.google.com/books?id=rNjSAwAAQBAJ&q=rob+kitchin+the+data+revolution|title=The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences|last=Kitchin|first=Rob|date=2014-08-18|publisher=SAGE|isbn=9781473908253|pages=27|language=en}} Since the dawn of the Internet, the quantity and quality of data have increased dramatically and continue to grow exponentially. Big data describes data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records, and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics grows in relevance as the quantity of data increases, because the scale of its potential impact grows with it.
Big data ethics differs from information ethics: information ethics is more concerned with issues of intellectual property and with concerns relating to librarians, archivists, and information professionals, while big data ethics is more concerned with collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence and machine learning systems are regularly built using big data sets, discussions of data ethics are often intertwined with those in the ethics of artificial intelligence.{{Cite journal |last1=Floridi |first1=Luciano |last2=Taddeo |first2=Mariarosaria |date=2016-12-28 |title=What is data ethics? |journal=Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |language=en |volume=374 |issue=2083 |pages=20160360 |doi=10.1098/rsta.2016.0360 |issn=1364-503X |pmc=5124072 |pmid=28336805|bibcode=2016RSPTA.37460360F }} More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.
Principles
Data ethics is concerned with the following principles:{{Cite web |last=Cote |first=Catherine |date=2021-03-16 |title=5 Principles of Data Ethics for Business |url=https://online.hbs.edu/blog/post/data-ethics |access-date=2022-09-07 |website=Harvard Business School Online |language=en}}
- Ownership{{snd}}Individuals own their personal data.
- Transaction transparency{{snd}}If an individual's personal data is used, they should have transparent access to the algorithm design used to generate aggregate data sets.
- Consent{{snd}}An individual or legal entity that wants to use personal data needs the informed and explicitly expressed consent of the data's owner, covering what personal data moves to whom, when, and for what purpose.
- Privacy{{snd}}If data transactions occur, all reasonable effort needs to be made to preserve privacy.
- Currency{{snd}}Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
- Openness{{snd}}Aggregate data sets should be freely available.
= Ownership =
Ownership of data involves determining rights and duties over property, such as the ability to exercise individual control over (including limiting the sharing of) the personal data comprising one's digital identity. The question of data ownership arises when someone records observations on an individual person. The observer and the observed both lay claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant as the Internet magnifies the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership and intellectual property.{{cite web | first= Ignacio | last= Cofone | title=Beyond Data Ownership |url=https://cardozolawreview.com/beyond-data-ownership/ | date=2021 | publisher=Cardozo Law Review |volume=43 |issue=2 |pages=501}}
In the European Union, some people argue that the General Data Protection Regulation indicates that individuals own their personal data, although this is contested.{{Cite journal|last1=van Ooijen|first1=I.|last2=Vrabec|first2=Helena U.|date=2018-12-11|title=Does the GDPR Enhance Consumers' Control over Personal Data? An Analysis from a Behavioural Perspective|journal=Journal of Consumer Policy|volume=42|issue=1|pages=91–107|doi=10.1007/s10603-018-9399-7|s2cid=158945891|issn=0168-7034|doi-access=free|hdl=2066/216801|hdl-access=free}}
= Transaction transparency =
Concerns have been raised about how biases, whether conscious or unconscious, can be built into algorithm design and result in systematic oppression.{{Cite book|title=Weapons of Math Destruction|last=O'Neil|first=Cathy|publisher=Crown Books|year=2016|isbn=978-0553418811|language=en}} Such outcomes often stem from biases in the data, the design of the algorithm, or the underlying goals of the organization deploying it. One major cause of algorithmic bias is that algorithms learn from historical data, which may perpetuate existing inequities. In many cases, algorithms exhibit reduced accuracy when applied to individuals from marginalized or underrepresented communities. A notable example of this is pulse oximetry, which has shown reduced reliability for certain demographic groups due to a lack of sufficient testing or information on these populations.{{cite journal |last1=Buolamwini |first1=Joy |last2=Gebru |first2=Timnit |title=Gender shades: Intersectional accuracy disparities in commercial gender classification. |journal=Proceedings of the Conference on Fairness, Accountability, and Transparency |date=2018 |volume=81 |pages=1–15 |url=https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf |access-date=11 December 2024}} Additionally, many algorithms are designed to maximize specific metrics, such as engagement or profit, without adequately considering ethical implications.
For instance, companies like Facebook and Twitter have been criticized for providing anonymity to harassers and for allowing racist content disguised as humor to proliferate, as such content often increases engagement.{{cite journal |last1=Farkas |first1=Johan |last2=Matamoros-Fernandez |first2=Ariadna |title=Racism, Hate Speech, and Social Media: A Systematic Review and Critique |journal=Television & New Media |date=22 January 2021 |volume=22 |issue=2 |pages=205–224 |doi=10.1177/1527476420982230 |url=https://doi.org/10.1177/1527476420982230 |access-date=11 December 2024}} These challenges are compounded by the fact that many algorithms operate as "black boxes" for proprietary reasons, meaning that the reasoning behind their outputs is not fully understood by users. This opacity makes it more difficult to identify and address algorithmic bias.
In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms.
Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors.{{Cite book|title=The Data Revolution: Big Data, Open Data Infrastructure and Their Consequences|last=Kitchin|first=Rob|publisher=SAGE Publications|year=2014|pages=178–179}} This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination. For example, predictive policing highlights certain groups or neighborhoods to be watched more closely than others, which leads to more sanctions in these areas and closer surveillance of those who fit the same profiles as those who are sanctioned.{{Cite journal|last=Zwitter|first=A.|date=2014|title=Big Data Ethics|journal=Big Data & Society|volume=1|issue=2|pages=4|doi=10.1177/2053951714559253|doi-access=free}}
The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed. This practice is seen with airline industry data which has been repurposed for profiling and managing security risks at airports.
= Privacy =
Privacy has been presented as a limitation on data usage that could itself be considered unethical.{{cite journal |last1=Kostkova |first1=Patty |last2=Brewer |first2=Helen |last3=de Lusignan |first3=Simon |last4=Fottrell |first4=Edward |last5=Goldacre |first5=Ben |last6=Hart |first6=Graham |last7=Koczan |first7=Phil |last8=Knight |first8=Peter |last9=Marsolier |first9=Corinne |last10=McKendry |first10=Rachel A. |last11=Ross |first11=Emma |last12=Sasse |first12=Angela |last13=Sullivan |first13=Ralph |last14=Chaytor |first14=Sarah |last15=Stevenson |first15=Olivia |last16=Velho |first16=Raquel |last17=Tooke |first17=John |title=Who Owns the Data? Open Data for Healthcare |journal=Frontiers in Public Health |date=17 February 2016 |volume=4 |page=7 |doi=10.3389/fpubh.2016.00007 |pmid=26925395 |pmc=4756607 |doi-access=free }} For example, the sharing of healthcare data can shed light on the causes of diseases and the effects of treatments, and can allow for tailored analyses based on individuals' needs. This is of ethical significance in the big data ethics field because, while many value privacy, the affordances of data sharing are also quite valuable, although they may contradict one's conception of privacy. Attitudes against data sharing may be based on a perceived loss of control over data and a fear of the exploitation of personal data. However, it is possible to extract the value of data without compromising privacy.
Government surveillance of big data has the potential to undermine individual privacy by collecting and storing data on phone calls, internet activity, and geolocation, among other things. For example, the NSA’s collection of metadata exposed in global surveillance disclosures raised concerns about whether privacy was adequately protected, even when the content of communications was not analyzed. The right to privacy is often complicated by legal frameworks that grant governments broad authority over data collection for “national security” purposes. In the United States, the Supreme Court has not recognized a general right to "informational privacy," or control over personal information, though legislators have addressed the issue selectively through specific statutes.{{cite web |last1=Gellman |first1=Barton |last2=Adler-Bell |first2=Sam |title=The Disparate Impact of Surveillance |url=https://tcf.org/content/report/disparate-impact-surveillance/ |website=The Century Foundation |date=December 21, 2017 |access-date=11 December 2024}} From an equity perspective, government surveillance and privacy violations tend to disproportionately harm marginalized communities. Historically, activists involved in the Civil rights movement were frequently targets of government surveillance as they were perceived as subversive elements. Programs such as COINTELPRO exemplified this pattern, involving espionage against civil rights leaders. This pattern persists today, with evidence of ongoing surveillance of activists and organizations.{{cite journal |last1=Von Solms |first1=Sune |last2=Van Heerden |first2=Renier |title=The consequences of Edward Snowden NSA related information disclosures. |journal=Proceedings of the 10th International Conference on Cyber Warfare and Security, ICCWS 2015 |date=2015 |pages=358–368 |url=https://www.scopus.com/record/display.uri?eid=2-s2.0-84969268019&origin=inward&txGid=74d0af4d0fda2ca16bea784bce465632 |access-date=11 December 2024}}
Additionally, the use of algorithms by governments to act on data obtained without consent introduces significant concerns about algorithmic bias. Predictive policing tools, for example, utilize historical crime data to predict “risky” areas or individuals, but these tools have been shown to disproportionately target minority communities.{{cite web |last1=Larson |first1=Jeff |last2=Mattu |first2=Surya |last3=Kirchner |first3=Lauren |last4=Angwin |first4=Julia |title=How We Analyzed the COMPAS Recidivism Algorithm |url=https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm |website=ProPublica |access-date=11 December 2024}} One such tool, the COMPAS system, is a notable example; Black defendants are twice as likely to be misclassified as high risk compared to white defendants, and Hispanic defendants are similarly more likely to be classified as high risk than their white counterparts.{{cite journal |last1=Hamilton |first1=Melissa |title=The biased algorithm: Evidence of disparate impact on Hispanics. |journal=American Criminal Law Review |date=2019 |volume=56 |issue=4 |url=https://www.law.georgetown.edu/american-criminal-law-review/wp-content/uploads/sites/15/2019/06/56-4-the-biased-algorithm-evidence-of-disparate-impact-on-hispanics.pdf}} Marginalized communities often lack the resources or education needed to challenge these privacy violations or protect their data from nonconsensual use. Furthermore, there is a psychological toll, known as the “chilling effect,” where the constant awareness of being surveilled disproportionately impacts communities already facing societal discrimination. This effect can deter individuals from engaging in legal but potentially "risky" activities, such as protesting or seeking legal assistance, further limiting their freedoms and exacerbating existing inequities.
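The kind of disparity described above is commonly expressed as a difference in false positive rates between groups: among people who did not reoffend, the share who were nonetheless labeled high risk. The following is a minimal sketch of that comparison, not the method used in the cited analyses; the function name `false_positive_rates`, the group labels, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per group: share of non-reoffenders who were labeled high risk."""
    flagged = defaultdict(int)    # non-reoffenders labeled high risk
    negatives = defaultdict(int)  # all non-reoffenders
    for group, labeled_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if labeled_high_risk:
                flagged[group] += 1
    # Groups with no non-reoffenders are omitted to avoid dividing by zero.
    return {g: flagged[g] / negatives[g] for g in negatives}


# Hypothetical toy records, NOT real data: (group, labeled_high_risk, reoffended)
records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]

rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate {rate:.2f}")
```

In this toy data, group A's false positive rate is twice group B's even though both groups contain the same number of people who reoffended, which is the shape of disparity the paragraph describes.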
Some scholars, such as Jonathan H. King and Neil M. Richards, are redefining the traditional meaning of privacy, and others question whether privacy still exists. In a 2014 article for the Wake Forest Law Review, King and Richards argue that privacy in the digital age can be understood not in terms of secrecy but in terms of regulations which govern and control the use of personal information. In the European Union, the right to be forgotten entitles EU countries to force the removal or de-linking of personal data from databases at an individual's request if the information is deemed irrelevant or out of date.{{Cite journal|last=Walker|first=R. K.|date=2012|title=The Right to be Forgotten|journal=Hastings Law Journal|volume=64|pages=257–261}} According to Andrew Hoskins, this law demonstrates the moral panic of EU members over the perceived loss of privacy and the ability to govern personal data in the digital age.{{Cite web|url=http://www.memorystudies-frankfurt.com/events/digital-memory-studies/|title=Digital Memory Studies {{!}}|last=Hoskins|first=Andrew|date=November 4, 2014|website=memorystudies-frankfurt.com|access-date=2017-11-28}} In the United States, citizens have the right to delete voluntarily submitted data. This is very different from the right to be forgotten because much of the data produced using big data technologies and platforms is not voluntarily submitted. While traditional notions of privacy are under scrutiny, different legal frameworks related to privacy in the EU and US demonstrate how countries are grappling with these concerns in the context of big data.
For example, the "right to be forgotten" in the EU and the right to delete voluntarily submitted data in the US illustrate the varying approaches to privacy regulation in the digital age.{{Cite journal |date=January 2022 |title=ERRATUM |url=https://onlinelibrary.wiley.com/doi/10.1002/eahr.500113 |journal=Ethics & Human Research |language=en |volume=44 |issue=1 |pages=17 |doi=10.1002/eahr.500113 |pmid=34910377 |issn=2578-2355}}
== How much data is worth ==
One crude way to estimate the worth of personal data compares the value of the services that tech companies provide with the equity value of those companies: the gap between the two is the difference between the exchange rate offered to the citizen and the "market rate" of their data. Scientifically there are many holes in this rudimentary calculation: the financial figures of tax-evading companies are unreliable; it is unclear whether revenue or profit is the more appropriate measure; the definition of a user varies; data becomes valuable only when a large number of individuals is pooled; and prices may be tiered for different people in different countries. Although these calculations are crude, they serve to make the monetary value of data more tangible. Another approach is to look at the rates at which data is traded on the black market. RSA publishes a yearly cybersecurity shopping list that takes this approach.{{Cite news|url=https://www.rsa.com/content/dam/en/infographic/rsa-2018-cybercriminal-shopping-list.pdf|title=2018 Cybersecurity Shopping List|last=RSA|date=2018|language=en}}
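One possible reading of the rudimentary calculation above is a simple per-user division of a company-level financial figure. The sketch below is illustrative only and is not taken from the cited sources: the function name `value_per_user` and every number are hypothetical placeholders, and the choice between revenue and profit as the numerator is exactly the ambiguity the text points out.

```python
def value_per_user(financial_figure: float, users: int) -> float:
    """Crude per-user data value: a company-level figure divided by user count."""
    if users <= 0:
        raise ValueError("users must be positive")
    return financial_figure / users


# Hypothetical placeholder figures, NOT real company data.
annual_revenue = 50_000_000_000.0
annual_profit = 15_000_000_000.0
active_users = 2_000_000_000

by_revenue = value_per_user(annual_revenue, active_users)
by_profit = value_per_user(annual_profit, active_users)

print(f"per user per year, revenue basis: {by_revenue:.2f}")
print(f"per user per year, profit basis: {by_profit:.2f}")
```

With these placeholder inputs, the two bases give estimates that differ by more than a factor of three, which shows how sensitive such a figure is to the assumptions behind it.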
This raises the economic question of whether free tech services in exchange for personal data is a worthwhile implicit exchange for the consumer. In the personal data trading model, rather than companies selling data, an owner can sell their personal data and keep the profit.{{Cite news|url=https://globalchallenges.org/en/new-shape-library/5996bcd2ee1fd321e1075a6d/intro|title=Personal Data trading Application to the New Shape Prize of the Global Challenges Foundation|last=László|first=Mitzi|date=2017-11-01|publisher=Global Challenges Foundation|location=online|pages=27|language=en|access-date=June 20, 2018|archive-date=June 20, 2018|archive-url=https://web.archive.org/web/20180620231928/https://globalchallenges.org/en/new-shape-library/5996bcd2ee1fd321e1075a6d/intro|url-status=dead}}
= Openness =
The idea of open data is centered around the argument that data should be freely available and should not be subject to restrictions that would prohibit its use, such as copyright laws. {{As of| 2014}}, many governments had begun to move towards publishing open datasets for the purpose of transparency and accountability.{{cite journal |last1=Kalin |first1=Ian |title=Open Data Policy Improves Democracy |journal=SAIS Review of International Affairs |date=2014 |volume=34 |issue=1 |pages=59–70 |doi=10.1353/sais.2014.0006 |s2cid=154068669 }} This movement has gained traction via "open data activists" who have called for governments to make datasets available so that citizens can extract meaning from the data and perform checks and balances themselves.{{Cite journal|last=Richards and King|first=N. M. and J. H.|date=2014|title=Big data ethics|journal=Wake Forest Law Review|volume=49|pages=393–432|ssrn=2384174}} King and Richards have argued that this call for transparency includes a tension between openness and secrecy.
Activists and scholars have also argued that because this open-sourced model of data evaluation is based on voluntary participation, the availability of open datasets has a democratizing effect on a society, allowing any citizen to participate.{{cite journal |last1=Baack |first1=Stefan |title=Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism |journal=Big Data & Society |date=27 December 2015 |volume=2 |issue=2 |pages=205395171559463 |doi=10.1177/2053951715594634 |s2cid=55542891 |doi-access=free }} To some, the availability of certain types of data is seen as a right and an essential part of a citizen's agency.
Open Knowledge Foundation (OKF) lists several dataset types it argues should be provided by governments for them to be truly open.{{Cite web|url=https://index.okfn.org/methodology/|title=Methodology - Global Open Data Index|last=Knowledge|first=Open|website=index.okfn.org|access-date=2017-11-23|archive-date=March 8, 2021|archive-url=https://web.archive.org/web/20210308173212/https://index.okfn.org/methodology/|url-status=dead}} OKF has a tool called the Global Open Data Index (GODI), a crowd-sourced survey for measuring the openness of governments, based on its Open Definition. GODI aims to be a tool for providing feedback to governments about the quality of their open datasets.{{Cite web|url=https://index.okfn.org/about/|title=About - Global Open Data Index|last=Knowledge|first=Open|website=index.okfn.org|access-date=2017-11-23|archive-date=April 21, 2021|archive-url=https://web.archive.org/web/20210421035337/https://index.okfn.org/about/|url-status=dead}}
Willingness to share data varies from person to person. Preliminary studies have been conducted into the determinants of the willingness to share data. For example, some have suggested that baby boomers are less willing to share data than millennials.{{Cite web|url=https://www.emerce.nl/research/babyboomers-willen-delen|title=Babyboomers willen gegevens niet delen|last=Emerce|website=emerce.nl|access-date=2016-05-12}}
Historical cases
=Snowden disclosures=
The fallout from Edward Snowden’s disclosures in 2013 significantly reshaped public discourse around data collection and the privacy principle of big data ethics. The case revealed that governments controlled and possessed far more information about civilians than previously understood, violating the principle of ownership, particularly in ways that disproportionately affected disadvantaged communities. For instance, activists were frequently targeted, including members of movements such as Occupy Wall Street and Black Lives Matter. This revelation prompted governments and organizations to revisit data collection and storage practices to better protect individual privacy while also addressing national security concerns. The case also exposed widespread online surveillance of other countries and their citizens, raising important questions about data sovereignty and ownership. In response, some countries, such as Brazil and Germany, took action to push back against these practices. However, many developing nations lacked the technological independence needed to resist such surveillance, or were too dependent on the nations surveilling them to do so, leaving them at a disadvantage in addressing these concerns.
=Cambridge Analytica scandal=
The Cambridge Analytica scandal highlighted significant ethical concerns in the use of big data. Data was harvested from approximately 87 million Facebook users without their explicit consent and used to display targeted political advertisements. This violated the currency principle of big data ethics, as individuals were initially unaware of how their data was being exploited. The scandal revealed how data collected for one purpose could be repurposed for entirely different uses, bypassing users' consent and emphasizing the need for explicit and informed consent in data usage.{{cite journal |last1=Isaak |first1=Jim |last2=Hanna |first2=Mina J. |title=User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection |journal=Computer |date=14 August 2018 |volume=51 |issue=8 |pages=56–59|doi=10.1109/MC.2018.3191268 }} Additionally, the algorithms used for ad delivery were opaque, challenging the principles of transaction transparency and openness. In some cases, the political ads spread misinformation, often disproportionately targeting disadvantaged groups and contributing to knowledge gaps. Marginalized communities and individuals with lower digital literacy were disproportionately affected as they were less likely to recognize or act against exploitation. In contrast, users with more resources or digital literacy could better safeguard their data, exacerbating existing power imbalances.
Footnotes
{{reflist}}
References
{{refbegin}}
- {{cite journal |last1=Baack |first1=Stefan |title=Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism |journal=Big Data & Society |date=27 December 2015 |volume=2 |issue=2 |pages=205395171559463 |doi=10.1177/2053951715594634 |s2cid=55542891 |doi-access=free }}
- {{cite book |last1=Davis |first1=Kord |last2=Patterson |first2=Doug |year=2012 |title=Ethics of Big Data |publisher=O'Reilly Media Inc. |isbn=9781449311797 }}
- {{cite journal |last1=de Jong-Chen |first1=Jing |year=2015 |title=Data Sovereignty, Cybersecurity, and Challenges for Globalization |journal=Georgetown Journal of International Affairs |pages=112–122 |id={{ProQuest|1832800533}} }}
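- {{cite web |last1=Hoskins |first1=Andrew |title=Digital Memory Studies |url=http://www.memorystudies-frankfurt.com/events/digital-memory-studies/ |website=www.memorystudies-frankfurt.com |date=November 4, 2014 |access-date=2017-11-28 }}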
- {{cite journal |last1=Kalin |first1=Ian |title=Open Data Policy Improves Democracy |journal=SAIS Review of International Affairs |date=2014 |volume=34 |issue=1 |pages=59–70 |id={{ProQuest|1552151732}} |doi=10.1353/sais.2014.0006 |s2cid=154068669 }}
- {{cite book |last1=Kitchin |first1=Rob |year=2014 |title=The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences |publisher=SAGE Publications |edition=Kindle |pages=165–183 }}
- {{cite journal |last1=Kostkova |first1=Patty |last2=Brewer |first2=Helen |last3=de Lusignan |first3=Simon |last4=Fottrell |first4=Edward |last5=Goldacre |first5=Ben |last6=Hart |first6=Graham |last7=Koczan |first7=Phil |last8=Knight |first8=Peter |last9=Marsolier |first9=Corinne |last10=McKendry |first10=Rachel A. |last11=Ross |first11=Emma |last12=Sasse |first12=Angela |last13=Sullivan |first13=Ralph |last14=Chaytor |first14=Sarah |last15=Stevenson |first15=Olivia |last16=Velho |first16=Raquel |last17=Tooke |first17=John |title=Who Owns the Data? Open Data for Healthcare |journal=Frontiers in Public Health |date=17 February 2016 |volume=4 |page=7 |doi=10.3389/fpubh.2016.00007 |pmid=26925395 |pmc=4756607 |doi-access=free }}
- {{cite book |last1=Mayer-Schönberger |first1=Viktor |last2=Cukier |first2=Kenneth |year=2013 |title=Big Data: A Revolution that Will Transform how We Live, Work, and Think |publisher=Houghton Mifflin Harcourt |isbn=9780544002692 }}
- {{cite journal |last1=Richards |first1=Neil M. |last2=King |first2=Jonathan |title=Big Data Ethics |journal=Wake Forest Law Review |date=19 May 2014 |ssrn=2384174 }}
- {{cite journal |last1=Walker |first1=Robert |title=Note – The Right to Be Forgotten |journal=Hastings Law Journal |date=1 December 2012 |volume=64 |issue=2 |pages=257 |url=https://repository.uchastings.edu/hastings_law_journal/vol64/iss2/6/ }}
- {{cite journal |last1=Zwitter |first1=Andrej |title=Big Data ethics |journal=Big Data & Society |date=10 July 2014 |volume=1 |issue=2 |pages=205395171455925 |doi=10.1177/2053951714559253 |s2cid=54923673 |doi-access=free }}
- {{cite news |title=Data workers of the world, unite |url=https://www.economist.com/the-world-if/2018/07/07/data-workers-of-the-world-unite |newspaper=The Economist |date=7 July 2018 }}
- {{cite journal |last1=Kruse |first1=Clemens Scott |last2=Goswamy |first2=Rishi |last3=Raval |first3=Yesha |last4=Marawi |first4=Sarah |title=Challenges and Opportunities of Big Data in Health Care: A Systematic Review |journal=JMIR Medical Informatics |date=21 November 2016 |volume=4 |issue=4 |pages=e38 |doi=10.2196/medinform.5359 |pmid=27872036 |pmc=5138448 |doi-access=free }}
{{refend}}