Open data

{{distinguish|Open file format}}

{{Short description|Openly accessible data}}

{{Use dmy dates|date=December 2015}}


Open data are data that are openly accessible, exploitable, editable and shareable by anyone for any purpose. Open data are generally licensed under an open license.{{Cite web|title=What is open?|url=https://okfn.org/opendata/|access-date=2022-03-22|website=okfn.org|language=en-gb}}{{Cite web|title=Open Definition 2.1 - Open Definition - Defining Open in Open Data, Open Content and Open Knowledge|url=https://opendefinition.org/od/2.1/en/|access-date=2022-03-22|website=opendefinition.org}}{{cite book |last1=Auer |first1=S. R. |title=The Semantic Web |last2=Bizer |first2=C. |last3=Kobilarov |first3=G. |last4=Lehmann |first4=J. |last5=Cyganiak |first5=R. |last6=Ives |first6=Z. |year=2007 |isbn=978-3-540-76297-3 |series=Lecture Notes in Computer Science |volume=4825 |pages=722–735 |chapter=DBpedia: A Nucleus for a Web of Open Data |doi=10.1007/978-3-540-76298-0_52|s2cid=7278297 }}

The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, open-source hardware, open content, open specifications, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. The growth of the open data movement is paralleled by a rise in intellectual property rights.{{cite book|title=The Data Revolution|last=Kitchin|first=Rob|publisher=Sage|year=2014|isbn=978-1-4462-8748-4|location=London|pages=49}} The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives Data.gov, Data.gov.uk and Data.gov.in.

Open data can be linked data—referred to as linked open data.

One of the most important forms of open data is open government data (OGD), which is open data created by ruling government institutions. Open government data matters because it is part of citizens' everyday lives, down to the most routine and mundane tasks that are seemingly far removed from government.{{Citation needed|date=February 2025}}

The abbreviation {{nowrap|FAIR/O data}} is sometimes used to indicate that the dataset or database in question complies with the principles of FAIR data and carries an explicit data‑capable open license.

Overview

The concept of open data is not new, but a formalized definition is relatively new. Open data as a phenomenon denotes that governmental data should be available to anyone with a possibility of redistribution in any form without any copyright restriction.{{Cite journal|last=Kassen|first=Maxat|date=2013-10-01|title=A promising phenomenon of open data: A case study of the Chicago open data project|journal=Government Information Quarterly|volume=30|issue=4|pages=508–513|doi=10.1016/j.giq.2013.05.012|issn=0740-624X}} Another definition is the Open Definition, which can be summarized as "a piece of data is open if anyone is free to use, reuse, and redistribute it—subject only, at most, to the requirement to attribute and/or share-alike." (See the [http://opendefinition.org/ Open Definition home page] and the [http://opendefinition.org/okd/ full Open Definition].) Other definitions, including the Open Data Institute's "open data is data that anyone can access, use or share," have an accessible short version of the definition but refer to the formal definition.{{Cite web|title=What is 'open data' and why should we care? – The ODI|date=3 November 2017 |url=https://theodi.org/article/what-is-open-data-and-why-should-we-care/|access-date=2021-09-01|language=en-GB}} Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity data.
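The Open Definition summary quoted above can be read as a simple rule: data is open if the only conditions attached to its use, reuse, and redistribution are, at most, attribution and share-alike. The following is a minimal sketch in Python of that reading. It assumes a hypothetical representation of a license as a list of condition labels; the labels and the function name `is_open` are illustrative and are not part of the Open Definition or of any official vocabulary.

```python
# Illustrative only: these condition labels are hypothetical,
# not an official vocabulary from the Open Definition.
ALLOWED_CONDITIONS = frozenset({"attribution", "share-alike"})


def is_open(conditions):
    """Return True if every condition placed on use, reuse, and
    redistribution is, at most, attribution and/or share-alike."""
    return set(conditions) <= ALLOWED_CONDITIONS


print(is_open([]))                                 # no conditions at all
print(is_open(["attribution", "share-alike"]))     # only permitted conditions
print(is_open(["attribution", "non-commercial"]))  # adds another restriction
```

Under this reading, the first two calls report an open dataset and the third does not, because a non-commercial clause goes beyond attribution and share-alike. A real assessment would rely on the full text of the license and of the Open Definition rather than on labels like these.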

A major barrier to the open data movement is the commercial value of data. Access to, or re-use of, data is often controlled by public or private organizations. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions detract from the common good and that data should be available without restrictions or fees.{{Citation needed|date=February 2025}} There are many other, smaller barriers as well.[https://doi.org/10.1098/rspb.2022.1113 Gomes, D. G., Pottier, P., Crystal-Ornelas, R., Hudgins, E. J., Foroughirad, V., Sánchez-Reyes, L. L., ... & Gaynor, K. M. (2022). Why don't we share data and code? Perceived barriers and benefits to public archiving practices. Proceedings of the Royal Society B, 289(1987), 20221113.]

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use, instead presuming that not asserting copyright enters the data into the public domain. For example, many scientists do not consider the data published with their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. The lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty, it is possible for public or private organizations to aggregate such data, claim that it is protected by copyright, and then resell it.{{Citation needed|date=February 2025}}

Major sources


Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.

= In science =

{{main|Open scientific data}}

The concept of open access to scientific data was established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958.{{cite book|author=Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council|title=Earth Observations from Space: The First 50 Years of Scientific Achievements|year=2008|publisher=The National Academies Press|isbn=978-0-309-11095-2|page=6|url=http://books.nap.edu/openbook.php?record_id=11991&page=6 |access-date=2010-11-24|doi=10.17226/11991}} The International Council of Scientific Unions (now the International Council for Science) oversees several World Data Centres with the mission to minimize the risk of data loss and to maximize data accessibility.{{cite web|url=https://www.icsu-wds.org/services/data-sharing-principles|title=Data Sharing Principles|author=World Data System|date=September 27, 2017|website=www.icsu-wds.org|publisher=ICSU-WDS (International Council for Science - World Data Service)|access-date=2017-09-27}}

While the open-science-data movement long predates the Internet, the availability of fast, readily available networking has significantly changed the context of open science data, as publishing or obtaining data has become much less expensive and time-consuming.{{cite journal|last1=Vuong|first1=Quan-Hoang|date=December 12, 2017|title=Open data, open review and open dialogue in making social sciences plausible|url=http://blogs.nature.com/scientificdata/2017/12/12/authors-corner-open-data-open-review-and-open-dialogue-in-making-social-sciences-plausible/|journal=Nature: Scientific Data Updates|arxiv=1712.04801|bibcode=2017arXiv171204801V|access-date=June 30, 2018}}

The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that: "All human genomic sequence information … should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society". (Human Genome Project, 1996. Summary of Principles Agreed Upon at the First International Strategy Meeting on Human Genome Sequencing, Bermuda, 25–28 February 1996.) More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can be used productively within the context of industrial R&D.{{cite journal|title= Open Data Partnerships between Firms and Universities: The Role of Boundary Organizations|journal=Research Policy|volume=44|issue=5|pages=1133–1143|doi=10.1016/j.respol.2014.12.006|year=2015|last1=Perkmann|first1=Markus|last2=Schildt|first2=Henri|doi-access=free|hdl=10044/1/19450|hdl-access=free}}

In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which states that all publicly funded archive data should be made publicly available.[http://www.oecd.org/document/0,2340,en_2649_34487_25998799_1_1_1_1,00.html OECD Declaration on Open Access to publicly funded data] {{webarchive|url=https://web.archive.org/web/20100420102950/http://www.oecd.org/document/0%2C2340%2Cen_2649_34487_25998799_1_1_1_1%2C00.html |date=20 April 2010 }} Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.{{cite journal| title=OECD Principles and Guidelines for Access to Research Data from Public Funding| author1=Pilat, D.| author2=Fukasaku| url=https://www.researchgate.net/publication/220390209_OECD_Principles_and_Guidelines_for_Access_to_Research_Data_from_Public_Funding| journal=Data Science Journal| volume=6| pages=4–11| date=29 June 2007| access-date=31 January 2024| doi=10.2481/dsj.6.OD4| doi-access=free}}

Examples of open data in science:

  • data.uni-muenster.de – Open data about scientific artifacts from the University of Muenster, Germany. Launched in 2011.
  • Dataverse Network Project – archival repository software promoting data sharing, persistent data citation, and reproducible research.{{Cite web |url=http://thedata.org/ |title=Dataverse Network Project |access-date=10 October 2014 |archive-date=9 October 2014 |archive-url=https://web.archive.org/web/20141009185617/http://thedata.org/ |url-status=dead }}
  • linkedscience.org/data – Open scientific datasets encoded as Linked Data. Launched in 2011, ended 2018.{{Cite web|date=2012-10-17|title=Data|url=http://linkedscience.org/data/|url-status=dead|archive-url=https://web.archive.org/web/20121017085238/http://linkedscience.org/data/|archive-date=17 October 2012|access-date=2021-09-01|website=Linked Science}}{{Cite conference|last1=Kauppinen|first1=Tomi|last2=de Espindola|first2=Giovanna Mira|date=2011|title=Linked Open Science—Communicating, Sharing and Evaluating Data, Methods and Results for Executable Papers|url=http://linkedscience.org/linked-open-science-camera-ready-2011-03-28.pdf|conference=International Conference on Computational Science, ICCS 2011|publisher=Procedia Computer Science|volume=4}}
  • systemanaturae.org – Open scientific datasets related to wildlife classified by animal species. Launched in 2015.{{Cite web|title=Home|url=https://www.systemanaturae.org/|access-date=2021-09-01|website=Wildlife DataSets, Animal Population DataSets and Conservation Research Projects, Surveys - Systema Naturae|language=en-CA}}

= In government =

{{see also|Open government|PSI Directive}}There are a range of different arguments for government open data.{{cite conference|last=Gray|first=Jonathan|date=3 September 2014|title=Towards a Genealogy of Open Data|url=https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2605828#|conference=General Conference of the European Consortium for Political Research in Glasgow|doi=10.2139/ssrn.2605828|ssrn=2605828|via=SSRN|url-access=subscription}}{{cite journal|last=Brito|first=Jerry|date=21 October 2007|title=Hack, Mash, & Peer: Crowdsourcing Government Transparency|journal=Columbia Science & Technology Law Review |volume=9 |page=119 |doi=10.2139/SSRN.1023485|ssrn=1023485|s2cid=109457712}} Some advocates say that making government information available to the public as machine readable open data can facilitate government transparency, accountability and public participation. "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny."{{Cite journal|last1=Yu|first1=Harlan|last2=Robinson|first2=David G.|date=2012-02-28|title=The New Ambiguity of 'Open Government'|journal=UCLA Law Review Discourse|volume=59|doi=10.2139/ssrn.2012489|ssrn=2012489|via=Social Science Research Network}} Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data."{{Cite journal|last1=Robinson|first1=David G.|last2=Yu|first2=Harlan|last3=Zeller|first3=William P.|last4=Felten|first4=Edward W.|date=2009-01-01|title=Government Data and the Invisible Hand|journal=Yale Journal of Law & Technology|location=Rochester, NY|volume=11|ssrn=1138083|via=Social Science Research Network}} Open data experts have nuanced the impact that opening government data may have on government transparency and accountability. 
In a widely cited paper, scholars David Robinson and Harlan Yu contend that governments may project a veneer of transparency by publishing machine-readable data that does not actually make government more transparent or accountable.{{Cite web |last=uclalaw |date=2012-08-08 |title=The New Ambiguity of "Open Government" |url=http://www.uclalawreview.org/the-new-ambiguity-of-open-government/ |access-date=2022-03-12 |website=UCLA Law Review |language=en-US}} Drawing from earlier studies on transparency and anticorruption,{{Cite journal |last1=Lindstedt |first1=Catharina |last2=Naurin |first2=Daniel |date=June 2010 |title=Transparency is not Enough: Making Transparency Effective in Reducing Corruption |url=https://journals.sagepub.com/doi/abs/10.1177/0192512110377602 |journal=International Political Science Review |volume=31 |issue=3 |pages=301–322 |doi=10.1177/0192512110377602 |s2cid=154948461 |issn=0192-5121|url-access=subscription }} World Bank political scientist Tiago C. Peixoto extended Yu and Robinson's argument by highlighting a minimal chain of events necessary for open data to lead to accountability:

  1. relevant data is disclosed;
  2. the data is widely disseminated and understood by the public;
  3. the public reacts to the content of the data; and
  4. public officials either respond to the public's reaction or are sanctioned by the public through institutional means.{{Cite web |last=uclalaw |date=2013-05-02 |title=The Uncertain Relationship Between Open Data and Accountability: A Response to Yu and Robinson's The New Ambiguity of "Open Government" |url=http://www.uclalawreview.org/the-uncertain-relationship-between-open-data-and-accountability-a-response-to-yu-and-robinsons-the-new-ambiguity-of-open-government/ |access-date=2022-03-12 |website=UCLA Law Review |language=en-US}}

Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.{{Cite web |title=The Economic Impact of Open Data: Opportunities for value creation in Europe |url=https://data.europa.eu/en/datastories/economic-impact-open-data-opportunities-value-creation-europe |access-date=2022-03-12 |website=data.europa.eu}}

Several national governments have created websites to distribute a portion of the data they collect. Such portals are often conceived as collaborative projects, including at the municipal level of government, that aim to create and organize a culture of open data or open government data.

Additionally, other levels of government have established open data websites. There are many government entities pursuing open data in Canada. Data.gov lists the sites of 40 US states and 46 US cities and counties that provide open data, e.g., the states of Maryland and California{{Cite web|title=California Open Data Portal|url=https://data.ca.gov/|access-date=2019-05-07|website=data.ca.gov}} and New York City.{{Cite web|url=https://opendata.cityofnewyork.us/|title=NYC Open Data|publisher=City of New York|language=en|access-date=2019-05-07}}

At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies,{{Cite web|url=http://data.un.org/|title=UNdata|website=data.un.org|access-date=2019-05-07}} and the World Bank publishes a range of statistical data relating to developing countries.{{Cite web|url=https://data.worldbank.org/|title=World Bank Open Data {{!}} Data|website=data.worldbank.org|access-date=2019-05-07}} The European Commission created two portals for the European Union: the EU Open Data Portal, which gave access to open data from the EU institutions, agencies and other bodies,{{Cite web|url=http://data.europa.eu/|title=Data.europa.eu|access-date=2019-05-07}} and the European Data Portal, which provided datasets from local, regional and national public bodies across Europe.{{Cite web|url=https://data.europa.eu/euodp/en/home|title=Home {{!}} Open Data Portal|website=data.europa.eu|access-date=2019-05-07}} The two portals were consolidated into data.europa.eu on April 21, 2021.

Italy was the first country to release standard processes and guidelines under a Creative Commons license for widespread use in the public administration. The open model is called the Open Data Management Cycle and was adopted in several regions such as Veneto and Umbria.{{cite web|url=http://www.odmc.org/|title=Open Data Management Cycle| language=it}}{{Cite web|url=https://opendataveneto.regione.veneto.it/fare-open-data/conferimento|title=Linee guida per l'ecosistema regionale veneto dei dati aperti (Open Data)| language=it}}{{cite web|url=http://www.regione.umbria.it/documents/18/576921/20150327+DGR+n.371-2015+open+data+-+Allegato+A+MOODUmbria1-1.pdf/9caa5b37-1728-4f43-aa62-a5ee0d971179|title=Modello Operativo Open Data (MOOD) Umbria| language=it}} Major cities such as Reggio Calabria and Genova have also adopted this model.{{citation needed| reason=dead link| date=January 2024}}{{Cite web|url=https://dati.cittametropolitana.genova.it/sites/default/files/documentazione/PTPCT%202020-2022_Allegato%20A1_Linee_guida_opendata-programmatiche-V2.pdf|title=Linee guida programmatiche della Città Metropolitana di Genova| language=it}}

In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data formally adopted by seventeen governments of countries, states and cities during the OGP Global Summit in Mexico.{{cite web|title = The Open Data Charter: A Roadmap for Using a Global Resource|url = http://www.huffingtonpost.com/joel-gurin/the-open-data-charter-a-r_b_8391470.html|website = The Huffington Post|access-date = 2015-10-29|date = 27 October 2015}}

In July 2024, the OECD adopted Creative Commons CC-BY-4.0 licensing for its published data and reports.{{cite web | author = OECD | title = OECD data, publications and analysis become freely accessible — Press release | date = 4 July 2024 | work = Organisation for Economic Co-operation and Development (OECD) | location = Paris, France | url = https://www.oecd.org/en/about/news/press-releases/2024/07/oecd-data-publications-and-analysis-become-freely-accessible.html | access-date = 2024-07-10 }}

= In non-profit organizations =

Many non-profit organizations offer open access to their data, as long as it does not undermine the privacy rights of their users, members or third parties. Unlike for-profit corporations, they do not seek to monetize their data. OpenNWT launched a website offering open data on elections.{{cite web |last1=Green |first1=Arthur C. |title=OpenNWT announces launch of new election information website |url=https://www.myyellowknifenow.com/42874/opennwt-announces-launch-of-new-election-information-website/ |website=My Yellowknife Now |date=17 September 2019 |language=en-CA}} CIAT offers open data to anybody who is willing to conduct big data analytics in order to enhance the benefit of international agricultural research.{{cite web |last1=Oyuela |first1=Andrea |last2=Walmsley |first2=Thea |last3=Walla |first3=Katherine |title=120 Organizations Creating a New Decade for Food |url=https://foodtank.com/news/2019/12/120-organizations-creating-a-new-decade-for-food/ |website=Food Tank |access-date=21 January 2020 |date=30 December 2019}} DBLP, which is owned by the non-profit organization Dagstuhl, offers its database of scientific publications from computer science as open data.{{cite web |title=dblp: How can I download the whole dblp dataset? |url=https://dblp.uni-trier.de/faq/How+can+I+download+the+whole+dblp+dataset |website=dblp.uni-trier.de |publisher=Dagstuhl |access-date=21 January 2020}}

Hospitality exchange services, including Bewelcome, Warm Showers, and CouchSurfing (before it became for-profit) have offered scientists access to their anonymized data for analysis, public research, and publication.{{cite journal |last1=Victor |first1=Patricia |last2=Cornelis |first2=Chris |last3=De Cock |first3=Martine |last4=Herrera-Viedma |first4=Enrique |title=Bilattice-based aggregation operators for gradual trust and distrust |journal=World Scientific Proceedings Series on Computer Engineering and Information Science |date=2010 |pages=505–510 |doi=10.1142/9789814324700_0075 |url=https://biblio.ugent.be/publication/1108551 |publisher=World Scientific|isbn=978-981-4324-69-4 |s2cid=5748283 }}{{cite report |last1=Dandekar |first1=Pranav |title=Analysis & Generative Model for Trust Networks |url=https://snap.stanford.edu/class/cs224w-2010/proj2009/final_report_Dandekar.pdf |work=Stanford Network Analysis Project |publisher=Stanford University}}{{cite journal |last1=Overgoor |first1=Jan |last2=Wulczyn |first2=Ellery |last3=Potts |first3=Christopher |title=Trust Propagation with Mixed-Effects Models |journal=Sixth International AAAI Conference on Weblogs and Social Media |url=https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/viewPaper/4627 |date=20 May 2012}}{{cite book |last1=Lauterbach |first1=Debra |last2=Truong |first2=Hung |last3=Shah |first3=Tanuj |last4=Adamic |first4=Lada |title=2009 International Conference on Computational Science and Engineering |chapter=Surfing a Web of Trust: Reputation and Reciprocity on CouchSurfing.com |date=August 2009 |volume=4 |pages=346–353 |doi=10.1109/CSE.2009.345 |isbn=978-1-4244-5334-4 |s2cid=12869279 }}{{cite book |last1=Tagiew |first1=Rustam |last2=Ignatov |first2=Dmitry. I |last3=Delhibabu |first3=Radhakrishnan |title=2015 IEEE International Conference on Data Mining Workshop (ICDMW) |chapter=Hospitality Exchange Services as a Source of Spatial and Social Data? 
|publisher=IEEE |date=2015 |pages=1125–1130 |doi=10.1109/ICDMW.2015.239 |isbn=978-1-4673-8493-3 |s2cid=8196598}}

Policies and strategies

At a small level, a business or research organization's policies and strategies towards open data will vary, sometimes greatly. One common strategy employed is the use of a data commons. A data commons is an interoperable software and hardware platform that aggregates (or collocates) data, data infrastructure, and data-producing and data-managing applications in order to better allow a community of users to manage, analyze, and share their data with others over both short- and long-term timelines.{{cite web |url=https://www.nsf.gov/geo/geo-data-policies/nsb-0540-1.pdf |title=Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century |author=National Science Foundation |publisher=National Science Foundation |page=23 |date=September 2005 |accessdate=4 January 2022}}{{Cite journal |last1=Grossman |first1=Robert L. |last2=Heath |first2=Allison |last3=Murphy |first3=Mark |last4=Patterson |first4=Maria |last5=Wells |first5=Walt |year=2016 |title=A Case for Data Commons: Toward Data Science as a Service |journal=Computing in Science & Engineering |volume=18 |issue=5 |pages=10–20 |doi=10.1109/MCSE.2016.92 |issn=1521-9615 |pmc=5636009 |pmid=29033693|arxiv=1604.02608 |bibcode=2016CSE....18e..10G }}{{cite web |author=Grossman, R.L. |date=23 April 2019 |title=How Data Commons Can Support Open Science |url=https://sagebionetworks.org/in-the-news/how-data-commons-can-support-open-science/ |url-status=dead |archive-url=https://web.archive.org/web/20190510235946/http://sagebionetworks.org/in-the-news/how-data-commons-can-support-open-science/ |archive-date=2019-05-10 |accessdate=4 January 2022 |work=Sage Bionetworks}} Ideally, this interoperable cyberinfrastructure should be robust enough "to facilitate transitions between stages in the life cycle of a collection" of data and information resources while still being driven by common data models and workspace tools enabling and supporting robust data analysis. 
The policies and strategies underlying a data commons will ideally involve numerous stakeholders, including the data commons service provider, data contributors, and data users.

Grossman et al. suggest six major considerations for a data commons strategy that better enables open data in businesses and research organizations. Such a strategy should address the need for:

  • permanent, persistent digital IDs, which enable access controls for datasets;
  • permanent, discoverable metadata associated with each digital ID;
  • application programming interface (API)-based access, tied to an authentication and authorization service;
  • data portability;
  • data "peering," without access, egress, and ingress charges; and
  • a rationed approach to users computing data over the data commons.
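Three of the considerations listed above (a persistent digital ID, discoverable metadata attached to that ID, and data portability) can be illustrated with a minimal sketch in Python. This is not taken from Grossman et al.; the class `DatasetRecord`, its fields, and the function `export_record` are hypothetical names chosen for illustration, and a randomly generated UUID stands in for a real persistent identifier scheme.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record pairing a digital ID with descriptive metadata."""
    title: str
    license: str
    # Assumption: a random UUID stands in for a real persistent identifier.
    digital_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def export_record(record):
    """Serialize the record to JSON so that it can be moved between systems."""
    return json.dumps(asdict(record), sort_keys=True)


record = DatasetRecord(title="Example survey data", license="CC-BY-4.0")
restored = DatasetRecord(**json.loads(export_record(record)))
print(restored == record)  # the ID and metadata survive the round trip
```

The record is frozen so that the identifier and its metadata cannot be changed after creation, which loosely mirrors the permanence the list calls for. Access control, peering, and rationed compute are outside the scope of this sketch.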

Beyond individual businesses and research centers, and at a more macro level, countries like Germany{{cite book | author = BMI | title = Open-Data-Strategie der Bundesregierung — BMI21030 | trans-title = Open data strategy of the German Federal Government — BMI21030 | language = German | date = 7 July 2021 | publisher = Bundesministerium des Innern, für Bau und Heimat (BMI) | location = Berlin, Germany | url = https://www.bundesregierung.de/resource/blob/975232/1940386/1d269a2ad1b6346fcf60663bdea9c9f8/2021-07-07-open-data-strategie-data.pdf | access-date = 2021-07-26 }} have launched their own official nationwide open data strategies, detailing how data management systems and data commons should be developed, used, and maintained for the greater public good.

Arguments for and against

{{More citations needed section|date=May 2011}}

Opening government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically{{Citation needed|reason=General statement without proof or an example.|date=May 2017}}, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.

Arguments made on behalf of open data include the following:

  • "Data belongs to the human race". Typical examples are genomes, data on organisms, medical science, environmental data following the Aarhus Convention.
  • Public money was used to fund the work, and so it should be universally available.{{Cite web |url=http://www.publictechnology.net/sector/central-gov/dispatch-box-road-open-data |title=On the road to open data, by Ian Manocha |access-date=12 August 2011 |archive-url=https://web.archive.org/web/20120329192423/http://www.publictechnology.net/sector/central-gov/dispatch-box-road-open-data |archive-date=29 March 2012 |url-status=dead }}
  • It was created by or at a government institution (this is common in US National Laboratories and government agencies).
  • Facts cannot legally be copyrighted.
  • Sponsors of research do not get full value unless the resulting data are freely available.
  • Restrictions on data re-use create an anticommons.
  • Data are required for the smooth process of running communal human activities and are an important enabler of socio-economic development (health care, education, economic productivity, etc.).[https://ssrn.com/abstract=2205145 "Big Data for Development: From Information- to Knowledge Societies"], Martin Hilbert (2013), SSRN Scholarly Paper No. ID 2205145. Rochester, NY: Social Science Research Network; https://ssrn.com/abstract=2205145
  • In scientific research, the rate of discovery is accelerated by better access to data.[http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article How to Make the Dream Come True]{{Dead link|date=April 2020 |bot=InternetArchiveBot |fix-attempted=yes }} argues in one research area (Astronomy) that access to open data increases the rate of scientific discovery.[https://doi.org/10.1098/rspb.2024.1515 Gomes Dylan G. E. 2025 How will we prepare for an uncertain future? The value of open data and code for unborn generations facing climate change Proc. R. Soc. B.29220241515]
  • Making data open helps combat "data rot" and ensure that scientific research data are preserved over time.{{cite web|last1=Khodiyar|first1=Varsha|title=Stopping the rot: ensuring continued access to scientific data, irrespective of age|url=http://blog.f1000research.com/2014/05/19/stopping-the-rot/|website=F1000 Research|publisher=F1000|access-date=2015-03-11|ref=20150311Khodiyar|date=19 May 2014}}{{cite journal | vauthors = Magee AF, May MR, Moore BR | title = The dawn of open access to phylogenetic data | journal = PLOS ONE | volume = 9 | issue = 10 | pages = e110268 | date = 24 October 2014 | pmid = 25343725 | pmc = 4208793 | doi = 10.1371/journal.pone.0110268 | bibcode = 2014PLoSO...9k0268M | arxiv = 1405.6623 | doi-access = free }}
  • Statistical literacy benefits from open data. Instructors can use locally relevant data sets to teach statistical concepts to their students.{{cite journal|last1=Rivera|first1=Roberto|last2=Marazzi|first2=Mario|last3=Torres|first3=Pedro|title=Incorporating Open Data Into Introductory Courses in Statistics|journal=Journal of Statistics Education|url=https://www.tandfonline.com/doi/pdf/10.1080/10691898.2019.1669506?needAccess=true&|publisher=Taylor and Francis|access-date=2020-05-07|ref=2020Rob|date=19 June 2019|volume=27|issue=3|pages=198–207|doi=10.1080/10691898.2019.1669506|arxiv=1906.03762|s2cid=182952595}}{{cite book|last1=Rivera|first1=Roberto|title=Principles of Managerial Statistics and Data Science|date=5 February 2020|publisher=Wiley|isbn=978-1119486411}}
  • Allowing open data in the scientific community is essential for increasing the rate of discoveries and recognizing significant patterns.{{cite web| title=Data sharing: An open mind on open data| author=Gewin, V.| url=https://csu-sfsu.primo.exlibrisgroup.com/discovery/fulldisplay?docid=cdi_proquest_miscellaneous_1760855801&context=PC&vid=01CALS_SFR:01CALS_SFR&lang=en&search_scope=Everything_RAPIDO&adaptor=Primo%20Central&tab=Everything&query=any,contains,open%20data&offset=0| publisher=Nature| volume=529| pages=117–119| date=2016| access-date=31 January 2024| doi=10.1038/NJ7584-117A}}

It is generally held that factual data cannot be copyrighted.[http://sciencecommons.org/about/towards Towards a Science Commons] {{Webarchive|url=https://web.archive.org/web/20140714185153/http://sciencecommons.org/about/towards |date=14 July 2014 }} includes an overview of the basis of openness in science data. Publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions.{{Citation needed|reason=Need an example.|date=May 2017}} Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.

Arguments against making all data available as open data include the following:

  • Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
  • Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data and if the data will bring commercial (private) benefits to only a small number of users, the users should reimburse governments for the cost of providing the data.
  • Open data may lead to exploitation of, and rapid publication of results based on, data pertaining to developing countries by rich and well-equipped research institutes, without any further involvement of or benefit to local communities (helicopter research). This is similar to the historical open access to tropical forests, which led to the misappropriation ("Global Pillage") of plant genetic resources from developing countries.{{cite book| title=The Third Revolution: Plant Genetic Resources in Developing Countries and China: Global Village or Global Pillage?| author=Low, A.| url=https://heinonline.org/HOL/LandingPage?handle=hein.journals/itbla6&div=15&id=&page=| publisher=International Trade & Business Law Annual Vol VI| date=2001| access-date=31 January 2024| isbn=9781843140870}}
  • The revenue earned by publishing data can be used to cover the costs of generating and/or disseminating the data, so that the dissemination can continue indefinitely.
  • The revenue earned by publishing data permits non-profit organizations to fund other activities (e.g. learned society publishing supports the society).
  • The government gives specific legitimacy for certain organizations to recover costs (NIST in US, Ordnance Survey in UK).
  • Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.{{Cite book |last1=Zuiderwijk |first1=Anneke |last2=Janssen |first2=Marijn |title=Proceedings of the 15th Annual International Conference on Digital Government Research |chapter=The negative effects of open government data - investigating the dark side of open data |date=2014-06-18 |chapter-url=https://doi.org/10.1145/2612733.2612761 |series=dg.o '14 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=147–152 |doi=10.1145/2612733.2612761 |isbn=978-1-4503-2901-9|s2cid=14440894 }}
  • Collecting, 'cleaning', managing and disseminating data are typically labour- and/or cost-intensive processes – whoever provides these services should receive fair remuneration for them.
  • Sponsors do not get full value unless their data is used appropriately – sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
  • Often, targeted end-users cannot use the data without additional processing (analysis, apps etc.) – if anyone has access to the data, none may have an incentive to invest in the processing required to make data useful (typical examples include biological, medical, and environmental data).
  • There is no control over the secondary use (aggregation) of open data.{{Cite journal|last1=Sharif|first1=Naubahar|last2=Ritter|first2=Waltraut|last3=Davidson|first3=Robert L|last4=Edmunds|first4=Scott C|date=2018-12-31|title=An Open Science 'State of the Art' for Hong Kong: Making Open Research Data Available to Support Hong Kong Innovation Policy|url=https://doi.org/10.17477/JCEA.2018.17.2.200|journal=Journal of Contemporary Eastern Asia|volume=17|issue=2|pages=200–221|doi=10.17477/JCEA.2018.17.2.200}}

The paper entitled "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data"{{Cite book |last1=Kleisarchaki |first1=Sofia |last2=Gürgen |first2=Levent |last3=Mitike Kassa |first3=Yonas |last4=Krystek |first4=Marcin |last5=González Vidal |first5=Daniel |title=2022 18th International Conference on Intelligent Environments (IE) |chapter=Optimization of Soft Mobility Localization with Sustainable Policies and Open Data |date=2022-06-12 |chapter-url=https://ieeexplore.ieee.org/document/9826779 |pages=1–8 |doi=10.1109/IE54923.2022.9826779|isbn=978-1-6654-6934-0 |s2cid=250595935 }} argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities. The authors argue that open data can be used to identify the needs of different areas of a city, develop algorithms that are fair and equitable, and justify the installation of soft mobility resources.

Relation to other open activities

The goals of the Open Data movement are similar to those of other "Open" movements.

  • Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
  • Open specifications are documents describing file types or protocols, where the documents are openly licensed. These specifications are primarily meant to improve interoperability between different software handling the same file types or protocols, although monopolists compelled by law to publish open specifications might make this more difficult.
  • Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
  • Open knowledge. Open Knowledge International argues for openness in a range of issues including, but not limited to, those of open data. It covers (a) data, whether scientific, historical, geographic or otherwise, (b) content such as music, films and books, and (c) government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.{{Cite web |url=http://sciencecommons.org/projects/publishing/open-access-data-protocol/ |title=Protocol for Implementing Open Access Data |access-date=17 April 2009 |archive-url=https://web.archive.org/web/20170130071332/http://sciencecommons.org/projects/publishing/open-access-data-protocol/ |archive-date=30 January 2017 |url-status=dead }}
  • Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.{{citation needed| reason=dead link| date=January 2024}}
  • Open-source software is concerned with the open-source licenses under which computer programs can be distributed and is not normally concerned primarily with data.
  • Open educational resources are freely accessible, openly licensed documents and media that are useful for teaching, learning, and assessing as well as for research purposes.
  • Open research/open science/open science data (linked open science) means an approach to open and interconnect scientific assets like data, methods and tools with linked data techniques to enable transparent, reproducible and interdisciplinary research.{{cite journal | last1 = Kauppinen | first1 = T. | last2 = Espindola | first2 = G. M. D. | doi = 10.1016/j.procs.2011.04.076 | title = Linked Open Science-Communicating, Sharing and Evaluating Data, Methods and Results for Executable Papers | journal = Procedia Computer Science | volume = 4 | pages = 726–731 | year = 2011 | doi-access = free }}
  • Open-GLAM (Galleries, Libraries, Archives, and Museums){{Cite web|title=Open GLAM|url=https://meta.wikimedia.org/wiki/Open_GLAM|website=Wikimedia Meta-Wiki}} is an initiative and network that supports exchange and collaboration between cultural institutions that support open access to their digitized collections. The GLAM-Wiki Initiative helps cultural institutions share their openly licensed resources with the world through collaborative projects with experienced Wikipedia editors. Open Heritage Data is associated with Open GLAM, as openly licensed data in the heritage sector is now frequently used in research, publishing, and programming,{{Cite Q|Q111293389}} particularly in the Digital Humanities.

Open Data as commons

= Ideas and definitions =

Formally, the definitions of both Open Data and commons revolve around the concept of shared resources with a low barrier to access.

Substantially, digital commons include Open Data in that they include resources maintained online, such as data.{{cite journal | last1 = Dulong de Rosnay | first1 = Mélanie | last2 = Stalder | first2 = Felix | title = Digital commons | journal = Internet Policy Review | date = 17 December 2020 | volume = 9 | issue = 4 | issn = 2197-6775 | doi = 10.14763/2020.4.1530 | pmid = | s2cid = 240800967 | url = | doi-access = free }} Overall, the operational principles of Open Data show the overlap between Open Data and (digital) commons in practice. Principles of Open Data sometimes differ depending on the type of data under scrutiny.{{cite book | title = Open Data Exposed | last1 = van Loenen | first1 = Bastiaan | last2 = Vancauwenberghe | first2 = Glenn | last3 = Crompvoets | first3 = Joep | last4 = Dalla Corte | first4 = Lorenzo | series = Information Technology and Law Series | date = 2018 | volume = 30 | pages = 1–10 | publisher = T.M.C. Asser Press | issn = 1570-2782 | eissn = 2215-1966 | doi = 10.1007/978-94-6265-261-3_1 | isbn = 978-94-6265-260-6 | url = http://resolver.tudelft.nl/uuid:125434bf-b78f-487b-99a2-4e3c2a786609}} Nonetheless, they overlap to some degree, and their key rationale is the lack of barriers to the re-use of data(sets). Regardless of their origin, principles across types of Open Data hint at the key elements of the definition of commons. These include, for instance, accessibility, re-use, findability, and non-proprietary status. Additionally, although to a lesser extent, the threats and opportunities associated with Open Data and commons are similar. In summary, they revolve around the (risks and) benefits associated with (uncontrolled) use of common resources by a large variety of actors.

= The System =

Both commons and Open Data can be defined by the features of the resources that fit under these concepts, but they can also be defined by the characteristics of the systems their advocates push for. Governance is a focus for both Open Data and commons scholars. The key element that distinguishes commons and Open Data is their difference from (and perhaps opposition to) the dominant market logic as shaped by capitalism. Perhaps it is this feature that emerges in the recent surge of the concept of commons as related to a more social look at digital technologies, in the specific forms of digital and, especially, data commons.

= Real-life case =

The application of open data for societal good has been demonstrated in academic research.{{Cite book |last1=Kleisarchaki |first1=Sofia |last2=Gürgen |first2=Levent |last3=Mitike Kassa |first3=Yonas |last4=Krystek |first4=Marcin |last5=González Vidal |first5=Daniel |title=2022 18th International Conference on Intelligent Environments (IE) |chapter=Optimization of Soft Mobility Localization with Sustainable Policies and Open Data |date=2022-06-01 |chapter-url=https://ieeexplore.ieee.org/document/9826779 |pages=1–8 |doi=10.1109/IE54923.2022.9826779|isbn=978-1-6654-6934-0 |s2cid=250595935 }} The paper "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" uses open data in two ways. First, it uses open data to identify the needs of different areas of a city. For example, it might use data on population density, traffic congestion, and air quality to determine where soft mobility resources, such as bike racks and charging stations for electric vehicles, are most needed. Second, it uses open data to develop algorithms that are fair and equitable. For example, it might use data on the demographics of a city to ensure that soft mobility resources are distributed in a way that is accessible to everyone, regardless of age, disability, or gender. The paper also discusses the challenges of using open data for soft mobility optimization. One challenge is that open data is often incomplete or inaccurate. Another is that it can be difficult to integrate open data from different sources. Despite these challenges, the paper argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities.

An example of the relationship between Open Data and commons, and of how their governance can potentially disrupt the market logic otherwise dominating big data, is a project conducted by Human Ecosystem Relazioni in Bologna (Italy). See: https://www.he-r.it/wp-content/uploads/2017/01/HUB-report-impaginato_v1_small.pdf.

This project aimed at extrapolating and identifying online social relations surrounding “collaboration” in Bologna. Data was collected from social networks and online platforms for citizens' collaboration. The data was then analyzed for content, meaning, location, timeframe, and other variables. Overall, online social relations for collaboration were analyzed based on network theory. The resulting datasets have been made available online as Open Data (aggregated and anonymized); nonetheless, individuals can reclaim all their data. This has been done with the idea of making data into a commons. This project exemplifies the relationship between Open Data and commons, and how they can disrupt the market logic driving big data use, in two ways. First, it shows how such projects, by following the rationale of Open Data, can trigger the creation of effective data commons. The project itself offered different types of support to social network platform users to have content removed. Second, opening data regarding online social network interactions has the potential to significantly reduce the monopolistic power of social network platforms over those data.

Funders' mandates

Several funding bodies that mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):{{Cite web |url=https://mx2.arl.org/Lists/SPARC-OpenData/Message/34.html |title=Canadian Institutes of Health Research (CIHR) draft policy on access to research outputs |access-date=2 November 2006 |archive-date=16 July 2011 |archive-url=https://web.archive.org/web/20110716120043/https://mx2.arl.org/Lists/SPARC-OpenData/Message/34.html |url-status=dead }}

  • to deposit bioinformatics, atomic and molecular coordinate data, and experimental data into the appropriate public database immediately upon publication of research results.
  • to retain original data sets for at least five years after the grant. This applies to all data, whether published or not.

Other bodies promoting the deposition of data and full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project so that they can be checked for third-party usability and then shared.{{Cite journal |url=http://jhsrp.rsmjournals.com/content/early/2013/04/16/1355819613476017.full |title=Europe's "Horizon 2020" science funding programme: How is it shaping up? |year=2013 |doi=10.1177/1355819613476017 |pmid=23595575 |access-date=24 April 2013 |archive-date=23 April 2013 |archive-url=https://web.archive.org/web/20130423074732/http://jhsrp.rsmjournals.com/content/early/2013/04/16/1355819613476017.full |url-status=dead |last1=Galsworthy |first1=M. |last2=McKee |first2=M. |journal=Journal of Health Services Research & Policy |volume=18 |issue=3 |pages=182–185 |pmc=4107840 }}

See also

References

{{reflist}}