Software Heritage

{{Short description|Public initiative for software archival}}

{{Infobox organization

| image =

| image_size =

| alt =

| caption =

| logo = Software-heritage-logo-title.svg

| logo_size =

| logo_alt = Software Heritage logo

| logo_caption =

| abbreviation =

| motto =

| formation = {{start date and age|2016|06|30}}

| founder = Roberto Di{{nbsp}}Cosmo,
Stefano Zacchiroli

| founding_location =

| type = Non{{nbhyph}}profit

| status =

| purpose =

| headquarters = Inria

| location = Rocquencourt, France

| coords =

| services =

| products =

| methods =

| fields =

| leader_title =

| leader_name =

| leader_title2 = Scientific Advisors

| leader_name2 = {{unbulleted list|Gérard Berry|Jean-François Abramatic|Julia Lawall|Serge Abiteboul}}

| affiliations = Inria

| staff = 13

| slogan =

| mission =

| website = {{URL|https://softwareheritage.org}}

}}

Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria{{nnbsp}}{{cite web|title=Collect, organise, preserve and share the Software Heritage of mankind|url=https://www.softwareheritage.org/wp-content/uploads/2016/06/PressReleasePressKit-2016-06-30.en_.pdf|website=Software Heritage|accessdate=26 July 2016|date=30 June 2016}} and is supported by UNESCO.{{cite web|last1=UNESCO|title=Software Heritage|date=14 November 2019 |url=https://en.unesco.org/softwareheritage|accessdate=2 November 2020}}{{cite news|last1=Brown|first1=Paul|title=Software Heritage: Creating a safe haven for software|url=http://boingboing.net/2016/06/30/software-heritage-creating-a.html|accessdate=26 July 2016|work=Boing Boing|date=30 June 2016}}{{cite news|last1=Jost|first1=Clémence|title=Open source: lancement de Software Heritage, la plus grande bibliothèque de codes source de la planète|url=http://www.archimag.com/demat-cloud/2016/07/01/open-source-software-heritage-archive-codes-source|accessdate=27 July 2016|work=Archimag|date=1 July 2016}} The project itself is structured as a non{{nbhyph}}profit multi{{nbhyph}}stakeholder initiative.

Overview

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.{{cite news|last1=Abramatic|first1=Jean-François|last2=Di Cosmo|first2=Roberto|last3=Zacchiroli|first3=Stefano|title=Building the Universal Archive of Source Code|url=https://cacm.acm.org/magazines/2018/10/231366-building-the-universal-archive-of-source-code/fulltext|accessdate=2 November 2020|work=Communications of the ACM|date=1 October 2018}}

Software source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com or Bitbucket, and package archives, such as npm or PyPI. It is then ingested into a Merkle DAG, the data structure at the core of the archive.{{cite web|title=Software Heritage Archive|url=http://archive.softwareheritage.org|accessdate=2 November 2020}} Each artifact in the archive is associated with an identifier called an SWHID.{{cite web|title=Software Heritage Persistent Identifiers|url=https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html|website=Software Heritage|accessdate=2 November 2020}} In 2023, the expansion of SWHID was changed from Software Heritage identifier to software hash identifier.
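As an illustration of how an identifier can be derived from an artifact itself, the following minimal Python sketch computes the SWHID of a single file's content. It assumes the identifier scheme described in the Software Heritage documentation, in which a content identifier has the form <code>swh:1:cnt:</code> followed by a SHA-1 digest computed in the same way as a Git blob hash. The function name <code>content_swhid</code> is hypothetical and is not part of any Software Heritage library. Other object types, such as directories, revisions, releases and snapshots, are not covered.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's raw bytes (hypothetical helper).

    Assumption: content identifiers hash the bytes the way Git hashes a
    blob, i.e. SHA-1 over the header "blob <length>\\0" followed by the data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file has the well-known Git blob hash e69de29b...
    print(content_swhid(b""))
```

Because the identifier depends only on the bytes of the file, two copies of the same file found on different hosting platforms map to the same identifier. This is the property that lets a Merkle DAG store each distinct object only once.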

To increase the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018. As of October 2020, it had been joined by ENEA{{cite web|title=At ENEA the first institutional mirror of Software Heritage|url=https://www.enea.it/en/news-enea/news/technology-at-enea-the-first-european-institutional-mirror-of-software-heritage|website=ENEA|accessdate=2 November 2020|archive-date=16 November 2020|archive-url=https://web.archive.org/web/20201116125254/https://www.enea.it/en/news-enea/news/technology-at-enea-the-first-european-institutional-mirror-of-software-heritage|url-status=dead}} and FossID.{{cite web|title=FossID establishes first independent mirror of world's largest source code archive|url=https://fossid.com/2018/12/06/fossid-establishes-first-independent-mirror-of-worlds-largest-source-code-archive/|website=FossID|date=6 December 2018|accessdate=2 November 2020|archive-date=23 September 2020|archive-url=https://web.archive.org/web/20200923055251/https://fossid.com/2018/12/06/fossid-establishes-first-independent-mirror-of-worlds-largest-source-code-archive/|url-status=dead}}

History

Development of Software Heritage began at Inria under the direction of computer scientists Roberto Di Cosmo and Stefano Zacchiroli in early 2015,{{cite news|last1=Moody|first1=Glyn|title=Software Heritage, the "Library of Alexandria of software," launches today|url=http://arstechnica.co.uk/business/2016/06/software-heritage-the-library-of-alexandria-of-software-launches-today/|accessdate=26 July 2016|work=Ars Technica|date=30 June 2016}} and the project was officially announced to the public on June 30, 2016.{{cite news|last1=Brogan|first1=Jacob|title=Introducing Software Heritage, the Library of Alexandria for Code|url=http://www.slate.com/blogs/future_tense/2016/06/30/software_heritage_from_inria_wants_to_preserve_old_versions_of_computer.html|accessdate=26 July 2016|work=Slate|date=30 June 2016}}

In 2017 Inria signed an agreement with UNESCO for the long-term preservation of software source code and for making it widely available, in particular through the Software Heritage initiative.

{{cite press release

| author = UNESCO

| title = Discours de la Directrice générale de l'UNESCO, Irina Bokova, à l'occasion de la signature de l'accord entre l'UNESCO et INRIA portant sur la préservation et le partage du patrimoine logiciel

| date = 3 April 2017

| publisher = UNESCO

| location = Paris, France

| url = https://unesdoc.unesco.org/ark:/48223/pf0000247817

| access-date = 2020-11-03

}} Bokova, IG, Director-General, 2009{{endash}}2017.

In June 2018, the Software Heritage Archive was opened at UNESCO headquarters.

On July 4, 2018, Software Heritage was included in the French National Plan for Open Science.{{cite web|title=National Plan for Open Science|url=https://cache.media.enseignementsup-recherche.gouv.fr/file/Recherche/50/1/SO_A4_2018_EN_01_leger_982501.pdf|website=Ouvrir La Science|accessdate=2 November 2020|archive-date=1 July 2021|archive-url=https://web.archive.org/web/20210701104417/https://cache.media.enseignementsup-recherche.gouv.fr/file/Recherche/50/1/SO_A4_2018_EN_01_leger_982501.pdf|url-status=dead}}

In October 2018, the strategy and vision underlying the mission of Software Heritage were published in Communications of the ACM.

In November 2018, a group of forty international experts met at the invitation of Inria and UNESCO,

{{cite press release

| title = Experts call for greater recognition of software source code as heritage for sustainable development

| date = 16 November 2018

| publisher = UNESCO

| location = Paris, France

| url = https://en.unesco.org/news/experts-call-greater-recognition-software-source-code-heritage-sustainable-development

| access-date = 2 November 2020

}}

which led to the publication in February 2019 of Paris Call: Software Source Code as Heritage for Sustainable Development.{{cite web|title=Paris Call on software source code as heritage for sustainable development|url=https://en.unesco.org/foss/paris-call-software-source-code|location= Paris |publisher=UNESCO |date=February 2019|accessdate=2 November 2020}}

In November 2019, Inria signed an agreement with GitHub to improve the archival process for GitHub-hosted projects in the Software Heritage archive.{{cite web|title=GitHub Archive Program|url=https://archiveprogram.github.com/|date=November 2019|accessdate=2 November 2020}}

As of October 2020, Software Heritage’s repository held over 143 million software projects in an archive of over 9.1 billion unique source files.

Funding

Software Heritage is a non-profit organization funded largely by donations from sponsors, which include private companies, public bodies and academic institutions.{{cite web|title=Software Heritage Sponsors|url=https://www.softwareheritage.org/support/sponsors|accessdate=2 November 2020}}

Software Heritage also seeks funding for third parties interested in contributing to its mission. A grant from NLnet{{cite web|title=NLNet Software Heritage grant|url=https://nlnet.nl/project/SoftwareHeritage/|accessdate=2 November 2020}} funded work by Octobus{{cite web|title=Augmenting Software Heritage archiving capabilities|url=https://octobus.net/blog/2020-03-24-swh-partnership.html|accessdate=2 November 2020}} and Tweag{{cite web|title=Long-term reproducibility with Nix and Software HERITAGE|url=https://www.tweag.io/blog/2020-06-18-software-heritage|accessdate=2 November 2020}} that led to the rescue of 250,000 Mercurial repositories phased out from Bitbucket.{{cite web|title=Announcing the Mercurial public Bitbucket archive|url=https://octobus.net/blog/2020-08-05-bitbucket-public-archive.html|accessdate=2 November 2020}}

A grant from the Alfred P. Sloan Foundation funds experts to develop new connectors that expand the coverage of the Software Heritage Archive.{{cite web|author=Sloan Foundation|title=Excited to support Software Heritage|url=https://twitter.com/SloanFoundation/status/1263494778748010496|accessdate=2 November 2020}}

Development and community

The Software Heritage infrastructure is built transparently and collaboratively. All the software developed in the process is released as free and open-source software.{{cite web|title=Software Heritage licensing|url=https://www.softwareheritage.org/community/developers/|accessdate=25 February 2021}} An ambassador program was announced in December 2020 with the stated goal of growing the community of users and contributors.{{cite web|title=Software Heritage Ambassadors|url=https://www.softwareheritage.org/community/ambassadors/|accessdate=25 February 2021}}

Awards

In 2016, Software Heritage received the best community project award at the Paris Open Source Summit.{{Cite web |url=http://www.lesacteursdulibre.com/ |title=Les Acteurs du Libre - Précédents Lauréats |access-date=8 May 2020 |archive-date=18 January 2019 |archive-url=https://web.archive.org/web/20190118112843/http://www.lesacteursdulibre.com/ |url-status=bot: unknown }}{{cite web |title=Paris Open Source Summit 2016 : Prix Acteurs du Libre : et les gagnants sont... |url=https://www.programmez.com/actualites/paris-open-source-summit-2016-prix-acteurs-du-libre-et-les-gagnants-sont-25126 |website=Programmez! |accessdate=28 June 2019 |language=fr |date=17 November 2016}}

In 2019, Software Heritage received the Academic Initiative award from Pôle Systematic.{{cite tweet |user=Pole_Systematic |number=1144308178420719616 |date=27 June 2019|title=Convention @Pole_Systematic le Trophée Prix Initiative académique est remis @SWHeritage. }}

References

{{Reflist}}