reproducible builds

{{Short description|Process in computer science}}

[[File:Reproducible Builds project logo.svg|thumb|Logo of the Reproducible Builds project]]

Reproducible builds, also known as deterministic compilation, is a process of compiling software which ensures the resulting binary code can be reproduced. Source code compiled using deterministic compilation will always output the same binary.{{Cite web |url=https://reproducible-builds.org |title=reproducible-builds.org |website=reproducible-builds.org |archive-url=https://web.archive.org/web/20160520123008/https://reproducible-builds.org/ |archive-date=20 May 2016 |url-status=live |access-date=22 August 2016 |quote=Reproducible builds are a set of software development practices which create a verifiable path from human readable source code to the binary code used by computers....build system needs to be made entirely deterministic: transforming a given source must always create the same result. }}{{cite journal |last1=Lamb |first1=Chris |last2=Zacchiroli |first2=Stefano |date=March 2022 |title=Reproducible Builds: Increasing the Integrity of Software Supply Chains |url=https://hal.science/hal-03196519/ |journal=IEEE Software |volume=39 |issue=2 |pages=62–70 |doi=10.1109/MS.2021.3073045 |s2cid=233219473 |access-date=26 March 2023|arxiv=2104.06020 }}{{Cite web |url=http://www.securityweek.com/establishing-correspondence-between-application-and-its-source-code |title=Establishing Correspondence Between an Application and its Source Code {{!}} SecurityWeek.com |last=Ratliff |first=Emily |date=4 April 2016 |website=www.securityweek.com |publisher=SecurityWeek |archive-url=https://web.archive.org/web/20160920014341/http://www.securityweek.com/establishing-correspondence-between-application-and-its-source-code |archive-date=20 September 2016 |url-status=live |access-date=22 August 2016}}

Reproducible builds can act as part of a chain of trust: the source code can be signed, and deterministic compilation can prove that the binary was compiled from that trusted source code. Verified reproducible builds provide a strong countermeasure against attacks in which binaries do not match their source code, for example because an attacker has inserted malicious code into a binary. Such attacks are realistic: attackers sometimes tamper with binaries rather than source code, either because they can only change the distributed binary or because developers normally review and modify the source code, so a change confined to the binary is more likely to evade detection. In a survey of 17 experts, reproducible builds received a very high utility rating from 58.8% of participants, but also a high cost rating from 70.6%.{{Cite journal |title=Taxonomy of Attacks on Open-Source Software Supply Chains |last1=Ladisa |first1=Piergiorgio |last2=Plate |first2=Henrik |last3=Martinez |first3=Matias |last4=Barais |first4=Olivier |date=19 April 2022|doi=10.1109/SP46215.2023.00010 |doi-broken-date=1 November 2024 |arxiv=2204.04008 }} Various efforts are being made to modify software development tools to reduce these costs.
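
The verification step can be illustrated with a minimal sketch, assuming that a distributor and an independent rebuilder have each produced an artifact from the same source. The function names and file paths here are hypothetical and not part of any project's tooling; the sketch only compares SHA-256 digests of the two files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(official: Path, rebuilt: Path) -> bool:
    """True when the two artifacts have identical SHA-256 digests."""
    return sha256_of(official) == sha256_of(rebuilt)
```

A mismatch does not by itself show tampering; it may only mean that the build is not yet reproducible in the rebuilder's environment.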

Methods

For the compilation process to be deterministic, the input to the compiler must be the same, regardless of the build environment used. This typically involves normalizing variables that may change, such as order of input files, timestamps, locales, and paths.
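
As an illustration of this kind of normalization, the following sketch packs files into a tar archive with a fixed member order, a fixed timestamp, and no builder-specific ownership. It uses Python's standard <code>tarfile</code> module; the function name <code>deterministic_tar</code> and the chosen defaults are assumptions made for the example, and locale and path normalization are not covered.

```python
import io
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes], mtime: int = 0) -> bytes:
    """Pack in-memory files into a tar archive with normalized metadata."""
    buffer = io.BytesIO()
    # USTAR avoids PAX extended headers, keeping the header layout simple
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # fixed order, independent of insertion order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime        # fixed timestamp instead of the build time
            info.uid = info.gid = 0   # no builder-specific numeric ownership
            info.uname = info.gname = ""
            info.mode = 0o644         # fixed permissions
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Calling the function twice with the same contents, even when the mapping was populated in a different order, yields byte-identical archives.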

Additionally, the compiler itself must not introduce non-determinism. This can happen when it uses hash tables with a random hash seed, or when its output depends on the addresses of variables, which vary between runs because of address space layout randomization (ASLR).
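
The hash seed effect can be observed in any tool written in a language with randomized hashing. The sketch below uses Python as a stand-in for such a tool: it runs a child interpreter whose output depends on set iteration order, with the seed pinned through the <code>PYTHONHASHSEED</code> environment variable. The helper name <code>run_with_seed</code> and the child program are assumptions made for the example.

```python
import os
import subprocess
import sys

# Child program whose output depends on the iteration order of a set of strings
CHILD = "print(list({'alpha', 'beta', 'gamma', 'delta'}))"


def run_with_seed(seed: str) -> str:
    """Run a child interpreter with a fixed PYTHONHASHSEED and return its output."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Two runs with the same seed print the same order, whereas runs with different seeds, or with no seed set, may not. Pinning the seed removes this source of variation, although sorting before output is usually the more robust fix.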

Build systems, such as Bazel and Gitian,{{Cite web|url=https://gitian.org/|title=Gitian: a secure software distribution method|website=gitian.org|language=en|access-date=2018-01-10}} can be used to automate deterministic build processes.

History

The GNU Project used reproducible builds in the early 1990s. Changelogs from 1992 indicate the ongoing effort.{{Cite mailing list |url=https://lists.reproducible-builds.org/pipermail/rb-general/2017-January/000309.html |title=SOURCE_PREFIX_MAP and Occam's Razor |date=2017-01-24 |mailing-list=rb-general |last=Gilmore |first=John |author-link=John Gilmore (activist)}}

One of the older{{cite web | url=https://github.com/devrandom/gitian-builder/blob/master/LICENSE | title=LICENSE-file of the Gitian-Project | website=GitHub | access-date=2019-12-03}} projects to promote reproducible builds is the Bitcoin project with [https://gitian.org/ Gitian]. Later, in 2013, the Tor project started using Gitian for its reproducible builds.[https://blog.torproject.org/blog/deterministic-builds-part-two-technical-details Deterministic Builds Part Two: Technical Details.] October 04, 2013

From 2011, a reproducible Java build system was developed for DirectDemocracyP2P, a decentralized peer-to-peer FOSS project.{{cite web | url=https://github.com/ddp2p/DDP2P | title=DDP2P | website=GitHub | date=2011}} The concepts behind applying the system to support recommendations of automatic updates were first presented in April 2013 at Decentralized Coordination. Alhamed, Khalid, et al. "{{cite web | url=https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=a309bfd0040276586f2d3657022498aa600dfea5 | website=Citeseer | title=Security by Decentralized Certification of Automatic-Updates for Open Source Software controlled by Volunteers}}.", Proceedings of Decentralized Coordination. pp 40-59, Lulu Publisher, April 6, 2013. Silaghi, M. C., Alhamed, K., Dhannoon, O., Qin, S., Vishen, R., Knowles, R., ... & Hirayama, K. (2013, September). DirectDemocracyP2P—decentralized deliberative petition drives—. In IEEE P2P 2013 Proceedings (pp. 1-2). IEEE. A treatise focusing on the implementation details of the reproducible Java compilation tool itself was published in 2015. Silaghi, M., Alhamed, K., & Stansifer, R. (2015, December). Java tool extensions for supporting multiple recommenders and distributed bundles. In 2015 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 722-725). IEEE.

In July 2013, the Debian project started implementing reproducible builds across its entire package archive.{{Cite web|url=http://penta.debconf.org/dc13_schedule/events/1063.en.html|title=Reproducible Builds talk in Debian|date=21 September 2014 }}{{Cite web|url=https://wiki.debian.org/ReproducibleBuilds/History|title=Reproducible Builds history}} By July 2017, more than 90% of the packages in the repository had been shown to build reproducibly.{{Cite news|url=https://www.golem.de/news/linux-distributionen-mehr-als-90-prozent-der-debian-pakete-reproduzierbar-1707-129094.html|title=Linux-Distributionen: Mehr als 90 Prozent der Debian-Pakete reproduzierbar - Golem.de|date=2017-07-24|access-date=2018-10-30|language=de-DE}}

In November 2018, the Reproducible Builds project joined the Software Freedom Conservancy.{{ cite web | url=https://reproducible-builds.org/news/2018/11/08/reproducible-builds-joins-software-freedom-concervancy/ | title = Reproducible Builds joins the Software Freedom Conservancy | access-date=2018-12-15 }}

F-Droid uses reproducible builds to provide a guarantee that the distributed APKs use the claimed free source code.{{cite web|url=https://f-droid.org/docs/Reproducible_Builds/|title=Reproducible Builds|publisher=F-Droid}}

The Tails portable operating system uses reproducible builds and explains to others how to verify their distribution.{{cite web|url=https://tails.boum.org/contribute/build/reproducible/|title=Verifying a Tails image for reproducibility|publisher=Tails}}

In June 2021, NixOS claimed 100% reproducible builds for its minimal ISO releases.{{Cite web|date=2021-06-20|title=Nixos-unstable's iso_minimal.x86_64-linux is 100% reproducible!|url=https://discourse.nixos.org/t/nixos-unstable-s-iso-minimal-x86-64-linux-is-100-reproducible/13723|access-date=2021-06-21|website=NixOS Discourse|language=en}}

{{As of|2020|05}}, Arch Linux is working on making all official packages reproducible.{{cite web |title=ArchWiki - Reproducible Builds |url=https://wiki.archlinux.org/title/Reproducible_builds}}

{{As of|2025|03}}, Debian live images for bookworm are reproducible.{{cite web |last1=Clobus |first1=Roland |title=Irregular status update about reproducible Debian live ISO images |url=https://lists.reproducible-builds.org/pipermail/rb-general/2025-March/003675.html |access-date=26 March 2025 |date=19 March 2025}}

Challenges

According to the Reproducible Builds project, timestamps are "the biggest source of reproducibility issues. Many build tools record the current date and time... and most archive formats will happily record modification times on top of their own timestamps."{{Cite web|title=Timestamps|url=https://reproducible-builds.org/docs/timestamps/|access-date=2022-04-16|website=Reproducible builds|language=en}} They recommend that "it is better to use a date that is relevant to the source code instead of the build: old software can always be built later" if it is reproducible. They identify several ways to modify build processes to do this:

  • Set the {{Mono|SOURCE_DATE_EPOCH}} environment variable to a number of seconds since January 1, 1970 (UTC), derived from the source code, such as the date of the latest changelog entry or commit. Tools that support this environment variable will use its value (when set) instead of the current date and time.
  • Post-process output to remove timestamps or normalize them. The tool strip-nondeterminism can often help do this.
  • Use a library like libfaketime to intercept requests for the current time of day and provide a controlled response.
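
The first approach can be sketched as follows for a build tool written in Python. The helper name <code>build_time</code> is an assumption made for the example; the only behavior taken from the description above is that the value of {{Mono|SOURCE_DATE_EPOCH}}, when set, replaces the current time.

```python
import os
import time
from datetime import datetime, timezone


def build_time() -> datetime:
    """Return the timestamp to embed in build output, in UTC."""
    raw = os.environ.get("SOURCE_DATE_EPOCH")
    # Fall back to the current time only when the variable is not set
    seconds = int(raw) if raw is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Formatting the result in UTC rather than local time also keeps the embedded date independent of the builder's time zone.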

In some cases, other changes must be made to make a build process reproducible. For example, some data structures do not guarantee a stable order in each execution. A typical solution is to modify the build process so that output from those structures is sorted.{{Cite web|title=Stable order for outputs|url=https://reproducible-builds.org/docs/stable-outputs/|access-date=2022-04-16|website=Reproducible builds|language=en}}
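
A minimal sketch of sorted output, assuming a hypothetical build step that writes a JSON manifest from an unordered set of dependency names and a dictionary of options (the function name and manifest layout are illustrative only):

```python
import json
from typing import Dict, Set


def render_manifest(dependencies: Set[str], options: Dict[str, str]) -> str:
    """Serialize build metadata with a stable ordering of all entries."""
    manifest = {
        "dependencies": sorted(dependencies),  # sets have no guaranteed order
        "options": options,
    }
    # sort_keys makes the key order independent of dictionary insertion order
    return json.dumps(manifest, sort_keys=True, indent=2)
```

The same inputs then produce the same text regardless of the order in which they were collected.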

References

{{reflist}}