GitHub Copilot#Licensing controversy

{{Short description|Artificial intelligence tool}}

{{Distinguish|Microsoft Copilot}}

{{Use dmy dates|date=April 2022}}

{{infobox software

| logo = GitHub Copilot (2025).svg{{!}}class=skin-invert

| logo_upright = 1.25

| developer = GitHub, OpenAI

| released = {{Start date and age|2021|10}}

| operating system = Microsoft Windows, Linux, macOS, Web

| website = {{URL|https://github.com/features/copilot/}}

| latest release version = 1.7.4421

}}

GitHub Copilot is a code completion and automatic programming tool developed by GitHub and OpenAI that assists users of Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs) by autocompleting code.{{cite web |last1=Gershgorn |first1=Dave |date=29 June 2021 |title=GitHub and OpenAI launch a new AI tool that generates its own code |url=https://www.theverge.com/2021/6/29/22555777/github-openai-ai-tool-autocomplete-code |access-date=6 July 2021 |website=The Verge |language=en-US}} Currently available by subscription to individual developers and to businesses, the generative artificial intelligence software was first announced by GitHub on 29 June 2021.{{Cite web |title=GitHub Copilot · Your AI pair programmer |url=https://copilot.github.com/ |access-date=7 April 2022 |website=GitHub Copilot |language=en-US}} Users can choose the large language model used for generation.{{Cite web |last=Warren |first=Tom |date=2024-10-29 |title=GitHub Copilot will support models from Anthropic, Google, and OpenAI |url=https://www.theverge.com/2024/10/29/24282544/github-copilot-multi-model-anthropic-google-open-ai-github-spark-announcement |access-date=2025-01-28 |website=The Verge |language=en}}

History

On 29 June 2021, GitHub announced GitHub Copilot for technical preview in the Visual Studio Code development environment.{{Cite web |date=29 June 2021 |title=Introducing GitHub Copilot: your AI pair programmer |url=https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}} On 27 October 2021, GitHub released the GitHub Copilot Neovim plugin as a public repository.{{Citation |title=Copilot.vim |date=7 April 2022 |url=https://github.com/github/copilot.vim |publisher=GitHub |access-date=7 April 2022}} Two days later, on 29 October 2021, Copilot was released as a plugin on the JetBrains marketplace.{{Cite web |title=GitHub Copilot - IntelliJ IDEs Plugin {{!}} Marketplace |url=https://plugins.jetbrains.com/plugin/17718-github-copilot/versions/stable |access-date=7 April 2022 |website=JetBrains Marketplace}} GitHub announced Copilot's availability for the Visual Studio 2022 IDE on 29 March 2022.{{Cite web |date=29 March 2022 |title=GitHub Copilot now available for Visual Studio 2022 |url=https://github.blog/2022-03-29-github-copilot-now-available-for-visual-studio-2022/ |access-date=7 April 2022 |website=The GitHub Blog |language=en-US}} On 21 June 2022, GitHub announced that Copilot had left "technical preview" and was available as a subscription-based service for individual developers.{{Cite web |date=21 June 2022 |title=GitHub Copilot is generally available to all developers |url=https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers/ |access-date=21 June 2022 |website=The GitHub Blog |language=en-US}}

GitHub Copilot is the evolution of the "Bing Code Search" plugin for Visual Studio 2013, which was a Microsoft Research project released in February 2014.{{Cite web |last=Lardinois |first=Frederic |date=2014-02-17 |title=Microsoft Launches Smart Visual Studio Add-On For Code Snippet Search |url=https://techcrunch.com/2014/02/17/microsoft-launches-smart-visual-studio-add-on-for-code-snippet-search/ |access-date=2023-09-05 |website=TechCrunch |language=en-US}} This plugin integrated with various sources, including MSDN and Stack Overflow, to provide high-quality contextually relevant code snippets in response to natural language queries.{{Cite web |date=2014-02-11 |title=Bing Code Search |url=https://www.microsoft.com/en-us/research/video/bing-code-search/ |access-date=2023-09-05 |website=Microsoft Research |language=en-US}}

Features

When provided with a programming problem in natural language, Copilot is capable of generating solution code.{{Cite book |last1=Finnie-Ansley |first1=James |last2=Denny |first2=Paul |last3=Becker |first3=Brett A. |last4=Luxton-Reilly |first4=Andrew |last5=Prather |first5=James |title=Australasian Computing Education Conference |chapter=The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming |date=14 February 2022 |series=ACE '22 |language=en-US |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=10–19 |doi=10.1145/3511861.3511863 |isbn=978-1-4503-9643-1 |s2cid=246681316 |doi-access=free}} It is also able to describe input code in English and translate code between programming languages.

Copilot lets developers choose among large language models (LLMs) from several providers, including versions of OpenAI's GPT, Anthropic's Claude Sonnet, and Google's Gemini.{{Cite web |last=VibeCentral |date=2025-05-21 |title=Navigating the AI Coding Landscape: A Comparative Analysis of GitHub Copilot's LLMs for Optimal Developer Productivity |url=https://vibecentral.ai/report/coding/navigating-the-ai-coding-landscape-a-comparative-analysis-of-github-copilots-large-language-models-for-optimal-developer-productivity/ |access-date=2025-05-23 |website=VibeCentral |language=en-US}}

According to its website, GitHub Copilot includes assistive features for programmers, such as the conversion of code comments to runnable code, and autocomplete for chunks of code, repetitive sections of code, and entire methods or functions.{{Cite journal |last1=Sobania |first1=Dominik |last2=Schweim |first2=Dirk |last3=Rothlauf |first3=Franz |date=2022 |title=A Comprehensive Survey on Program Synthesis with Evolutionary Algorithms |url=https://ieeexplore.ieee.org/document/9743417 |journal=IEEE Transactions on Evolutionary Computation |volume=27 |pages=82–97 |doi=10.1109/TEVC.2022.3162324 |s2cid=247721793 |issn=1941-0026|url-access=subscription }} GitHub reports that Copilot's autocomplete feature is accurate roughly half of the time; given some Python function headers, for example, Copilot correctly completed the rest of the function body 43% of the time on the first try and 57% of the time within ten attempts.
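Figures of the "first try" versus "ten attempts" kind are commonly reported with the pass@''k'' metric used in the evaluation of OpenAI Codex: for each problem, ''n'' completions are sampled, ''c'' of them pass the problem's unit tests, and pass@''k'' estimates the chance that at least one of ''k'' attempts is correct. The following is a minimal sketch of that estimator; whether GitHub's 43% and 57% figures were computed in exactly this way is an assumption, and the tallies in the example are hypothetical, not measured results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts is correct.

    n: completions sampled for one problem
    c: how many of those n completions passed the problem's tests
    k: number of attempts allowed (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that all k attempts, drawn
    # without replacement, are failures. math.comb returns 0 when
    # k > n - c, so the estimate is 1.0 whenever a success is unavoidable.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is an (n, c) tally."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) tallies for three problems; illustrative only.
hypothetical = [(10, 5), (10, 0), (10, 1)]
print(mean_pass_at_k(hypothetical, 1))   # average success rate on the first try
print(mean_pass_at_k(hypothetical, 10))  # average success rate within ten tries
```

The gap between the two printed values shows why allowing more attempts raises the reported accuracy: a problem solved by only one sample in ten counts as a near-certain failure at ''k'' = 1 but as a success at ''k'' = 10.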

GitHub states that Copilot's features allow programmers to navigate unfamiliar coding frameworks and languages by reducing the amount of time users spend reading documentation.

Implementation

GitHub Copilot was initially powered by OpenAI Codex,{{Cite web |last=Krill |first=Paul |date=12 August 2021 |title=OpenAI offers API for GitHub Copilot AI model |url=https://www.infoworld.com/article/3629469/openai-offers-api-for-github-copilot-ai-model.html |access-date=7 April 2022 |website=InfoWorld |language=en}} a modified, production version of GPT-3.{{Cite web |date=3 June 2020 |title=OpenAI Releases GPT-3, The Largest Model So Far |url=https://analyticsindiamag.com/open-ai-gpt-3-language-model/ |access-date=7 April 2022 |website=Analytics India Magazine |language=en-US}} Codex was additionally trained on gigabytes of source code in a dozen programming languages. Its training data comprised a selection of English-language text, public GitHub repositories, and other publicly available source code, including a filtered dataset of 159 gigabytes of Python code sourced from 54 million public GitHub repositories.{{Cite web |title=OpenAI Announces 12 Billion Parameter Code-Generation AI Codex |url=https://www.infoq.com/news/2021/08/openai-codex/ |access-date=7 April 2022 |website=InfoQ |language=en}} OpenAI's GPT-3 is licensed exclusively to Microsoft, GitHub's parent company.{{Cite web |title=OpenAI is giving Microsoft exclusive access to its GPT-3 language model |url=https://www.technologyreview.com/2020/09/23/1008729/openai-is-giving-microsoft-exclusive-access-to-its-gpt-3-language-model/ |access-date=7 April 2022 |website=MIT Technology Review |language=en}}

In November 2023, Copilot Chat was updated to use OpenAI's GPT-4 model.{{cite web |url=https://github.blog/changelog/2023-11-30-github-copilot-november-30th-update/ |title=GitHub Copilot – November 30th Update · GitHub Changelog |date=30 November 2023}} In 2024, Copilot began allowing users to choose between different large language models, such as GPT-4o or Claude 3.5.

Reception

Since Copilot's release, concerns have been raised about its security and educational impact, and the licensing of the code it produces has been a subject of controversy.{{cite arXiv |last1=Pearce |first1=Hammond |last2=Ahmad |first2=Baleegh |last3=Tan |first3=Benjamin |last4=Dolan-Gavitt |first4=Brendan |last5=Karri |first5=Ramesh |date=16 December 2021 |title=Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions |class=cs.CR |eprint=2108.09293 }}

= Licensing controversy =

While GitHub CEO Nat Friedman stated in June 2021 that "training ML systems on public data is fair use",{{Cite tweet |user=natfriedman|number=1409914420579344385|author=Nat Friedman|title=In general: (1) training ML systems on public data is fair use|access-date=2023-02-23 |website=Twitter |language=en|archive-url=https://web.archive.org/web/20210630043243/https://twitter.com/natfriedman/status/1409914420579344385|archive-date=2021-06-30}} a class-action lawsuit filed in November 2022 called this "pure speculation", asserting that "no Court has considered the question of whether 'training ML systems on public data is fair use.'"{{cite web

|last1=Butterick

|first1=Matthew

|title=GitHub Copilot litigation

|url=https://githubcopilotlitigation.com/

|website=githubcopilotlitigation.com

|publisher=Joseph Saveri Law Firm

|access-date=12 February 2023

|date=November 3, 2022

|archive-url=https://web.archive.org/web/20221103204107/https://githubcopilotlitigation.com/pdf/1-0-github_complaint.pdf

|archive-date=2022-11-03

|quote=22-cv-06823-JST

}} The lawsuit from Joseph Saveri Law Firm, LLP challenges the legality of Copilot on several claims, ranging from breach of contract with GitHub's users, to breach of privacy under the CCPA for sharing PII.{{Cite web |last=Vincent |first=James |date=2022-11-08 |title=The lawsuit that could rewrite the rules of AI copyright |url=https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data |access-date=2022-12-07 |website=The Verge |language=en-US}}

GitHub has acknowledged that a small proportion of the tool's output may be copied verbatim, which has led to fears that the output code is insufficiently transformative to be classified as fair use and may infringe on the copyright of the original owner.{{cite news|url=https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code|title=GitHub's automatic coding tool rests on untested legal ground|date=7 July 2021|access-date=11 July 2021|work=The Verge}} In June 2022, the Software Freedom Conservancy announced it would end all uses of GitHub in its own projects,{{Cite web |title=Give Up GitHub: The Time Has Come! |url=https://sfconservancy.org/blog/2022/jun/30/give-up-github-launch/ |access-date=2022-09-08 |website=Software Freedom Conservancy |language=en}} accusing Copilot of ignoring the licenses of the code used as training data.{{Cite web |title=If Software is My Copilot, Who Programmed My Software? |url=https://sfconservancy.org/blog/2022/feb/03/github-copilot-copyleft-gpl/ |access-date=2022-09-08 |website=Software Freedom Conservancy |language=en}} In a customer-support message, GitHub stated that "training machine learning models on publicly available data is considered fair use across the machine learning community", but the class-action lawsuit called this "false" and additionally noted that "regardless of this concept's level of acceptance in 'the machine learning community,' under Federal law, it is illegal".

= Privacy concerns =

The Copilot service is cloud-based and requires continuous communication with the GitHub Copilot servers.{{cite web |title=GitHub Copilot - Your AI pair programmer |url=https://github.com/features/copilot/#faq-privacy |website=GitHub |access-date=18 October 2022}} This opaque architecture has fueled concerns over telemetry and data mining of individual keystrokes.{{cite web |title=CoPilot: Privacy & DataMining |url=https://github.com/community/community/discussions/7263 |website=GitHub |access-date=18 October 2022}}{{cite web |last=Stallman|first=Richard|author-link=Richard Stallman|title=Who does that server really serve?|url=https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html |website=gnu.org |access-date=18 Oct 2022}}

In late 2022, GitHub Copilot was accused of emitting source code from the game Quake with no author attribution or license.{{cite web |date=October 23, 2022 |title=GitHub Copilot: The Latest in the List of AI Generative Models Facing Copyright Allegations |url=https://analyticsindiamag.com/github-copilot-the-latest-in-the-list-of-ai-generative-models-facing-copyright-allegations/ |url-status=live |archive-url=https://web.archive.org/web/20230322031738/https://analyticsindiamag.com/github-copilot-the-latest-in-the-list-of-ai-generative-models-facing-copyright-allegations/ |archive-date=March 22, 2023 |access-date=March 23, 2023 |website=Analytics India Magazine}}

References

{{reflist}}