Generative pre-trained transformer

{{short description|Type of large language model}}

{{Distinguish|ChatGPT}}

{{Machine learning|Artificial neural network}}

{{Use mdy dates|date=April 2025}}

{{Use American English|date=April 2025}}

[[File:Full GPT architecture.svg|thumb|Full GPT architecture]]

A generative pre-trained transformer (GPT) is a type of large language model (LLM){{cite web|url=https://www.aljazeera.com/news/2023/3/15/how-do-ai-models-like-gpt-4-work-and-how-can-you-start-using-it|title=How does GPT-4 work and how can you start using it in ChatGPT?|first=Mohammed|last=Haddad|website=www.aljazeera.com|access-date=April 10, 2023|archive-date=July 5, 2023|archive-url=https://web.archive.org/web/20230705224641/https://www.aljazeera.com/news/2023/3/15/how-do-ai-models-like-gpt-4-work-and-how-can-you-start-using-it|url-status=live}} and a prominent framework for generative artificial intelligence.{{cite web|url=https://pub.towardsai.net/generative-ai-and-future-c3b1695876f2|title=Generative AI and Future|first=Luhui|last=Hu|date=November 15, 2022|website=Medium|access-date=April 29, 2023|archive-date=June 5, 2023|archive-url=https://web.archive.org/web/20230605023010/https://pub.towardsai.net/generative-ai-and-future-c3b1695876f2|url-status=live}}{{cite web|url=https://www.computer.org/csdl/magazine/co/2022/10/09903869/1H0G6xvtREk|title=CSDL | IEEE Computer Society|website=www.computer.org|access-date=April 29, 2023|archive-date=April 28, 2023|archive-url=https://web.archive.org/web/20230428171218/https://www.computer.org/csdl/magazine/co/2022/10/09903869/1H0G6xvtREk|url-status=live}} It is an artificial neural network used in natural language processing.{{Cite web|title= LibGuides: Using AI Language Models : ChatGPT|url= https://hallmark.libguides.com/c.php?g=1312147&p=9644939|access-date= December 7, 2023|archive-date= December 8, 2023|archive-url= https://web.archive.org/web/20231208014633/https://hallmark.libguides.com/c.php?g=1312147&p=9644939|url-status= live}} It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able to generate novel human-like content.{{cite web|url=https://www.weforum.org/agenda/2023/01/davos23-generative-ai-a-game-changer-industries-and-society-code-developers/|title=Generative AI: a game-changer society needs to be ready for|website=World Economic Forum|date=January 9, 2023|access-date=April 8, 2023|archive-date=April 25, 2023|archive-url=https://web.archive.org/web/20230425234858/https://www.weforum.org/agenda/2023/01/davos23-generative-ai-a-game-changer-industries-and-society-code-developers/|url-status=live}}{{cite magazine|url=https://time.com/6271657/a-to-z-of-artificial-intelligence/|title=The A to Z of Artificial Intelligence|date=April 13, 2023|magazine=Time|access-date=April 14, 2023|archive-date=June 16, 2023|archive-url=https://web.archive.org/web/20230616123839/https://time.com/6271657/a-to-z-of-artificial-intelligence/|url-status=live}} As of 2023, most LLMs have these characteristics{{cite web|url=https://www.forbes.com/sites/robtoews/2023/02/07/the-next-generation-of-large-language-models/|title=The Next Generation Of Large Language Models|first=Rob|last=Toews|website=Forbes|access-date=April 9, 2023|archive-date=April 14, 2023|archive-url=https://web.archive.org/web/20230414030738/https://www.forbes.com/sites/robtoews/2023/02/07/the-next-generation-of-large-language-models/|url-status=live}} and are sometimes referred to broadly as GPTs.{{cite web|url=https://www.forbes.com/sites/joemckendrick/2023/03/26/most-jobs-soon-to-be-influenced-by-artificial-intelligence-research-out-of-openai-and-university-of-pennsylvania-suggests/?sh=420f9c8f73c7|title=Most Jobs Soon To Be 'Influenced' By Artificial Intelligence, Research Out Of OpenAI And University Of Pennsylvania Suggests|work=Forbes|first=Joe|last=Mckendrick|date=March 13, 2023|access-date=April 16, 2023|archive-date=April 16, 2023|archive-url=https://web.archive.org/web/20230416155511/https://www.forbes.com/sites/joemckendrick/2023/03/26/most-jobs-soon-to-be-influenced-by-artificial-intelligence-research-out-of-openai-and-university-of-pennsylvania-suggests/?sh=420f9c8f73c7|url-status=live}}

The first GPT was introduced in 2018 by OpenAI. OpenAI has since released significant GPT foundation models that are sequentially numbered, comprising its "GPT-n" series.{{cite web|url=https://www.makeuseof.com/gpt-models-explained-and-compared/|title=GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared|date=April 11, 2023|website=MUO|access-date=May 3, 2023|archive-date=April 15, 2023|archive-url=https://web.archive.org/web/20230415175013/https://www.makeuseof.com/gpt-models-explained-and-compared/|url-status=live}} Each of these was significantly more capable than the previous one, due to increased size (number of trainable parameters) and training. Later releases include GPT-4o (May 2024), GPT-4.5 (February 2025), and GPT-4.1 (April 2025).{{Cite web |title=GPT-4 |url=https://openai.com/research/gpt-4 |access-date=December 8, 2023 |website=openai.com |language=en-US |archive-date=March 14, 2023 |archive-url=https://web.archive.org/web/20230314174531/https://openai.com/research/gpt-4 |url-status=live }} Such models have been the basis for more task-specific GPT systems, including models fine-tuned for instruction following{{mdash}}which in turn power the ChatGPT chatbot service.

The term "GPT" is also used in the names and descriptions of such models developed by others. For example, other GPT foundation models include a series of models created by EleutherAI,{{cite web |last=Alford |first=Anthony |date=July 13, 2021 |title=EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J |url=https://www.infoq.com/news/2021/07/eleutherai-gpt-j/ |website=InfoQ |access-date=April 3, 2023 |archive-date=February 10, 2023 |archive-url=https://web.archive.org/web/20230210114137/https://www.infoq.com/news/2021/07/eleutherai-gpt-j/ |url-status=live }} and seven models created by Cerebras in 2023.{{cite press release | url=https://www.businesswire.com/news/home/20230328005366/en/Cerebras-Systems-Releases-Seven-New-GPT-Models-Trained-on-CS-2-Wafer-Scale-Systems | title=News | access-date=April 5, 2023 | archive-date=April 5, 2023 | archive-url=https://web.archive.org/web/20230405080938/https://www.businesswire.com/news/home/20230328005366/en/Cerebras-Systems-Releases-Seven-New-GPT-Models-Trained-on-CS-2-Wafer-Scale-Systems | url-status=live }} Companies in different industries have developed task-specific GPTs in their respective fields, such as Salesforce's "EinsteinGPT" (for CRM){{cite web |last1=Morrison |first1=Ryan |title=Salesforce launches EinsteinGPT built with OpenAI technology |url=https://techmonitor.ai/technology/ai-and-automation/salesforce-einsteingpt-openai-chatgpt |website=Tech Monitor |date=March 7, 2023 |access-date=April 10, 2023 |archive-date=April 15, 2023 |archive-url=https://web.archive.org/web/20230415095633/https://techmonitor.ai/technology/ai-and-automation/salesforce-einsteingpt-openai-chatgpt |url-status=live }} and Bloomberg's "BloombergGPT" (for finance).{{cite web | url=https://www.forbes.com/sites/jamielsheikh/2023/04/05/the-chatgpt-of-finance-is-here-bloomberg-is-combining-ai-and-fintech/?sh=43b4385e3081 | title=The ChatGPT of Finance is Here, Bloomberg is Combining AI and Fintech | website=Forbes | access-date=April 6, 2023 | archive-date=April 6, 2023 | archive-url=https://web.archive.org/web/20230406140911/https://www.forbes.com/sites/jamielsheikh/2023/04/05/the-chatgpt-of-finance-is-here-bloomberg-is-combining-ai-and-fintech/?sh=43b4385e3081 | url-status=live }}

== History ==

=== Initial developments ===

Generative pretraining (GP) was a long-established concept in machine learning applications.{{Cite journal |last=Hinton (et-al) |first=Geoffrey |date=October 15, 2012 |title=Deep neural networks for acoustic modeling in speech recognition |url=http://cs224d.stanford.edu/papers/maas_paper.pdf |journal=IEEE Signal Processing Magazine |doi=10.1109/MSP.2012.2205597 |s2cid=206485943 |archive-date=March 18, 2023 |access-date=April 27, 2023 |archive-url=https://web.archive.org/web/20230318044634/http://cs224d.stanford.edu/papers/maas_paper.pdf |url-status=live }}{{cite journal|title=A tutorial survey of architectures, algorithms, and applications for deep learning |journal=Apsipa Transactions on Signal and Information Processing |doi=10.1017/atsip.2013.9 |publisher=Cambridge.org |date=January 22, 2014 |volume=3 |pages=e2 |s2cid=9928823 |last1=Deng |first1=Li |doi-access=free }} It was originally used as a form of semi-supervised learning: the model is first trained on an unlabeled dataset (the pretraining step) by learning to generate datapoints in the dataset, and is then trained to classify a labeled dataset.{{Cite journal |last1=Erhan |first1=Dumitru |last2=Courville |first2=Aaron |last3=Bengio |first3=Yoshua |last4=Vincent |first4=Pascal |date=March 31, 2010 |title=Why Does Unsupervised Pre-training Help Deep Learning? |url=https://proceedings.mlr.press/v9/erhan10a.html |journal=Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics |language=en |publisher=JMLR Workshop and Conference Proceedings |pages=201–208 |archive-date=January 24, 2024 |access-date=January 24, 2024 |archive-url=https://web.archive.org/web/20240124195306/https://proceedings.mlr.press/v9/erhan10a.html |url-status=live }}
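
The two steps of this recipe can be illustrated with a minimal sketch (here in PyTorch; the tiny model, sizes, and random stand-in data are illustrative assumptions, not any particular historical system):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

vocab, d = 1000, 64

class TinyGenerativeModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.next_token = nn.Linear(d, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.next_token(h), h

model = TinyGenerativeModel()
opt = torch.optim.Adam(model.parameters())

# Step 1 (pretraining): learn to generate the unlabeled data,
# here by predicting each next token in the sequence.
unlabeled = torch.randint(0, vocab, (32, 16))   # stand-in corpus
logits, _ = model(unlabeled[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                   unlabeled[:, 1:].reshape(-1))
loss.backward(); opt.step(); opt.zero_grad()

# Step 2 (fine-tuning): train a classifier on a labeled dataset,
# reusing the representations learned in step 1.
classifier = nn.Linear(d, 2)                    # e.g. two classes
labeled_x = torch.randint(0, vocab, (8, 16))
labeled_y = torch.randint(0, 2, (8,))
_, h = model(labeled_x)
loss = nn.functional.cross_entropy(classifier(h[:, -1]), labeled_y)
loss.backward()   # gradients flow into both the classifier and the pretrained model
</syntaxhighlight>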

There were three main types of early GP. Hidden Markov models (HMMs) learn a generative model of sequences for downstream applications. In speech recognition, for example, a trained HMM infers the most likely hidden sequence for a speech signal, and that hidden sequence is taken as the phonemes of the signal. HMMs were developed in the 1970s and became widely applied in speech recognition in the 1980s.{{Cite web |date=January 12, 2015 |title=First-Hand:The Hidden Markov Model – Engineering and Technology History Wiki |url=http://ethw.org/First-Hand:The_Hidden_Markov_Model |url-status=live |archive-url=https://web.archive.org/web/20180403191314/http://ethw.org/First-Hand:The_Hidden_Markov_Model |archive-date=April 3, 2018 |access-date=May 1, 2018 |website=ethw.org }}{{Cite journal |last1=Juang |first1=B. H. |last2=Rabiner |first2=L. R. |date=1991 |title=Hidden Markov Models for Speech Recognition |url=https://www.jstor.org/stable/1268779 |journal=Technometrics |volume=33 |issue=3 |pages=251–272 |doi=10.2307/1268779 |jstor=1268779 |issn=0040-1706 |archive-date=October 8, 2024 |access-date=October 4, 2024 |archive-url=https://web.archive.org/web/20241008004649/https://www.jstor.org/stable/1268779 |url-status=live }}
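
The inference step described above can be sketched with the classic Viterbi algorithm; the two-state model and all probabilities below are toy assumptions:

<syntaxhighlight lang="python">
import numpy as np

start = np.array([0.6, 0.4])                  # P(state_0)
trans = np.array([[0.7, 0.3], [0.4, 0.6]])    # P(state_t | state_{t-1})
emit  = np.array([[0.5, 0.5], [0.1, 0.9]])    # P(obs | state)
obs = [0, 1, 1]                               # observed signal

v = start * emit[:, obs[0]]                   # best path probability so far
back = []
for o in obs[1:]:
    scores = v[:, None] * trans * emit[:, o]  # extend every path one step
    back.append(scores.argmax(axis=0))        # best predecessor of each state
    v = scores.max(axis=0)

# Trace back the highest-probability hidden state sequence.
path = [int(v.argmax())]
for b in reversed(back):
    path.append(int(b[path[-1]]))
print(list(reversed(path)))                   # e.g. the inferred phoneme indices
</syntaxhighlight>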

Compressors learn to compress data such as images and textual sequences, and the compressed data serves as a good representation for downstream applications such as facial recognition.{{Cite journal |last1=Cottrell |first1=Garrison W. |last2=Munro |first2=Paul |last3=Zipser |first3=David |date=1987 |title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming |url=https://escholarship.org/uc/item/2zs7w6z8 |journal=Proceedings of the Annual Meeting of the Cognitive Science Society |language=en |volume=9 |archive-date=October 7, 2024 |access-date=October 4, 2024 |archive-url=https://web.archive.org/web/20241007093334/https://escholarship.org/uc/item/2zs7w6z8 |url-status=live }}{{Citation |last=Cottrell |first=Garrison W. |title=Extracting features from faces using compression networks: Face, identity, emotion, and gender recognition using holons |date=January 1, 1991 |work=Connectionist Models |pages=328–337 |editor-last=Touretzky |editor-first=David S. |url=https://www.sciencedirect.com/science/article/abs/pii/B9781483214481500391 |access-date=October 4, 2024 |publisher=Morgan Kaufmann |isbn=978-1-4832-1448-1 |editor2-last=Elman |editor2-first=Jeffrey L. |editor3-last=Sejnowski |editor3-first=Terrence J. |editor4-last=Hinton |editor4-first=Geoffrey E. |archive-date=October 7, 2024 |archive-url=https://web.archive.org/web/20241007102908/https://www.sciencedirect.com/science/article/abs/pii/B9781483214481500391 |url-status=live }}{{cite journal |last1=Schmidhuber |first1=Jürgen |year=1992 |title=Learning complex, extended sequences using the principle of history compression |url=https://gwern.net/doc/ai/nn/rnn/1992-schmidhuber.pdf |journal=Neural Computation |volume=4 |issue=2 |pages=234–242 |doi=10.1162/neco.1992.4.2.234 |s2cid=18271205}} Autoencoders similarly learn a latent representation of data for later downstream applications such as speech recognition.{{Cite journal |last1=Elman |first1=Jeffrey L. |last2=Zipser |first2=David |date=April 1, 1988 |title=Learning the hidden structure of speech |url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speechLearning |journal=The Journal of the Acoustical Society of America |language=en |volume=83 |issue=4 |pages=1615–1626 |doi=10.1121/1.395916 |pmid=3372872 |bibcode=1988ASAJ...83.1615E |issn=0001-4966 |archive-date=October 7, 2024 |access-date=October 4, 2024 |archive-url=https://web.archive.org/web/20241007093918/https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speechLearning |url-status=live }}{{Cite journal |last1=Bourlard |first1=H. |last2=Kamp |first2=Y. |date=1988 |title=Auto-association by multilayer perceptrons and singular value decomposition |url=http://infoscience.epfl.ch/record/82601 |journal=Biological Cybernetics |volume=59 |issue=4–5 |pages=291–294 |doi=10.1007/BF00332918 |pmid=3196773 |s2cid=206775335 |archive-date=June 27, 2021 |access-date=October 4, 2024 |archive-url=https://web.archive.org/web/20210627222903/https://infoscience.epfl.ch/record/82601 |url-status=live }} The connection between autoencoders and algorithmic compressors was noted in 1993.{{Cite journal |last1=Hinton |first1=Geoffrey E |last2=Zemel |first2=Richard |date=1993 |title=Autoencoders, Minimum Description Length and Helmholtz Free Energy |url=https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Morgan-Kaufmann |volume=6 |archive-date=August 14, 2024 |access-date=October 4, 2024 |archive-url=https://web.archive.org/web/20240814042046/https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html |url-status=live }}
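
A minimal autoencoder sketch (illustrative sizes, in PyTorch) shows how such a compressed latent representation is learned and then reused:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

x = torch.rand(64, 784)                    # stand-in for flattened images
encoder = nn.Sequential(nn.Linear(784, 32), nn.Tanh())   # compress to 32 dims
decoder = nn.Linear(32, 784)                              # reconstruct

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
for _ in range(100):
    code = encoder(x)                      # latent representation
    loss = nn.functional.mse_loss(decoder(code), x)
    opt.zero_grad(); loss.backward(); opt.step()

features = encoder(x).detach()             # reusable features for downstream tasks
</syntaxhighlight>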

{{See also|Transformer (deep learning architecture)#History}}

During the 2010s, the problem of machine translation was solved{{cn|date=December 2024}} by recurrent neural networks with an added attention mechanism. This approach was refined into the transformer architecture, published by Google researchers in Attention Is All You Need (2017).{{cite journal |last1=Vaswani |first1=Ashish |author1-link=Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link=Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc. |archive-date=February 21, 2024 |access-date=January 28, 2024 |archive-url=https://web.archive.org/web/20240221141113/https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |url-status=live }} That development led to the emergence of large language models such as BERT (2018),{{Cite journal |last1=Devlin |first1=Jacob |last2=Chang |first2=Ming-Wei |last3=Lee |first3=Kenton |last4=Toutanova |first4=Kristina |date=May 24, 2019 |title=BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |journal=Association for Computational Linguistics |arxiv=1810.04805}} which was a pre-trained transformer (PT) but not designed to be generative (BERT was an "encoder-only" model). Also in 2018, OpenAI published Improving Language Understanding by Generative Pre-Training, which introduced GPT-1, the first in its GPT series.{{cite web |url=https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |title=Improving Language Understanding by Generative Pre-Training |last1=Radford |first1=Alec |last2=Narasimhan |first2=Karthik |last3=Salimans |first3=Tim |last4=Sutskever |first4=Ilya |page=12 |publisher=OpenAI |date=June 11, 2018 |access-date=January 23, 2021 |archive-date=January 26, 2021 |archive-url=https://web.archive.org/web/20210126024542/https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |url-status=live}}
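
What distinguishes a generative, decoder-only model like GPT from an encoder-only model like BERT is the causal masking in its self-attention: each position may attend only to earlier positions, so the model can be trained to predict each next token. A minimal single-head sketch (illustrative sizes, in PyTorch):

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

T, d = 8, 16                               # sequence length, head width
x = torch.rand(T, d)                       # stand-in token representations
Wq, Wk, Wv = (torch.rand(d, d) for _ in range(3))

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / d**0.5                # scaled dot-product attention
mask = torch.triu(torch.ones(T, T), diagonal=1).bool()
scores = scores.masked_fill(mask, float("-inf"))   # causal mask: no future tokens
out = F.softmax(scores, dim=-1) @ v        # (T, d) contextualized outputs
</syntaxhighlight>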

Earlier, in 2017, some of the authors who would later work on GPT-1 had worked on generative pre-training of language with LSTM, which resulted in a model that could represent text with vectors that could easily be fine-tuned for downstream applications.{{cite arXiv |last1=Radford |first1=Alec |title=Learning to Generate Reviews and Discovering Sentiment |date=April 6, 2017 |eprint=1704.01444 |last2=Jozefowicz |first2=Rafal |last3=Sutskever |first3=Ilya|class=cs.LG }}

Prior to transformer-based architectures, the best-performing neural NLP (natural language processing) models commonly employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well-annotated, and also made it prohibitively expensive and time-consuming to train extremely large language models.

The semi-supervised approach OpenAI employed to make a large-scale generative system{{mdash}}and the first to do so with a transformer model{{mdash}}involved two stages: an unsupervised generative "pretraining" stage to set initial parameters using a language modeling objective, and a supervised discriminative "fine-tuning" stage to adapt these parameters to a target task.
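
In the notation of the GPT-1 report, the pretraining stage maximizes a standard language-modeling likelihood over an unlabeled token corpus <math>\mathcal{U} = (u_1, \ldots, u_n)</math>, given a context window of size <math>k</math> and model parameters <math>\Theta</math>:

:<math>L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)</math>

The fine-tuning stage then maximizes the likelihood <math>\textstyle\sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)</math> of each label <math>y</math> given its input token sequence, optionally keeping the language-modeling objective as an auxiliary term.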

=== Later developments ===

OpenAI published its first versions of GPT-3 in July 2020. There were three models, with 1 billion, 6.7 billion, and 175 billion parameters, respectively named babbage, curie, and davinci (giving the initials B, C, and D).{{Citation needed|date=November 2023}}

In July 2021, OpenAI published Codex, a task-specific GPT model targeted for programming applications. This was developed by fine-tuning a 12B parameter version of GPT-3 (different from previous GPT-3 models) using code from GitHub.{{Cite journal |last1=Chen |first1=Mark |last2=Tworek |first2=Jerry |last3=Jun |first3=Heewoo |last4=Yuan |first4=Qiming |last5=Ponde de Oliveira Pinto |first5=Henrique |last6=Kaplan |first6=Jared |last7=Edwards |first7=Harri |last8=Burda |first8=Yuri |last9=Joseph |first9=Nicholas |last10=Brockman |first10=Greg |last11=Ray |first11=Alex |last12=Puri |first12=Raul |last13=Krueger |first13=Gretchen |last14=Petrov |first14=Michael |last15=Khlaaf |first15=Heidy |date=July 1, 2021 |title=Evaluating Large Language Models Trained on Code |journal=Association for Computational Linguistics |arxiv=2107.03374}}

In March 2022, OpenAI published two versions of GPT-3 that were fine-tuned for instruction-following (instruction-tuned), named davinci-instruct-beta (175B) and text-davinci-001,{{Cite journal |last1=Ouyang |first1=Long |last2=Wu |first2=Jeffrey |last3=Jiang |first3=Xu |last4=Almeida |first4=Diogo |last5=Wainwright |first5=Carroll |last6=Mishkin |first6=Pamela |last7=Zhang |first7=Chong |last8=Agarwal |first8=Sandhini |last9=Slama |first9=Katarina |last10=Ray |first10=Alex |last11=Schulman |first11=John |last12=Hilton |first12=Jacob |last13=Kelton |first13=Fraser |last14=Miller |first14=Luke |last15=Simens |first15=Maddie |date=December 6, 2022 |title=Training language models to follow instructions with human feedback |url=https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=35 |pages=27730–27744 |arxiv=2203.02155 |archive-date=June 28, 2023 |access-date=June 24, 2023 |archive-url=https://web.archive.org/web/20230628171849/https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html |url-status=live }} and then started beta testing code-davinci-002.{{Cite web |title=New GPT-3 capabilities: Edit & insert |url=https://openai.com/blog/gpt-3-edit-insert |access-date=June 24, 2023 |website=openai.com |language=en-US |archive-date=June 29, 2023 |archive-url=https://web.archive.org/web/20230629004341/https://openai.com/blog/gpt-3-edit-insert |url-status=live }} text-davinci-002 was instruction-tuned from code-davinci-002. Both text-davinci-003 and ChatGPT were released in November 2022, with both building upon text-davinci-002 via reinforcement learning from human feedback (RLHF). text-davinci-003 is trained for following instructions (like its predecessors), whereas ChatGPT is further trained for conversational interaction with a human user.{{cite journal |last1=Fu |first1=Yao |last2=Peng |first2=Hao |last3=Khot |first3=Tushar |year=2022 |title=How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources |url=https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1 |journal=Yao Fu's Notion |archive-date=April 19, 2023 |access-date=June 24, 2023 |archive-url=https://web.archive.org/web/20230419174208/https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1 |url-status=live }}{{Cite web |title=Model index for researchers |url=https://platform.openai.com/docs/model-index-for-researchers |url-status=live |archive-url=https://archive.today/20230623231655/https://platform.openai.com/docs/model-index-for-researchers |archive-date=June 23, 2023 |access-date=June 23, 2023 |website=OpenAI API |language=en}}

OpenAI released its GPT-4 foundation model on March 14, 2023. It can be accessed directly by users via a premium version of ChatGPT, and is available to developers for incorporation into other products and services via OpenAI's API. Other producers of GPT foundation models include EleutherAI (with a series of models starting in March 2021) and Cerebras (with seven models released in March 2023).

== Foundation models ==

A foundation model is an AI model trained on broad data at scale such that it can be adapted to a wide range of downstream tasks.{{cite web|url=https://hai.stanford.edu/news/introducing-center-research-foundation-models-crfm|title=Introducing the Center for Research on Foundation Models (CRFM)|website=Stanford HAI|date=August 18, 2021|access-date=April 26, 2023|archive-date=June 4, 2023|archive-url=https://web.archive.org/web/20230604175717/https://hai.stanford.edu/news/introducing-center-research-foundation-models-crfm|url-status=live}}{{Cite web |date=October 18, 2021 |title=Reflections on Foundation Models |url=https://hai.stanford.edu/news/reflections-foundation-models |access-date=August 15, 2024 |website=hai.stanford.edu |language=en |archive-date=August 15, 2024 |archive-url=https://web.archive.org/web/20240815084336/https://hai.stanford.edu/news/reflections-foundation-models |url-status=live }}

Thus far, the most notable GPT foundation models have been from OpenAI's GPT-n series. Beginning with GPT-4, OpenAI has declined to publish the size or training details of these models, citing "the competitive landscape and the safety implications of large-scale models".

class="wikitable"

|+ OpenAI's GPT-n series

!Model

!Architecture

!Parameter count

!Training data

!Release date

!Training cost

GPT-1

|12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax.

|117 million

|BookCorpus:{{Cite conference |last1=Zhu |first1=Yukun |last2=Kiros |first2=Ryan |last3=Zemel |first3=Rich |last4=Salakhutdinov |first4=Ruslan |last5=Urtasun |first5=Raquel |last6=Torralba |first6=Antonio |last7=Fidler |first7=Sanja |date=2015 |title=Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books |url=https://www.cv-foundation.org/openaccess/content_iccv_2015/html/Zhu_Aligning_Books_and_ICCV_2015_paper.html |conference=IEEE International Conference on Computer Vision (ICCV) 2015 |pages=19–27 |arxiv=1506.06724 |access-date=February 7, 2023 |archive-date=February 5, 2023 |archive-url=https://web.archive.org/web/20230205222219/https://www.cv-foundation.org/openaccess/content_iccv_2015/html/Zhu_Aligning_Books_and_ICCV_2015_paper.html |url-status=live }} 4.5 GB of text, from 7,000 unpublished books of various genres.

|{{Date table sorting|2018|June|11}}{{cite web |date=June 11, 2018 |title=Improving language understanding with unsupervised learning |url=https://openai.com/research/language-unsupervised |url-status=live |archive-url=https://web.archive.org/web/20230318210736/https://openai.com/research/language-unsupervised |archive-date=March 18, 2023 |access-date=March 18, 2023 |website=openai.com |language=en-US}}

|30 days on 8 P600 graphics cards, or 1 petaFLOPS-day.

GPT-2

|GPT-1, but with modified normalization

|1.5 billion

|WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit.

|{{Date table sorting|2019|February|14}} (initial/limited version) and {{Date table sorting|2019|November|5}} (full version){{cite web|url=https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters|title=OpenAI has published the text-generating AI it said was too dangerous to share|first=James|last=Vincent|date=November 7, 2019|website=The Verge|access-date=April 28, 2023|archive-date=June 11, 2020|archive-url=https://web.archive.org/web/20200611054114/https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters|url-status=live}}

|"tens of petaflop/s-day", or 1.5e21 FLOPS.{{cite web |title=ML input trends visualization |url=https://epochai.org/mlinputs/visualization |access-date=May 2, 2023 |website=Epoch |language=en |archive-date=July 16, 2023 |archive-url=https://web.archive.org/web/20230716134652/https://epochai.org/mlinputs/visualization |url-status=live }}

GPT-3

|GPT-2, but with modification to allow larger scaling

|175 billion{{Cite web |last=Ver Meer |first=Dave |date=June 1, 2023 |title=ChatGPT Statistics |url=https://www.namepepper.com/chatgpt-users |access-date=June 9, 2023 |website=NamePepper |language=en |archive-date=June 5, 2023 |archive-url=https://web.archive.org/web/20230605230914/https://www.namepepper.com/chatgpt-users |url-status=live }}

| 499 billion tokens consisting of CommonCrawl (570 GB), WebText, English Wikipedia, and two books corpora (Books1 and Books2).

|{{Date table sorting|2020|May|28}}{{Cite journal|title=Language Models are Few-Shot Learners|first1=Tom B.|last1=Brown|first2=Benjamin|last2=Mann|first3=Nick|last3=Ryder|first4=Melanie|last4=Subbiah|first5=Jared|last5=Kaplan|first6=Prafulla|last6=Dhariwal|first7=Arvind|last7=Neelakantan|first8=Pranav|last8=Shyam|first9=Girish|last9=Sastry|first10=Amanda|last10=Askell|first11=Sandhini|last11=Agarwal|first12=Ariel|last12=Herbert-Voss|first13=Gretchen|last13=Krueger|first14=Tom|last14=Henighan|first15=Rewon|last15=Child|first16=Aditya|last16=Ramesh|first17=Daniel M.|last17=Ziegler|first18=Jeffrey|last18=Wu|first19=Clemens|last19=Winter|first20=Christopher|last20=Hesse|first21=Mark|last21=Chen|first22=Eric|last22=Sigler|first23=Mateusz|last23=Litwin|first24=Scott|last24=Gray|first25=Benjamin|last25=Chess|first26=Jack|last26=Clark|first27=Christopher|last27=Berner|first28=Sam|last28=McCandlish|first29=Alec|last29=Radford|first30=Ilya|last30=Sutskever|first31=Dario|last31=Amodei|date=May 28, 2020|journal=NeurIPS|arxiv=2005.14165v4}}

|3640 petaflop/s-day (Table D.1), or 3.1e23 FLOPS.

GPT-3.5

|Undisclosed

|175 billion

|Undisclosed

|March 15, 2022

|Undisclosed

GPT-4

|Also trained with both text prediction and RLHF; accepts both text and images as input. Further details are not public.{{cite web |last=OpenAI |date=2023 |title=GPT-4 Technical Report |url=https://cdn.openai.com/papers/gpt-4.pdf |access-date=March 16, 2023 |archive-date=March 14, 2023 |archive-url=https://web.archive.org/web/20230314190904/https://cdn.openai.com/papers/gpt-4.pdf |url-status=live }}

|Undisclosed. Estimated 1.7 trillion.{{cite news |title=GPT-4 has more than a trillion parameters – Report |date=March 25, 2023 |url=https://the-decoder.com/gpt-4-has-a-trillion-parameters/ |archive-date=March 4, 2024 |access-date=October 23, 2023 |archive-url=https://web.archive.org/web/20240304161007/https://the-decoder.com/gpt-4-has-a-trillion-parameters/ |url-status=live }}

|Undisclosed

|{{Date table sorting|2023|March|14}}

|Undisclosed. Estimated 2.1 × 1025 FLOPS.

GPT-4o

|{{dunno}}

|{{dunno}}

|{{dunno}}

|{{dts|2024-5-13}}

|{{dunno}}

GPT-4.5

|{{dunno}}

|{{dunno}}

|{{dunno}}

|{{dts|2025-2-27}}

|{{dunno}}

GPT-4.1

|{{dunno}}

|{{dunno}}

|{{dunno}}

|{{dts|2025-4-14}}

|{{dunno}}
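
The training-cost column mixes two units. One petaFLOP/s-day is 10<sup>15</sup> floating-point operations per second sustained for 24 hours, so the FLOP totals in the table follow from a direct conversion, which can be checked as follows:

<syntaxhighlight lang="python">
# One petaFLOP/s-day: 1e15 operations per second for 24 hours.
PFLOPS_DAY = 1e15 * 24 * 3600        # ≈ 8.64e19 FLOP

print(f"GPT-1: {1 * PFLOPS_DAY:.1e} FLOP")      # 1 petaFLOP/s-day
print(f"GPT-3: {3640 * PFLOPS_DAY:.1e} FLOP")   # ≈ 3.1e23, matching the table
</syntaxhighlight>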

Other such models include Google's PaLM, a broad foundation model that has been compared to GPT-3 and has been made available to developers via an API,{{cite web|url=https://www.theverge.com/2023/3/14/23639313/google-ai-language-model-palm-api-challenge-openai|title=Google opens up its AI language model PaLM to challenge OpenAI and GPT-3|first=James|last=Vincent|date=March 14, 2023|website=The Verge|access-date=April 29, 2023|archive-date=March 14, 2023|archive-url=https://web.archive.org/web/20230314130256/https://www.theverge.com/2023/3/14/23639313/google-ai-language-model-palm-api-challenge-openai|url-status=live}}{{cite web|url=https://aibusiness.com/nlp/google-opens-access-to-palm-language-model|title=Google Opens Access to PaLM Language Model|access-date=April 29, 2023|archive-date=May 31, 2023|archive-url=https://web.archive.org/web/20230531193140/https://aibusiness.com/nlp/google-opens-access-to-palm-language-model|url-status=live}} and Together's GPT-JT, which has been reported as the closest-performing open-source alternative to GPT-3 (and is derived from earlier open-source GPTs).{{cite web|url=https://analyticsindiamag.com/meet-gpt-jt-the-closest-open-source-alternative-to-gpt-3/|title=Meet GPT-JT, the Closest Open Source Alternative to GPT-3|first=Aparna|last=Iyer|date=November 30, 2022|website=Analytics India Magazine|access-date=April 29, 2023|archive-date=June 2, 2023|archive-url=https://web.archive.org/web/20230602011925/https://analyticsindiamag.com/meet-gpt-jt-the-closest-open-source-alternative-to-gpt-3/|url-status=live}} Meta AI (formerly Facebook) also has a generative transformer-based foundational large language model, known as LLaMA.{{cite web|url=https://www.pcmag.com/news/meta-debuts-ai-language-model-but-its-only-for-researchers|title=Meta Debuts AI Language Model, But It's Only for Researchers|website=PCMAG|date=February 24, 2023|access-date=May 21, 2023|archive-date=July 19, 2023|archive-url=https://web.archive.org/web/20230719172539/https://www.pcmag.com/news/meta-debuts-ai-language-model-but-its-only-for-researchers|url-status=live}}

Foundational GPTs can also employ modalities other than text, for input and/or output. GPT-4 is a multimodal LLM that is capable of processing text and image input (though its output is limited to text).{{cite web|url=https://www.marktechpost.com/2023/03/27/multimodal-language-models-the-future-of-artificial-intelligence-ai/|title=Multimodal Language Models: The Future of Artificial Intelligence (AI)|first=Arham|last=Islam|date=March 27, 2023|access-date=May 15, 2023|archive-date=May 15, 2023|archive-url=https://web.archive.org/web/20230515010932/https://www.marktechpost.com/2023/03/27/multimodal-language-models-the-future-of-artificial-intelligence-ai/|url-status=dead}} Regarding multimodal output, some generative transformer-based models are used for text-to-image technologies such as diffusion{{cite web|url=https://www.marktechpost.com/2022/11/14/how-do-dall%c2%b7e-2-stable-diffusion-and-midjourney-work/|title=How Do DALL·E 2, Stable Diffusion, and Midjourney Work?|first=Arham|last=Islam|date=November 14, 2022|access-date=May 21, 2023|archive-date=July 18, 2023|archive-url=https://web.archive.org/web/20230718183647/https://www.marktechpost.com/2022/11/14/how-do-dall%C2%B7e-2-stable-diffusion-and-midjourney-work/|url-status=live}} and parallel decoding.{{cite web|url=https://analyticsindiamag.com/google-launches-muse-a-new-text-to-image-transformer-model/|title=Google Launches Muse, A New Text-to-Image Transformer Model|first=Shritama|last=Saha|date=January 4, 2023|website=Analytics India Magazine|access-date=May 15, 2023|archive-date=May 15, 2023|archive-url=https://web.archive.org/web/20230515010939/https://analyticsindiamag.com/google-launches-muse-a-new-text-to-image-transformer-model/|url-status=live}} Such models can serve as visual foundation models (VFMs) for developing downstream systems that can work with images.{{Cite arXiv |last=Wu (et-al) |first=Chenfei |date=March 8, 2023 |title=Visual ChatGPT |class=cs.CV |eprint=2303.04671 }}

== Task-specific models ==

A foundational GPT model can be further adapted to produce more targeted systems directed to specific tasks and/or subject-matter domains. Methods for such adaptation can include additional fine-tuning (beyond that done for the foundation model) as well as certain forms of prompt engineering.{{Cite arXiv |last=Bommasani (et-al) |first=Rishi |date=July 12, 2022 |title=On the Opportunities and Risks of Foundation Models |class=cs.LG |eprint=2108.07258 }}
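
The prompt-engineering route can be illustrated with a short sketch: the base model is left unchanged, and the domain targeting lives entirely in the prompt that wraps each request. The template wording below is a hypothetical example, not any vendor's actual prompt:

<syntaxhighlight lang="python">
# Domain adaptation without changing model weights.
DOMAIN_TEMPLATE = (
    "You are an assistant for the financial domain. Answer only "
    "finance questions, and state figures with their sources.\n\n"
    "Question: {question}\nAnswer:"
)

def make_prompt(question: str) -> str:
    return DOMAIN_TEMPLATE.format(question=question)

print(make_prompt("What does an inverted yield curve signal?"))
</syntaxhighlight>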

An important example of this is fine-tuning models to follow instructions, which is a fairly broad task but more targeted than a foundation model. In January 2022, OpenAI introduced "InstructGPT"{{mdash}}a series of models fine-tuned to follow instructions using a combination of supervised training and reinforcement learning from human feedback (RLHF) on base GPT-3 language models. Advantages over the bare foundation models included higher accuracy, less negative/toxic sentiment, and generally better alignment with user needs. Hence, OpenAI began using this as the basis for its API service offerings.{{cite web|url=https://analyticsindiamag.com/openai-dumps-its-own-gpt-3-for-something-called-instructgpt-and-for-right-reason/|title=OpenAI dumps its own GPT-3 for something called InstructGPT, and for right reason|first=Meeta|last=Ramnani|date=January 28, 2022|website=Analytics India Magazine|access-date=April 29, 2023|archive-date=June 4, 2023|archive-url=https://web.archive.org/web/20230604103815/https://analyticsindiamag.com/openai-dumps-its-own-gpt-3-for-something-called-instructgpt-and-for-right-reason/|url-status=live}} Other instruction-tuned models have been released by others, including a fully open version.{{cite web|url=https://crfm.stanford.edu/2023/03/13/alpaca.html|title=Stanford CRFM|website=crfm.stanford.edu|access-date=May 15, 2023|archive-date=April 6, 2023|archive-url=https://web.archive.org/web/20230406082332/https://crfm.stanford.edu/2023/03/13/alpaca.html|url-status=live}}{{cite web|url=https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm|title=Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM|date=April 12, 2023|website=Databricks|access-date=May 15, 2023|archive-date=July 14, 2023|archive-url=https://web.archive.org/web/20230714134230/https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm|url-status=live}}

Another related kind of task-specific model is the chatbot, which engages in human-like conversation. In November 2022, OpenAI launched ChatGPT{{mdash}}an online chat interface powered by an instruction-tuned language model trained in a similar fashion to InstructGPT. They trained this model using RLHF, with human AI trainers providing conversations in which they played both the user and the AI, and mixed this new dialogue dataset with the InstructGPT dataset for a conversational format suitable for a chatbot. Other major chatbots include Microsoft's Bing Chat, which uses OpenAI's GPT-4 (as part of a broader close collaboration between OpenAI and Microsoft),{{cite web|url=https://techcrunch.com/2023/05/04/microsoft-doubles-down-on-ai-with-new-bing-features/|title=Microsoft doubles down on AI with new Bing features|first=Kyle|last=Wiggers|date=May 4, 2023|access-date=May 4, 2023|archive-date=December 7, 2023|archive-url=https://web.archive.org/web/20231207194237/https://techcrunch.com/2023/05/04/microsoft-doubles-down-on-ai-with-new-bing-features/|url-status=live}} and Google's competing chatbot Gemini (initially based on their LaMDA family of conversation-trained language models, with plans to switch to PaLM).{{cite web|url=https://www.cnet.com/tech/services-and-software/chatgpt-vs-bing-vs-google-bard-which-ai-is-the-most-helpful/|title=ChatGPT vs. Bing vs. Google Bard: Which AI Is the Most Helpful?|website=CNET|access-date=April 30, 2023|archive-date=July 24, 2023|archive-url=https://web.archive.org/web/20230724222201/https://www.cnet.com/tech/services-and-software/chatgpt-vs-bing-vs-google-bard-which-ai-is-the-most-helpful/|url-status=live}}
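
How such dialogue data can be flattened into a single training text is sketched below; the role tags and separator are illustrative assumptions, not OpenAI's actual internal format:

<syntaxhighlight lang="python">
# Flatten a human/AI conversation into one training string.
dialogue = [
    ("user", "What is a GPT?"),
    ("assistant", "A generative pre-trained transformer, a type of LLM."),
    ("user", "Who released the first one?"),
    ("assistant", "OpenAI, in 2018."),
]
training_text = "\n".join(f"<|{role}|> {text}" for role, text in dialogue)
print(training_text)
</syntaxhighlight>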

A GPT can also be used for the meta-task of generating its own instructions, such as developing a series of prompts for itself in order to effectuate a more general goal given by a human user.{{cite web|url=https://mashable.com/article/autogpt-ai-agents-how-to-get-access|title=Auto-GPT, BabyAGI, and AgentGPT: How to use AI agents|date=April 19, 2023|website=Mashable|access-date=May 15, 2023|archive-date=July 22, 2023|archive-url=https://web.archive.org/web/20230722065813/https://mashable.com/article/autogpt-ai-agents-how-to-get-access|url-status=live}} This is known as an AI agent, and more specifically a recursive one, because it uses results from its previous self-instructions to help form its subsequent prompts; the first major example of this was Auto-GPT (which uses OpenAI's GPT models), and others have since been developed as well.{{cite web|url=https://www.forbes.com/sites/bernardmarr/2023/04/24/auto-gpt-may-be-the-strong-ai-tool-that-surpasses-chatgpt/|title=Auto-GPT May Be The Strong AI Tool That Surpasses ChatGPT|first=Bernard|last=Marr|website=Forbes|access-date=May 15, 2023|archive-date=May 21, 2023|archive-url=https://web.archive.org/web/20230521205727/https://www.forbes.com/sites/bernardmarr/2023/04/24/auto-gpt-may-be-the-strong-ai-tool-that-surpasses-chatgpt/|url-status=live}}
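
A minimal sketch of such a recursive self-prompting loop follows; the <code>llm</code> stand-in and the "DONE" stopping convention are hypothetical, not any particular agent framework:

<syntaxhighlight lang="python">
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a GPT-style completion model."""
    return "DONE"  # replace with a real model call

def agent(goal: str, max_steps: int = 5) -> list[str]:
    results: list[str] = []
    for _ in range(max_steps):
        # The model writes its own next instruction, conditioned on the goal
        # and on the results of its previous self-instructions.
        instruction = llm(f"Goal: {goal}\nResults so far: {results}\n"
                          "Next instruction:")
        if "DONE" in instruction:
            break
        results.append(llm(instruction))  # carry out the self-generated step
    return results
</syntaxhighlight>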

=== Multimodality ===

Generative transformer-based systems can also be targeted for tasks involving modalities beyond text. For example, Microsoft{{'s}} "Visual ChatGPT" combines ChatGPT with visual foundation models (VFMs) to enable input or output comprising images as well as text.{{cite web|url=https://www.infoq.com/news/2023/04/microsoft-visual-chatgpt/|title=Microsoft Open-Sources Multimodal Chatbot Visual ChatGPT|website=InfoQ|access-date=May 15, 2023|archive-date=June 3, 2023|archive-url=https://web.archive.org/web/20230603203250/https://www.infoq.com/news/2023/04/microsoft-visual-chatgpt/|url-status=live}} Also, advances in text-to-speech technology offer tools for audio content creation when used in conjunction with foundational GPT language models.{{cite web|url=https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/|title=Microsoft's new AI can simulate anyone's voice with 3 seconds of audio|first=Benj|last=Edwards|date=January 9, 2023|website=Ars Technica|access-date=May 15, 2023|archive-date=July 18, 2023|archive-url=https://web.archive.org/web/20230718184636/https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/|url-status=live}}

=== Domain-specificity ===

GPT systems can be directed toward particular fields or domains. Some reported examples of such models and apps are as follows:

  • EinsteinGPT – for sales and marketing domains, to aid with customer relationship management (uses GPT-3.5){{cite web|url=https://techmonitor.ai/technology/ai-and-automation/salesforce-einsteingpt-openai-chatgpt|title=Salesforce launches EinsteinGPT built with OpenAI technology|first=Ryan|last=Morrison|date=March 7, 2023|access-date=April 10, 2023|archive-date=April 15, 2023|archive-url=https://web.archive.org/web/20230415095633/https://techmonitor.ai/technology/ai-and-automation/salesforce-einsteingpt-openai-chatgpt|url-status=live}}{{cite journal|last1=Sharma|first1=Animesh K.|last2=Sharma|first2=Rahul|title=The role of generative pretrained transformers (GPTs) in revolutionising digital marketing: A conceptual model|journal=Journal of Cultural Marketing Strategy |volume=8|issue=1|pages=80–90|url=https://ideas.repec.org/s/aza/jcms00.html|year=2023|doi=10.69554/TLVQ2275 |url-access=subscription}}
  • BloombergGPT – for the financial domain, to aid with financial news and information (uses "freely available" AI methods, combined with their proprietary data){{cite web|url=https://www.cnbc.com/2023/04/13/bloomberg-plans-to-integrate-gpt-style-ai-into-its-terminal.html|title=Bloomberg plans to integrate GPT-style A.I. into its terminal|first=Kif|last=Leswing|date=April 13, 2023|website=CNBC|access-date=May 4, 2023|archive-date=May 19, 2023|archive-url=https://web.archive.org/web/20230519205206/https://www.cnbc.com/2023/04/13/bloomberg-plans-to-integrate-gpt-style-ai-into-its-terminal.html|url-status=live}}
  • Khanmigo – described as a GPT version for tutoring, in the education domain, it aids students using Khan Academy by guiding them through their studies without directly providing answers (powered by GPT-4){{Cite news |date=May 4, 2023 |title=Learning nonprofit Khan Academy is piloting a version of GPT called Khanmigo |url=https://www.fastcompany.com/90891522/the-learning-nonprofit-khan-academy-piloting-a-version-of-gpt-called-khanmigo |access-date=May 22, 2023 |website=Fast Company |last1=Melendez |first1=Steven |archive-date=May 11, 2023 |archive-url=https://web.archive.org/web/20230511132231/https://www.fastcompany.com/90891522/the-learning-nonprofit-khan-academy-piloting-a-version-of-gpt-called-khanmigo |url-status=live }}{{cite web|url=https://thejournal.com/articles/2023/03/14/khan-academy-pilots-gpt-4-powered-tool-khanmigo-for-teachers.aspx|title=Khan Academy Pilots GPT-4 Powered Tool Khanmigo for Teachers|website=THE Journal|access-date=May 7, 2023|archive-date=May 7, 2023|archive-url=https://web.archive.org/web/20230507124146/https://thejournal.com/articles/2023/03/14/khan-academy-pilots-gpt-4-powered-tool-khanmigo-for-teachers.aspx|url-status=live}}
  • SlackGPT – for the Slack instant-messaging service, to aid with navigating and summarizing discussions on it (uses OpenAI's API){{cite web |last=Hachman |first=Mark |date=May 4, 2023 |title=Slack GPT will bring AI chatbots to your conversations |url=https://www.pcworld.com/article/1807402/slack-gpt-will-bring-ai-chatbots-to-your-conversations.html |website=PCWorld |access-date=May 4, 2023 |archive-date=June 9, 2023 |archive-url=https://web.archive.org/web/20230609193414/https://www.pcworld.com/article/1807402/slack-gpt-will-bring-ai-chatbots-to-your-conversations.html |url-status=live }}
  • BioGPT – for the biomedical domain, to aid with biomedical literature text generation and mining (uses GPT-2){{Cite journal |last=Luo (et-al) |first=Renqian |date=April 3, 2023 |title=BioGPT: Generative pre-trained transformer for biomedical text generation and mining |journal=Briefings in Bioinformatics |volume=23 |issue=6 |doi=10.1093/bib/bbac409 |pmid=36156661 |arxiv=2210.10341 }}

Sometimes domain-specificity is accomplished via software plug-ins or add-ons. For example, several different companies have developed particular plugins that interact directly with OpenAI's ChatGPT interface,{{cite web|url=https://wire19.com/chatgpt-plugins/|title=Know about ChatGPT's 13 best plugins, designed to improve your overall user experience|website=Latest Digital Transformation Trends | Cloud News | Wire19|date=May 5, 2023|last1=John|first1=Amy Sarah|access-date=May 7, 2023|archive-date=May 9, 2023|archive-url=https://web.archive.org/web/20230509151243/https://wire19.com/chatgpt-plugins/|url-status=dead}}{{cite web|url=https://openai.com/blog/chatgpt-plugins|title=ChatGPT plugins|website=openai.com|date=March 13, 2024|access-date=May 7, 2023|archive-date=March 23, 2023|archive-url=https://web.archive.org/web/20230323213712/https://openai.com/blog/chatgpt-plugins|url-status=live}} and Google Workspace has available add-ons such as "GPT for Sheets and Docs"{{mdash}}which is reported to aid use of spreadsheet functionality in Google Sheets.{{cite web|url=https://www.makeuseof.com/how-use-chatgpt-google-sheets/|title=How to Use ChatGPT on Google Sheets With GPT for Sheets and Docs|date=March 12, 2023|website=MUO|access-date=May 7, 2023|archive-date=June 19, 2023|archive-url=https://web.archive.org/web/20230619164055/https://www.makeuseof.com/how-use-chatgpt-google-sheets/|url-status=live}}{{cite web|url=https://www.infoworld.com/article/3689175/embrace-and-extend-excel-for-ai-data-prep.html|title=Embrace and extend Excel for AI data prep|first=Matt|last=Asay|date=February 27, 2023|website=InfoWorld|access-date=May 7, 2023|archive-date=June 2, 2023|archive-url=https://web.archive.org/web/20230602121215/https://www.infoworld.com/article/3689175/embrace-and-extend-excel-for-ai-data-prep.html|url-status=live}}

In November 2023, OpenAI announced that ChatGPT Plus subscribers would be able to create custom versions of ChatGPT (called GPTs).{{cite web | url=https://www.techopedia.com/definition/openai-gpts | title=OpenAI GPTS | date=November 10, 2023 | access-date=December 29, 2023 | archive-date=December 29, 2023 | archive-url=https://web.archive.org/web/20231229111454/https://www.techopedia.com/definition/openai-gpts | url-status=live }} These can be tailored for specific domains via prompt engineering, curated datasets, and/or targeted interaction with external tools. Users who register as verified builders are able to publish their custom GPTs for other users, with monetization potential. (This is notably distinct from OpenAI's API service, as these custom GPTs run within OpenAI's own platform.)

== Brand issues ==

OpenAI, which created the first generative pre-trained transformer (GPT) in 2018, asserted in 2023 that "GPT" should be regarded as a brand of OpenAI.{{cite web |last=Hicks |first=William |date=May 10, 2023 |title=ChatGPT creator OpenAI is asking startups to remove 'GPT' from their names |url=https://www.bizjournals.com/sanfrancisco/inno/stories/news/2023/05/10/openai-startups-gpt.html |access-date=May 21, 2023 |website=The Business Journal |archive-date=June 28, 2023 |archive-url=https://web.archive.org/web/20230628214533/https://www.bizjournals.com/sanfrancisco/inno/stories/news/2023/05/10/openai-startups-gpt.html |url-status=live }} In April 2023, OpenAI revised the brand guidelines in its terms of service to indicate that other businesses using its API to run their artificial intelligence (AI) services would no longer be able to include "GPT" in such names or branding.{{cite web |last=OpenAI |date=April 24, 2023 |title=Brand Guidelines |url=https://openai.com/brand |access-date=May 21, 2023 |archive-date=July 18, 2023 |archive-url=https://web.archive.org/web/20230718140318/https://openai.com/brand |url-status=live }} In May 2023, OpenAI engaged a brand management service to notify its API customers of this policy, although these notifications stopped short of making overt legal claims (such as allegations of trademark infringement or demands to cease and desist). As of November 2023, OpenAI still prohibits its API licensees from naming their own products with "GPT",{{Cite web|title= Brand guidelines|url= https://openai.com/brand#models|access-date= November 28, 2023|archive-date= July 18, 2023|archive-url= https://web.archive.org/web/20230718140318/https://openai.com/brand#models|url-status= live}} but it has begun enabling its ChatGPT Plus subscribers to make "custom versions of ChatGPT" that are being called GPTs on the OpenAI site.{{Cite web|title= Introducing GPTS|date= March 13, 2024|url= https://openai.com/blog/introducing-gpts|access-date= November 28, 2023|archive-date= March 20, 2024|archive-url= https://web.archive.org/web/20240320152321/https://openai.com/blog/introducing-gpts|url-status= live}} OpenAI's terms of service say that its subscribers may use "GPT" in the names of these, although this is "discouraged".

Relatedly, OpenAI has applied to the United States Patent and Trademark Office (USPTO) to seek domestic trademark registration for the term "GPT" in the field of AI. OpenAI sought to expedite handling of its application, but the USPTO declined that request in April 2023.{{Cite news |last=Heah |first=Alexa |date=April 26, 2023 |title=OpenAI Unsuccessful At Speeding Up Its Attempt To Trademark 'GPT' |work=DesignTAXI |url=https://designtaxi.com/news/423211/OpenAI-Unsuccessful-At-Speeding-Up-Its-Attempt-To-Trademark-GPT/ |access-date=May 21, 2023 |archive-date=April 26, 2023 |archive-url=https://web.archive.org/web/20230426090310/https://designtaxi.com/news/423211/OpenAI-Unsuccessful-At-Speeding-Up-Its-Attempt-To-Trademark-GPT/ |url-status=live }} In May 2023, the USPTO responded to the application with a determination that "GPT" was both descriptive and generic.{{Cite web |date=May 25, 2023 |title=NONFINAL OFFICE ACTION |url=https://tsdr.uspto.gov/documentviewer?caseId=sn97733259&docId=NFIN20230525093517#docIndex=4&page=1 |website=USPTO |access-date=December 30, 2023 |archive-date=December 3, 2023 |archive-url=https://web.archive.org/web/20231203101937/https://tsdr.uspto.gov/documentviewer?caseId=sn97733259&docId=NFIN20230525093517#docIndex=4&page=1 |url-status=live }} As of November 2023, OpenAI continues to pursue its argument through the available processes. Regardless, failure to obtain a registered U.S. trademark does not preclude some level of common-law trademark rights in the U.S.,{{Cite web|title= U.S. Trademark Law|date= December 2015|url= https://digital.gov/resources/u-s-trademark-law/|access-date= November 29, 2023|archive-date= January 17, 2024|archive-url= https://web.archive.org/web/20240117165722/https://digital.gov/resources/u-s-trademark-law/|url-status= live}} and/or trademark rights in other countries.{{Cite web|title= International Trademark Rights|url= https://www.inta.org/fact-sheets/international-trademark-rights/|access-date= November 29, 2023|archive-date= March 11, 2024|archive-url= https://web.archive.org/web/20240311090204/https://www.inta.org/fact-sheets/international-trademark-rights/|url-status= live}}

For any given type or scope of trademark protection in the U.S., OpenAI would need to establish that the term is actually "distinctive" to their specific offerings in addition to being a broader technical term for the kind of technology. Some media reports suggested that OpenAI may be able to obtain trademark registration based indirectly on the fame of its GPT-based chatbot product, ChatGPT,{{cite web |date=April 25, 2023 |url=https://www.techtimes.com/articles/290766/20230425/openai-trademark-gpt-chatgpt-rise-ai-chatbots.htm |title=OpenAI Wants to Trademark 'GPT' Amid Rise of AI Chatbots |publisher=Tech Times |accessdate=May 21, 2023 |archive-date=April 25, 2023 |archive-url=https://web.archive.org/web/20230425151024/https://www.techtimes.com/articles/290766/20230425/openai-trademark-gpt-chatgpt-rise-ai-chatbots.htm |url-status=live }} for which OpenAI has separately sought protection (and which it has sought to enforce more strongly).{{cite web |last=Louise |first=Nickie |date=April 3, 2023 |title=OpenAI files a UDRP case against the current owner of ChatGPT.com |url=https://techstartups.com/2023/04/03/openai-files-a-udrp-case-against-the-current-owner-of-chatgpt-com/ |access-date=May 21, 2023 |language=en-US |archive-date=June 5, 2023 |archive-url=https://web.archive.org/web/20230605031229/https://techstartups.com/2023/04/03/openai-files-a-udrp-case-against-the-current-owner-of-chatgpt-com/ |url-status=live }} Other reports have indicated that registration for the bare term "GPT" seems unlikely to be granted,{{cite web |last=Demcak |first=Tramatm-Igor |date=April 26, 2023 |title=OpenAI's Battle for Brand Protection: Can GPT be trademarked? |url=https://www.lexology.com/library/detail.aspx?g=763049f7-7ef8-4a68-bdb1-2e4fa194b7ad |archive-url=https://web.archive.org/web/20230505162827/https://www.lexology.com/library/detail.aspx?g=763049f7-7ef8-4a68-bdb1-2e4fa194b7ad |archive-date=May 5, 2023 |access-date=May 22, 2023 |website=Lexology |language=en}} as it is used frequently as a common term to refer simply to AI systems that involve generative pre-trained transformers.{{cite web |last=Lawton |first=George |date=April 20, 2023 |title=ChatGPT vs. GPT: How are they different? {{!}} TechTarget |url=https://www.techtarget.com/searchenterpriseai/feature/ChatGPT-vs-GPT-How-are-they-different |archive-url=https://web.archive.org/web/20230509150052/https://www.techtarget.com/searchenterpriseai/feature/ChatGPT-vs-GPT-How-are-they-different |archive-date=May 9, 2023 |access-date=May 21, 2023 |website=Enterprise AI |language=en}}{{cite web |last=Robb |first=Drew |date=April 12, 2023 |title=GPT-4 vs. ChatGPT: AI Chatbot Comparison |url=https://www.eweek.com/artificial-intelligence/gpt-4-vs-chatgpt/ |access-date=May 21, 2023 |website=eWEEK |language=en-US |archive-date=July 27, 2023 |archive-url=https://web.archive.org/web/20230727102701/https://www.eweek.com/artificial-intelligence/gpt-4-vs-chatgpt/ |url-status=live }}{{Cite news |last=Russo |first=Philip |date=August 22, 2023 |title=The Genesis of Generative AI for Everything Everywhere All at Once in CRE |work=Commercial Observer |url=https://commercialobserver.com/2023/08/jll-ai-gpt-proptech/ |url-status=live |archive-url=https://web.archive.org/web/20230824103201/https://commercialobserver.com/2023/08/jll-ai-gpt-proptech/ |archive-date=August 24, 2023}} In any event, to whatever extent exclusive rights in the term may occur in the U.S., others would need to avoid using it for similar products or services in ways likely to cause confusion.{{Cite web|title= Trademark infringement|url= https://www.law.cornell.edu/wex/trademark_infringement|access-date= November 29, 2023|archive-date= November 30, 2023|archive-url= https://web.archive.org/web/20231130025605/https://www.law.cornell.edu/wex/trademark_infringement|url-status= live}} If such rights ever became broad enough to implicate other well-established uses in the field, the trademark doctrine of descriptive fair use would still allow non-brand-related usage.{{cite web |last=Rheintgen |first=Husch Blackwell LLP-Kathleen A. |date=August 16, 2013 |title=Branding 101: trademark descriptive fair use |url=https://www.lexology.com/library/detail.aspx?g=4f7fc6dd-1d5f-41a1-beac-2638750faa75 |access-date=May 21, 2023 |website=Lexology |language=en |archive-date=May 21, 2023 |archive-url=https://web.archive.org/web/20230521234617/https://www.lexology.com/library/detail.aspx?g=4f7fc6dd-1d5f-41a1-beac-2638750faa75 |url-status=live }}

== Selected bibliography ==

This section lists the main official publications from OpenAI and Microsoft on their GPT models.

  • GPT-1: report, GitHub release.{{Citation |title=finetune-transformer-lm |date=June 11, 2018 |url=https://github.com/openai/finetune-transformer-lm |access-date=May 1, 2023 |publisher=OpenAI |archive-date=May 19, 2023 |archive-url=https://web.archive.org/web/20230519062127/https://github.com/openai/finetune-transformer-lm |url-status=live }}
  • GPT-2: blog announcement,{{cite web |title=GPT-2: 1.5B release |url=https://openai.com/research/gpt-2-1-5b-release |access-date=May 1, 2023 |website=openai.com |language=en-US |archive-date=March 31, 2023 |archive-url=https://web.archive.org/web/20230331004642/https://openai.com/research/gpt-2-1-5b-release |url-status=live }} report on its decision of "staged release",{{Cite arXiv |eprint=1908.09203 |class=cs.CL |first1=Irene |last1=Solaiman |first2=Miles |last2=Brundage |author-link=Irene Solaiman |title=Release Strategies and the Social Impacts of Language Models |date=November 12, 2019 |last3=Clark |first3=Jack |last4=Askell |first4=Amanda |last5=Herbert-Voss |first5=Ariel |last6=Wu |first6=Jeff |last7=Radford |first7=Alec |last8=Krueger |first8=Gretchen |last9=Kim |first9=Jong Wook |last10=Kreps |first10=Sarah |last11=McCain |first11=Miles |last12=Newhouse |first12=Alex |last13=Blazakis |first13=Jason |last14=McGuffie |first14=Kris |last15=Wang |first15=Jasmine}} GitHub release.{{Citation |title=gpt-2 |date=May 1, 2023 |url=https://github.com/openai/gpt-2 |access-date=May 1, 2023 |publisher=OpenAI |archive-date=March 11, 2023 |archive-url=https://web.archive.org/web/20230311154936/https://github.com/openai/gpt-2 |url-status=live }}
  • GPT-3: report. No GitHub or any other form of code release has followed.
  • WebGPT: blog announcement,{{Cite web |title=WebGPT: Improving the factual accuracy of language models through web browsing |url=https://openai.com/research/webgpt |archive-url=https://web.archive.org/web/20230621182942/https://openai.com/research/webgpt |archive-date=June 21, 2023 |access-date=July 2, 2023 |website=openai.com |language=en-US}} report,{{Cite journal |last1=Nakano |first1=Reiichiro |last2=Hilton |first2=Jacob |last3=Balaji |first3=Suchir |author-link3=Suchir Balaji |last4=Wu |first4=Jeff |last5=Ouyang |first5=Long |last6=Kim |first6=Christina |last7=Hesse |first7=Christopher |last8=Jain |first8=Shantanu |last9=Kosaraju |first9=Vineet |last10=Saunders |first10=William |last11=Jiang |first11=Xu |last12=Cobbe |first12=Karl |last13=Eloundou |first13=Tyna |last14=Krueger |first14=Gretchen |last15=Button |first15=Kevin |date=December 1, 2021 |title=WebGPT: Browser-assisted question-answering with human feedback |url=https://ui.adsabs.harvard.edu/abs/2021arXiv211209332N |journal=CoRR |arxiv=2112.09332 |archive-date=July 2, 2023 |access-date=July 2, 2023 |archive-url=https://web.archive.org/web/20230702191323/https://ui.adsabs.harvard.edu/abs/2021arXiv211209332N |url-status=live }}
  • InstructGPT: blog announcement, report.
  • ChatGPT: blog announcement (no report).
  • GPT-4: blog announcement,{{cite web |title=GPT-4 |url=https://openai.com/research/gpt-4 |access-date=May 1, 2023 |website=openai.com |language=en-US |archive-date=March 14, 2023 |archive-url=https://web.archive.org/web/20230314174531/https://openai.com/research/gpt-4 |url-status=live }} reports,{{Cite arXiv |last=OpenAI |date=March 27, 2023 |title=GPT-4 Technical Report |class=cs.CL |eprint=2303.08774 }}{{Cite arXiv |last1=Bubeck |first1=Sébastien |last2=Chandrasekaran |first2=Varun |last3=Eldan |first3=Ronen |last4=Gehrke |first4=Johannes |last5=Horvitz |first5=Eric |last6=Kamar |first6=Ece |last7=Lee |first7=Peter |last8=Lee |first8=Yin Tat |last9=Li |first9=Yuanzhi |last10=Lundberg |first10=Scott |last11=Nori |first11=Harsha |last12=Palangi |first12=Hamid |last13=Ribeiro |first13=Marco Tulio |last14=Zhang |first14=Yi |date=April 13, 2023 |title=Sparks of Artificial General Intelligence: Early experiments with GPT-4 |class=cs.CL |eprint=2303.12712 }} model card.[https://cdn.openai.com/papers/gpt-4-system-card.pdf GPT-4 System Card] {{Webarchive|url=https://web.archive.org/web/20230407201347/https://cdn.openai.com/papers/gpt-4-system-card.pdf |date=April 7, 2023 }}, OpenAI, March 23, 2023 (Accessed May 22, 2023).
  • GPT-4o: blog announcement.{{Cite web |date=May 13, 2024 |title=Hello GPT-4o |url=https://openai.com/index/hello-gpt-4o/ |website=OpenAI |access-date=August 8, 2024 |archive-date=May 14, 2024 |archive-url=https://web.archive.org/web/20240514024319/https://openai.com/index/hello-gpt-4o/ |url-status=live }}
  • GPT-4.5: blog announcement.{{Cite web|date=February 27, 2025|title=Introducing GPT-4.5|url=https://openai.com/index/introducing-gpt-4-5|website=OpenAI|access-date=March 18, 2025|archive-date=March 19, 2025|archive-url=https://web.archive.org/web/20250319141909/https://openai.com/index/introducing-gpt-4-5/|url-status=live}}


== References ==

{{reflist|refs=

{{cite web |title=Aligning language models to follow instructions |url=https://openai.com/research/instruction-following |website=openai.com |access-date=March 23, 2023 |archive-date=March 23, 2023 |archive-url=https://web.archive.org/web/20230323110040/https://openai.com/research/instruction-following |url-status=live }}

{{cite journal |last1=Ouyang |first1=Long |last2=Wu |first2=Jeff |last3=Jiang |first3=Xu |last4=Almeida |first4=Diogo |last5=Wainwright |first5=Carroll L. |last6=Mishkin |first6=Pamela |last7=Zhang |first7=Chong |last8=Agarwal |first8=Sandhini |last9=Slama |first9=Katarina |last10=Ray |first10=Alex |last11=Schulman |first11=John |last12=Hilton |first12=Jacob |last13=Kelton |first13=Fraser |last14=Miller |first14=Luke |last15=Simens |first15=Maddie |last16=Askell |first16=Amanda |last17=Welinder |first17=Peter |last18=Christiano |first18=Paul |last19=Leike |first19=Jan |last20=Lowe |first20=Ryan |title=Training language models to follow instructions with human feedback |journal=NeurIPS |date=November 4, 2022 |arxiv=2203.02155 |display-authors=3 }}

{{cite web |title=Introducing ChatGPT |url=https://openai.com/blog/chatgpt |access-date=March 16, 2023 |website=openai.com |language=en-US |archive-date=March 16, 2023 |archive-url=https://web.archive.org/web/20230316001700/https://openai.com/blog/chatgpt/ |url-status=live }}

}}

{{OpenAI navbox}}

{{Artificial intelligence navbox}}

{{Generative AI}}

{{Natural language processing}}

{{Subject bar|portal1=Computer programming|portal2=Technology|d=y}}

Category:Large language models

Category:Generative artificial intelligence

Category:Artificial neural networks

Category:OpenAI