OpenAI o1
{{Short description|2024 LLM with enhanced reasoning}}
{{Use mdy dates|date=September 2024}}
{{Infobox software
| name = o1
| logo =
| logo caption =
| screenshot =
| caption =
| author =
| developer = OpenAI
| latest release version =
| latest preview date = {{Start date and age|2024|09|12}}
| released = {{start date and age|2024|12|5}}
| repo =
| programming language =
| operating system =
| genre = {{indented plainlist|
* [[Large language model]]
* [[Generative pre-trained transformer]]
}}
| replaces =
| replaced_by = OpenAI o3
| license = Proprietary
| website = {{official|https://openai.com/o1/}}
}}
'''OpenAI o1''' is a reflective generative pre-trained transformer (GPT). A preview of o1 was released by OpenAI on September 12, 2024. o1 spends time "thinking" before it answers, making it better at complex reasoning tasks, science and programming than GPT-4o.{{Cite web |url=https://www.nytimes.com/2024/09/12/technology/openai-chatgpt-math.html |title=OpenAI Unveils New ChatGPT That Can Reason Through Math and Science |date=September 12, 2024 |last=Metz |first=Cade |work=The New York Times |access-date=September 12, 2024}} The full version was released to ChatGPT users on December 5, 2024.{{cite web |title=Introducing OpenAI o1 |url=https://openai.com/o1/ |website=OpenAI |access-date=6 December 2024}}
== History ==
=== Background ===
According to leaked information, o1 was formerly known within OpenAI as "Q*", and later as "Strawberry". The codename "Q*" first surfaced in November 2023, around the time of Sam Altman's ousting and subsequent reinstatement, with rumors suggesting that this experimental model had shown promising results on mathematical benchmarks.{{Cite news |date=November 23, 2023 |title=OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say |url=https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/ |work=Reuters}} In July 2024, Reuters reported that OpenAI was developing a generative pre-trained transformer known as "Strawberry",{{Cite web |url=https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/ |title=Exclusive: OpenAI working on new reasoning technology under code name 'Strawberry' |date=July 15, 2024 |last1=Tong |first1=Anna |last2=Paul |first2=Katie |publisher=Reuters |access-date=September 12, 2024}} which later became o1.
=== Release ===
"o1-preview" and "o1-mini" were released on September 12, 2024, for ChatGPT Plus and Team users. GitHub started testing the integration of o1-preview in its Copilot service the same day.{{Cite web |url=https://www.theverge.com/2024/9/12/24243143/github-has-started-testing-openais-o1-preview-in-github-copilot |title=GitHub has started testing OpenAI's o1-preview in GitHub Copilot. |date=September 12, 2024 |last=Peters |first=Jay |work=The Verge |access-date=September 12, 2024}} On December 5, 2024, the full version of o1 was released.{{cite news |last=Robison |first=Kylie |date=December 5, 2024 |title=OpenAI is charging $200 a month for an exclusive version of its o1 'reasoning' model |url=https://www.theverge.com/2024/12/5/24314147/openai-reasoning-model-o1-strawberry-chatgpt-pro-new-tier |access-date=December 5, 2024 |work=The Verge}} On the same day, a subscription called ChatGPT Pro was released, featuring access to a pro version of o1 that uses more compute to provide better answers. In January 2025, o1 was integrated into Microsoft Copilot.{{Cite news |last=Claburn |first=Thomas |date=31 January 2025 |title=You begged Microsoft to be reasonable. Instead it made Copilot reason-able with OpenAI GPT-o1 |url=https://www.theregister.com/2025/01/31/microsoft_open_ai_reasoning_copilot/ |work=The Register}}
o1-preview's API is several times more expensive than GPT-4o.{{Cite web |last=Robison |first=Kylie |date=September 12, 2024 |title=OpenAI releases o1, its first model with 'reasoning' abilities |url=https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt |access-date=September 15, 2024 |website=The Verge |language=en}} As of January 2025, API usage for the full o1 model is limited to developers on usage tier 5.{{Cite web |title=OpenAI o1 and new tools for developers |url=https://openai.com/index/o1-and-new-tools-for-developers/ |access-date=2025-01-26 |website=openai.com |language=en-US}}
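The o1 family is reached through the same chat completions endpoint as other OpenAI models. The following is a minimal sketch of such a request using the official openai Python library; the model name, token parameter, and the restriction to user-only messages reflect the documentation at launch and are assumptions that may have changed since.
<syntaxhighlight lang="python">
# Minimal sketch of querying an o1 model via the OpenAI chat completions API.
# Assumes the official "openai" Python package (v1.x) and OPENAI_API_KEY set in the environment.
# Parameter support (user-only messages, "max_completion_tokens") reflects the documented
# restrictions for o1-preview at launch and may differ in later releases.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini"; the full "o1" model requires a higher usage tier
    messages=[{"role": "user", "content": "How many prime numbers are there below 100?"}],
    max_completion_tokens=4096,  # budget shared by hidden reasoning tokens and the visible answer
)

print(response.choices[0].message.content)
</syntaxhighlight>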
OpenAI noted that o1 is the first of a series of "reasoning" models. OpenAI shared in December 2024 benchmark results for its successor, o3 (the name o2 was skipped to avoid trademark conflict with the mobile carrier brand named O2).{{Cite web |date=2024-12-20 |title=OpenAI confirms new frontier models o3 and o3-mini |url=https://venturebeat.com/ai/openai-confirms-new-frontier-models-o3-and-o3-mini/ |access-date=2025-01-26 |website=VentureBeat |language=en-US}}
In March 2025, OpenAI released the o1-pro API, its most expensive AI model to date. The pricing is set at $150 per 1 million input tokens and $600 per 1 million output tokens.{{Cite web |last=Wiggers |first=Kyle |date=2025-03-19 |title=OpenAI's o1-pro is the company's most expensive AI model yet |url=https://techcrunch.com/2025/03/19/openais-o1-pro-is-its-most-expensive-model-yet/ |access-date=2025-03-21 |website=TechCrunch |language=en-US}}
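At those list prices, the cost of a request scales linearly with its token counts. The following is an illustrative calculation with hypothetical token counts, not OpenAI's billing code.
<syntaxhighlight lang="python">
# Illustrative cost estimate at the o1-pro API list prices reported in March 2025
# ($150 per million input tokens, $600 per million output tokens).
# The token counts below are hypothetical; billed output includes hidden reasoning tokens.
INPUT_USD_PER_TOKEN = 150 / 1_000_000
OUTPUT_USD_PER_TOKEN = 600 / 1_000_000

input_tokens, output_tokens = 2_000, 10_000  # example request
cost = input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN
print(f"${cost:.2f}")  # -> $6.30 for this hypothetical request
</syntaxhighlight>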
== Capabilities ==
According to OpenAI, o1 was trained using a new optimization algorithm, a dataset specifically tailored to it, and reinforcement learning. OpenAI described o1 as a complement to GPT-4o rather than a successor.{{Cite web |date=2024-09-12 |title=New reasoning models: OpenAI o1-preview and o1-mini |url=https://community.openai.com/t/new-reasoning-models-openai-o1-preview-and-o1-mini/938081 |access-date=2024-10-17 |website=OpenAI Developer Forum |language=en}}
o1 spends additional time thinking (generating a chain of thought) before generating an answer, which makes it better for complex reasoning tasks, particularly in science and mathematics. Compared to previous models, o1 has been trained to generate long "chains of thought" before returning a final answer.{{Cite web |title=Learning to Reason with LLMs |url=https://openai.com/index/learning-to-reason-with-llms/ |archive-url=https://web.archive.org/web/20240912185410/https://openai.com/index/learning-to-reason-with-llms/ |archive-date=September 12, 2024 |access-date=September 13, 2024 |website=OpenAI}}{{Cite web |last=Kahn |first=Jeremy |title=Here are 9 things you need to know about OpenAI's o1 model |url=https://fortune.com/2024/09/13/openai-o1-strawberry-model-9-things-you-need-know/ |access-date=September 15, 2024 |website=Fortune |language=en}} According to Mira Murati, this ability to think before responding represents a new, additional paradigm, which is improving model outputs by spending more computing power when generating the answer, whereas the model scaling paradigm improves outputs by increasing the model size, training data and training compute power.{{Cite magazine |last=Knight |first=Will |title=OpenAI Announces a New AI Model, Code-Named Strawberry, That Solves Difficult Problems Step by Step |url=https://www.wired.com/story/openai-o1-strawberry-problem-reasoning/ |access-date=September 15, 2024 |magazine=Wired |language=en-US |issn=1059-1028}} OpenAI's test results suggest a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering.
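Schematically, this test-time scaling behaviour can be written as an approximately log-linear relationship between accuracy and the compute <math>C</math> spent on the chain of thought. This is an illustrative form consistent with the reported correlation, not a formula published by OpenAI:
<math display="block">\text{accuracy} \approx \alpha + \beta \log C</math>
where <math>\alpha</math> and <math>\beta</math> are task-dependent constants.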
o1-preview performed approximately at a PhD level on benchmark tests related to physics, chemistry, and biology. On the American Invitational Mathematics Examination, it solved 83% (12.5/15) of the problems, compared to 12% (1.8/15) for GPT-4o. It also ranked in the 89th percentile in Codeforces coding competitions.{{Cite web |last=Franzen |first=Carl |date=September 12, 2024 |title=Forget GPT-5! OpenAI launches new AI model family o1 claiming PhD-level performance |url=https://venturebeat.com/ai/forget-gpt-5-openai-launches-new-ai-model-family-o1-claiming-phd-level-performance/ |access-date=September 15, 2024 |website=VentureBeat |language=en-US}} o1-mini is faster and 80% cheaper than o1-preview. It is particularly suitable for programming and STEM-related tasks, but does not have the same "broad world knowledge" as o1-preview.{{Cite web |date=September 12, 2024 |title=OpenAI o1-mini |url=https://openai.com/index/openai-o1-mini-advancing-cost-efficient-reasoning/ |website=OpenAI}}
OpenAI noted that o1's reasoning capabilities make it better at adhering to safety rules provided in the prompt's context window. OpenAI reported that during a test, one instance of o1-preview exploited a misconfiguration to succeed at a task that should have been infeasible due to a bug.{{Cite web |last=Coombes |first=Lloyd |date=September 13, 2024 |title=OpenAI's new ChatGPT o1 model 'cheated' on an impossible test — here's what happened |url=https://www.tomsguide.com/ai/chatgpt/openais-new-chatgpt-o1-model-cheated-on-an-impossible-test-heres-what-happened |access-date=September 15, 2024 |website=Tom's Guide |language=en}}{{Cite web |date=September 12, 2024 |title=OpenAI o1 System Card |url=https://cdn.openai.com/o1-system-card.pdf |website=OpenAI |pages=16–17}} OpenAI also granted early access to the UK and US AI Safety Institutes for research, evaluation, and testing. According to OpenAI's assessments, o1-preview and o1-mini crossed into "medium risk" in CBRN (biological, chemical, radiological, and nuclear) weapons. Dan Hendrycks wrote that "The model already outperforms PhD scientists most of the time on answering questions related to bioweapons." He suggested that these concerning capabilities will continue to increase.{{Cite web |last=Boran |first=Marie |date=September 13, 2024 |title=OpenAI o1 model warning issued by scientist: "Particularly dangerous" |url=https://www.newsweek.com/openai-advanced-gpt-model-potential-risks-need-regulation-experts-1953311 |access-date=September 15, 2024 |website=Newsweek |language=en}}
== Limitations ==
o1 usually requires more computing time and power than OpenAI's other GPT models because it generates long chains of thought before producing its final response.
According to OpenAI, o1 may "fake alignment", that is, generate a response that its own chain of thought indicates is likely inaccurate, in about 0.38% of cases.{{cite news |last1=Robison |first1=Kylie |title=OpenAI's new model is better at reasoning and, occasionally, deceiving |url=https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment |work=The Verge |date=17 September 2024 |language=en}}
OpenAI forbids users from trying to reveal o1's chain of thought, which is hidden by design and not trained to comply with the company's policies. Prompts are monitored, and users who intentionally or accidentally violate this may lose their access to o1. OpenAI cites AI safety and competitive advantage as reasons for the restriction, which has been described as a loss of transparency by developers who work with large language models (LLMs).{{cite news |last1=Edwards |first1=Benj |title=Ban warnings fly as users dare to probe the "thoughts" of OpenAI's latest model |url=https://arstechnica.com/information-technology/2024/09/openai-threatens-bans-for-probing-new-ai-models-reasoning-process/ |work=Ars Technica |date=16 September 2024 |language=en-us}}
In October 2024, researchers at Apple submitted a preprint reporting that LLMs such as o1 may be replicating reasoning steps from the models' own training data.{{cite arXiv |last1=Mirzadeh |first1=Iman |last2=Alizadeh |first2=Keivan |last3=Shahrokhi |first3=Hooman |last4=Tuzel |first4=Oncel |last5=Bengio |first5=Samy |last6=Farajtabar |first6=Mehrdad |title=GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models |date=2024 |class=cs.LG |eprint=2410.05229 }} When the numbers and names in a math problem were changed, or the same problem was simply run again, LLMs performed somewhat worse than their best benchmark results. Adding extraneous but logically inconsequential information to the problems caused a much larger drop in performance, ranging from −17.5% for o1-preview and −29.1% for o1-mini to −65.7% for the worst model tested.{{cite web |last1=Orland |first1=Kyle |title=Apple study exposes deep cracks in LLMs' "reasoning" capabilities |url=https://arstechnica.com/ai/2024/10/llms-cant-perform-genuine-logical-reasoning-apple-researchers-suggest/ |website=Ars Technica |access-date=15 October 2024 |date=14 October 2024}}
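The template perturbations described in the study can be illustrated with a toy generator that varies names and numbers and optionally appends a logically irrelevant clause. This is a hypothetical sketch of the idea, not the GSM-Symbolic benchmark code:
<syntaxhighlight lang="python">
# Hypothetical sketch of the perturbations described in the GSM-Symbolic study:
# the same word problem is regenerated with different names and numbers, and optionally
# padded with a logically irrelevant clause, to test whether reasoning survives surface changes.
import random

NAMES = ["Sophie", "Liam", "Mei", "Arjun"]

def make_problem(add_irrelevant_clause: bool = False) -> tuple[str, int]:
    name = random.choice(NAMES)
    friday, saturday = random.randint(30, 60), random.randint(30, 60)
    noop = " Five of them were a bit smaller than average." if add_irrelevant_clause else ""
    question = (
        f"{name} picks {friday} kiwis on Friday and {saturday} kiwis on Saturday.{noop} "
        f"How many kiwis does {name} have?"
    )
    return question, friday + saturday  # the irrelevant clause does not change the answer

print(make_problem(add_irrelevant_clause=True))
</syntaxhighlight>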
== References ==
{{Reflist}}
== External links ==
* {{official|https://openai.com/o1/}}
{{OpenAI}}
{{Artificial intelligence navbox}}
{{Generative AI}}
[[Category:Generative pre-trained transformers]]