LMArena

{{Short description|Website comparing AI chatbots based on votes}}

{{Infobox website

| logo = Chatbot Arena logo.png

| logo_size = 150px

| name = LMArena

| url = {{URL|https://lmarena.ai}}

| type = Chatbot, artificial intelligence

| commercial = No

| registration = None

| country_of_origin = United States

| owner = LMSYS Org

| founder = {{Unbulleted list|Wei-Lin Chiang|Anastasios Angelopoulos}}

| launch_date = {{Start date and age|2023|5|3}}

}}

LMArena (formerly Chatbot Arena) is a public, web-based platform that evaluates large language models (LLMs) through anonymous, crowdsourced pairwise comparisons. A user enters a prompt, two anonymous models each respond to it, and the user votes for the better response, after which the models' identities are revealed. Users can also choose specific models to compare.{{Cite web |last=Hart |first=Robert |date=July 18, 2024 |title=What AI Is The Best? Chatbot Arena Relies On Millions Of Human Votes |url=https://www.forbes.com/sites/roberthart/2024/07/18/what-ai-is-the-best-chatbot-arena-relies-on-millions-of-human-votes/ |access-date=April 21, 2025 |website=Forbes}}{{Cite web |last=Kruppa |first=Miles |date=December 5, 2024 |title=The UC Berkeley Project That Is the AI Industry's Obsession |url=https://www.wsj.com/tech/ai/the-uc-berkeley-project-that-is-the-ai-industrys-obsession-bc68b3e3 |access-date=April 21, 2025 |website=The Wall Street Journal}}
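Pairwise votes of this kind can be aggregated into a ranking in several ways. The following Python sketch shows one generic approach, an online Elo-style rating update; it is an illustration under stated assumptions and is not presented as LMArena's own method. The function names (`expected_score`, `elo_ratings`), the K-factor of 4, the initial rating of 1000 and the model names are hypothetical choices made for this example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical parameters chosen for illustration; not LMArena's published settings.
INITIAL_RATING = 1000.0
K_FACTOR = 4.0
SCALE = 400.0


def expected_score(r_a: float, r_b: float, scale: float = SCALE) -> float:
    """Elo-model probability that a model rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_ratings(
    votes: Iterable[Tuple[str, str]],
    k: float = K_FACTOR,
    initial: float = INITIAL_RATING,
) -> Dict[str, float]:
    """Fold (winner, loser) votes, in order, into Elo-style ratings."""
    ratings: Dict[str, float] = {}
    for winner, loser in votes:
        # Models seen for the first time start at the initial rating.
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # The winner gains more for an unexpected win (low expected score).
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta  # zero-sum: the loser drops by the same amount
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between three anonymous models.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
    ratings = elo_ratings(votes)
    leaderboard = sorted(ratings, key=ratings.get, reverse=True)
    print(leaderboard)  # → ['model-a', 'model-b', 'model-c']
```

Because each update is zero-sum, the ratings always sum to the number of models times the initial rating. An online update of this kind also depends on the order in which votes arrive, which is one reason batch methods such as a Bradley–Terry fit are sometimes preferred for pairwise data.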

Chatbot Arena is popular within the artificial intelligence industry: major companies submit their large language models, such as GPT-4o, o1, Gemini,{{Cite web |last=Nuñez |first=Michael |date=November 15, 2024 |title=Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don't tell the whole story |url=https://venturebeat.com/ai/google-gemini-unexpectedly-surges-to-no-1-over-openai-but-benchmarks-dont-tell-the-whole-story/ |access-date=April 21, 2025 |website=VentureBeat}} and Claude,{{Cite web |last=Edwards |first=Benj |date=March 27, 2024 |title="The king is dead"—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time |url=https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ |access-date=April 21, 2025 |website=Ars Technica}} and use the resulting rankings to promote them. Notably, the Chinese company DeepSeek tested prototype models on Chatbot Arena months before its R1 model gained attention in Western media.{{Cite web |last=Metz |first=Rachel |date=February 18, 2025 |title=Before DeepSeek Blew Up, Chatbot Arena Announced Its Arrival |url=https://www.bloomberg.com/news/articles/2025-02-18/before-deepseek-blew-up-one-website-announced-its-arrival |access-date=April 21, 2025 |website=Bloomberg News}} The website has also been used to preview upcoming models before their official release.
However, critics have questioned whether Chatbot Arena's methodology adequately measures large language model performance,{{Cite web |last=Wiggers |first=Kyle |date=September 5, 2024 |title=The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark |url=https://techcrunch.com/2024/09/05/the-ai-industry-is-obsessed-with-chatbot-arena-but-it-might-not-be-the-best-benchmark/ |access-date=April 21, 2025 |website=TechCrunch}} and one study found that hundreds of rigged votes could skew its model rankings.{{Cite web |last=Stokel-Walker |first=Chris |date=February 6, 2025 |title=Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds |url=https://www.fastcompany.com/91273226/rigged-votes-ai-model-rankings-chatbot-arena |access-date=April 21, 2025 |website=Fast Company}}

[[File:Chatbot Arena main UI.png|thumb|The main user interface of Chatbot Arena]]

{{-}}

References