Groq
{{Short description|American technology company}}
{{Distinguish|Grok (chatbot)}}
{{Infobox company
| name = Groq, Inc.
| logo = Groq_logo.svg
| type = Private
| genre =
| fate =
| predecessor =
| successor =
| foundation = {{start date and age|2016}}
| founders = {{ubl|Jonathan Ross}}
| defunct =
| hq_location_city = Mountain View, California
| hq_location_country = US
| key_people = Jonathan Ross (CEO), Andrew S. Rappaport (Board Member), Chamath Palihapitiya (Investor)
| industry = {{ubl|Semiconductor industry|Artificial Intelligence|Cloud computing}}
| products = Language Processing Unit (LPU)
| production =
| services =
| revenue = US$3.2 million (2023)
| profit = US$−88 million (2023)
| num_employees = 250 (2023)
| homepage = {{URL|https://groq.com/}}
| footnotes =
| intl =
}}
Groq, Inc. is an American artificial intelligence (AI) company that builds an AI accelerator application-specific integrated circuit (ASIC), which it calls the Language Processing Unit (LPU), and related hardware to accelerate the inference of AI workloads.
Examples of the types of AI workloads that run on Groq's LPU include large language models (LLMs),{{cite web |last1=Williams |first1=Wayne |title='Feels like magic!': Groq's ultrafast LPU could well be the first LLM-native processor — and its latest demo may well convince Nvidia and AMD to get out their checkbooks |url=https://www.techradar.com/pro/feels-like-magic-groqs-ultrafast-lpu-could-well-be-the-first-llm-native-processor-and-its-latest-demo-may-well-convince-nvidia-and-amd-to-get-out-their-checkbooks |website=TechRadar Pro |date=27 February 2024 |publisher=TechRadar |access-date=19 April 2024}}{{cite web |last1=Ward-Foxton |first1=Sally |title=Groq Demonstrates Fast LLMs on 4-Year-Old Silicon |url=https://www.eetimes.com/groq-demos-fast-llms-on-4-year-old-silicon/ |website=EETimes |date=12 September 2023 |access-date=19 April 2024}} image classification,{{cite web |last1=Ward-Foxton |first1=Sally |title=Groq's AI Chip Debuts in the Cloud |url=https://www.eetimes.com/groqs-ai-chip-debuts-in-the-cloud/ |website=EETimes |date=21 January 2020 |access-date=19 April 2024}} anomaly detection,{{cite web |last1=Moorhead |first1=Patrick |title=US Army Analytics Group – Cybersecurity Anomaly Detection 1000X Faster With Less False Positives |url=https://www.forbes.com/sites/patrickmoorhead/2022/11/14/us-army-analytics-group--cybersecurity-anomaly-detection-1000x-faster-with-less-false-positives/?sh=701ef6aa6d0c |work=Forbes |access-date=19 April 2024}}{{cite web |last1=Herman |first1=Arthur |title=Cybersecurity Is Entering The High-Tech Era |url=https://www.forbes.com/sites/arthurherman/2022/10/31/cybersecurity-is-entering-the-high-tech-era/?sh=2fe0412b1103 |work=Forbes |access-date=19 April 2024}} and predictive analysis.{{cite web |last1=Heinonen |first1=Nils |title=Researchers accelerate fusion research with Argonne's Groq AI platform |url=https://www.alcf.anl.gov/news/researchers-accelerate-fusion-research-argonne-s-groq-ai-platform |website=Argonne Leadership Computing Facility |access-date=19 April 2024}}{{cite web |last1=Larwood |first1=Mariah |last2=Cerny |first2=Beth |title=Argonne deploys new Groq system to ALCF AI Testbed, providing AI accelerator access to researchers globally |url=https://www.alcf.anl.gov/news/argonne-deploys-new-groq-system-alcf-ai-testbed-providing-ai-accelerator-access-researchers |website=Argonne Leadership Computing Facility |access-date=19 April 2024}}
Groq is headquartered in Mountain View, California, and has offices in San Jose, California; Liberty Lake, Washington; Toronto, Canada; and London, United Kingdom, with remote employees throughout North America and Europe.
History
Groq was founded in 2016 by a group of former Google engineers, led by Jonathan Ross, one of the designers of the Tensor Processing Unit (TPU), an AI accelerator ASIC, and Douglas Wightman, an entrepreneur and former engineer at Google X (known as X Development), who served as the company’s first CEO.{{cite web |last1=Levy |first1=Ari |title=Several Google engineers have left one of its most secretive AI projects to form a stealth start-up |url=https://www.cnbc.com/2017/04/20/ex-googlers-left-secretive-ai-unit-to-form-groq-with-palihapitiya.html |website=CNBC |date=21 April 2017 |access-date=19 April 2024}}
Groq received seed funding from Social Capital's Chamath Palihapitiya, with a $10 million investment in 2017{{cite web |last1=Clark |first1=Kate |title=Secretive semiconductor startup Groq raises $52M from Social Capital |url=https://techcrunch.com/2018/09/05/secretive-semiconductor-startup-groq-raises-52m-from-social-capital/ |website=TechCrunch |date=6 September 2018 |access-date=19 April 2024}} and soon after secured additional funding.
In April 2021, Groq raised $300 million in a series C round led by Tiger Global Management and D1 Capital Partners.{{cite news |last1=King |first1=Ian |title=Tiger Global, D1 Lead $300 Million Round in AI Chip Startup Groq |url=https://www.bloomberg.com/news/articles/2021-04-14/tiger-global-d1-lead-300-million-round-in-ai-chip-startup-groq |website=Bloomberg |date=14 April 2021 |access-date=19 April 2024}} Current investors include: The Spruce House Partnership, Addition, GCM Grosvenor, Xⁿ, Firebolt Ventures, General Global Capital, and Tru Arrow Partners, as well as follow-on investments from TDK Ventures, XTX Ventures, Boardman Bay Capital Management, and Infinitum Partners.{{cite web |last1=Wheatly |first1=Mike |title=AI chipmaker Groq raises $300M in Series C round |url=https://siliconangle.com/2021/04/14/ai-chip-maker-groq-raises-300m-series-c-round/ |website=Silicon Angle |date=14 April 2021 |access-date=19 April 2024}}{{cite web |last1=McFarland |first1=Alex |title=AI Chip Startup Groq Closes $300 Million in Series C Fundraising |url=https://www.unite.ai/ai-chip-startup-groq-closes-300-million-in-series-c-fundraising/ |website=Unite.AI |date=14 April 2021 |access-date=19 April 2024}} After Groq’s series C funding round, it was valued at over $1{{nbsp}}billion, making the startup a unicorn.{{cite web |last1=Andonov |first1=Kaloyan |last2=Lavine |first2=Rob |title=Analysis: Groq computes a $300m series C |url=https://globalventuring.com/corporate/analysis-groq-computes-a-300m-series-c/ |website=Global Venturing |date=19 April 2021 |access-date=19 April 2024}}
On March 1, 2022, Groq acquired Maxeler Technologies, a company known for its dataflow systems technologies.{{cite web |last1=Prickett Morgan |first1=Timothy |title=GROQ BUYS MAXELER FOR ITS HPC AND AI DATAFLOW EXPERTISE |url=https://www.nextplatform.com/2022/03/01/groq-buys-maxeler-for-its-hpc-and-ai-dataflow-expertise/ |website=The Next Platform |date=2 March 2022 |access-date=19 April 2024}}
On August 16, 2023, Groq selected Samsung Electronics foundry in Taylor, Texas to manufacture its next generation chips, on Samsung's 4-nanometer (nm) process node. This was the first order at this new Samsung chip factory.{{cite web |last1=Hwang |first1=Jeong-Soo |title=Samsung's new US chip fab wins first foundry order from Groq |url=https://www.kedglobal.com/korean-chipmakers/newsView/ked202308160014 |website=The Korea Economic Daily |access-date=19 April 2024}}
On February 19, 2024, Groq soft-launched a developer platform, GroqCloud, to attract developers to use the Groq API and rent access to its chips.{{cite web |last1=Franzen |first1=Carl |title=Groq launches developer playground GroqCloud with newly acquired Definitive Intelligence |url=https://venturebeat.com/programming-development/groq-launches-developer-playground-groqcloud-with-newly-acquired-definitive-intelligence/ |website=Venture Beat |date=March 2024 |access-date=19 April 2024}} On March 1, 2024, Groq acquired Definitive Intelligence, a startup known for offering a range of business-oriented AI solutions, to help with its cloud platform.{{cite web |last1=Wiggers |first1=Kyle |title=AI chip startup Groq forms new business unit, acquires Definitive Intelligence |url=https://techcrunch.com/2024/03/01/ai-chip-startup-groq-forms-new-business-unit-acquires-definitive-intelligence/ |website=TechCrunch |date=March 2024 |access-date=19 April 2024}}
Groq raised $640 million in a series D round led by BlackRock Private Equity Partners in August 2024, valuing the company at $2.8 billion.{{Cite web |last=Nieva |first=Richard |date=August 5, 2024 |title=The AI Chip Boom Saved This Tiny Startup. Now Worth $2.8 Billion, It's Taking On Nvidia |url=https://www.forbes.com/sites/richardnieva/2024/08/05/groq-funding-series-d-nvidia/ |work=Forbes}}{{Cite web |last=Wiggers |first=Kyle |date=2024-08-05 |title=AI chip startup Groq lands $640M to challenge Nvidia |url=https://techcrunch.com/2024/08/05/ai-chip-startup-groq-lands-640m-to-challenge-nvidia/ |access-date=2024-08-26 |website=TechCrunch |language=en-US}}
In 2025, Groq announced at the LEAP 2025 conference that it had secured $1.5 billion in funding from the Kingdom of Saudi Arabia to expand its infrastructure.{{cite web |title=LEAP 2025 |url=https://groq.com/leap2025/ |website=Groq}}
Language Processing Unit
Groq initially named its ASIC the Tensor Streaming Processor (TSP), but later rebranded it as the Language Processing Unit (LPU).{{cite web |last1=Mellor |first1=Chris |title=Grokking Groq's Groqness |url=https://blocksandfiles.com/2024/01/23/grokking-groqs-groqness/ |website=Blocks & Files |date=23 January 2024 |access-date=19 April 2024}}{{cite book |last1=Abts |first1=Dennis |last2=Ross |first2=Jonathan |last3=Sparling |first3=Jonathan |last4=Wong-VanHaren |first4=Mark |last5=Baker |first5=Max |last6=Hawkins |first6=Tom |last7=Bell |first7=Andrew |last8=Thompson |first8=John |last9=Kahsai |first9=Temesghen |last10=Kimmell |first10=Garrin |last11=Hwang |first11=Jennifer |last12=Leslie-Hurd |first12=Rebekah |last13=Bye |first13=Michael |last14=Creswick |first14=E.R. |last15=Boyd |first15=Matthew |last16=Venigalla |first16=Mahitha |last17=Laforge |first17=Evan |last18=Purdy |first18=Jon |last19=Kamath |first19=Purushotham |last20=Maheshwari |first20=Dinesh |last21=Beidler |first21=Michael |last22=Rosseel |first22=Geert |last23=Ahmad |first23=Omar |last24=Gagarin |first24=Gleb |last25=Czekalski |first25=Richard |last26=Rane |first26=Ashay |last27=Parmar |first27=Sahil |last28=Werner |first28=Jeff |last29=Sproch |first29=Jim |last30=Macias |first30=Adrian |last31=Kurtz |first31=Brian |chapter=Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads |chapter-url=https://groq.com/wp-content/uploads/2020/06/ISCA-TSP.pdf |title= 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) |date=May 2020 |pages=145–158 |doi=10.1109/ISCA45697.2020.00023|isbn=978-1-7281-4661-4 }}
The LPU features a functionally sliced microarchitecture, where memory units are interleaved with vector and matrix computation units.{{cite book |chapter-url=https://dl.acm.org/doi/10.1145/3470496.3527405 |title=Proceedings of the 49th Annual International Symposium on Computer Architecture |date=June 11, 2022 |doi=10.1145/3470496.3527405 |access-date=2024-03-18 |last1=Abts |first1=Dennis |last2=Kimmell |first2=Garrin |last3=Ling |first3=Andrew |last4=Kim |first4=John |last5=Boyd |first5=Matt |last6=Bitar |first6=Andrew |last7=Parmar |first7=Sahil |last8=Ahmed |first8=Ibrahim |last9=Dicecco |first9=Roberto |last10=Han |first10=David |last11=Thompson |first11=John |last12=Bye |first12=Michael |last13=Hwang |first13=Jennifer |last14=Fowers |first14=Jeremy |last15=Lillian |first15=Peter |last16=Murthy |first16=Ashwin |last17=Mehtabuddin |first17=Elyas |last18=Tekur |first18=Chetan |last19=Sohmers |first19=Thomas |last20=Kang |first20=Kris |last21=Maresh |first21=Stephen |last22=Ross |first22=Jonathan |chapter=A software-defined tensor streaming multiprocessor for large-scale machine learning |pages=567–580 |isbn=978-1-4503-8610-4 }} This design facilitates the exploitation of dataflow locality in AI compute graphs, improving execution performance and efficiency. The LPU's design was based on two key observations:
- AI workloads exhibit substantial data parallelism, which can be mapped onto purpose-built hardware, leading to performance gains.
- A deterministic processor design, coupled with a producer-consumer programming model, enables precise control of and reasoning about hardware components, yielding optimized performance and energy efficiency.
In addition to its functionally sliced microarchitecture, the LPU is characterized by its single-core, deterministic architecture.{{cite book |chapter-url=https://dl.acm.org/doi/10.1145/3490422.3510453 |title=Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays |date=February 11, 2022 |doi=10.1145/3490422.3510453 |access-date=2024-03-18 |last1=Singh |first1=Satnam |chapter=The Virtuous Cycles of Determinism: Programming Groq's Tensor Streaming Processor |page=153 |isbn=978-1-4503-9149-8 }} The LPU achieves deterministic execution by avoiding traditional reactive hardware components (branch predictors, arbiters, reordering buffers, caches) and by having all execution explicitly scheduled by the compiler, thereby guaranteeing deterministic execution of an LPU program.
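The compiler-scheduled model described above can be illustrated with a toy simulator. This is a hypothetical sketch, not Groq's actual toolchain or ISA: each operation is assigned a fixed issue cycle before the program runs, so timing is fully known ahead of execution and no runtime arbitration is needed.

```python
# Toy model of compiler-scheduled, deterministic execution (illustrative only;
# the functional units, latencies, and ops are invented for this example).
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    cycle: int      # issue cycle fixed at "compile" time
    unit: str       # functional slice: "mem", "vec", or "mxm"
    op: str

def compile_schedule(ops, latency):
    """Statically assign each op an issue cycle, honoring unit occupancy
    and a simple chain dependency (each op consumes its predecessor's result)."""
    next_free = {}   # earliest cycle each functional unit is free again
    ready = 0        # cycle at which the previous op's result is available
    schedule = []
    for unit, op in ops:
        start = max(ready, next_free.get(unit, 0))
        schedule.append(Instr(start, unit, op))
        next_free[unit] = start + latency[unit]
        ready = start + latency[unit]
    return schedule

latency = {"mem": 2, "vec": 1, "mxm": 4}
program = [("mem", "load A"), ("mxm", "A @ W"), ("vec", "relu"), ("mem", "store Y")]

schedule = compile_schedule(program, latency)
for ins in schedule:
    print(f"cycle {ins.cycle:2d}: {ins.unit:>3} {ins.op}")

# Because the schedule is fixed before execution, re-running the "compiler"
# yields an identical schedule -- the determinism the design relies on.
assert compile_schedule(program, latency) == schedule
```

The contrast with a conventional processor is that here nothing about ordering or timing is decided at runtime; a real compiler for such hardware would additionally model data routing between slices cycle by cycle.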
The first generation of the LPU (LPU v1) yields a computational density of more than 1 TeraOp/s per square millimeter of silicon for its 25×29 mm, 14 nm chip operating at a nominal clock frequency of 900 MHz.{{Cite book|date=2022-06-11|chapter=A software-defined tensor streaming multiprocessor for large-scale machine learning|doi=10.1145/3470496.3527405 |chapter-url=https://dl.acm.org/doi/abs/10.1145/3470496.3527405 |title=Proceedings of the 49th Annual International Symposium on Computer Architecture |last1=Abts |first1=Dennis |last2=Kimmell |first2=Garrin |last3=Ling |first3=Andrew |last4=Kim |first4=John |last5=Boyd |first5=Matt |last6=Bitar |first6=Andrew |last7=Parmar |first7=Sahil |last8=Ahmed |first8=Ibrahim |last9=Dicecco |first9=Roberto |last10=Han |first10=David |last11=Thompson |first11=John |last12=Bye |first12=Michael |last13=Hwang |first13=Jennifer |last14=Fowers |first14=Jeremy |last15=Lillian |first15=Peter |last16=Murthy |first16=Ashwin |last17=Mehtabuddin |first17=Elyas |last18=Tekur |first18=Chetan |last19=Sohmers |first19=Thomas |last20=Kang |first20=Kris |last21=Maresh |first21=Stephen |last22=Ross |first22=Jonathan |pages=567–580 |isbn=978-1-4503-8610-4 }} The second generation of the LPU (LPU v2) will be manufactured on Samsung's 4 nm process node.
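For scale, a back-of-the-envelope calculation from the figures above (not a figure quoted by the paper): a 25 mm × 29 mm die has an area of 725 mm², so a density of more than 1 TeraOp/s per mm² corresponds to an aggregate throughput of more than roughly

:<math>725\,\mathrm{mm^2} \times 1\,\mathrm{TeraOp/s/mm^2} \approx 725\,\mathrm{TeraOp/s}</math>

across the chip.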
Performance
Groq emerged as the first API provider to break a generation rate of 100 tokens per second while running Meta's 70-billion-parameter Llama 2 model.{{cite web |last1=Smith-Goodson |first1=Paul |title=Groq's Record-Breaking Language Processor Hits 100 Tokens Per Second On A Massive AI Model |url=https://www.forbes.com/sites/moorinsights/2023/08/11/groqs-record-breaking-language-processor-hits-100-tokens-per-second-on-a-massive-ai-model/?sh=6e6e4825358f |website=Forbes |access-date=19 April 2024}}
Groq hosts a variety of open-source large language models running on its LPUs for public access.{{cite web |last1=Morrison |first1=Ryan |title=Meet Groq — the chip designed to run AI models really, really fast |url=https://www.tomsguide.com/ai/meet-groq-the-chip-designed-to-run-ai-models-really-really-fast |website=Tom’s Guide |date=27 February 2024 |access-date=19 April 2024}} Access to these demos is available through Groq's website. The LPU's performance while running these open-source LLMs has been independently benchmarked by ArtificialAnalysis.ai against other LLM providers.{{cite web |url=https://www.hpcwire.com/off-the-wire/groq-shows-promising-results-in-new-llm-benchmark-surpassing-industry-averages/ |title=Groq Shows Promising Results in New LLM Benchmark, Surpassing Industry Averages |publisher=HPCwire |date=2024-02-13 |access-date=2024-03-18}}
References
{{reflist}}
External links
- {{Official|https://groq.com/}}
Category:Computer companies of the United States
Category:Companies based in California
Category:Companies based in Sunnyvale, California
Category:Companies based in Silicon Valley
Category:Computer hardware companies
Category:Semiconductor companies of the United States
Category:Fabless semiconductor companies