deepset

{{Short description|German natural language processing startup}}

{{Lowercase title}}

{{Infobox company

| name = deepset

| logo = deepset.svg

| type = Private

| industry = Natural Language Processing

| founded = {{Start date and age|2018|06|22}}

| founders = {{hlist|Milos Rusic|Malte Pietsch|Timo Möller}}

| location_city = Berlin

| location_country = Germany

| products = Haystack, deepset Cloud

| num_employees = > 50

| homepage = {{URL|https://www.deepset.ai/}}

}}

deepset is an enterprise software vendor that provides developers with the tools to build production-ready natural language processing (NLP) systems. It was founded in 2018 in Berlin by Milos Rusic, Malte Pietsch, and Timo Möller.{{Cite news |last=Wiggers |first=Kyle |date=April 28, 2022 |title=Deepset raises $14M to help companies build NLP apps |work=TechCrunch |url=https://techcrunch.com/2022/04/28/deepset-raises-14m-to-help-companies-build-nlp-apps/ |access-date=August 31, 2022}}

deepset develops and maintains the open source framework Haystack and offers the commercial SaaS product deepset Cloud.

History

In June 2018, Milos Rusic, Malte Pietsch, and Timo Möller co-founded deepset in Berlin, Germany. In the same year, the company served its first customers, who wanted to implement NLP services by tailoring BERT language models to their domains.

In July 2019, the company released the initial version of the open source software FARM.{{Cite web |title=deepset-ai/FARM |url=https://github.com/deepset-ai/FARM |access-date=August 31, 2022 |website=GitHub}}

In November 2019, the company released the initial version of the open source software Haystack.{{Cite web |title=deepset-ai/haystack |url=https://github.com/deepset-ai/haystack |access-date=August 31, 2022 |website=GitHub}}

Throughout 2020 and 2021, deepset published several applied research papers at EMNLP, COLING and ACL, leading conferences in NLP. In 2020, the research contributions comprised the German language models GBERT and GELECTRA,{{Cite book |last1=Chan |first1=Branden |last2=Schweter |first2=Stefan |last3=Möller |first3=Timo |title=Proceedings of the 28th International Conference on Computational Linguistics |chapter=German's Next Language Model |date=2020 |chapter-url=https://www.aclweb.org/anthology/2020.coling-main.598 |language=en |location=Barcelona, Spain (Online) |publisher=International Committee on Computational Linguistics |pages=6788–6796 |doi=10.18653/v1/2020.coling-main.598|doi-access=free }} and COVID-QA, a question answering dataset addressing the COVID-19 pandemic that was created in collaboration with Intel and annotated by biomedical experts.{{Cite journal |last1=Möller |first1=Timo |last2=Reina |first2=Anthony |last3=Jayakumar |first3=Raghavan |last4=Pietsch |first4=Malte |date=2020-07-09 |title=COVID-QA: A Question Answering Dataset for COVID-19 |url=https://aclanthology.org/2020.nlpcovid19-acl.18 |journal=Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020 |location=Online |publisher=Association for Computational Linguistics}}

In 2021, the research contributions comprised German models and datasets for question answering and passage retrieval named GermanQuAD and GermanDPR,{{Cite journal |last1=Möller |first1=Timo |last2=Risch |first2=Julian |last3=Pietsch |first3=Malte |date=2021 |title=GermanQuAD and GermanDPR: Improving Non-English Question Answering and Passage Retrieval |url=https://aclanthology.org/2021.mrqa-1.4 |journal=Proceedings of the 3rd Workshop on Machine Reading for Question Answering |language=en |location=Punta Cana, Dominican Republic |publisher=Association for Computational Linguistics |pages=42–50 |doi=10.18653/v1/2021.mrqa-1.4|doi-access=free |arxiv=2104.12741 }} a semantic answer similarity metric,{{Cite journal |last1=Risch |first1=Julian |last2=Möller |first2=Timo |last3=Gutsch |first3=Julian |last4=Pietsch |first4=Malte |date=2021 |title=Semantic Answer Similarity for Evaluating Question Answering Models |url=https://aclanthology.org/2021.mrqa-1.15 |journal=Proceedings of the 3rd Workshop on Machine Reading for Question Answering |language=en |location=Punta Cana, Dominican Republic |publisher=Association for Computational Linguistics |pages=149–157 |doi=10.18653/v1/2021.mrqa-1.15|doi-access=free |arxiv=2108.06130 }} and an approach for multimodal retrieval of texts and tables to enable question answering on tabular data.{{Cite journal |last1=Kostić |first1=Bogdan |last2=Risch |first2=Julian |last3=Möller |first3=Timo |date=2021 |title=Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models |url=https://aclanthology.org/2021.mrqa-1.8 |journal=Proceedings of the 3rd Workshop on Machine Reading for Question Answering |language=en |location=Punta Cana, Dominican Republic |publisher=Association for Computational Linguistics |pages=82–91 |doi=10.18653/v1/2021.mrqa-1.8|doi-access=free |arxiv=2108.04049 }} Haystack contains implementations of all three contributions, enabling the use of the research through the open source framework.

In November 2021, the development of the FARM framework was discontinued and its main features were integrated into the Haystack framework.

In April 2022, the company announced its commercial SaaS offering deepset Cloud.{{Cite web |title=deepset Cloud |url=https://www.deepset.ai/deepset-cloud |access-date=August 31, 2022 |website=deepset}}

As of August 2023, the most popular finetuned language model created by deepset had been downloaded more than 52 million times.{{Cite web |title=deepset/roberta-base-squad2 · Hugging Face |url=https://huggingface.co/deepset/roberta-base-squad2 |access-date=October 12, 2022 |website=huggingface.co}}

Products and applications

Haystack is an open source Python framework for building custom applications with large language models. With its modular building blocks, software developers can implement pipelines to address various search tasks over large document collections, such as document retrieval, semantic search, text generation, question answering, or summarization. It integrates with Hugging Face Transformers, Elasticsearch, OpenSearch, OpenAI, Cohere, Anthropic and others. The framework has a community on Discord, with more than 1,800 members, and on GitHub, where more than 200 people have contributed to its development,{{Cite web |title=Contributors to deepset-ai/haystack |url=https://github.com/deepset-ai/haystack |access-date=August 31, 2022 |website=GitHub|language=en}} as well as a community on Meetup.{{Cite web |title=Open NLP Group |url=https://www.meetup.com/open-nlp-meetup/ |access-date=August 31, 2022 |website=Meetup |language=en}}
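The idea of assembling a pipeline from modular building blocks can be illustrated with a minimal sketch. The code below is not Haystack's API: the names `Pipeline`, `make_keyword_retriever`, and `first_sentence_reader` are hypothetical and chosen only for illustration, and the retrieval and answer extraction are toy word-overlap heuristics rather than language models.

```python
from typing import Callable, Dict, List

# A component takes the current pipeline state and returns an updated state.
Component = Callable[[Dict], Dict]


class Pipeline:
    """Hypothetical minimal pipeline: runs components in the order they were added."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def add(self, component: Component) -> "Pipeline":
        self.components.append(component)
        return self

    def run(self, query: str) -> Dict:
        state: Dict = {"query": query}
        for component in self.components:
            state = component(state)
        return state


def make_keyword_retriever(documents: List[str], top_k: int = 2) -> Component:
    """Toy retriever: rank documents by the number of words shared with the query."""

    def retrieve(state: Dict) -> Dict:
        query_words = set(state["query"].lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc) for doc in documents
        ]
        # Stable sort by score, highest first; drop documents with no overlap.
        ranked = [
            doc
            for score, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)
            if score > 0
        ]
        return {**state, "documents": ranked[:top_k]}

    return retrieve


def first_sentence_reader(state: Dict) -> Dict:
    """Toy reader: return the first sentence of the best-ranked document."""
    docs = state["documents"]
    answer = docs[0].split(".")[0] if docs else None
    return {**state, "answer": answer}


docs = [
    "Haystack is an open source Python framework. It is maintained by deepset.",
    "Berlin is the capital of Germany.",
]
pipeline = Pipeline().add(make_keyword_retriever(docs)).add(first_sentence_reader)
result = pipeline.run("What is Haystack")
print(result["answer"])
```

Because each component only reads and extends a shared state dictionary, a retriever or reader can be swapped for another implementation without changing the rest of the pipeline, which is the property the modular design is meant to provide.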

Thousands of organizations use the framework, including Global 500 enterprises such as Airbus, Intel, Netflix, Apple, and Infineon, as well as Alcatel-Lucent Enterprise, BetterUp, Etalab, Sooth.ai, and Lego.{{Cite news |last=Laughlin |first=Eleni |date=April 28, 2022 |title=deepset Raises $14 Million Series A Led By GV for Advanced NLP Platform |work=Business Wire |url=https://www.businesswire.com/news/home/20220428005187/en/ |access-date=August 31, 2022}}{{Cite web |title=Who uses Haystack |url=https://github.com/deepset-ai/haystack#who-uses-haystack |access-date=August 31, 2022 |website=GitHub|language=en}}

The deepset Cloud platform supports customers in building scalable NLP applications by covering the entire process of prototyping, experimentation, deployment, and monitoring.{{Cite web |title=deepset Cloud |url=https://venturebeat.com/ai/open-source-nlp-company-deepset-nabs-14m-to-power-plain-english-enterprise-search/ |access-date=November 1, 2022 |website=VentureBeat |date=28 April 2022 |language=en}} It is built on Haystack.

FARM was a framework for adapting representation models. One of its core concepts was the adaptive model, which combined a language model with an arbitrary number of prediction heads. FARM supported domain adaptation and finetuning of these models with advanced options such as gradient accumulation, cross-validation, or automatic mixed-precision training. Its main features were integrated into Haystack in November 2021, and its development was discontinued at that time.{{Cite book |chapter=Finding A Needle in a Haystack: Automated Mining of Silent Vulnerability Fixes |chapter-url=https://ieeexplore.ieee.org/document/9678720 |access-date=2023-11-13 |doi=10.1109/ase51524.2021.9678720|s2cid=246081539 |title=2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) |date=2021 |last1=Zhou |first1=Jiayuan |last2=Pacheco |first2=Michael |last3=Wan |first3=Zhiyuan |last4=Xia |first4=Xin |last5=Lo |first5=David |last6=Wang |first6=Yuan |last7=Hassan |first7=Ahmed E. |pages=705–716 |isbn=978-1-6654-0337-5 }}
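The structure of an adaptive model, one shared language model feeding several prediction heads, can be sketched as follows. This is an illustration under stated assumptions rather than FARM's actual API: `ToyLanguageModel`, `AdaptiveModel`, and the two head functions are hypothetical names, and the "language model" is a stand-in that computes two simple text statistics instead of learned representations.

```python
from typing import Callable, Dict, List

Vector = List[float]


class ToyLanguageModel:
    """Stand-in for a pretrained language model: maps text to a small feature vector."""

    def encode(self, text: str) -> Vector:
        words = text.split()
        n_words = len(words)
        avg_len = sum(len(w) for w in words) / n_words if n_words else 0.0
        return [float(n_words), avg_len]


class AdaptiveModel:
    """One shared language model plus any number of named prediction heads."""

    def __init__(
        self,
        language_model: ToyLanguageModel,
        heads: Dict[str, Callable[[Vector], object]],
    ) -> None:
        self.language_model = language_model
        self.heads = heads

    def predict(self, text: str) -> Dict[str, object]:
        # The text is encoded once; every head reuses the same features.
        features = self.language_model.encode(text)
        return {name: head(features) for name, head in self.heads.items()}


def length_classifier(features: Vector) -> str:
    """Toy classification head: label the text by its word count."""
    return "long" if features[0] > 5 else "short"


def avg_word_length(features: Vector) -> float:
    """Toy regression head: report the average word length."""
    return round(features[1], 2)


model = AdaptiveModel(
    ToyLanguageModel(),
    {"length_class": length_classifier, "avg_word_length": avg_word_length},
)
predictions = model.predict("deepset builds NLP tools")
print(predictions)
```

Sharing one encoder across heads means a further task can be added by registering another head, without encoding the input a second time.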

Funding

On August 9, 2023, deepset announced a Series B investment round of $30 million led by Balderton Capital, with participation from existing investors GV, System.One, Lunar Ventures and Harpoon Ventures.{{Cite web |title=Deepset raises $30M to help enterprises unlock the value of LLMs|url=https://venturebeat.com/ai/deepset-raises-30m-to-help-enterprises-unlock-the-value-of-llms/ |access-date=August 22, 2023 |website=VentureBeat |date=9 August 2023 |language=en}}{{Cite web |title=Deepset secures $30M to expand its LLM-focused MLOps offerings |url=https://techcrunch.com/2023/08/09/deepset-secures-30m-to-expand-its-llm-focused-mlops-offerings/ |access-date=August 22, 2023 |website=TechCrunch |date=9 August 2023 |language=en}}{{Cite web |title=Deepset, an AI startup that helps companies build apps with LLMs, just raised $30 million with this 12-slide pitch deck |url=https://www.businessinsider.com/deepset-german-ai-startup-raises-30m-balderton-to-expand-llms-2023-8 |access-date=August 22, 2023 |website=Business Insider |language=en}}{{Cite web |title=Deepset raises $30 million to help the world's biggest companies leverage LLM promise |url=https://www.balderton.com/news/deepset-raises-30-million-to-help-the-worlds-biggest-companies-leverage-llm-promise/ |access-date=August 22, 2023 |website=Balderton |date=9 August 2023 |language=en}} On April 28, 2022, deepset announced a Series A investment round of $14 million led by GV, with the participation of Harpoon Ventures, Acequia Capital and a group of founders of commercial open source software and machine learning companies, including Alex Ratner (Snorkel AI), Mustafa Suleyman (DeepMind), Spencer Kimball (Cockroach Labs), Jeff Hammerbacher (Cloudera) and Emil Eifrem (Neo4j). An earlier pre-seed investment round of $1.6 million on March 8, 2021, was led by System.One and Lunar Ventures, who also participated in the subsequent Series A round.

References

{{reflist}}