{{Short description|AI Foundation model for tabular data}}
{{Article for deletion/dated|page=TabPFN|timestamp=20250622052743|year=2025|month=June|day=22|substed=yes|help=off}}
{{Infobox software
| name = TabPFN
| developer = Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key & Sauraj Gambhir
| released = {{Start date and age|2023|09|16}}<ref>{{Cite web |title=tabpfn |website=Python Package Index |url=https://pypi.org/project/tabpfn/}}</ref>
| latest release date = {{Start date and age|2025|01|08}}
| operating system = Linux, macOS, Microsoft Windows
| programming language = Python
| genre = Machine learning
| license = Apache License 2.0
| website = {{URL|https://github.com/PriorLabs/TabPFN}}
}}
{{Multiple issues|
{{Promotional|date=June 2025}}
{{AI-generated|date=June 2025}}
}}
'''TabPFN''' ('''Tabular Prior-data Fitted Network''') is a machine learning model that uses a [[Transformer (deep learning architecture)|transformer]] architecture for supervised classification and regression tasks on small to medium-sized tabular datasets, e.g., up to 10,000 samples.{{Cite journal |last1=Hollmann |first1=N. |last2=Müller |first2=S. |last3=Purucker |first3=L. |title=Accurate predictions on small data with a tabular foundation model |journal=Nature |volume=637 |pages=319–326 |year=2025 |issue=8045 |doi=10.1038/s41586-024-08328-6 |pmid=39780007|pmc=11711098 |bibcode=2025Natur.637..319H }}
== Overview ==
TabPFN was first developed in 2022; TabPFN v2 was published in 2025 in ''[[Nature (journal)|Nature]]'' by Hollmann and co-authors. The source code is published on GitHub under a modified Apache License and on PyPI.{{Citation |title=PriorLabs/TabPFN |date=2025-06-22 |url=https://github.com/PriorLabs/TabPFN |access-date=2025-06-23 |publisher=Prior Labs}}
TabPFN v1 was introduced in a 2022 pre-print and presented at ICLR 2023. Prior Labs, founded in 2024, aims to commercialize TabPFN.{{Cite web |last=Kahn |first=Jeremy |date=5 February 2025 |title=AI has struggled to analyze tables and spreadsheets. This German startup thinks its breakthrough is about to change that |url=https://fortune.com/2025/02/05/prior-labs-9-million-euro-preseed-funding-tabular-data-ai/ |website=Fortune}}
TabPFN supports classification, regression and generative tasks, and its TabPFN-TS extension adds time series forecasting.{{Cite web |title=TabPFN Time Series |url=https://github.com/PriorLabs/tabpfn-time-series |website=GitHub}}
== Pre-training ==
TabPFN addresses challenges in modeling tabular data{{Cite journal |last1=Shwartz-Ziv |first1=Ravid |last2=Armon |first2=Amitai |title=Tabular data: Deep learning is not all you need |journal=Information Fusion |volume=81 |pages=84–90 |year=2022 |doi=10.1016/j.inffus.2021.11.011|arxiv=2106.03253 }}{{Cite conference |last1=Grinsztajn |first1=Léo |last2=Oyallon |first2=Edouard |last3=Varoquaux |first3=Gaël |title=Why do tree-based models still outperform deep learning on typical tabular data? |conference=Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22) |year=2022 |pages=507–520 |url=https://dl.acm.org/doi/10.5555/3600270.3600307}} with Prior-Data Fitted Networks,{{Cite conference |last1=Müller |first1=Samuel |year=2022 |title=Transformers can do Bayesian inference |url=https://openreview.net/pdf?id=KSugKcbNf9 |conference=International Conference on Learning Representations (ICLR)}} by using a transformer pre-trained on synthetic tabular datasets.{{Cite conference |last1=Hollmann |first1=Noah |year=2023 |title=TabPFN: A transformer that solves small tabular classification problems in a second |url=https://iclr.cc/virtual/2023/oral/12541 |conference=International Conference on Learning Representations (ICLR)}}{{Cite web |last=McCarter |first=Calvin |date=May 7, 2024 |title=What exactly has TabPFN learned to do? {{!}} ICLR Blogposts 2024 |url=https://iclr-blogposts.github.io/2024/blog/what-exactly-has-tabpfn-learned-to-do/ |access-date=2025-06-22 |website=iclr-blogposts.github.io}}
Pre-training enables TabPFN to process a new dataset in a single forward pass, adapting to the input without retraining. The model's transformer encoder processes features and labels by alternating attention across rows and columns, capturing relationships within the data.{{Cite journal |last=McElfresh |first=Duncan C. |title=The AI tool that can interpret any spreadsheet instantly |journal=Nature |date=8 January 2025 |volume=637 |issue=8045 |pages=274–275 |doi=10.1038/d41586-024-03852-x |pmid=39780000 |bibcode=2025Natur.637..274M }} TabPFN v2, an updated version, handles numerical and categorical features, missing values, and supports tasks such as regression and synthetic data generation.
TabPFN's pre-training exclusively uses synthetically generated datasets, avoiding benchmark contamination and the costs of curating real-world data. TabPFN v2 was pre-trained on approximately 130 million such datasets, each serving as a "meta-datapoint".
The synthetic datasets are primarily drawn from a prior distribution embodying causal reasoning principles, using Structural Causal Models (SCMs) or Bayesian Neural Networks (BNNs). Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures. The process generates diverse datasets that simulate real-world imperfections such as missing values, imbalanced classes and noise. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is then executed in a single neural network forward pass.
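The sampling of one such synthetic dataset can be illustrated with a toy example: a randomly initialised neural network serves as the data-generating mechanism, loosely in the spirit of the BNN-based prior described above. All sizes, activation choices and noise rates below are illustrative, not the values used by TabPFN.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_rows=128, n_features=5, hidden=16):
    """Draw one toy 'meta-datapoint': a labelled dataset produced by a
    randomly initialised network (illustrative sketch, not TabPFN's prior)."""
    # Random network weights act as the latent data-generating mechanism.
    W1 = rng.normal(size=(n_features, hidden))
    W2 = rng.normal(size=(hidden, 1))
    X = rng.normal(size=(n_rows, n_features))
    logits = np.tanh(X @ W1) @ W2
    # Threshold at the median to obtain roughly balanced binary labels.
    y = (logits.ravel() > np.median(logits)).astype(int)
    # Simulate a real-world imperfection: inject missing values.
    mask = rng.random(X.shape) < 0.05
    X[mask] = np.nan
    return X, y

X, y = sample_synthetic_dataset()
print(X.shape, y.mean())
```

During pre-training, millions of such datasets are sampled, and the model is trained to predict held-out labels of each dataset from the remaining labelled rows.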
Because TabPFN is pre-trained, it does not require the costly per-dataset hyperparameter optimization that other deep learning methods typically involve.
== Applications ==
TabPFN has been investigated in domains such as time series forecasting, chemoproteomics,{{cite journal |last1=Offensperger |first1=Fabian |last2=Tin |first2=Gary |last3=Duran-Frigola |first3=Miquel |last4=Hahn |first4=Elisa |last5=Dobner |first5=Sarah |last6=Ende |first6=Christopher W. am |last7=Strohbach |first7=Joseph W. |last8=Rukavina |first8=Andrea |last9=Brennsteiner |first9=Vincenth |last10=Ogilvie |first10=Kevin |last11=Marella |first11=Nara |last12=Kladnik |first12=Katharina |last13=Ciuffa |first13=Rodolfo |last14=Majmudar |first14=Jaimeen D. |last15=Field |first15=S. Denise |last16=Bensimon |first16=Ariel |last17=Ferrari |first17=Luca |last18=Ferrada |first18=Evandro |last19=Ng |first19=Amanda |last20=Zhang |first20=Zhechun |last21=Degliesposti |first21=Gianluca |last22=Boeszoermenyi |first22=Andras |last23=Martens |first23=Sascha |last24=Stanton |first24=Robert |last25=Müller |first25=André C. |last26=Hannich |first26=J. Thomas |last27=Hepworth |first27=David |last28=Superti-Furga |first28=Giulio |last29=Kubicek |first29=Stefan |last30=Schenone |first30=Monica |last31=Winter |first31=Georg E. |title=Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells |journal=Science |date=26 April 2024 |volume=384 |issue=6694 |pages=eadk5864 |doi=10.1126/science.adk5864 |pmid=38662832 |bibcode=2024Sci...384k5864O }} insurance risk classification,{{cite book |doi=10.1109/GECOST60902.2024.10475046 |chapter=Deep Learning for Cross-Selling Health Insurance Classification |title=2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST) |date=2024 |last1=Chu |first1=Jasmin Z. K. |last2=Than |first2=Joel C. M. |last3=Jo |first3=Hudyjaya Siswoyo |pages=453–457 |isbn=979-8-3503-5790-5 }} medical diagnostics,{{cite journal |last1=Alzakari |first1=Sarah A.
|last2=Aldrees |first2=Asma |last3=Umer |first3=Muhammad |last4=Cascone |first4=Lucia |last5=Innab |first5=Nisreen |last6=Ashraf |first6=Imran |title=Artificial intelligence-driven predictive framework for early detection of still birth |journal=SLAS Technology |date=December 2024 |volume=29 |issue=6 |pages=100203 |doi=10.1016/j.slast.2024.100203 |pmid=39424101 }}{{cite journal |last1=El-Melegy |first1=Moumen |last2=Mamdouh |first2=Ahmed |last3=Ali |first3=Samia |last4=Badawy |first4=Mohamed |last5=El-Ghar |first5=Mohamed Abou |last6=Alghamdi |first6=Norah Saleh |last7=El-Baz |first7=Ayman |title=Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning |journal=Bioengineering |date=21 June 2024 |volume=11 |issue=7 |pages=635 |doi=10.3390/bioengineering11070635 |doi-access=free |pmid=39061717 |pmc=11274351 }}{{cite journal |last1=Karabacak |first1=Mert |last2=Schupper |first2=Alexander |last3=Carr |first3=Matthew |last4=Margetis |first4=Konstantinos |title=A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy |journal=Asian Spine Journal |date=August 2024 |volume=18 |issue=4 |pages=541–549 |doi=10.31616/asj.2024.0048 |pmid=39113482 |pmc=11366553 }}{{cite journal |last1=Liu |first1=Yanqing |last2=Su |first2=Zhenyi |last3=Tavana |first3=Omid |last4=Gu |first4=Wei |title=Understanding the complexity of p53 in a new era of tumor suppression |journal=Cancer Cell |date=June 2024 |volume=42 |issue=6 |pages=946–967 |doi=10.1016/j.ccell.2024.04.009 |pmid=38729160 |pmc=11190820 }} metagenomics,{{cite conference |last1=Perciballi |first1=Giulia |last2=Granese |first2=Federica |last3=Fall |first3=Ahmad |last4=Zehraoui |first4=Farida |last5=Prifti |first5=Edi |last6=Zucker |first6=Jean-Daniel |title=Adapting TabPFN for Zero-Inflated Metagenomic Data |date=10 October 2024 |url=https://openreview.net/forum?id=3I0bVvUj25 |conference=Table Representation Learning Workshop at 
NeurIPS 2024 }} wildfire propagation modeling,{{cite journal |last1=Khanmohammadi |first1=Sadegh |last2=Cruz |first2=Miguel G. |last3=Perrakis |first3=Daniel D.B. |last4=Alexander |first4=Martin E. |last5=Arashpour |first5=Mehrdad |title=Using AutoML and generative AI to predict the type of wildfire propagation in Canadian conifer forests |journal=Ecological Informatics |date=September 2024 |volume=82 |pages=102711 |doi=10.1016/j.ecoinf.2024.102711 }} and others.
== References ==
{{reflist}}