List of datasets in computer vision and image processing

{{short description|none}}

{{machine learning bar|Related articles}}

This is a list of datasets for machine learning research in computer vision and image processing; it is part of the broader list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.
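Many of the image-classification datasets listed below, such as MNIST and CIFAR-10, are also redistributed through third-party libraries. The following sketch (which assumes the external torchvision package rather than any dataset's own download tooling) illustrates how such datasets are commonly loaded as (image, label) pairs:

<syntaxhighlight lang="python">
# Sketch only: loading two of the datasets listed below via the third-party
# torchvision package (an assumption; each dataset also has its own distribution page).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # convert PIL images to float tensors in [0, 1]

# root is an arbitrary local directory; download=True fetches the data if missing.
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

image, label = mnist[0]       # a single (image, label) pair
print(image.shape, label)     # torch.Size([1, 28, 28]) and an integer class in 0-9
</syntaxhighlight>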

== Object detection and recognition ==

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" | Dataset Name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

|-
|MNIST

|Database of grayscale handwritten digits.

|

|60,000

|image, label

|classification

|1994

|{{Cite book |last1=Bottou |first1=L. |last2=Cortes |first2=C. |last3=Denker |first3=J.S. |last4=Drucker |first4=H. |last5=Guyon |first5=I. |last6=Jackel |first6=L.D. |last7=LeCun |first7=Y. |last8=Muller |first8=U.A. |last9=Sackinger |first9=E. |last10=Simard |first10=P. |last11=Vapnik |first11=V. |chapter=Comparison of classifier methods: A case study in handwritten digit recognition |date=1994 |title=Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5) |chapter-url=https://ieeexplore.ieee.org/document/576879 |publisher=IEEE Comput. Soc. Press |volume=2 |pages=77–82 |doi=10.1109/ICPR.1994.576879 |isbn=978-0-8186-6270-6}}

|LeCun et al.

|-
|Extended MNIST

|Database of grayscale handwritten digits and letters.

|

|810,000

|image, label

|classification

|2010

|{{Cite journal |date=2010-08-27 |title=NIST Special Database 19 |url=https://www.nist.gov/srd/nist-special-database-19 |journal=NIST |language=en}}

|NIST

|-
|NYU Object Recognition Benchmark (NORB)

|Stereoscopic pairs of photos of toys in various orientations.

|Centering, perturbation.

|97,200 image pairs (50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions)

|Images

|Object recognition

|2004

|{{Cite web |last=LeCun |first=Yann |title=NORB: Generic Object Recognition in Images |url=https://cs.nyu.edu/~yann/research/norb/ |access-date=2025-04-26 |website=cs.nyu.edu}}{{Cite book |last1=LeCun |first1=Y. |last2=Fu Jie Huang |last3=Bottou |first3=L. |chapter=Learning methods for generic object recognition with invariance to pose and lighting |date=2004 |title=Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004 |chapter-url=https://ieeexplore.ieee.org/document/1315150 |publisher=IEEE |volume=2 |pages=97–104 |doi=10.1109/CVPR.2004.1315150 |isbn=978-0-7695-2158-9}}

|LeCun et al.

|-
|80 Million Tiny Images

|80 million 32×32 images labelled with 75,062 non-abstract nouns.

|

|80,000,000

|image, label

|

|2008

|{{Cite journal |last1=Torralba |first1=A. |last2=Fergus |first2=R. |last3=Freeman |first3=W.T. |date=November 2008 |title=80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition |url=https://ieeexplore.ieee.org/document/4531741 |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=30 |issue=11 |pages=1958–1970 |doi=10.1109/TPAMI.2008.128 |pmid=18787244 |issn=0162-8828}}

|Torralba et al.

|-
|Street View House Numbers (SVHN)

|630,420 digits with bounding boxes in house numbers captured in Google Street View.

|

|630,420

|image, label, bounding boxes

|

|2011

|{{Cite web |title=The Street View House Numbers (SVHN) Dataset |url=http://ufldl.stanford.edu/housenumbers/ |access-date=2025-02-25 |website=ufldl.stanford.edu}}Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. "[http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf Reading Digits in Natural Images with Unsupervised Feature Learning]" NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011

|Netzer et al.

|-
|JFT-300M

|Dataset internal to Google Research. 303M images with 375M labels in 18,291 categories

|

|303,000,000

|image, label

|

|2017

|{{cite arXiv |last1=Hinton |first1=Geoffrey |title=Distilling the Knowledge in a Neural Network |date=2015-03-09 |eprint=1503.02531 |last2=Vinyals |first2=Oriol |last3=Dean |first3=Jeff|class=stat.ML }}{{Cite arXiv |last1=Sun |first1=Chen |last2=Shrivastava |first2=Abhinav |last3=Singh |first3=Saurabh |last4=Gupta |first4=Abhinav |date=2017 |title=Revisiting Unreasonable Effectiveness of Data in Deep Learning Era |pages=843–852|class=cs.CV |eprint=1707.02968 }}{{cite arXiv |last1=Abnar |first1=Samira |title=Exploring the Limits of Large Scale Pre-training |date=2021-10-05 |eprint=2110.02095 |last2=Dehghani |first2=Mostafa |last3=Neyshabur |first3=Behnam |last4=Sedghi |first4=Hanie|class=cs.LG }}

|Google Research

|-
|JFT-3B

|Internal to Google Research. 3 billion images, annotated with ~30k categories in a hierarchy.

|

|3,000,000,000

|image, label

|

|2021

|{{cite arXiv |last1=Zhai |first1=Xiaohua |title=Scaling Vision Transformers |date=2021-06-08 |eprint=2106.04560 |last2=Kolesnikov |first2=Alexander |last3=Houlsby |first3=Neil |last4=Beyer |first4=Lucas|class=cs.CV }}

|Google Research

|-
|[http://places2.csail.mit.edu/ Places]

|10+ million images in 400+ scene classes, with 5000 to 30,000 images per class.

|

|10,000,000

|image, label

|

|2018

|{{Cite journal |last1=Zhou |first1=Bolei |last2=Lapedriza |first2=Agata |last3=Khosla |first3=Aditya |last4=Oliva |first4=Aude |last5=Torralba |first5=Antonio |date=2018-06-01 |title=Places: A 10 Million Image Database for Scene Recognition |url=https://ieeexplore.ieee.org/document/7968387 |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=40 |issue=6 |pages=1452–1464 |doi=10.1109/TPAMI.2017.2723009 |pmid=28692961 |issn=0162-8828}}

|Zhou et al.

|-
|Ego 4D

|A massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video.

|Object bounding boxes, transcriptions, labeling.

|3,670 video hours

|video, audio, transcriptions

|Multimodal first-person task

|2022

|{{cite arXiv|last1=Grauman |first1=Kristen |last2=Westbury |first2=Andrew |last3=Byrne |first3=Eugene |last4=Chavis |first4=Zachary |last5=Furnari |first5=Antonino |last6=Girdhar |first6=Rohit |last7=Hamburger |first7=Jackson |last8=Jiang |first8=Hao |last9=Liu |first9=Miao |last10=Liu |first10=Xingyu |last11=Martin |first11=Miguel |last12=Nagarajan |first12=Tushar |last13=Radosavovic |first13=Ilija |last14=Ramakrishnan |first14=Santhosh Kumar |last15=Ryan |first15=Fiona |last16=Sharma |first16=Jayant |last17=Wray |first17=Michael |last18=Xu |first18=Mengmeng |last19=Xu |first19=Eric Zhongcong |last20=Zhao |first20=Chen |last21=Bansal |first21=Siddhant |last22=Batra |first22=Dhruv |last23=Cartillier |first23=Vincent |last24=Crane |first24=Sean |last25=Do |first25=Tien |last26=Doulaty |first26=Morrie |last27=Erapalli |first27=Akshay |last28=Feichtenhofer |first28=Christoph |last29=Fragomeni |first29=Adriano |last30=Fu |first30=Qichen |last31=Gebreselasie |first31=Abrham |last32=Gonzalez |first32=Cristina |last33=Hillis |first33=James |last34=Huang |first34=Xuhua |last35=Huang |first35=Yifei |last36=Jia |first36=Wenqi |last37=Khoo |first37=Weslie |last38=Kolar |first38=Jachym |last39=Kottur |first39=Satwik |last40=Kumar |first40=Anurag |last41=Landini |first41=Federico |last42=Li |first42=Chao |last43=Li |first43=Yanghao |last44=Li |first44=Zhenqiang |last45=Mangalam |first45=Karttikeya |last46=Modhugu |first46=Raghava |last47=Munro |first47=Jonathan |last48=Murrell |first48=Tullie |last49=Nishiyasu |first49=Takumi |last50=Price |first50=Will |last51=Puentes |first51=Paola Ruiz |last52=Ramazanova |first52=Merey |last53=Sari |first53=Leda |last54=Somasundaram |first54=Kiran |last55=Southerland |first55=Audrey |last56=Sugano |first56=Yusuke |last57=Tao |first57=Ruijie |last58=Vo |first58=Minh |last59=Wang |first59=Yuchen |last60=Wu |first60=Xindi |last61=Yagi |first61=Takuma |last62=Zhao |first62=Ziwei |last63=Zhu |first63=Yunyi |last64=Arbelaez |first64=Pablo |last65=Crandall |first65=David |last66=Damen |first66=Dima |last67=Farinella |first67=Giovanni Maria |last68=Fuegen |first68=Christian |last69=Ghanem |first69=Bernard |last70=Ithapu |first70=Vamsi Krishna |last71=Jawahar |first71=C. V. |last72=Joo |first72=Hanbyul |last73=Kitani |first73=Kris |last74=Li |first74=Haizhou |last75=Newcombe |first75=Richard |last76=Oliva |first76=Aude |last77=Park |first77=Hyun Soo |last78=Rehg |first78=James M. |last79=Sato |first79=Yoichi |last80=Shi |first80=Jianbo |last81=Shou |first81=Mike Zheng |last82=Torralba |first82=Antonio |last83=Torresani |first83=Lorenzo |last84=Yan |first84=Mingfei |last85=Malik |first85=Jitendra |title=Ego4D: Around the World in 3,000 Hours of Egocentric Video |date=2022 |class=cs.CV |eprint=2110.07058}}

|K. Grauman et al.

|-
|Wikipedia-based Image Text Dataset

|37.5 million image-text examples with 11.5 million unique images across 108 Wikipedia languages.

|

|11,500,000

|image, caption

|Pretraining, image captioning

|2021

|{{Cite book |last1=Srinivasan |first1=Krishna |last2=Raman |first2=Karthik |last3=Chen |first3=Jiecao |last4=Bendersky |first4=Michael |last5=Najork |first5=Marc |chapter=WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning |date=2021-07-11 |title=Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval |chapter-url=https://dl.acm.org/doi/10.1145/3404835.3463257 |language=en |publisher=ACM |pages=2443–2449 |doi=10.1145/3404835.3463257 |isbn=978-1-4503-8037-9|arxiv=2103.01913 }}

|Srinivasan et al., Google Research

|-
|Visual Genome

|Images and their descriptions

|

|108,000

|images, text

|Image captioning

|2016

|{{Cite journal|doi=10.1007/s11263-016-0981-7|title=Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations|journal=International Journal of Computer Vision|volume=123|pages=32–73|year=2017|last1=Krishna|first1=Ranjay|last2=Zhu|first2=Yuke|last3=Groth|first3=Oliver|last4=Johnson|first4=Justin|last5=Hata|first5=Kenji|last6=Kravitz|first6=Joshua|last7=Chen|first7=Stephanie|last8=Kalantidis|first8=Yannis|last9=Li|first9=Li-Jia|last10=Shamma|first10=David A|last11=Bernstein|first11=Michael S|last12=Fei-Fei|first12=Li|arxiv=1602.07332|s2cid=4492210}}

|R. Krishna et al.

|-
|Berkeley 3-D Object Dataset

|849 images taken in 75 different scenes. About 50 different object classes are labeled.

|Object bounding boxes and labeling.

|849

|labeled images, text

|Object recognition

|2014

|Karayev, S., et al. "[http://alliejanoch.com/iccvw2011.pdf A category-level 3-D object dataset: putting the Kinect to work]." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.Tighe, Joseph, and Svetlana Lazebnik. "[http://152.2.128.56/~jtighe/Papers/ECCV10/eccv10-jtighe.pdf Superparsing: scalable nonparametric image parsing with superpixels] {{Webarchive|url=https://web.archive.org/web/20190806022752/http://152.2.128.56/~jtighe/Papers/ECCV10/eccv10-jtighe.pdf |date=6 August 2019 }}." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.

|A. Janoch et al.

|-
|Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500)

|500 natural images, explicitly separated into disjoint train, validation and test subsets + benchmarking code. Based on BSDS300.

|Each image segmented by five different subjects on average.

|500

|Segmented images

|Contour detection and hierarchical image segmentation

|2011

|{{cite journal|last1=Arbelaez|first1=P.|last2=Maire|first2=M|last3=Fowlkes|first3=C|last4=Malik|first4=J|title=Contour Detection and Hierarchical Image Segmentation|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |date=May 2011|volume=33|issue=5|pages=898–916|url=http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_pami2010.pdf|access-date=27 February 2016|doi=10.1109/tpami.2010.161|pmid=20733228|s2cid=206764694}}

|University of California, Berkeley

|-
|{{anchor|COCO}}Microsoft Common Objects in Context (MS COCO)

|Complex everyday scenes of common objects in their natural context.

|Object highlighting, labeling, and classification into 91 object types.

|2,500,000

|Labeled images, text

|Object recognition, image segmentation, keypointing, image captioning

|2015

|{{cite arXiv | eprint=1405.0312 | last1=Lin | first1=Tsung-Yi | last2=Maire | first2=Michael | last3=Belongie | first3=Serge | last4=Bourdev | first4=Lubomir | last5=Girshick | first5=Ross | last6=Hays | first6=James | last7=Perona | first7=Pietro | last8=Ramanan | first8=Deva | last9=Lawrence Zitnick | first9=C. | last10=Dollár | first10=Piotr | title=Microsoft COCO: Common Objects in Context | year=2014 | class=cs.CV }}{{cite journal | last1 = Russakovsky | first1 = Olga | display-authors = et al | year = 2015 | title = Imagenet large scale visual recognition challenge | journal = International Journal of Computer Vision | volume = 115 | issue = 3| pages = 211–252 | doi=10.1007/s11263-015-0816-y| arxiv = 1409.0575 | hdl = 1721.1/104944 | s2cid = 2930547 }}{{cite web|url=https://cocodataset.org/|title=COCO – Common Objects in Context|website=cocodataset.org}}

|T. Lin et al.

|-
|ImageNet

|Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge

|Labeled objects, bounding boxes, descriptive words, SIFT features

|14,197,122

|Images, text

|Object recognition, scene recognition

|2009 (2014)

|Deng, Jia, et al. "[https://www.researchgate.net/profile/Li_Jia_Li/publication/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Database/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-Hierarchical-Image-Database.pdf Imagenet: A large-scale hierarchical image database]."Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.{{cite journal|last1=Russakovsky|first1=Olga|last2=Deng|first2=Jia|last3=Su|first3=Hao|last4=Krause|first4=Jonathan|last5=Satheesh|first5=Sanjeev|last6=Ma|first6=Sean|last7=Huang|first7=Zhiheng|last8=Karpathy|first8=Andrej|last9=Khosla|first9=Aditya|last10=Bernstein|first10=Michael|last11=Berg|first11=Alexander C.|last12=Fei-Fei|first12=Li|display-authors=5|title=ImageNet Large Scale Visual Recognition Challenge|journal=International Journal of Computer Vision|date=11 April 2015|volume=115|issue=3|pages=211–252|doi=10.1007/s11263-015-0816-y|arxiv=1409.0575|hdl=1721.1/104944|s2cid=2930547}}

|J. Deng et al.

|-
|SUN (Scene UNderstanding)

|Very large scene and object recognition database.

|Places and objects are labeled. Objects are segmented.

|131,067

|Images, text

|Object recognition, scene recognition

|2014

|{{Cite book |last1=Xiao |first1=Jianxiong |last2=Hays |first2=James |last3=Ehinger |first3=Krista A. |last4=Oliva |first4=Aude |last5=Torralba |first5=Antonio |chapter=SUN database: Large-scale scene recognition from abbey to zoo |date=June 2010 |title=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |chapter-url=https://ieeexplore.ieee.org/document/5539970 |publisher=IEEE |pages=3485–3492 |doi=10.1109/cvpr.2010.5539970|hdl=1721.1/60690 |isbn=978-1-4244-6984-0 |hdl-access=free }}{{cite arXiv |eprint=1310.1531 |last1=Donahue |first1=Jeff |title=DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition |last2=Jia |first2=Yangqing |last3=Vinyals |first3=Oriol |last4=Hoffman |first4=Judy |last5=Zhang |first5=Ning |last6=Tzeng |first6=Eric |last7=Darrell |first7=Trevor |class=cs.CV |year=2013}}

|J. Xiao et al.

|-
|LSUN (Large SUN)

|10 scene categories (bedroom, etc.) and 20 object categories (airplane, etc.)

|Images and labels.

|~60 million

|Images, text

|Object recognition, scene recognition

|2015

|{{cite arXiv |last1=Yu |first1=Fisher |title=LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop |date=2016-06-04 |eprint=1506.03365 |last2=Seff |first2=Ari |last3=Zhang |first3=Yinda |last4=Song |first4=Shuran |last5=Funkhouser |first5=Thomas |last6=Xiao |first6=Jianxiong|class=cs.CV }}{{Cite web |title=Index of /lsun/ |url=http://dl.yf.io/lsun/ |access-date=2024-09-19 |website=dl.yf.io}}{{Cite web |title=LSUN |url=https://complexity.cecs.ucf.edu/lsun/ |access-date=2024-09-19 |website=Complex Adaptive Systems Laboratory |language=en-US}}

|Yu et al.

|-
|LVIS (Large Vocabulary Instance Segmentation)

|Segmentation masks for over 1,000 entry-level object categories in images

|

|2.2 million segmentations, 164K images

|Images, segmentation masks.

|Instance segmentation

|2019

|{{Cite journal |last1=Gupta |first1=Agrim |last2=Dollar |first2=Piotr |last3=Girshick |first3=Ross |date=2019 |title=LVIS: A Dataset for Large Vocabulary Instance Segmentation |url=https://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html |pages=5356–5364}}

|A. Gupta et al.

|-
|Open Images

|A large set of images listed as having a CC BY 2.0 license, with image-level labels and bounding boxes spanning thousands of classes.

|Image-level labels, Bounding boxes

|9,178,275

|Images, text

|Classification, Object recognition

|2017 (V7: 2022)

|Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages."

|I. Krasin et al.

|-
|TV News Channel Commercial Detection Dataset

|TV commercials and news broadcasts.

|Audio and video features extracted from still images.

|129,685

|Text

|Clustering, classification

|2015

|Vyas, Apoorv, et al. "[https://dl.acm.org/citation.cfm?id=2683546 Commercial Block Detection in Broadcast News Videos]." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.Hauptmann, Alexander G., and Michael J. Witbrock. "[https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f84893bfab1137f0b2.pdf Story segmentation and detection of commercials in broadcast news video]." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.

|P. Guha et al.

|-
|Statlog (Image Segmentation) Dataset

|The instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel.

|Many features calculated.

|2310

|Text

|Classification

|1990

|Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "[https://www.researchgate.net/profile/Anthony_Tung/publication/221214229_CURLER_Finding_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a08aed621de05cd92.pdf Curler: finding and visualizing nonlinear correlation clusters]." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.

|University of Massachusetts

|-
|Caltech 101

|Pictures of objects.

|Detailed object outlines marked.

|9146

|Images

|Classification, object recognition

|2003

|Jarrett, Kevin, et al. "[https://ieeexplore.ieee.org/abstract/document/5459469/ What is the best multi-stage architecture for object recognition?]." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "[https://hal.inria.fr/inria-00548585/document Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories]."Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.

|F. Li et al.

|-
|Caltech-256

|Large dataset of images for object classification.

|Images categorized and hand-sorted.

|30,607

|Images, Text

|Classification, object detection

|2007

|Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007. Available: http://authors.library.caltech.edu/7694, 2007.Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.

|G. Griffin et al.

|-
|COYO-700M

|Image–text-pair dataset

|10 billion pairs of alt-text and image sources in HTML documents in CommonCrawl

|746,972,269

|Images, Text

|Classification, Image-Language

|2022

|{{cite web |title=🐺 COYO-700M: Image-Text Pair Dataset |date=2022-11-03 |url=https://github.com/kakaobrain/coyo-dataset |publisher=Kakao Brain |access-date=2022-11-03}}

|Kakao Brain

|-
|SIFT10M Dataset

|SIFT features of Caltech-256 dataset.

|Extensive SIFT feature extraction.

|11,164,866

|Text

|Classification, object detection

|2016

|Fu, Xiping, et al. "[https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc21f22d0085.pdf NOKMeans: Non-Orthogonal K-means Hashing]." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177.

|X. Fu et al.

|-
|LabelMe

|Annotated pictures of scenes.

|Objects outlined.

|187,240

|Images, text

|Classification, object detection

|2005

|{{cite journal | last1 = Heitz | first1 = Geremy | display-authors = et al | year = 2009 | title = Shape-based object localization for descriptive classification | journal = International Journal of Computer Vision | volume = 84 | issue = 1| pages = 40–62 | doi=10.1007/s11263-009-0228-y| citeseerx = 10.1.1.142.280 | s2cid = 646320 }}

|MIT Computer Science and Artificial Intelligence Laboratory

|-
|PASCAL VOC Dataset

|Images in 20 categories and localization bounding boxes.

|Labeling, bounding box included

|500,000

|Images, text

|Classification, object detection

|2010

|{{cite journal | last1 = Everingham | first1 = Mark | display-authors = et al | year = 2010 | title = The pascal visual object classes (voc) challenge | url = https://www.research.ed.ac.uk/portal/en/publications/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-442b-ab2d-284210cf72d6).html| journal = International Journal of Computer Vision | volume = 88 | issue = 2| pages = 303–338 | doi=10.1007/s11263-009-0275-4| hdl = 20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6 | s2cid = 4246903 | hdl-access = free }}{{cite journal | last1 = Felzenszwalb | first1 = Pedro F. | display-authors = et al | year = 2010 | title = Object detection with discriminatively trained part-based models | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | volume = 32 | issue = 9| pages = 1627–1645 | doi=10.1109/tpami.2009.167| pmid = 20634557 | citeseerx = 10.1.1.153.2745 | s2cid = 3198903 }}

|M. Everingham et al.

|-
|CIFAR-10 Dataset

|Many small, low-resolution images of 10 classes of objects.

|Classes labelled, training set splits created.

|60,000

|Images

|Classification

|2009

|Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "[http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf Imagenet classification with deep convolutional neural networks]." Advances in neural information processing systems. 2012.Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

|A. Krizhevsky et al.

|-
|CIFAR-100 Dataset

|Like CIFAR-10, above, but 100 classes of objects are given.

|Classes labelled, training set splits created.

|60,000

|Images

|Classification

|2009

|

|A. Krizhevsky et al.

|-
|CINIC-10 Dataset

|A dataset combining CIFAR-10 with downsampled ImageNet images in the same 10 classes, with 3 equal splits. Larger than CIFAR-10.

|Classes labelled, training, validation, test set splits created.

|270,000

|Images

|Classification

|2018

|{{cite web|title=CINIC-10 dataset|url=http://www.bayeswatch.com/2018/10/09/CINIC/|website=Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10|access-date=2018-11-13|date=2018-10-09}}

|Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey

|-
|Fashion-MNIST

|A MNIST-like fashion product database

|Classes labelled, training set splits created.

|60,000

|Images

|Classification

|2017

|{{cite web|title=fashion-mnist: A MNIST-like fashion product database. Benchmark :point_right|date=2017-10-07|url=https://github.com/zalandoresearch/fashion-mnist|publisher=Zalando Research|access-date=2017-10-07}}

|Zalando SE

|-
|notMNIST

|Some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A–J taken from different fonts.

|Classes labelled, training set splits created.

|500,000

|Images

|Classification

|2011

|{{cite web|title=notMNIST dataset|url=http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html|website=Machine Learning, etc|access-date=2017-10-13|date=2011-09-08}}

|Yaroslav Bulatov

|-
|Linnaeus 5 dataset

|Images of 5 classes of objects.

|Classes labelled, training set splits created.

|8000

|Images

|Classification

|2017

|Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/

|Chaladze & Kalatozishvili

|-
|11K Hands

|11,076 hand images (1600 × 1200 pixels) of 190 subjects aged 18–75, for gender recognition and biometric identification.

|None

|11,076 hand images

|Images and (.mat, .txt, and .csv) label files

|Gender recognition and biometric identification

|2017

|{{cite arXiv|last=Afifi|first=Mahmoud|date=2017-11-12|title=Gender recognition and biometric identification using a large dataset of hand images|eprint=1711.04322|class=cs.CV}}

|M Afifi

|-
|CORe50

|Designed for continuous/lifelong learning and object recognition; a collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 categories.

|Classes labelled, training set splits created based on a 3-way, multi-runs benchmark.

|164,866 RGB-D images

|images (.png or .pkl)

and (.pkl, .txt, .tsv) label files

|Classification, Object recognition

|2017

|{{Cite arXiv|last1=Lomonaco|first1=Vincenzo|last2=Maltoni|first2=Davide|date=2017-10-18|title=CORe50: a New Dataset and Benchmark for Continuous Object Recognition|eprint=1705.03550|class=cs.CV}}

|V. Lomonaco and D. Maltoni

|-
|OpenLORIS-Object

|Lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors. The first version includes 121 object instances (40 categories of daily necessities) under 20 scenes. Four environmental factors (illumination, occlusion, object pixel size, and clutter) are varied across scenes, with explicitly defined difficulty levels for each factor.

|Classes labelled, training/validation/testing set splits created by benchmark scripts.

|1,106,424 RGB-D images

|images (.png and .pkl)

and (.pkl) label files

|Classification, Lifelong object recognition, Robotic Vision

|2019

|{{Cite arXiv|last1=She|first1=Qi|last2=Feng|first2=Fan|last3=Hao|first3=Xinyue|last4=Yang|first4=Qihan|last5=Lan|first5=Chuanlin|last6=Lomonaco|first6=Vincenzo|last7=Shi|first7=Xuesong|last8=Wang|first8=Zhengwei|last9=Guo|first9=Yao|last10=Zhang|first10=Yimin|last11=Qiao|first11=Fei|last12=Chan|first12=Rosa H.M.|date=2019-11-15|title=OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning|eprint=1911.06487v2|class=cs.CV}}

|Q. She et al.

|-
|THz and thermal video data set

|This multispectral data set includes terahertz, thermal, visual, near infrared, and three-dimensional videos of objects hidden under people's clothes.

|images and 3D point clouds

|More than 20 videos. The duration of each video is about 85 seconds (about 345 frames).

|AP2J

|Experiments with hidden object detection

|2019

|{{cite web|url=http://www.fullvision.ru/monitoring/description_eng.php|last1=Morozov|first1=Alexei|last2=Sushkova|first2=Olga|date=2019-06-13|title=THz and thermal video data set|publisher=IRE RAS|website=Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance|access-date=2019-07-19|location=Moscow}}{{cite journal |last1=Morozov|first1=Alexei|last2=Sushkova|first2=Olga|last3=Kershner|first3=Ivan|last4=Polupanov|first4=Alexander|date=2019-07-09|title=Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images|url=http://ceur-ws.org/Vol-2391/paper19.pdf|journal=CEUR|volume=2391|pages=paper19|access-date=2019-07-19}}

|Alexei A. Morozov and Olga S. Sushkova

|}

=== 3D Objects ===

See (Calli et al., 2015){{Cite journal |last1=Calli |first1=Berk |last2=Walsman |first2=Aaron |last3=Singh |first3=Arjun |last4=Srinivasa |first4=Siddhartha |last5=Abbeel |first5=Pieter |last6=Dollar |first6=Aaron M. |date=September 2015 |title=Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set |url=https://ieeexplore.ieee.org/document/7254318 |journal=IEEE Robotics & Automation Magazine |volume=22 |issue=3 |pages=36–52 |doi=10.1109/MRA.2015.2448951 |issn=1070-9932|arxiv=1502.03143 }} for a review of 33 datasets of 3D objects as of 2015. See (Downs et al., 2022){{Cite book |last1=Downs |first1=Laura |last2=Francis |first2=Anthony |last3=Koenig |first3=Nate |last4=Kinman |first4=Brandon |last5=Hickman |first5=Ryan |last6=Reymann |first6=Krista |last7=McHugh |first7=Thomas B. |last8=Vanhoucke |first8=Vincent |chapter=Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items |date=2022-05-23 |title=2022 International Conference on Robotics and Automation (ICRA) |chapter-url=https://ieeexplore.ieee.org/document/9811809 |publisher=IEEE |pages=2553–2560 |doi=10.1109/ICRA46639.2022.9811809 |isbn=978-1-7281-9681-7|arxiv=2204.11918 }} for a review of more datasets as of 2022.

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" | Dataset Name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

|-
|Princeton Shape Benchmark

|3D polygonal models collected from the Internet

|

|1814 models in 92 categories

|3D polygonal models, categories

|shape-based retrieval and analysis

|2004

|{{Cite web |title=Princeton Shape Benchmark |url=https://shape.cs.princeton.edu/benchmark/main.html |access-date=2025-03-07 |website=shape.cs.princeton.edu}}{{Cite book |last1=Shilane |first1=P. |last2=Min |first2=P. |last3=Kazhdan |first3=M. |last4=Funkhouser |first4=T. |chapter=The princeton shape benchmark |date=2004 |title=Proceedings Shape Modeling Applications, 2004 |chapter-url=https://ieeexplore.ieee.org/document/1314504 |publisher=IEEE |pages=167–388 |doi=10.1109/SMI.2004.1314504 |isbn=978-0-7695-2075-9}}

|Shilane et al.

|-
|Berkeley 3-D Object Dataset (B3DO)

|Depth and color images collected from crowdsourced Microsoft Kinect users. Annotated in 50 object categories.

|

|849 images, in 75 scenes

|color image, depth image, object class, bounding boxes, 3D center points

|Predict bounding boxes

|2011, updated 2014

|{{Citation |last1=Janoch |first1=Allison |title=A Category-Level 3D Object Dataset: Putting the Kinect to Work |date=2013 |work=Consumer Depth Cameras for Computer Vision: Research Topics and Applications |pages=141–165 |editor-last=Fossati |editor-first=Andrea |url=https://link.springer.com/chapter/10.1007/978-1-4471-4640-7_8 |access-date=2025-03-07 |place=London |publisher=Springer |language=en |doi=10.1007/978-1-4471-4640-7_8 |isbn=978-1-4471-4640-7 |last2=Karayev |first2=Sergey |last3=Jia |first3=Yangqing |last4=Barron |first4=Jonathan T. |last5=Fritz |first5=Mario |last6=Saenko |first6=Kate |last7=Darrell |first7=Trevor |editor2-last=Gall |editor2-first=Juergen |editor3-last=Grabner |editor3-first=Helmut |editor4-last=Ren |editor4-first=Xiaofeng}}

|Janoch et al.

|-
|ShapeNet

|3D models. Some are classified into WordNet synsets, like ImageNet. Partially classified into 3,135 categories.

|

|3,000,000 models, 220,000 of which are classified.

|3D models, class labels

|Predict class label.

|2015

|{{cite arXiv |last1=Chang |first1=Angel X. |title=ShapeNet: An Information-Rich 3D Model Repository |date=2015-12-09 |eprint=1512.03012 |last2=Funkhouser |first2=Thomas |last3=Guibas |first3=Leonidas |last4=Hanrahan |first4=Pat |last5=Huang |first5=Qixing |last6=Li |first6=Zimo |last7=Savarese |first7=Silvio |last8=Savva |first8=Manolis |last9=Song |first9=Shuran|class=cs.GR }}

|Chang et al.

|-
|ObjectNet3D

|Images, 3D shapes, and objects in 100 categories.

|

|90,127 images, 201,888 objects, 44,147 3D shapes

|images, 3D shapes, object bounding boxes, category labels

|recognizing the 3D pose and 3D shape of objects from 2D images

|2016

|{{Cite web |title=Computational Vision and Geometry Lab |url=https://cvgl.stanford.edu/projects/objectnet3d/ |access-date=2025-03-07 |website=cvgl.stanford.edu}}{{Cite book |last1=Xiang |first1=Yu |last2=Kim |first2=Wonhui |last3=Chen |first3=Wei |last4=Ji |first4=Jingwei |last5=Choy |first5=Christopher |last6=Su |first6=Hao |last7=Mottaghi |first7=Roozbeh |last8=Guibas |first8=Leonidas |last9=Savarese |first9=Silvio |chapter=ObjectNet3D: A Large Scale Database for 3D Object Recognition |series=Lecture Notes in Computer Science |date=2016 |volume=9912 |editor-last=Leibe |editor-first=Bastian |editor2-last=Matas |editor2-first=Jiri |editor3-last=Sebe |editor3-first=Nicu |editor4-last=Welling |editor4-first=Max |title=Computer Vision – ECCV 2016 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-46484-8_10 |language=en |location=Cham |publisher=Springer International Publishing |pages=160–176 |doi=10.1007/978-3-319-46484-8_10 |isbn=978-3-319-46484-8}}

|Xiang et al.

|-
|Common Objects in 3D (CO3D)

|Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk.

|

|6 million frames from 40000 videos

|multi-view images, camera poses, 3D point clouds, object category

|Predict object category. Generate objects.

|2021, updated 2022 as CO3Dv2

|{{Cite journal |last1=Reizenstein |first1=Jeremy |last2=Shapovalov |first2=Roman |last3=Henzler |first3=Philipp |last4=Sbordone |first4=Luca |last5=Labatut |first5=Patrick |last6=Novotny |first6=David |date=2021 |title=Common Objects in 3D: Large-Scale Learning and Evaluation of Real-Life 3D Category Reconstruction |url=https://openaccess.thecvf.com/content/ICCV2021/html/Reizenstein_Common_Objects_in_3D_Large-Scale_Learning_and_Evaluation_of_Real-Life_ICCV_2021_paper.html |language=en |pages=10901–10911}}{{cite arXiv |last1=Reizenstein |first1=Jeremy |title=Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction |date=2021-09-01 |eprint=2109.00512 |last2=Shapovalov |first2=Roman |last3=Henzler |first3=Philipp |last4=Sbordone |first4=Luca |last5=Labatut |first5=Patrick |last6=Novotny |first6=David|class=cs.CV }}

|Meta AI

|-
|Google Scanned Objects

|Scanned objects in SDF format.

|

|over 1,000

|

|

|2022

|

|Google AI

|-
|Objaverse-XL

|3D objects

|

|over 10 million

|3D objects, metadata

|novel view synthesis, 3D object generation

|2023

|{{Cite journal |last1=Deitke |first1=Matt |last2=Liu |first2=Ruoshi |last3=Wallingford |first3=Matthew |last4=Ngo |first4=Huong |last5=Michel |first5=Oscar |last6=Kusupati |first6=Aditya |last7=Fan |first7=Alan |last8=Laforte |first8=Christian |last9=Voleti |first9=Vikram |last10=Gadre |first10=Samir Yitzhak |last11=VanderBilt |first11=Eli |last12=Kembhavi |first12=Aniruddha |last13=Vondrick |first13=Carl |last14=Gkioxari |first14=Georgia |last15=Ehsani |first15=Kiana |date=2023-12-15 |title=Objaverse-XL: A Universe of 10M+ 3D Objects |url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/70364304877b5e767de4e9a2a511be0c-Abstract-Datasets_and_Benchmarks.html |journal=Advances in Neural Information Processing Systems |language=en |volume=36 |pages=35799–35813}}

|Deitke et al.

|-
|OmniObject3D

|Scanned objects, labelled in 190 daily categories

|

|6,000

|textured meshes, point clouds, multiview images, videos

|robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation

|2023

|{{Cite journal |last1=Wu |first1=Tong |last2=Zhang |first2=Jiarui |last3=Fu |first3=Xiao |last4=Wang |first4=Yuxin |last5=Ren |first5=Jiawei |last6=Pan |first6=Liang |last7=Wu |first7=Wayne |last8=Yang |first8=Lei |last9=Wang |first9=Jiaqi |last10=Qian |first10=Chen |last11=Lin |first11=Dahua |last12=Liu |first12=Ziwei |date=2023 |title=OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation |url=https://openaccess.thecvf.com/content/CVPR2023/html/Wu_OmniObject3D_Large-Vocabulary_3D_Object_Dataset_for_Realistic_Perception_Reconstruction_and_CVPR_2023_paper.html |language=en |pages=803–814}}{{Cite web |title=OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation |url=https://omniobject3d.github.io/ |access-date=2025-03-07 |website=omniobject3d.github.io}}

|Wu et al.

|-
|UnCommon Objects in 3D (uCO3D)

|Object-centric videos covering 1,070 categories from the LVIS taxonomy

|

|

|

|

|2025

|{{Cite web |title=UnCommon Objects in 3D |url=https://uco3d.github.io/ |access-date=2025-03-07 |website=uco3d.github.io}}{{cite arXiv |last1=Liu |first1=Xingchen |title=UnCommon Objects in 3D |date=2025-01-13 |eprint=2501.07574 |last2=Tayal |first2=Piyush |last3=Wang |first3=Jianyuan |last4=Zarzar |first4=Jesus |last5=Monnier |first5=Tom |last6=Tertikas |first6=Konstantinos |last7=Duan |first7=Jiali |last8=Toisoul |first8=Antoine |last9=Zhang |first9=Jason Y.|class=cs.CV }}

|Meta AI

|}

=== Object detection and recognition for autonomous vehicles ===

{{Self-driving car}}

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" | Dataset Name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

|-
|Cityscapes Dataset

|Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included.

|Pixel-level segmentation and labeling

|25,000

|Images, text

|Classification, object detection

|2016

|M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "[https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf The Cityscapes Dataset]." In CVPR Workshop on The Future of Datasets in Vision, 2015.

|Daimler AG et al.

|-
|German Traffic Sign Detection Benchmark Dataset

|Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries.

|Signs manually labeled

|900

|Images

|Classification

|2013

|Houben, Sebastian, et al. "[https://www.researchgate.net/profile/Sebastian_Houben/publication/242346625_Detection_of_Traffic_Signs_in_Real-World_Images_The_German_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e97000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-German-Traffic-Sign-Detection-Benchmark.pdf Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark]." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.Mathias, Mayeul, et al. "[http://www.varcity.eu/paper/ijcnn2013_mathias_trafficsign.pdf Traffic sign recognition—How far are we from the solution?]." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.

|S. Houben et al.

|-
|{{anchor|KITTI}}KITTI Vision Benchmark Dataset

|Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners.

|Many benchmarks extracted from data.

|>100 GB of data

|Images, text

|Classification, object detection

|2012

|Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "[https://www.cvlibs.net/publications/Geiger2012CVPR.pdf Are we ready for autonomous driving? the kitti vision benchmark suite]." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.Sturm, Jürgen, et al. "[http://jsturm.de/publications/data/sturm12iros.pdf A benchmark for the evaluation of RGB-D SLAM systems]." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.{{YouTube|KXpZ6B1YB_k|The KITTI Vision Benchmark Suite}}

|A. Geiger et al.

|-
|FieldSAFE

|Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization.

|Classes labelled geographically.

|>400 GB of data

|Images and 3D point clouds

|Classification, object detection, object localization

|2017

|{{cite journal | last1 = Kragh | first1 = Mikkel F. | display-authors = et al | year = 2017 | title = FieldSAFE – Dataset for Obstacle Detection in Agriculture | url = https://vision.eng.au.dk/fieldsafe | journal = Sensors | volume = 17 | issue = 11 | pages = 2579| doi = 10.3390/s17112579 | pmid = 29120383 | pmc = 5713196 | bibcode = 2017Senso..17.2579K| arxiv = 1709.03526 | doi-access = free }}

| M. Kragh et al.

|-
|Daimler Monocular Pedestrian Detection dataset

|It is a dataset of pedestrians in urban environments.

|Pedestrians are box-wise labeled.

|Labeled part contains 15560 samples with pedestrians and 6744 samples without. Test set contains 21790 images without labels.

|Images

|Object recognition and classification

|2006

|{{cite web |title=Papers with Code - Daimler Monocular Pedestrian Detection Dataset |url=https://paperswithcode.com/dataset/daimler-monocular-pedestrian-detection |website=paperswithcode.com |access-date=5 May 2023 |language=en}}{{cite journal |last1=Enzweiler |first1=Markus |last2=Gavrila |first2=Dariu M. |title=Monocular Pedestrian Detection: Survey and Experiments |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |date=December 2009 |volume=31 |issue=12 |pages=2179–2195 |doi=10.1109/TPAMI.2008.260 |pmid=19834140 |s2cid=1192198 |url=https://ieeexplore.ieee.org/document/4657363 |issn=1939-3539}}{{cite arXiv |last1=Yin |first1=Guojun |last2=Liu |first2=Bin |last3=Zhu |first3=Huihui |last4=Gong |first4=Tao |last5=Yu |first5=Nenghai |title=A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis |date=28 July 2020 |class=cs.CV |eprint=1904.11784 }}

|Daimler AG

|-
|CamVid

|The Cambridge-driving Labeled Video Database (CamVid) is a collection of videos.

|The dataset is labeled with semantic labels for 32 semantic classes.

|over 700 images

|Images

|Object recognition and classification

|2008

|{{cite web |title=Object Recognition in Video Dataset |url=https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/ |website=mi.eng.cam.ac.uk |access-date=5 May 2023}}{{cite book |last1=Brostow |first1=Gabriel J. |last2=Shotton |first2=Jamie |last3=Fauqueur |first3=Julien |last4=Cipolla |first4=Roberto |title=Computer Vision – ECCV 2008 |chapter=Segmentation and Recognition Using Structure from Motion Point Clouds |series=Lecture Notes in Computer Science |date=2008 |volume=5302 |pages=44–57 |doi=10.1007/978-3-540-88682-2_5 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-540-88682-2_5 |publisher=Springer |isbn=978-3-540-88681-5 |language=en}}{{cite journal |last1=Brostow |first1=Gabriel J. |last2=Fauqueur |first2=Julien |last3=Cipolla |first3=Roberto |title=Semantic object classes in video: A high-definition ground truth database |journal=Pattern Recognition Letters |date=15 January 2009 |volume=30 |issue=2 |pages=88–97 |doi=10.1016/j.patrec.2008.04.005 |bibcode=2009PaReL..30...88B |url=https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220 |language=en |issn=0167-8655}}

|Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla

|-
|RailSem19

|RailSem19 is a dataset for scene understanding by vision systems on railways.

|The dataset is labeled semantically and box-wise.

|8500

|Images

|Object recognition and classification, scene recognition

|2019

|{{cite web |title=WildDash 2 Benchmark |url=https://wilddash.cc/railsem19 |website=wilddash.cc |access-date=5 May 2023}}{{cite book |last1=Zendel |first1=Oliver |last2=Murschitz |first2=Markus |last3=Zeilinger |first3=Marcel |last4=Steininger |first4=Daniel |last5=Abbasi |first5=Sara |last6=Beleznai |first6=Csaba |title=2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |chapter=RailSem19: A Dataset for Semantic Rail Scene Understanding |date=June 2019 |pages=1221–1229 |doi=10.1109/CVPRW.2019.00161 |isbn=978-1-7281-2506-0 |s2cid=198166233 |chapter-url=https://ieeexplore.ieee.org/document/9025646}}

|Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai

|-
|BOREAS

|BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS.

|The data is annotated by 3D bounding boxes.

|350 km of driving data

|Images, Lidar and Radar data

|Object recognition and classification, scene recognition

|2023

|{{cite web |title=The Boreas Dataset |url=https://www.boreas.utias.utoronto.ca/#/ |website=www.boreas.utias.utoronto.ca |access-date=5 May 2023}}{{cite arXiv |last1=Burnett |first1=Keenan |last2=Yoon |first2=David J. |last3=Wu |first3=Yuchen |last4=Li |first4=Andrew Zou |last5=Zhang |first5=Haowei |last6=Lu |first6=Shichen |last7=Qian |first7=Jingxing |last8=Tseng |first8=Wei-Kang |last9=Lambert |first9=Andrew |last10=Leung |first10=Keith Y. K. |last11=Schoellig |first11=Angela P.|author11-link=Angela Schoellig |last12=Barfoot |first12=Timothy D. |title=Boreas: A Multi-Season Autonomous Driving Dataset |date=26 January 2023 |class=cs.RO |eprint=2203.10168 }}

|Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot

|-
|Bosch Small Traffic Lights Dataset

|It is a dataset of traffic lights.

|The labeling includes bounding boxes of traffic lights together with their state (active light).

|5000 images for training and a video sequence of 8334 frames for evaluation

|Images

|Traffic light recognition

|2017

|{{cite web |title=Bosch Small Traffic Lights Dataset |url=https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset |website=hci.iwr.uni-heidelberg.de |access-date=5 May 2023 |language=en |date=1 March 2017}}{{cite book |last1=Behrendt |first1=Karsten |last2=Novak |first2=Libor |last3=Botros |first3=Rami |title=2017 IEEE International Conference on Robotics and Automation (ICRA) |chapter=A deep learning approach to traffic lights: Detection, tracking, and classification |date=May 2017 |pages=1370–1377 |doi=10.1109/ICRA.2017.7989163 |isbn=978-1-5090-4633-1 |s2cid=6257133 |chapter-url=https://ieeexplore.ieee.org/document/7989163}}

|Karsten Behrendt, Libor Novak, Rami Botros

|-
|FRSign

|It is a dataset of French railway signals.

|The labeling includes bounding boxes of railway signals together with their state (active light).

|more than 100,000

|Images

|Railway signal recognition

|2020

|{{cite web |title=FRSign Dataset |url=https://frsign.irt-systemx.fr/ |website=frsign.irt-systemx.fr |access-date=5 May 2023}}{{cite arXiv |last1=Harb |first1=Jeanine |last2=Rébéna |first2=Nicolas |last3=Chosidow |first3=Raphaël |last4=Roblin |first4=Grégoire |last5=Potarusov |first5=Roman |last6=Hajri |first6=Hatem |title=FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains |date=5 February 2020 |class=cs.CY |eprint=2002.05665 }}

|Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri

|-
|{{anchor|GERALD}}GERALD

|It is a dataset of German railway signals.

|The labeling includes bounding boxes of railway signals together with their state (active light).

|5000

|Images

|Railway signal recognition

|2023

|{{cite web |title=ifs-rwth-aachen/GERALD |url=https://github.com/ifs-rwth-aachen/GERALD |publisher=Chair and Institute for Rail Vehicles and Transport Systems |access-date=5 May 2023 |date=30 April 2023}}{{cite journal |last1=Leibner |first1=Philipp |last2=Hampel |first2=Fabian |last3=Schindler |first3=Christian |title=GERALD: A novel dataset for the detection of German mainline railway signals |journal=Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit |date=3 April 2023 |volume=237 |issue=10 |pages=1332–1342 |doi=10.1177/09544097231166472 |s2cid=257939937 |url=https://journals.sagepub.com/doi/abs/10.1177/09544097231166472 |language=en |issn=0954-4097}}

|Philipp Leibner, Fabian Hampel, Christian Schindler

|-
|Multi-cue pedestrian

|A multi-cue dataset for onboard detection of pedestrians.

|The dataset is labeled box-wise.

|1092 image pairs with 1776 boxes for pedestrians

|Images

|Object recognition and classification

|2009

|{{cite book |last1=Wojek |first1=Christian |last2=Walk |first2=Stefan |last3=Schiele |first3=Bernt |title=2009 IEEE Conference on Computer Vision and Pattern Recognition |chapter=Multi-cue onboard pedestrian detection |date=June 2009 |pages=794–801 |doi=10.1109/CVPR.2009.5206638 |isbn=978-1-4244-3992-8 |s2cid=18000078 |chapter-url=https://ieeexplore.ieee.org/document/5206638}}

|Christian Wojek, Stefan Walk, Bernt Schiele

|-
|RAWPED

|RAWPED is a dataset for detection of pedestrians in the context of railways.

|The dataset is labeled box-wise.

|26000

|Images

|Object recognition and classification

|2020

|{{cite journal |last1=Toprak |first1=Tuğçe |last2=Aydın |first2=Burak |last3=Belenlioğlu |first3=Burak |last4=Güzeliş |first4=Cüneyt |last5=Selver |first5=M. Alper |title= Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems|journal=IEEE Transactions on Vehicular Technology |url=https://zenodo.org/record/3741742 |access-date=5 May 2023 |date=5 April 2020|page=1 |doi=10.1109/TVT.2020.2983825 |s2cid=216510283 }}{{cite journal |last1=Toprak |first1=Tugce |last2=Belenlioglu |first2=Burak |last3=Aydın |first3=Burak |last4=Guzelis |first4=Cuneyt |last5=Selver |first5=M. Alper |title=Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems |journal=IEEE Transactions on Vehicular Technology |date=May 2020 |volume=69 |issue=5 |pages=5041–5054 |doi=10.1109/TVT.2020.2983825 |s2cid=216510283 |url=https://ieeexplore.ieee.org/document/9050835 |issn=1939-9359}}

|Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver

|-
|OSDaR23

|OSDaR23 is a multi-sensor dataset for detection of objects in the context of railways.

|The dataset is labeled box-wise.

|16874 frames

|Images, Lidar, Radar and Infrared

|Object recognition and classification

|2023

|{{cite journal |last1=Tilly |first1=Roman |last2=Neumaier |first2=Philipp |last3=Schwalbe |first3=Karsten |last4=Klasek |first4=Pavel |last5=Tagiew |first5=Rustam |last6=Denzler |first6=Patrick |last7=Klockau |first7=Tobias |last8=Boekhoff |first8=Martin |last9=Köppel |first9=Martin |title=Open Sensor Data for Rail 2023 |date=2023 |doi=10.57806/9mv146r0 |journal=FID Move |language=de}}{{cite book |last1=Tagiew |first1=Rustam |last2=Köppel |first2=Martin |last3=Schwalbe |first3=Karsten |last4=Denzler |first4=Patrick |last5=Neumaier |first5=Philipp |last6=Klockau |first6=Tobias |last7=Boekhoff |first7=Martin |last8=Klasek |first8=Pavel |last9=Tilly |first9=Roman |title=2023 8th International Conference on Robotics and Automation Engineering (ICRAE) |chapter=OSDaR23: Open Sensor Data for Rail 2023 |date=4 May 2023 |pages=270–276 |doi=10.1109/ICRAE59816.2023.10458449 |arxiv=2305.03001 |isbn=979-8-3503-2765-6 }}

|Roman Tilly, Rustam Tagiew, Pavel Klasek (DZSF); Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel (Digitale Schiene Deutschland); Karsten Schwalbe (FusionSystems)

|-
|Argoverse

|Argoverse is a multi-sensor dataset for detection of objects in the context of roads.

|The dataset is annotated box-wise.

|320 hours of recording

|Data from 7 cameras and LiDAR

|Object recognition and classification, object tracking

|2022

|{{cite web |title=Home |url=https://www.argoverse.org/ |website=Argoverse |access-date=5 May 2023}}{{cite arXiv |last1=Chang |first1=Ming-Fang |last2=Lambert |first2=John |last3=Sangkloy |first3=Patsorn |last4=Singh |first4=Jagjeet |last5=Bak |first5=Slawomir |last6=Hartnett |first6=Andrew |last7=Wang |first7=De |last8=Carr |first8=Peter |last9=Lucey |first9=Simon |last10=Ramanan |first10=Deva |last11=Hays |first11=James |title=Argoverse: 3D Tracking and Forecasting with Rich Maps |date=6 November 2019 |class=cs.CV |eprint=1911.02620 }}

|Argo AI, Carnegie Mellon University, Georgia Institute of Technology

|-
|Rail3D

|Rail3D is a LiDAR dataset for railways recorded in Hungary, France, and Belgium

|The dataset is annotated semantically

|288 million annotated points

|LiDAR

|Object recognition and classification, object tracking

|2024

|{{cite journal |last1=Kharroubi |first1=Abderrazzaq |last2=Ballouch |first2=Zouhair |last3=Hajji |first3=Rafika |last4=Yarroudh |first4=Anass |last5=Billen |first5=Roland |title=Multi-Context Point Cloud Dataset and Machine Learning for Railway Semantic Segmentation |journal=Infrastructures |date=9 April 2024 |volume=9 |issue=4 |pages=71 |doi=10.3390/infrastructures9040071|doi-access=free }}

|Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine

|-
|WHU-Railway3D

|WHU-Railway3D is a LiDAR dataset for urban, rural, and plateau railways recorded in China

|The dataset is annotated semantically

|4.6 billion annotated data points

|LiDAR

|Object recognition and classification, object tracking

|2024

|{{cite journal |last1=Qiu |first1=Bo |last2=Zhou |first2=Yuzhou |last3=Dai |first3=Lei |last4=Wang |first4=Bing |last5=Li |first5=Jianping |last6=Dong |first6=Zhen |last7=Wen |first7=Chenglu |last8=Ma |first8=Zhiliang |last9=Yang |first9=Bisheng |title=WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation |journal=IEEE Transactions on Intelligent Transportation Systems |date=December 2024 |volume=25 |issue=12 |pages=20900–20916 |doi=10.1109/TITS.2024.3469546 |url=https://ieeexplore.ieee.org/document/10716569 |issn=1558-0016}}

|Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University

|-
|RailFOD23

|A dataset of foreign objects on railway catenary

|The dataset is annotated boxwise

|14,615 images

|Images

|Object recognition and classification, object tracking

|2024

|{{cite journal |last1=Chen |first1=Zhichao |last2=Yang |first2=Jie |last3=Feng |first3=Zhicheng |last4=Zhu |first4=Hao |title=RailFOD23: A dataset for foreign object detection on railroad transmission lines |journal=Scientific Data |date=16 January 2024 |volume=11 |issue=1 |pages=72 |doi=10.1038/s41597-024-02918-9 |pmid=38228610 |pmc=10791632 |bibcode=2024NatSD..11...72C |language=en |issn=2052-4463}}

|Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology

|-
|ESRORAD

|A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen

|The dataset is annotated boxwise

|2.7 million virtual images and 100,000 real images

|Images, LiDAR

|Object recognition and classification, object tracking

|2022

|{{cite journal |last1=Khemmar |first1=Redouane |last2=Mauri |first2=Antoine |last3=Dulompont |first3=Camille |last4=Gajula |first4=Jayadeep |last5=Vauchey |first5=Vincent |last6=Haddad |first6=Madjid |last7=Boutteau |first7=Rémi |title=Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset |journal=Sensors |date=22 May 2022 |volume=22 |issue=10 |pages=3922 |doi=10.3390/s22103922|doi-access=free |pmid=35632331 |bibcode=2022Senso..22.3922K |pmc=9143394 }}

|Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies

RailVID

|Data recorded with the InfiRay AT615X infrared thermal camera in diverse railway scenarios, including carport, depot, and straight track.

|The dataset is annotated semantically

|1,071 images

|infrared images

|Object recognition and classification, object tracking

|2022

|{{cite book |title=ICONS 2022: the seventeenth International Conference on Systems: April 24-28, 2022, Barcelona, Spain |date=2022 |publisher=IARIA |location=Wilmington, DE, USA |isbn=978-1-61208-941-6}}

|Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University

RailPC

|LiDAR point cloud dataset of railway scenes

|The dataset is annotated semantically

|3 billion data points

|LiDAR

|Object recognition and classification, object tracking

|2024

|{{cite journal |last1=Jiang |first1=Tengping |last2=Li |first2=Shiwei |last3=Zhang |first3=Qinyu |last4=Wang |first4=Guangshuai |last5=Zhang |first5=Zequn |last6=Zeng |first6=Fankun |last7=An |first7=Peng |last8=Jin |first8=Xin |last9=Liu |first9=Shan |last10=Wang |first10=Yongjun |title=RailPC: A large-scale railway point cloud semantic segmentation dataset |journal=CAAI Transactions on Intelligence Technology |date=2024 |volume=9 |issue=6 |pages=1548–1560 |doi=10.1049/cit2.12349 |language=en |issn=2468-2322|doi-access=free }}

|Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio-temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology

RailCloud-HdF

|LiDAR point cloud dataset of railway scenes

|The dataset is annotated semantically

|8,060.3 million data points

|LiDAR

|Object recognition and classification, object tracking

|2024

|{{cite book |last1=Abid |first1=Mahdi |last2=Teixeira |first2=Mathis |last3=Mahtani |first3=Ankur |last4=Laurent |first4=Thomas |title=Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications |chapter=RailCloud-HdF: A Large-Scale Point Cloud Dataset for Railway Scene Semantic Segmentation |date=2024 |pages=159–170 |doi=10.5220/0012394800003660|isbn=978-989-758-679-8 }}

|Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium

RailGoerl24

|RGB and LiDAR dataset of railway scenes

|Annotated with bounding boxes

|12,205 HD RGB frames and 383,922,305 colored LiDAR cloud points

|RGB, LiDAR

|Person recognition and classification

|2025

|{{cite journal |last1=Tagiew |first1=Rustam |last2=Wunderlich |first2=Ilkay |last3=Zanitzer |first3=Philipp |last4=Sastuba |first4=Mark |last5=Knoll |first5=Carsten |last6=Göller |first6=Kilian |last7=Amjad |first7=Haadia |last8=Seitz |first8=Steffen |title=Görlitz Rail Test Center CV Dataset 2024 (RailGoerl24) |journal=German National Library of Science and Technology |date=2025 |url=https://data.fid-move.de/de/dataset/railgoerl24}}

|DZSF, PECS-WORK GmbH, EYYES Deutschland GmbH, TU Dresden
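
The railway point-cloud sets above (Rail3D, WHU-Railway3D, RailPC, RailCloud-HdF) pair LiDAR coordinates with per-point semantic labels. The following is a minimal sketch of inspecting such data, assuming the points have been exported to a plain-text file with one point per line and columns x, y, z, class index; the file name and column layout are illustrative, not the native format of any dataset listed here.

<syntaxhighlight lang="python">
import numpy as np

# Assumed export: one point per line, columns = x, y, z, class_id.
# File name and column order are illustrative only.
points = np.loadtxt("rail_scan.txt")

xyz = points[:, :3]                 # 3D coordinates
labels = points[:, 3].astype(int)   # per-point semantic class index

# Per-class point counts, e.g. to check class imbalance before training.
classes, counts = np.unique(labels, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} points ({100.0 * n / labels.size:.2f}%)")

# Axis-aligned extent of the scan.
print("extent:", xyz.max(axis=0) - xyz.min(axis=0))
</syntaxhighlight>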

Facial recognition

In computer vision, face images have been used extensively to develop systems for facial recognition, face detection, and many other tasks that use images of faces. See {{Cite web |title=Face Recognition Homepage - Databases |url=https://www.face-rec.org/databases/ |access-date=2025-04-26 |website=www.face-rec.org}} for a curated list of face datasets, focused on the pre-2005 period.

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" |Dataset name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

Labeled Faces in the Wild (LFW)

|Images of named individuals obtained by Internet search.

|frontal face detection, bounding box cropping

|13,233 images of 5,749 named individuals

|images, labels

|unconstrained face recognition

|2008

|Huang, Gary B., et al. [https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf Labeled faces in the wild: A database for studying face recognition in unconstrained environments]. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.{{Cite web |date=2012-12-01 |title=LFW Face Database : Main |url=http://vis-www.cs.umass.edu/lfw |url-status=dead |archive-url=https://web.archive.org/web/20121201044531/http://vis-www.cs.umass.edu/lfw |archive-date=2012-12-01 |access-date=2025-04-26 }}

|Huang et al.

Aff-Wild

|298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640x360)

|the detected faces, facial landmarks and valence-arousal annotations

|~1,250,000 manually annotated images

|video (visual + audio modalities)

|affect recognition (valence-arousal estimation)

|2017

|CVPR{{Cite book|last1=Zafeiriou|first1=S.|last2=Kollias|first2=D.|last3=Nicolaou|first3=M.A.|last4=Papaioannou|first4=A.|last5=Zhao|first5=G.|last6=Kotsia|first6=I.|title=2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |chapter=Aff-Wild: Valence and Arousal 'In-the-Wild' Challenge |date=2017|chapter-url=https://eprints.mdx.ac.uk/22045/1/aff_wild_kotsia.pdf|pages=1980–1987|doi=10.1109/CVPRW.2017.248|isbn=978-1-5386-0733-6|s2cid=3107614|url=http://urn.fi/urn:nbn:fi-fe201902276466 }}

IJCV{{Cite journal|last1=Kollias|first1=D.|last2=Tzirakis|first2=P.|last3=Nicolaou|first3=M.A.|last4=Papaioannou|first4=A.|last5=Zhao|first5=G.|last6=Schuller|first6=B.|last7=Kotsia|first7=I.|last8=Zafeiriou|first8=S.|date=2019|title=Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond|url=https://rdcu.be/bmGm2|journal=International Journal of Computer Vision |volume=127|issue=6–7|pages=907–929|doi=10.1007/s11263-019-01158-4|s2cid=13679040|doi-access=free|arxiv=1804.10938}}

|D. Kollias et al.

Aff-Wild2

|558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1,2,4,6,12,15,20,25); in-the-wild setting; color database; various resolutions (average = 1030x630)

|the detected faces, detected and aligned faces and annotations

|~2,800,000 manually annotated images

|video (visual + audio modalities)

|affect recognition (valence-arousal estimation, basic expression classification, action unit detection)

|2019

|BMVC{{Cite journal|last1=Kollias|first1=D.|last2=Zafeiriou|first2=S.|date=2019|title=Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface|url=https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf|journal=British Machine Vision Conference (BMVC), 2019|arxiv=1910.04855}}

FG{{Cite book|last1=Kollias|first1=D.|last2=Schulc|first2=A.|last3=Hajiyev|first3=E.|last4=Zafeiriou|first4=S.|title=2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) |chapter=Analysing Affective Behavior in the First ABAW 2020 Competition |date=2020|chapter-url=https://www.computer.org/csdl/proceedings-article/fg/2020/307900a794/1kecIYu9wL6|pages=637–643|doi=10.1109/FG47880.2020.00126|arxiv=2001.11409|isbn=978-1-7281-3079-8|s2cid=210966051}}

|D. Kollias et al.

FERET (facial recognition technology)

|11,338 images of 1,199 individuals in different positions and at different times.

|None.

|11,338

|Images

|Classification, face recognition

|2003

|{{cite journal | last1 = Phillips | first1 = P. Jonathon | display-authors = et al | year = 1998 | title = The FERET database and evaluation procedure for face-recognition algorithms | journal = Image and Vision Computing | volume = 16 | issue = 5| pages = 295–306 | doi=10.1016/s0262-8856(97)00070-x}}{{cite journal | last1 = Wiskott | first1 = Laurenz | display-authors = et al | year = 1997 | title = Face recognition by elastic bunch graph matching | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | volume = 19 | issue = 7| pages = 775–779 | doi=10.1109/34.598235| citeseerx = 10.1.1.44.2321 | s2cid = 30523165 }}

|United States Department of Defense

Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

|7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities.

|Files labelled with expression. Perceptual validation ratings provided by 319 raters.

|7,356

|Video, sound files

|Classification, face recognition, voice recognition

|2018

|{{Cite journal | doi=10.1371/journal.pone.0196391| pmid=29768426| pmc=5955500| title=The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English| journal=PLOS ONE| volume=13| issue=5| pages=e0196391| year=2018| last1=Livingstone| first1=Steven R.| last2=Russo| first2=Frank A.| bibcode=2018PLoSO..1396391L| doi-access=free}}{{Cite book | doi=10.5281/zenodo.1188976| year=2018| last1=Livingstone| first1=Steven R.| title=The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)| last2=Russo| first2=Frank A.| chapter=Emotion}}

|S.R. Livingstone and F.A. Russo

SCFace

|Color images of faces at various angles.

|Location of facial features extracted. Coordinates of features given.

|4,160

|Images, text

|Classification, face recognition

|2011

|{{cite journal | last1 = Grgic | first1 = Mislav | last2 = Delac | first2 = Kresimir | last3 = Grgic | first3 = Sonja | year = 2011 | title = SCface–surveillance cameras face database | journal = Multimedia Tools and Applications | volume = 51 | issue = 3| pages = 863–879 | doi = 10.1007/s11042-009-0417-2 | s2cid = 207218990 }}Wallace, Roy, et al. "[https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf Inter-session variability modelling and joint factor analysis for face authentication]." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.

|M. Grgic et al.

Yale Face Database

|Faces of 15 individuals in 11 different expressions.

|Labels of expressions.

|165

|Images

|Face recognition

|1997

|{{cite journal | last1 = Georghiades | first1 = A | title = Yale face database | journal = Center for Computational Vision and Control at Yale University| url=http://CVC.yale.edu/Projects/Yalefaces/Yalefa | volume = 2 | page = 1997 }}{{cite journal | last1 = Nguyen | first1 = Duy | display-authors = et al | year = 2006 | title = Real-time face detection and lip feature extraction using field-programmable gate arrays | journal = IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics| volume = 36 | issue = 4| pages = 902–912 | doi=10.1109/tsmcb.2005.862728| pmid = 16903373 | citeseerx = 10.1.1.156.9848 | s2cid = 7334355 }}

|J. Yang et al.

Cohn-Kanade AU-Coded Expression Database

|Large database of images with labels for expressions.

|Tracking of certain facial features.

|500+ sequences

|Images, text

|Facial expression analysis

|2000

|Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "[http://www.ri.cmu.edu/pub_files/pub2/kanade_takeo_2000_1/kanade_takeo_2000_1.pdf Comprehensive database for facial expression analysis]." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.{{cite journal | last1 = Zeng | first1 = Zhihong | display-authors = et al | year = 2009 | title = A survey of affect recognition methods: Audio, visual, and spontaneous expressions | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | volume = 31 | issue = 1| pages = 39–58 | doi=10.1109/tpami.2008.52| pmid = 19029545 | citeseerx = 10.1.1.144.217 }}

|T. Kanade et al.

JAFFE Facial Expression Database

|213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models.

|Images are cropped to the facial region. Includes semantic ratings data on emotion labels.

|213

|Images, text

|Facial expression recognition

|1998

|{{Cite book | doi=10.5281/zenodo.3451524| year=1998| last1=Lyons| first1=Michael| title=The Japanese Female Facial Expression (JAFFE) Database| last2=Kamachi| first2=Miyuki| last3=Gyoba| first3=Jiro| chapter=Facial expression images}}Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro "[https://zenodo.org/record/3430156 Coding facial expressions with Gabor wavelets]." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.

|Lyons, Kamachi, Gyoba

FaceScrub

|Images of public figures collected from image search results.

|Name and gender (m/f) annotation.

|107,818

|Images, text

|Face recognition

|2014

|Ng, Hong-Wei, and Stefan Winkler. "[http://vintage.winklerbros.net/Publications/icip2014a.pdf A data-driven approach to cleaning large face datasets] {{Webarchive|url=https://web.archive.org/web/20191206175300/http://vintage.winklerbros.net/Publications/icip2014a.pdf |date=6 December 2019 }}." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.{{cite arXiv |eprint=1506.01342|last1=RoyChowdhury|first1=Aruni|title=One-to-many face recognition with bilinear CNNs|last2=Lin|first2=Tsung-Yu|last3=Maji|first3=Subhransu|last4=Learned-Miller|first4=Erik|class=cs.CV|year=2015}}

|H. Ng et al.

BioID Face Database

|Images of faces with eye positions marked.

|Manually set eye positions.

|1521

|Images, text

|Face recognition

|2001

|Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.

|BioID

Skin Segmentation Dataset

|Randomly sampled color values from face images.

|B, G, R values extracted.

|245,057

|Text

|Segmentation, classification

|2012

|Bhatt, Rajen B., et al. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf Efficient skin region segmentation using low complexity fuzzy decision tree model]." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.{{cite journal | last1 = Lingala | first1 = Mounika | display-authors = et al | year = 2014 | title = Fuzzy logic color detection: Blue areas in melanoma dermoscopy images | journal = Computerized Medical Imaging and Graphics | volume = 38 | issue = 5| pages = 403–410 | doi=10.1016/j.compmedimag.2014.03.007| pmid = 24786720 | pmc = 4287461 }}

|R. Bhatt.

Bosphorus

|3D Face image database.

|34 action units and 6 expressions labeled; 24 facial landmarks labeled.

|4652

|Images, text

|Face recognition, classification

|2008

|Maes, Chris, et al. "[https://lirias.kuleuven.be/retrieve/135678 Feature detection on 3D face surfaces for pose normalisation and recognition]." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.Savran, Arman, et al. "[https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf Bosphorus database for 3D face analysis]." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.

|A Savran et al.

UOY 3D-Face

|Neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.

|Labeling.

|5250

|Images, text

|Face recognition, classification

|2004

|Heseltine, Thomas, Nick Pears, and Jim Austin. "[http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf Three-dimensional face recognition: An eigensurface approach]." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.{{cite journal | last1 = Ge | first1 = Yun | display-authors = et al | year = 2011 | title = 3D Novel Face Sample Modeling for Face Recognition | journal = Journal of Multimedia | volume = 6 | issue = 5| pages = 467–475 | doi=10.4304/jmm.6.5.467-475| citeseerx = 10.1.1.461.9710 }}

|University of York

CASIA 3D Face Database

|Expressions: Anger, smile, laugh, surprise, closed eyes.

|None.

|4624

|Images, text

|Face recognition, classification

|2007

|{{cite journal | last1 = Wang | first1 = Yueming | last2 = Liu | first2 = Jianzhuang | last3 = Tang | first3 = Xiaoou | year = 2010 | title = Robust 3D face recognition by local shape difference boosting | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | volume = 32 | issue = 10| pages = 1858–1870 | doi=10.1109/tpami.2009.200| pmid = 20724762 | citeseerx = 10.1.1.471.2424 | s2cid = 15263913 }}Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf Robust 3D face recognition using learned visual codebook]." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.

|Institute of Automation, Chinese Academy of Sciences

CASIA NIR

|Expressions: anger, disgust, fear, happiness, sadness, surprise

|None.

|480

|Annotated visible-spectrum and near-infrared video captured at 25 frames per second

|Face recognition, classification

|2011

|{{cite journal | last1 = Zhao | first1 = G. | last2 = Huang | first2 = X. | last3 = Taini | first3 = M. | last4 = Li | first4 = S. Z. | last5 = Pietikäinen | first5 = M. | year = 2011 | title = Facial expression recognition from near-infrared videos | url = http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf | journal = Image and Vision Computing | volume = 29 | issue = 9| pages = 607–619 | doi = 10.1016/j.imavis.2011.07.002 }}{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}

|Zhao, G. et al.

BU-3DFE

|Neutral face plus 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 intensity levels each). 3D images extracted.

|None.

|2500

|Images, text

|Facial expression recognition, classification

|2006

|Soyel, Hamit, and Hasan Demirel. "[https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf Facial expression recognition using 3D facial feature distances]." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.

|Binghamton University

Face Recognition Grand Challenge Dataset

|Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data.

|None.

|4007

|Images, text

|Face recognition, classification

|2004

|{{cite journal | last1 = Bowyer | first1 = Kevin W. | last2 = Chang | first2 = Kyong | last3 = Flynn | first3 = Patrick | year = 2006 | title = A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition | journal = Computer Vision and Image Understanding | volume = 101 | issue = 1| pages = 1–15 | doi=10.1016/j.cviu.2005.05.005| citeseerx = 10.1.1.134.8784 }}{{cite journal | last1 = Tan | first1 = Xiaoyang | last2 = Triggs | first2 = Bill | year = 2010 | title = Enhanced local texture feature sets for face recognition under difficult lighting conditions | journal = IEEE Transactions on Image Processing| volume = 19 | issue = 6| pages = 1635–1650 | doi=10.1109/tip.2010.2042645| pmid = 20172829 | bibcode = 2010ITIP...19.1635T | citeseerx = 10.1.1.105.3355 | s2cid = 4943234 }}

|National Institute of Standards and Technology

GavabDB

|Nine 3D images for each of 61 subjects. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture.

|None.

|549

|Images, text

|Face recognition, classification

|2008

|{{cite book | chapter-url=https://ieeexplore.ieee.org/document/4529822 | doi=10.1109/ICIS.2008.77 | chapter=Three Dimensional Face Recognition Using SVM Classifier | title=Seventh IEEE/ACIS International Conference on Computer and Information Science (Icis 2008) | year=2008 | last1=Mousavi | first1=Mir Hashem | last2=Faez | first2=Karim | last3=Asghari | first3=Amin | pages=208–213 | isbn=978-0-7695-3131-1 | s2cid=2710422 }}{{Cite book |year=2008 |isbn=978-1-4244-2154-1 |doi=10.1109/AFGR.2008.4813376 |chapter-url=https://gravis.dmi.unibas.ch/publications/2008/FG08_Amberg.pdf |url-status=dead |chapter=Expression invariant 3D face recognition with a Morphable Model |title=2008 8th IEEE International Conference on Automatic Face & Gesture Recognition |last1=Amberg |first1=Brian |last2=Knothe |first2=Reinhard |last3=Vetter |first3=Thomas |pages=1–6 |s2cid=5651453 |access-date=6 August 2019 |archive-date=28 July 2018 |archive-url=https://web.archive.org/web/20180728233944/http://gravis.dmi.unibas.ch/publications/2008/FG08_Amberg.pdf }}

|King Juan Carlos University

3D-RMA

|Up to 100 subjects, expressions mostly neutral. Several poses as well.

|None.

|9971

|Images, text

|Face recognition, classification

|2004

|{{Cite book |chapter-url=https://www.researchgate.net/publication/4090704 |doi= 10.1109/ICPR.2004.1333734|chapter= 3D shape-based face recognition using automatically registered facial surfaces|title= Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004|year= 2004|last1= Irfanoglu|first1= M.O.|last2= Gokberk|first2= B.|last3= Akarun|first3= L.|pages= 183–186 Vol.4|isbn= 0-7695-2128-2|s2cid= 10987293}}{{cite journal | last1 = Beumier | first1 = Charles | last2 = Acheroy | first2 = Marc | year = 2001 | title = Face verification from 3D and grey level clues | journal = Pattern Recognition Letters | volume = 22 | issue = 12| pages = 1321–1329 | doi=10.1016/s0167-8655(01)00077-0| bibcode = 2001PaReL..22.1321B }}

|Royal Military Academy (Belgium)

SoF

|112 persons (66 males and 46 females) wearing glasses under different illumination conditions.

|A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty.

|42,592 (2,662 original images × 16 synthetic variants each)

|Images, Mat file

|Gender classification, face detection, face recognition, age estimation, and glasses detection

|2017

|{{cite arXiv|last1=Afifi|first1=Mahmoud|last2=Abdelhamed|first2=Abdelrahman|date=2017-06-13|title=AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces|eprint=1706.04277|class=cs.CV}}{{Cite web|url=https://sites.google.com/view/sof-dataset|title=SoF dataset|website=sites.google.com|language=en-US|access-date=2017-11-18}}

|Afifi, M. et al.

IMDb-WIKI

|IMDb and Wikipedia face images with gender and age labels.

| None

| 523,051

|Images

|Gender classification, face detection, face recognition, age estimation

|2015

|{{Cite web|url=https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/|title=IMDb-WIKI|website=data.vision.ee.ethz.ch|language=en-US|access-date=2018-03-13}}

|R. Rothe, R. Timofte, L. V. Gool
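
Several of the databases above (for example Labeled Faces in the Wild) were built by running a frontal face detector on web images and cropping to the detected bounding box. The following is a minimal sketch of that kind of preprocessing using OpenCV's bundled Haar-cascade frontal-face detector; the file paths and the 250×250 output size are placeholders rather than any database's exact pipeline.

<syntaxhighlight lang="python">
import cv2

# OpenCV ships a pre-trained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input_photo.jpg")            # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect frontal faces; the parameters are common defaults, not dataset-specific.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    crop = cv2.resize(img[y:y + h, x:x + w], (250, 250))
    cv2.imwrite(f"face_{i}.jpg", crop)
</syntaxhighlight>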

Action recognition

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" |Dataset name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

AVA-Kinetics Localized Human Actions Video

|80 action classes annotated on keyframes of Kinetics-700 videos.

|

|1.6 million annotations. 238,906 video clips, 624,430 keyframes.

|Annotations, videos.

|Action prediction

|2020

|{{Cite web |title=AVA: A Video Dataset of Atomic Visual Action |url=https://research.google.com/ava/ |access-date=2024-10-18 |website=research.google.com}}{{cite arXiv |last1=Li |first1=Ang |title=The AVA-Kinetics Localized Human Actions Video Dataset |date=2020-05-20 |eprint=2005.00214 |last2=Thotakuri |first2=Meghana |last3=Ross |first3=David A. |last4=Carreira |first4=João |last5=Vostrikov |first5=Alexander |last6=Zisserman |first6=Andrew|class=cs.CV }}

|Li et al. from the Perception Team of Google AI.

TV Human Interaction Dataset

|Videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss, and none.

|None.

|6,766 video clips

|video clips

|Action prediction

|2013

|{{cite journal | last1 = Patron-Perez | first1 = A. | last2 = Marszalek | first2 = M. | last3 = Reid | first3 = I. | last4 = Zisserman | first4 = A. | year = 2012 | title = Structured learning of human interactions in TV shows | journal = IEEE Transactions on Pattern Analysis and Machine Intelligence | volume = 34 | issue = 12| pages = 2441–2453 | doi=10.1109/tpami.2012.24| pmid = 23079467 | s2cid = 6060568 }}

|Patron-Perez, A. et al.

Berkeley Multimodal Human Action Database (MHAD)

|Recordings of a single person performing 12 actions

|MoCap pre-processing

|660 action samples

|8 PhaseSpace Motion Capture, 2 Stereo Cameras, 4 Quad Cameras, 6 accelerometers, 4 microphones

|Action classification

|2013

|Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf Berkeley MHAD: A comprehensive multimodal human action database]. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.

|Ofli, F. et al.

THUMOS Dataset

|Large video dataset for action classification.

|Actions classified and labeled.

|45M frames of video

|Video, images, text

|Classification, action detection

|2013

|Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.Simonyan, Karen, and Andrew Zisserman. "[https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf Two-stream convolutional networks for action recognition in videos]." Advances in Neural Information Processing Systems. 2014.

|Y. Jiang et al.

MEXAction2

|Video dataset for action localization and spotting

|Actions classified and labeled.

|1000

|Video

|Action detection

|2014

|{{cite journal |doi=10.1109/TCSVT.2015.2475835|title=Fast Action Localization in Large-Scale Video Archives|journal=IEEE Transactions on Circuits and Systems for Video Technology|volume=26|issue=10|pages=1917–1930|year=2016|last1=Stoian|first1=Andrei|last2=Ferecatu|first2=Marin|last3=Benois-Pineau|first3=Jenny|last4=Crucianu|first4=Michel|s2cid=31537462}}

|Stoian et al.

Handwriting and character recognition

class="wikitable sortable" style="width: 100%"

!Dataset name

!Brief description

!Preprocessing

!Instances

!Format

!Default Task

!Created (updated)

!Reference

!Creator

Artificial Characters Dataset

|Artificially generated data describing the structure of 10 capital English letters.

|Coordinates of lines drawn given as integers. Various other features.

|6000

|Text

|Handwriting recognition, classification

|1992

|Botta, M., A. Giordana, and L. Saitta. "[https://pdfs.semanticscholar.org/9f0e/1349d1422f1b455b8ccc26ebf7b114b8db20.pdf Learning fuzzy concept definitions]." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.

|H. Guvenir et al.

Letter Dataset

|Upper-case printed letters.

|17 features are extracted from all images.

|20,000

|Text

|OCR, classification

|1991

|{{cite journal | last1 = Frey | first1 = Peter W. | last2 = Slate | first2 = David J. | year = 1991 | title = Letter recognition using Holland-style adaptive classifiers | journal = Machine Learning | volume = 6 | issue = 2| pages = 161–182 | doi=10.1007/bf00114162| doi-access = free }}{{cite journal | last1 = Peltonen | first1 = Jaakko | last2 = Klami | first2 = Arto | last3 = Kaski | first3 = Samuel | year = 2004 | title = Improved learning of Riemannian metrics for exploratory analysis | journal = Neural Networks | volume = 17 | issue = 8| pages = 1087–1100 | doi=10.1016/j.neunet.2004.06.008| pmid = 15555853 | citeseerx = 10.1.1.59.4865 }}

|D. Slate et al.

CASIA-HWDB

|Offline handwritten Chinese character database. 3755 classes in the GB 2312 character set.

|Gray-scaled images with background pixels labeled as 255.

|1,172,907

|Images, Text

|Handwriting recognition, classification

|2009

|{{cite journal |title=Online and offline handwritten Chinese character recognition: Benchmarking on new databases |journal=Pattern Recognition |volume=46 |issue=1 |date=January 2013 |pages=155–162 |first1=Cheng-Lin |last1=Liu |first2=Fei |last2=Yin |first3=Da-Han |last3=Wang |first4=Qiu-Feng |last4=Wang |doi=10.1016/j.patcog.2012.06.021 |bibcode=2013PatRe..46..155L }}

|CASIA

CASIA-OLHWDB

|Online handwritten Chinese character database, collected using Anoto pen on paper. 3755 classes in the GB 2312 character set.

|Provides the sequences of coordinates of strokes.

|1,174,364

|Images, Text

|Handwriting recognition, classification

|2009

|{{cite book |last1=Wang |first1=D. |first2=C. |last2=Liu |first3=J. |last3=Yu |first4=X. |last4=Zhou |title=2009 10th International Conference on Document Analysis and Recognition |chapter=CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters |year=2009 |pages=1206–1210|doi=10.1109/ICDAR.2009.163 |isbn=978-1-4244-4500-4 |s2cid=5705532 }}

|CASIA

Character Trajectories Dataset

|Labeled samples of pen tip trajectories for people writing simple characters.

|3-dimensional pen tip velocity trajectory matrix for each sample

|2858

|Text

|Handwriting recognition, classification

|2008

|Williams, Ben H., Marc Toussaint, and Amos J. Storkey. [https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20thesis%2009.pdf?sequence=1 Extracting motion primitives from natural handwriting data]. Springer Berlin Heidelberg, 2006.Meier, Franziska, et al. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.395.8598&rep=rep1&type=pdf Movement segmentation using a primitive library]."Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.

|B. Williams

Chars74K Dataset

|Character recognition in natural images of symbols used in both English and Kannada

|

|74,107

|

|Character recognition, handwriting recognition, OCR, classification

|2009

|T. E. de Campos, B. R. Babu and M. Varma. [http://personal.ee.surrey.ac.uk/Personal/T.Decampos/papers/decampos_etal_visapp2009.pdf Character recognition in natural images]. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009

|T. de Campos

EMNIST dataset

|Handwritten characters from 3600 contributors

|Derived from NIST Special Database 19. Converted to 28x28 pixel images, matching the MNIST dataset.{{Cite arXiv|eprint = 1702.05373v1|last1 = Cohen|first1 = Gregory|last2 = Afshar|first2 = Saeed|last3 = Tapson|first3 = Jonathan|author4 = André van Schaik|title = EMNIST: An extension of MNIST to handwritten letters|year = 2017| class=cs.CV }}

|800,000

|Images

|character recognition, classification, handwriting recognition

|2016

|EMNIST dataset{{Cite journal|url=https://www.nist.gov/itl/products-and-services/emnist-dataset|title = The EMNIST Dataset| journal=NIST |date = 4 April 2017}}

Documentation{{cite arXiv | eprint=1702.05373 | last1=Cohen | first1=Gregory | last2=Afshar | first2=Saeed | last3=Tapson | first3=Jonathan | author4=André van Schaik | title=EMNIST: An extension of MNIST to handwritten letters | year=2017 | class=cs.CV }}

|Gregory Cohen, et al.

UJI Pen Characters Dataset

|Isolated handwritten characters

|Coordinates of pen position as characters were written given.

|11,640

|Text

|Handwriting recognition, classification

|2009

|Llorens, David, et al. "[https://web.archive.org/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef15094c59322560377bbf8e4185245c654f.pdf The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters]." LREC. 2008.{{cite journal | last1 = Calderara | first1 = Simone | last2 = Prati | first2 = Andrea | last3 = Cucchiara | first3 = Rita | year = 2011 | title = Mixtures of von mises distributions for people trajectory shape analysis | journal = IEEE Transactions on Circuits and Systems for Video Technology| volume = 21 | issue = 4| pages = 457–471 | doi=10.1109/tcsvt.2011.2125550| hdl = 11380/646181 | s2cid = 1427766 }}

|F. Prat et al.

Gisette Dataset

|Handwriting samples of the often-confused digits 4 and 9.

|Features extracted from images, split into train/test, handwriting images size-normalized.

|13,500

|Images, text

|Handwriting recognition, classification

|2003

|Guyon, Isabelle, et al. "[http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf Result analysis of the nips 2003 feature selection challenge]." Advances in neural information processing systems. 2004.

|Yann LeCun et al.

Omniglot dataset

|1623 different handwritten characters from 50 different alphabets.

|Hand-labeled.

|38,300

|Images, text, strokes

|Classification, one-shot learning

|2015

|{{Cite journal|last1=Lake|first1=B. M.|last2=Salakhutdinov|first2=R.|last3=Tenenbaum|first3=J. B.|date=2015-12-11|title=Human-level concept learning through probabilistic program induction|journal=Science|language=en|volume=350|issue=6266|pages=1332–1338|doi=10.1126/science.aab3050|issn=0036-8075|pmid=26659050|bibcode=2015Sci...350.1332L|doi-access=free}}{{cite web|last=Lake|first=Brenden|title=Omniglot data set for one-shot learning|website=GitHub |date=2019-11-09|url=https://github.com/brendenlake/omniglot|access-date=2019-11-10}}

|B. Lake et al.

MNIST database

|Database of handwritten digits.

|Hand-labeled.

|60,000

|Images, text

|Classification

|1994

|{{cite journal | last1 = LeCun | first1 = Yann | display-authors = et al | year = 1998 | title = Gradient-based learning applied to document recognition | journal = Proceedings of the IEEE | volume = 86 | issue = 11| pages = 2278–2324 | doi=10.1109/5.726791| citeseerx = 10.1.1.32.9552 | s2cid = 14542261 }}{{cite journal | last1 = Kussul | first1 = Ernst | last2 = Baidyk | first2 = Tatiana |author2-link=Tetyana Baydyk| year = 2004 | title = Improved method of handwritten digit recognition tested on MNIST database | journal = Image and Vision Computing | volume = 22 | issue = 12| pages = 971–981 | doi = 10.1016/j.imavis.2004.03.008 }}

|National Institute of Standards and Technology

Optical Recognition of Handwritten Digits Dataset

|Normalized bitmaps of handwritten data.

|Size normalized and mapped to bitmaps.

|5620

|Images, text

|Handwriting recognition, classification

|1998

|{{cite journal | last1 = Xu | first1 = Lei | last2 = Krzyżak | first2 = Adam | last3 = Suen | first3 = Ching Y. | year = 1992 | title = Methods of combining multiple classifiers and their applications to handwriting recognition | journal = IEEE Transactions on Systems, Man, and Cybernetics| volume = 22 | issue = 3| pages = 418–435 | doi=10.1109/21.155943| hdl = 10338.dmlcz/135217 }}

|E. Alpaydin et al.

Pen-Based Recognition of Handwritten Digits Dataset

|Handwritten digits on electronic pen-tablet.

|Feature vectors extracted to be uniformly spaced.

|10,992

|Images, text

|Handwriting recognition, classification

|1998

|Alimoglu, Fevzi, et al. "[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.6299 Combining multiple classifiers for pen-based handwritten digit recognition]." (1996).{{cite journal | last1 = Tang | first1 = E. Ke | display-authors = et al | year = 2005 | title = Linear dimensionality reduction using relevance weighted LDA | journal = Pattern Recognition | volume = 38 | issue = 4| pages = 485–493 | doi=10.1016/j.patcog.2004.09.005| bibcode = 2005PatRe..38..485T | s2cid = 10580110 }}

|E. Alpaydin et al.

Semeion Handwritten Digit Dataset

|Handwritten digits from 80 people.

|All handwritten digits have been normalized for size and mapped to the same grid.

|1593

|Images, text

|Handwriting recognition, classification

|2008

|Hong, Yi, et al. "[https://pages.ucsd.edu/~ztu/publication/iccv11_sparsemetric.pdf Learning a mixture of sparse distance metrics for classification and dimensionality reduction]." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.

|T. Srl

HASYv2

|Handwritten mathematical symbols

|All symbols are centered and of size 32×32 pixels.

|168,233

|Images, text

|Classification

|2017

|{{cite arXiv |eprint=1701.08380|last1=Thoma|first1=Martin|title=The HASYv2 dataset|class=cs.CV|year=2017}}

|Martin Thoma

Noisy Handwritten Bangla Dataset

|Includes a Handwritten Numeral Dataset (10 classes) and a Basic Character Dataset (50 classes); each comes with three types of noise: white Gaussian, motion blur, and reduced contrast.

|All images are centered and of size 32x32.

|Numeral Dataset: 23,330; Character Dataset: 76,000

|Images, text

|Handwriting recognition, classification

|2017

|{{cite arXiv|last1=Karki|first1=Manohar|last2=Liu|first2=Qun|last3=DiBiano|first3=Robert|last4=Basu|first4=Saikat|last5=Mukhopadhyay|first5=Supratik|date=2018-06-20|title=Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters|eprint=1806.08037|class=cs.CV}}{{cite book|last1=Liu|first1=Qun|chapter=PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters|date=2019|title=Digital Libraries at the Crossroads of Digital Information for the Future|pages=3–15|publisher=Springer International Publishing|isbn=978-3-030-34057-5|last2=Collier|first2=Edward|last3=Mukhopadhyay|first3=Supratik|series=Lecture Notes in Computer Science |volume=11853 |doi=10.1007/978-3-030-34058-2_1|arxiv=1908.08987|s2cid=201665955}}

|M. Karki et al.
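
Most of the handwriting collections above reduce to the same supervised set-up: fixed-size grayscale character images (28×28 for MNIST and EMNIST, 32×32 for HASYv2 and the noisy Bangla sets) paired with class labels. The following is a minimal sketch of that pipeline with scikit-learn; the randomly generated arrays are stand-ins for images and labels loaded from one of the datasets, so the shapes and class count are placeholders.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in data with the same shapes as MNIST/EMNIST-style arrays;
# replace with images and labels loaded from an actual dataset.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(1000, 28, 28))
train_labels = rng.integers(0, 10, size=1000)
test_images = rng.integers(0, 256, size=(200, 28, 28))
test_labels = rng.integers(0, 10, size=200)

# Flatten each 28x28 image into a 784-dimensional vector and rescale to [0, 1].
X_train = train_images.reshape(len(train_images), -1) / 255.0
X_test = test_images.reshape(len(test_images), -1) / 255.0

clf = LogisticRegression(max_iter=200)   # simple linear baseline
clf.fit(X_train, train_labels)
print("test accuracy:", accuracy_score(test_labels, clf.predict(X_test)))
</syntaxhighlight>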

Aerial images

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" |Dataset name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

iSAID: Instance Segmentation in Aerial Images Dataset

|

|Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines.

|655,451 (15 classes)

|Images, jpg, json

|Aerial Classification, Object Detection, Instance Segmentation

|2019

|{{Cite web|title=iSAID|url=https://captain-whu.github.io/iSAID/index.html|access-date=2021-11-30|website=captain-whu.github.io}}Zamir, Syed & Arora, Aditya & Gupta, Akshita & Khan, Salman & Sun, Guolei & Khan, Fahad & Zhu, Fan & Shao, Ling & Xia, Gui-Song & Bai, Xiang. (2019). iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. [https://captain-whu.github.io/iSAID/index.html website]

|Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai

Aerial Image Segmentation Dataset

|80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0 m.

|Images manually segmented.

|80

|Images

|Aerial Classification, object detection

|2013

|{{cite journal | last1 = Yuan | first1 = Jiangye | last2 = Gleason | first2 = Shaun S. | last3 = Cheriyadat | first3 = Anil M. | year = 2013 | title = Systematic benchmarking of aerial image segmentation | journal = IEEE Geoscience and Remote Sensing Letters| volume = 10 | issue = 6| pages = 1527–1531 | doi=10.1109/lgrs.2013.2261453| bibcode = 2013IGRSL..10.1527Y | s2cid = 629629 }}Vatsavai, Ranga Raju. "[https://dl.acm.org/citation.cfm?id=2534927 Object based image classification: state of the art and computational challenges]." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013.

|J. Yuan et al.

KIT AIS Data Set

|Multiple labeled training and evaluation datasets of aerial images of crowds.

|Images manually labeled to show paths of individuals through crowds.

|~ 150

|Images with paths

|People tracking, aerial tracking

|2012

|Butenuth, Matthias, et al. "[http://www.hartmann-alberts.de/dirk/pub/proceedings2011e.pdf Integrating pedestrian simulation, tracking and event detection for crowd analysis]." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011.Fradi, Hajer, and Jean-Luc Dugelay. "[http://www.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf Low level crowd analysis using frame-wise normalized feature for people counting]." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012.

|M. Butenuth et al.

Wilt Dataset

|Remote sensing data of diseased trees and other land cover.

|Various features extracted.

|4899

|Images

|Classification, aerial object detection

|2014

|Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep1&type=pdf A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees]." International journal of remote sensing34.20 (2013): 6969–6982.{{Cite journal|url=https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.1062159|doi = 10.1080/2150704X.2015.1062159|title = A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification|year = 2015|last1 = Mohd Pozi|first1 = Muhammad Syafiq|last2 = Sulaiman|first2 = Md Nasir|last3 = Mustapha|first3 = Norwati|last4 = Perumal|first4 = Thinagaran|journal = Remote Sensing Letters|volume = 6|issue = 7|pages = 568–577| bibcode=2015RSL.....6..568M |s2cid = 58788630}}

|B. Johnson

MASATI dataset

|Maritime scenes of optical aerial images from the visible spectrum. It contains color images of dynamic marine environments; each image may contain one or more targets under different weather and illumination conditions.

|Object bounding boxes and labeling.

|7389

|Images

|Classification, aerial object detection

|2018

|Gallego, A.-J.; Pertusa, A.; Gil, P. "[https://www.mdpi.com/2072-4292/10/4/511 Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks]." Remote Sensing. 2018; 10(4):511.Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.

|A.-J. Gallego et al.

Forest Type Mapping Dataset

|Satellite imagery of forests in Japan.

|Image wavelength bands extracted.

|326

|Text

|Classification

|2015

|{{cite journal | last1 = Johnson | first1 = Brian | last2 = Tateishi | first2 = Ryutaro | last3 = Xie | first3 = Zhixiao | year = 2012 | title = Using geographically weighted variables for image classification | journal = Remote Sensing Letters | volume = 3 | issue = 6| pages = 491–499 | doi=10.1080/01431161.2011.629637| bibcode = 2012RSL.....3..491J | s2cid = 122543681 }}Chatterjee, Sankhadeep, et al. "[https://www.researchgate.net/profile/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Classification_A_Hybrid_NN-GA_Model_Based_Approach/links/57493cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-GA-Model-Based-Approach.pdf Forest Type Classification: A Hybrid NN-GA Model Based Approach]." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236.

|B. Johnson

Overhead Imagery Research Data Set

|Annotated overhead imagery. Images with multiple objects.

|Over 30 annotations and over 60 statistics that describe the target within the context of the image.

|1000

|Images, text

|Classification

|2009

|Diegert, Carl. "[https://www.osti.gov/servlets/purl/1278837 A combinatorial method for tracing objects using semantics of their shape]." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.Razakarivony, Sebastien, and Frédéric Jurie. "[https://hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf Small target detection combining foreground and background manifolds]." IAPR International Conference on Machine Vision Applications. 2013.

|F. Tanner et al.

SpaceNet

|SpaceNet is a corpus of commercial satellite imagery and labeled training data.

|GeoTiff and GeoJSON files containing building footprints.

|>17,533

|Images

|Classification, Object Identification

|2017

|{{Cite web|url=http://explore.digitalglobe.com/spacenet|title=SpaceNet|website=explore.digitalglobe.com|access-date=2018-03-13|archive-date=13 March 2018|archive-url=https://web.archive.org/web/20180313092809/http://explore.digitalglobe.com/spacenet|url-status=dead}}{{Cite web|url=https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53|title=Getting Started With SpaceNet Data|last=Etten|first=Adam Van|date=2017-01-05|website=The DownLinQ|access-date=2018-03-13}}{{Cite book|last1=Vakalopoulou|first1=M.|last2=Bus|first2=N.|last3=Karantzalosa|first3=K.|last4=Paragios|first4=N.|title=2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) |chapter=Integrating edge/Boundary priors with classification scores for building detection in very high resolution data |date=July 2017|pages=3309–3312|doi=10.1109/IGARSS.2017.8127705|isbn=978-1-5090-4951-6|s2cid=8297433}}

|DigitalGlobe, Inc.

UC Merced Land Use Dataset

|These images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the US.

|A 21-class land use image dataset intended for research purposes; there are 100 images for each class.

|2,100

|Image chips of 256x256, 30 cm (1 foot) GSD

|Land cover classification

|2010

|{{Cite book|last1=Yang|first1=Yi|last2=Newsam|first2=Shawn|title=Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems |chapter=Bag-of-visual-words and spatial extensions for land-use classification |date=2010|pages=270–279 |location=New York, New York, USA|publisher=ACM Press|doi=10.1145/1869790.1869829|isbn=9781450304283|s2cid=993769}}

|Yi Yang and Shawn Newsam

SAT-4 Airborne Dataset

|Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.

|SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class consisting of all land cover other than the previous three.

|500,000

|Images

|Classification

|2015

|{{Cite book|last1=Basu|first1=Saikat|last2=Ganguly|first2=Sangram|last3=Mukhopadhyay|first3=Supratik|last4=DiBiano|first4=Robert|last5=Karki|first5=Manohar|last6=Nemani|first6=Ramakrishna|title=Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems |chapter=DeepSat: A learning framework for satellite imagery |date=2015-11-03|publisher=ACM|pages=1–10|doi=10.1145/2820783.2820816|isbn=9781450339674|s2cid=4387134}}{{Cite journal|last1=Liu|first1=Qun|last2=Basu|first2=Saikat|last3=Ganguly|first3=Sangram|last4=Mukhopadhyay|first4=Supratik|last5=DiBiano|first5=Robert|last6=Karki|first6=Manohar|last7=Nemani|first7=Ramakrishna|date=2019-11-21|title=DeepSat V2: feature augmented convolutional neural nets for satellite image classification|journal=Remote Sensing Letters|volume=11|issue=2|pages=156–165|doi=10.1080/2150704x.2019.1693071|arxiv=1911.07747|s2cid=208138097|issn=2150-704X}}

|S. Basu et al.

SAT-6 Airborne Dataset

|Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.

|SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies.

|405,000

|Images

|Classification

|2015

|

|S. Basu et al.
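
Chip-based aerial collections such as the UC Merced Land Use Dataset and the SAT-4/SAT-6 airborne sets are distributed as fixed-size image tiles with one label per tile, which maps naturally onto a folder-per-class image loader. The following is a minimal sketch using torchvision, assuming the chips have been unpacked into one sub-directory per class under a root folder; the path, the 224×224 resize, and the batch size are placeholders, and some sets would first need conversion into per-class folders.

<syntaxhighlight lang="python">
import torch
from torchvision import datasets, transforms

# Resize chips and convert to tensors; the 224x224 size is a generic choice.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumes a layout like chips/<class_name>/<image files>; the path is a placeholder.
dataset = datasets.ImageFolder("chips", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

print("classes:", dataset.classes)
images, labels = next(iter(loader))
print("batch shape:", tuple(images.shape))   # e.g. (32, 3, 224, 224)
</syntaxhighlight>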

Underwater images

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" |Dataset name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

SUIM Dataset

|The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants.

|Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and background (waterbody).

|1,635

|Images

|Segmentation

|2020

|Md Jahidul Islam, et al. "[https://ieeexplore.ieee.org/abstract/document/9340821 Semantic Segmentation of Underwater Imagery: Dataset and Benchmark]." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.

|Md Jahidul Islam et al.

LIACI Dataset

|Images have been collected during underwater ship inspections and annotated by human domain experts.

|Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel and ship hull.

|1,893

|Images

|Segmentation

|2022

|Waszak et al. "[https://ieeexplore.ieee.org/document/9998080 Semantic Segmentation in Underwater Ship Inspections: Benchmark and Data Set]." IEEE Journal of Oceanic Engineering. IEEE, 2022.

|Waszak et al.
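
Pixel-annotated sets such as SUIM and LIACI pair each image with a label mask of the same resolution. The following is a minimal sketch for tallying per-class pixel frequencies, assuming single-channel mask images whose pixel values are integer class indices; if a dataset instead color-codes its masks, a color-to-index mapping step would be needed first. The masks/ path is a placeholder.

<syntaxhighlight lang="python">
import glob
import numpy as np
from PIL import Image

# Placeholder path; assumes one single-channel mask image per frame,
# with pixel value = integer class index.
counts = {}
for path in glob.glob("masks/*.png"):
    mask = np.array(Image.open(path))
    classes, freq = np.unique(mask, return_counts=True)
    for c, n in zip(classes, freq):
        counts[int(c)] = counts.get(int(c), 0) + int(n)

total = sum(counts.values())
for c in sorted(counts):
    print(f"class {c}: {100.0 * counts[c] / total:.2f}% of pixels")
</syntaxhighlight>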

Other images

class="wikitable sortable" style="width: 100%"

! scope="col" style="width: 15%;" |Dataset name

! scope="col" style="width: 18%;" | Brief description

! scope="col" style="width: 18%;" | Preprocessing

! scope="col" style="width: 6%;" | Instances

! scope="col" style="width: 7%;" | Format

! scope="col" style="width: 7%;" | Default Task

! scope="col" style="width: 6%;" | Created (updated)

! scope="col" style="width: 6%;" | Reference

! scope="col" style="width: 11%;" | Creator

Kodak Lossless True Color Image Suite

|RGB images for testing image compression.

|None

|24

|Image

|Image compression

|1999

|{{Cite web |title=True Color Kodak Images |url=https://r0k.us/graphics/kodak/ |access-date=2025-02-27 |website=r0k.us}}

|Kodak

NRC-GAMMA

|A benchmark dataset of gas meter images

|None

|28,883

|Image, Label

|Classification

|2021

|{{cite arXiv|last1=Ebadi|first1=Ashkan|last2=Paul|first2=Patrick|last3=Auer|first3=Sofia|last4=Tremblay|first4=Stéphane|date=2021-11-12|title=NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset|class=cs.CV|eprint=2111.06827}}{{Cite journal|last=Canada|first=Government of Canada National Research Council|title=The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository|url=https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4|access-date=2021-12-02|website=nrc-digital-repository.canada.ca|year=2021|doi=10.4224/3c8s-z290}}

|A. Ebadi, P. Paul, S. Auer, & S. Tremblay

The SUPATLANTIQUE dataset

|Images of scanned official and Wikipedia documents

|None

|4908

|TIFF/pdf

|Source device identification, forgery detection, classification

|2020

|{{Cite book|last1=Rabah|first1=Chaima Ben|last2=Coatrieux|first2=Gouenou|last3=Abdelfattah|first3=Riadh|title=2020 IEEE International Conference on Image Processing (ICIP) |chapter=The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes |date=October 2020|chapter-url=http://dx.doi.org/10.1109/icip40778.2020.9190665|pages=2096–2100|publisher=IEEE|doi=10.1109/icip40778.2020.9190665|isbn=978-1-7281-6395-6|s2cid=224881147}}

|C. Ben Rabah et al.

Density functional theory quantum simulations of graphene

|Labelled images of raw input to a simulation of graphene

|Raw data (in HDF5 format) and output labels from density functional theory quantum simulation

|60,744 test and 501,473 training files

|Labeled images

|Regression

|2019

|{{cite web | doi=10.4224/c8sc04578j.data| title=Big graphene dataset| date=2018-05-16| last1=Mills| first1=Kyle| last2=Tamblyn| first2=Isaac| publisher=National Research Council of Canada}}

|K. Mills & I. Tamblyn

Quantum simulations of an electron in a two dimensional potential well

|Labelled images of raw input to a simulation of 2D quantum mechanics

|Raw data (in HDF5 format) and output labels from quantum simulation

|1.3 million images

|Labeled images

|Regression

|2017

|{{Cite book | doi=10.4224/PhysRevA.96.042113.data| title=Quantum simulations of an electron in a two dimensional potential well| date=2018-05-16| last1=Mills| first1=Kyle| last2=Spanner| first2=Michael| last3=Tamblyn| first3=Isaac| chapter=Quantum simulation| publisher=National Research Council of Canada}}

|K. Mills, M.A. Spanner, & I. Tamblyn

MPII Cooking Activities Dataset

|Videos and images of various cooking activities.

|Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling.

|881,755 frames

|Labeled video, images, text

|Classification

|2012

|{{cite conference | last1=Rohrbach | first1=M. | last2=Amin | first2=S. | last3=Andriluka | first3=M. | last4=Schiele | first4=B. | title=2012 IEEE Conference on Computer Vision and Pattern Recognition | chapter=A database for fine grained activity detection of cooking activities | publisher=IEEE | year=2012 | pages=1194–1201 | isbn=978-1-4673-1228-8 | doi=10.1109/cvpr.2012.6247801 }}Kuehne, Hilde, Ali Arslan, and Thomas Serre. "[https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf The language of actions: Recovering the syntax and semantics of goal-directed human activities]."Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

|M. Rohrbach et al.

FAMOS Dataset

|5,000 unique microstructures, all samples have been acquired 3 times with two different cameras.

|Original PNG files, sorted per camera and then per acquisition. MATLAB data files with one 16384 × 5000 matrix per camera per acquisition.

|30,000

|Images and .mat files

|Authentication

|2012

|Sviatoslav, Voloshynovskiy, et al. "[http://vision.unige.ch/publications/postscript/2012/2012.WIFS.database.pdf Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS).]"Proc. Proceedings of IEEE International Workshop on Information Forensics and Security. 2012.

|S. Voloshynovskiy, et al.

PharmaPack Dataset

|1,000 unique classes with 54 images per class.

|Class labeling, many local descriptors such as SIFT and AKAZE, and local feature aggregators such as Fisher Vector (FV).

|54,000

|Images and .mat files

|Fine-grain classification

|2017

|Olga, Taran and Shideh, Rezaeifar, et al. "[https://archive-ouverte.unige.ch/unige:97444/ATTACHMENT01 PharmaPack: mobile fine-grained recognition of pharma packages]."Proc. European Signal Processing Conference (EUSIPCO). 2017.

|O. Taran and S. Rezaeifar, et al.

Stanford Dogs Dataset

|Images of 120 breeds of dogs from around the world.

|Train/test splits and ImageNet annotations provided.

|20,580

|Images, text

|Fine-grain classification

|2011

|Khosla, Aditya, et al. "[https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf Novel dataset for fine-grained image categorization: Stanford dogs]."Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.Parkhi, Omkar M., et al. "[http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf Cats and dogs]."Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

|A. Khosla et al.

StanfordExtra Dataset

|2D keypoints and segmentations for the Stanford Dogs Dataset.

|2D keypoints and segmentations provided.

|12,035

|Labelled images

|3D reconstruction/pose estimation

|2020

|{{cite book | arxiv=2007.11110 | doi=10.1007/978-3-030-58621-8 | title=Computer Vision – ECCV 2020 | series=Lecture Notes in Computer Science | year=2020 | volume=12356 | isbn=978-3-030-58620-1 | last1=Biggs | first1=Benjamin | last2=Boyne | first2=Oliver | last3=Charles | first3=James | last4=Fitzgibbon | first4=Andrew | last5=Cipolla | first5=Roberto | s2cid=227173931 }}

|B. Biggs et al.

The Oxford-IIIT Pet Dataset

|37 categories of pets with roughly 200 images of each.

|Breed labeled, tight bounding box, foreground-background segmentation.

|~ 7,400

|Images, text

|Classification, object detection

|2012

|Razavian, Ali, et al. "[https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf CNN features off-the-shelf: an astounding baseline for recognition]." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.

|O. Parkhi et al.

Corel Image Features Data Set

|Database of images with features extracted.

|Many features, including color histograms, co-occurrence texture, and color moments.

|68,040

|Text

|Classification, object detection

|1999

|{{cite journal | last1 = Ortega | first1 = Michael | display-authors = et al | year = 1998 | title = Supporting ranked boolean similarity queries in MARS | journal = IEEE Transactions on Knowledge and Data Engineering| volume = 10 | issue = 6| pages = 905–925 | doi=10.1109/69.738357| citeseerx = 10.1.1.36.6079 }}He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "[ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf Multiscale conditional random fields for image labeling]{{dead link|date=May 2025|bot=medic}}{{cbignore|bot=medic}}." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004.

|M. Ortega-Bindenberger et al.

Online Video Characteristics and Transcoding Time Dataset

|Transcoding times for a variety of videos, together with video properties.

|Video features given.

|168,286

|Text

|Regression

|2015

|Deneke, Tewodros, et al. "[https://ieeexplore.ieee.org/abstract/document/6890256/ Video transcoding time prediction for proactive load balancing]." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.

|T. Deneke et al.

Microsoft Sequential Image Narrative Dataset (SIND)

|Dataset for sequential vision-to-language tasks.

|Descriptive captions and storytelling annotations given for each photo; photos are arranged in sequences.

|81,743

|Images, text

|Visual storytelling

|2016

|{{cite arXiv |author=Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell |eprint=1604.03968 |title=Visual Storytelling |class=cs.CL |date=13 April 2016 }}

|Microsoft Research

Caltech-UCSD Birds-200-2011 Dataset

|Large dataset of images of birds.

|Part locations, bounding boxes, and 312 binary attributes given.

|11,788

|Images, text

|Classification

|2011

|Wah, Catherine, et al. "[https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf The caltech-ucsd birds-200-2011 dataset]." (2011). Duan, Kun, et al. "[http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf Discovering localized attributes for fine-grained recognition]." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

|C. Wah et al.

YouTube-8M

|Large and diverse labeled video dataset

|YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities

|8 million

|Video, text

|Video classification

|2016

|{{cite web|title=YouTube-8M Dataset|url=https://research.google.com/youtube8m/|website=research.google.com|access-date=1 October 2016}}{{cite arXiv |author1=Abu-El-Haija, Sami |author2=Kothari, Nisarg |author3=Lee, Joonseok |author4=Natsev, Paul |author5=Toderici, George |author6=Varadarajan, Balakrishnan |author7=Vijayanarasimhan, Sudheendra |eprint=1609.08675 |title=YouTube-8M: A Large-Scale Video Classification Benchmark |class=cs.CV |date=27 September 2016 }}

|S. Abu-El-Haija et al.

YFCC100M

|Large and diverse labeled image and video dataset

|Flickr videos and images with associated descriptions, titles, tags, and other metadata (such as EXIF and geotags)

|100 million

|Video, Image, Text

|Video and image classification

|2016

|{{cite web|title=YFCC100M Dataset|url=http://mmcommons.org|website=mmcommons.org|publisher=Yahoo-ICSI-LLNL|access-date=1 June 2017}}{{cite journal |author1=Bart Thomee |author2=David A Shamma |author3=Gerald Friedland |author4=Benjamin Elizalde |author5=Karl Ni |author6=Douglas Poland |author7=Damian Borth |author8=Li-Jia Li |arxiv=1503.01817 |title=YFCC100M: The new data in multimedia research |date=25 April 2016 |doi=10.1145/2812802 |volume=59 |issue=2 |journal=Communications of the ACM |pages=64–73 |s2cid=207230134 }}

|B. Thomee et al.

Discrete LIRIS-ACCEDE

|Short videos annotated for valence and arousal.

|Valence and arousal labels.

|9,800

|Video

|Video emotion elicitation detection

|2015

|Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "[https://hal.archives-ouvertes.fr/hal-01375518/document LIRIS-ACCEDE: A Video Database for Affective Content Analysis]," in IEEE Transactions on Affective Computing, 2015.

|Y. Baveye et al.

Continuous LIRIS-ACCEDE

|Long videos annotated for valence and arousal, with galvanic skin response also recorded.

|Valence and arousal labels.

|30

|Video

|Video emotion elicitation detection

|2015

|Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "[https://hal.archives-ouvertes.fr/hal-01193144/document Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos]," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.

|Y. Baveye et al.

MediaEval LIRIS-ACCEDE

|Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films.

|Violence, valence and arousal labels.

|10,900

|Video

|Video emotion elicitation detection

|2015

|M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "[https://www.researchgate.net/profile/Hanli_Wang2/publication/309704559_The_MediaEval_2015_Affective_Impact_of_Movies_Task/links/581dada308ae12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-Task.pdf The mediaeval 2015 affective impact of movies task]," in MediaEval 2015 Workshop, 2015.

|Y. Baveye et al.

Leeds Sports Pose

|Articulated human pose annotations in 2,000 natural sports images from Flickr.

|Rough crop around a single person of interest, with 14 joint labels

|2,000

|Images plus .mat file labels

|Human pose estimation

|2010

|S. Johnson and M. Everingham, "[http://sam.johnson.io/research/publications/johnson10bmvc.pdf Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation] {{Webarchive|url=https://web.archive.org/web/20211104045320/http://sam.johnson.io/research/publications/johnson10bmvc.pdf |date=2021-11-04 }}", in Proceedings of the 21st British Machine Vision Conference (BMVC2010)

|S. Johnson and M. Everingham

Leeds Sports Pose Extended Training

|Articulated human pose annotations in 10,000 natural sports images from Flickr.

|14 joint labels via crowdsourcing

|10,000

|Images plus .mat file labels

|Human pose estimation

|2011

|S. Johnson and M. Everingham, "[http://sam.johnson.io/research/publications/johnson11cvpr.pdf Learning Effective Human Pose Estimation from Inaccurate Annotation] {{Webarchive|url=https://web.archive.org/web/20211104114144/http://sam.johnson.io/research/publications/johnson11cvpr.pdf |date=2021-11-04 }}", In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011)

|S. Johnson and M. Everingham

MCQ Dataset

|Six real multiple-choice exams (735 answer sheets and 33,540 answer boxes) for evaluating computer vision techniques and systems developed for multiple-choice test assessment.

|None

|735 answer sheets and 33,540 answer boxes

|Images and .mat file labels

|Development of multiple-choice test assessment systems

|2017

|{{cite arXiv|last1=Afifi|first1=Mahmoud|last2=Hussain|first2=Khaled F.|date=2017-11-02|title=The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques|eprint=1711.00972|class=cs.CV}}{{Cite web|url=https://sites.google.com/view/mcq-dataset/mcqe-dataset|title=MCQ Dataset|website=sites.google.com|language=en-US|access-date=2017-11-18}}

|Afifi, M. et al.

Surveillance Videos

|Real surveillance videos covering a long surveillance period (7 days, 24 hours each).

|None

|19 surveillance videos (7 days, 24 hours each).

|Videos

|Data compression

|2016

|{{Cite book|last1=Taj-Eddin|first1=I. A. T. F.|last2=Afifi|first2=M.|last3=Korashy|first3=M.|last4=Hamdy|first4=D.|last5=Nasser|first5=M.|last6=Derbaz|first6=S.|title=2016 Sixth International Conference on Digital Information and Communication Technology and its Applications (DICTAP) |chapter=A new compression technique for surveillance videos: Evaluation using new dataset |date=July 2016|pages=159–164|doi=10.1109/DICTAP.2016.7544020|isbn=978-1-4673-9609-7|s2cid=8698850}}

|Taj-Eddin, I. A. T. F. et al.

LILA BC

|Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science.

|None

|~10M images

|Images

|Classification

|2019

|{{cite journal|last1=Tabak|first1=Michael A.|last2=Norouzzadeh|first2=Mohammad S.|last3=Wolfson|first3=David W.|last4=Sweeney|first4=Steven J.|last5=Vercauteren|first5=Kurt C.|last6=Snow|first6=Nathan P.|last7=Halseth|first7=Joseph M.|last8=Di Salvo|first8=Paul A.|last9=Lewis|first9=Jesse S.|last10=White|first10=Michael D.|last11=Teton|first11=Ben|last12=Beasley|first12=James C.|last13=Schlichting|first13=Peter E.|last14=Boughton|first14=Raoul K.|last15=Wight|first15=Bethany|last16=Newkirk|first16=Eric S.|last17=Ivan|first17=Jacob S.|last18=Odell|first18=Eric A.|last19=Brook|first19=Ryan K.|last20=Lukacs|first20=Paul M.|last21=Moeller|first21=Anna K.|last22=Mandeville|first22=Elizabeth G.|last23=Clune|first23=Jeff|last24=Miller|first24=Ryan S.|last25=Photopoulou|first25=Theoni|title=Machine learning to classify animal species in camera trap images: Applications in ecology|journal=Methods in Ecology and Evolution|volume=10|issue=4|pages=585–590|year=2018|issn=2041-210X|doi=10.1111/2041-210X.13120|doi-access=free}}

|LILA working group

Can We See Photosynthesis?

|32 videos of eight live and eight dead leaves, recorded under both DC and AC lighting conditions.

|None

|32 videos

|Videos

|Liveness detection of plants

|2017

|{{Cite journal|last1=Taj-Eddin|first1=Islam A. T. F.|last2=Afifi|first2=Mahmoud|last3=Korashy|first3=Mostafa|last4=Ahmed|first4=Ali H.|last5=Ng|first5=Yoke Cheng|last6=Hernandez|first6=Evelyng|last7=Abdel-Latif|first7=Salma M.|date=November 2017|title=Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification|journal=Journal of Electronic Imaging|volume=26|issue=6|pages=060501|doi=10.1117/1.jei.26.6.060501|issn=1017-9909|arxiv=1706.03867|bibcode=2017JEI....26f0501T|s2cid=12367169}}

|Taj-Eddin, I. A. T. F. et al.

Mathematical Mathematics Memes

|Collection of 10,000 memes on mathematics.

|None

|~10,000

|Images

|Visual storytelling, object detection.

|2021

|{{Cite web |title=Mathematical Mathematics Memes |url=https://www.kaggle.com/abdelghanibelgaid/mathematical-mathematics-memes}}

|Mathematical Mathematics Memes

Flickr-Faces-HQ Dataset

|Collection of images, each containing a face, crawled from Flickr

|Pruned with "various automatic filters", cropped and aligned to faces, and had images of statues, paintings, or photos of photos removed via crowdsourcing

|70,000

|Images

|Face generation

|2019

|{{Cite book |last1=Karras |first1=Tero |last2=Laine |first2=Samuli |last3=Aila |first3=Timo |title=2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |chapter=A Style-Based Generator Architecture for Generative Adversarial Networks |date=June 2019 |chapter-url=http://dx.doi.org/10.1109/cvpr.2019.00453 |pages=4396–4405 |publisher=IEEE |doi=10.1109/cvpr.2019.00453|arxiv=1812.04948 |isbn=978-1-7281-3293-8 |s2cid=54482423 }}

|Karras et al.

Fruits-360 dataset

|Collection of images covering 170 classes of fruits, vegetables, nuts, and seeds.

|100×100 pixels, white background.

|115,499

|Images (JPG)

|Classification

|2017–2025

|{{cite web|last1 = Oltean| first1 = Mihai | year = 2017 | title = Fruits-360 dataset| website = GitHub | url = https://www.github.com/fruits-360}}

|Mihai Oltean

References