List of datasets in computer vision and image processing
{{short description|none}}
{{machine learning bar|Related articles}}
This is a list of datasets for computer vision and image processing research, forming part of the list of datasets for machine-learning research. These datasets consist primarily of images or videos and are used for tasks such as object detection, facial recognition, and multi-label classification.
== Object detection and recognition ==
=== 3D Objects ===
See Calli et al. (2015){{Cite journal |last1=Calli |first1=Berk |last2=Walsman |first2=Aaron |last3=Singh |first3=Arjun |last4=Srinivasa |first4=Siddhartha |last5=Abbeel |first5=Pieter |last6=Dollar |first6=Aaron M. |date=September 2015 |title=Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set |url=https://ieeexplore.ieee.org/document/7254318 |journal=IEEE Robotics & Automation Magazine |volume=22 |issue=3 |pages=36–52 |doi=10.1109/MRA.2015.2448951 |issn=1070-9932|arxiv=1502.03143 }} for a review of 33 datasets of 3D objects as of 2015, and Downs et al. (2022){{Cite book |last1=Downs |first1=Laura |last2=Francis |first2=Anthony |last3=Koenig |first3=Nate |last4=Kinman |first4=Brandon |last5=Hickman |first5=Ryan |last6=Reymann |first6=Krista |last7=McHugh |first7=Thomas B. |last8=Vanhoucke |first8=Vincent |chapter=Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items |date=2022-05-23 |title=2022 International Conference on Robotics and Automation (ICRA) |chapter-url=https://ieeexplore.ieee.org/document/9811809 |publisher=IEEE |pages=2553–2560 |doi=10.1109/ICRA46639.2022.9811809 |isbn=978-1-7281-9681-7|arxiv=2204.11918 }} for a review of further datasets as of 2022.
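Scanned-object datasets of this kind are typically distributed as textured 3D meshes. The following is an illustrative sketch, not taken from the cited datasets, of inspecting one such mesh in Python; it assumes the trimesh library is installed, and the file name model.obj stands in for a hypothetical local download.

<syntaxhighlight lang="python">
# Illustrative sketch (assumes the "trimesh" library; "model.obj" is a
# hypothetical local copy of a mesh from a scanned-object dataset).
import trimesh

mesh = trimesh.load("model.obj", force="mesh")  # load the file as a single mesh
print("vertices:", len(mesh.vertices))          # number of mesh vertices
print("faces:", len(mesh.faces))                # number of triangular faces
print("watertight:", mesh.is_watertight)        # whether the surface is closed
print("extents:", mesh.extents)                 # axis-aligned bounding-box size
</syntaxhighlight>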
=== Object detection and recognition for autonomous vehicles ===
{{Self-driving car}}
== Facial recognition ==
In computer vision, face images have been used extensively to develop systems for facial recognition, face detection, and many other projects that use images of faces. See {{Cite web |title=Face Recognition Homepage - Databases |url=https://www.face-rec.org/databases/ |access-date=2025-04-26 |website=www.face-rec.org}} for a curated list of face datasets, focused on the pre-2005 period.
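Datasets of this kind are used to train and evaluate face detectors and recognizers. As an illustrative sketch of the detection task (assuming the opencv-python package; the input file photo.jpg is hypothetical), OpenCV's bundled Haar-cascade frontal-face model can be applied as follows:

<syntaxhighlight lang="python">
# Illustrative sketch of face detection (assumes opencv-python;
# "photo.jpg" is a hypothetical input image).
import cv2

img = cv2.imread("photo.jpg")                        # BGR image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # detector works on grayscale
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                           # one box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)                        # save the annotated copy
</syntaxhighlight>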
== Action recognition ==
class="wikitable sortable" style="width: 100%"
! scope="col" style="width: 15%;" |Dataset name ! scope="col" style="width: 18%;" | Brief description ! scope="col" style="width: 18%;" | Preprocessing ! scope="col" style="width: 6%;" | Instances ! scope="col" style="width: 7%;" | Format ! scope="col" style="width: 7%;" | Default Task ! scope="col" style="width: 6%;" | Created (updated) ! scope="col" style="width: 6%;" | Reference ! scope="col" style="width: 11%;" | Creator |
AVA-Kinetics Localized Human Actions Video
|Annotated 80 action classes from keyframes from videos from Kinetics-700. | |1.6 million annotations. 238,906 video clips, 624,430 keyframes. |Annotations, videos. |Action prediction |2020 |{{Cite web |title=AVA: A Video Dataset of Atomic Visual Action |url=https://research.google.com/ava/ |access-date=2024-10-18 |website=research.google.com}}{{cite arXiv |last1=Li |first1=Ang |title=The AVA-Kinetics Localized Human Actions Video Dataset |date=2020-05-20 |eprint=2005.00214 |last2=Thotakuri |first2=Meghana |last3=Ross |first3=David A. |last4=Carreira |first4=João |last5=Vostrikov |first5=Alexander |last6=Zisserman |first6=Andrew|class=cs.CV }} |Li et al from Perception Team of Google AI. |
TV Human Interaction Dataset
|Videos from 20 different TV shows for prediction social actions: handshake, high five, hug, kiss and none. |None. |6,766 video clips |video clips |Action prediction |2013 |Patron-Perez, A. et al. |
Berkeley Multimodal Human Action Database (MHAD)
|Recordings of a single person performing 12 actions |MoCap pre-processing |660 action samples |8 PhaseSpace Motion Capture, 2 Stereo Cameras, 4 Quad Cameras, 6 accelerometers, 4 microphones |Action classification |2013 |Ofli, F. et al. |
THUMOS Dataset
|Large video dataset for action classification. |Actions classified and labeled. |45M frames of video |Video, images, text |Classification, action detection |2013 |Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.Simonyan, Karen, and Andrew Zisserman. "[https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf Two-stream convolutional networks for action recognition in videos]." Advances in Neural Information Processing Systems. 2014. |Y. Jiang et al. |
MEXAction2
|Video dataset for action localization and spotting |Actions classified and labeled. |1000 |Video |Action detection |2014 |Stoian et al. |
== Handwriting and character recognition ==
== Aerial images ==
== Underwater images ==
class="wikitable sortable" style="width: 100%"
! scope="col" style="width: 15%;" |Dataset name ! scope="col" style="width: 18%;" | Brief description ! scope="col" style="width: 18%;" | Preprocessing ! scope="col" style="width: 6%;" | Instances ! scope="col" style="width: 7%;" | Format ! scope="col" style="width: 7%;" | Default Task ! scope="col" style="width: 6%;" | Created (updated) ! scope="col" style="width: 6%;" | Reference ! scope="col" style="width: 11%;" | Creator |
SUIM Dataset
|The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. |Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. |1,635 |Images |Segmentation |2020 |Md Jahidul Islam et al. |
LIACI Dataset
|Images have been collected during underwater ship inspections and annotated by human domain experts. |Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel and ship hull. |1,893 |Images |Segmentation |2022 |Waszak et al. |
== Other images ==
class="wikitable sortable" style="width: 100%"
! scope="col" style="width: 15%;" |Dataset name ! scope="col" style="width: 18%;" | Brief description ! scope="col" style="width: 18%;" | Preprocessing ! scope="col" style="width: 6%;" | Instances ! scope="col" style="width: 7%;" | Format ! scope="col" style="width: 7%;" | Default Task ! scope="col" style="width: 6%;" | Created (updated) ! scope="col" style="width: 6%;" | Reference ! scope="col" style="width: 11%;" | Creator |
Kodak Lossless True Color Image Suite
|RGB images for testing image compression. |None |24 |Image |Image compression |1999 |
NRC-GAMMA
|A novel benchmark gas meter image dataset |None |28,883 |Image, Label |Classification |2021 |{{cite arXiv|last1=Ebadi|first1=Ashkan|last2=Paul|first2=Patrick|last3=Auer|first3=Sofia|last4=Tremblay|first4=Stéphane|date=2021-11-12|title=NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset|class=cs.CV|eprint=2111.06827}}{{Cite journal|last=Canada|first=Government of Canada National Research Council|title=The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository|url=https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4|access-date=2021-12-02|website=nrc-digital-repository.canada.ca|year=2021|doi=10.4224/3c8s-z290}} |A. Ebadi, P. Paul, S. Auer, & S. Tremblay |
The SUPATLANTIQUE dataset
|Images of scanned official and Wikipedia documents |None |4908 |TIFF/pdf |Source device identification, forgery detection, Classification,.. |2020 |C. Ben Rabah et al. |
Density functional theory quantum simulations of graphene
|Labelled images of raw input to a simulation of graphene |Raw data (in HDF5 format) and output labels from density functional theory quantum simulation | 60744 test and 501473 training files |Labeled images |Regression |2019 |K. Mills & I. Tamblyn |
Quantum simulations of an electron in a two dimensional potential well
|Labelled images of raw input to a simulation of 2d Quantum mechanics |Raw data (in HDF5 format) and output labels from quantum simulation |1.3 million images |Labeled images |Regression |2017 |K. Mills, M.A. Spanner, & I. Tamblyn |
MPII Cooking Activities Dataset
|Videos and images of various cooking activities. |Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling. |881,755 frames |Labeled video, images, text |Classification |2012 |{{cite conference | last1=Rohrbach | first1=M. | last2=Amin | first2=S. | last3=Andriluka | first3=M. | last4=Schiele | first4=B. | title=2012 IEEE Conference on Computer Vision and Pattern Recognition | chapter=A database for fine grained activity detection of cooking activities | publisher=IEEE | year=2012 | pages=1194–1201 | isbn=978-1-4673-1228-8 | doi=10.1109/cvpr.2012.6247801 }}Kuehne, Hilde, Ali Arslan, and Thomas Serre. "[https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf The language of actions: Recovering the syntax and semantics of goal-directed human activities]."Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. |M. Rohrbach et al. |
FAMOS Dataset
|5,000 unique microstructures, all samples have been acquired 3 times with two different cameras. |Original PNG files, sorted per camera and then per acquisition. MATLAB datafiles with one 16384 times 5000 matrix per camera per acquisition. |30,000 |Images and .mat files |Authentication |2012 |S. Voloshynovskiy, et al. |
PharmaPack Dataset
|1,000 unique classes with 54 images per class. |Class labeling, many local descriptors, like SIFT and aKaZE, and local feature agreators, like Fisher Vector (FV). |54,000 |Images and .mat files |Fine-grain classification |2017 |O. Taran and S. Rezaeifar, et al. |
Stanford Dogs Dataset
|Images of 120 breeds of dogs from around the world. |Train/test splits and ImageNet annotations provided. |20,580 |Images, text |Fine-grain classification |2011 |Khosla, Aditya, et al. "[https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf Novel dataset for fine-grained image categorization: Stanford dogs]."Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.Parkhi, Omkar M., et al. "[http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf Cats and dogs]."Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. |A. Khosla et al. |
StanfordExtra Dataset
|2D keypoints and segmentations for the Stanford Dogs Dataset. |2D keypoints and segmentations provided. |12,035 |Labelled images |3D reconstruction/pose estimation |2020 |B. Biggs et al. |
The Oxford-IIIT Pet Dataset
|37 categories of pets with roughly 200 images of each. |Breed labeled, tight bounding box, foreground-background segmentation. |~ 7,400 |Images, text |Classification, object detection |2012 |Razavian, Ali, et al. "[https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf CNN features off-the-shelf: an astounding baseline for recognition]." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014. |O. Parkhi et al. |
Corel Image Features Data Set
|Database of images with features extracted. |Many features including color histogram, co-occurrence texture, and colormoments, |68,040 |Text |Classification, object detection |1999 |{{cite journal | last1 = Ortega | first1 = Michael | display-authors = et al | year = 1998 | title = Supporting ranked boolean similarity queries in MARS | journal = IEEE Transactions on Knowledge and Data Engineering| volume = 10 | issue = 6| pages = 905–925 | doi=10.1109/69.738357| citeseerx = 10.1.1.36.6079 }}He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "[ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf Multiscale conditional random fields for image labeling]{{dead link|date=May 2025|bot=medic}}{{cbignore|bot=medic}}." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004. |M. Ortega-Bindenberger et al. |
Online Video Characteristics and Transcoding Time Dataset.
|Transcoding times for various different videos and video properties. |Video features given. |168,286 |Text |Regression |2015 |T. Deneke et al. |
Microsoft Sequential Image Narrative Dataset (SIND)
|Dataset for sequential vision-to-language |Descriptive caption and storytelling given for each photo, and photos are arranged in sequences |81,743 |Images, text |Visual storytelling |2016 |
Caltech-UCSD Birds-200-2011 Dataset
|Large dataset of images of birds. |Part locations for birds, bounding boxes, 312 binary attributes given |11,788 |Images, text |Classification |2011 |Wah, Catherine, et al. "[https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf The caltech-ucsd birds-200-2011 dataset]." (2011).Duan, Kun, et al. "[http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf Discovering localized attributes for fine-grained recognition]." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. |C. Wah et al. |
YouTube-8M
|Large and diverse labeled video dataset |YouTube video IDs and associated labels from a diverse vocabulary of 4800 visual entities |8 million |Video, text |Video classification |2016 |{{cite web|title=YouTube-8M Dataset|url=https://research.google.com/youtube8m/|website=research.google.com|access-date=1 October 2016}}{{cite arXiv |author1=Abu-El-Haija, Sami |author2=Kothari, Nisarg |author3=Lee, Joonseok |author4=Natsev, Paul |author5=Toderici, George |author6=Varadarajan, Balakrishnan |author7=Vijayanarasimhan, Sudheendra |eprint=1609.08675 |title=YouTube-8M: A Large-Scale Video Classification Benchmark |class=cs.CV |date=27 September 2016 }} |S. Abu-El-Haija et al. |
YFCC100M
|Large and diverse labeled image and video dataset |Flickr Videos and Images and associated description, titles, tags, and other metadata (such as EXIF and geotags) |100 million |Video, Image, Text |Video and Image classification |2016 |{{cite web|title=YFCC100M Dataset|url=http://mmcommons.org|website=mmcommons.org|publisher=Yahoo-ICSI-LLNL|access-date=1 June 2017}}{{cite journal |author1=Bart Thomee |author2=David A Shamma |author3=Gerald Friedland |author4=Benjamin Elizalde |author5=Karl Ni |author6=Douglas Poland |author7=Damian Borth |author8=Li-Jia Li |arxiv=1503.01817 |title=Yfcc100m: The new data in multimedia research |date=25 April 2016 |doi=10.1145/2812802 |volume=59 |issue=2 |journal=Communications of the ACM |pages=64–73 |s2cid=207230134 }} |B. Thomee et al. |
Discrete LIRIS-ACCEDE
|Short videos annotated for valence and arousal. |Valence and arousal labels. |9800 |Video |Video emotion elicitation detection |2015 |Y. Baveye et al. |
Continuous LIRIS-ACCEDE
|Long videos annotated for valence and arousal while also collecting Galvanic Skin Response. |Valence and arousal labels. |30 |Video |Video emotion elicitation detection |2015 |Y. Baveye et al. |
MediaEval LIRIS-ACCEDE
|Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films. |Violence, valence and arousal labels. |10900 |Video |Video emotion elicitation detection |2015 |Y. Baveye et al. |
Leeds Sports Pose
|Articulated human pose annotations in 2000 natural sports images from Flickr. |Rough crop around single person of interest with 14 joint labels |2000 |Images plus .mat file labels |Human pose estimation |2010 |S. Johnson and M. Everingham |
Leeds Sports Pose Extended Training
|Articulated human pose annotations in 10,000 natural sports images from Flickr. |14 joint labels via crowdsourcing |10000 |Images plus .mat file labels |Human pose estimation |2011 |S. Johnson and M. Everingham |
MCQ Dataset
|6 different real multiple choice-based exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple choice test assessment systems. |None |735 answer sheets and 33,540 answer boxes |Images and .mat file labels |Development of multiple choice test assessment systems |2017 |{{cite arXiv|last1=Afifi|first1=Mahmoud|last2=Hussain|first2=Khaled F.|date=2017-11-02|title=The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques|eprint=1711.00972|class=cs.CV}}{{Cite web|url=https://sites.google.com/view/mcq-dataset/mcqe-dataset|title=MCQ Dataset|website=sites.google.com|language=en-US|access-date=2017-11-18}} |Afifi, M. et al. |
Surveillance Videos
|Real surveillance videos cover a large surveillance time (7 days with 24 hours each). |None |19 surveillance videos (7 days with 24 hours each). |Videos |Data compression |2016 |Taj-Eddin, I. A. T. F. et al. |
LILA BC
|Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. |None |~10M images |Images |Classification |2019 |LILA working group |
Can We See Photosynthesis?
|32 videos for eight live and eight dead leaves recorded under both DC and AC lighting conditions. |None |32 videos |Videos |Liveness detection of plants |2017 |{{Cite journal|last1=Taj-Eddin|first1=Islam A. T. F.|last2=Afifi|first2=Mahmoud|last3=Korashy|first3=Mostafa|last4=Ahmed|first4=Ali H.|last5=Ng|first5=Yoke Cheng|last6=Hernandez|first6=Evelyng|last7=Abdel-Latif|first7=Salma M.|date=November 2017|title=Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification|journal=Journal of Electronic Imaging|volume=26|issue=6|pages=060501|doi=10.1117/1.jei.26.6.060501|issn=1017-9909|arxiv=1706.03867|bibcode=2017JEI....26f0501T|s2cid=12367169}} |Taj-Eddin, I. A. T. F. et al. |
Mathematical Mathematics Memes
|Collection of 10,000 memes on mathematics. |None |~10,000 |Images |Visual storytelling, object detection. |2021 |Mathematical Mathematics Memes |
Flickr-Faces-HQ Dataset
|Collection of images containing a face each, crawled from Flickr |Pruned with "various automatic filters", cropped and aligned to faces, and had images of statues, paintings, or photos of photos removed via crowdsourcing |70,000 |Images |Face Generation |2019 |Karras et al. |
Fruits-360 dataset
|Collection of images containing 170 fruits, vegetables, nuts, and seeds. |100x100 pixels, white background. |115499 |Images (jpg) |Classification |2017–2025 |{{cite web|last1 = Oltean| first1 = Mihai | year = 2017 | title = Fruits-360 dataset| website = GitHub | url = https://www.github.com/fruits-360}} |Mihai Oltean |
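Several of the image datasets above, including Fruits-360, are distributed as one directory of JPEG files per class. The following is an illustrative loading sketch, assuming the torchvision library is installed and a hypothetical local path fruits-360/Training laid out with one folder per class:

<syntaxhighlight lang="python">
# Illustrative sketch of loading a folder-per-class image dataset
# (assumes torch/torchvision; "fruits-360/Training" is a hypothetical path).
import torch
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((100, 100)),   # Fruits-360 images are 100x100 pixels
    transforms.ToTensor(),           # convert PIL image to a float tensor
])
train = datasets.ImageFolder("fruits-360/Training", transform=tfm)
loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)
images, labels = next(iter(loader))          # one mini-batch of images/labels
print(images.shape, len(train.classes))      # e.g. torch.Size([64, 3, 100, 100])
</syntaxhighlight>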