Fisher kernel

In statistical classification, the Fisher kernel, named after Ronald Fisher, is a function that measures the similarity of two objects on the basis of sets of measurements for each object and a statistical model. In a classification procedure, the class for a new object (whose real class is unknown) can be estimated by minimising, across classes, an average of the Fisher kernel distance from the new object to each known member of the given class.

The Fisher kernel was introduced in 1998.{{Cite journal |last=Jaakkola |first=Tommi |last2=Haussler |first2=David |date=1998 |title=Exploiting Generative Models in Discriminative Classifiers |url=https://proceedings.neurips.cc/paper_files/paper/1998/hash/db1915052d15f7815c8b88e879465a1e-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=11}} It combines the advantages of generative statistical models (like the hidden Markov model) and those of discriminative methods (like support vector machines):

generative models can process data of variable length (adding or removing data is well-supported)
discriminative methods can have flexible criteria and yield better results.

Derivation

= Fisher score =

The Fisher kernel makes use of the Fisher score, defined as

: $U_X = \nabla_{\theta} \log P(X|\theta)$

with θ being a set (vector) of parameters. The function taking θ to log P(X|θ) is the log-likelihood of the probabilistic model.

= Fisher kernel =

The Fisher kernel is defined as

: $K(X_i, X_j) = U_{X_i}^T \mathcal{I}^{-1} U_{X_j}$

with $\mathcal{I}$ being the Fisher information matrix.

Applications

= Information retrieval =

The Fisher kernel is the kernel for a generative probabilistic model. As such, it constitutes a bridge between generative and probabilistic models of documents.Cyril Goutte, Eric Gaussier, Nicola Cancedda, Hervé Dejean (2004))[http://www.xrce.xerox.com/Research-Development/Publications/2003-079 "Generative vs Discriminative Approaches to Entity Recognition from Label-Deficient Data"] JADT 2004, 7èmes journées internationales analyse statistique des données textuelles, Louvain-la-Neuve, Belgium, 10-12 mars 2004 Fisher kernels exist for numerous models, notably tf–idf,{{cite conference|author=Charles Elkan |title=Deriving TF-IDF as a fisher kernel |year=2005 |conference=SPIRE |url=http://lvk.cs.msu.su/~bruzz/articles/not_processed/spire05.pdf |url-status=dead |archiveurl=https://web.archive.org/web/20131220030051/http://lvk.cs.msu.su/~bruzz/articles/not_processed/spire05.pdf |archivedate=December 20, 2013 }} Naive Bayes and probabilistic latent semantic analysis.

= Image classification and retrieval =

The Fisher kernel can also be applied to image representation for classification or retrieval problems. Currently, the most popular bag-of-visual-words representation suffers from sparsity and high dimensionality. The Fisher kernel can result in a compact and dense representation, which is more desirable for image classification{{Cite journal |last=Perronnin |first=Florent |last2=Dance |first2=Christopher |date=June 2007 |title=Fisher Kernels on Visual Vocabularies for Image Categorization |url=https://ieeexplore.ieee.org/document/4270291/ |journal=2007 IEEE Conference on Computer Vision and Pattern Recognition |pages=1–8 |doi=10.1109/CVPR.2007.383266}}{{Cite journal |last=Perronnin |first=Florent |last2=Sánchez |first2=Jorge |last3=Mensink |first3=Thomas |date=2010 |editor-last=Daniilidis |editor-first=Kostas |editor2-last=Maragos |editor2-first=Petros |editor3-last=Paragios |editor3-first=Nikos |title=Improving the Fisher Kernel for Large-Scale Image Classification |url=https://link.springer.com/chapter/10.1007/978-3-642-15561-1_11 |journal=Computer Vision – ECCV 2010 |language=en |location=Berlin, Heidelberg |publisher=Springer |pages=143–156 |doi=10.1007/978-3-642-15561-1_11 |isbn=978-3-642-15561-1}} and retrieval{{Cite journal |last=Jégou |first=Hervé |last2=Douze |first2=Matthijs |last3=Schmid |first3=Cordelia |last4=Pérez |first4=Patrick |date=June 2010 |title=Aggregating local descriptors into a compact image representation |url=https://ieeexplore.ieee.org/document/5540039/ |journal=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition |pages=3304–3311 |doi=10.1109/CVPR.2010.5540039}}{{Cite journal |last=Twinanda |first=Andru Putra |last2=De Mathelin |first2=Michel |last3=Padoy |first3=Nicolas |date=2014 |editor-last=Golland |editor-first=Polina |editor2-last=Hata |editor2-first=Nobuhiko |editor3-last=Barillot |editor3-first=Christian |editor4-last=Hornegger |editor4-first=Joachim |editor5-last=Howe |editor5-first=Robert |title=Fisher Kernel Based Task Boundary Retrieval in Laparoscopic Database with Single Video Query |url=https://link.springer.com/chapter/10.1007/978-3-319-10443-0_52 |journal=Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014 |language=en |location=Cham |publisher=Springer International Publishing |pages=409–416 |doi=10.1007/978-3-319-10443-0_52 |isbn=978-3-319-10443-0}} problems.

The Fisher Vector (FV), a special, approximate, and improved case of the general Fisher kernel,{{Cite web|url=http://www.vlfeat.org/api/fisher-fundamentals.html|title=VLFeat - Documentation > C API|website=www.vlfeat.org|access-date=2017-03-04}} is an image representation obtained by pooling local image features. The FV encoding stores the mean and the covariance deviation vectors per component k of the Gaussian-Mixture-Model (GMM) and each element of the local feature descriptors together. In a systematic comparison, FV outperformed all compared encoding methods (Bag of Visual Words (BoW), Kernel Codebook encoding (KCB), Locality Constrained Linear Coding (LLC), Vector of Locally Aggregated Descriptors (VLAD)) showing that the encoding of second order information (aka codeword covariances) indeed benefits classification performance.{{Cite journal|last1=Seeland|first1=Marco|last2=Rzanny|first2=Michael|last3=Alaqraa|first3=Nedal|last4=Wäldchen|first4=Jana|last5=Mäder|first5=Patrick|date=2017-02-24|title=Plant species classification using flower images—A comparative study of local feature representations|journal=PLOS ONE|volume=12|issue=2|pages=e0170629|doi=10.1371/journal.pone.0170629|issn=1932-6203|pmc=5325198|pmid=28234999|doi-access=free|bibcode=2017PLoSO..1270629S }}

Notes and references

{{cite book |last1=Schölkopf |first1=Bernhard |title=Learning with kernels: support vector machines, regularization, optimization, and beyond |last2=Smola |first2=Alexander J. |date=2002 |publisher=MIT press |isbn=9780262194754 |chapter=15. Kernel Fisher Discriminant}}
{{Cite journal |last=Sánchez |first=Jorge |last2=Perronnin |first2=Florent |last3=Mensink |first3=Thomas |last4=Verbeek |first4=Jakob |date=December 2013 |title=Image Classification with the Fisher Vector: Theory and Practice |url=http://link.springer.com/10.1007/s11263-013-0636-x |journal=International Journal of Computer Vision |language=en |volume=105 |issue=3 |pages=222–245 |doi=10.1007/s11263-013-0636-x |issn=0920-5691}}

Category:Kernel methods for machine learning