Margin-infused relaxed algorithm

{{Short description|Machine learning algorithm}}

'''Margin-infused relaxed algorithm (MIRA)'''<ref>{{cite journal | last1 = Crammer | first1 = Koby | last2 = Singer | first2 = Yoram | year = 2003 | title = Ultraconservative Online Algorithms for Multiclass Problems | journal = Journal of Machine Learning Research | volume = 3 | pages = 951–991 | url = http://jmlr.csail.mit.edu/papers/v3/crammer03a.html }}</ref> is a machine learning algorithm, an online algorithm for multiclass classification problems. It is designed to learn a set of parameters (vector or matrix) by processing all the given training examples one-by-one and updating the parameters according to each training example, so that the current training example is classified correctly with a margin against incorrect classifications at least as large as their loss.<ref>{{cite conference | last1 = McDonald | first1 = Ryan | last2 = Crammer | first2 = Koby | last3 = Pereira | first3 = Fernando | title = Online Large-Margin Training of Dependency Parsers | book-title = Proceedings of the 43rd Annual Meeting of the ACL | publisher = Association for Computational Linguistics | date = 2005 | pages = 91–98 | url = http://aclweb.org/anthology-new/P/P05/P05-1012.pdf }}</ref> The change of the parameters is kept as small as possible.

A two-class version called binary MIRA simplifies the algorithm by not requiring the solution of a quadratic programming problem (see below). When used in a one-vs-all configuration, binary MIRA can be extended to a multiclass learner that approximates full MIRA, but may be faster to train.
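As an illustration, here is a minimal Python sketch of a binary MIRA-style update. It uses one common closed-form step size (closely related to the passive-aggressive PA-I update) rather than a quadratic-programming solver; the function name and the aggressiveness cap <code>C</code> are illustrative assumptions, not notation from the original papers.

<syntaxhighlight lang="python">
import numpy as np

def binary_mira_update(w, x, y, C=1.0):
    """One binary MIRA-style update step (illustrative sketch).

    w : current weight vector (np.ndarray)
    x : feature vector of the training example
    y : label in {-1, +1}
    C : cap on the step size (aggressiveness parameter)
    """
    margin = y * np.dot(w, x)
    loss = max(0.0, 1.0 - margin)  # hinge loss of the current example
    if loss > 0.0:
        # Closed-form step: the smallest change to w that repairs the
        # margin violation, clipped at C -- no QP solver is needed.
        tau = min(C, loss / np.dot(x, x))
        w = w + tau * y * x
    return w
</syntaxhighlight>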

The flow of the algorithm<ref>Watanabe, T. et al. (2007): "Online Large Margin Training for Statistical Machine Translation". In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 764–773.</ref><ref>Bohnet, B. (2009): "Efficient Parsing of Syntactic and Semantic Dependency Structures". In: Proceedings of the Conference on Natural Language Learning (CoNLL), Boulder, 67–72.</ref> looks as follows:

{{Algorithm-begin|name=MIRA}}
Input: Training examples <math>T = \{x_i, y_i\}</math>
Output: Set of parameters <math>w</math>

<math>i \leftarrow 0</math>, <math>w^{(0)} \leftarrow 0</math>
for <math>n \leftarrow 1</math> to <math>N</math>
: for <math>t \leftarrow 1</math> to <math>|T|</math>
:: <math>w^{(i+1)} \leftarrow</math> update <math>w^{(i)}</math> according to <math>\{x_t, y_t\}</math>
:: <math>i \leftarrow i + 1</math>
: end for
end for
return <math>\frac{\sum_{j=1}^{N \times |T|} w^{(j)}}{N \times |T|}</math>
{{Algorithm-end}}
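As a concrete rendering of this outer loop, the following Python sketch mirrors the pseudocode, including the averaging of all intermediate parameter vectors in the return step. It assumes a generic per-example update function is supplied (for example, the binary update sketched above); the names here are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def train_mira(examples, update, n_epochs, n_features):
    """MIRA outer training loop with weight averaging (sketch).

    examples  : list of (x, y) training pairs
    update    : function (w, x, y) -> new w, e.g. binary_mira_update
    n_epochs  : number of passes N over the training data
    """
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)  # running sum of all w^{(j)}
    for _ in range(n_epochs):
        for x, y in examples:
            w = update(w, x, y)
            w_sum += w
    # Return the average of all intermediate parameter vectors.
    return w_sum / (n_epochs * len(examples))
</syntaxhighlight>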

The update step is then formalized as a quadratic programming problem: find <math>\min\|w^{(i+1)} - w^{(i)}\|</math> subject to <math>score(x_t,y_t) - score(x_t,y') \geq L(y_t,y')\ \forall y'</math>, i.e. the score of the current correct training label <math>y_t</math> must exceed the score of any other possible label <math>y'</math> by at least the loss (number of errors) of that <math>y'</math> in comparison to <math>y_t</math>.
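When only the single highest-scoring incorrect label is constrained (as in "1-best" approximations of MIRA), the quadratic program has a closed-form solution. The following Python sketch illustrates such a single-constraint update for a per-class weight matrix; the representation and function names are illustrative assumptions, not the formulation of any particular paper.

<syntaxhighlight lang="python">
import numpy as np

def mira_1best_update(W, x, y_true, loss=lambda y, yp: float(y != yp)):
    """Single-constraint MIRA update for a per-class weight matrix W.

    W      : array of shape (n_classes, n_features); score(x, y) = W[y] @ x
    x      : feature vector of the training example
    y_true : index of the correct label
    loss   : label-pair loss, here 0/1 by default
    """
    scores = W @ x
    # Loss-augmented prediction: the most violated incorrect label.
    augmented = scores + np.array([loss(y_true, yp) for yp in range(W.shape[0])])
    augmented[y_true] = -np.inf
    y_hat = int(np.argmax(augmented))

    violation = loss(y_true, y_hat) - (scores[y_true] - scores[y_hat])
    if violation > 0.0:
        # Squared norm of the difference of joint feature vectors is
        # 2 * ||x||^2, since only the rows for y_true and y_hat change.
        tau = violation / (2.0 * np.dot(x, x))
        W[y_true] += tau * x
        W[y_hat] -= tau * x
    return W
</syntaxhighlight>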

== References ==

{{Reflist}}