Elastic net regularization

{{short description|Statistical regression method}}

In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.

In practice, elastic net regularization is typically more accurate than either method alone with regard to reconstruction.{{cite journal | last1 = Huang | first1 = Yunfei | display-authors = etal | year = 2019 | title = Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells | journal = Scientific Reports | volume = 9 | number = 1 | page = 537 | doi = 10.1038/s41598-018-36896-x | pmid = 30679578 | doi-access = free | pmc = 6345967 | arxiv = 1810.05848 | bibcode = 2019NatSR...9..539H }}

Specification

The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method, which uses a penalty function based on

:\|\beta\|_1 = \textstyle \sum_{j=1}^p |\beta_j|.

Use of this penalty function has several limitations.{{cite journal|last1=Zou|first1=Hui|first2=Trevor|last2=Hastie|date=2005|title=Regularization and Variable Selection via the Elastic Net|journal=Journal of the Royal Statistical Society, Series B|volume=67|issue=2|pages=301–320|doi=10.1111/j.1467-9868.2005.00503.x|citeseerx=10.1.1.124.4696|s2cid=122419596 }} For example, in the "large p, small n" case (high-dimensional data with few examples), the LASSO selects at most n variables before it saturates. Also, if there is a group of highly correlated variables, the LASSO tends to select one variable from the group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part (\|\beta\|^2) to the penalty, which, when used alone, reduces to ridge regression (also known as Tikhonov regularization).
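This grouping behavior can be seen in a small numerical experiment. The following Python sketch is an illustration only (it assumes scikit-learn's Lasso and ElasticNet estimators and arbitrarily chosen penalty strengths): fitted to two perfectly correlated copies of the same predictor, the LASSO keeps one copy and drops the other, while the elastic net spreads the weight across both.

<syntaxhighlight lang="python">
# Illustrative sketch; the data and penalty values are invented for the example.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x])                  # two perfectly correlated predictors
y = 3.0 * x + 0.1 * rng.normal(size=n)

# The LASSO tends to put all weight on one variable of the correlated group ...
print(Lasso(alpha=0.1).fit(X, y).coef_)      # e.g. [2.9, 0.0]
# ... while the elastic net's quadratic term splits the weight across the group.
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)  # e.g. [1.4, 1.4]
</syntaxhighlight>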

The estimates from the elastic net method are defined by

: \hat{\beta} \equiv \underset{\beta}{\operatorname{argmin}} (\| y-X \beta \|^2 + \lambda_2 \|\beta\|^2 + \lambda_1 \|\beta\|_1) .

The quadratic penalty term makes the loss function strongly convex, and it therefore has a unique minimum. The elastic net method includes LASSO and ridge regression as special cases: LASSO is recovered when \lambda_1 = \lambda, \lambda_2 = 0, and ridge regression when \lambda_1 = 0, \lambda_2 = \lambda. Meanwhile, the naive version of the elastic net method finds an estimator in a two-stage procedure: first, for each fixed \lambda_2, it finds the ridge regression coefficients, and then performs a LASSO-type shrinkage. This kind of estimation incurs a double amount of shrinkage, which leads to increased bias and poor predictions. To improve the prediction performance, the coefficients of the naive version of the elastic net are sometimes rescaled by multiplying the estimated coefficients by (1 + \lambda_2).
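For concreteness, the objective above can be minimized with an off-the-shelf solver. The following Python sketch is an illustration only, assuming scikit-learn's ElasticNet estimator; since scikit-learn parametrizes the penalty by an overall strength and a mixing ratio rather than by \lambda_1, \lambda_2, the penalties must be converted first.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net(X, y, lam1, lam2):
    """Minimize ||y - X b||^2 + lam2 * ||b||^2 + lam1 * ||b||_1 (sketch).

    scikit-learn's ElasticNet minimizes
        ||y - X b||^2 / (2 n) + alpha * l1_ratio * ||b||_1
                              + 0.5 * alpha * (1 - l1_ratio) * ||b||^2,
    so (lam1, lam2) are rescaled into (alpha, l1_ratio) below.
    Assumes lam1 + lam2 > 0 and no intercept term.
    """
    n = X.shape[0]
    alpha = lam1 / (2 * n) + lam2 / n
    l1_ratio = lam1 / (lam1 + 2 * lam2)
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
    return model.fit(X, y).coef_
</syntaxhighlight>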

Examples of where the elastic net method has been applied are:

  • Support vector machine{{cite journal|last1=Wang|first1=Li|last2=Zhu|first2=Ji|last3=Zou|first3=Hui|date=2006|title=The doubly regularized support vector machine|journal=Statistica Sinica|volume=16|pages=589–615|url=http://www.stat.lsa.umich.edu/~jizhu/pubs/Wang-Sinica06.pdf}}
  • Metric learning{{cite book|last1=Liu|first1=Meizhu|last2=Vemuri|first2=Baba|chapter=A robust and efficient doubly regularized metric learning approach|title=Proceedings of the 12th European Conference on Computer Vision|series=Lecture Notes in Computer Science|year=2012|volume=Part IV|pages=646–659 |doi=10.1007/978-3-642-33765-9_46|pmid=24013160|pmc=3761969|isbn=978-3-642-33764-2}}
  • Portfolio optimization{{cite journal|last1=Shen|first1=Weiwei|last2=Wang|first2=Jun|last3=Ma|first3=Shiqian|s2cid=11017740|title=Doubly Regularized Portfolio with Risk Minimization|journal=Proceedings of the AAAI Conference on Artificial Intelligence|year=2014|volume=28 |pages=1286–1292 |doi=10.1609/aaai.v28i1.8906 |doi-access=free}}
  • Cancer prognosis{{Cite journal|last1=Milanez-Almeida|first1=Pedro|last2=Martins|first2=Andrew J.|last3=Germain|first3=Ronald N.|last4=Tsang|first4=John S.|date=2020-02-10|title=Cancer prognosis with shallow tumor RNA sequencing|url=https://www.nature.com/articles/s41591-019-0729-3|journal=Nature Medicine|volume=26|issue=2|language=en|pages=188–192|doi=10.1038/s41591-019-0729-3|pmid=32042193|s2cid=211074147|issn=1546-170X}}

Reduction to support vector machine

It was proven in 2014 that the elastic net can be reduced to the linear support vector machine.{{cite conference |last1=Zhou |first1=Quan |last2=Chen |first2=Wenlin |last3=Song |first3=Shiji |last4=Gardner |first4=Jacob |last5=Weinberger |first5=Kilian |last6=Chen |first6=Yixin |title=A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing |url=https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9856 |conference=Association for the Advancement of Artificial Intelligence}} A similar reduction was previously proven for the LASSO in 2014.{{cite book |title=An Equivalence between the Lasso and Support Vector Machines |last=Jaggi |first=Martin |editor-last1=Suykens |editor-first1=Johan |editor-last2=Signoretto |editor-first2=Marco |editor-last3=Argyriou |editor-first3=Andreas |year=2014 |publisher=Chapman and Hall/CRC |arxiv=1303.1152}}

The authors showed that for every instance of the elastic net, an artificial binary classification problem can be constructed such that the hyperplane solution of a linear support vector machine (SVM) is identical to the solution \beta (after re-scaling). The reduction immediately enables the use of highly optimized SVM solvers for elastic net problems. It also enables the use of GPU acceleration, which is often already used for large-scale SVM solvers.{{cite web|url=http://ttic.uchicago.edu/~cotter/projects/gtsvm/|title=GTSVM|work=uchicago.edu}} The reduction is a simple transformation of the original data and regularization constants

: X\in{\mathbb R}^{n\times p},y\in {\mathbb R}^n,\lambda_1\geq 0,\lambda_2\geq 0

into new artificial data instances that specify a binary classification problem, together with the corresponding SVM regularization constant

: X_2\in{\mathbb R}^{2p\times n},y_2\in\{-1,1\}^{2p}, C\geq 0.

Here, y_2 consists of binary labels -1, 1. When 2p > n, it is typically faster to solve the linear SVM in the primal, whereas otherwise the dual formulation is faster.

Some authors have referred to the transformation as Support Vector Elastic Net (SVEN), and provided the following MATLAB pseudo-code:

<syntaxhighlight lang="matlab">
function β = SVEN(X, y, t, λ2)
    [n, p] = size(X);
    % Each of the 2p rows of X2 is an artificial instance in R^n:
    % the columns of X shifted by -y/t (label +1) and by +y/t (label -1).
    X2 = [bsxfun(@minus, X, y ./ t), bsxfun(@plus, X, y ./ t)]';
    Y2 = [ones(p, 1); -ones(p, 1)];
    C = 1 / (2 * λ2);                       % SVM regularization constant
    if 2 * p > n
        % More instances than dimensions: solve the SVM in the primal,
        % then recover the dual variables from the hinge losses.
        w = SVMPrimal(X2, Y2, C);
        α = C * max(1 - Y2 .* (X2 * w), 0);
    else
        α = SVMDual(X2, Y2, C);
    end
    % Map the SVM solution back to the elastic net coefficients.
    β = t * (α(1:p) - α(p + 1:2 * p)) / sum(α);
end
</syntaxhighlight>

Software

  • "Glmnet: Lasso and elastic-net regularized generalized linear models" is a software which is implemented as an R source package and as a MATLAB toolbox.{{cite journal|last=Friedman|first=Jerome |author2=Trevor Hastie |author3=Rob Tibshirani|date=2010|title=Regularization Paths for Generalized Linear Models via Coordinate Descent|journal=Journal of Statistical Software|volume=33 |issue=1 |pages=1–22|doi=10.18637/jss.v033.i01 |doi-access=free|pmid=20808728 |pmc=2929880 }}{{cite web|url=https://cran.r-project.org/web/packages/glmnet/index.html|title=CRAN - Package glmnet|work=r-project.org|date=22 August 2023 }} This includes fast algorithms for estimation of generalized linear models with ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two penalties (the elastic net) using cyclical coordinate descent, computed along a regularization path.
  • JMP Pro 11 includes elastic net regularization, using the Generalized Regression personality with Fit Model.
  • "pensim: Simulation of high-dimensional data and parallelized repeated penalized regression" implements an alternate, parallelised "2D" tuning method of the ℓ parameters, a method claimed to result in improved prediction accuracy.{{Cite journal

| last1 = Waldron | first1 = L.

| last2 = Pintilie | first2 = M.

| last3 = Tsao | first3 = M. -S.

| last4 = Shepherd | first4 = F. A.

| last5 = Huttenhower | first5 = C.

| last6 = Jurisica | first6 = I.

| doi = 10.1093/bioinformatics/btr591

| title = Optimized application of penalized regression methods to diverse genomic data

| journal = Bioinformatics

| volume = 27

| issue = 24

| pages = 3399–3406

| year = 2011

| pmid = 22156367

| pmc =3232376

}}{{cite web|url=https://cran.r-project.org/web/packages/pensim/index.html|title=CRAN - Package pensim|work=r-project.org|date=9 December 2022 }}

  • scikit-learn includes linear regression and logistic regression with elastic net regularization (see the cross-validation sketch after this list).
  • SVEN, a MATLAB implementation of Support Vector Elastic Net. This solver reduces the Elastic Net problem to an instance of SVM binary classification and uses a MATLAB SVM solver to find the solution. Because SVM is easily parallelizable, the code can be faster than Glmnet on modern hardware.{{cite web|url=https://bitbucket.org/mlcircus/sven|title=mlcircus / SVEN — Bitbucket|work=bitbucket.org}}
  • [http://www.imm.dtu.dk/projects/spasm/ SpaSM], a MATLAB implementation of sparse regression, classification and principal component analysis, including elastic net regularized regression.{{Cite journal|url = http://www.imm.dtu.dk/projects/spasm/references/spasm.pdf|title = SpaSM: A Matlab Toolbox for Sparse Statistical Modeling|last1 = Sjöstrand|first1 = Karl|date = 2 February 2016|journal = Journal of Statistical Software|last2 = Clemmensen|first2 = Line|last3 = Einarsson|first3 = Gudmundur|last4 = Larsen|first4 = Rasmus|last5 = Ersbøll|first5 = Bjarne}}
  • Apache Spark provides support for Elastic Net Regression in its [http://spark.apache.org/mllib/ MLlib] machine learning library. The method is available as a parameter of the more general LinearRegression class.{{Cite web|url=http://spark.apache.org/docs/1.6.1/api/python/pyspark.ml.html#pyspark.ml.regression.LinearRegression|title=pyspark.ml package — PySpark 1.6.1 documentation|website=spark.apache.org|access-date=2019-04-17}}
  • SAS (software): the SAS procedure Glmselect{{Cite web|url=http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_glmselect_examples06.htm|title=Proc Glmselect|access-date=2019-05-09}} and the SAS Viya procedure Regselect{{Cite web|url=https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4287-2020.pdf |title=A Survey of Methods in Variable Selection and Penalized Regression}} support the use of elastic net regularization for model selection.
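As a brief illustration of the scikit-learn entry above (a sketch only; the data and penalty grid are invented for the example), ElasticNetCV tunes the penalty strength along a regularization path by cross-validation, similar in spirit to glmnet's path-wise fitting:

<syntaxhighlight lang="python">
# Illustrative sketch; the data and the l1_ratio grid are arbitrary.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.0, 0.5]            # sparse ground truth
y = X @ beta + 0.1 * rng.normal(size=100)

# For each mixing ratio, a full path over the overall strength alpha is fitted
# and the best (alpha, l1_ratio) pair is chosen by 5-fold cross-validation.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_)   # selected penalty parameters
print(model.coef_[:5])                 # mostly sparse coefficient estimates
</syntaxhighlight>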

References

{{Reflist}}

Further reading

  • {{cite book |first1=Trevor |last1=Hastie |author-link=Trevor Hastie |first2=Robert |last2=Tibshirani |author-link2=Robert Tibshirani |first3=Jerome |last3=Friedman |author-link3=Jerome H. Friedman |title=The Elements of Statistical Learning : Data Mining, Inference, and Prediction |location=New York |publisher=Springer |edition=2nd |year=2017 |isbn=978-0-387-84857-0 |chapter=Shrinkage Methods |pages=61–79 |chapter-url=https://web.stanford.edu/~hastie/Papers/ESLII.pdf#page=80 }}