Variance inflation factor

{{Short description|Statistical measure in mathematical model}}

In statistics, the variance inflation factor (VIF) is the ratio of the variance of a parameter estimate in a full model that includes other parameters to its variance in a model fit with that parameter alone.{{cite book |last1=James |first1=Gareth |last2=Witten |first2=Daniela |last3=Hastie |first3=Trevor |last4=Tibshirani |first4=Robert |title=An Introduction to Statistical Learning |edition=8th |publisher=Springer Science+Business Media New York |year=2017 |isbn=978-1-4614-7138-7 }} The VIF provides an index that measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is increased because of collinearity.

Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.{{cite tech report |last=Snee |first=Ron |title=Origins of the Variance Inflation Factor as Recalled by Cuthbert Daniel |location= |publisher=Snee Associates |year=1981 |url=https://www.researchgate.net/publication/291808767_Who_Invented_the_Variance_Inflation_Factor}}

Definition

Consider the following linear model with k independent variables:

: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.

The standard error of the estimate of \beta_j is the square root of the (j + 1)-th diagonal element of s^2(X^T X)^{-1}, where s is the root mean squared error (RMSE) (note that RMSE^2 is a consistent estimator of \sigma^2, the true variance of the error term); X is the regression design matrix, a matrix such that X_{i,j+1} is the value of the jth independent variable for the ith case or observation, and such that X_{i,1}, the predictor vector associated with the intercept term, equals 1 for all i. It turns out that the square of this standard error, the estimated variance of the estimate of \beta_j, can be equivalently expressed as:{{Cite book |title=Applied regression analysis : a research tool |url=https://archive.org/details/appliedregressio00rawl_492 |url-access=limited |last1=Rawlings|first1=John O.|date=1998|publisher=Springer|last2=Pantula |first2=Sastry G. |last3=Dickey |first3=David A. |isbn=0387227539|edition=Second |location=New York|pages=[https://archive.org/details/appliedregressio00rawl_492/page/n384 372], 373|oclc=54851769}}{{Cite book|url=https://cran.r-project.org/doc/contrib/Faraway-PRA.pdf|title=Practical Regression and Anova using R|last=Faraway|first=Julian J.|year=2002|pages=117, 118}}

:

\widehat{\operatorname{var}}(\hat{\beta}_j) = \frac{s^2}{(n-1)\widehat{\operatorname{var}}(X_j)}\cdot \frac{1}{1-R_j^2},

where R_j^2 is the multiple R^2 for the regression of X_j on the other covariates (a regression that does not involve the response variable Y), and \hat{\beta}_j is the estimate of the coefficient \beta_j. This identity separates the influences of several distinct factors on the variance of the coefficient estimate:

  • s^2: greater scatter in the data around the regression surface leads to proportionately more variance in the coefficient estimates
  • n: greater sample size results in proportionately less variance in the coefficient estimates
  • \widehat{\operatorname{var}}(X_j): greater variability in a particular covariate leads to proportionately less variance in the corresponding coefficient estimate

The remaining term, 1 / (1 − R_j^2), is the VIF. It reflects all other factors that influence the uncertainty in the coefficient estimates. The VIF equals 1 when the vector X_j is orthogonal to each column of the design matrix for the regression of X_j on the other covariates. By contrast, the VIF is greater than 1 when the vector X_j is not orthogonal to all columns of the design matrix for the regression of X_j on the other covariates. Finally, note that the VIF is invariant to the scaling of the variables (that is, we could scale each variable X_j by a constant c_j without changing the VIF).
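
The decomposition above can be checked numerically. The following sketch uses NumPy on simulated data; the data, the variable names, and the choice of the examined column are assumptions made only for illustration, not part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative check of the variance decomposition (simulated data; all names are arbitrary).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=n)      # make the third column collinear with the first
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

D = np.column_stack([np.ones(n), X])                     # design matrix with intercept
beta_hat = np.linalg.solve(D.T @ D, D.T @ y)             # OLS coefficient estimates
resid = y - D @ beta_hat
s2 = resid @ resid / (n - D.shape[1])                    # s^2, the squared RMSE

j = 2                                                    # examine the third predictor (0-based index)
var_direct = s2 * np.linalg.inv(D.T @ D)[j + 1, j + 1]   # s^2 [(X^T X)^{-1}]_{jj}

# R_j^2 from regressing X_j on the other covariates (with an intercept), then the VIF.
others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
gamma = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
rss_j = np.sum((X[:, j] - others @ gamma) ** 2)
tss_j = np.sum((X[:, j] - X[:, j].mean()) ** 2)
r2_j = 1.0 - rss_j / tss_j
vif_j = 1.0 / (1.0 - r2_j)

var_decomposed = s2 / ((n - 1) * np.var(X[:, j], ddof=1)) * vif_j
print(var_direct, var_decomposed)                        # the two expressions agree
</syntaxhighlight>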

To derive the identity above, start from the standard expression for the estimated variance of \hat{\beta}_j:

:

\widehat{\operatorname{var}}(\hat{\beta}_j) = s^2 [(X^T X)^{-1}]_{jj}

Now let r = X^T X and, without loss of generality, reorder the columns of X so that the first column is X_j. Writing r in partitioned form, its inverse is

:

r^{-1} = \begin{bmatrix} r_{j,j} & r_{j,-j} \\ r_{-j,j} & r_{-j,-j}\end{bmatrix}^{-1}

: r_{j,j} = X_j^T X_j, \quad r_{j,-j} = X_j^T X_{-j}, \quad r_{-j,j} = X_{-j}^T X_j, \quad r_{-j,-j} = X_{-j}^T X_{-j}.

By using the Schur complement, the element in the first row and first column of r^{-1} is

:r^{-1}_{1,1} = [r_{j,j} - r_{j,-j} r_{-j,-j}^{-1} r_{-j,j} ]^{-1}

Then we have,

:

\begin{align}
& \widehat{\operatorname{var}}(\hat{\beta}_j) = s^2 [(X^T X)^{-1}]_{jj} = s^2 r^{-1}_{1,1} \\
= {} & s^2 [X_j^T X_j - X_j^T X_{-j} (X_{-j}^T X_{-j})^{-1} X_{-j}^T X_j ]^{-1} \\
= {} & s^2 [X_j^T X_j - X_j^T X_{-j} (X_{-j}^T X_{-j})^{-1} (X_{-j}^T X_{-j}) (X_{-j}^T X_{-j})^{-1} X_{-j}^T X_j ]^{-1} \\
= {} & s^2 [X_j^T X_j - \hat{\beta}_{*j}^T(X_{-j}^T X_{-j}) \hat{\beta}_{*j} ]^{-1} \\
= {} & s^2 \frac{1}{\mathrm{RSS}_j} \\
= {} & \frac{s^2}{(n-1)\widehat{\operatorname{var}}(X_j)}\cdot \frac{1}{1-R_j^2}
\end{align}

Here \hat{\beta}_{*j} is the vector of coefficients from the regression of X_j (as the dependent variable) on the remaining covariates X_{-j}, and \mathrm{RSS}_j is the corresponding residual sum of squares.
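
The Schur-complement step can likewise be verified numerically. The sketch below is only an illustration on a simulated design matrix (the matrix and variable names are assumptions, not from the sources above); it checks that the (1,1) entry of r^{-1} is the inverse of the Schur complement, and that the Schur complement equals \mathrm{RSS}_j.

<syntaxhighlight lang="python">
import numpy as np

# Numerical check of the Schur-complement identity (illustrative only).
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept in column 0
X[:, 3] = 0.7 * X[:, 1] + 0.3 * rng.normal(size=n)           # introduce correlation between columns

j = 3                                                         # treat the last column as X_j
order = [j] + [c for c in range(X.shape[1]) if c != j]
Xr = X[:, order]                                              # reorder so that X_j is the first column
r = Xr.T @ Xr

Xj, X_rest = Xr[:, :1], Xr[:, 1:]
schur = (Xj.T @ Xj
         - Xj.T @ X_rest @ np.linalg.inv(X_rest.T @ X_rest) @ X_rest.T @ Xj)

# The (1,1) entry of r^{-1} equals the inverse of the Schur complement ...
print(np.linalg.inv(r)[0, 0], 1.0 / schur[0, 0])

# ... and the Schur complement equals RSS_j, the residual sum of squares
# from regressing X_j on the remaining columns X_{-j}.
gamma = np.linalg.lstsq(X_rest, Xj[:, 0], rcond=None)[0]
rss_j = np.sum((Xj[:, 0] - X_rest @ gamma) ** 2)
print(schur[0, 0], rss_j)
</syntaxhighlight>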

Calculation and analysis

We can calculate k different VIFs (one for each X_i) in three steps:

= Step one =

First we run an ordinary least squares regression that has X_i as a function of all the other explanatory variables in the original model.
If i = 1, for example, the equation would be

:X_1=\alpha_0 + \alpha_2 X_2 + \alpha_3 X_3 + \cdots + \alpha_k X_k +\varepsilon

where \alpha_0 is a constant and \varepsilon is the error term.

= Step two =

Then, calculate the VIF for \hat\beta_i with the following formula:

: \mathrm{VIF}_i = \frac{1}{1-R^2_i}

where R_i^2 is the coefficient of determination of the regression equation in step one, with X_i on the left-hand side and all other predictor variables (all the other X variables) on the right-hand side.

= Step three =

Analyze the magnitude of multicollinearity by considering the size of \operatorname{VIF}(\hat \beta_i). A rule of thumb is that if \operatorname{VIF}(\hat \beta_i) > 10 then multicollinearity is high{{cite book |last1=Kutner |first1=M. H. |last2=Nachtsheim |first2=C. J. |last3=Neter |first3=J. |title=Applied Linear Regression Models |edition=4th |publisher=McGraw-Hill Irwin |year=2004 }} (a cutoff of 5 is also commonly used{{cite book | last=Sheather | first=Simon | title=A modern approach to regression with R | publisher=Springer | publication-place=New York, NY | year=2009 | isbn=978-0-387-09607-0 }}). However, any VIF greater than 1 implies that the variance of the corresponding slope estimate is inflated to some degree. As a result, including two or more variables in a multiple regression that are not orthogonal (i.e. that have nonzero correlation) will alter each other's slope, the standard error of the slope, and the P-value, because there is shared variance between the predictors that cannot be uniquely attributed to any one of them.{{cite book |last1=James |first1=Gareth |last2=Witten |first2=Daniela |last3=Hastie |first3=Trevor |last4=Tibshirani |first4=Robert |title=An introduction to statistical learning: with applications in R |date=2021 |publisher=Springer |location=New York, NY |isbn=978-1-0716-1418-1 |page=116 |edition=Second |url=https://link.springer.com/book/10.1007/978-1-0716-1418-1 |access-date=1 November 2024}}
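
The three steps can be illustrated with a short Python sketch using statsmodels (listed under Implementation below). The simulated data and variable names are assumptions made for this example only.

<syntaxhighlight lang="python">
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative three-step VIF calculation on simulated data.
rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.4 * x3 + 0.3 * rng.normal(size=n)    # X_1 depends strongly on the other predictors

# Step one: regress X_1 on a constant and the other explanatory variables.
step_one = sm.OLS(x1, sm.add_constant(np.column_stack([x2, x3]))).fit()

# Step two: VIF_1 = 1 / (1 - R_1^2).
vif_1 = 1.0 / (1.0 - step_one.rsquared)

# Step three: compare against the common cutoffs (5 or 10).
print(f"VIF_1 = {vif_1:.2f}", "-> high multicollinearity" if vif_1 > 10 else "")

# The same quantity from statsmodels' built-in helper; column 0 of exog is the constant.
exog = sm.add_constant(np.column_stack([x1, x2, x3]))
print(variance_inflation_factor(exog, 1))
</syntaxhighlight>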

Some software instead calculates the tolerance, which is simply the reciprocal of the VIF. The choice of which to use is a matter of personal preference.

Interpretation

The square root of the variance inflation factor indicates how much larger the standard error of a coefficient is, compared with what it would be if that variable were uncorrelated with the other predictor variables in the model.

Example

If the variance inflation factor of a predictor variable is 5.27 (√5.27 ≈ 2.3), the standard error for the coefficient of that predictor variable is 2.3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.

Implementation

  • vif function in the [https://cran.r-project.org/package=car car] R package
  • ols_vif_tol function in the [https://cran.r-project.org/package=olsrr olsrr] R package
  • PROC REG in SAS [https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect007.htm System]
  • variance_inflation_factor function in [http://www.statsmodels.org statsmodels] Python package
  • estat vif in [https://www.stata.com/help.cgi?regress_postestimation Stata]
  • [https://grass.osgeo.org/grass78/manuals/addons/r.vif.html r.vif] addon for GRASS GIS
  • vif (non-categorical data) and gvif (categorical data) functions in [https://juliastats.org/StatsModels.jl/stable/ StatsModels] for the Julia programming language

References

{{Reflist}}

Further reading

  • {{cite book |last=Allison |first=P. D. |title=Multiple Regression: A Primer |page=142 |publisher=Pine Forge Press |location=Thousand Oaks, CA |year=1999 }}
  • {{cite book |last1=Hair |first1=J. F. |last2=Anderson |first2=R. |last3=Tatham |first3=R. L. |last4=Black |first4=W. C. |title=Multivariate Data Analysis |publisher=Prentice Hall |place= Upper Saddle River, NJ |year=2006 }}
  • {{cite book |last1=Kutner |first1=M. H. |last2= Nachtsheim |first2=C. J. |last3= Neter |first3=J. |title=Applied Linear Regression Models |edition=4th |publisher=McGraw-Hill Irwin |year=2004 }}
  • {{cite book |last1=Longnecker |first1=M. T. |last2=Ott |first2=R. L. |title=A First Course in Statistical Methods |page=615 |publisher=Thomson Brooks/Cole |year=2004 }}
  • {{cite journal |last=Marquardt |first=D. W. |year=1970 |title=Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation |journal= Technometrics |volume=12 |issue=3 |pages=591–612 [pp. 605–7] |doi=10.1080/00401706.1970.10488699 }}
  • {{cite book |last=Studenmund |first=A. H. |title=Using Econometrics: A Practical Guide |edition=5th |pages=258–259 |publisher=Pearson International |year=2006 }}
  • {{cite journal |last1=Zuur |first1=A.F. |last2=Ieno| first2=E.N.|last3=Elphick|first3=C.S|year=2010 |title=A protocol for data exploration to avoid common statistical problems |journal=Methods in Ecology and Evolution |volume=1 |pages=3–14 |doi=10.1111/j.2041-210X.2009.00001.x |s2cid=18814132 |doi-access=free }}

See also