Envelope model

Envelope models, or envelopes, are a class of statistical models that lie at the intersection of multivariate analysis and sufficient dimension reduction. The first envelope model was introduced by Cook, Li, and Chiaromonte in 2010 under the framework of multivariate linear regression. It applies dimension reduction techniques to remove the immaterial information in the data and achieve efficient estimation of the regression coefficients. When the immaterial variation is large, the envelope model can obtain substantial efficiency gains in the estimation of the coefficients compared to the ordinary least squares method (the latter of which does not account for the covariance structure of the responses). The basic idea of envelopes is to build a link that connects the covariance matrix and the regression coefficients, which results in a parsimonious parameterization of the multivariate linear regression model. Envelopes have been extended to many fields in multivariate analysis such as discriminant analysis, partial least squares, Bayesian analysis, variable selection, reduced rank regression, and generalized linear models.

Introduction

=Multivariate linear regression model=

The envelope model was originally introduced under the framework of multivariate linear regression:

:Y=\alpha+\beta X+\varepsilon,

where Y\in \mathbb{R}^{r} is the random response vector and X\in \mathbb{R}^p is the non-stochastic predictor vector centered to have mean zero. Under this model, \alpha\in \mathbb{R}^{r} and \beta\in \mathbb{R}^{r\times p} are the unknown intercept and coefficient matrix, respectively. The error vector \varepsilon\in \mathbb{R}^{r} is assumed to have mean \mathbf{0} and covariance \Sigma\in \mathbb{R}^{r\times r}, where \Sigma is a positive-definite matrix.
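As a quick illustration, the following Python sketch simulates data from this model and computes the ordinary least squares fit; the dimensions, parameter values, and variable names are illustrative rather than part of the original formulation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200, 4, 2                      # sample size, responses, predictors

alpha = rng.normal(size=r)               # unknown intercept
beta = rng.normal(size=(r, p))           # unknown r x p coefficient matrix
x = rng.normal(size=(n, p))
x -= x.mean(axis=0)                      # predictors centered, as in the model
eps = rng.multivariate_normal(np.zeros(r), np.eye(r), size=n)
y = alpha + x @ beta.T + eps             # rows are realizations of Y

# ordinary least squares: regress centered responses on centered predictors
yc = y - y.mean(axis=0)
beta_ols = np.linalg.solve(x.T @ x, x.T @ yc).T   # r x p estimate of beta
alpha_ols = y.mean(axis=0)               # intercept estimate (X has mean zero)
</syntaxhighlight>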

=Assumptions=

Let \mathcal{S}\subseteq \mathbb{R}^{r} be a subspace that decomposes Y into two parts: the material part P_{\mathcal{S}}Y and the immaterial part Q_{\mathcal{S}}Y. Here P_{\mathcal{S}} is the projection matrix onto \mathcal{S} and Q_{\mathcal{S}}=I_r-P_{\mathcal{S}}, where I_r is the r\times r identity matrix, so that Q_{\mathcal{S}} is the projection onto the orthogonal complement of \mathcal{S} (denoted by \mathcal{S}^{\bot}). We assume that the following two conditions hold:

:1. Q_{\mathcal{S}}Y\mid X \sim Q_{\mathcal{S}}Y,

:2. Q_{\mathcal{S}} Y ~\bot~ P_{\mathcal{S}}Y\mid X,

where "A ~ B" means A and B are identically distributed, "A | B" denotes the conditional distribution of A given B and A \mathbin{\bot} B denotes that A and B are statistically independent.

Assumption 1 states that the conditional distribution of the immaterial part given X is the same as the distribution of the immaterial part. Assumption 2 states that given X, the material part and immaterial part are independent of each other.

Under the linear model, the first assumption indicates that the immaterial part does not give any information about \beta, while the second assumption means that the covariance matrix of Y\mid X (i.e. the covariance matrix of the error) can be decomposed into two parts: the variability due to the material part and the variability due to the immaterial part. Since Y can be decomposed as Y = Q_{\mathcal{S}}Y+P_{\mathcal{S}}Y, we see that P_{\mathcal{S}}Y carries all the material information and possibly some immaterial information about \beta, while Q_{\mathcal{S}}Y carries only immaterial information about \beta. To ensure that we exclude immaterial information from our estimate of \beta, we aim to find the smallest \mathcal{S} that carries only material information about \beta.
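For concreteness, the decomposition Y = P_{\mathcal{S}}Y + Q_{\mathcal{S}}Y can be computed directly once a basis of \mathcal{S} is chosen. The following Python sketch, with an arbitrary illustrative subspace, verifies that the two parts reassemble Y.

<syntaxhighlight lang="python">
import numpy as np

def projections(s):
    """Return P_S (projection onto the span of the columns of s) and Q_S = I - P_S."""
    s = np.asarray(s, dtype=float)
    p_s = s @ np.linalg.solve(s.T @ s, s.T)     # P_S = S (S'S)^{-1} S'
    return p_s, np.eye(s.shape[0]) - p_s

# example: S is the line spanned by (1, 1, 0)' in R^3
p_s, q_s = projections(np.array([[1.0], [1.0], [0.0]]))
y = np.array([2.0, 0.0, 3.0])
material, immaterial = p_s @ y, q_s @ y
assert np.allclose(material + immaterial, y)    # Y = P_S Y + Q_S Y
</syntaxhighlight>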

Definition of envelope

In 2010, Cook et al.{{cite journal | title=Envelope Models for Parsimonious and Efficient Multivariate Linear Regression | last=Cook | first=R.D. |author2=Li, B. |author3=Chiaromonte, F. | journal=Statistica Sinica | volume=20 | year=2010 | pages=927–960 | citeseerx=10.1.1.362.5659 | issue=3 | jstor=2240725}} showed that assumptions 1 and 2 are equivalent to

{{ordered list|type=lower-roman

|\mathcal{B} \subseteq \mathcal{S}, where \mathcal{B} = \operatorname{span}(\beta),

|\Sigma=\Sigma_{\mathcal{S}}+\Sigma_{\mathcal{S}^\bot}=P_{\mathcal{S}}\Sigma P_{\mathcal{S}} + Q_{\mathcal{S}} \Sigma Q_{\mathcal{S}}.

}}

In (i), the span denotes the linear subspace spanned by the columns of \beta. When (ii) holds, \mathcal{S} is called a reducing subspace of \Sigma.

The \Sigma-envelope of \mathcal{B}, denoted by \mathcal{E}_\Sigma(\mathcal{B}), is defined as the intersection of all reducing subspaces of \Sigma that contain \mathcal{B}.

We let u denote the dimension of the envelope subspace, where u \leq r. Taking \mathcal{S}=\mathcal{E}_\Sigma(\mathcal{B}) in (ii), it is clear that \mathcal{E}_\Sigma(\mathcal{B}) decomposes \Sigma into the variation of the material and immaterial parts of Y. Namely, we have \Sigma_{\mathcal{S}}=\operatorname{var}(P_{\mathcal{S}}Y\mid X) and \Sigma_{\mathcal{S}^\bot}=\operatorname{var}(Q_{\mathcal{S}}Y\mid X).

Properties of envelope subspace

Let \mathbb{S}^{r\times r} denote the class of all symmetric r\times r matrices. The following propositions establish some important properties of the envelope subspace \mathcal{E}_{\Sigma}(\mathcal{S}), namely its relationship with the eigenstructure of \Sigma and its behavior under linear transformations of \mathcal{S}.

* Proposition: Let \Sigma\in \mathbb{S}^{r\times r}, let P_i, i=1,\ldots,q, be the projections onto the eigenspaces of \Sigma, and let \mathcal{S} be a subspace of \operatorname{span}(\Sigma). Then \mathcal{E}_{\Sigma}(\mathcal{S})=\sum_{i=1}^{q}P_i \mathcal{S} is the intersection of all reducing subspaces of \Sigma that contain \mathcal{S} (see the sketch after this list).
* Proposition: Let K\in \mathbb{S}^{r\times r} be a matrix that commutes with \Sigma\in \mathbb{S}^{r\times r} and let \mathcal{S} be a subspace of \operatorname{span}(\Sigma). Then K\mathcal{S}\subseteq \operatorname{span}(\Sigma) and the following equivariance holds: \mathcal{E}_\Sigma(K\mathcal{S})=K\mathcal{E}_\Sigma (\mathcal{S}). If, in addition, K\mathcal{S}\subseteq \operatorname{span}(K) and \mathcal{E}_\Sigma(\mathcal{S}) reduces K, then the following invariance holds: \mathcal{E}_{\Sigma}(K\mathcal{S})=\mathcal{E}_{\Sigma}(\mathcal{S}).
* Proposition: Under the standard model, let \Sigma_{Y} be the covariance matrix of Y. Then \Sigma^{-1}\mathcal{B}=\Sigma_{Y}^{-1}\mathcal{B}, and: \mathcal{E}_\Sigma(\mathcal{B})=\mathcal{E}_{\Sigma_Y}(\mathcal{B})=\mathcal{E}_\Sigma(\Sigma^{-1}\mathcal{B})=\mathcal{E}_{\Sigma_Y}(\Sigma_Y^{-1}\mathcal{B})=\mathcal{E}_{\Sigma_Y}(\Sigma^{-1}\mathcal{B})=\mathcal{E}_{\Sigma}(\Sigma_Y^{-1}\mathcal{B}).
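The first proposition suggests a direct way to compute the envelope at the population level: project \mathcal{B} onto each eigenspace of \Sigma and collect the nonzero pieces. The following Python sketch implements this characterization; the function name, tolerance, and test matrices are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def envelope_basis(sigma, b, tol=1e-8):
    """Orthonormal basis of the Sigma-envelope of span(b),
    via E_Sigma(B) = sum_i P_i B over the eigenspaces of Sigma."""
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
    pieces, start = [], 0
    for i in range(1, len(eigvals) + 1):
        # a run of (numerically) equal eigenvalues spans one eigenspace
        if i == len(eigvals) or abs(eigvals[i] - eigvals[start]) > tol:
            v = eigvecs[:, start:i]
            proj = v @ (v.T @ b)               # P_i B
            if np.linalg.norm(proj) > tol:
                pieces.append(proj)
            start = i
    m = np.hstack(pieces)
    u_svd, s_vals, _ = np.linalg.svd(m, full_matrices=False)
    return u_svd[:, : int(np.sum(s_vals > tol))]

# example: the envelope of span{(1, 0, 1)'} with respect to diag(1, 1, 3)
sigma = np.diag([1.0, 1.0, 3.0])
b = np.array([[1.0], [0.0], [1.0]])
print(envelope_basis(sigma, b).shape[1])       # 2: span{e1, e3}, smaller than R^3
</syntaxhighlight>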

Coordinate form of the envelope model

Under the envelope parameterization, the multivariate linear regression model (which we call the standard model) can be written as

:\begin{align}
Y & =\alpha+\Gamma\eta X+\varepsilon, \\[6pt]
\Sigma & =\Gamma\Omega\Gamma^T+\Gamma_0\Omega_0 \Gamma^T_0,
\end{align}

where \beta=\Gamma\eta, \Gamma\in \mathbb{R}^{r\times u} is an orthonormal basis matrix for the envelope subspace \mathcal{E}_{\Sigma}(\mathcal{B}), and \Gamma_{0}\in \mathbb{R}^{r\times (r-u)} is an orthonormal basis matrix for its orthogonal complement \mathcal{E}_{\Sigma}(\mathcal{B})^{\bot}, completing \Gamma to an orthogonal matrix (that is, (\Gamma,\Gamma_{0})\in\mathbb{R}^{r\times r} is orthogonal).

\eta \in \mathbb{R}^{u\times p} carries the coordinates of \beta with respect to \Gamma, and \Omega\in \mathbb{R}^{u\times u} and \Omega_{0}\in \mathbb{R}^{(r-u)\times (r-u)} are positive-definite matrices that carry the coordinates of \Sigma with respect to \Gamma and \Gamma_{0}, respectively.

Under the standard model, the total number of parameters is

:N(r)=r+rp+\frac{r(r+1)}{2},

where r is the number of parameters in \alpha, rp is the number of parameters in \beta, and \frac{r(r+1)}{2} is the number of parameters in \Sigma which is the covariance of \varepsilon.

Under the envelope model, the total number of parameters is

:N(u)=r+up+\frac{r(r+1)}{2}

where r is the number of parameters in \alpha, u(r-u) is the number of parameters needed to specify \operatorname{span}(\Gamma), up is the number of parameters in \eta, \frac{u(u+1)}{2} is the number of parameters in \Omega, and \frac{(r-u)(r-u+1)}{2} is the number of parameters in \Omega_{0}. Adding these together yields N(u), as the identity below shows.
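To see why the same \frac{r(r+1)}{2} term appears in N(u), note that the counts attached to \operatorname{span}(\Gamma), \Omega, and \Omega_{0} together parameterize \Sigma and sum to the same total as in the standard model:

:u(r-u)+\frac{u(u+1)}{2}+\frac{(r-u)(r-u+1)}{2}=\frac{r(r+1)}{2}.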

Therefore, N(r)- N(u) = p(r-u). So if u < r, the envelope model reduces the number of parameters that need to be estimated. If u=r, the envelope model reduces to the standard model. If u=0, it means that \beta=0.

Estimation

The envelope model parameterization does not depend on normality. However, if we assume that the errors are normally distributed, i.e. \varepsilon \sim N(\mathbf{0},\Sigma), then we can use maximum likelihood estimation (MLE) to estimate the unknown parameters \alpha, \mathcal{E}_{\Sigma}(\mathcal{B}), \eta, \Omega, and \Omega_{0}.

In the envelope model, \Gamma is not identifiable. However, we can uniquely estimate \mathcal{E}_\Sigma(\mathcal{B}). In order to estimate \mathcal{E}_\Sigma(\mathcal{B}), we need to solve the following optimization problem:

:\hat{\mathcal{E}}_\Sigma (\mathcal{B}) = \displaystyle \underset{{\operatorname{span} (\Gamma) \in \mathbb{G}^{r \times u}}}{\operatorname{argmin}} \log | \Gamma^T S_{Y\mid X} \Gamma | + \log | \Gamma^T S_{Y}^{-1} \Gamma |,

where S_{Y\mid X} is the sample covariance matrix of the residuals from the ordinary least squares fit, S_Y is the sample covariance matrix of Y, and \mathbb{G}^{r \times u} denotes the Grassmann manifold of all u-dimensional subspaces of \mathbb{R}^r.

Even without normality, the same objective function can be used to estimate the envelope. If the errors have finite fourth moments, the resulting estimator of the parameters is still \sqrt{n}-consistent. The objective function can be minimized with standard manifold-optimization software such as [https://arxiv.org/pdf/1308.5200.pdf Manopt]. A fast algorithm for estimating \hat{\mathcal{E}}_{\Sigma}(\mathcal{B}) that does not require optimization over a Grassmannian has also been developed.{{cite journal | title=A Note on Fast Envelope Estimation | last=Cook | first=R.D. |author2=Forzani, L. |author3=Su, Z. | journal=Journal of Multivariate Analysis | volume=150 | year=2016 | pages=42–54}}
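As a concrete (if naive) illustration of this objective, the following Python sketch evaluates it and minimizes it with a general-purpose optimizer and random restarts, rather than with the Grassmann or fast algorithms cited above; the function names and optimizer settings are illustrative assumptions, not the authors' algorithm.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def envelope_objective(gamma, s_res, s_y_inv):
    """log|G' S_{Y|X} G| + log|G' S_Y^{-1} G| from the display above."""
    return (np.linalg.slogdet(gamma.T @ s_res @ gamma)[1]
            + np.linalg.slogdet(gamma.T @ s_y_inv @ gamma)[1])

def fit_envelope_basis(s_res, s_y, u, n_starts=10, seed=0):
    """Crude minimizer: optimize over an unconstrained r x u matrix and
    orthonormalize by QR, since the objective depends only on span(G)."""
    r = s_res.shape[0]
    s_y_inv = np.linalg.inv(s_y)
    rng = np.random.default_rng(seed)

    def obj(flat):
        q, _ = np.linalg.qr(flat.reshape(r, u))
        return envelope_objective(q, s_res, s_y_inv)

    best = None
    for _ in range(n_starts):          # random restarts: the problem is non-convex
        res = minimize(obj, rng.normal(size=r * u), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    q, _ = np.linalg.qr(best.x.reshape(r, u))
    return q                           # orthonormal basis estimate of the envelope
</syntaxhighlight>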

Let \hat{\mathcal{E}}_\Sigma(\mathcal{B}) denote the minimizer of this objective function and \hat{\Gamma} be an orthonormal basis of \hat{\mathcal{E}}_\Sigma(\mathcal{B}). Then \hat{\eta}=\hat{\Gamma}^T\hat{\beta}_{\text{OLS}}, where \hat{\beta}_{\text{OLS}}=\mathbb{Y}_C^T \mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1} denotes the ordinary least squares (OLS) estimator, \mathbb{Y}_C \in \mathbb{R}^{n \times r} is the centered response data matrix (the i^\text{th} row is the i^\text{th} observed response minus the sample mean \bar{\mathbb{Y}}), and \mathbb{X}\in \mathbb{R}^{n \times p} is the centered predictor data matrix. The envelope estimator of \beta is then \hat{\beta}_{\text{env}}=\hat{\Gamma}\hat{\eta}.

We see that the envelope estimator \hat{\beta}_{\text{env}} is the projection of the OLS estimator (i.e. the estimator under the standard model) onto the estimated envelope subspace, which removes the immaterial information. Since the immaterial information is identified and removed in subsequent analysis, the envelope estimator \hat{\beta}_{\text{env}} is potentially more efficient than \hat{\beta}_{\text{OLS}}. If u=r, then the envelope estimator reduces to the OLS estimator, i.e. \hat{\beta}_{\text{env}} = \hat{\beta}_{\text{OLS}}. For the full derivation, refer to Cook et al. (2010).
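Putting the pieces together, a minimal sketch of the whole estimation path might look as follows; it continues the previous sketch, whose fit_envelope_basis is assumed to be in scope.

<syntaxhighlight lang="python">
import numpy as np

def envelope_estimator(x, yc, u):
    """Envelope estimate of beta from centered data matrices x (n x p) and yc (n x r)."""
    n = x.shape[0]
    beta_ols = np.linalg.solve(x.T @ x, x.T @ yc).T    # r x p OLS estimate
    resid = yc - x @ beta_ols.T
    s_res = resid.T @ resid / n                        # S_{Y|X}
    s_y = yc.T @ yc / n                                # S_Y
    gamma_hat = fit_envelope_basis(s_res, s_y, u)      # from the sketch above
    eta_hat = gamma_hat.T @ beta_ols                   # coordinates w.r.t. gamma_hat
    return gamma_hat @ eta_hat                         # projection of the OLS estimate
</syntaxhighlight>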

Efficiency gains from envelope estimation

Cook et al. (2010) showed that, under the normality assumption, the asymptotic variance of \hat{\beta}_{\text{env}} satisfies

:\operatorname{avar}(\operatorname{vec}(\hat{\beta}_{\text{env}}))\leq \operatorname{avar}(\operatorname{vec}(\hat{\beta}_{\text{OLS}}))

where \operatorname{vec}(A) is the vectorization of the matrix A and \operatorname{avar}(\cdot) is the asymptotic variance; that is, if \sqrt{n}(T-\theta) \mathrel{\xrightarrow{\mathcal{D}}} N(0,A), then \operatorname{avar}(T)=A. This means that the envelope estimator performs at least as well as the OLS estimator in terms of asymptotic efficiency.

In particular, if \|\Omega\|\ll \|\Omega_0\|, where \|\cdot\| represents the matrix spectral norm, then the envelope estimator is expected to provide substantial efficiency gains. That is, if the immaterial part varies much more than the material part, then by identifying and removing the immaterial information we expect to realize a substantial efficiency gain (i.e. much smaller standard errors).
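The following Monte Carlo sketch illustrates this effect in a toy setting; for simplicity it projects onto the true envelope basis (an oracle), so it shows the best-case gain rather than the behavior of the fitted estimator. All values are illustrative.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
gamma = np.array([[1.0], [1.0]]) / np.sqrt(2)    # true envelope basis (u = 1)
gamma0 = np.array([[1.0], [-1.0]]) / np.sqrt(2)
omega, omega0 = 0.01, 25.0                       # ||Omega|| << ||Omega_0||
sigma = omega * gamma @ gamma.T + omega0 * gamma0 @ gamma0.T
beta = gamma.flatten()                           # beta = gamma * eta with eta = 1

ols, env = [], []
for _ in range(reps):
    x = rng.normal(size=n)                       # single centered predictor
    eps = rng.multivariate_normal(np.zeros(2), sigma, size=n)
    y = np.outer(x, beta) + eps
    b_ols = (x @ y) / (x @ x)                    # OLS coefficient, one predictor
    ols.append(b_ols)
    env.append(gamma @ (gamma.T @ b_ols))        # project out the immaterial part
print(np.std(ols, axis=0))                       # large spread: noise from Omega_0
print(np.std(env, axis=0))                       # much smaller spread
</syntaxhighlight>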

Choosing envelope dimension size

The dimension u of \mathcal{E}_{\Sigma}(\mathcal{B}) can be chosen by different methods such as likelihood ratio testing (LRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), or cross-validation.
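As one concrete recipe, an information criterion can compare the profile objective across candidate dimensions. The sketch below continues the earlier sketches (whose functions are assumed in scope) and uses a BIC-style penalty with N(u) from above; constants common to all u are dropped, so only relative values matter, and the exact criterion used in practice may differ.

<syntaxhighlight lang="python">
import numpy as np

def select_dimension(s_res, s_y, n, r, p):
    """BIC-style choice of u: n * objective + log(n) * N(u), up to constants."""
    s_y_inv = np.linalg.inv(s_y)
    best_u, best_bic = None, np.inf
    for u in range(1, r + 1):
        gamma = fit_envelope_basis(s_res, s_y, u)
        fit_term = n * envelope_objective(gamma, s_res, s_y_inv)
        n_params = r + u * p + r * (r + 1) / 2    # N(u) from the section above
        bic = fit_term + np.log(n) * n_params
        if bic < best_bic:
            best_u, best_bic = u, bic
    return best_u
</syntaxhighlight>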

Examples

=Comparing two population means=

To illustrate the efficiency gains possible with envelope models, consider the problem of estimating the means of two bivariate normal populations, Y_1 \sim N(\mu_1,\Sigma) and Y_2 \sim N(\mu_2,\Sigma). This can be studied in the envelope framework by taking the bivariate observation as the response vector Y and X as an indicator variable taking the value X=0 in the first population and X=1 in the second. We can parametrize in such a way that \alpha=\mu_1 is the mean of the first population and \beta=\mu_2-\mu_1 is the mean difference. Therefore, the multivariate linear model is

:Y=\mu_1+(\mu_2-\mu_1)X+\varepsilon=\alpha+\beta X+\varepsilon.

The standard estimator of \beta is the difference in the sample means for the two populations, \hat{\beta}_{\text{OLS}}=\bar{Y}_{2}-\bar{Y}_{1}, and the corresponding estimator of \Sigma is the pooled intra-sample covariance matrix. In the standard model, the estimator \hat{\beta}_{\text{OLS}} does not make use of the dependence between the responses and is equivalent to performing two univariate regressions of Y_j on X, j=1,2.
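A small simulation makes the setup concrete: with illustrative means and a shared covariance, the OLS estimator of \beta is just the difference in sample means.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 0.5])   # illustrative means
sigma = np.array([[1.0, 0.9], [0.9, 1.0]])              # shared covariance
y1 = rng.multivariate_normal(mu1, sigma, size=50)       # population with X = 0
y2 = rng.multivariate_normal(mu2, sigma, size=50)       # population with X = 1

beta_ols = y2.mean(axis=0) - y1.mean(axis=0)            # difference in sample means
</syntaxhighlight>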

[[File:Envelope_Estimation2.jpg|thumb|Standard analysis (left) versus envelope analysis (right) for comparing two population means.]]

In contrast, by considering the envelope model, we obtain a different estimator of \beta that accounts for the dependence structure of Y_{1} and Y_{2} and removes the information immaterial to the group difference. The envelope estimator \hat{\beta}_{\text{env}} has smaller standard errors than \hat{\beta}_{\text{OLS}}, indicating greatly improved estimation of \beta. These analyses are illustrated graphically in the figure. For further analysis, refer to Cook, R.D., [http://users.stat.umn.edu/~rdcook/Stat8931F15/EnvCh%201.pdf An Introduction to Envelope Models and Methods].

In the figure, without loss of generality, it is assumed that \mu_{1}+\mu_2=0, so \mathcal{B}=\operatorname{span}(\mu_2-\mu_1)=\operatorname{span}(\beta). The left panel illustrates the standard analysis. It directly projects the data Y onto the Y_2 axis, following the dashed line marked "A", and then proceeds with inference based on the resulting univariate sample. The curves along the horizontal axis in the left panel represent the projected distributions from the two populations; a standard analysis might construct a two-sample t-test from these projected samples. There is considerable overlap between the two projected distributions, so it may take a large sample size to infer that \beta_2\neq 0 (or \mu_1 \neq \mu_2) in a standard analysis, even though it is clear that the bivariate populations have different means. This illustration is based on \beta_{2}\neq 0 to facilitate visualization; the same conclusions could be reached using a different linear combination of the elements of \beta.

The maximum likelihood envelope estimator of \beta, \hat{\beta}_{\text{env}}, can instead be formed by first projecting the data onto \mathcal{E}_{\Sigma}(\mathcal{B}) to remove the immaterial information, and then projecting onto the horizontal axis, as shown by the paths marked "B" in the right panel. The two curves along the horizontal axis are the projected distributions from the envelope analysis. The two populations have been arranged so that they have equal distributions when projected onto \mathcal{E}_{\Sigma}(\mathcal{B})^{\bot}; that is, Q_{\mathcal{E}} Y\mid (X=0)\sim Q_{\mathcal{E}} Y\mid (X=1), where Q_{\mathcal{E}} denotes the projection onto \mathcal{E}_{\Sigma}(\mathcal{B})^{\bot}, but they have different distributions when projected onto \mathcal{E}_\Sigma(\mathcal{B}). The dashed direction is the immaterial part, lying in the orthogonal complement of the envelope subspace, and the solid direction is the material part, lying in the envelope subspace. From this figure, it is clear that the two population means \mu_1 and \mu_2 differ.

=Wheat protein data example=

Another example deals with wheat protein data. The dataset contains protein content measurements and the logarithms of near-infrared reflectance at six wavelengths across the range 1680–2310 nm, measured on each of n=50 samples of ground wheat. We take r=2 wavelengths as responses Y=(Y_1,Y_2)^T and convert the continuous measure of protein into a categorical predictor X indicating low and high protein (with sample sizes 24 and 26, respectively). The mean difference between the two protein groups corresponds to the parameter vector \beta in the standard model, with X a binary indicator: X=0 for high-protein and X=1 for low-protein wheat.

For this dataset, \hat{\beta}_{\text{OLS}}^{T}=(7.5,-2.1), with standard errors 8.6 and 9.5, respectively. These marginal results give no indication that Y depends on X, yet the likelihood ratio test statistic for \beta=0 has the value 27.5 on 2 degrees of freedom, indicating strong dependence. Under a standard analysis, the simultaneous occurrence of relatively small Z-scores and a relatively large likelihood ratio statistic suggests that an envelope analysis may offer advantages, although these conditions are certainly not necessary.

In particular, the envelope estimator is \hat{\beta}_\text{env}^T=(5.1,-4.7), with standard errors 0.51 and 0.46. The standard errors of the envelope estimates are dramatically lower than those of the OLS estimates, indicating substantial efficiency gains in estimation. To achieve a comparable reduction under the standard model, from a standard error of 9.5 down to 0.46, we would need a sample size of roughly n \approx 20{,}000.

Extensions

The envelope model has since been extended well beyond multivariate linear regression to many other areas in statistics. In 2011, Su and Cook introduced the partial envelope model,{{cite journal | title=Partial Envelopes for Efficient Estimation in Multivariate Linear Regression | last=Su | first=Z. |author2=Cook, R.D. | journal=Biometrika | volume=98 | year=2011 | pages=133–146 | issue=1}} which allows the envelope method to be applied to a subset of the predictors of interest.

In 2013, Cook et al. developed the predictor envelope,{{cite journal | title=Envelopes and Partial Least Squares Regression | last=Cook | first=R.D. |author2=Helland, I.S. |author3=Su, Z. | journal=Journal of the Royal Statistical Society: Series B (Statistical Methodology) | volume=75 | year=2013 | pages=851–877 | issue=5}} which applies the envelope method to the predictor space, X\in\mathbb{R}^p, where X is stochastic with mean \mu_{X} and covariance matrix \Sigma_{X}. In 2015, Cook et al. proposed the reduced-rank envelope model,{{cite journal | title=Envelopes and reduced-rank regression | last=Cook | first=R.D. |author2=Forzani, L. |author3=Zhang, X. | journal=Biometrika | volume=102 | year=2015 | pages=439–456 | issue=2}} which combines the strengths of reduced-rank regression and the envelope model. Cook and Zhang{{cite journal | title=Foundations for envelope models and methods | last=Cook | first=R.D. |author2=Zhang, X. | journal=Journal of the American Statistical Association | volume=110 | year=2015 | pages=599–611 | issue=510}} extended the envelope model beyond linear regression to settings such as weighted least squares, Cox regression, discriminant analysis, logistic regression, and Poisson regression.

The envelope model can also be applied in other contexts, such as variable selection, Bayesian inference, and high-dimensional data analysis. In 2016, Su et al. introduced the sparse envelope model,{{cite journal | title=Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression | last=Su | first=Z. |author2=Zhu, G. |author3=Chen, X. |author4=Yang, X. | journal=Biometrika | volume=103 | year=2016 | pages=579–593 | issue=3}} which performs variable selection on the response variables, identifying the responses for which the regression coefficients are zero. In 2017, Khare et al. developed the Bayesian envelope model,{{cite journal | title=A Bayesian Approach for Envelope Models | last=Khare | first=K. |author2=Pal, S. |author3=Su, Z. | journal=The Annals of Statistics | volume=45 | year=2017 | pages=196–222 | issue=1}} which allows prior information about the parameters to be incorporated into the analysis. The Bayesian envelope model is also applicable when the sample size is smaller than the number of responses.

Implementation of envelopes

The Matlab toolbox [https://github.com/emeryyi/envlp envlp] and the R package [http://www.stat.ufl.edu/~zhihuasu/styled-18/index.html Renvlp] implement a variety of envelope estimators under the framework of multivariate linear regression, including the envelope model, the partial envelope model, and the predictor envelope. Their capabilities include estimation of the model parameters as well as standard multivariate inference in the context of envelope models, such as hypothesis tests and the bootstrap. Examples and datasets are included to illustrate the use of each model.{{cite journal | title=envlp: A MATLAB Toolbox for Computing Envelope Estimators in Multivariate Linear Regression | last=Cook | first=R.D. |author2=Su, Z. |author3=Yang, Y. | journal=Journal of Statistical Software | volume=62 | year=2015 | pages=1–20 | issue=8}}

References

{{Reflist}}

{{CC-notice|cc=by3|url=https://www.jstatsoft.org/article/view/v062i08|author=R. Dennis Cook, Zhihua Su, Yi Yang}}

{{DEFAULTSORT:Envelope Model}}
