Centering matrix

In mathematics and multivariate statistics, the centering matrix (John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59) is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.

Definition

The centering matrix of size n is defined as the n-by-n matrix

: $C_n = I_n - \tfrac{1}{n}J_n$

where $I_n$ is the identity matrix of size n and $J_n$ is the n-by-n matrix of all ones.

For example

: $C_1 = \begin{bmatrix}
0 \end{bmatrix}$ ,

: $C_2 = \left[ \begin{array}{rr}
1 & 0 \\
0 & 1
\end{array} \right] - \frac{1}{2}\left[ \begin{array}{rr}
1 & 1 \\
1 & 1
\end{array} \right] = \left[ \begin{array}{rr}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & \frac{1}{2}
\end{array} \right]$ ,

: $C_3 = \left[ \begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array} \right] - \frac{1}{3}\left[ \begin{array}{rrr}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{array} \right]
= \left[ \begin{array}{rrr}
\frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & -\frac{1}{3} & \frac{2}{3}
\end{array} \right]$
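The definition translates directly into code. The following is a minimal sketch in plain Python, using exact rational arithmetic so that the entries match the fractions shown above; the function name `centering_matrix` and the list-of-rows representation are choices made here for illustration and are not part of the definition.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact Fraction entries."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    off_diagonal = Fraction(-1, n)   # every entry of -(1/n) J_n
    diagonal = 1 + off_diagonal      # 1 - 1/n on the main diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


# The three examples given above.
C1 = centering_matrix(1)
C2 = centering_matrix(2)
C3 = centering_matrix(3)
```

Exact fractions are used only to make the comparison with the worked examples unambiguous; floating-point entries would serve equally well in numerical work.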

Properties

Given a column vector $\mathbf{v}$ of size n, the centering property of $C_n$ can be expressed as

: $C_n\,\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v})J_{n,1}$

where $J_{n,1}$ is a column vector of ones and $\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v}$ is the mean of the components of $\mathbf{v}\,$ .
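As an illustration of the centering property, the sketch below multiplies an example vector by $C_n$ and compares the result with subtracting the mean directly. The helper names `centering_matrix` and `mat_vec` and the sample vector are assumptions introduced for this example.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n with exact Fraction entries."""
    off_diagonal = Fraction(-1, n)
    return [[1 + off_diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


v = [Fraction(x) for x in (2, 4, 9)]   # arbitrary example vector
n = len(v)
mean = sum(v) / n                      # (1/n) J_{n,1}^T v

centered_by_matrix = mat_vec(centering_matrix(n), v)   # C_n v
centered_directly = [x - mean for x in v]              # v - mean * J_{n,1}
```

Both computations give the same vector, and its components sum to zero. The direct subtraction needs a number of operations proportional to n, whereas the matrix product needs a number proportional to n squared.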

$C_n\,$ is symmetric positive semi-definite.

$C_n\,$ is idempotent, so that $C_n^k=C_n$ , for $k=1,2,\ldots$ . Once the mean has been removed, it is zero and removing it again has no effect.

$C_n\,$ is singular. The effects of applying the transformation $C_n\,\mathbf{v}$ cannot be reversed.

$C_n\,$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

$C_n\,$ has a nullspace of dimension 1, along the vector $J_{n,1}$ .
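The eigenvalue and nullspace statements can be checked directly from the definition. The short derivation below uses the identity $J_n = J_{n,1}J_{n,1}^\textrm{T}$, which is not stated above but holds because both sides are n-by-n matrices of ones. Applying $C_n$ to the vector of ones gives

: $C_n\,J_{n,1} = J_{n,1} - \tfrac{1}{n}J_{n,1}(J_{n,1}^\textrm{T}J_{n,1}) = J_{n,1} - \tfrac{1}{n}\,n\,J_{n,1} = \mathbf{0}$

so $J_{n,1}$ is an eigenvector with eigenvalue 0. For any vector $\mathbf{w}$ (a symbol introduced here for illustration) whose components sum to zero, $J_{n,1}^\textrm{T}\mathbf{w} = 0$ and therefore

: $C_n\,\mathbf{w} = \mathbf{w} - \tfrac{1}{n}J_{n,1}(J_{n,1}^\textrm{T}\mathbf{w}) = \mathbf{w}$

Such vectors form a subspace of dimension n − 1, which accounts for the eigenvalue 1 with multiplicity n − 1.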

$C_n$ is an orthogonal projection matrix. That is, $C_n\mathbf{v}$ is the projection of $\mathbf{v}$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by $J_{n,1}$ . (This is the subspace of all n-vectors whose components sum to zero.)

The trace of $C_n$ is $n(n-1)/n = n-1$ , since each of the n diagonal entries equals $(n-1)/n$ .
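Several of these properties can be verified numerically for a particular size. The following sketch checks symmetry, idempotence, the trace, and the fact that the vector of ones is mapped to zero, for the arbitrarily chosen size n = 4; the helper names are hypothetical and exact fractions are used so that equality tests are not affected by rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n with exact Fraction entries."""
    off_diagonal = Fraction(-1, n)
    return [[1 + off_diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]


n = 4
C = centering_matrix(n)

is_symmetric = C == transpose(C)           # C_n^T = C_n
is_idempotent = mat_mul(C, C) == C         # C_n C_n = C_n
trace = sum(C[i][i] for i in range(n))     # sum of the diagonal entries
ones_image = [sum(row) for row in C]       # C_n J_{n,1}: each row sum
```

A check for one value of n is of course not a proof; it only confirms that the stated properties hold in this instance.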

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix $X$ .

The left multiplication by $C_m$ subtracts a corresponding mean value from each of the n columns, so that each column of the product $C_m\,X$ has a zero mean. Similarly, the multiplication by $C_n$ on the right subtracts a corresponding mean value from each of the m rows, and each row of the product $X\,C_n$ has a zero mean.

Multiplication on both sides creates a doubly centered matrix $C_m\,X\,C_n$ , whose row and column means are all equal to zero.
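The three products described above can be sketched as follows for a small example matrix. The 2-by-3 matrix `X` and the helper functions are assumptions chosen for illustration; as noted earlier, in practice the means would normally be subtracted directly rather than through a matrix product.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n with exact Fraction entries."""
    off_diagonal = Fraction(-1, n)
    return [[1 + off_diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def column_sums(A):
    return [sum(col) for col in zip(*A)]


def row_sums(A):
    return [sum(row) for row in A]


# Example m-by-n matrix with m = 2 rows and n = 3 columns.
X = [[1, 2, 6],
     [3, 7, 8]]
m, n = len(X), len(X[0])

column_centered = mat_mul(centering_matrix(m), X)              # C_m X
row_centered = mat_mul(X, centering_matrix(n))                 # X C_n
doubly_centered = mat_mul(centering_matrix(m), row_centered)   # C_m X C_n
```

Since a zero sum is equivalent to a zero mean, `column_sums(column_centered)`, `row_sums(row_centered)`, and both sums of `doubly_centered` consist entirely of zeros.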

The centering matrix provides, in particular, a succinct way to express the scatter matrix, $S=(X-\mu J_{n,1}^{\mathrm{T}})(X-\mu J_{n,1}^{\mathrm{T}})^{\mathrm{T}}$ of a data sample $X$ whose n columns are the observations, where $\mu=\tfrac{1}{n}X J_{n,1}$ is the sample mean. Since $X\,C_n = X-\mu J_{n,1}^{\mathrm{T}}$ and $C_n$ is symmetric and idempotent, the scatter matrix can be written more compactly as

: $S=X\,C_n(X\,C_n)^{\mathrm{T}}=X\,C_n\,C_n\,X\,^{\mathrm{T}}=X\,C_n\,X\,^{\mathrm{T}}.$
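The equality of the two expressions for the scatter matrix can be checked on a small data set. In the sketch below, the 2-by-3 matrix `X` (three observations stored as columns) and all helper names are assumptions made for this example.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n with exact Fraction entries."""
    off_diagonal = Fraction(-1, n)
    return [[1 + off_diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]


# Each of the n = 3 columns is one observation of an m = 2 dimensional variable.
X = [[1, 2, 6],
     [3, 7, 8]]
n = len(X[0])

# Compact form: S = X C_n X^T.
S_compact = mat_mul(mat_mul(X, centering_matrix(n)), transpose(X))

# Direct form: subtract the sample mean from every column, then multiply out.
mu = [Fraction(sum(row), n) for row in X]                 # (1/n) X J_{n,1}
deviations = [[x - mu[i] for x in row] for i, row in enumerate(X)]
S_direct = mat_mul(deviations, transpose(deviations))
```

Both routes produce the same symmetric m-by-m matrix.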

$C_n$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $k=n$ , and $p_1=p_2=\cdots=p_n=\frac{1}{n}$ .
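This can be checked entry by entry. The derivation below assumes that the statement refers to a multinomial distribution with n categories and n trials; the symbol $N$ for the number of trials and $X_i$ for the count in category i are introduced here for clarity. For a multinomial distribution with $N$ trials and probabilities $p_1,\ldots,p_n$ , the variances and covariances of the counts are

: $\operatorname{Var}(X_i) = N p_i (1-p_i), \qquad \operatorname{Cov}(X_i, X_j) = -N p_i p_j \quad (i \neq j)$

Substituting $N=n$ and $p_i=\tfrac{1}{n}$ gives

: $\operatorname{Var}(X_i) = 1-\tfrac{1}{n}, \qquad \operatorname{Cov}(X_i, X_j) = -\tfrac{1}{n}$

which are exactly the diagonal and off-diagonal entries of $C_n = I_n - \tfrac{1}{n}J_n$ .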
