Centering matrix

In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component.


Definition

The centering matrix of size n is defined as the n-by-n matrix

\( C_n = I_n - \tfrac{1}{n}J \)

where \( I_n \) is the identity matrix of size n and \( J \) is the n-by-n matrix of all 1's. This can also be written as:

\( C_n = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top \)

where \( \mathbf{1} \) is the column-vector of n ones and where \( \top \) denotes matrix transpose.

For example

\( C_1 = \begin{bmatrix} 0 \end{bmatrix} , \)

\( C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} , \)

\( C_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & \tfrac{2}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{2}{3} \end{bmatrix} . \)
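The matrix is straightforward to construct numerically. Below is a minimal sketch in Python with NumPy; the helper name centering_matrix is ours, chosen purely for illustration:

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Return C_n = I_n - (1/n) J, where J is the n-by-n matrix of ones."""
    return np.eye(n) - np.ones((n, n)) / n

# Reproduces the examples above.
print(centering_matrix(2))   # [[ 0.5 -0.5], [-0.5  0.5]]
print(centering_matrix(3))   # 2/3 on the diagonal, -1/3 off the diagonal
```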

Properties

Given a column vector \( \mathbf{v} \) of size n, the centering property of \( C_n \) can be expressed as

\( C_n\,\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}\mathbf{1}^\top\mathbf{v})\mathbf{1} \)

where \( \tfrac{1}{n}\mathbf{1}^\top\mathbf{v} \) is the mean of the components of \( \mathbf{v} \).
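A quick numerical check of this property (a sketch; the sample vector is arbitrary):

```python
import numpy as np

v = np.array([4.0, 7.0, 1.0, 8.0])
n = v.size
C = np.eye(n) - np.ones((n, n)) / n

# Multiplying by C_n subtracts the mean of v from every component.
print(C @ v)          # [-1.  2. -4.  3.]
print(v - v.mean())   # same result
```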

\( C_n \) is symmetric positive semi-definite.

\( C_n \) is idempotent, so that \( C_n^k = C_n \) for \( k = 1, 2, \ldots \). Once the mean has been removed, it is zero and removing it again has no effect.

\( C_n \) is singular. The effects of applying the transformation \( C_n\,\mathbf{v} \) cannot be reversed.

\( C_n \) has the eigenvalue 1 of multiplicity n − 1 and the eigenvalue 0 of multiplicity 1.

\( C_n \) has a nullspace of dimension 1, spanned by the vector \( \mathbf{1} \).

\( C_n \) is a projection matrix. That is, \( C_n\mathbf{v} \) is the orthogonal projection of \( \mathbf{v} \) onto the (n − 1)-dimensional subspace orthogonal to the nullspace spanned by \( \mathbf{1} \). (This is the subspace of all n-vectors whose components sum to zero.)
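These properties can be verified numerically; a brief sketch follows (the size n = 5 is arbitrary):

```python
import numpy as np

n = 5
C = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(C, C.T))                 # True: symmetric
print(np.allclose(C @ C, C))               # True: idempotent, C^2 = C
print(np.linalg.matrix_rank(C))            # n - 1 = 4, so C is singular

# Eigenvalues: 0 once and 1 with multiplicity n - 1 (up to rounding).
print(np.linalg.eigvalsh(C))

# The nullspace is spanned by the all-ones vector.
print(np.allclose(C @ np.ones(n), 0.0))    # True
```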

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix \( X\,, \) the multiplication \( C_m\,X \) removes the means from each of the n columns, while \( X\,C_n \) removes the means from each of the m rows.
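A small sketch of both uses (the 2-by-3 data matrix is arbitrary):

```python
import numpy as np

X = np.array([[1.0, 2.0, 6.0],
              [3.0, 8.0, 4.0]])            # m = 2 rows, n = 3 columns
m, n = X.shape
C_m = np.eye(m) - np.ones((m, m)) / m
C_n = np.eye(n) - np.ones((n, n)) / n

print(C_m @ X)   # every column now sums to zero (column means removed)
print(X @ C_n)   # every row now sums to zero (row means removed)
```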

The centering matrix provides, in particular, a succinct way to express the scatter matrix, \( S = (X - \mu\mathbf{1}^\top)(X - \mu\mathbf{1}^\top)^\top \), of a data sample \( X \), where \( \mu = \tfrac{1}{n}X\mathbf{1} \) is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as

\( S = X\,C_n(X\,C_n)^\top = X\,C_n\,C_n\,X^\top = X\,C_n\,X^\top . \)
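The identity can be checked on random data; in the sketch below the rows of X are the variables and the columns are the n observations, matching the m-by-n convention above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))               # m = 3 variables, n = 10 observations
m, n = X.shape
C_n = np.eye(n) - np.ones((n, n)) / n

mu = (X @ np.ones(n) / n).reshape(-1, 1)   # sample mean of each row
S_direct = (X - mu) @ (X - mu).T           # (X - mu 1')(X - mu 1')'
S_center = X @ C_n @ X.T                   # X C_n X'

print(np.allclose(S_direct, S_center))     # True
```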

\( C_n \) is the covariance matrix of the multinomial distribution, in the special case where the number of trials and the number of categories are both equal to n and \( p_1 = p_2 = \cdots = p_n = \tfrac{1}{n} \).
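This can also be checked directly from the standard multinomial covariance formula \( \operatorname{Cov}(X_i, X_j) = N(p_i\delta_{ij} - p_i p_j) \); a sketch (the choice n = 4 is arbitrary):

```python
import numpy as np

n = 4
N = n                                      # number of trials equals n
p = np.full(n, 1.0 / n)                    # uniform category probabilities

# Multinomial covariance: N * (diag(p) - p p^T)
cov = N * (np.diag(p) - np.outer(p, p))

C_n = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(cov, C_n))               # True
```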

References

John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.
