
Corollary 12.7.4 Let F ∈ L (X,X). Then there exists a Hermitian W ∈ L (X,X) and a unitary matrix Q such that F = WQ, and there exists a Hermitian U ∈ L (X,X) and a unitary R such that F = RU.
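For a concrete illustration, here is a minimal numerical sketch (not part of the text; it assumes NumPy is available) that builds both factorizations in the corollary from the singular value decomposition F = PDV∗ and checks them on a random complex matrix.

```python
import numpy as np

# Minimal sketch of Corollary 12.7.4: polar factorizations built from the SVD.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

P, d, Vh = np.linalg.svd(F)          # F = P @ diag(d) @ Vh
D = np.diag(d)

Q = P @ Vh                           # unitary
W = P @ D @ P.conj().T               # Hermitian (in fact nonnegative), F = W Q
U = Vh.conj().T @ D @ Vh             # Hermitian (in fact nonnegative), F = R U
R = Q                                # the same unitary works on either side here

assert np.allclose(F, W @ Q) and np.allclose(F, R @ U)
assert np.allclose(W, W.conj().T) and np.allclose(U, U.conj().T)
assert np.allclose(Q @ Q.conj().T, np.eye(4))
```

Here R and Q coincide only because both were read off from the same singular value decomposition; the corollary itself does not assert that the two unitary factors are equal.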

This corollary has a fascinating relation to the question of whether a given linear transformation is normal. Recall that an n×n matrix A is normal if AA∗ = A∗A. Retain the same definition for an element of L (X,X).

Theorem 12.7.5 Let F ∈ L (X,X). Then F is normal if and only if, in Corollary 12.7.4, RU = UR and QW = WQ.

Proof: I will prove the statement about RU = UR and leave the other part as an exercise. First suppose that RU = UR and show F is normal. To begin with,

UR∗ = (RU)∗ = (UR)∗ = R∗U.

Therefore,

F∗F = UR∗RU = U²
FF∗ = RUUR∗ = URR∗U = U²

which shows F is normal.

Now suppose F is normal. Is RU = UR? Since F is normal,

FF∗ = RUUR∗ = RU²R∗
and
F∗F = UR∗RU = U².

Therefore, RU²R∗ = U², and both are nonnegative and self adjoint. Therefore, the square roots of both sides must be equal by the uniqueness part of the theorem on fractional powers. It follows that the square root of the first, RUR∗ (note that RUR∗ is nonnegative, self adjoint, and (RUR∗)² = RU(R∗R)UR∗ = RU²R∗), must equal the square root of the second, U. Therefore, RUR∗ = U and so RU = UR. This proves the theorem in one case. The other case, in which W and Q commute, is left as an exercise. ■
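Before moving on, here is an illustrative numerical check of the theorem (a sketch under the assumption that NumPy is available, not part of the text): build a normal matrix, form the factorization F = RU from its singular value decomposition as above, and verify that the two factors commute.

```python
import numpy as np

# Illustrative check of Theorem 12.7.5: for a normal F, the factors of F = R U commute.
rng = np.random.default_rng(1)

# Build a normal matrix F = Q Λ Q∗ with Q unitary and Λ a complex diagonal.
Qu, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
lam = rng.standard_normal(4) + 1j * rng.standard_normal(4)
F = Qu @ np.diag(lam) @ Qu.conj().T
assert np.allclose(F @ F.conj().T, F.conj().T @ F)   # F is normal

P, d, Vh = np.linalg.svd(F)
R = P @ Vh                                           # unitary
U = Vh.conj().T @ np.diag(d) @ Vh                    # Hermitian, F = R U
assert np.allclose(F, R @ U)
assert np.allclose(R @ U, U @ R)                     # the commutation of the theorem
```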

12.8 An Application to Statistics

A random vector is a function X : Ω → Rp where Ω is a probability space. This means that there exists a σ algebra of measurable sets F and a probability measure P : F → [0, 1]. In practice, people often don’t worry too much about the underlying probability space and instead pay more attention to the distribution measure of the random variable. For E a suitable subset of Rp, this measure gives the probability that X has values in E. There are often excellent reasons for believing that a random vector is normally distributed. This means that the probability that X has values in a set E is given by

∫_E 1/((2π)^(p/2) det(Σ)^(1/2)) exp(−(1/2)(x−m)∗ Σ⁻¹ (x−m)) dx
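As a quick sanity check on this formula, here is an illustrative sketch (not from the text; it assumes NumPy and SciPy are available, and the values of m and Σ are made up) comparing a direct evaluation of the integrand with SciPy's built-in multivariate normal density.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative check of the normal density formula (values are made up).
m = np.array([1.0, -2.0])              # mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])         # symmetric positive definite covariance
x = np.array([0.3, -1.5])              # point at which to evaluate the density

p = len(m)
diff = x - m
direct = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# SciPy evaluates the same density; the two numbers agree.
assert np.isclose(direct, multivariate_normal(mean=m, cov=Sigma).pdf(x))
```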

The expression in the integral is called the normal probability density function. There are two parameters, m and Σ, where m is called the mean and Σ is called the covariance matrix. It is a symmetric matrix whose eigenvalues are all real and positive. While it may be reasonable to assume this is the distribution, in general, you won’t know m and Σ, and in order to use this formula to predict anything, you would need to know these quantities. I