312 CHAPTER 12. SELF ADJOINT OPERATORS
am following a nice discussion given in Wikipedia which makes use of the existence of square roots.
What people do to estimate these is to take n independent observations x1, · · · ,xn and try to predict what m and Σ should be based on these observations. One criterion used for making this determination is the method of maximum likelihood. In this method, you seek to choose the two parameters in such a way as to maximize the likelihood, which is given as
$$\prod_{i=1}^{n}\frac{1}{\det\left(\Sigma\right)^{1/2}}\exp\left(-\frac{1}{2}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)\right).$$
For convenience the term $(2\pi)^{p/2}$ was ignored. Maximizing the above is equivalent to maximizing the ln of the above. So taking ln,
$$\frac{n}{2}\ln\left(\det\left(\Sigma^{-1}\right)\right)-\frac{1}{2}\sum_{i=1}^{n}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)$$
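As a quick numerical sanity check of this log likelihood, the following sketch (with synthetic data; all variable names here are illustrative, not from the text) compares the ln of the product, taken term by term, with the displayed expression; the two agree because the $(2\pi)^{p/2}$ factor is dropped from both.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50
x = rng.normal(size=(n, p))            # n observations x_1, ..., x_n
m = rng.normal(size=p)                 # a trial mean parameter
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)        # a positive definite covariance
Sinv = np.linalg.inv(Sigma)

# ln of the product, term by term: ln(det(Sigma)^{-1/2}) minus half the quadratic form
direct = sum(-0.5 * np.log(np.linalg.det(Sigma))
             - 0.5 * (xi - m) @ Sinv @ (xi - m) for xi in x)

# the displayed form: (n/2) ln det(Sigma^{-1}) - (1/2) sum of quadratic forms
expanded = (n / 2) * np.log(np.linalg.det(Sinv)) \
    - 0.5 * sum((xi - m) @ Sinv @ (xi - m) for xi in x)
```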
Note that this log likelihood is a function of the entries of m. Take the partial derivative with respect to $m_l$. Since the matrix $\Sigma^{-1}$ is symmetric, this implies
$$\sum_{i=1}^{n}\sum_{r}\left(x_{ir}-m_{r}\right)\Sigma_{rl}^{-1}=0\ \text{ each } l.$$
Written in terms of vectors,
$$\sum_{i=1}^{n}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}=0$$
and so, multiplying by Σ on the right and then taking adjoints, this yields
$$\sum_{i=1}^{n}\left(x_{i}-m\right)=0,\quad nm=\sum_{i=1}^{n}x_{i},\quad m=\frac{1}{n}\sum_{i=1}^{n}x_{i}\equiv \bar{x}.$$
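The conclusion is easy to verify numerically. In this sketch (synthetic data; the names are illustrative), the derivative $\sum_{i}\Sigma^{-1}(x_i - m)$ vanishes at $m = \bar{x}$, and $n\bar{x}$ equals $\sum_i x_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 40
x = rng.normal(size=(n, p))                    # n observations x_1, ..., x_n
A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

xbar = x.mean(axis=0)                          # m = (1/n) sum_i x_i
grad = sum(Sinv @ (xi - xbar) for xi in x)     # should be the zero vector
```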
Now that m is determined, it remains to find the best estimate for Σ. $(x_{i}-m)^{\ast}\Sigma^{-1}(x_{i}-m)$ is a scalar, so since $\operatorname{trace}\left(AB\right)=\operatorname{trace}\left(BA\right)$,
$$\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)=\operatorname{trace}\left(\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)\right)=\operatorname{trace}\left(\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\right)$$
Therefore, the thing to maximize is
$$n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\sum_{i=1}^{n}\operatorname{trace}\left(\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\right)=n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\operatorname{trace}\left(\overbrace{\left(\sum_{i=1}^{n}\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\right)}^{S}\Sigma^{-1}\right)$$
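Both trace facts used here can be checked numerically. The sketch below (synthetic data; the names are illustrative) verifies that each quadratic form equals the trace of the corresponding rank one matrix times $\Sigma^{-1}$, and that summing these traces gives $\operatorname{trace}(S\Sigma^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 10
x = rng.normal(size=(n, p))                    # n observations x_1, ..., x_n
m = x.mean(axis=0)                             # the estimate m = xbar from above
A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

# scalar quadratic forms (x_i-m)^* Sigma^{-1} (x_i-m)
quads = [(xi - m) @ Sinv @ (xi - m) for xi in x]
# the same numbers written as traces of (x_i-m)(x_i-m)^* Sigma^{-1}
traces = [np.trace(np.outer(xi - m, xi - m) @ Sinv) for xi in x]

S = sum(np.outer(xi - m, xi - m) for xi in x)  # the matrix S defined above
lhs = sum(traces)
rhs = np.trace(S @ Sinv)
```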
We assume that S has rank p. Thus it is a self adjoint matrix which has all positive eigenvalues. Therefore, from the property of the trace, the thing to maximize is
$$n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\operatorname{trace}\left(S^{1/2}\Sigma^{-1}S^{1/2}\right)$$
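This last use of the cyclic property of the trace can also be checked numerically. In the sketch below (synthetic data; the names are illustrative), $S^{1/2}$ is built from the spectral decomposition of the self adjoint matrix S, and $\operatorname{trace}(S\Sigma^{-1})=\operatorname{trace}(S^{1/2}\Sigma^{-1}S^{1/2})$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 20
x = rng.normal(size=(n, p))
xbar = x.mean(axis=0)
# S has rank p with probability one here since n > p
S = sum(np.outer(xi - xbar, xi - xbar) for xi in x)

w, Q = np.linalg.eigh(S)                 # S = Q diag(w) Q^T with w > 0
Shalf = Q @ np.diag(np.sqrt(w)) @ Q.T    # the square root S^{1/2}

A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

t1 = np.trace(S @ Sinv)
t2 = np.trace(Shalf @ Sinv @ Shalf)      # equal to t1 by cyclicity of the trace
```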