312 CHAPTER 12. SELF ADJOINT OPERATORS
am following a nice discussion given in Wikipedia which makes use of the existence of square roots.
What people do to estimate these is to take n independent observations x1, · · · ,xn and try to predict what m and Σ should be based on these observations. One criterion used for making this determination is the method of maximum likelihood. In this method, you seek to choose the two parameters in such a way as to maximize the likelihood, which is given as
$$\prod_{i=1}^{n}\frac{1}{\det\left(\Sigma\right)^{1/2}}\exp\left(-\frac{1}{2}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)\right).$$
For convenience the term $(2\pi)^{p/2}$ was ignored. Maximizing the above is equivalent to maximizing the ln of the above. So taking ln,
$$\frac{n}{2}\ln\left(\det\left(\Sigma^{-1}\right)\right)-\frac{1}{2}\sum_{i=1}^{n}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)$$
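As a quick numerical sanity check of this log likelihood, the following sketch (with synthetic data; all variable names here are illustrative, not from the text) compares the ln of the product, taken term by term, with the displayed expression; the two agree because the $(2\pi)^{p/2}$ factor is dropped from both.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50
x = rng.normal(size=(n, p))            # n observations x_1, ..., x_n
m = rng.normal(size=p)                 # a trial mean parameter
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)        # a positive definite covariance
Sinv = np.linalg.inv(Sigma)

# ln of the product, term by term: ln(det(Sigma)^{-1/2}) minus half the quadratic form
direct = sum(-0.5 * np.log(np.linalg.det(Sigma))
             - 0.5 * (xi - m) @ Sinv @ (xi - m) for xi in x)

# the displayed form: (n/2) ln det(Sigma^{-1}) - (1/2) sum of quadratic forms
expanded = (n / 2) * np.log(np.linalg.det(Sinv)) \
    - 0.5 * sum((xi - m) @ Sinv @ (xi - m) for xi in x)
```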
Note that this log likelihood is a function of the entries of m. Take the partial derivative with respect to $m_l$. Since the matrix $\Sigma^{-1}$ is symmetric, this implies
$$\sum_{i=1}^{n}\sum_{r}\left(x_{ir}-m_{r}\right)\Sigma_{rl}^{-1}=0\ \text{ each } l.$$
Written in terms of vectors,
$$\sum_{i=1}^{n}\left(x_{i}-m\right)^{\ast}\Sigma^{-1}=0$$
and so, multiplying by Σ on the right and then taking adjoints, this yields
$$\sum_{i=1}^{n}\left(x_{i}-m\right)=0,\quad nm=\sum_{i=1}^{n}x_{i},\quad m=\frac{1}{n}\sum_{i=1}^{n}x_{i}\equiv \bar{x}.$$
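The conclusion is easy to verify numerically. In this sketch (synthetic data; the names are illustrative), the derivative $\sum_{i}\Sigma^{-1}(x_i - m)$ vanishes at $m = \bar{x}$, and $n\bar{x}$ equals $\sum_i x_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 40
x = rng.normal(size=(n, p))                    # n observations x_1, ..., x_n
A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

xbar = x.mean(axis=0)                          # m = (1/n) sum_i x_i
grad = sum(Sinv @ (xi - xbar) for xi in x)     # should be the zero vector
```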
Now that m is determined, it remains to find the best estimate for Σ. $(x_{i}-m)^{\ast}\Sigma^{-1}(x_{i}-m)$ is a scalar, so since $\operatorname{trace}\left(AB\right)=\operatorname{trace}\left(BA\right)$,
$$\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)=\operatorname{trace}\left(\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\left(x_{i}-m\right)\right)=\operatorname{trace}\left(\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\right)$$
Therefore, the thing to maximize is
$$n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\sum_{i=1}^{n}\operatorname{trace}\left(\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\Sigma^{-1}\right)=n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\operatorname{trace}\left(\overbrace{\left(\sum_{i=1}^{n}\left(x_{i}-m\right)\left(x_{i}-m\right)^{\ast}\right)}^{S}\Sigma^{-1}\right)$$
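Both trace facts used here can be checked numerically. The sketch below (synthetic data; the names are illustrative) verifies that each quadratic form equals the trace of the corresponding rank one matrix times $\Sigma^{-1}$, and that summing these traces gives $\operatorname{trace}(S\Sigma^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 10
x = rng.normal(size=(n, p))                    # n observations x_1, ..., x_n
m = x.mean(axis=0)                             # the estimate m = xbar from above
A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

# scalar quadratic forms (x_i-m)^* Sigma^{-1} (x_i-m)
quads = [(xi - m) @ Sinv @ (xi - m) for xi in x]
# the same numbers written as traces of (x_i-m)(x_i-m)^* Sigma^{-1}
traces = [np.trace(np.outer(xi - m, xi - m) @ Sinv) for xi in x]

S = sum(np.outer(xi - m, xi - m) for xi in x)  # the matrix S defined above
lhs = sum(traces)
rhs = np.trace(S @ Sinv)
```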
We assume that S has rank p. Thus it is a self adjoint matrix which has all positive eigenvalues. Therefore, from the property of the trace, the thing to maximize is
$$n\ln\left(\det\left(\Sigma^{-1}\right)\right)-\operatorname{trace}\left(S^{1/2}\Sigma^{-1}S^{1/2}\right)$$
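This last use of the cyclic property of the trace can also be checked numerically. In the sketch below (synthetic data; the names are illustrative), $S^{1/2}$ is built from the spectral decomposition of the self adjoint matrix S, and $\operatorname{trace}(S\Sigma^{-1})=\operatorname{trace}(S^{1/2}\Sigma^{-1}S^{1/2})$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 20
x = rng.normal(size=(n, p))
xbar = x.mean(axis=0)
# S has rank p with probability one here since n > p
S = sum(np.outer(xi - xbar, xi - xbar) for xi in x)

w, Q = np.linalg.eigh(S)                 # S = Q diag(w) Q^T with w > 0
Shalf = Q @ np.diag(np.sqrt(w)) @ Q.T    # the square root S^{1/2}

A = rng.normal(size=(p, p))
Sinv = np.linalg.inv(A @ A.T + p * np.eye(p))  # stands in for Sigma^{-1}

t1 = np.trace(S @ Sinv)
t2 = np.trace(Shalf @ Sinv @ Shalf)      # equal to t1 by cyclicity of the trace
```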