316 CHAPTER 12. SELF ADJOINT OPERATORS
Why is the singular value decomposition important? It implies
A = U
(σ 0
0 0
)V ∗
where σ is the diagonal matrix having the singular values down the diagonal. Now sometimesA is a huge matrix, 1000×2000 or something like that. This happens in applications tosituations where the entries of A describe a picture. What also happens is that most of thesingular values are very small. What if you deleted those which were very small, say for alli ≥ l and got a new matrix
A′ ≡ U
(σ′ 0
0 0
)V ∗?
Then the entries of A′ would end up being close to the entries of A but there is much lessinformation to keep track of. This turns out to be very useful. More precisely, letting
σ =
σ1 0
. . .
0 σr
, U∗AV =
(σ 0
0 0
),
||A−A′||2F =
∣∣∣∣∣∣∣∣∣∣U(σ − σ′ 0
0 0
)V ∗
∣∣∣∣∣∣∣∣∣∣2
F
=
r∑k=l+1
σ2k
Thus A is approximated by A′ where A′ has rank l < r. In fact, it is also true that outof all matrices of rank l, this A′ is the one which is closest to A in the Frobenius norm.Thus A is approximated by A′ where A′ has rank l < r. In fact, it is also true that out ofall matrices of rank l, this A′ is the one which is closest to A in the Frobenius norm.
Here is roughly why this is so. Suppose B̃ approximates A =
(σr×r 0
0 0
)as well as
possible out of all matrices B̃ having rank no more than l < r the size of the matrix σr×r.Suppose the rank of B̃ is l. Then obviously no column xj of B̃ in a basis for the columnspace can have j > r since if so, the approximation of A could be improved by simply
making this column into a zero column. Therefore there are
(r
l
)choices for columns
for a basis for the column space of B̃. Suppose you pick the first l for instance. Thus thefirst column of B̃ should be σ1e1 to make the approximation up to the first column as goodas possible. Now consider approximating as well as possible up to the first two columns.Clearly the second column should be σ2e2 and in this way, the approximation up to thefirst two columns is exact. Continue this way till the lth column. Then since B̃ has rankl, all other columns should be zero columns since you cannot have a nonzero entry in anydiagonal position and keep the rank of B̃ only l. Then since it is desired to get the bestapproximation of A you wouldn’t want any off diagonal nonzero terms either. The squareof the error in doing this, picking the first l columns as a basis would be
∑rj=l+1 σ
2j . On the
other hand, if you picked other columns than the first l in the basis for the column spaceof B̃, you would have a larger error because you would include sums involving the largersingular values. Thus letting σ′ denote the l × l upper left corner of σ, B̃ should be of the