316 CHAPTER 12. SELF ADJOINT OPERATORS

Why is the singular value decomposition important? It implies
\[
A = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^{\ast}
\]
where σ is the diagonal matrix having the singular values down the diagonal. Now sometimes A is a huge matrix, 1000 × 2000 or something like that. This happens in applications to situations where the entries of A describe a picture. What also happens is that most of the singular values are very small. What if you deleted those which were very small, say for all i > l, and got a new matrix

\[
A' \equiv U \begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^{\ast}?
\]
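Numerically, this truncation is easy to carry out from a full SVD. The following NumPy sketch is illustrative only: the matrix size and the cutoff l = 3 are assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

l = 3                  # number of singular values to keep (assumed cutoff)
s_trunc = s.copy()
s_trunc[l:] = 0.0      # "delete" the small singular values

# A' = U (sigma' 0; 0 0) V*, a rank-l matrix close to A
A_prime = U @ np.diag(s_trunc) @ Vt
print(np.linalg.matrix_rank(A_prime))  # 3
```

Instead of all m·n entries of A, only U[:, :l], s[:l], and Vt[:l, :] need to be stored, i.e. l(m + n + 1) numbers, which is where the savings in "information to keep track of" comes from.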

Then the entries of A′ would end up being close to the entries of A, but there is much less information to keep track of. This turns out to be very useful. More precisely, letting

\[
\sigma = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_r \end{pmatrix}, \qquad
U^{\ast} A V = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix},
\]
\[
\left\Vert A - A' \right\Vert_F^2
= \left\Vert U \begin{pmatrix} \sigma - \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^{\ast} \right\Vert_F^2
= \sum_{k=l+1}^{r} \sigma_k^2.
\]

Thus A is approximated by A′ where A′ has rank l < r. In fact, it is also true that out of all matrices of rank l, this A′ is the one which is closest to A in the Frobenius norm.
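Both claims are easy to check numerically. In the sketch below, the sizes and the cutoff l = 2 are assumed for illustration: A′ is built from the truncated SVD, the squared Frobenius error matches the sum of the discarded σ² terms, and a handful of random rank-l competitors never come closer to A.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, l = 10, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_prime = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]   # rank-l truncation of A

# ||A - A'||_F^2 equals the sum of the discarded singular values squared.
err_sq = np.linalg.norm(A - A_prime, 'fro') ** 2
print(np.isclose(err_sq, np.sum(s[l:] ** 2)))      # True

# Optimality (Eckart-Young): no rank-l matrix does better in Frobenius norm.
for _ in range(100):
    B = rng.standard_normal((m, l)) @ rng.standard_normal((l, n))  # rank <= l
    assert np.linalg.norm(A - B, 'fro') ** 2 >= err_sq
```

The optimality check here is only a spot check against random rank-l matrices, not a proof; the argument sketched next in the text is what actually establishes it.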

Here is roughly why this is so. Suppose B̃ approximates
\[
A = \begin{pmatrix} \sigma_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}
\]
as well as possible out of all matrices B̃ having rank no more than l < r, the size of the matrix $\sigma_{r \times r}$. Suppose the rank of B̃ is l. Then obviously no column $x_j$ of B̃ in a basis for the column space can have j > r since if so, the approximation of A could be improved by simply making this column into a zero column. Therefore there are $\binom{r}{l}$ choices for columns

for a basis for the column space of B̃. Suppose you pick the first l for instance. Thus the first column of B̃ should be $\sigma_1 e_1$ to make the approximation up to the first column as good as possible. Now consider approximating as well as possible up to the first two columns. Clearly the second column should be $\sigma_2 e_2$, and in this way, the approximation up to the first two columns is exact. Continue this way till the $l^{th}$ column. Then since B̃ has rank l, all other columns should be zero columns, since you cannot have a nonzero entry in any diagonal position and keep the rank of B̃ only l. Then since it is desired to get the best approximation of A, you wouldn't want any off diagonal nonzero terms either. The square of the error in doing this, picking the first l columns as a basis, would be $\sum_{j=l+1}^{r} \sigma_j^2$. On the other hand, if you picked other columns than the first l in the basis for the column space of B̃, you would have a larger error because you would include sums involving the larger singular values. Thus, letting σ′ denote the l × l upper left corner of σ, B̃ should be of the
