the sum of the squares of the singular values of A.
Why is the singular value decomposition important? It implies
$$A = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^*$$
where σ is the diagonal matrix having the singular values down the diagonal. Now sometimes A is a huge
matrix, 1000×2000 or something like that. This happens in applications to situations where
the entries of A describe a picture. What also happens is that most of the singular values are
very small. What if you deleted those σ_i which were very small, say for all i > l, and got a new matrix
$$A' \equiv U \begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^* ,$$
where σ′ is the diagonal matrix retaining only σ_1, …, σ_l?
Then the entries of A′ would end up being close to the entries of A, but there is much less information to keep track of: only the first l columns of U and V and the first l singular values are needed to reconstruct A′. This turns out to be very useful. More precisely, letting
$$\|A - A'\|_F^2 = \left\| U \begin{pmatrix} \sigma - \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^* \right\|_F^2 = \sum_{k=l+1}^{r} \sigma_k^2$$
where σ′ is padded with zeros to the size of σ. (Multiplication by the unitary matrices U and V* does not change the Frobenius norm.)
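This identity can be checked numerically. The following is a minimal sketch using NumPy; the 6 × 8 random matrix and the cutoff l = 3 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))

# Full SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate: keep only the l largest singular values.
l = 3
A_prime = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]

# Squared Frobenius error equals the sum of the discarded sigma_k^2.
err_sq = np.linalg.norm(A - A_prime, 'fro') ** 2
tail_sq = np.sum(s[l:] ** 2)
print(np.isclose(err_sq, tail_sq))  # True
```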
Thus A is approximated by A^{′} where A^{′} has rank l < r. In fact, it is also true that out of all matrices of
rank l, this A^{′} is the one which is closest to A in the Frobenius norm. Here is roughly why this is so.
Suppose B̃ approximates
$$A = \begin{pmatrix} \sigma_{r\times r} & 0 \\ 0 & 0 \end{pmatrix}$$
as well as possible out of all matrices B̃ having rank no more than l < r, the size of the matrix σ_{r×r}. Suppose the rank of B̃ is l. Then obviously no column x_j of B̃ in a basis for the column space can have j > r, since if so, the approximation of A could be improved by simply making this column into a zero column. Therefore there are
$$\binom{r}{l}$$
choices for columns for a basis for the column space of B̃.
Let x be a column in the basis for the column space of B̃, and let it be column j in the matrix B̃. Denote the diagonal entry of this column by x_j = σ_j + h. Then the error incurred due to approximating with this column is
$$h^2 + \sum_{i \neq j} x_i^2 .$$
One obviously minimizes this error by letting h = 0 = x_i for all i ≠ j. That is, the column should have all zeroes except for σ_j in the diagonal position. As to any columns of B̃ which are not pivot columns, such a column is a linear combination of these basis columns, each of which has exactly one nonzero entry, in the diagonal position. These non pivot columns must have a 0 in the diagonal position, since if not, the rank of the matrix would be more than l. Then the off diagonal entries should equal zero to make the approximation as good as possible. Thus the non basis columns are columns consisting of zeros, and B̃ is a diagonal matrix with l nonzero diagonal entries selected from the first r diagonal entries of A. It only remains to observe that, since the singular values decrease in size from upper left to lower right in A, to minimize the error one should pick the first l columns for the basis for B̃, in order to use the sum of the squares of the smallest possible singular values in the error. That is, you would replace σ_{r×r} with the upper left l × l corner of σ_{r×r}.
$$A = \begin{pmatrix} \sigma_{r\times r} & 0 \\ 0 & 0 \end{pmatrix} \Rightarrow \tilde{B} = \begin{pmatrix} \sigma_{l\times l} & 0 \\ 0 & 0 \end{pmatrix}$$
For example, consider
$$\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} .$$
The best rank 2 approximation is
$$\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} .$$
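This example is easy to verify numerically; here is a sketch using NumPy, which truncates the SVD after the two largest singular values:

```python
import numpy as np

A = np.array([[3., 0., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 1., 0.]])

# Singular values come back in decreasing order: [3, 2, 1].
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The best rank 2 approximation keeps sigma_1 = 3 and sigma_2 = 2 only.
l = 2
B = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]
print(np.round(B, 10))
```

Up to rounding, B agrees with the matrix displayed above: the singular value 1 has been replaced by 0.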
Now suppose A is an m × n matrix. Let U,V be unitary and of the right size such that
$$U^* A V = \begin{pmatrix} \sigma_{r\times r} & 0 \\ 0 & 0 \end{pmatrix} .$$
Then suppose B approximates A as well as possible in the Frobenius norm, B having rank l < r. Then you
would want