200 CHAPTER 11. MATRICES AND THE INNER PRODUCT
11.6.2 Least SquaresSuppose there is no solution to the system Ax= b. This happens often in applicationswhere you want to find the best solution. In other words, you want to find x such that forall x̂,
|Ax−b| ≤ |Ax̂−b|It turns out that the solution to this problem is any solution x to
AT Ax= ATb
So this raises the question whether there is a solution to this last equation.In order to present this material using notation which is common in more general
situations, we begin to denote the dot product x ·y as (x,y). Thus, the property ofthe transpose mentioned above about how it interacts with the dot product is written as(Ax,y) =
(x,ATy
).
Theorem 11.6.7 Let A be a real m×n matrix and let b ∈ Rm. Then there exists a solutionx to the system
AT Ax= ATb
Proof: First note that(AT A
)T= AT A. Thus, by the Fredholm alternative, it suf-
fices to verify that ATb is in(
N(AT A
)T)⊥
. So suppose AT Az = 0. Does it follow that(z,ATb
)= 0? First note that N
(AT A
)= N (A) . To see this note that since any matrix times
the zero vector is zero, the left side is at least as large as the right. But if AT Ax= 0, then0 =
(AT Ax,x
)= (Ax,Ax) = |Ax|2 so Ax= 0. Hence the two sets are the same. Thus(
z,ATb)= (Az,b) = (0,b) = 0. By Fredholm alternative, it follows there exists a solution
to the above equation. ■Next we verify that any solution to this equation is a solution to the least squares prob-
lem of finding x such that Ax is as close as possible to b.
Theorem 11.6.8 |Ax−b| ≤ |Ax̂−b| for all x̂ in Rn if and only if AT Ax= ATb.
Proof: x is such that Ax is as close as possible to bif and only if |A(x+ tz)−b|2 isminimized when t = 0 for any choice of z. This equals
(Ax−b+ tAz,Ax−b+ tAz)
Now, expanding this yields
|Ax−b|2 +2t (Ax−b,Az)+ t2 |Az|2
If x solves the minimization problem, then taking a derivative and setting equal to 0 gives
0 = (Ax−b,Az) =(AT (Ax−b) ,z
)=(AT Ax−ATb,z
)for all z. In particular this holds for z = AT Ax−ATb. Hence AT Ax= ATb.
Conversely, if the equation holds, then 0 = (Ax−b,Az) =(AT (Ax−b) ,z
)and so for
any z,|A(x+ tz)−b|2 = |Ax−b|2 + t2 |Az|2
which shows that the minimization property holds since you could let t = 1 in the above.■