200 CHAPTER 11. MATRICES AND THE INNER PRODUCT

11.6.2 Least SquaresSuppose there is no solution to the system Ax= b. This happens often in applicationswhere you want to find the best solution. In other words, you want to find x such that forall x̂,

|Ax−b| ≤ |Ax̂−b|It turns out that the solution to this problem is any solution x to

AT Ax= ATb

So this raises the question whether there is a solution to this last equation.In order to present this material using notation which is common in more general

situations, we begin to denote the dot product x ·y as (x,y). Thus, the property ofthe transpose mentioned above about how it interacts with the dot product is written as(Ax,y) =

(x,ATy

).

Theorem 11.6.7 Let A be a real m×n matrix and let b ∈ Rm. Then there exists a solutionx to the system

AT Ax= ATb

Proof: First note that(AT A

)T= AT A. Thus, by the Fredholm alternative, it suf-

fices to verify that ATb is in(

N(AT A

)T)⊥

. So suppose AT Az = 0. Does it follow that(z,ATb

)= 0? First note that N

(AT A

)= N (A) . To see this note that since any matrix times

the zero vector is zero, the left side is at least as large as the right. But if AT Ax= 0, then0 =

(AT Ax,x

)= (Ax,Ax) = |Ax|2 so Ax= 0. Hence the two sets are the same. Thus(

z,ATb)= (Az,b) = (0,b) = 0. By Fredholm alternative, it follows there exists a solution

to the above equation. ■Next we verify that any solution to this equation is a solution to the least squares prob-

lem of finding x such that Ax is as close as possible to b.

Theorem 11.6.8 |Ax−b| ≤ |Ax̂−b| for all x̂ in Rn if and only if AT Ax= ATb.

Proof: x is such that Ax is as close as possible to bif and only if |A(x+ tz)−b|2 isminimized when t = 0 for any choice of z. This equals

(Ax−b+ tAz,Ax−b+ tAz)

Now, expanding this yields

|Ax−b|2 +2t (Ax−b,Az)+ t2 |Az|2

If x solves the minimization problem, then taking a derivative and setting equal to 0 gives

0 = (Ax−b,Az) =(AT (Ax−b) ,z

)=(AT Ax−ATb,z

)for all z. In particular this holds for z = AT Ax−ATb. Hence AT Ax= ATb.

Conversely, if the equation holds, then 0 = (Ax−b,Az) =(AT (Ax−b) ,z

)and so for

any z,|A(x+ tz)−b|2 = |Ax−b|2 + t2 |Az|2

which shows that the minimization property holds since you could let t = 1 in the above.■

200 CHAPTER 11. MATRICES AND THE INNER PRODUCT11.6.2 Least SquaresSuppose there is no solution to the system Ax = b. This happens often in applicationswhere you want to find the best solution. In other words, you want to find x such that forall z,|Ax — b| < |A#—b|It turns out that the solution to this problem is any solution z toA’Ax =A"bSo this raises the question whether there is a solution to this last equation.In order to present this material using notation which is common in more generalsituations, we begin to denote the dot product x-y as (x,y). Thus, the property ofthe transpose mentioned above about how it interacts with the dot product is written as(Ax, y) = (@,ATy).Theorem 11.6.7 Let A be a real m x n matrix and let b € R"™. Then there exists a solutionx to the systemA’Ax =A"bProof: First note that (ATA)! =A'™A. Thus, by the Fredholm alternative, it suf-afices to verify that A’ b is in (v (a7A)") . So suppose A?Az = 0. Does it follow that(z,A’b) = 0? First note that N (A’A) = N (A). To see this note that since any matrix timesthe zero vector is zero, the left side is at least as large as the right. But if A7Ax = 0, then0 = (A’Aax,a) = (Aw,Ax) = |Aa|” so Aa = 0. Hence the two sets are the same. Thus(z,A"b) = (Az, b) = (0,b) = 0. By Fredholm alternative, it follows there exists a solutionto the above equation. MiNext we verify that any solution to this equation is a solution to the least squares prob-lem of finding a such that Az is as close as possible to b.Theorem 11.6.8 |Ax — b| < |A#— | for all & in R" if and only if A’ Ax = A’ b.Proof: «x is such that Ax is as close as possible to bif and only if |A(a+tz) —6/° isminimized when t = 0 for any choice of z. This equals(Av —b+tAz,Axv—b+tAz)Now, expanding this yields|Aa — b|* + 2t (Aw —b,Az) +27 |Az|"If x solves the minimization problem, then taking a derivative and setting equal to 0 gives0 = (Ax —b,Az) = (A? (Aw —b),z) = (A’Ax —A’B,z)for all z. In particular this holds for z = A’ Aw — A’ b. Hence A? Ax = A’ b.Conversely, if the equation holds, then 0 = (Ax — b,Az) = (A’ (Aw — b) ,z) and so forany Z,|A (x +tz) —b|* = |Aw— bd)? +27 |Az|which shows that the minimization property holds since you could let t = | in the above.a