
Proof: Let $u_1 \equiv v_1/|v_1|$. Thus for $k = 1$, $\mathrm{span}(u_1) = \mathrm{span}(v_1)$ and $\{u_1\}$ is an orthonormal set. Now suppose for some $k < n$, $u_1, \cdots, u_k$ have been chosen such that $(u_j, u_l) = \delta_{jl}$ and $\mathrm{span}(v_1, \cdots, v_k) = \mathrm{span}(u_1, \cdots, u_k)$. Then define

$$u_{k+1} \equiv \frac{v_{k+1} - \sum_{j=1}^{k}(v_{k+1},u_j)\,u_j}{\left|\,v_{k+1} - \sum_{j=1}^{k}(v_{k+1},u_j)\,u_j\,\right|} \tag{11.9}$$

where the denominator is not equal to zero because the $v_j$ form a basis, and so
$$v_{k+1} \notin \mathrm{span}(v_1, \cdots, v_k) = \mathrm{span}(u_1, \cdots, u_k).$$

Thus by induction,

$$u_{k+1} \in \mathrm{span}(u_1, \cdots, u_k, v_{k+1}) = \mathrm{span}(v_1, \cdots, v_k, v_{k+1}).$$

Also, $v_{k+1} \in \mathrm{span}(u_1, \cdots, u_k, u_{k+1})$, which is seen easily by solving 11.9 for $v_{k+1}$, and it follows that
$$\mathrm{span}(v_1, \cdots, v_k, v_{k+1}) = \mathrm{span}(u_1, \cdots, u_k, u_{k+1}).$$

If $l \leq k$, then letting $C$ denote the reciprocal of the (nonzero) denominator in 11.9,
$$(u_{k+1},u_l) = C\Big((v_{k+1},u_l) - \sum_{j=1}^{k}(v_{k+1},u_j)(u_j,u_l)\Big) = C\Big((v_{k+1},u_l) - \sum_{j=1}^{k}(v_{k+1},u_j)\,\delta_{lj}\Big) = C\big((v_{k+1},u_l) - (v_{k+1},u_l)\big) = 0.$$

The vectors $\{u_j\}_{j=1}^{n}$ generated in this way are therefore orthonormal: they are mutually orthogonal by the computation above, and each has unit length by construction. ■
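The proof above is constructive, so it translates directly into an algorithm. The following is a minimal NumPy sketch of the process in 11.9; the function name gram_schmidt and the use of the standard dot product on $\mathbb{R}^n$ are illustrative choices rather than anything specified in the text.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors in R^n, following 11.9:
    subtract the components along the previously built u_j, then normalize."""
    us = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        for u in us:
            w -= np.dot(v, u) * u              # remove the (v, u_j) u_j component
        norm = np.linalg.norm(w)
        if norm == 0.0:                        # would mean the v_j were dependent
            raise ValueError("input vectors are linearly dependent")
        us.append(w / norm)                    # unit length, orthogonal to earlier u_j
    return us

# Example: an orthonormal basis for span{(1,1,0), (1,0,1)} in R^3.
u1, u2 = gram_schmidt([[1, 1, 0], [1, 0, 1]])
```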

Corollary 11.6.14 If you have a basis for $\mathbb{R}^p$, $\{u_1, \cdots, u_m, u_{m+1}, \cdots, u_p\}$, and $\{u_1, \cdots, u_m\}$ is orthonormal, then when the Gram Schmidt process is used on this basis, it returns $u_1, \cdots, u_m$ unchanged as the first $m$ vectors of the resulting orthonormal basis. Thus it is always possible to extend an orthonormal set of vectors to an orthonormal basis.

Proof: This follows right away from the algorithm: for $j \leq m$, the numerator in 11.9 applied to $u_j$ is $u_j - \sum_{i=1}^{j-1}(u_j,u_i)\,u_i = u_j$ because the $u_i$ are orthonormal, and $|u_j| = 1$, so 11.9 returns $u_j$ unchanged. ■

Did we ever use the fact that all of this is taking place in $\mathbb{R}^p$? No, this was never used at all! In fact, everything in the Gram Schmidt process holds if $V$ is a subspace of an arbitrary inner product space. All you need is a vector space equipped with an inner product, and everything works out exactly the same. A vector space is something in which you can add the "vectors" and multiply them by scalars in the usual way, just as we do for vectors in $\mathbb{R}^n$.
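As a concrete check of Corollary 11.6.14, one could run the gram_schmidt sketch above on a basis of $\mathbb{R}^3$ whose first vector is already a unit vector; the particular vectors are an arbitrary illustration, and the code assumes the earlier sketch is in scope.

```python
import numpy as np
# Assumes the gram_schmidt sketch above is available in scope.

# Extend the orthonormal set {e1} to an orthonormal basis of R^3:
# append vectors completing a basis, then apply Gram Schmidt.
e1 = np.array([1.0, 0.0, 0.0])
u1, u2, u3 = gram_schmidt([e1, [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])

assert np.allclose(u1, e1)                     # the orthonormal prefix is unchanged
assert np.isclose(np.dot(u2, u3), 0.0)         # and the extension is orthonormal
assert np.isclose(np.linalg.norm(u3), 1.0)
```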

Now return to the stated problem, which was to compute the closest point in $V$. This is the content of the next theorem.

Theorem 11.6.15 Let $V$ be an $m$ dimensional subspace of $\mathbb{R}^p$ having orthonormal basis $\{u_1, \cdots, u_m\}$. Let $b \in \mathbb{R}^p$ and let $y$ be the point of $V$ closest to $b$. Then

$$y = \sum_{k=1}^{m} (b, u_k)\, u_k \tag{11.10}$$
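Formula 11.10 computes the closest point directly from an orthonormal basis of $V$. The following is a small sketch of that computation; the name closest_point and the example vectors are illustrative assumptions, not from the text.

```python
import numpy as np

def closest_point(b, orthonormal_basis):
    """Point of V closest to b, per 11.10: sum of projections onto the u_k."""
    return sum(np.dot(b, u) * u for u in orthonormal_basis)

# Example: V = the xy-plane in R^3, with orthonormal basis {e1, e2}.
b = np.array([3.0, -1.0, 4.0])
y = closest_point(b, [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])
# y == (3, -1, 0); note that b - y = (0, 0, 4) is orthogonal to V.
```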
