17.7. EXERCISES 319

17.7.3 Proof Of The Chain Rule

As in the case of a function of one variable, it is important to consider the derivative of a composition of two functions, and, again as in the one-variable case, the resulting rule is called the chain rule. Its proof depends on the following fundamental lemma, and the proof will include the one-dimensional case. First let $M$ be a matrix and $v$ a vector of length 1. Then, since $|v_j| \le 1$ for each $j$,

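$$|Mv|^2 = \sum_i \Big(\sum_j M_{ij} v_j\Big)^2 \le \sum_i \Big(\sum_j |M_{ij}|\Big)^2 < \infty.$$

Thus $|Mv|$ is bounded over all vectors $v$ of length 1, by a constant depending only on $M$.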
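The estimate above can be checked numerically. The following is a minimal sketch, assuming NumPy. The matrix $M$ is an arbitrary random $3 \times 4$ matrix chosen only for illustration, and `unit_vector_bound` is a hypothetical helper name. The code compares the right side of the estimate with $|Mv|$ over many random unit vectors and with the exact supremum of $|Mv|$ over unit $v$, which is the largest singular value of $M$.

```python
import numpy as np

def unit_vector_bound(M):
    """Return sqrt(sum_i (sum_j |M_ij|)^2), the bound on |Mv| for |v| = 1."""
    row_abs_sums = np.sum(np.abs(M), axis=1)  # sum_j |M_ij| for each row i
    return np.sqrt(np.sum(row_abs_sums ** 2))

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 4))  # arbitrary 3x4 matrix, for illustration only

# Draw many random directions and normalize each one to length 1.
vs = rng.normal(size=(1000, 4))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)

# Row k of vs @ M.T is M v_k, so this is the largest |Mv| among the samples.
largest_sampled = np.max(np.linalg.norm(vs @ M.T, axis=1))

# The exact supremum of |Mv| over unit v is the largest singular value of M.
true_sup = np.linalg.norm(M, 2)

bound = unit_vector_bound(M)
print(largest_sampled, true_sup, bound)
```

The bound is crude. It always dominates the largest singular value because it dominates the Frobenius norm, but the only thing the proof needs is that it is finite.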

Here is the rough idea of the following lemma.

$$\frac{|o(g(x+v)-g(x))|}{|v|} = \overbrace{\frac{|o(g(x+v)-g(x))|}{|g(x+v)-g(x)|}}^{\to 0 \text{ as } v \to 0}\;\overbrace{\frac{|g(x+v)-g(x)|}{|v|}}^{\text{bounded}}$$

Lemma 17.7.7 Let $g : U \to \mathbb{R}^p$, where $U$ is an open set in $\mathbb{R}^n$, and suppose $g$ has a derivative at $x \in U$. Then $o(g(x+v)-g(x)) = o(v)$.

Proof: Let

$$H(v) \equiv \begin{cases} \dfrac{|o(g(x+v)-g(x))|}{|g(x+v)-g(x)|} & \text{if } g(x+v)-g(x) \neq 0, \\[2ex] 0 & \text{if } g(x+v)-g(x) = 0. \end{cases}$$

Then $\lim_{v \to 0} H(v) = 0$ because $g$ is continuous at $x$ (being differentiable there), so $g(x+v)-g(x) \to 0$ as $v \to 0$, and

$$\frac{|o(g(x+v)-g(x))|}{|v|} = H(v)\,\frac{|g(x+v)-g(x)|}{|v|}.$$

Also

$$\frac{|g(x+v)-g(x)|}{|v|} \le \frac{|Dg(x)v|}{|v|} + \frac{|o(v)|}{|v|} = \left|Dg(x)\left(\frac{v}{|v|}\right)\right| + \frac{|o(v)|}{|v|},$$

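which is bounded for small $v$. The first term is bounded by the estimate for $|Mv|$ given above, because $v/|v|$ has length 1, and the second term tends to 0 as $v \to 0$. Therefore,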

$$\lim_{v \to 0} \frac{|o(g(x+v)-g(x))|}{|v|} = 0.$$ ■
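The factorization used in the proof can be watched numerically. This is a minimal sketch, assuming NumPy. The map $g(x) = (\sin x_1, x_1 x_2)$, the choice $o(w) = |w|^2$ (a function with $|o(w)|/|w| = |w| \to 0$), the point $x$, and the direction $u$ are all hypothetical choices made only for illustration. For $v = t u$ with shrinking $t$, the code prints $H(v)$, the bounded quotient $|g(x+v)-g(x)|/|v|$, and their product, which is $|o(g(x+v)-g(x))|/|v|$.

```python
import numpy as np

def g(x):
    # Hypothetical differentiable map g: R^2 -> R^2, for illustration only.
    return np.array([np.sin(x[0]), x[0] * x[1]])

def o(w):
    # Hypothetical o(w) term: |w|^2, so |o(w)| / |w| = |w| -> 0 as w -> 0.
    return np.dot(w, w)

def lemma_factors(x, v):
    """Return (H(v), |g(x+v) - g(x)| / |v|) as in the proof of Lemma 17.7.7."""
    w = g(x + v) - g(x)
    nw = np.linalg.norm(w)
    H = abs(o(w)) / nw if nw != 0 else 0.0  # the case split in the definition of H
    return H, nw / np.linalg.norm(v)

x = np.array([0.3, -1.2])
u = np.array([0.6, 0.8])  # a fixed direction of length 1

rows = [(t, *lemma_factors(x, t * u)) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
for t, H, q in rows:
    print(f"|v| = {t:.0e}  H(v) = {H:.2e}  bounded factor = {q:.3f}  product = {H * q:.2e}")
```

As $|v|$ shrinks, $H(v)$ and the product decrease toward 0 while the middle column settles near $|Dg(x)u|$, matching the two labelled factors in the rough idea above.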

Recall the notation $f \circ g(x) \equiv f(g(x))$. Thus $f \circ g$ is the name of a function, and this function is defined by what was just written. The following theorem is known as the chain rule.

Theorem 17.7.8 (Chain rule) Let $U$ be an open set in $\mathbb{R}^n$, let $V$ be an open set in $\mathbb{R}^p$, let $g : U \to \mathbb{R}^p$ be such that $g(U) \subseteq V$, and let $f : V \to \mathbb{R}^q$. Suppose $Dg(x)$ exists for some $x \in U$ and that $Df(g(x))$ exists. Then $D(f \circ g)(x)$ exists and, furthermore,

$$D(f \circ g)(x) = Df(g(x))\,Dg(x). \tag{17.14}$$

In particular, if $y = g(x)$, so that $y_i = g_i(x)$, then

$$\frac{\partial (f \circ g)(x)}{\partial x_j} = \sum_{i=1}^{p} \frac{\partial f(g(x))}{\partial y_i}\,\frac{\partial g_i(x)}{\partial x_j}. \tag{17.15}$$
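Formulas (17.14) and (17.15) can be checked on a concrete example. Take the hypothetical maps $g(x) = (x_1 x_2,\ x_1 + x_2)$ and $f(y) = y_1^2 + y_2$, chosen only for illustration. Then $f \circ g(x) = x_1^2 x_2^2 + x_1 + x_2$, so by direct differentiation $D(f \circ g)(x) = (2x_1x_2^2 + 1,\ 2x_1^2x_2 + 1)$, which equals $(1.75, -1.25)$ at $x = (1.5, -0.5)$. The sketch below, assuming NumPy, forms the matrix product in (17.14) and the sum in (17.15). It then compares both against a central-difference approximation of $D(f \circ g)(x)$. The helper `numerical_jacobian` is a hypothetical name introduced here.

```python
import numpy as np

def g(x):
    # Example inner map g: R^2 -> R^2, g(x) = (x1*x2, x1 + x2).
    return np.array([x[0] * x[1], x[0] + x[1]])

def Dg(x):
    # Jacobian of g; row i holds the partial derivatives of g_i.
    return np.array([[x[1], x[0]],
                     [1.0, 1.0]])

def f(y):
    # Example outer map f: R^2 -> R, f(y) = y1^2 + y2, returned as a length-1 array.
    return np.array([y[0] ** 2 + y[1]])

def Df(y):
    # Jacobian of f: a 1x2 matrix.
    return np.array([[2.0 * y[0], 1.0]])

def numerical_jacobian(h, x, eps=1e-6):
    """Central-difference approximation to Dh(x); column j approximates dh/dx_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((h(x + e) - h(x - e)) / (2 * eps))
    return np.column_stack(cols)

x = np.array([1.5, -0.5])
J_f, J_g = Df(g(x)), Dg(x)

chain = J_f @ J_g  # matrix form (17.14)
# Component form (17.15): entry j is sum_i df/dy_i * dg_i/dx_j.
component = np.array([[sum(J_f[0, i] * J_g[i, j] for i in range(2)) for j in range(2)]])
direct = numerical_jacobian(lambda z: f(g(z)), x)  # D(f o g)(x) by finite differences

print(chain, component, direct)
```

The component form is simply the $(1, j)$ entry of the matrix product written out. Computing it both ways shows that (17.15) is (17.14) read one entry at a time.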
