Proof: From the definition of the derivative, f(x+v) − f(x) = Df(x)v + o(v). Let ∥v∥ be small enough that ∥o(v)∥/∥v∥ < 1, so that ∥o(v)∥ ≤ ∥v∥. Then for such v,
∥f(x+v)− f(x)∥ ≤ ∥Df(x)v∥+∥v∥ ≤ (∥Df(x)∥+1)∥v∥
This proves the lemma with K = ∥Df(x)∥ + 1. Recall the operator norm discussed in Definitions 2.8.4, 4.1.1.
The last assertion is implied by the first as follows. Define
h(v) ≡ o(∥f(x+v) − f(x)∥) / ∥f(x+v) − f(x)∥   if ∥f(x+v) − f(x)∥ ≠ 0,
h(v) ≡ 0   if ∥f(x+v) − f(x)∥ = 0.
Then h(v) → 0 as ∥v∥ → 0 from continuity of f at x, which is implied by the first part. Also, from the above estimate,
∥o(∥f(x+v) − f(x)∥)∥ / ∥v∥ = ∥h(v)∥ ∥f(x+v) − f(x)∥ / ∥v∥ ≤ ∥h(v)∥ (∥Df(x)∥ + 1)
Since h(v) → 0, the right side tends to 0 as ∥v∥ → 0.
This establishes the second claim. ■
Here ∥Df(x)∥ is the operator norm of the linear transformation Df(x). This will always be the case unless specified to be otherwise.
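The estimate ∥f(x+v) − f(x)∥ ≤ (∥Df(x)∥ + 1)∥v∥ from the lemma can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the map f, the point x0, and the sample directions are hypothetical choices that do not come from the text, and R^2 is given the max norm, for which the operator norm of a matrix is its largest absolute row sum. A finite sample of increments can only suggest that the bound holds; it does not prove it.

```python
import math

# Hypothetical example map f : R^2 -> R^2 (an assumption, not from the text).
def f(p):
    x, y = p
    return (x * x * y, math.sin(x) + y)

# Its derivative Df(p), written as the 2x2 Jacobian matrix.
def Df(p):
    x, y = p
    return ((2 * x * y, x * x), (math.cos(x), 1.0))

# Max norm on R^2.
def norm_inf(u):
    return max(abs(c) for c in u)

# Operator norm induced by the max norm: largest absolute row sum.
def op_norm_inf(A):
    return max(sum(abs(a) for a in row) for row in A)

x0 = (1.0, 2.0)                      # hypothetical base point
K = op_norm_inf(Df(x0)) + 1.0        # the constant K = ||Df(x)|| + 1 from the proof

# Ratio ||f(x0 + v) - f(x0)|| / ||v|| for a given nonzero increment v.
def ratio(v):
    fx = f(x0)
    fxv = f((x0[0] + v[0], x0[1] + v[1]))
    diff = (fxv[0] - fx[0], fxv[1] - fx[1])
    return norm_inf(diff) / norm_inf(v)

directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 0.5)]
scales = (1e-1, 1e-2, 1e-3)
worst = max(ratio((t * d[0], t * d[1])) for d in directions for t in scales)

print("K =", K)
print("largest sampled ratio =", worst)
print("bound holds on the sample:", worst <= K)
```

The lemma only guarantees the bound for ∥v∥ small enough, so the scales are kept small; larger increments may violate it for other choices of f.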
4.2 The Chain Rule
With the above lemma, it is easy to prove the chain rule.
Theorem 4.2.1 (The chain rule) Let U and V be open sets, U ⊆ X and V ⊆ Y. Suppose f : U → V is differentiable at x ∈ U and suppose g : V → Z is differentiable at f(x) ∈ V, where Z is a normed linear space. Then g◦ f is differentiable at x and
D(g◦ f)(x) = Dg(f(x))Df(x) .
Proof: This follows from a computation. Let B(x,r) ⊆ U and let r also be small enough that for ∥v∥ ≤ r, it follows that f(x+v) ∈ V. Such an r exists because f is continuous at x. For ∥v∥ < r, the definition of differentiability of g and f implies
g(f(x+v)) − g(f(x))
= Dg(f(x))(f(x+v) − f(x)) + o(f(x+v) − f(x))
= Dg(f(x))[Df(x)v + o(v)] + o(f(x+v) − f(x))
= Dg(f(x))Df(x)v + o(v) + o(f(x+v) − f(x))   (4.4)
= Dg(f(x))Df(x)v + o(v)
The last equality follows from Lemma 4.1.4, which gives o(f(x+v) − f(x)) = o(v). From the definition of the derivative, D(g◦ f)(x) exists and equals Dg(f(x))Df(x). ■
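As an informal check of the formula D(g◦ f)(x) = Dg(f(x))Df(x), one can compare the matrix product of the two Jacobians with a finite-difference approximation of the Jacobian of the composition. The sketch below assumes X = Y = Z = R^2; the maps f and g, the point x0, and the step size h are hypothetical choices made only for illustration, and the helper names are not from the text.

```python
import math

# Hypothetical maps f, g : R^2 -> R^2 with hand-computed Jacobians
# (illustrative assumptions, not taken from the text).
def f(p):
    x, y = p
    return (x * x * y, math.sin(x) + y)

def Df(p):
    x, y = p
    return [[2 * x * y, x * x], [math.cos(x), 1.0]]

def g(q):
    u, w = q
    return (u * w, u + w * w)

def Dg(q):
    u, w = q
    return [[w, u], [1.0, 2 * w]]

# Plain matrix product, so the example needs only the standard library.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Central-difference approximation of the Jacobian of F at p.
def numerical_jacobian(F, p, h=1e-6):
    n = len(p)
    cols = []
    for j in range(n):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        Fp, Fm = F(plus), F(minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    # cols[j][i] approximates dF_i/dx_j; transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

x0 = (1.0, 2.0)
chain = matmul(Dg(f(x0)), Df(x0))                      # Dg(f(x)) Df(x)
numeric = numerical_jacobian(lambda p: g(f(p)), x0)    # approximates D(g o f)(x)

max_err = max(abs(chain[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

The difference is limited by truncation and rounding error in the finite differences, so it should be small but not exactly zero.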