118 CHAPTER 6. MULTI-VARIABLE CALCULUS

Lemma 6.12.3 Let f be differentiable at x. Then f is continuous at x and, in fact, there exists K > 0 such that whenever |v| is small enough,

|f(x+v)− f(x)| ≤ K |v|

Proof: From the definition of the derivative, f(x+v) − f(x) = Df(x)v + o(v). Let |v| be small enough that

$$\frac{|o(v)|}{|v|} < 1,$$

so that |o(v)| ≤ |v|. Then for such v,

|f(x+v)− f(x)| ≤ |Df(x)v|+ |v|≤ (|Df(x)|+1) |v|

This proves the lemma with K = |Df(x)|+1.
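As an informal sanity check of Lemma 6.12.3 (not part of the proof), the sketch below samples small increments v and tests |f(x+v) − f(x)| ≤ K|v| with K = |Df(x)| + 1. The map f(x, y) = (x²y, sin x + y), the sampling radius, and the helper names are hypothetical choices for illustration. Because the Frobenius norm is never smaller than the operator norm, we use it for |Df(x)|; the resulting K is at least as large as the lemma's, so the bound should still hold.

```python
import math
import random

def f(p):
    """Hypothetical sample map f: R^2 -> R^2, f(x, y) = (x^2 y, sin x + y)."""
    x, y = p
    return (x * x * y, math.sin(x) + y)

def jacobian_f(p):
    """Exact Jacobian matrix Df(p) of the sample map, as a list of rows."""
    x, y = p
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(c * c for c in v))

def frobenius(m):
    """Frobenius norm of a matrix; it is an upper bound for the operator norm."""
    return math.sqrt(sum(a * a for row in m for a in row))

def lipschitz_bound_holds(p, radius, trials=1000, seed=0):
    """Test |f(p+v) - f(p)| <= K|v| on random v whose components lie in [-radius, radius]."""
    K = frobenius(jacobian_f(p)) + 1.0  # K = |Df(p)| + 1, with the Frobenius norm as |Df(p)|
    rng = random.Random(seed)
    fp = f(p)
    for _ in range(trials):
        v = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        fpv = f((p[0] + v[0], p[1] + v[1]))
        if norm((fpv[0] - fp[0], fpv[1] - fp[1])) > K * norm(v):
            return False
    return True

print(lipschitz_bound_holds((1.0, 2.0), radius=1e-3))  # expected: True
```

The "+1" plays the same role as in the proof: it absorbs the o(v) remainder, which is smaller than |v| once v is small.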

Theorem 6.12.4 (The chain rule) Let U and V be open sets, U ⊆ F^n and V ⊆ F^m. Suppose f : U → V is differentiable at x ∈ U and suppose g : V → F^q is differentiable at f(x) ∈ V. Then g ◦ f is differentiable at x and

D(g◦ f)(x) = Dg(f(x))Df(x) .
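The formula can be checked numerically in a concrete case. The sketch below multiplies the exact Jacobians Dg(f(x)) and Df(x) and compares the product with a central-difference estimate of D(g ◦ f)(x). The maps f(x, y) = (x²y, sin x + y) and g(u, w) = (uw, eᵘ − w), the step size 1e-6, and the helper names are hypothetical choices; agreement up to finite-difference error is evidence for the formula, not a proof of it.

```python
import math

def f(p):
    """Hypothetical inner map f: R^2 -> R^2, f(x, y) = (x^2 y, sin x + y)."""
    x, y = p
    return [x * x * y, math.sin(x) + y]

def Df(p):
    """Exact Jacobian of f at p (list of rows)."""
    x, y = p
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def g(q):
    """Hypothetical outer map g: R^2 -> R^2, g(u, w) = (u w, e^u - w)."""
    u, w = q
    return [u * w, math.exp(u) - w]

def Dg(q):
    """Exact Jacobian of g at q (list of rows)."""
    u, w = q
    return [[w, u],
            [math.exp(u), -1.0]]

def matmul(a, b):
    """Matrix product a b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numeric_jacobian(h, p, step=1e-6):
    """Central-difference Jacobian of h at p; column j approximates the derivative along e_j."""
    columns = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        columns.append([(a - b) / (2 * step) for a, b in zip(h(plus), h(minus))])
    return [list(row) for row in zip(*columns)]  # turn the list of columns into rows

def chain_rule_gap(p):
    """Largest entrywise difference between Dg(f(p)) Df(p) and a numerical D(g o f)(p)."""
    predicted = matmul(Dg(f(p)), Df(p))
    numeric = numeric_jacobian(lambda z: g(f(z)), p)
    return max(abs(a - b)
               for row_p, row_n in zip(predicted, numeric)
               for a, b in zip(row_p, row_n))

print(chain_rule_gap([0.5, 1.0]))  # small: on the order of finite-difference error
```

Note the order of the product: Dg(f(x)) is evaluated at f(x), not at x, and it multiplies Df(x) on the left, matching the composition g ◦ f.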

Proof: This follows from a computation. Let B(x,r) ⊆ U, and let r also be small enough that f(x+v) ∈ V whenever |v| ≤ r. Such an r exists because V is open and f is continuous at x. For |v| < r, the definitions of differentiability of g and f imply

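$$
\begin{aligned}
g(f(x+v)) - g(f(x)) &= Dg(f(x))\big(f(x+v)-f(x)\big) + o\big(f(x+v)-f(x)\big)\\
&= Dg(f(x))\big[Df(x)v + o(v)\big] + o\big(f(x+v)-f(x)\big)\\
&= Dg(f(x))Df(x)v + o(v) + o\big(f(x+v)-f(x)\big). \qquad (6.12.17)
\end{aligned}
$$

In the last step, Dg(f(x))o(v) = o(v) because the linear map Dg(f(x)) is bounded: |Dg(f(x))w| ≤ |Dg(f(x))| |w|.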

It remains to show that o(f(x+v) − f(x)) = o(v). Let ε > 0 and let K be as in Lemma 6.12.3. Since |f(x+v) − f(x)| ≤ K |v|, the increment f(x+v) − f(x) tends to 0 with v, so for |v| small enough,

|o(f(x+v)− f(x))| ≤ (ε/K) |f(x+v)− f(x)| ≤ (ε/K)K |v|= ε |v| .

Since ε > 0 is arbitrary, this shows o(f(x+v) − f(x)) = o(v), because whenever |v| is small enough,

$$\frac{\big|o\big(f(x+v)-f(x)\big)\big|}{|v|} \le \varepsilon.$$

By (6.12.17), this shows

g(f(x+v))−g(f(x)) = Dg(f(x))Df(x)v+o(v)

which proves the theorem.

The derivative is a linear transformation. What is the matrix of this linear transformation taken with respect to the usual basis vectors? Let e_i denote the vector of F^n which has a one in the ith entry and zeroes elsewhere. Then the matrix of the linear transformation is