
To see that this works, note that $f$ is defined everywhere and $|f(x,y)| \le |x|$, so $f(x,y) \to 0 = f(0,0)$ as $(x,y) \to (0,0)$; that is, $f$ is continuous at $(0,0)$. Also,

\[
\frac{f(x,0) - f(0,0)}{x} = \frac{0 - 0}{x} = 0, \qquad
\frac{f(0,y) - f(0,0)}{y} = \frac{0 - 0}{y} = 0,
\]

and so $f_x(0,0) = 0$ and $f_y(0,0) = 0$. Thus the partial derivatives exist. However, the function is not differentiable at $(0,0)$ because

\[
\lim_{(x,y) \to (0,0)} \frac{x \sin\left(\frac{1}{xy}\right)}{|(x,y)|}
\]

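does not even exist, much less equal 0. To see this, let $x = y$ and let $x \to 0$. For $x = y > 0$ the quotient equals $\sin(1/x^2)/\sqrt{2}$, which oscillates between $-1/\sqrt{2}$ and $1/\sqrt{2}$ as $x \to 0^+$.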
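The failure of this limit can also be seen numerically. The following is a minimal Python sketch, not part of the original text: the function name `quotient` and the sample points are illustrative choices. It evaluates the quotient from the limit above along the line $x = y$, at points chosen so that the sine is alternately $+1$ and $-1$.

```python
import math

def quotient(x, y):
    """The quotient x*sin(1/(x*y)) / |(x, y)| from the limit, for x*y != 0."""
    return x * math.sin(1.0 / (x * y)) / math.hypot(x, y)

# Along x = y > 0 the quotient reduces to sin(1/x^2) / sqrt(2).
# Pick x_k with 1/x_k^2 = pi/2 + k*pi, so the sine alternates between +1 and -1.
for k in range(1000, 1006):
    x = 1.0 / math.sqrt(math.pi / 2 + k * math.pi)
    print(f"x = {x:.6f}   quotient = {quotient(x, x):+.6f}")
```

The sample points shrink toward 0 while the printed quotient keeps alternating in sign with magnitude close to $1/\sqrt{2}$, so it cannot approach a single value.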

22.5 The Chain Rule

22.5.1 The Chain Rule for Functions of One Variable

First recall the chain rule for a function of one variable. Consider the following picture.

\[
I \xrightarrow{\;g\;} J \xrightarrow{\;f\;} \mathbb{R}
\]

Here $I$ and $J$ are open intervals and it is assumed that $g(I) \subseteq J$. The chain rule says that if $f'(g(x))$ exists and $g'(x)$ exists for $x \in I$, then the composition $f \circ g$ also has a derivative at $x$ and

\[
(f \circ g)'(x) = f'(g(x))\, g'(x).
\]

Recall that $f \circ g$ is the name of the function defined by $(f \circ g)(x) \equiv f(g(x))$. In the notation of this chapter, the chain rule is written as

\[
Df(g(x))\, Dg(x) = D(f \circ g)(x). \tag{22.9}
\]
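As a quick sanity check of the one-variable formula, here is a small Python sketch using central differences. The particular functions $g(x) = x^2 + 1$ and $f = \ln$, the point $x = 1.3$, and the helper name `derivative` are illustrative assumptions, not taken from the text.

```python
import math

def derivative(h, x, eps=1e-6):
    """Central-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def g(x):
    return x ** 2 + 1          # maps I = (0, 2) into J = (1, 5)

def f(y):
    return math.log(y)         # differentiable on J

def f_of_g(x):
    return f(g(x))

x = 1.3
lhs = derivative(f_of_g, x)                      # (f o g)'(x)
rhs = derivative(f, g(x)) * derivative(g, x)     # f'(g(x)) * g'(x)
exact = 2 * x / (x ** 2 + 1)                     # derivative of ln(x^2 + 1) by hand
print(lhs, rhs, exact)
```

All three printed numbers should agree up to the finite-difference error, which illustrates the formula at one point but does not prove it.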

22.5.2 The Chain Rule for Functions of Many Variables

Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^p$ be open sets and let $\mathbf{f}$ be a function defined on $V$ having values in $\mathbb{R}^q$ while $\mathbf{g}$ is a function defined on $U$ such that $\mathbf{g}(U) \subseteq V$, as in the following picture.

\[
U \xrightarrow{\;\mathbf{g}\;} V \xrightarrow{\;\mathbf{f}\;} \mathbb{R}^q
\]

The chain rule says that if the linear transformations (matrices) on the left in 22.9 both exist, then the same formula holds in this more general case. Thus

\[
D\mathbf{f}(\mathbf{g}(\mathbf{x}))\, D\mathbf{g}(\mathbf{x}) = D(\mathbf{f} \circ \mathbf{g})(\mathbf{x}).
\]

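Note that this all makes sense because $D\mathbf{f}(\mathbf{g}(\mathbf{x}))$ is a $q \times p$ matrix and $D\mathbf{g}(\mathbf{x})$ is a $p \times n$ matrix. Remember that a product $(q \times p)(p \times n)$ is defined because the middle numbers match; the result is $q \times n$, which is the right size for $D(\mathbf{f} \circ \mathbf{g})(\mathbf{x})$.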
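The matrix form of the chain rule, including the shape bookkeeping, can be checked numerically. The sketch below uses only the Python standard library and approximates each derivative matrix by central differences. The maps `g` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `f` (from $\mathbb{R}^3$ to $\mathbb{R}^2$), the point `x`, and the helper names `jacobian` and `matmul` are hypothetical choices made for illustration, so here $n = 2$, $p = 3$, $q = 2$.

```python
import math

def jacobian(F, x, eps=1e-6):
    """Central-difference approximation of DF(x), returned as a list of rows."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        # Column j holds the partial derivatives of every component w.r.t. x_j.
        cols.append([(a - b) / (2 * eps) for a, b in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]   # transpose columns into rows

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def g(x):   # R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f(y):   # R^3 -> R^2
    return [y[0] + y[1] * y[2], math.exp(y[0]) * y[2]]

def f_of_g(x):
    return f(g(x))

x = [0.7, -0.4]
Df = jacobian(f, g(x))       # q x p = 2 x 3
Dg = jacobian(g, x)          # p x n = 3 x 2
lhs = matmul(Df, Dg)         # q x n = 2 x 2
rhs = jacobian(f_of_g, x)    # q x n = 2 x 2
max_gap = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print(max_gap)
```

The printed gap between the two $2 \times 2$ matrices should be at the level of the finite-difference error.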

It turns out that the chain rule is an easy computation once you have the following lemma. The rough idea is as follows. Here $\mathbf{g}$ is differentiable at $\mathbf{x}$.

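\[
\frac{|\mathbf{o}(\mathbf{g}(\mathbf{x}+\mathbf{v}) - \mathbf{g}(\mathbf{x}))|}{|\mathbf{v}|}
=
\overbrace{\frac{|\mathbf{o}(\mathbf{g}(\mathbf{x}+\mathbf{v}) - \mathbf{g}(\mathbf{x}))|}{|\mathbf{g}(\mathbf{x}+\mathbf{v}) - \mathbf{g}(\mathbf{x})|}}^{\to 0 \text{ as } \mathbf{v} \to \mathbf{0}}
\;
\overbrace{\frac{|\mathbf{g}(\mathbf{x}+\mathbf{v}) - \mathbf{g}(\mathbf{x})|}{|\mathbf{v}|}}^{\text{bounded by 22.8}}
\]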