21.6.2 The Chain Rule For Functions Of Many Variables
Let U ⊆ ℝ^{n} and V ⊆ ℝ^{p} be open sets and let f be a function defined on V having values
in ℝ^{q} while g is a function defined on U such that g(U) ⊆ V as in the following
picture.
U --g--> V --f--> ℝ^{q}
The chain rule says that if the linear transformations (matrices) on the left in 21.8 both
exist then the same formula holds in this more general case. Thus
Df(g(x)) Dg(x) = D(f ∘ g)(x)
Note this all makes sense because Df(g(x)) is a q × p matrix and Dg(x) is a p × n
matrix. Remember it is all right to do (q × p)(p × n) because the middle numbers match,
and the product is a q × n matrix, the right size for D(f ∘ g)(x). More
precisely,
Theorem 21.6.1 (Chain rule) Let U be an open set in ℝ^{n}, let V be an open set in ℝ^{p}, let g : U → ℝ^{p} be such that g
There is an easy way to remember this in terms of the repeated index summation
convention presented earlier. Let y = g(x) and z = f(y). Then the above says
(∂z/∂y_i)(∂y_i/∂x_k) = ∂z/∂x_k. (21.11)
Remember there is a sum on the repeated index i. In particular, for each index r,
(∂z_r/∂y_i)(∂y_i/∂x_k) = ∂z_r/∂x_k.
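To make the summation convention concrete, it may help to write the sum out in full. The general form is shown first; the case p = 2 is chosen here only for illustration and is not taken from the text.

```latex
\[
\frac{\partial z_r}{\partial x_k}
  = \sum_{i=1}^{p} \frac{\partial z_r}{\partial y_i}\,
                   \frac{\partial y_i}{\partial x_k}
\]
% Illustrative special case p = 2, so i runs over 1 and 2:
\[
\frac{\partial z_r}{\partial x_k}
  = \frac{\partial z_r}{\partial y_1}\frac{\partial y_1}{\partial x_k}
  + \frac{\partial z_r}{\partial y_2}\frac{\partial y_2}{\partial x_k}
\]
```

The right side is exactly the (r, k) entry of the matrix product Df(g(x)) Dg(x), where row r of Df(g(x)) is multiplied by column k of Dg(x).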
The proof of this major theorem will be given later. It will include the chain rule for
functions of one variable as a special case. First here are some examples.
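As one small illustration, take n = 2, p = 2, q = 1. The functions g and f below are hypothetical choices made for this sketch, not examples from the text.

```latex
% Hypothetical maps chosen for illustration: g : R^2 -> R^2, f : R^2 -> R.
\[
g(x_1,x_2) = \begin{pmatrix} x_1 x_2 \\ x_1 + x_2 \end{pmatrix},
\qquad
f(y_1,y_2) = y_1^2 + y_2 .
\]
% The two derivative matrices (1 x 2 and 2 x 2):
\[
Df(y) = \begin{pmatrix} 2y_1 & 1 \end{pmatrix},
\qquad
Dg(x) = \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}.
\]
% Evaluate Df at y = g(x), so y_1 = x_1 x_2, then multiply:
\[
Df(g(x))\,Dg(x)
  = \begin{pmatrix} 2x_1x_2 & 1 \end{pmatrix}
    \begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}
  = \begin{pmatrix} 2x_1x_2^2 + 1 & 2x_1^2x_2 + 1 \end{pmatrix}.
\]
% Direct check: (f o g)(x) = x_1^2 x_2^2 + x_1 + x_2, whose derivative is
\[
D(f\circ g)(x)
  = \begin{pmatrix} 2x_1x_2^2 + 1 & 2x_1^2x_2 + 1 \end{pmatrix}.
\]
```

Both computations give the same 1 × 2 matrix, as the chain rule says they should.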
I hope that by now it is clear that all the information you could desire about
various partial derivatives is available and it all reduces to matrix multiplication
and the consideration of entries of the matrix obtained by multiplying the two
derivatives.
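The reduction to matrix multiplication can also be checked numerically. The following is a minimal sketch, not a definitive implementation: the maps `g` and `f`, the point `x0`, and the helpers `jacobian` and `matmul` are all hypothetical names introduced here, and the derivative matrices are only approximated by central differences.

```python
import math

# Hypothetical example maps (not from the text): g : R^2 -> R^3, f : R^3 -> R^2.
def g(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2, math.sin(x1)]

def f(y):
    y1, y2, y3 = y
    return [y1 * y2 + y3, y1 ** 2 - y3]

def jacobian(func, x, h=1e-6):
    """Approximate the derivative matrix of func at x by central differences."""
    rows, cols = len(func(x)), len(x)
    J = [[0.0] * cols for _ in range(rows)]
    for k in range(cols):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = func(xp), func(xm)
        for r in range(rows):
            # Entry (r, k) approximates the partial of component r w.r.t. x_k.
            J[r][k] = (fp[r] - fm[r]) / (2 * h)
    return J

def matmul(A, B):
    """Entry (r, k) is the sum over i of A[r][i] * B[i][k]."""
    return [[sum(A[r][i] * B[i][k] for i in range(len(B)))
             for k in range(len(B[0]))]
            for r in range(len(A))]

def compose(x):
    return f(g(x))

x0 = [0.7, -1.3]                 # arbitrary test point (an assumption)
Df = jacobian(f, g(x0))          # q x p = 2 x 3
Dg = jacobian(g, x0)             # p x n = 3 x 2
lhs = matmul(Df, Dg)             # (2 x 3)(3 x 2) = 2 x 2
rhs = jacobian(compose, x0)      # D(f o g)(x0), also 2 x 2

# Largest entrywise difference between the two sides of the chain rule.
max_gap = max(abs(lhs[r][k] - rhs[r][k]) for r in range(2) for k in range(2))
print(max_gap)
```

The printed gap should be tiny but not exactly zero, since finite differences only approximate the partial derivatives. Any single partial derivative ∂z_r/∂x_k can be read off as the entry `lhs[r][k]` (with zero-based indices).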