21.3 The Derivative Of Functions Of Many Variables
The way of thinking about the derivative in Theorem 21.1.3 is exactly what is needed to
define the derivative of a function of n variables. One can argue that it is also the right
way to define the derivative of a function of one variable in order to reduce confusion
later on.
As observed by Dieudonné, “...In the classical teaching of Calculus, this idea (that the derivative is a linear transformation) is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables...”
In fact, the derivative is a linear transformation and it is useless to pretend otherwise.
This is the main reason for including the introductory material on linear algebra in this
book.
Recall the following definition.
Definition 21.3.1 A function T which maps ℝ^{n} to ℝ^{p} is called a linear transformation if for every pair of scalars a, b and vectors x, y ∈ ℝ^{n}, it follows that

T(ax + by) = aT(x) + bT(y).
Recall from the properties of matrix multiplication that if A is a p × n matrix and x, y are vectors in ℝ^{n}, then

A(ax + by) = aA(x) + bA(y).

Thus you can define a linear transformation by multiplying by a matrix. Of course the simplest example is that of a 1 × 1 matrix or number. You can think of the number 3 as a linear transformation T mapping ℝ to ℝ according to the rule Tx = 3x. It satisfies the properties needed for a linear transformation because 3(ax + by) = a3x + b3y = aTx + bTy. The case of the
derivative of a scalar valued function of one variable is of this sort. You get a
number for the derivative. However, you can think of this number as a linear
transformation and this is the way you must think of it for a function of n
variables.
Definition 21.3.2 Let f : U → ℝ^{p} where U is an open set in ℝ^{n} for n, p ≥ 1 and let x ∈ U be given. Then f is defined to be differentiable at x ∈ U if and only if there exists a linear transformation T such that
f(x + h) = f(x) + Th + o(h).  (21.5)
The derivative of the function f, denoted by Df(x), is this linear transformation. Thus

f(x + h) = f(x) + Df(x)h + o(h)

If h = x − x_0, this takes the form

f(x) = f(x_0) + Df(x_0)(x − x_0) + o(x − x_0)
If you delete the o(x − x_0) term and consider the function of x given by what is left, the result is called the linear approximation to the function at the point x_0. In the case where x ∈ ℝ^{2} and f has values in ℝ, one can draw a picture to illustrate this.
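As a numerical illustration, one can check that the error of the linear approximation really does shrink faster than the step size, i.e. is o of the displacement. (This is a sketch; the sample function g and the point (x0, y0) below are my own illustrative choices, not from the text.)

```python
import math

# Sample scalar function of two variables (an illustrative choice):
# g(x, y) = x^2 * y + sin(y)
def g(x, y):
    return x**2 * y + math.sin(y)

# Its partial derivatives, computed by hand
def g_x(x, y):
    return 2 * x * y

def g_y(x, y):
    return x**2 + math.cos(y)

x0, y0 = 1.0, 0.5

def linear_approx(x, y):
    # f(x0) + Df(x0)(x - x0): the linear (tangent-plane) approximation
    return g(x0, y0) + g_x(x0, y0) * (x - x0) + g_y(x0, y0) * (y - y0)

# The error divided by the step size should tend to 0 (the error is o(h))
for h in (0.1, 0.01, 0.001):
    err = abs(g(x0 + h, y0 + h) - linear_approx(x0 + h, y0 + h))
    print(h, err / h)
```

Shrinking the step by a factor of 100 shrinks the ratio err/h by roughly the same factor, which is the numerical signature of an o(h) remainder.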
Of course the first and most obvious question is whether the linear transformation is unique. Otherwise, the derivative Df(x) would not be well defined.
Theorem 21.3.3 Suppose f is differentiable, as given above in (21.5). Then T is uniquely determined. Furthermore, the matrix of T is the following p × n matrix

( ∂f(x)/∂x_1  ⋯  ∂f(x)/∂x_n )

where

∂f(x)/∂x_i ≡ lim_{t→0} (f(x + te_i) − f(x))/t,

the i^{th} partial derivative of f.
Proof: Suppose T_{1} is another linear transformation which works. Thus, letting t be a
small positive real number,
f(x + th) = f(x) + T(th) + o(th)
f(x + th) = f(x) + T_1(th) + o(th)
Now o(th) = o(t) and so, subtracting these yields

T(th) − T_1(th) = o(t)
Divide both sides by t to obtain

Th − T_1h = o(t)/t
It follows on letting t → 0 that Th = T_{1}h. Since h is arbitrary, this shows that T = T_{1}.
Thus the derivative is well defined. So what is the matrix of this linear transformation?
From Theorem 18.2.4, this is the matrix whose i^{th} column is Te_{i}. However, from the
definition of T, letting t≠0,
(f(x + te_i) − f(x))/t = (1/t)(T(te_i) + o(te_i)) = T(e_i) + o(te_i)/t = T(e_i) + o(t)/t

Then letting t → 0, it follows that

Te_i = ∂f(x)/∂x_i

Recall from Theorem 18.2.4 this shows that the matrix of the linear transformation is as claimed. ■
Other notations which are often used for this matrix or the linear transformation are f′(x), J(x), and even ∂f/∂x or df/dx. Also, the above definition can now be written in the form

f(x + v) = f(x) + Σ_{j=1}^{n} (∂f(x)/∂x_j) v_j + o(v)

or

f(x + v) − f(x) = ( ∂f(x)/∂x_1  ⋯  ∂f(x)/∂x_n ) v + o(v)
Here is an example of a scalar valued nonlinear function.
Example 21.3.4 Suppose f(x, y) = √(xy). Find the approximate change in f if x goes from 1 to 1.01 and y goes from 4 to 3.99.

We can do this by noting that

f(1.01, 3.99) − f(1, 4) ≈ f_x(1, 4)(.01) + f_y(1, 4)(−.01)
                        = 1(.01) + (1/4)(−.01) = 7.5 × 10^{−3}.
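This estimate can be checked against the actual change of the function. (A quick numerical sketch; the function and variable names below are my own.)

```python
import math

def f(x, y):
    return math.sqrt(x * y)

# Partial derivatives of sqrt(xy) at (1, 4), computed by hand:
# f_x = y / (2 sqrt(xy)) = 1,  f_y = x / (2 sqrt(xy)) = 1/4
fx = 4 / (2 * math.sqrt(1 * 4))
fy = 1 / (2 * math.sqrt(1 * 4))

# Approximate change predicted by the linear approximation
df = fx * 0.01 + fy * (-0.01)      # 7.5e-3

# Actual change
actual = f(1.01, 3.99) - f(1, 4)
print(df, actual)
```

The actual change is about 7.46 × 10⁻³, so the linear approximation is accurate to roughly 4 × 10⁻⁵ here.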
Notation 21.3.5 When f is a scalar valued function of n variables, the following is often written to express the idea that a small change in f due to small changes in the variables can be expressed in the form

df(x) = f_{x_1}(x) dx_1 + ⋯ + f_{x_n}(x) dx_n

where the small change in x_i is denoted as dx_i. As explained above, df is the approximate change in the function f. Sometimes df is referred to as the differential of f.
Let f : U → ℝ^{q} where U is an open subset of ℝ^{p} and f is differentiable. It was just shown that

f(x + v) = f(x) + ( ∂f(x)/∂x_1  ⋯  ∂f(x)/∂x_p ) v + o(v).

Taking the i^{th} coordinate of the above equation yields

f_i(x + v) = f_i(x) + Σ_{j=1}^{p} (∂f_i(x)/∂x_j) v_j + o(v),

and it follows that the term with a sum is nothing more than the i^{th} component of J(x)v, where J(x) is the above q × p matrix of partial derivatives. To reiterate, the linear transformation which results by multiplication by this q × p matrix is known as the derivative.
Sometimes x,y,z is written instead of x_{1},x_{2}, and x_{3}. This is to save on notation and
is easier to write and to look at although it lacks generality. When this is done it is
understood that x = x_{1},y = x_{2}, and z = x_{3}. Thus the derivative is the linear
transformation determined by
( f_{1x}  f_{1y}  f_{1z} )
( f_{2x}  f_{2y}  f_{2z} ).
( f_{3x}  f_{3y}  f_{3z} )
Example 21.3.6 Let A be a constant m × n matrix and consider f(x) = Ax. Find Df(x) if it exists.

f(x + h) − f(x) = A(x + h) − A(x) = Ah = Ah + o(h).

In fact in this case, o(h) = 0. Therefore, Df(x) = A. Note that this looks the same as the case in one variable, f(x) = ax.
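Since Df(x) = A everywhere, a finite-difference Jacobian of x ↦ Ax should reproduce A up to rounding, regardless of the point at which it is computed. (A sketch; the sample matrix A and test point are arbitrary choices of mine.)

```python
# Sample 2x3 matrix A (an arbitrary illustrative choice)
A = [[1.0, 2.0, -1.0],
     [0.0, 3.0,  4.0]]

def f(x):
    # f(x) = Ax
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(2)]

def jacobian(f, x, h=1e-6):
    # Column j is the difference quotient (f(x + h e_j) - f(x)) / h
    fx = f(x)
    cols = []
    for j in range(len(x)):
        xh = list(x)
        xh[j] += h
        fxh = f(xh)
        cols.append([(fxh[i] - fx[i]) / h for i in range(len(fx))])
    # Assemble the columns into a matrix (transpose the list of columns)
    return [[cols[j][i] for j in range(len(x))] for i in range(len(fx))]

J = jacobian(f, [0.3, -0.7, 2.0])
print(J)   # agrees with A
```

Because f is linear, the difference quotients are exact apart from floating-point rounding; for a nonlinear f the same code would give an approximation to Df(x).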
Example 21.3.7 Let f(x, y, z) = xy + z^2x. Find Df(x, y, z).

Consider f(x + h, y + k, z + l) − f(x, y, z). This is something which is easily computed from the definition of the function. It equals
(x + h)(y + k) + (z + l)^2(x + h) − (xy + z^2x)

Multiply everything together and collect the terms. This yields

(z^2 + y)h + xk + 2zxl + (hk + 2zlh + l^2x + l^2h)
It follows easily that the last term, the group in parentheses, is o(h, k, l), and so the derivative of this function is the linear transformation coming from multiplication by the matrix

( z^2 + y   x   2zx )

It follows from this and the description of the derivative in terms of partial derivatives that f_x = z^2 + y, f_y = x, and f_z = 2zx. Of course you could compute these partial derivatives directly.
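The matrix found above can also be checked numerically against finite-difference partial derivatives. (A sketch; the check point (1, 2, 3) is an arbitrary choice of mine.)

```python
def f(x, y, z):
    return x * y + z**2 * x

def numeric_grad(x, y, z, h=1e-6):
    # Forward difference quotients approximating f_x, f_y, f_z
    return [(f(x + h, y, z) - f(x, y, z)) / h,
            (f(x, y + h, z) - f(x, y, z)) / h,
            (f(x, y, z + h) - f(x, y, z)) / h]

x, y, z = 1.0, 2.0, 3.0
exact = [z**2 + y, x, 2 * z * x]   # the row matrix (z^2 + y, x, 2zx)
approx = numeric_grad(x, y, z)
print(exact, approx)
```

At (1, 2, 3) the exact row is (11, 1, 6), and the difference quotients agree to within the step size.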
Given a function of many variables, how can you tell if it is differentiable? In other words, when you make the linear approximation, how can you tell easily that what is left over is o(v)? Sometimes you have to go directly to the definition and verify it is differentiable from the definition. For example, you may have seen the following important example in one variable calculus.
Example 21.3.8 Let

f(x) = { x^2 sin(1/x)  if x ≠ 0
       { 0             if x = 0

Find Df(0).
f(h) − f(0) = 0h + h^2 sin(1/h) = o(h),
and so Df(0) = 0. If you find the derivative for x ≠ 0, it is totally useless information if what you want is Df(0). This is because the derivative turns out to be discontinuous. Try it. Find the derivative for x ≠ 0 and try to obtain Df(0) from it. You see, in this example you had to revert to the definition to find the derivative.
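Both claims can be seen numerically: the difference quotient at 0 is h sin(1/h), which tends to 0, while the derivative for x ≠ 0, namely f′(x) = 2x sin(1/x) − cos(1/x) by the product and chain rules, keeps oscillating between values near −1 and 1 as x → 0 and so has no limit. (A sketch; the particular sample points are my own choices.)

```python
import math

def f(x):
    return x**2 * math.sin(1 / x) if x != 0 else 0.0

# Difference quotient at 0: (f(h) - f(0)) / h = h sin(1/h), so |quotient| <= h
for h in (1e-2, 1e-4, 1e-6):
    q = (f(h) - f(0)) / h
    print(h, q)

def fprime(x):
    # derivative for x != 0, by the product and chain rules
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Along x_n = 1/(2 pi n), cos(1/x_n) = 1, so f'(x_n) is near -1;
# along x_n = 1/((2n+1) pi), cos(1/x_n) = -1, so f'(x_n) is near +1.
n = 1000
a = fprime(1 / (2 * math.pi * n))
b = fprime(1 / ((2 * n + 1) * math.pi))
print(a, b)
```

Two sequences tending to 0 along which f′ approaches different values show f′ has no limit at 0, so no formula for f′ away from 0 can recover Df(0).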
It isn’t really too hard to use the definition even for more ordinary examples.
Example 21.3.9 Let

f(x, y) = ( x^2y + y^2 )
          (    y^3x    )

Find Df(1, 2).

First of all, note that the thing you are after is a 2 × 2 matrix.
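Computing the partial derivatives of each component by hand gives the rows (2xy, x^2 + 2y) and (y^3, 3y^2x), so at (1, 2) the matrix should be [[4, 5], [8, 12]]. A finite-difference check of this hand computation (the numerical check is my own sketch, not part of the text):

```python
def f(x, y):
    # f(x, y) = (x^2 y + y^2, y^3 x)
    return [x**2 * y + y**2, y**3 * x]

def jacobian(x, y, h=1e-6):
    # Columns are difference quotients in the x and y directions
    fx = f(x, y)
    col1 = [(a - b) / h for a, b in zip(f(x + h, y), fx)]
    col2 = [(a - b) / h for a, b in zip(f(x, y + h), fx)]
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

# Hand-computed partials: rows (2xy, x^2 + 2y) and (y^3, 3y^2 x) at (1, 2)
exact = [[2 * 1 * 2, 1**2 + 2 * 2], [2**3, 3 * 2**2 * 1]]
print(exact)
print(jacobian(1.0, 2.0))
```

The difference quotients reproduce the hand-computed 2 × 2 matrix to within the step size.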