12.3 The Derivative Of Functions Of Many Variables
The way of thinking about the derivative in Theorem 12.1.3 is exactly what is needed to define the
derivative of a function of n variables. Recall the following definition.
Definition 12.3.1 A function $T$ which maps $\mathbb{R}^{n}$ to $\mathbb{R}^{p}$ is called a linear transformation if for every pair of scalars $a,b$ and vectors $x,y\in\mathbb{R}^{n}$, it follows that
$$T(ax+by) = aT(x) + bT(y).$$
Recall that from the properties of matrix multiplication, if $A$ is a $p\times n$ matrix and $x,y$ are vectors in $\mathbb{R}^{n}$, then
$$A(ax+by) = aA(x) + bA(y).$$
Thus you can define a linear transformation by multiplying by a matrix. Of course the simplest example is that of a $1\times 1$ matrix, or number. You can think of the number 3 as a linear transformation $T$ mapping $\mathbb{R}$ to $\mathbb{R}$ according to the rule $Tx = 3x$. It satisfies the properties needed for a linear transformation because $3(ax+by) = a3x + b3y = aTx + bTy$. The derivative of a scalar valued function of one variable is of this sort. You get a number for the derivative; however, you can think of this number as a linear transformation. It might not be worth the fuss to think of it this way for a function of one variable, but this is the way you must think of it for a function of $n$ variables.
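As a quick numerical illustration of the fact that multiplication by a matrix is a linear transformation, the following minimal Python sketch checks $T(ax+by) = aT(x) + bT(y)$ for one $2\times 3$ matrix. The matrix $A$, the vectors $x, y$ and the scalars $a, b$ are arbitrary choices made for this sketch, not taken from the text; integer entries keep the arithmetic exact.

```python
# Minimal sketch: T(x) = A x, with A a 2 x 3 matrix, so T maps R^3 to R^2.
# A, x, y, a, b are arbitrary illustrative choices.

def mat_vec(A, x):
    """Return the product A x, where A is given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def lin_comb(a, x, b, y):
    """Return the vector a*x + b*y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

A = [[1, 2, 0],
     [3, -1, 4]]
x = [1, 0, 2]
y = [-2, 5, 1]
a, b = 3, -2

lhs = mat_vec(A, lin_comb(a, x, b, y))              # T(ax + by)
rhs = lin_comb(a, mat_vec(A, x), b, mat_vec(A, y))  # aT(x) + bT(y)
print(lhs, rhs)  # both are [-13, 47]
```

Any other choice of integer $A$, $x$, $y$, $a$, $b$ would give equal sides as well; a single check like this only illustrates the identity, it does not prove it.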
Definition 12.3.2 Let $f : U\to\mathbb{R}^{p}$ where $U$ is an open set in $\mathbb{R}^{n}$ for $n,p\geq 1$, and let $x\in U$ be given. Then $f$ is defined to be differentiable at $x\in U$ if and only if there exists a linear transformation $T$ such that
$$f(x+h) = f(x) + Th + o(h). \tag{12.5}$$
The derivative of the function $f$, denoted by $Df(x)$, is this linear transformation. Thus
$$f(x+h) = f(x) + Df(x)h + o(h).$$
If $h = x - x_{0}$, this takes the form
$$f(x) = f(x_{0}) + Df(x_{0})(x-x_{0}) + o(x-x_{0}).$$
If you delete the $o(x-x_{0})$ term and consider the function of $x$ given by what is left, namely $x\mapsto f(x_{0}) + Df(x_{0})(x-x_{0})$, this is called the linear approximation to the function at the point $x_{0}$. In the case where $x\in\mathbb{R}^{2}$ and $f$ has values in $\mathbb{R}$, one can draw a picture to illustrate this.
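The defining property $f(x+h) = f(x) + Df(x)h + o(h)$ says that the leftover, divided by $|h|$, tends to $0$ as $h\to 0$. The Python sketch below illustrates this numerically. The function $f(x,y) = x^{2}y$, its hand-computed derivative $(2xy,\ x^{2})$, the point $(1,2)$ and the direction $h = (t,t)$ are all assumptions chosen for the illustration. For this choice the ratio works out to $(4t + t^{2})/\sqrt{2}$.

```python
import math

def f(x, y):
    # Illustrative scalar function (an assumption, not from the text): f(x, y) = x^2 y
    return x * x * y

def Df(x, y):
    # Its derivative as a row vector, computed by hand: (2xy, x^2)
    return (2 * x * y, x * x)

def remainder_ratio(x0, y0, h1, h2):
    """Return |f(x0 + h) - f(x0) - Df(x0) h| / |h| for h = (h1, h2)."""
    g1, g2 = Df(x0, y0)
    linear = f(x0, y0) + g1 * h1 + g2 * h2    # the linear approximation at x0 + h
    return abs(f(x0 + h1, y0 + h2) - linear) / math.hypot(h1, h2)

# Shrink h = (t, t) toward 0 and watch the ratio shrink with it.
ratios = [remainder_ratio(1.0, 2.0, t, t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
for r in ratios:
    print(r)
```

A numerical table like this cannot prove differentiability; it only shows what the $o(h)$ condition looks like in practice.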
Of course the first and most obvious question is whether the linear transformation is unique. Otherwise, the definition of the derivative $Df(x)$ would not be well defined.
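Theorem 12.3.3 Suppose $f$ is differentiable, as given above in (12.5). Then $T$ is uniquely determined. Furthermore, the matrix of $T$ is the following $p\times n$ matrix
$$\begin{pmatrix} \dfrac{\partial f}{\partial x_{1}}(x) & \cdots & \dfrac{\partial f}{\partial x_{n}}(x) \end{pmatrix}$$
where
$$\frac{\partial f}{\partial x_{i}}(x) \equiv \lim_{t\to 0}\frac{f(x+te_{i}) - f(x)}{t},$$
the $i^{th}$ partial derivative of $f$.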
Proof: Suppose $T_{1}$ is another linear transformation which works. Thus, letting $t$ be a small positive real number,
$$f(x+th) = f(x) + T(th) + o(th),$$
$$f(x+th) = f(x) + T_{1}(th) + o(th).$$
Now, for fixed $h$, $o(th) = o(t)$, and so, subtracting these yields
$$T(th) - T_{1}(th) = o(t).$$
Divide both sides by $t$ and use linearity to obtain
$$Th - T_{1}h = \frac{o(t)}{t}.$$
It follows on letting $t\to 0$ that $Th = T_{1}h$. Since $h$ is arbitrary, this shows that $T = T_{1}$. Thus the derivative is well defined. So what is the matrix of this linear transformation? From Theorem 5.1.2, this is the matrix whose $i^{th}$ column is $Te_{i}$. However, from the definition of $T$, letting $t\neq 0$,
$$\frac{f(x+te_{i}) - f(x)}{t} = \frac{1}{t}\bigl(T(te_{i}) + o(te_{i})\bigr) = T(e_{i}) + \frac{o(te_{i})}{t} = T(e_{i}) + \frac{o(t)}{t}.$$
Then letting $t\to 0$, it follows that
$$Te_{i} = \frac{\partial f}{\partial x_{i}}(x).$$
Recall from Theorem 5.1.2 that this shows the matrix of the linear transformation is as claimed.
■
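The proof identifies the $i^{th}$ column of the matrix of $T$ as the limit of $\bigl(f(x+te_{i}) - f(x)\bigr)/t$. Freezing $t$ at a small nonzero value gives the usual forward-difference approximation of that matrix. The following Python sketch does exactly this. The map $f(x,y) = (xy,\ x+y^{2})$, the point $(1,2)$ and the step $t = 10^{-6}$ are assumptions for the illustration. The exact matrix there, $\begin{pmatrix} y & x\\ 1 & 2y\end{pmatrix} = \begin{pmatrix} 2 & 1\\ 1 & 4\end{pmatrix}$, is shown for comparison.

```python
def f(v):
    # Illustrative map R^2 -> R^2 (an assumption for this sketch): f(x, y) = (xy, x + y^2)
    x, y = v
    return [x * y, x + y * y]

def jacobian_fd(f, x, t=1e-6):
    """Approximate the p x n matrix whose i-th column is (f(x + t e_i) - f(x)) / t."""
    fx = f(x)
    n, p = len(x), len(fx)
    J = [[0.0] * n for _ in range(p)]
    for i in range(n):
        x_step = list(x)
        x_step[i] += t                             # the point x + t e_i
        f_step = f(x_step)
        for k in range(p):
            J[k][i] = (f_step[k] - fx[k]) / t      # row k of the i-th column
    return J

J = jacobian_fd(f, [1.0, 2.0])
print(J)  # close to the exact matrix [[2, 1], [1, 4]]
```

The fixed step is a design compromise: a smaller $t$ reduces the $o(t)/t$ error but increases floating-point cancellation in the numerator.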
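Other notations which are often used for this matrix or the linear transformation are $f'(x)$, $J(x)$, and even $\dfrac{\partial f}{\partial x}$ or $\dfrac{df}{dx}$. Also, the above definition can now be written in the form
$$f(x+v) = f(x) + \sum_{j=1}^{n}\frac{\partial f(x)}{\partial x_{j}}v_{j} + o(v)$$
or
$$f(x+v) - f(x) = \begin{pmatrix} \dfrac{\partial f(x)}{\partial x_{1}} & \cdots & \dfrac{\partial f(x)}{\partial x_{n}} \end{pmatrix} v + o(v).$$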
Here is an example of a scalar valued nonlinear function.
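Example 12.3.4 Suppose $f(x,y) = \sqrt{xy}$. Find the approximate change in $f$ if $x$ goes from 1 to 1.01 and $y$ goes from 4 to 3.99.

We can do this by noting that $f_{x}(x,y) = \frac{1}{2}\sqrt{y/x}$ and $f_{y}(x,y) = \frac{1}{2}\sqrt{x/y}$, so that
$$f(1.01,3.99) - f(1,4) \approx f_{x}(1,4)(0.01) + f_{y}(1,4)(-0.01) = 1(0.01) + \frac{1}{4}(-0.01) = 7.5\times 10^{-3}.$$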
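To see how good this approximation is, the short Python sketch below evaluates both the linear estimate from Example 12.3.4 and the exact change $\sqrt{1.01\cdot 3.99} - \sqrt{1\cdot 4}$. The partial derivatives are the hand-computed ones above; the function names are only for illustration.

```python
import math

def f(x, y):
    return math.sqrt(x * y)

# Partial derivatives of sqrt(xy), computed by hand (valid for x, y > 0):
# f_x = (1/2) sqrt(y/x),  f_y = (1/2) sqrt(x/y)
def f_x(x, y):
    return 0.5 * math.sqrt(y / x)

def f_y(x, y):
    return 0.5 * math.sqrt(x / y)

dx, dy = 0.01, -0.01
approx = f_x(1, 4) * dx + f_y(1, 4) * dy   # 1(0.01) + (1/4)(-0.01)
actual = f(1 + dx, 4 + dy) - f(1, 4)       # exact change in f
print(approx, actual)
```

The exact change comes out to roughly $7.46\times 10^{-3}$, within about $4\times 10^{-5}$ of the linear estimate $7.5\times 10^{-3}$.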
Notation 12.3.5 When $f$ is a scalar valued function of $n$ variables, the following is often written to express the idea that a small change in $f$ due to small changes in the variables can be expressed in the form
$$df(x) = f_{x_{1}}(x)\,dx_{1} + \cdots + f_{x_{n}}(x)\,dx_{n}$$
where the small change in $x_{i}$ is denoted as $dx_{i}$. As explained above, $df$ is the approximate change in the function $f$. Sometimes $df$ is referred to as the differential of $f$.
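Let $f : U\to\mathbb{R}^{q}$ where $U$ is an open subset of $\mathbb{R}^{p}$ and $f$ is differentiable. It was just shown that
$$f(x+v) = f(x) + \begin{pmatrix} \dfrac{\partial f(x)}{\partial x_{1}} & \cdots & \dfrac{\partial f(x)}{\partial x_{p}} \end{pmatrix} v + o(v).$$
Taking the $i^{th}$ coordinate of the above equation yields
$$f_{i}(x+v) = f_{i}(x) + \sum_{j=1}^{p}\frac{\partial f_{i}(x)}{\partial x_{j}}v_{j} + o(v),$$
and it follows that the term with a sum is nothing more than the $i^{th}$ component of $J(x)v$, where $J(x)$ is the $q\times p$ matrix whose $ij^{th}$ entry is $\dfrac{\partial f_{i}(x)}{\partial x_{j}}$. To reiterate, the linear transformation which results by multiplication by this $q\times p$ matrix is known as the derivative.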
Sometimes $x,y,z$ are written instead of $x_{1},x_{2}$, and $x_{3}$. This saves on notation and is easier to write and to look at, although it lacks generality. When this is done it is understood that $x = x_{1}$, $y = x_{2}$, and $z = x_{3}$. Thus, for $f = (f_{1},f_{2},f_{3})$ a function of $(x,y,z)$, the derivative is the linear transformation determined by
$$\begin{pmatrix} f_{1x} & f_{1y} & f_{1z}\\ f_{2x} & f_{2y} & f_{2z}\\ f_{3x} & f_{3y} & f_{3z} \end{pmatrix}.$$
Example 12.3.6 Let $A$ be a constant $m\times n$ matrix and consider $f(x) = Ax$. Find $Df(x)$ if it exists.
$$f(x+h) - f(x) = A(x+h) - A(x) = Ah = Ah + o(h).$$
In fact in this case, $o(h) = 0$. Therefore, $Df(x) = A$. Note that this looks the same as the case in one variable, $f(x) = ax$.
Example 12.3.7 Let $f(x,y,z) = xy + z^{2}x$. Find $Df(x,y,z)$.

Consider $f(x+h,y+k,z+l) - f(x,y,z)$. This is something which is easily computed from the definition of the function. It equals
$$(x+h)(y+k) + (z+l)^{2}(x+h) - \left(xy + z^{2}x\right).$$
Multiply everything together and collect the terms. This yields
$$\left(z^{2}+y\right)h + xk + 2zxl + \left(hk + 2zlh + l^{2}x + l^{2}h\right).$$
Since every term in the last group is of degree at least two in $(h,k,l)$, it follows easily that this last term is $o(h,k,l)$, and so the derivative of this function is the linear transformation coming from multiplication by the matrix
$$\begin{pmatrix} z^{2}+y & x & 2zx \end{pmatrix},$$
and so this is the derivative. It follows from this and the description of the derivative in terms of partial derivatives that
$$\frac{\partial f}{\partial x}(x,y,z) = z^{2}+y,\qquad \frac{\partial f}{\partial y}(x,y,z) = x,\qquad \frac{\partial f}{\partial z}(x,y,z) = 2zx.$$
Of course you could compute these partial derivatives directly.
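The expansion in Example 12.3.7 can be checked exactly with rational arithmetic. The Python sketch below subtracts $f(x,y,z)$ and the linear part $(z^{2}+y)h + xk + 2zxl$ from $f(x+h,y+k,z+l)$ and confirms that what is left is exactly $hk + 2zlh + l^{2}x + l^{2}h$. The random rational test points are an arbitrary choice for this sketch. Using `fractions.Fraction` avoids any floating-point error.

```python
from fractions import Fraction
import random

def f(x, y, z):
    return x * y + z * z * x

def Df(x, y, z):
    # Row vector (z^2 + y, x, 2zx) from Example 12.3.7
    return (z * z + y, x, 2 * z * x)

def remainder(x, y, z, h, k, l):
    """Return f(x+h, y+k, z+l) - f(x, y, z) - Df(x, y, z)(h, k, l)."""
    a, b, c = Df(x, y, z)
    return f(x + h, y + k, z + l) - f(x, y, z) - (a * h + b * k + c * l)

rng = random.Random(0)
for _ in range(100):
    x, y, z, h, k, l = (Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(6))
    # The leftover should be exactly the higher-order group hk + 2zlh + l^2 x + l^2 h
    assert remainder(x, y, z, h, k, l) == h * k + 2 * z * l * h + l * l * x + l * l * h
print("expansion verified at 100 random rational points")
```

Exact agreement at sample points is consistent with the algebra but is not a substitute for it; the symbolic expansion above is the actual argument.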
Given a function of many variables, how can you tell if it is differentiable? In other words, when you make the linear approximation, how can you tell easily that what is left over is $o(v)$?
Sometimes you have to go directly to the definition and verify it is differentiable from the
definition. For example, you may have seen the following important example in one variable
calculus.
Example 12.3.8 Let
$$f(x) = \begin{cases} x^{2}\sin\left(\dfrac{1}{x}\right) & \text{if } x\neq 0\\ 0 & \text{if } x = 0.\end{cases}$$
Find $Df(0)$.
$$f(h) - f(0) = 0h + h^{2}\sin\left(\frac{1}{h}\right) = o(h),$$
and so $Df(0) = 0$. If you find the derivative for $x\neq 0$, it is totally useless information if what you want is $Df(0)$. This is because the derivative turns out to be discontinuous. Try it. Find the derivative for $x\neq 0$ and try to obtain $Df(0)$ from it. You see, in this example you had to revert to the definition to find the derivative.
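The "try it" suggestion can be carried out numerically. For $x\neq 0$ the product and chain rules give $f'(x) = 2x\sin(1/x) - \cos(1/x)$. At $x_{n} = 1/(n\pi)$ this equals $-\cos(n\pi) = -(-1)^{n}$, so $f'$ keeps jumping between values near $1$ and $-1$ arbitrarily close to $0$. Meanwhile the difference quotients at $0$ equal $h\sin(1/h)$ and shrink to $0$. The sample points below are arbitrary choices for this sketch.

```python
import math

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def fprime(x):
    # Derivative for x != 0 by the product and chain rules: 2x sin(1/x) - cos(1/x)
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Difference quotients at 0: (f(h) - f(0)) / h = h sin(1/h), bounded by |h|
steps = (1e-2, 1e-4, 1e-6)
quotients = [(f(h) - f(0)) / h for h in steps]

# f' at x_n = 1/(n pi): since cos(n pi) = (-1)^n, f'(x_n) is close to -(-1)^n
samples = [fprime(1 / (n * math.pi)) for n in (1000, 1001)]
print(quotients, samples)
```

The quotients go to $0 = Df(0)$, while the sampled values of $f'$ stay near $-1$ and $1$ even though both points lie within $10^{-3}$ of $0$.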
It isn’t really too hard to use the definition even for more ordinary examples.
Example 12.3.9 Let
$$f(x,y) = \begin{pmatrix} x^{2}y + y^{2}\\ y^{3}x \end{pmatrix}.$$
Find $Df(1,2)$.

First of all, note that the thing you are after is a $2\times 2$ matrix.