21.3 The Derivative Of Functions Of Many Variables
The way of thinking about the derivative in Theorem 21.1.3 is exactly what is needed to
define the derivative of a function of n variables. One can argue that it is also the right
way to define the derivative of a function of one variable in order to reduce confusion.
As observed by Dieudonné, “...In the classical teaching of Calculus, this idea (that the
derivative is a linear transformation) is immediately obscured by the accidental fact
that, on a one-dimensional vector space, there is a one-to-one correspondence
between linear forms and numbers, and therefore the derivative at a point is
defined as a number instead of a linear form. This slavish subservience to the
shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions
of several variables...”
In fact, the derivative is a linear transformation and it is useless to pretend otherwise.
This is the main reason for including the introductory material on linear algebra in this book.
Recall the following definition.
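Definition 21.3.1 A function $T$ which maps $\mathbb{R}^n$ to $\mathbb{R}^p$ is called a linear
transformation if for every pair of scalars $a, b$ and vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, it follows that
$$T(a\mathbf{x} + b\mathbf{y}) = aT(\mathbf{x}) + bT(\mathbf{y}).$$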
Recall that from the properties of matrix multiplication, if $A$ is a $p \times n$ matrix, and if
$\mathbf{x}, \mathbf{y}$ are vectors in $\mathbb{R}^n$, then $A(a\mathbf{x} + b\mathbf{y}) = aA\mathbf{x} + bA\mathbf{y}$. Thus you can define a linear
transformation by multiplying by a matrix. Of course the simplest example is that of a
$1 \times 1$ matrix or number. You can think of the number 3 as a linear transformation $T$
according to the rule $Tx = 3x$. It satisfies the properties needed for a
linear transformation because $3(ax + by) = a(3x) + b(3y)$. The case of the
derivative of a scalar valued function of one variable is of this sort. You get a
number for the derivative. However, you can think of this number as a linear
transformation and this is the way you must think of it for a function of $n$ variables.
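The linearity property can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `A`, the vectors, and the scalars are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # a p x n matrix with p = 3, n = 4 (illustrative)
x = rng.standard_normal(4)
y = rng.standard_normal(4)
a, b = 2.0, -0.5

def T(v):
    """The linear transformation defined by multiplication by A."""
    return A @ v

# T(a x + b y) should equal a T(x) + b T(y) up to rounding error.
linear_ok = np.allclose(T(a * x + b * y), a * T(x) + b * T(y))

# The 1 x 1 case: the number 3 acting by the rule Tx = 3x.
scalar_ok = np.isclose(3 * (a * 1.5 + b * 2.0), a * (3 * 1.5) + b * (3 * 2.0))

print(linear_ok, scalar_ok)
```

A numerical check of this kind only samples particular vectors; the general statement follows from the properties of matrix multiplication.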
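Definition 21.3.2 Let $f : U \to \mathbb{R}^p$ where $U$ is an open set in $\mathbb{R}^n$ for $n, p \geq 1$
and let $\mathbf{x} \in U$ be given. Then $f$ is defined to be differentiable at $\mathbf{x} \in U$ if and only if
there exists a linear transformation $T$ such that
$$f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + T\mathbf{h} + o(\mathbf{h}). \qquad (21.5)$$
The derivative of the function $f$, denoted by $Df(\mathbf{x})$, is this linear transformation. Thus
$$f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + Df(\mathbf{x})\mathbf{h} + o(\mathbf{h}).$$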
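If $\mathbf{h} = \mathbf{x} - \mathbf{x}_0$, this takes the form
$$f(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + o(\mathbf{x} - \mathbf{x}_0).$$
If you deleted the $o(\mathbf{x} - \mathbf{x}_0)$ term and considered the function of $\mathbf{x}$ given by what is
left, this is called the linear approximation to the function at the point $\mathbf{x}_0$. In the
case where $\mathbf{x} \in \mathbb{R}^2$ and $f$ has values in $\mathbb{R}$, one can draw a picture to illustrate this.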
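The meaning of the $o$ term can be seen numerically: the remainder left after subtracting the linear approximation, divided by the size of the increment, should tend to zero. The sketch below assumes NumPy and uses a hypothetical function `f` with a hand-computed derivative matrix `Df`; neither comes from the text.

```python
import numpy as np

def f(p):
    """Hypothetical differentiable map from R^2 to R^2."""
    x, y = p
    return np.array([x**2 * y, np.sin(x) + y])

def Df(p):
    """Matrix of partial derivatives of f, computed by hand."""
    x, y = p
    return np.array([[2 * x * y, x**2],
                     [np.cos(x), 1.0]])

x0 = np.array([1.0, 2.0])
direction = np.array([0.6, -0.8])   # a unit vector

ratios = []
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = s * direction
    # What is left over after removing the linear approximation.
    remainder = f(x0 + h) - f(x0) - Df(x0) @ h
    ratios.append(np.linalg.norm(remainder) / np.linalg.norm(h))

print(ratios)   # the ratios shrink as the increment shrinks
```

Shrinking ratios along one direction are only evidence, not a proof; the definition requires the ratio to tend to zero for all increments tending to zero.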
Of course the first and most obvious question is whether the linear transformation is
unique. Otherwise, the derivative $Df(\mathbf{x})$ would not be well defined.
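Theorem 21.3.3 Suppose $f$ is differentiable, as given above in (21.5). Then $T$
is uniquely determined. Furthermore, the matrix of $T$ is the following $p \times n$ matrix,
$$\begin{pmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial f}{\partial x_n}(\mathbf{x}) \end{pmatrix},$$
where
$$\frac{\partial f}{\partial x_k}(\mathbf{x}) \equiv \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{e}_k) - f(\mathbf{x})}{t},$$
the $k$th partial derivative of $f$.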
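Proof: Suppose $T_1$ is another linear transformation which works. Thus, letting $t$ be a
small positive real number,
$$f(\mathbf{x} + t\mathbf{h}) = f(\mathbf{x}) + tT\mathbf{h} + o(t), \qquad f(\mathbf{x} + t\mathbf{h}) = f(\mathbf{x}) + tT_1\mathbf{h} + o(t),$$
and so, subtracting these yields
$$t\left(T\mathbf{h} - T_1\mathbf{h}\right) = o(t).$$
Divide both sides by $t$ to obtain
$$T\mathbf{h} - T_1\mathbf{h} = \frac{o(t)}{t}.$$
It follows on letting $t \to 0$ that $T\mathbf{h} = T_1\mathbf{h}$. Since $\mathbf{h}$ is arbitrary, this shows that $T = T_1$.
Thus the derivative is well defined. So what is the matrix of this linear transformation?
From Theorem 18.2.4, this is the matrix whose $i$th column is $T\mathbf{e}_i$. However, from the
definition of $T$, letting $t \neq 0$,
$$\frac{f(\mathbf{x} + t\mathbf{e}_i) - f(\mathbf{x})}{t} = \frac{1}{t}\left(T(t\mathbf{e}_i) + o(t)\right) = T\mathbf{e}_i + \frac{o(t)}{t}.$$
Then letting $t \to 0$, it follows that
$$T\mathbf{e}_i = \frac{\partial f}{\partial x_i}(\mathbf{x}).$$
Recall from Theorem 18.2.4 this shows that the matrix of the linear transformation is as
claimed.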
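The proof suggests a direct way to approximate the matrix of the derivative: its $i$th column is the limit of the difference quotient in the direction $\mathbf{e}_i$. The following is a minimal sketch, assuming NumPy; the helper `numerical_jacobian` and the test function `g` are hypothetical names introduced here for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, t=1e-6):
    """Approximate the matrix of Df(x) column by column.

    Column i is the difference quotient (f(x + t e_i) - f(x)) / t,
    which approximates T e_i for small t.
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0
        J[:, i] = (np.asarray(f(x + t * e_i), dtype=float) - fx) / t
    return J

def g(p):
    """Hypothetical map from R^3 to R^2 used only as a test case."""
    x, y, z = p
    return np.array([x * y * z, x + y**2])

x0 = np.array([1.0, 2.0, 3.0])
J_numeric = numerical_jacobian(g, x0)

# Partial derivatives computed by hand at (1, 2, 3):
# row 1: (yz, xz, xy), row 2: (1, 2y, 0).
J_exact = np.array([[6.0, 3.0, 2.0],
                    [1.0, 4.0, 0.0]])
print(np.allclose(J_numeric, J_exact, atol=1e-4))
```

The step size `t` is a compromise: too large and the quotient is a poor approximation to the limit, too small and rounding error dominates.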
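Other notations which are often used for this matrix or the linear transformation are
$f'(\mathbf{x})$, $J(\mathbf{x})$, and even $\dfrac{\partial f}{\partial \mathbf{x}}$. Also, the above definition can now be written in the
form
$$f(\mathbf{x} + \mathbf{v}) = f(\mathbf{x}) + \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(\mathbf{x})\, v_j + o(\mathbf{v}).$$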
Here is an example of a scalar valued nonlinear function.
Example 21.3.4 Suppose f
. Find the approximate change in $f$ if $x$ goes from
1 to 1.01 and $y$ goes from 4 to 3.99.
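We can do this by noting that
$$f(1.01, 3.99) - f(1, 4) \approx \frac{\partial f}{\partial x}(1, 4)\,(0.01) + \frac{\partial f}{\partial y}(1, 4)\,(-0.01).$$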
Of course the exact value is
Notation 21.3.5 When f is a scalar valued function of n variables, the following is
often written to express the idea that a small change in f due to small changes in the
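variables can be expressed in the form
$$df(\mathbf{x}) = \frac{\partial f}{\partial x_1}(\mathbf{x})\,dx_1 + \cdots + \frac{\partial f}{\partial x_n}(\mathbf{x})\,dx_n,$$
where the small change in $x_i$ is denoted as $dx_i$. As explained above, $df$ is the approximate
change in the function $f$. Sometimes $df$ is referred to as the differential of $f$.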
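The differential can be compared with the exact change numerically. The sketch below assumes a hypothetical function $f(x, y) = \sqrt{xy}$, chosen only for illustration, together with increments $dx = 0.01$ and $dy = -0.01$ at the point $(1, 4)$; the partial derivatives are computed by hand.

```python
import math

def f(x, y):
    """Hypothetical scalar valued function, assumed for illustration."""
    return math.sqrt(x * y)

def f_x(x, y):
    """Partial derivative of sqrt(xy) with respect to x."""
    return 0.5 * math.sqrt(y / x)

def f_y(x, y):
    """Partial derivative of sqrt(xy) with respect to y."""
    return 0.5 * math.sqrt(x / y)

x0, y0 = 1.0, 4.0
dx, dy = 0.01, -0.01

# The differential: df = f_x dx + f_y dy, about 0.0075 here.
df = f_x(x0, y0) * dx + f_y(x0, y0) * dy

# The exact change in f, for comparison.
exact_change = f(x0 + dx, y0 + dy) - f(x0, y0)

print(df, exact_change)
```

The two numbers agree to within terms of second order in the increments, which is what the $o$ term in the definition predicts.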
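Let $f : U \to \mathbb{R}^q$ where $U$ is an open subset of $\mathbb{R}^p$ and $f$ is differentiable. It was just
shown that
$$f(\mathbf{x} + \mathbf{v}) = f(\mathbf{x}) + \sum_{j=1}^{p} \frac{\partial f(\mathbf{x})}{\partial x_j}\, v_j + o(\mathbf{v}).$$
Taking the $i$th coordinate of the above equation yields
$$f_i(\mathbf{x} + \mathbf{v}) = f_i(\mathbf{x}) + \sum_{j=1}^{p} \frac{\partial f_i(\mathbf{x})}{\partial x_j}\, v_j + o(\mathbf{v}),$$
and it follows that the term with a sum is nothing more than the $i$th component of
$J(\mathbf{x})\mathbf{v}$, where $J(\mathbf{x})$ is the $q \times p$ matrix
$$\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_p} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_q}{\partial x_1} & \cdots & \dfrac{\partial f_q}{\partial x_p}
\end{pmatrix},$$
and to reiterate, the linear transformation which results by multiplication by this $q \times p$
matrix is known as the derivative.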
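Sometimes $x, y, z$ is written instead of $x_1, x_2$, and $x_3$. This is to save on notation and
is easier to write and to look at although it lacks generality. When this is done it is
understood that $x = x_1$, $y = x_2$, and $z = x_3$. Thus the derivative is the linear
transformation determined by
$$\begin{pmatrix}
\dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y} & \dfrac{\partial f_1}{\partial z} \\
\vdots & \vdots & \vdots \\
\dfrac{\partial f_q}{\partial x} & \dfrac{\partial f_q}{\partial y} & \dfrac{\partial f_q}{\partial z}
\end{pmatrix}.$$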
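Example 21.3.6 Let $A$ be a constant $m \times n$ matrix and consider $f(\mathbf{x}) = A\mathbf{x}$. Find $Df(\mathbf{x})$
if it exists.
$$f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) = A(\mathbf{x} + \mathbf{h}) - A\mathbf{x} = A\mathbf{h} = A\mathbf{h} + o(\mathbf{h}).$$
In fact in this case, $o(\mathbf{h}) = \mathbf{0}$. Therefore, $Df(\mathbf{x}) = A$. Note that this looks the same as
the case in one variable, $f(x) = ax$.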
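For a map given by multiplication by a constant matrix, the remainder after subtracting the linear term vanishes identically, not just in the limit. A minimal numerical sketch, assuming NumPy and an arbitrary illustrative matrix `A`:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))   # a constant m x n matrix (illustrative values)

def f(x):
    """The map x -> Ax."""
    return A @ x

x = rng.standard_normal(3)
h = rng.standard_normal(3)        # the increment need not be small here

# f(x + h) - f(x) - A h should be zero up to rounding error.
remainder = f(x + h) - f(x) - A @ h
remainder_is_zero = np.allclose(remainder, 0.0, atol=1e-12)
print(remainder_is_zero)
```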
Example 21.3.7 Let f
+ z2x. Find Df
. This is something which is easily
computed from the definition of the function. It equals
Multiply everything together and collect the terms. This yields
It follows easily the last term at the end is o
and so the derivative of this function
is the linear transformation coming from multiplication by the matrix
and so this is the derivative. It follows from this and the description of the derivative in
terms of partial derivatives that
Of course you could compute these partial derivatives directly.
Given a function of many variables, how can you tell if it is differentiable? In other
words, when you make the linear approximation, how can you tell easily that what is left
over is $o(\mathbf{v})$? Sometimes you have to go directly to the definition and verify that the
function is differentiable. For example, you may have seen the following
important example in one variable calculus.
Example 21.3.8 Let f
. Find $Df(0)$.
and so $Df(0) = 0$. If you find the derivative for $x \neq 0$, it is totally useless information if
what you want is $Df(0)$. This is because the derivative turns out to be discontinuous at 0.
Try it. Find the derivative for $x \neq 0$ and try to obtain $Df(0)$ from it. You see, in this
example you had to revert to the definition to find the derivative.
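This behavior can be observed numerically. The sketch below assumes the function is $f(x) = x^2 \sin(1/x)$ for $x \neq 0$ with $f(0) = 0$, a standard one-variable example with these properties; that choice is an assumption made for illustration. The difference quotients at 0 tend to 0, while the derivative formula valid for $x \neq 0$ keeps oscillating between values near $-1$ and $1$ as $x \to 0$.

```python
import math

def f(x):
    """Assumed example: x^2 sin(1/x) away from 0, and 0 at 0."""
    return x**2 * math.sin(1.0 / x) if x != 0 else 0.0

def fprime_away_from_zero(x):
    """Derivative from the usual rules; valid only for x != 0."""
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotients at 0, straight from the definition.
steps = (1e-1, 1e-3, 1e-5, 1e-7)
quotients = [(f(t) - f(0.0)) / t for t in steps]

# Sample the derivative at x_k = 1/(k pi), where cos(1/x_k) = (-1)^k.
samples = [fprime_away_from_zero(1.0 / (k * math.pi)) for k in (1000, 1001)]

print(quotients)
print(samples)
```

Each quotient is bounded in absolute value by its step size, which is the estimate used in the definition, whereas the two samples sit near $-1$ and $1$, so the derivative for $x \neq 0$ has no limit at 0.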
It isn’t really too hard to use the definition even for more ordinary examples.
Example 21.3.9 Let f
. Find Df
First of all, note that the thing you are after is a 2 × 2 matrix.
Therefore, the matrix of the derivative is
Example 21.3.10 Let f
. Find Df
You know that if there is a derivative, its standard matrix is of the form
Does it work? Is
Doing the computations, it follows the left side of the equal sign is of the form
This is o
because it involves terms like
, etc. Each term being of degree 2