Usually no harm is occasioned by thinking of this linear transformation as its matrix taken with respect to the usual basis vectors.

The definition 6.12.14 means that the error,

\[
\mathbf{f}(\mathbf{x}+\mathbf{v}) - \mathbf{f}(\mathbf{x}) - L\mathbf{v}
\]

converges to 0 faster than |v|. Thus the above definition is equivalent to saying


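\[
\lim_{|\mathbf{v}| \to 0} \frac{|\mathbf{f}(\mathbf{x}+\mathbf{v}) - \mathbf{f}(\mathbf{x}) - L\mathbf{v}|}{|\mathbf{v}|} = 0 \tag{6.12.15}
\]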

or equivalently,


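\[
\lim_{\mathbf{y} \to \mathbf{x}} \frac{|\mathbf{f}(\mathbf{y}) - \mathbf{f}(\mathbf{x}) - D\mathbf{f}(\mathbf{x})(\mathbf{y}-\mathbf{x})|}{|\mathbf{y}-\mathbf{x}|} = 0. \tag{6.12.16}
\]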
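The limit above can be checked numerically. The following is a minimal sketch, not part of the original text: the function f, its hand-computed matrix Df, the base point x, and the direction are all illustrative assumptions. It evaluates the ratio |f(x+v) − f(x) − Df(x)v| / |v| for shrinking v and shows it tending toward 0.

```python
import math

# Hypothetical example function f : R^2 -> R^2 (an assumption for illustration).
def f(x):
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2)

# Matrix of Df(x) with respect to the usual basis, computed by hand for this f.
def Df(x):
    x1, x2 = x
    return ((2 * x1 * x2, x1 * x1),
            (math.cos(x1), 1.0))

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def error_ratio(x, v):
    """Return |f(x+v) - f(x) - Df(x)v| / |v|."""
    fx = f(x)
    fxv = f((x[0] + v[0], x[1] + v[1]))
    J = Df(x)
    Lv = (J[0][0] * v[0] + J[0][1] * v[1],
          J[1][0] * v[0] + J[1][1] * v[1])
    err = (fxv[0] - fx[0] - Lv[0], fxv[1] - fx[1] - Lv[1])
    return norm(err) / norm(v)

x = (1.0, 2.0)
direction = (0.6, -0.8)  # a unit vector
ratios = [error_ratio(x, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3, 1e-4)]

for r in ratios:
    print(r)  # each ratio is smaller than the one before it
```

For this smooth f the ratio shrinks roughly in proportion to |v|, which is what "the error converges to 0 faster than |v|" means in practice.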

Now it is clear this is just a generalization of the notion of the derivative of a function of one variable because in this more specialized situation,


\[
\lim_{v \to 0} \frac{|f(x+v) - f(x) - f'(x)v|}{|v|} = 0,
\]

due to the definition which says

\[
f'(x) = \lim_{v \to 0} \frac{f(x+v) - f(x)}{v}.
\]


For functions of n variables, you can’t define the derivative as the limit of a difference quotient like you can for a function of one variable because you can’t divide by a vector. That is why there is a need for a more general definition.

The term o(v) is notation that is descriptive of the behavior in 6.12.14 and it is only this behavior that is of interest. Thus, if t and k are constants,

\[
o(\mathbf{v}) = o(\mathbf{v}) + o(\mathbf{v}), \quad o(t\mathbf{v}) = o(\mathbf{v}), \quad k\,o(\mathbf{v}) = o(\mathbf{v})
\]

and other similar observations hold. The sloppiness built in to this notation is useful because it ignores details which are not important. It may help to think of o(v) as an adjective describing what is left over after approximating f(x+v) by f(x)+Df(x)v.
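As a small worked illustration (the function here is chosen only for this example and does not appear in the text), take the real-valued function f(x) = |x|² = x · x. Expanding gives

\[
f(\mathbf{x}+\mathbf{v}) = |\mathbf{x}|^2 + 2\,\mathbf{x}\cdot\mathbf{v} + |\mathbf{v}|^2
= f(\mathbf{x}) + \underbrace{2\,\mathbf{x}\cdot\mathbf{v}}_{Df(\mathbf{x})\mathbf{v}} + \underbrace{|\mathbf{v}|^2}_{o(\mathbf{v})},
\qquad
\frac{|\mathbf{v}|^2}{|\mathbf{v}|} = |\mathbf{v}| \to 0 .
\]

The leftover term |v|² is o(v) because it vanishes faster than |v|. The same is true of 3|v|² or of |v|² + |v|³, which is the sense in which the notation deliberately forgets constants and sums.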

Theorem 6.12.2 The derivative is well defined.

Proof: First note that for a fixed vector $\mathbf{v}$, $o(t\mathbf{v}) = o(t)$. Now suppose both $L_1$ and $L_2$ work in the above definition. Then let $\mathbf{v}$ be any vector and let $t$ be a real scalar which is chosen small enough that $\mathbf{x} + t\mathbf{v} \in U$. Then

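\[
\mathbf{f}(\mathbf{x}+t\mathbf{v}) = \mathbf{f}(\mathbf{x}) + L_1 t\mathbf{v} + o(t\mathbf{v}), \qquad
\mathbf{f}(\mathbf{x}+t\mathbf{v}) = \mathbf{f}(\mathbf{x}) + L_2 t\mathbf{v} + o(t\mathbf{v}).
\]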

Therefore, subtracting these two yields $(L_2 - L_1)(t\mathbf{v}) = o(t\mathbf{v}) = o(t)$. Therefore, dividing by $t$ yields $(L_2 - L_1)(\mathbf{v}) = \dfrac{o(t)}{t}$. Now let $t \to 0$ to conclude that $(L_2 - L_1)(\mathbf{v}) = \mathbf{0}$. Since this is true for all $\mathbf{v}$, it follows that $L_2 = L_1$. This proves the theorem.
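The uniqueness argument can also be seen numerically. The sketch below is only an illustration under assumed data: the function f(x) = |x|², the base point, the direction, and the deliberately wrong candidate L_wrong are hypothetical choices. With the correct linear map the error ratio tends to 0, while with a perturbed map it tends to |(L_wrong − L)v| / |v|, which is not 0.

```python
# Hypothetical example: f(x) = x1^2 + x2^2, whose derivative at x is the
# row matrix (2*x1, 2*x2). Both the function and the point are assumptions.
def f(x):
    return x[0] * x[0] + x[1] * x[1]

def ratio(x, L, d, t):
    """Return |f(x + t d) - f(x) - L(t d)| / |t d| for a unit vector d."""
    v = (t * d[0], t * d[1])
    err = f((x[0] + v[0], x[1] + v[1])) - f(x) - (L[0] * v[0] + L[1] * v[1])
    return abs(err) / abs(t)

x = (1.0, 2.0)
d = (0.6, 0.8)            # unit direction
L_right = (2.0, 4.0)      # the actual derivative at x
L_wrong = (2.5, 4.0)      # a perturbed candidate, not the derivative

for t in (1e-2, 1e-4, 1e-6):
    print(t, ratio(x, L_right, d, t), ratio(x, L_wrong, d, t))

good = ratio(x, L_right, d, 1e-6)  # tends to 0 as t -> 0
bad = ratio(x, L_wrong, d, 1e-6)   # tends to |0.5 * 0.6| = 0.3, not 0
```

Here the second ratio settles near 0.3 because L_wrong − L_right = (0.5, 0) applied to d gives 0.3, so only one linear map can satisfy the definition.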