6.12. THE FRECHET DERIVATIVE

Usually no harm is occasioned by thinking of this linear transformation as its matrix taken with respect to the usual basis vectors.

The definition 6.12.14 means that the error,

f(x+v)− f(x)−Lv

converges to 0 faster than |v|. Thus the above definition is equivalent to saying

\[
\lim_{|v| \to 0} \frac{|f(x+v) - f(x) - Lv|}{|v|} = 0 \tag{6.12.15}
\]

or equivalently,

\[
\lim_{y \to x} \frac{|f(y) - f(x) - Df(x)(y-x)|}{|y-x|} = 0. \tag{6.12.16}
\]
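The limit in 6.12.15 can be observed numerically. The following Python sketch is only an illustration: the map f from R^2 to R^2, the point x, and the direction are arbitrary choices that do not come from the text, and the matrix of Df(x) is assumed to be the matrix of partial derivatives, written out by hand.

```python
import math

def f(x):
    """Illustrative map f: R^2 -> R^2, f(x1, x2) = (x1^2 * x2, sin(x1) + x2)."""
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2)

def Df(x):
    """Matrix of partial derivatives of f at x, computed by hand."""
    x1, x2 = x
    return ((2 * x1 * x2, x1 * x1),
            (math.cos(x1), 1.0))

def apply(L, v):
    """Matrix-vector product L v for a 2x2 matrix L."""
    return tuple(sum(L[i][j] * v[j] for j in range(2)) for i in range(2))

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(c * c for c in v))

def error_ratio(x, v):
    """The quotient |f(x+v) - f(x) - Df(x)v| / |v| from 6.12.15."""
    xv = tuple(a + b for a, b in zip(x, v))
    fx, fxv, Lv = f(x), f(xv), apply(Df(x), v)
    err = tuple(fxv[i] - fx[i] - Lv[i] for i in range(2))
    return norm(err) / norm(v)

x = (1.0, 2.0)
direction = (0.6, -0.8)           # an arbitrary unit vector
sizes = (1e-1, 1e-2, 1e-3, 1e-4)  # values of |v|
ratios = [error_ratio(x, tuple(t * d for d in direction)) for t in sizes]

for t, r in zip(sizes, ratios):
    print(f"|v| = {t:.0e}   ratio = {r:.3e}")
```

For this particular f the printed ratio shrinks roughly in proportion to |v|, which is what it means for the error to converge to 0 faster than |v|.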

Now it is clear this is just a generalization of the notion of the derivative of a function of one variable because in this more specialized situation,

\[
\lim_{|v| \to 0} \frac{|f(x+v) - f(x) - f'(x)v|}{|v|} = 0,
\]

due to the definition which says

\[
f'(x) = \lim_{v \to 0} \frac{f(x+v) - f(x)}{v}.
\]

For functions of n variables, you can't define the derivative as the limit of a difference quotient as you can for a function of one variable, because you can't divide by a vector. That is why there is a need for a more general definition.

The term o(v) is notation that is descriptive of the behavior in 6.12.14 and it is only this behavior that is of interest. Thus, if t and k are constants,

o(v) = o(v)+o(v) , o(tv) = o(v) , ko(v) = o(v)

and other similar observations hold. The sloppiness built into this notation is useful because it ignores details which are not important. It may help to think of o(v) as an adjective describing what is left over after approximating f(x+v) by f(x)+Df(x)v.
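A small worked example may make the "left over" term concrete. The function below is an illustrative choice, not one taken from the text: f maps R^n to R by f(x) = |x|^2, and x · v denotes the usual dot product.

```latex
\[
f(x+v) = |x+v|^2 = |x|^2 + 2\,x \cdot v + |v|^2
       = f(x) + \underbrace{2\,x \cdot v}_{Df(x)v} + \underbrace{|v|^2}_{o(v)} .
\]
% The middle term is linear in v, and the remainder satisfies
\[
\lim_{|v| \to 0} \frac{|v|^2}{|v|} = \lim_{|v| \to 0} |v| = 0 ,
\]
% so |v|^2 is o(v). Any constant multiple k|v|^2 is also o(v),
% which is the sense in which k o(v) = o(v).
```

Here the linear part Df(x)v = 2 x · v is read off directly, and everything else is absorbed into o(v) without further bookkeeping.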

Theorem 6.12.2 The derivative is well defined.

Proof: First note that for a fixed vector v, o(tv) = o(t). Now suppose both L_1 and L_2 work in the above definition. Let v be any vector and let t be a real scalar chosen small enough that x + tv ∈ U. Then

\[
f(x+tv) = f(x) + L_1 tv + o(tv), \qquad f(x+tv) = f(x) + L_2 tv + o(tv).
\]

Subtracting these two equations yields (L_2 − L_1)(tv) = o(tv) = o(t). Dividing by t yields

\[
(L_2 - L_1)(v) = \frac{o(t)}{t}.
\]

Now let t → 0 to conclude that (L_2 − L_1)(v) = 0. Since this is true for all v, it follows that L_2 = L_1. This proves the theorem.
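The uniqueness argument can also be seen numerically. In the sketch below, the function f(x1, x2) = x1 x2, the point x, the direction v, and the competing linear map are all hypothetical choices made for illustration. The true derivative at x is assumed to be the row vector (x2, x1). A second candidate that differs from it on v leaves an error quotient that does not tend to 0.

```python
def f(x1, x2):
    """Illustrative function f: R^2 -> R."""
    return x1 * x2

def ratio(L, x, v, t):
    """|f(x+tv) - f(x) - L(tv)| / |t| for a candidate row vector L."""
    (x1, x2), (v1, v2) = x, v
    err = f(x1 + t * v1, x2 + t * v2) - f(x1, x2) - t * (L[0] * v1 + L[1] * v2)
    return abs(err) / abs(t)

x = (3.0, 2.0)
v = (1.0, 1.0)
L_true = (x[1], x[0])         # derivative of x1*x2 at x: (x2, x1)
L_other = (x[1] + 0.5, x[0])  # hypothetical competitor; differs from L_true on v

ts = (1e-1, 1e-2, 1e-3)
good = [ratio(L_true, x, v, t) for t in ts]
bad = [ratio(L_other, x, v, t) for t in ts]

for t, g, b in zip(ts, good, bad):
    print(f"t = {t:.0e}   true derivative: {g:.4f}   competitor: {b:.4f}")
```

With the true derivative the quotient equals |t| up to rounding, so it tends to 0. With the competitor it tends to |(L_other − L_true)(v)| = 0.5, so the competitor cannot satisfy the definition.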