Proof: This is left to you. Use the chain rule and the product rule.

Higher order derivatives are defined in the obvious way: $f'' \equiv (f')'$, etc. Also the Leibniz notation is defined by $\frac{dy}{dx} = f'(x)$ where $y = f(x)$, and the second derivative is denoted as $\frac{d^2y}{dx^2}$, with various other higher order derivatives defined similarly. When people write $y^{(n)}$ they mean the $n$th derivative. Similarly $f^{(n)}(x)$ refers to the $n$th derivative.

The chain rule has a particularly attractive form in Leibniz's notation. Suppose $y = g(u)$ and $u = f(x)$. Thus $y = g \circ f(x)$. Then from the above theorem
\[
(g \circ f)'(x) = g'(f(x))\, f'(x) = g'(u)\, f'(x)
\]
or in other words, $\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$. Notice how the $du$ cancels. This particular form is a very useful crutch and is used extensively in applications.
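As a small illustration of the Leibniz form of the chain rule, here is a worked instance. The particular functions $g(u) = u^3$ and $f(x) = x^2 + 1$ are chosen only for illustration and do not come from the text, and the power rule for polynomials is assumed.

```latex
\begin{align*}
y &= u^{3}, \qquad u = x^{2}+1, \\
\frac{dy}{du} &= 3u^{2}, \qquad \frac{du}{dx} = 2x, \\
\frac{dy}{dx} &= \frac{dy}{du}\,\frac{du}{dx}
   = 3u^{2}\cdot 2x
   = 6x\left(x^{2}+1\right)^{2}.
\end{align*}
```

The last step replaces $u$ by $x^2 + 1$ so that the answer is expressed in terms of $x$ alone.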
5.3 Derivatives of Inverse Functions

It happens that if $f$ is a continuous one to one function defined on an interval $[a,b]$, and $f'(x)$ exists and is nonzero, then the inverse function $f^{-1}$ has a derivative or one sided derivative at the point $f(x)$.
Theorem 5.3.1 Let $f : [a,b] \to \mathbb{R}$ be continuous and one to one. Suppose $f'(x)$ exists for some $x \in [a,b]$ and $f'(x) \neq 0$, a one sided derivative at the end points. Then $(f^{-1})'(f(x))$ exists and is given by the formula
\[
\left(f^{-1}\right)'(f(x)) = \frac{1}{f'(x)}.
\]
Proof: By Lemma 4.4.3 and Corollary 4.5.1 on Page 105, $f$ is either strictly increasing or strictly decreasing and $f^{-1}$ is continuous on the interval $f([a,b])$. Let $x$ be a point where $f'(x) \neq 0$ and let $y = f(x)$. Take $|h|$ sufficiently small, constraining $h$ to have the appropriate sign if $y$ is an endpoint of $f([a,b])$, so that $y + h \in f([a,b])$. Then
\[
h = f\left(f^{-1}(y+h)\right) - f\left(f^{-1}(y)\right)
  = f'(x)\left(f^{-1}(y+h) - f^{-1}(y)\right) + o\left(f^{-1}(y+h) - f^{-1}(y)\right)
\tag{$*$}
\]
By continuity of $f^{-1}$,
\[
\left|o\left(f^{-1}(y+h) - f^{-1}(y)\right)\right| < \frac{1}{2}\left|f'(x)\right| \left|f^{-1}(y+h) - f^{-1}(y)\right|
\]
if $h$ is small enough, and so, from the triangle inequality in $(*)$,
\[
|h| \geq \frac{1}{2}\left|f'(x)\right| \left|f^{-1}(y+h) - f^{-1}(y)\right|,
\]
\[
\frac{\left|o\left(f^{-1}(y+h) - f^{-1}(y)\right)\right|}{|h|}
\leq \frac{2\left|o\left(f^{-1}(y+h) - f^{-1}(y)\right)\right|}{\left|f'(x)\right| \left|f^{-1}(y+h) - f^{-1}(y)\right|},
\]
showing that $o\left(f^{-1}(y+h) - f^{-1}(y)\right) = o(h)$. From $(*)$,
\[
\frac{1}{f'(x)}\,h + o(h) = f^{-1}(y+h) - f^{-1}(y) = f^{-1}(f(x)+h) - f^{-1}(f(x))
\]
which proves the theorem.

This is one of those theorems which is very easy to remember if you neglect the difficult questions and simply focus on formal manipulations. Consider the following: $f^{-1}(f(x)) = x$. Now use the chain rule to write $\left(f^{-1}\right)'(f(x))\, f'(x) = 1$, and then divide both sides by $f'(x)$ to obtain $\left(f^{-1}\right)'(f(x)) = \frac{1}{f'(x)}$. Of course this gives the conclusion of the above theorem rather effortlessly, and it is formal manipulations like this which aid in remembering formulas such as the one given in the theorem.
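To see how the formula is used in practice, here is a worked instance. The function $f(x) = x^3 + x$ on $[0,1]$ is an illustrative choice which does not come from the text, and the power rule is assumed. This $f$ is continuous and strictly increasing, hence one to one, and there is no convenient closed form for $f^{-1}$, yet the derivative of $f^{-1}$ at the point $f(1) = 2$ can still be read off.

```latex
\begin{align*}
f(x) &= x^{3} + x, \qquad f'(x) = 3x^{2} + 1 \neq 0 \quad \text{for all } x \in [0,1], \\
f(1) &= 2, \qquad f'(1) = 4, \\
\left(f^{-1}\right)'(2) &= \left(f^{-1}\right)'(f(1)) = \frac{1}{f'(1)} = \frac{1}{4}.
\end{align*}
```

Here $x = 1$ is an endpoint of $[0,1]$, so $f'(1)$ and $\left(f^{-1}\right)'(2)$ are understood as one sided derivatives. By contrast, for $f(x) = x^2$ on $[0,2]$ one has $f'(0) = 0$, so the theorem gives no conclusion about $f^{-1}$ at the point $f(0) = 0$.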