Understanding Derivatives: Instantaneous Velocity and Tangent Lines, Exams of Differential and Integral Calculus

The concept of derivatives in calculus, focusing on the definition of instantaneous velocity and the equation of the tangent line to a graph at a given point. It covers the limits of functions as h approaches 0 and the relationship between derivatives and differentiability.

Typology: Exams

2021/2022

Uploaded on 08/05/2022

char_s67

8 The derivative

We think, informally, of continuity as being a measure of “smoothness”: if a function f is continuous at a, then small changes in the input to f near a lead only to small changes in the output. But there are definitely “degrees of smoothness”. The functions f(x) = |x| and g(x) = x^2 (see figure) are both continuous at 0, and both achieve their minima at 0, but their graphs behave very differently near 0: g curves gently, while f has a sharp point.

The tool we introduce now, that (among many many other things) distinguishes these two behaviors, is the familiar tool of the derivative.

8.1 Two motivating examples

Instantaneous velocity: Suppose that a particle is moving along a line, and that its distance from the origin at time t is given by the function s(t). It’s easy to calculate the average velocity of the particle over a time interval from, say, time t = a to time t = b: it’s the total displacement of the particle, s(b) − s(a), divided by the total time, b − a.^104

(^104) Remember that velocity is a signed quantity: if a particle starts 10 units to the right of the origin, and two seconds later is 14 units to the right of the origin, then its average velocity over those two seconds is (14 − 10)/2 = 2 units per second, positive because the particle has progressed further from the origin. If, on the other hand, it starts 14 units to the right of the origin, and two seconds later is 10 units to the right of the origin, then its average velocity over those two seconds is (10 − 14)/2 = −2 units per second, negative because the particle has progressed closer to the origin. In both cases the average speed is the same, 2 units per second, speed being the absolute value of velocity.

But what is the instantaneous velocity of the particle at a certain time t? To make sense of this, we might do the following: over a small time interval [t, t + ∆t] (starting at time t, ending at time t + ∆t), with ∆t > 0, the average velocity is

$$\frac{\text{displacement}}{\text{time}} = \frac{s(t + \Delta t) - s(t)}{\Delta t}.$$

Similarly, over a small time interval [t + ∆t, t], with ∆t < 0, the average velocity is

$$\frac{s(t) - s(t + \Delta t)}{-\Delta t} = \frac{s(t + \Delta t) - s(t)}{\Delta t}.$$

If this common quantity, (s(t + ∆t) − s(t))/∆t, approaches a limit as ∆t approaches 0, then it makes sense to define the instantaneous velocity at time t to be that limit, that is, to be

$$\lim_{\Delta t \to 0} \frac{s(t + \Delta t) - s(t)}{\Delta t}.$$
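To see the average velocities settle down as ∆t shrinks, here is a minimal numerical sketch. The position function s(t) = t^2 and the time t = 3 are assumptions chosen only for illustration; the name `average_velocity` is likewise hypothetical.

```python
def average_velocity(s, t, dt):
    """Average velocity of the position function s between times t and t + dt.

    Works for dt > 0 and dt < 0 alike, matching the common quantity
    (s(t + dt) - s(t)) / dt from the text. Undefined for dt == 0.
    """
    return (s(t + dt) - s(t)) / dt


def s(t):
    # Hypothetical position function, used only as an example.
    return t * t


t0 = 3.0
for dt in (1.0, 0.1, 0.01, -0.01, -0.1):
    print(f"dt = {dt:+.2f}: average velocity = {average_velocity(s, t0, dt):.4f}")
```

For this particular s, the average velocity over the interval works out to 6 + ∆t, so the printed values cluster around 6 from both sides, which is what the limit definition would assign as the instantaneous velocity at t = 3.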

Tangent line: What is the equation of the tangent line to the graph of a function f at some point (a, f(a)) on the graph? To answer that, we must answer the more fundamental question, “what do we mean by ‘tangent line’?”. A preliminary definition might be that

a tangent line to a graph at a point on the graph is a straight line that touches the graph only at that point.

This is a fairly crude definition, and fairly clearly doesn’t work: the line y = 1 touches the graph of y = cos x infinitely many times, at x = 0, ±2π, ±4π, ..., but clearly should be declared to be a tangent line to y = cos x at (0, 1); on the other hand, the line y = 1 + 10x touches the graph of y = cos x only once, at (0, 1), but clearly should not be declared to be a tangent line to y = cos x at (0, 1).

What we really want to say, is that a tangent line to a graph at a point on the graph is a straight line that passes through the point, and that just “glances off” the graph at that point, or is “going in the same direction as the graph” at that point, or “has the same slope as the graph does” at that point. Clearly these phrases in quotes need to be made more precise. What do we mean by “the slope of a graph, at a point”? We can make this precise, in a similar way to the way we made precise the notion of instantaneous velocity.

From the previous section, we obtain immediately two interpretations of the quantity f ′(a):

Velocity: if s(t) measures the position at time t of a particle that is moving along the number line, then s′(a) measures the velocity of the particle at time a.

Slope: f′(a) is the slope of the tangent line to the graph of the function f at the point (a, f(a)). Consequently the equation of the tangent line is

y = f ′(a)(x − a) + f (a).
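The tangent line formula translates directly into code. The sketch below is illustrative only: the helper name `tangent_line` is hypothetical, and the example function f(x) = x^2 with derivative 2x is an assumption made for the demonstration.

```python
def tangent_line(f, f_prime, a):
    """Return the function x -> f'(a) * (x - a) + f(a)."""
    slope = f_prime(a)   # slope of the graph at (a, f(a))
    height = f(a)        # the line must pass through (a, f(a))
    return lambda x: slope * (x - a) + height


# Example (assumed for illustration): f(x) = x^2, whose derivative is 2x.
line = tangent_line(lambda x: x * x, lambda x: 2 * x, 3.0)
print(line(3.0))  # the line agrees with f at the point of tangency
print(line(4.0))  # one unit to the right, the line has risen by the slope
```

Here the line passes through (3, 9) with slope 6, so it takes the value 9 at x = 3 and 15 at x = 4.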

Once we have the notion of the derivative of a function at a point, it’s a very short leap to considering the derivative as a function.

Definition of the derivative function: If f : D → R is a function defined on some domain D, then the derivative of f is a function, denoted f′,^106 whose domain is {a ∈ D : f is differentiable at a},^107 and whose value at a is the derivative of f at a.

As we will see in a series of examples, the domain of f′ may be the same as the domain of f, or slightly smaller, or much smaller. Before going on to the examples, we mention an alternate form of the definition of the derivative:

$$f'(a) = \lim_{b \to a} \frac{f(b) - f(a)}{b - a}.$$

Indeed, suppose lim_{b→a} (f(b) − f(a))/(b − a) exists and equals L. Then for all ε > 0 there is δ > 0 such that whenever b is within δ of a (but not equal to a), we have that (f(b) − f(a))/(b − a) is within ε of L. Rewriting b as a + h (so b − a = h), this says that whenever a + h is within δ of a (but not equal to a), that is, whenever h is within δ of 0 (but not equal to 0), we have that (f(a + h) − f(a))/h is within ε of L. This says that lim_{h→0} (f(a + h) − f(a))/h exists and equals L. The converse direction goes along the same lines.

  • f′(a)
  • f˙(a)
  • d/dx f(x) |_{x=a}
  • df/dx (x) |_{x=a}
  • df/dx |_{x=a}
  • dy/dx |_{x=a} (if y is understood to be another name for f)
  • y˙(a) (again, if y is another name for f).

We will almost exclusively use the first of these.

(^106) or df/dx, or f˙.

(^107) We will shortly modify this definition slightly, to deal with functions which are defined on closed intervals such as [0, 1]; we will introduce a notion of “differentiable from the right” and “differentiable from the left” so as to be able to talk about what happens at the end points of the interval.

8.3 Some examples of derivatives

Given the work we have done on limits and continuity, calculating the derivatives of many simple functions, even directly from the definition, is fairly straightforward. We give a bunch of examples here.

Constant function: f(x) = c, where c is some fixed real number. Presumably, the derivative of this function is 0 at any real a, that is,

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = 0.$$

Notice that we can’t verify this instantly by appealing to continuity of the expression (f(a + h) − f(a))/h, viewed as a function of h, at h = 0, and then just evaluating the expression at h = 0; the expression is not only not continuous at h = 0, it is not even defined at h = 0! This will be a common theme in computing derivatives: the expression (f(a + h) − f(a))/h (viewed as a function of h), regardless of the f under consideration, will never be defined at h = 0, since the numerator and the denominator both evaluate to 0 at h = 0. So here, and in all other examples that we do, we will have to engage in some algebraic manipulation of the expression (f(a + h) − f(a))/h. The goal of the manipulation is to try to find an alternate expression that is equal to (f(a + h) − f(a))/h for all h except (possibly) h = 0 (the one value of h we do not really care about), and then see if we can use some of our previously developed techniques to evaluate the limit as h goes to 0 of the new expression. For any real a we have, for h ≠ 0,

$$\frac{f(a + h) - f(a)}{h} = \frac{c - c}{h} = \frac{0}{h} = 0,$$

and so

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} 0 = 0,$$

from which we conclude that f is differentiable at all a, with derivative 0. (Of course: the line y = c is clearly the tangent line to f at any point, and this line has slope 0; or, if a particle is located at the same position, c, on the line at all times, its velocity at all times is 0.) In this example, f′ is the constant 0 function, on the same domain (R) as f.

This example is really simple, but it is worth doing in detail for two reasons. First, a philosophical reason: to act as a reality check for the definition, and our understanding of the definition. Second, a practical reason: to illustrate a subtlety of writing up proofs from first principles of derivatives of functions. It’s very tempting to argue that f′(a) = 0 by writing

$$\text{``}f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} \frac{0}{h} = \lim_{h \to 0} 0 = 0\text{''}$$

For each real a, and for h ≠ 0, we have

$$\frac{(a + h)^2 - a^2}{h} = \frac{a^2 + 2ah + h^2 - a^2}{h} = \frac{2ah + h^2}{h} = 2a + h.$$

Since lim_{h→0} (2a + h) evidently exists and equals 2a, we conclude that lim_{h→0} ((a + h)^2 − a^2)/h exists and equals 2a, and so for all real a,

f′(a) = 2a.

In this example, f′ is the linear function x ↦ 2x, on the same domain (R) as f.
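The algebraic simplification above can be checked with exact rational arithmetic, which avoids any floating-point noise. This is only a sanity check under assumed sample values (a = 3 and a few values of h); the helper name `difference_quotient` is hypothetical.

```python
from fractions import Fraction


def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, which is undefined at h = 0."""
    if h == 0:
        raise ValueError("the difference quotient is not defined at h = 0")
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


a = Fraction(3)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(-1, 1000)):
    q = difference_quotient(square, a, h)
    # For the square function the quotient should equal 2a + h exactly.
    print(f"h = {h}: quotient = {q}, 2a + h = {2 * a + h}")
```

Because `Fraction` arithmetic is exact, the two printed values agree exactly for every nonzero h, and they approach 2a = 6 as h shrinks.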

Power function: In general, calculating the derivative of f(x) = x^n for n ∈ N at arbitrary real a is not much harder than in the special case of n = 2, just as long as we bring the right tool to the algebraic manipulation. Since we’ll be faced with the expression (a + h)^n − a^n, it seems that the Binomial Theorem is probably the^110 right tool. For each real a, and for h ≠ 0, we have

$$\begin{aligned}
\frac{(a + h)^n - a^n}{h} &= \frac{a^n + \binom{n}{1} a^{n-1} h + \binom{n}{2} a^{n-2} h^2 + \cdots + \binom{n}{n-1} a h^{n-1} + h^n - a^n}{h} \\
&= \frac{\binom{n}{1} a^{n-1} h + \binom{n}{2} a^{n-2} h^2 + \cdots + \binom{n}{n-1} a h^{n-1} + h^n}{h} \\
&= \binom{n}{1} a^{n-1} + \binom{n}{2} a^{n-2} h + \cdots + \binom{n}{n-1} a h^{n-2} + h^{n-1}.
\end{aligned}$$

(^110) or at least a

Now

$$\lim_{h \to 0} \binom{n}{1} a^{n-1} = \binom{n}{1} a^{n-1} = n a^{n-1},$$

while

$$\lim_{h \to 0} \binom{n}{2} a^{n-2} h = \lim_{h \to 0} \binom{n}{3} a^{n-3} h^2 = \cdots = \lim_{h \to 0} \binom{n}{n-1} a h^{n-2} = \lim_{h \to 0} h^{n-1} = 0,$$

all these facts following from our previous work on continuity. So by the sum part of the sum/product/reciprocal theorem for limits, we conclude that

$$\lim_{h \to 0} \left( \binom{n}{1} a^{n-1} + \binom{n}{2} a^{n-2} h + \cdots + \binom{n}{n-1} a h^{n-2} + h^{n-1} \right) = n a^{n-1}.$$

But then it follows that

$$\lim_{h \to 0} \frac{(a + h)^n - a^n}{h} = n a^{n-1};$$

in other words, f is differentiable for all real a, with

f′(a) = na^(n−1).

In this example, f′ is the power function x ↦ nx^(n−1), on the same domain (R) as f.
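As a sanity check on the binomial manipulation, the sketch below compares the original difference quotient with the expanded sum, using exact rational arithmetic. The sample values (a = 2, n = 5) and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def direct_quotient(a, h, n):
    """((a + h)^n - a^n) / h, computed directly (h must be nonzero)."""
    return ((a + h) ** n - a ** n) / h


def expanded_quotient(a, h, n):
    """Sum over k = 1..n of C(n, k) * a^(n-k) * h^(k-1).

    This is the expression left after cancelling a^n and dividing each
    remaining binomial term by h.
    """
    return sum(comb(n, k) * a ** (n - k) * h ** (k - 1) for k in range(1, n + 1))


a, n = Fraction(2), 5
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(-1, 1000)):
    assert direct_quotient(a, h, n) == expanded_quotient(a, h, n)
    print(f"h = {h}: quotient = {float(expanded_quotient(a, h, n)):.6f}")

print("n * a^(n-1) =", n * a ** (n - 1))
```

Only the k = 1 term of the sum is free of h, which is why the printed quotients approach n·a^(n−1) (here 80) as h shrinks.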

Quadratic reciprocal: One final example in the vein of the previous ones: f(x) = 1/x^2. As long as a ≠ 0, we have

$$\begin{aligned}
f'(a) &= \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \\
&= \lim_{h \to 0} \frac{\frac{1}{(a+h)^2} - \frac{1}{a^2}}{h} \\
&= \lim_{h \to 0} \frac{a^2 - (a + h)^2}{(a + h)^2 a^2 h} \\
&= \lim_{h \to 0} \frac{-2ah - h^2}{(a + h)^2 a^2 h} \\
&= \lim_{h \to 0} \frac{-2a - h}{(a + h)^2 a^2} \\
&= \frac{-2a}{a^2 a^2} = \frac{-2}{a^3}.
\end{aligned}$$

In this example, f′ is the function x ↦ −2/x^3, on the same domain (R \ {0}) as f.
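The chain of simplifications for 1/x^2 can be checked the same way. In this sketch the sample point a = 2 and the function names are assumptions; exact fractions are used so that the two forms of the quotient can be compared with `==`.

```python
from fractions import Fraction


def f(x):
    return 1 / x ** 2


def direct_quotient(a, h):
    """(f(a + h) - f(a)) / h for f(x) = 1/x^2 (needs a, a + h and h nonzero)."""
    return (f(a + h) - f(a)) / h


def simplified_quotient(a, h):
    """(-2a - h) / ((a + h)^2 * a^2), the form reached after cancelling h."""
    return (-2 * a - h) / ((a + h) ** 2 * a ** 2)


a = Fraction(2)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(-1, 1000)):
    assert direct_quotient(a, h) == simplified_quotient(a, h)
    print(f"h = {h}: quotient = {float(simplified_quotient(a, h)):.6f}")

print("-2 / a^3 =", float(-2 / a ** 3))
```

The simplified form is defined and continuous at h = 0, so its value there, −2/a^3, is the limit of the original quotient.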

Absolute value function: Here we consider f(x) = |x|. We would strongly expect that for a > 0, we have f differentiable at a, with derivative 1, because in a little neighborhood around such a, we have that f(x) = x; indeed, for a > 0 we have that for all sufficiently small h (say, for all h with |h| < a/2)

$$\frac{|a + h| - |a|}{h} = \frac{a + h - a}{h} = \frac{h}{h} = 1,$$

and so lim_{h→0} (|a + h| − |a|)/h = lim_{h→0} 1 = 1. We can similarly verify that for all a < 0, f′(a) = −1. But at a = 0, something different happens:

$$\frac{|0 + h| - |0|}{h} = \frac{|h|}{h},$$

and we know that lim_{h→0} |h|/h does not exist. So, this is our first example of a function that is not everywhere differentiable; the domain of f′ here is R \ {0} while the domain of f is R. We should not have expected f(x) = |x| to be differentiable at 0, as there is no coherent “direction” that the graph of the function is going near 0: if we look to the right of

so in order for f to be continuous at 1, we require a + b = 1. For differentiability at 1, we have

$$\lim_{h \to 0^+} \frac{f(1 + h) - f(1)}{h} = \lim_{h \to 0^+} \frac{a + ah + b - (a + b)}{h} = \lim_{h \to 0^+} \frac{ah}{h} = \lim_{h \to 0^+} a = a,$$

and (recalling that a + b = 1, since we require f to be continuous at 1)

$$\lim_{h \to 0^-} \frac{f(1 + h) - f(1)}{h} = \lim_{h \to 0^-} \frac{(1 + h)^2 - (a + b)}{h} = \lim_{h \to 0^-} \frac{2h + h^2}{h} = \lim_{h \to 0^-} (2 + h) = 2.$$

So, for f to be differentiable at 1 we require a = 2; and since a + b = 1 this says b = −1. The function we are considering is thus

$$f(x) = \begin{cases} x^2 & \text{if } x < 1 \\ 2x - 1 & \text{if } x \ge 1. \end{cases}$$

Here is the graph. It shows the two pieces not just fitting together at 1, but fitting together smoothly.
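The two one-sided limits can be explored numerically. The sketch below assumes, as the computation above indicates, that the function is x^2 to the left of 1 and ax + b from 1 onward; `make_spliced` and `quotient_at_one` are hypothetical helper names, and the second choice (a = 3, b = −2) is an assumed contrast case that is continuous at 1 but has mismatched slopes.

```python
from fractions import Fraction


def make_spliced(a, b):
    """Return f with f(x) = x^2 for x < 1 and f(x) = a*x + b for x >= 1."""
    def f(x):
        return x * x if x < 1 else a * x + b
    return f


def quotient_at_one(f, h):
    """Difference quotient (f(1 + h) - f(1)) / h; h > 0 probes the right, h < 0 the left."""
    return (f(1 + h) - f(1)) / h


smooth = make_spliced(2, -1)   # a = 2, b = -1, the choice found in the text
kinked = make_spliced(3, -2)   # a + b = 1 still holds, but a != 2

for h in (Fraction(1, 100), Fraction(-1, 100)):
    print(f"h = {h}: smooth -> {quotient_at_one(smooth, h)}, kinked -> {quotient_at_one(kinked, h)}")
```

For the smooth choice both one-sided quotients approach 2; for the contrast case the right-hand quotient is 3 while the left-hand quotient still approaches 2, so the two-sided limit fails to exist.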

The square root function: Consider f(x) = √x, defined on [0, ∞). To compute its derivative at any a ∈ (0, ∞) we proceed in the usual way:

$$\begin{aligned}
\lim_{h \to 0} \frac{\sqrt{a + h} - \sqrt{a}}{h} &= \lim_{h \to 0} \left( \frac{\sqrt{a + h} - \sqrt{a}}{h} \right) \left( \frac{\sqrt{a + h} + \sqrt{a}}{\sqrt{a + h} + \sqrt{a}} \right) \\
&= \lim_{h \to 0} \frac{(a + h) - a}{h(\sqrt{a + h} + \sqrt{a})} \\
&= \lim_{h \to 0} \frac{1}{\sqrt{a + h} + \sqrt{a}} \\
&= \frac{1}{2\sqrt{a}}.
\end{aligned}$$

So f is differentiable on (0, ∞), with derivative f′(a) = 1/(2√a). What about at 0? Because f is not defined for negative inputs, we must consider a one-sided derivative, in particular the right derivative, and we have

$$\lim_{h \to 0^+} \frac{\sqrt{0 + h} - \sqrt{0}}{h} = \lim_{h \to 0^+} \frac{1}{\sqrt{h}}.$$

This limit does not exist, so f is not right differentiable at 0.

A more dramatic example in a similar vein comes from considering g(x) = x^(1/3), which has all of R as its domain. By a similar calculation to above, we get that g is differentiable at all a ≠ 0, with derivative g′(a) = 1/(3a^(2/3)). At a = 0 we have

$$\lim_{h \to 0} \frac{(0 + h)^{1/3} - 0^{1/3}}{h} = \lim_{h \to 0} \frac{1}{h^{2/3}},$$

which again does not exist, so g is not differentiable at 0. What’s odd about this is that from a drawing of the graph of g, it seems that g has an unambiguous slope/tangent line at the point (0, 0):

It is the vertical line, x = 0. We are failing to see this in the math, because the vertical line has infinite slope, and we have no real number that captures that.^111
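A quick numerical look shows the difference quotient of the cube root at 0 growing without bound from both sides. This is a floating-point sketch, not a proof; `real_cube_root` is a hypothetical helper, needed because Python's `**` operator returns a complex number for a negative base with a fractional exponent.

```python
import math


def real_cube_root(x):
    """Real cube root of x, with the sign of x restored via copysign."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def quotient_at_zero(h):
    """((0 + h)^(1/3) - 0^(1/3)) / h, which simplifies to 1 / h^(2/3)."""
    return (real_cube_root(0.0 + h) - real_cube_root(0.0)) / h


for h in (1e-3, 1e-6, 1e-9, -1e-3, -1e-6, -1e-9):
    print(f"h = {h:+.0e}: quotient = {quotient_at_zero(h):.1f}")
```

The quotients are positive on both sides of 0 and keep increasing as h shrinks, consistent with a graph that becomes vertical at the origin.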

sin(1/x) and variants: Consider the three functions

f_1(x) = sin(1/x), x ≠ 0
f_2(x) = x sin(1/x), x ≠ 0
f_3(x) = x^2 sin(1/x), x ≠ 0,

(^111) Shortly we will talk about “infinite limits” and rectify this deficiency.

It is far less easy to come up with an example of a function which is continuous everywhere, but differentiable nowhere; nor is it easy to imagine what such a function could look like. There are examples^114, but they are not as easy to explain as the Dirichlet function (our example of a function that is defined everywhere but continuous nowhere). The first such example was found by Karl Weierstrass in 1872, and so is traditionally called the Weierstrass function. It is infinitely jagged, and displays a self-similarity or fractal behavior: zoom in on any portion of the graph, and you see something very similar to the global picture (see figure below).

(^114) In fact, in a quite precise sense most continuous functions are nowhere differentiable.

Higher derivatives: Let f be a function on some domain D. As we have been discussing in these examples, there may be some points in the domain of f at which f is differentiable, leading to a function f′, the derivative function, which might have a smaller domain than D. But the function f′ may itself be differentiable at some points, leading to a function (f′)′ (which might have a smaller domain than that of f′). Rather than working with this ungainly notation, we denote the second derivative by f′′. Formally, the second derivative of a function f at a point a is defined to be

$$f''(a) = \lim_{h \to 0} \frac{f'(a + h) - f'(a)}{h},$$

assuming that limit exists, which presupposes that f is both defined at and near a, and is differentiable at and near a.

We may further define the third derivative function, denoted f′′′, as the derivative of the second derivative function f′′. And we can go on; but even without the parentheses, this “prime” notation gets a little ungainly, quickly. We use the notation f^(k) to denote the kth derivative of f, for any natural number k (so f^(3) = f′′′ and f^(1) = f′). By convention, f^(0) = f. Physically, if f(t) is the position of a particle at time t, then

  • f ′(t) is velocity at time t (rate of change of position with respect to time);
  • f ′′(t) is the acceleration at time t (rate of change of velocity with respect to time);
  • f ′′′(t) is the jerk at time t (rate of change of acceleration with respect to time), and so on.

Consider, for example, f(x) = 1/x, with domain all reals except 0. We have

  • f′(x) = −1/x^2, domain R \ {0};
  • f′′(x) = 2/x^3, domain R \ {0};
  • f′′′(x) = −6/x^4, domain R \ {0}, and so on.
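The definition of the second derivative as a limit of difference quotients of f′ can be tried out on this example. The sketch below takes f′(x) = −1/x^2 from the list above and checks, at the assumed sample point a = 2, that the quotient approaches 2/a^3; the function names are hypothetical.

```python
from fractions import Fraction


def f_prime(x):
    """First derivative of f(x) = 1/x, as listed above."""
    return -1 / x ** 2


def second_derivative_quotient(a, h):
    """(f'(a + h) - f'(a)) / h, whose limit as h -> 0 defines f''(a)."""
    return (f_prime(a + h) - f_prime(a)) / h


a = Fraction(2)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    print(f"h = {h}: quotient = {float(second_derivative_quotient(a, h)):.6f}")

print("2 / a^3 =", float(2 / a ** 3))
```

The printed quotients move toward 0.25, the value of 2/x^3 at x = 2.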

As another example, consider the function that is obtained by splicing the cube function and the square function, i.e.

$$f(x) = \begin{cases} x^3 & \text{if } x \le 0 \\ x^2 & \text{if } x \ge 0. \end{cases}$$

By looking at one-sided limits, it is easy to check that f is continuous at 0 and differentiable at 0 (with f′(0) = 0), but not twice differentiable at 0: the difference quotient (f′(h) − f′(0))/h equals 3h for h < 0 and 2 for h > 0, so its one-sided limits at 0 are 0 and 2. A homework problem asks for an example of a function that, at least at some points, is differentiable k times, but not k + 1 times.

Before moving on to some more theoretical properties of the derivative, we mention one more motivation. The tangent line to a curve at a point, as we have defined it, seems to represent a good approximation to the curve, at least close to the point of tangency. Now the tangent line is a straight line, and it is relatively easy to calculate exact values of points along a straight line, while the graph of a typical function near a point may well be curved, and the

If we use L_{f,16} to approximate √14, we get

$$\sqrt{14} = f(14) \approx L_{f,16}(14) = 4 + 0.125(14 - 16) = 3.75.$$

This is not too bad! A calculator suggests that √14 = 3.7416···, so the linearization gives an answer with an absolute error of around 0.0083, and a relative error of around 0.22%. Of course, the situation won’t always be so good: if we use L_{f,16} to approximate √100, we get an estimate of 4 + 0.125(100 − 16) = 14.5, which differs from the true value (10) by a large amount^115; and if we use it to estimate √−8 we get an estimate of 4 + 0.125(−8 − 16) = 1 for a quantity that doesn’t exist! This leads to the first of two natural questions to ask about the linearization (the second, you are probably already asking yourself):

  • How good is the linearization as an approximation tool, precisely?: It’s easy to approximate any function, at any possible input: just say “7”. An approximation is only useful if it comes with some guarantee of its accuracy, such as “√14 is approximately 3.75; and this estimate is accurate to error ±0.2”, meaning that “√14 is certain to lie in the interval (3.55, 3.95)”. The linearization does come with a guarantee of accuracy, but we will not explore it until next semester, when we consider the much more general (and powerful) Taylor polynomial.

  • Why use a scheme like this, to estimate the values of complicated functions, when we could just use a calculator?: To answer this, ask another question: how does a calculator figure out the values of complicated functions?!?

(^115) Not too surprising, since by most measures 100 is not close to 16.

Here’s a theoretical justification for the linearization as a tool for approximating the values of a function, near the point around which we are linearizing: it’s certainly the case that

$$\lim_{x \to a} (f(x) - L_{f,a}(x)) = \lim_{x \to a} f(x) - \lim_{x \to a} L_{f,a}(x) = f(a) - f(a) = 0,$$

which says that as x approaches a, the linearization gets closer and closer to f (makes smaller and smaller error). But this is true of lots and lots of candidates for a simple approximating function; in particular it’s true about the constant function f(a), but something as naive as that can hardly be considered as a good tool for approximating the function f away from a (it takes into account nothing except the value of the function at a). The linearization takes a little more into account about the function; it considers the direction in which the graph of the function is moving, at the point (a, f(a)). As a consequence of this extra data being built into the linearization, we have the following fact:

$$\begin{aligned}
\lim_{x \to a} \frac{f(x) - L_{f,a}(x)}{x - a} &= \lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a} \\
&= \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - f'(a) \right) \\
&= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} - \lim_{x \to a} f'(a) \\
&= f'(a) - f'(a) = 0.
\end{aligned}$$

In other words, not only does the value of the linearization get closer and closer to the value of f as x approaches a, but also the linearization gets closer and closer to f as x approaches a, even when the error is measured relative to x − a (a stronger statement, since x − a is getting smaller as x approaches a).^116
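Both the worked estimate of √14 and the "error relative to x − a" statement can be illustrated numerically. The sketch below assumes the linearization has the form L_{f,a}(x) = f(a) + f′(a)(x − a), which is what the displayed computation uses, and takes f(x) = √x with f′(x) = 1/(2√x) at a = 16; the helper names are hypothetical.

```python
import math


def linearization(f, f_prime, a):
    """Return L(x) = f(a) + f'(a) * (x - a)."""
    value, slope = f(a), f_prime(a)
    return lambda x: value + slope * (x - a)


def sqrt_prime(x):
    # Derivative of the square root on (0, infinity).
    return 1.0 / (2.0 * math.sqrt(x))


a = 16.0
L = linearization(math.sqrt, sqrt_prime, a)

estimate = L(14.0)
print("estimate of sqrt(14):", estimate)
print("absolute error:", abs(math.sqrt(14.0) - estimate))


def relative_error(x):
    """(f(x) - L(x)) / (x - a), the quantity whose limit at a is 0."""
    return (math.sqrt(x) - L(x)) / (x - a)


for x in (17.0, 16.1, 16.01, 16.001):
    print(f"x = {x}: (f(x) - L(x)) / (x - a) = {relative_error(x):.8f}")
```

The last loop shows the error shrinking even after division by x − a, which a cruder approximation such as the constant function f(a) would not achieve.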

8.4 The derivative of sin

Here we go through an informal calculation of the derivative of the sin function. It is informal, because we have only informally defined sin. Next semester, we will give a proper definition of sin (via an integral), from which all of its basic properties will emerge quite easily. Along the way, we will derive the important and non-obvious trigonometric limit

$$\lim_{h \to 0} \frac{\sin h}{h} = 1.$$

Because we haven’t yet rigorously defined sin, the treatment here will be quite casual and intuitive. But at least it will give a sense of the behavior of the trigonometric functions vis-à-vis the derivative, and allow us to add sin and cos to the army of functions that we can differentiate. Recall how we (informally, geometrically) defined the trigonometric functions sin and cos:

(^116) The linearization is actually the unique linear function with this property. We’ll have much more to say about this next semester, when we look at Taylor series.

Question 1: Why does this prove

sin(α + β) = sin α cos β + cos α sin β?

Answer:

  • First argue that angle RPQ is α (look first at OQA, then RQO, then RQP, then RPQ).
  • Argue (from the definition of sin, and similar triangles) that PQ is sin β, and so PR is cos α sin β.
  • Argue similarly that OQ is cos β, and so AQ is sin α cos β.
  • Since AQ is the same as RB, and since PB is known, conclude that

sin(α + β) = sin α cos β + cos α sin β.

Of course, this geometric proof only works for α, β ≥ 0, α + β ≤ π/2; but similar pictures can be drawn for all other cases.^117

(^117) Here is another picture that justifies the trigonometric sum formulae, due to Tamás Görbe:

Now using the sin sum formula, we have (throughout assuming that all the various limits in fact exist):

$$\begin{aligned}
\sin'\theta &= \lim_{h \to 0} \frac{\sin(\theta + h) - \sin\theta}{h} \\
&= \lim_{h \to 0} \frac{\sin\theta\cos h + \cos\theta\sin h - \sin\theta}{h} \\
&= \lim_{h \to 0} \frac{\sin\theta(\cos h - 1) + \cos\theta\sin h}{h} \\
&= \lim_{h \to 0} \frac{\sin\theta(\cos h - 1)}{h} + \lim_{h \to 0} \frac{\cos\theta\sin h}{h} \\
&= \sin\theta \lim_{h \to 0} \frac{\cos h - 1}{h} + \cos\theta \lim_{h \to 0} \frac{\sin h}{h}.
\end{aligned}$$

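We have reduced to two limits, neither of which looks any easier than the one we started with! But, it turns out that they are essentially the same limit:

$$\begin{aligned}
\lim_{h \to 0} \frac{\cos h - 1}{h} &= \lim_{h \to 0} \left( \frac{\cos h - 1}{h} \right) \left( \frac{\cos h + 1}{\cos h + 1} \right) \\
&= \lim_{h \to 0} \frac{\cos^2 h - 1}{h(\cos h + 1)} \\
&= \lim_{h \to 0} \frac{-\sin^2 h}{h(\cos h + 1)} = -\lim_{h \to 0} \frac{\sin h}{h} \lim_{h \to 0} \frac{\sin h}{\cos h + 1} \\
&= -0 \cdot \lim_{h \to 0} \frac{\sin h}{h} = 0.
\end{aligned}$$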

In the second from last line we used continuity of sin and cos, and in the last line, we used the (as yet unjustified) fact that (sin h)/h actually tends to a limit, as h nears 0. On this assumption, we get

$$\sin'\theta = \cos\theta \lim_{h \to 0} \frac{\sin h}{h}$$
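The two limits appearing in this calculation can at least be explored numerically. The sketch below is a floating-point experiment, not a justification: it tabulates (sin h)/h and (cos h − 1)/h for small h, and compares the difference quotient of sin at an assumed sample angle θ = 1 with cos θ. The helper name `sin_quotient` is hypothetical.

```python
import math


def sin_quotient(theta, h):
    """(sin(theta + h) - sin(theta)) / h, the difference quotient for sin at theta."""
    return (math.sin(theta + h) - math.sin(theta)) / h


for h in (0.1, 0.01, 0.001, -0.001):
    print(f"h = {h:+.3f}: sin(h)/h = {math.sin(h) / h:.8f}, "
          f"(cos(h) - 1)/h = {(math.cos(h) - 1) / h:+.8f}")

theta = 1.0
print("difference quotient of sin at 1:", sin_quotient(theta, 1e-6))
print("cos(1):                         ", math.cos(theta))
```

The table suggests that (sin h)/h stays close to 1 and (cos h − 1)/h close to 0 for small h, and the final two printed values agree to several decimal places.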