WEBVTT

00:00:15.259 --> 00:00:18.960
The goal here is simple, explain what a derivative is.

00:00:19.160 --> 00:00:21.658
The thing is though, there's some subtlety to this topic,

00:00:21.658 --> 00:00:24.200
and a lot of potential for paradoxes if you're not careful.

00:00:24.780 --> 00:00:27.551
So a secondary goal is that you have an appreciation

00:00:27.551 --> 00:00:30.219
for what those paradoxes are and how to avoid them.

00:00:31.219 --> 00:00:35.591
You see, it's common for people to say that the derivative measures an instantaneous

00:00:35.591 --> 00:00:39.759
rate of change, but when you think about it, that phrase is actually an oxymoron.

00:00:40.240 --> 00:00:43.542
Change is something that happens between separate points in time,

00:00:43.542 --> 00:00:46.646
and when you blind yourself to all but just a single instant,

00:00:46.646 --> 00:00:48.600
there's not really any room for change.

00:00:49.500 --> 00:00:51.840
You'll see what I mean more as we get into it,

00:00:51.840 --> 00:00:56.022
but when you appreciate that a phrase like instantaneous rate of change is actually

00:00:56.021 --> 00:01:00.103
nonsense, I think it makes you appreciate just how clever the fathers of calculus

00:01:00.103 --> 00:01:02.991
were in capturing the idea that phrase is meant to evoke,

00:01:02.991 --> 00:01:05.980
but with a perfectly sensible piece of math, the derivative.

00:01:07.540 --> 00:01:11.877
As our central example, I want you to imagine a car that starts at some point A,

00:01:11.876 --> 00:01:15.839
speeds up, and then slows down to a stop at some point B 100 meters away,

00:01:15.840 --> 00:01:19.000
and let's say it all happens over the course of 10 seconds.

00:01:20.519 --> 00:01:23.899
That's the setup to have in mind as we lay out what the derivative is.

00:01:23.900 --> 00:01:29.099
Well, we could graph this motion, letting the vertical axis represent the

00:01:29.099 --> 00:01:34.579
distance traveled, and the horizontal axis represent time, so at each time t,

00:01:34.578 --> 00:01:38.723
represented with a point somewhere on the horizontal axis,

00:01:38.724 --> 00:01:44.134
the height of the graph tells us how far the car has traveled in total after

00:01:44.134 --> 00:01:45.540
that amount of time.

00:01:46.760 --> 00:01:50.160
It's pretty common to name a distance function like this s of t.

00:01:50.159 --> 00:01:52.706
I would use the letter d for distance, but that

00:01:52.706 --> 00:01:55.359
guy already has another full time job in calculus.

00:01:56.500 --> 00:01:59.760
Initially, the curve is quite shallow, since the car is slow to start.

00:02:00.280 --> 00:02:04.340
During that first second, the distance it travels doesn't change that much.

00:02:04.980 --> 00:02:07.582
For the next few seconds, as the car speeds up,

00:02:07.581 --> 00:02:10.454
the distance traveled in a given second gets larger,

00:02:10.455 --> 00:02:13.219
which corresponds to a steeper slope in this graph.

00:02:13.800 --> 00:02:17.520
Then towards the end, when it slows down, that curve shallows out again.

00:02:20.759 --> 00:02:25.516
If we were to plot the car's velocity in meters per second as a function of time,

00:02:25.516 --> 00:02:27.199
it might look like this bump.

00:02:27.860 --> 00:02:30.000
At early times, the velocity is very small.

00:02:30.460 --> 00:02:34.224
Up to the middle of the journey, the car builds up to some maximum velocity,

00:02:34.223 --> 00:02:36.619
covering a relatively large distance each second.

00:02:37.659 --> 00:02:39.919
Then it slows back down towards a speed of zero.

00:02:41.379 --> 00:02:44.180
These two curves are definitely related to each other.

00:02:44.840 --> 00:02:47.159
If you change the specific distance vs.

00:02:47.259 --> 00:02:50.299
time function, you'll have some different velocity vs.

00:02:50.419 --> 00:02:51.079
time function.

00:02:51.759 --> 00:02:55.039
What we want to understand is the specifics of that relationship.

00:02:55.680 --> 00:02:59.099
Exactly how does velocity depend on a distance vs.

00:02:59.400 --> 00:02:59.819
time function?

00:03:01.939 --> 00:03:04.681
To do that, it's worth taking a moment to think

00:03:04.681 --> 00:03:07.539
critically about what exactly velocity means here.

00:03:08.379 --> 00:03:11.879
Intuitively, we all might know what velocity at a given moment means,

00:03:11.879 --> 00:03:14.979
it's just whatever the car's speedometer shows in that moment.

00:03:17.180 --> 00:03:21.457
Intuitively, it might make sense that the car's velocity should be higher at times when

00:03:21.457 --> 00:03:25.639
this distance function is steeper, when the car traverses more distance per unit time.

00:03:26.699 --> 00:03:30.719
But the funny thing is, velocity at a single moment makes no sense.

00:03:31.360 --> 00:03:34.895
If I show you a picture of a car, just a snapshot in an instant,

00:03:34.895 --> 00:03:38.540
and I ask you how fast it's going, you'd have no way of telling me.

00:03:39.620 --> 00:03:42.379
What you'd need are two separate points in time to compare.

00:03:43.180 --> 00:03:47.310
That way you can compute whatever the change in distance across those times is,

00:03:47.310 --> 00:03:48.860
divided by the change in time.

00:03:49.560 --> 00:03:49.740
Right?

00:03:49.819 --> 00:03:54.159
I mean, that's what velocity is, it's the distance traveled per unit time.

00:03:55.620 --> 00:03:59.069
So how is it that we're looking at a function for velocity that

00:03:59.069 --> 00:04:02.359
only takes in a single value of t, a single snapshot in time?

00:04:02.900 --> 00:04:04.280
It's weird, isn't it?

00:04:04.280 --> 00:04:07.868
We want to associate individual points in time with a velocity,

00:04:07.868 --> 00:04:12.300
but actually computing velocity requires comparing two separate points in time.

00:04:14.639 --> 00:04:17.399
If that feels strange and paradoxical, good!

00:04:17.920 --> 00:04:20.959
You're grappling with the same conflicts that the fathers of calculus did.

00:04:21.379 --> 00:04:25.339
And if you want a deep understanding for rates of change, not just for a moving car,

00:04:25.339 --> 00:04:29.346
but for all sorts of things in science, you're going to need to resolve this apparent

00:04:29.346 --> 00:04:29.719
paradox.

00:04:32.199 --> 00:04:34.705
First, I think it's best to talk about the real world,

00:04:34.706 --> 00:04:36.939
and then we'll go into a purely mathematical one.

00:04:37.540 --> 00:04:40.460
Let's think about what the car's speedometer is probably doing.

00:04:41.199 --> 00:04:43.932
At some point, say 3 seconds into the journey,

00:04:43.932 --> 00:04:48.757
the speedometer might measure how far the car goes in a very small amount of time,

00:04:48.757 --> 00:04:52.420
maybe the distance traveled between 3 seconds and 3.01 seconds.

00:04:53.360 --> 00:04:57.514
Then it could compute the speed in meters per second as that tiny

00:04:57.514 --> 00:05:01.860
distance traversed in meters divided by that tiny time, 0.01 seconds.

00:05:02.899 --> 00:05:05.555
That is, a physical car just side-steps the paradox and

00:05:05.555 --> 00:05:08.259
doesn't actually compute speed at a single point in time.

00:05:08.779 --> 00:05:11.679
It computes speed during a very small amount of time.

00:05:13.180 --> 00:05:18.687
So let's call that difference in time dt, which you might think of as 0.01 seconds,

00:05:18.687 --> 00:05:22.360
and let's call that resulting difference in distance ds.

00:05:22.959 --> 00:05:26.743
So the velocity at some point in time is ds divided by dt,

00:05:26.744 --> 00:05:30.400
the tiny change in distance over the tiny change in time.

00:05:31.579 --> 00:05:35.339
Graphically, you can imagine zooming in on some point of this distance vs.

00:05:35.500 --> 00:05:37.680
time graph above t equals 3.

00:05:38.560 --> 00:05:43.143
That dt is a small step to the right, since time is on the horizontal axis,

00:05:43.142 --> 00:05:47.001
and that ds is the resulting change in the height of the graph,

00:05:47.002 --> 00:05:50.439
since the vertical axis represents the distance traveled.

00:05:51.220 --> 00:05:55.472
So ds divided by dt is something you can think of as the rise

00:05:55.471 --> 00:05:59.519
over run slope between two very close points on this graph.

00:06:00.699 --> 00:06:03.439
Of course, there's nothing special about the value t equals 3.

00:06:03.939 --> 00:06:06.797
We could apply this to any other point in time,

00:06:06.797 --> 00:06:10.665
so we consider this expression ds over dt to be a function of t,

00:06:10.665 --> 00:06:15.605
something where I can give you a time t and you can give me back the value of this

00:06:15.605 --> 00:06:18.879
ratio at that time, the velocity as a function of time.

00:06:19.600 --> 00:06:22.838
For example, when I had the computer draw this bump curve here,

00:06:22.838 --> 00:06:27.240
the one representing the velocity function, here's what I had the computer actually do.

00:06:27.939 --> 00:06:32.620
First, I chose a small value for dt, I think in this case it was 0.01.

00:06:33.439 --> 00:06:38.207
Then I had the computer look at a whole bunch of times t between 0 and 10,

00:06:38.208 --> 00:06:41.386
and compute the distance function s at t plus dt,

00:06:41.386 --> 00:06:44.820
and then subtract off the value of that function at t.

00:06:45.420 --> 00:06:51.064
In other words, that's the difference in the distance traveled between the given time,

00:06:51.064 --> 00:06:53.660
t, and the time 0.01 seconds after that.

00:06:54.519 --> 00:06:58.305
Then you can just divide that difference by the change in time, dt,

00:06:58.305 --> 00:07:02.480
and that gives you velocity in meters per second around each point in time.

00:07:04.420 --> 00:07:08.721
So with a formula like this, you could give the computer any curve representing any

00:07:08.721 --> 00:07:12.920
distance function s of t, and it could figure out the curve representing velocity.

00:07:13.540 --> 00:07:17.437
Now would be a good time to pause, reflect, and make sure this idea

00:07:17.437 --> 00:07:21.622
of relating distance to velocity by looking at tiny changes makes sense,

00:07:21.622 --> 00:07:25.520
because we're going to tackle the paradox of the derivative head on.

00:07:27.480 --> 00:07:32.771
This idea of ds over dt, a tiny change in the value of the function s divided by

00:07:32.771 --> 00:07:38.000
the tiny change in the input that caused it, that's almost what a derivative is.

00:07:38.699 --> 00:07:43.908
And even though a car's speedometer will actually look at a concrete change in time,

00:07:43.908 --> 00:07:49.055
like 0.01 seconds, and even though the drawing program here is looking at an actual

00:07:49.055 --> 00:07:54.447
concrete change in time, in pure math the derivative is not this ratio ds over dt for a

00:07:54.447 --> 00:07:59.962
specific choice of dt. Instead, it's whatever that ratio approaches as your choice for dt

00:07:59.963 --> 00:08:00.760
approaches 0.

00:08:02.540 --> 00:08:07.333
Luckily there is a really nice visual understanding for what it means to ask what

00:08:07.333 --> 00:08:11.075
this ratio approaches, Remember, for any specific choice of dt,

00:08:11.074 --> 00:08:15.810
this ratio ds over dt is the slope of a line passing through two separate points

00:08:15.810 --> 00:08:16.980
on the graph, right?

00:08:17.740 --> 00:08:22.357
Well as dt approaches 0, and as those two points approach each other,

00:08:22.357 --> 00:08:26.314
the slope of the line approaches the slope of a line that's

00:08:26.314 --> 00:08:30.140
tangent to the graph at whatever point t we're looking at.

00:08:30.579 --> 00:08:33.928
So the true honest-to-goodness pure math derivative is not the

00:08:33.928 --> 00:08:37.119
rise over run slope between two nearby points on the graph,

00:08:37.119 --> 00:08:41.000
it's equal to the slope of a line tangent to the graph at a single point.

00:08:42.360 --> 00:08:45.864
Now notice what I'm not saying, I'm not saying that the derivative is

00:08:45.864 --> 00:08:49.420
whatever happens when dt is infinitely small, whatever that would mean.

00:08:50.000 --> 00:08:52.340
Nor am I saying that you plug in 0 for dt.

00:08:53.039 --> 00:08:58.899
This dt is always a finitely small non-zero value, it's just that it approaches 0 is all.

00:09:03.620 --> 00:09:04.960
I think that's really clever.

00:09:05.379 --> 00:09:08.277
Even though change in an instant makes no sense,

00:09:08.277 --> 00:09:12.003
this idea of letting dt approach 0 is a really sneaky backdoor

00:09:12.003 --> 00:09:16.379
way to talk reasonably about the rate of change at a single point in time.

00:09:17.019 --> 00:09:17.519
Isn't that neat?

00:09:18.059 --> 00:09:20.495
It's kind of flirting with the paradox of change in

00:09:20.495 --> 00:09:22.980
an instant without ever needing to actually touch it.

00:09:23.299 --> 00:09:25.743
And it comes with such a nice visual intuition too,

00:09:25.744 --> 00:09:28.660
as the slope of a tangent line to a single point on the graph.

00:09:30.159 --> 00:09:33.098
And because change in an instant still makes no sense,

00:09:33.099 --> 00:09:37.427
I think it's healthiest for you to think of this slope not as some instantaneous

00:09:37.427 --> 00:09:41.543
rate of change, but instead as the best constant approximation for a rate of

00:09:41.543 --> 00:09:42.720
change around a point.

00:09:44.340 --> 00:09:46.940
By the way, it's worth saying a couple words on notation here.

00:09:47.340 --> 00:09:51.743
Throughout this video I've been using dt to refer to a tiny change in t with

00:09:51.743 --> 00:09:55.403
some actual size, and ds to refer to the resulting change in s,

00:09:55.403 --> 00:09:59.807
which again has an actual size, and this is because that's how I want you to

00:09:59.807 --> 00:10:00.779
think about them.

00:10:01.659 --> 00:10:05.795
But the convention in calculus is that whenever you're using the letter d like this,

00:10:05.796 --> 00:10:08.910
you're kind of announcing your intention that eventually you're

00:10:08.909 --> 00:10:11.100
going to see what happens as dt approaches 0.

00:10:11.919 --> 00:10:16.828
For example, the honest-to-goodness pure math derivative is written as ds divided by dt,

00:10:16.828 --> 00:10:19.697
even though it's technically not a fraction per se,

00:10:19.697 --> 00:10:23.779
but whatever that fraction approaches for smaller and smaller nudges in t.

00:10:25.779 --> 00:10:27.679
I think a specific example should help here.

00:10:28.259 --> 00:10:31.374
You might think that asking about what this ratio approaches

00:10:31.374 --> 00:10:35.303
for smaller and smaller values would make it much more difficult to compute,

00:10:35.303 --> 00:10:37.500
but weirdly it kind of makes things easier.

00:10:38.200 --> 00:10:43.160
Let's say you have a given distance vs time function that happens to be exactly t cubed.

00:10:43.159 --> 00:10:47.771
So after 1 second the car has traveled 1 cubed equals 1 meters,

00:10:47.772 --> 00:10:52.240
after 2 seconds it's traveled 2 cubed, or 8 meters, and so on.

00:10:53.019 --> 00:10:55.635
Now what I'm about to do might seem somewhat complicated,

00:10:55.635 --> 00:10:57.800
but once the dust settles it really is simpler,

00:10:57.801 --> 00:11:01.680
and more importantly it's the kind of thing you only ever have to do once in calculus.

00:11:03.100 --> 00:11:06.951
Let's say you wanted to compute the velocity, ds divided by dt,

00:11:06.951 --> 00:11:09.299
at some specific time, like t equals 2.

00:11:09.940 --> 00:11:13.257
For right now let's think of dt as having an actual size,

00:11:13.256 --> 00:11:16.459
some concrete nudge, we'll let it go to 0 in just a bit.

00:11:17.139 --> 00:11:22.323
The tiny change in distance between 2 seconds and 2 plus dt

00:11:22.323 --> 00:11:27.939
seconds is s of 2 plus dt minus s of 2, and we divide that by dt.

00:11:28.620 --> 00:11:34.659
Since our function is t cubed, that numerator looks like 2 plus dt cubed minus 2 cubed.

00:11:35.259 --> 00:11:38.100
And this is something we can work out algebraically.

00:11:38.100 --> 00:11:42.320
Again, bear with me, there's a reason I'm showing you the details here.

00:11:42.799 --> 00:11:49.843
When you expand that top, what you get is 2 cubed plus 3 times 2 squared dt

00:11:49.844 --> 00:11:57.260
plus 3 times 2 times dt squared plus dt cubed, and all of that is minus 2 cubed.

00:11:58.379 --> 00:12:01.961
Now there's a lot of terms, and I want you to remember that it looks like a mess,

00:12:01.961 --> 00:12:02.879
but it does simplify.

00:12:03.779 --> 00:12:05.899
Those 2 cubed terms cancel out.

00:12:06.519 --> 00:12:11.606
Everything remaining here has a dt in it, and since there's a dt on the bottom there,

00:12:11.606 --> 00:12:13.559
many of those cancel out as well.

00:12:14.279 --> 00:12:19.682
What this means is that the ratio ds divided by dt has boiled down into

00:12:19.682 --> 00:12:24.860
3 times 2 squared plus 2 different terms that each have a dt in them.

00:12:25.580 --> 00:12:30.129
So if we ask what happens as dt approaches 0, representing the idea of looking at a

00:12:30.129 --> 00:12:34.679
smaller and smaller change in time, we can just completely ignore those other terms.

00:12:36.100 --> 00:12:39.250
By eliminating the need to think about a specific dt,

00:12:39.250 --> 00:12:43.100
we've eliminated a lot of the complication in the full expression.

00:12:43.879 --> 00:12:47.360
So what we're left with is this nice clean 3 times 2 squared.

00:12:48.360 --> 00:12:52.490
You can think of that as meaning that the slope of a line tangent to

00:12:52.490 --> 00:12:56.919
the point at t equals 2 of this graph is exactly 3 times 2 squared, or 12.

00:12:57.820 --> 00:13:01.060
And of course, there's nothing special about the time t equals 2.

00:13:01.559 --> 00:13:04.720
We could more generally say that the derivative

00:13:04.721 --> 00:13:08.080
of t cubed as a function of t is 3 times t squared.

00:13:10.740 --> 00:13:13.220
Now take a step back, because that's beautiful.

00:13:13.820 --> 00:13:16.280
The derivative is this crazy complicated idea.

00:13:16.600 --> 00:13:19.623
We've got tiny changes in distance over tiny changes in time,

00:13:19.623 --> 00:13:22.208
but instead of looking at any specific one of those,

00:13:22.207 --> 00:13:24.500
we're talking about what that thing approaches.

00:13:24.500 --> 00:13:26.980
I mean, that's a lot to think about.

00:13:27.639 --> 00:13:31.559
And yet what we've come out with is such a simple expression, 3 times t squared.

00:13:32.960 --> 00:13:36.060
And in practice, you wouldn't go through all this algebra each time.

00:13:36.419 --> 00:13:40.413
Knowing that the derivative of t cubed is 3t squared is one of those things that all

00:13:40.413 --> 00:13:44.500
calculus students learn how to do immediately without having to re-derive it each time.

00:13:45.059 --> 00:13:48.339
And in the next video, I'm going to show you a nice way to think about

00:13:48.340 --> 00:13:51.759
this and a couple other derivative formulas in really nice geometric ways.

00:13:52.500 --> 00:13:56.384
But the point I want to make by showing you all of the algebraic guts

00:13:56.384 --> 00:14:00.326
here is that when you consider the tiny change in distance caused by a

00:14:00.326 --> 00:14:04.600
tiny change in time for some specific value of dt, you'd have kind of a mess.

00:14:05.259 --> 00:14:08.902
But when you consider what that ratio approaches as dt approaches 0,

00:14:08.902 --> 00:14:13.020
it lets you ignore much of that mess, and it really does simplify the problem.

00:14:13.779 --> 00:14:16.720
That right there is kind of the heart of why calculus becomes useful.

00:14:18.019 --> 00:14:21.528
Another reason to show you a concrete derivative like this is that it

00:14:21.528 --> 00:14:25.088
sets the stage for an example of the kind of paradoxes that come about

00:14:25.089 --> 00:14:28.700
if you believe too much in the illusion of instantaneous rate of change.

00:14:30.000 --> 00:14:34.812
So think about the actual car traveling according to this t cubed distance function,

00:14:34.812 --> 00:14:38.720
and consider its motion at the moment t equals 0, right at the start.

00:14:39.700 --> 00:14:43.379
Now ask yourself whether or not the car is moving at that time.

00:14:45.559 --> 00:14:50.298
On the one hand, we can compute its speed at that point using the derivative,

00:14:50.298 --> 00:14:53.700
3t squared, which for time t equals 0 works out to be 0.

00:14:54.779 --> 00:14:59.990
Visually, this means that the tangent line to the graph at that point is perfectly flat,

00:14:59.990 --> 00:15:03.269
so the car's quote-unquote instantaneous velocity is 0,

00:15:03.269 --> 00:15:06.139
and that suggests that obviously it's not moving.

00:15:07.159 --> 00:15:11.860
But on the other hand, if it doesn't start moving at time 0, when does it start moving?

00:15:12.580 --> 00:15:14.540
Really, pause and ponder that for a moment.

00:15:15.100 --> 00:15:17.779
Is the car moving at time t equals 0?

00:15:22.600 --> 00:15:23.379
Do you see the paradox?

00:15:24.259 --> 00:15:26.000
The issue is that the question makes no sense.

00:15:26.539 --> 00:15:30.439
It references the idea of change in a moment, but that doesn't actually exist.

00:15:30.860 --> 00:15:32.600
That's just not what the derivative measures.

00:15:33.480 --> 00:15:38.369
What it means for the derivative of a distance function to be 0 is that the best

00:15:38.369 --> 00:15:43.320
constant approximation for the car's velocity around that point is 0 m per second.

00:15:44.080 --> 00:15:47.580
For example, if you look at an actual change in time,

00:15:47.580 --> 00:15:51.080
say between time 0 and 0.1 seconds, the car does move.

00:15:51.500 --> 00:15:53.700
It moves 0.001 m.

00:15:54.600 --> 00:15:59.853
That's very small, and importantly, it's very small compared to the change in time,

00:15:59.852 --> 00:16:02.979
giving an average speed of only 0.01 m per second.

00:16:03.679 --> 00:16:08.704
And remember, what it means for the derivative of this motion to be 0 is that

00:16:08.705 --> 00:16:13.860
for smaller and smaller nudges in time, this ratio of m per second approaches 0.

00:16:14.840 --> 00:16:16.720
But that's not to say that the car is static.

00:16:17.539 --> 00:16:20.966
Approximating its movement with a constant velocity of 0 is,

00:16:20.966 --> 00:16:22.820
after all, just an approximation.

00:16:24.340 --> 00:16:29.211
So whenever you hear people refer to the derivative as an instantaneous rate of change,

00:16:29.211 --> 00:16:33.472
a phrase which is intrinsically oxymoronic, I want you to think of that as a

00:16:33.472 --> 00:16:37.679
conceptual shorthand for the best constant approximation for rate of change.

00:16:39.179 --> 00:16:42.126
In the next couple videos, I'll be talking more about the derivative,

00:16:42.126 --> 00:16:45.241
what it looks like in different contexts, how do you actually compute it,

00:16:45.241 --> 00:16:48.399
why is it useful, things like that, focusing on visual intuition as always.
