WEBVTT

00:00:12.140 --> 00:00:16.414
Now that we've seen what a derivative means and what it has to do with rates of change,

00:00:16.414 --> 00:00:19.379
our next step is to learn how to actually compute these guys.

00:00:19.839 --> 00:00:22.939
As in, if I give you some kind of function with an explicit formula,

00:00:22.939 --> 00:00:26.039
you'd want to be able to find what the formula for its derivative is.

00:00:26.699 --> 00:00:30.341
Maybe it's obvious, but I think it's worth stating explicitly why this

00:00:30.341 --> 00:00:34.085
is an important thing to be able to do, why much of a calculus student's

00:00:34.085 --> 00:00:37.469
time ends up going towards grappling with derivatives of abstract

00:00:37.469 --> 00:00:41.060
functions rather than thinking about concrete rate of change problems.

00:00:42.219 --> 00:00:45.908
It's because a lot of real-world phenomena, the sort of things that

00:00:45.908 --> 00:00:49.543
we want to use calculus to analyze, are modeled using polynomials,

00:00:49.543 --> 00:00:53.559
trigonometric functions, exponentials, and other pure functions like that.

00:00:53.979 --> 00:00:58.284
So if you build up some fluency with the ideas of rates of change for those kinds of

00:00:58.284 --> 00:01:02.641
pure abstract functions, it gives you a language to more readily talk about the rates

00:01:02.642 --> 00:01:07.100
at which things change in concrete situations that you might be using calculus to model.

00:01:07.920 --> 00:01:12.141
But it is way too easy for this process to feel like just memorizing a list of rules,

00:01:12.141 --> 00:01:16.166
and if that happens, if you get that feeling, it's also easy to lose sight of the

00:01:16.165 --> 00:01:20.239
fact that derivatives are fundamentally about just looking at tiny changes to some

00:01:20.239 --> 00:01:24.019
quantity and how that relates to a resulting tiny change in another quantity.

00:01:24.780 --> 00:01:28.731
So in this video and in the next one, my aim is to show you how you can think

00:01:28.731 --> 00:01:31.671
about a few of these rules intuitively and geometrically,

00:01:31.671 --> 00:01:35.674
and I really want to encourage you to never forget that tiny nudges are at the

00:01:35.674 --> 00:01:36.739
heart of derivatives.

00:01:37.920 --> 00:01:41.280
Let's start with a simple function like f of x equals x squared.

00:01:41.620 --> 00:01:42.740
What if I asked you its derivative?

00:01:43.519 --> 00:01:47.037
That is, if you were to look at some value x, like x equals 2,

00:01:47.037 --> 00:01:50.332
and compare it to a value slightly bigger, just dx bigger,

00:01:50.332 --> 00:01:53.739
what's the corresponding change in the value of the function?

00:01:54.260 --> 00:01:54.700
dF.

00:01:55.620 --> 00:01:58.692
And in particular, what's dF divided by dx, the rate

00:01:58.692 --> 00:02:01.939
at which this function is changing per unit change in x.

00:02:03.159 --> 00:02:07.248
As a first step for intuition, we know that you can think of this ratio

00:02:07.248 --> 00:02:10.882
dF dx as the slope of a tangent line to the graph of x squared,

00:02:10.883 --> 00:02:15.200
and from that you can see that the slope generally increases as x increases.

00:02:15.840 --> 00:02:18.400
At zero, the tangent line is flat, and the slope is zero.

00:02:19.000 --> 00:02:21.259
At x equals 1, it's something a bit steeper.

00:02:22.599 --> 00:02:24.400
At x equals 2, it's steeper still.

00:02:25.120 --> 00:02:27.579
But looking at graphs isn't generally the best way

00:02:27.579 --> 00:02:30.040
to understand the precise formula for a derivative.

00:02:30.719 --> 00:02:34.933
For that, it's best to take a more literal look at what x squared actually means,

00:02:34.933 --> 00:02:38.840
and in this case let's go ahead and picture a square whose side length is x.

00:02:39.919 --> 00:02:43.119
If you increase x by some tiny nudge, some little dx,

00:02:43.120 --> 00:02:46.379
what's the resulting change in the area of that square?

00:02:47.719 --> 00:02:51.479
That slight change in area is what dF means in this context.

00:02:52.020 --> 00:02:55.777
It's the tiny increase to the value of f of x equals x squared,

00:02:55.776 --> 00:02:58.419
caused by increasing x by that tiny nudge dx.

00:02:59.360 --> 00:03:03.031
Now you can see that there's three new bits of area in this diagram,

00:03:03.031 --> 00:03:05.319
two thin rectangles and a minuscule square.

00:03:06.240 --> 00:03:10.106
The two thin rectangles each have side lengths of x and dx,

00:03:10.105 --> 00:03:13.780
so they account for 2 times x times dx units of new area.

00:03:18.240 --> 00:03:21.031
For example, let's say x was 3 and dx was 0.01,

00:03:21.031 --> 00:03:25.741
then that new area from these two thin rectangles would be 2 times 3 times 0.01,

00:03:25.741 --> 00:03:28.300
which is 0.06, about 6 times the size of dx.

00:03:29.699 --> 00:03:32.871
That little square there has an area of dx squared,

00:03:32.872 --> 00:03:36.960
but you should think of that as being really tiny, negligibly tiny.

00:03:37.699 --> 00:03:41.339
For example, if dx was 0.01, that would be only 0.0001,

00:03:41.340 --> 00:03:46.086
and keep in mind I'm drawing dx with a fair bit of width here just so we

00:03:46.086 --> 00:03:49.662
can actually see it, but always remember in principle,

00:03:49.662 --> 00:03:54.992
dx should be thought of as a truly tiny amount, and for those truly tiny amounts,

00:03:54.992 --> 00:03:59.674
a good rule of thumb is that you can ignore anything that includes a dx

00:03:59.674 --> 00:04:01.819
raised to a power greater than 1.

00:04:02.400 --> 00:04:05.879
That is, a tiny change squared is a negligible change.

00:04:07.500 --> 00:04:13.055
What this leaves us with is that dF is just some multiple of dx, and that multiple 2x,

00:04:13.055 --> 00:04:18.100
which you could also write as dF divided by dx, is the derivative of x squared.

00:04:19.040 --> 00:04:24.297
For example, if you were starting at x equals 3, then as you slightly increase x,

00:04:24.297 --> 00:04:29.683
the rate of change in the area per unit change in length added, dx squared over dx,

00:04:29.682 --> 00:04:34.427
would be 2 times 3, or 6, and if instead you were starting at x equals 5,

00:04:34.427 --> 00:04:38.980
then the rate of change would be 10 units of area per unit change in x.

00:04:41.220 --> 00:04:45.420
Let's go ahead and try a different simple function, f of x equals x cubed.

00:04:45.939 --> 00:04:48.038
This is going to be the geometric view of the stuff

00:04:48.038 --> 00:04:50.139
that I went through algebraically in the last video.

00:04:51.019 --> 00:04:55.640
What's nice here is that we can think of x cubed as the volume of an actual

00:04:55.641 --> 00:05:00.020
cube whose side lengths are x, and when you increase x by a tiny nudge,

00:05:00.019 --> 00:05:04.519
a tiny dx, the resulting increase in volume is what I have here in yellow.

00:05:04.860 --> 00:05:08.747
That represents all the volume in a cube with side lengths x plus dx

00:05:08.747 --> 00:05:12.579
that's not already in the original cube, the one with side length x.

00:05:13.579 --> 00:05:18.601
It's nice to think of this new volume as broken up into multiple components,

00:05:18.601 --> 00:05:22.385
but almost all of it comes from these three square faces,

00:05:22.386 --> 00:05:25.843
or said a little more precisely, as dx approaches 0,

00:05:25.843 --> 00:05:30.278
those three squares comprise a portion closer and closer to 100% of

00:05:30.278 --> 00:05:31.779
that new yellow volume.

00:05:33.839 --> 00:05:38.057
Each of those thin squares has a volume of x squared times dx,

00:05:38.057 --> 00:05:41.540
the area of the face times that little thickness dx.

00:05:42.220 --> 00:05:46.260
So in total this gives us 3x squared dx of volume change.

00:05:47.300 --> 00:05:51.062
And to be sure there are other slivers of volume here along the edges

00:05:51.062 --> 00:05:54.877
and that tiny one in the corner, but all of that volume is going to be

00:05:54.877 --> 00:05:58.639
proportional to dx squared, or dx cubed, so we can safely ignore them.

00:05:59.459 --> 00:06:03.466
Again this is ultimately because they're going to be divided by dx,

00:06:03.466 --> 00:06:07.117
and if there's still any dx remaining then those terms aren't

00:06:07.117 --> 00:06:10.300
going to survive the process of letting dx approach 0.

00:06:11.279 --> 00:06:14.434
What this means is that the derivative of x cubed,

00:06:14.435 --> 00:06:19.199
the rate at which x cubed changes per unit change of x, is 3 times x squared.

00:06:20.639 --> 00:06:25.185
What that means in terms of graphical intuition is that the slope of

00:06:25.185 --> 00:06:29.600
the graph of x cubed at every single point x is exactly 3x squared.

00:06:34.079 --> 00:06:38.786
And reasoning about that slope, it should make sense that this derivative is high on the

00:06:38.786 --> 00:06:42.805
left and then 0 at the origin and then high again as you move to the right,

00:06:42.805 --> 00:06:47.141
but just thinking in terms of the graph would never have landed us on the precise

00:06:47.141 --> 00:06:48.199
quantity 3x squared.

00:06:48.879 --> 00:06:53.060
For that we had to take a much more direct look at what x cubed actually means.

00:06:54.259 --> 00:06:57.560
Now in practice you wouldn't necessarily think of the square every

00:06:57.560 --> 00:06:59.926
time you're taking the derivative of x squared,

00:06:59.927 --> 00:07:03.278
nor would you necessarily think of this cube whenever you're taking

00:07:03.278 --> 00:07:04.560
the derivative of x cubed.

00:07:04.879 --> 00:07:08.399
Both of them fall under a pretty recognizable pattern for polynomial terms.

00:07:09.199 --> 00:07:13.341
The derivative of x to the fourth turns out to be 4x cubed,

00:07:13.341 --> 00:07:17.759
the derivative of x to the fifth is 5x to the fourth, and so on.

00:07:18.879 --> 00:07:22.791
Abstractly you'd write this as the derivative of x to

00:07:22.791 --> 00:07:26.559
the n for any power n is n times x to the n minus 1.

00:07:27.300 --> 00:07:30.560
This right here is what's known in the business as the power rule.

00:07:31.740 --> 00:07:35.913
In practice we all quickly just get jaded and think about this symbolically as

00:07:35.913 --> 00:07:39.769
the exponent hopping down in front, leaving behind one less than itself,

00:07:39.769 --> 00:07:44.259
rarely pausing to think about the geometric delights that underlie these derivatives.

00:07:45.240 --> 00:07:47.295
That's the kind of thing that happens when these tend

00:07:47.295 --> 00:07:49.199
to fall in the middle of much longer computations.

00:07:50.639 --> 00:07:53.327
But rather than tracking it all off to symbolic patterns,

00:07:53.327 --> 00:07:57.359
let's just take a moment and think about why this works for powers beyond just 2 and 3.

00:07:58.439 --> 00:08:02.773
When you nudge that input x, increasing it slightly to x plus dx,

00:08:02.773 --> 00:08:06.974
working out the exact value of that nudged output would involve

00:08:06.973 --> 00:08:10.519
multiplying together these n separate x plus dx terms.

00:08:11.339 --> 00:08:13.889
The full expansion would be really complicated,

00:08:13.889 --> 00:08:18.459
but part of the point of derivatives is that most of that complication can be ignored.

00:08:19.279 --> 00:08:22.019
The first term in your expansion is x to the n.

00:08:22.680 --> 00:08:25.584
This is analogous to the area of the original square,

00:08:25.584 --> 00:08:28.920
or the volume of the original cube from our previous examples.

00:08:30.819 --> 00:08:36.038
For the next terms in the expansion you can choose mostly x's with a single dx.

00:08:41.720 --> 00:08:46.803
Since there are n different parentheticals from which you could have chosen

00:08:46.803 --> 00:08:50.015
that single dx, this gives us n separate terms,

00:08:50.015 --> 00:08:53.159
all of which include n minus 1 x's times a dx,

00:08:53.159 --> 00:08:56.639
giving a value of x to the power n minus 1 times dx.

00:08:57.580 --> 00:09:02.820
This is analogous to how the majority of the new area in the square came from those

00:09:02.820 --> 00:09:07.997
two bars, each with area x times dx, or how the bulk of the new volume in the cube

00:09:07.996 --> 00:09:13.299
came from those three thin squares, each of which had a volume of x squared times dx.

00:09:14.539 --> 00:09:17.432
There will be many other terms of this expansion,

00:09:17.432 --> 00:09:21.250
but all of them are just going to be some multiple of dx squared,

00:09:21.250 --> 00:09:25.184
so we can safely ignore them, and what that means is that all but a

00:09:25.184 --> 00:09:29.349
negligible portion of the increase in the output comes from n copies of

00:09:29.350 --> 00:09:31.259
this x to the n minus 1 times dx.

00:09:31.940 --> 00:09:37.520
That's what it means for the derivative of x to the n to be n times x to the n minus 1.

00:09:38.960 --> 00:09:43.220
And even though, like I said in practice, you'll find yourself performing this

00:09:43.220 --> 00:09:47.911
derivative quickly and symbolically, imagining the exponent hopping down to the front,

00:09:47.910 --> 00:09:52.279
every now and then it's nice to just step back and remember why these rules work.

00:09:52.820 --> 00:09:56.879
Not just because it's pretty, and not just because it helps remind us that math

00:09:56.879 --> 00:10:00.331
actually makes sense and isn't just a pile of formulas to memorize,

00:10:00.331 --> 00:10:04.494
but because it flexes that very important muscle of thinking about derivatives in

00:10:04.494 --> 00:10:05.560
terms of tiny nudges.

00:10:07.500 --> 00:10:11.639
As another example, think of the function f of x equals 1 divided by x.

00:10:12.700 --> 00:10:16.738
Now on the hand you could just blindly try applying the power rule,

00:10:16.738 --> 00:10:20.540
since 1 divided by x is the same as writing x to the negative 1.

00:10:21.100 --> 00:10:24.432
That would involve letting the negative 1 hop down in front,

00:10:24.432 --> 00:10:27.439
leaving behind 1 less than itself, which is negative 2.

00:10:28.240 --> 00:10:31.443
But let's have some fun and see if we can reason about this geometrically,

00:10:31.443 --> 00:10:33.579
rather than just plugging it through some formula.

00:10:34.860 --> 00:10:40.180
The value 1 over x is asking what number multiplied by x equals 1.

00:10:40.960 --> 00:10:42.820
So here's how I'd like to visualize it.

00:10:42.820 --> 00:10:48.120
Imagine a little rectangular puddle of water sitting in two dimensions whose area is 1.

00:10:48.960 --> 00:10:53.766
And let's say that its width is x, which means that the height has to be 1 over x,

00:10:53.765 --> 00:10:55.620
since the total area of it is 1.

00:10:56.360 --> 00:11:01.039
So if x was stretched out to 2, then that height is forced down to 1 half.

00:11:01.779 --> 00:11:05.919
And if you increased x up to 3, then the other side has to be squished down to 1 third.

00:11:07.039 --> 00:11:10.679
This is a nice way to think about the graph of 1 over x, by the way.

00:11:11.279 --> 00:11:15.360
If you think of this width x of the puddle as being in the xy-plane,

00:11:15.360 --> 00:11:20.623
then that corresponding output 1 divided by x, the height of the graph above that point,

00:11:20.623 --> 00:11:24.940
is whatever the height of your puddle has to be to maintain an area of 1.

00:11:26.360 --> 00:11:29.358
So with this visual in mind, for the derivative,

00:11:29.357 --> 00:11:33.579
imagine nudging up that value of x by some tiny amount, some tiny dx.

00:11:34.580 --> 00:11:37.401
How must the height of this rectangle change so

00:11:37.400 --> 00:11:40.339
that the area of the puddle remains constant at 1?

00:11:41.340 --> 00:11:46.019
That is, increasing the width by dx adds some new area to the right here.

00:11:46.259 --> 00:11:50.355
So the puddle has to decrease in height by some d 1 over x,

00:11:50.355 --> 00:11:54.860
so that the area lost off of that top cancels out the area gained.

00:11:56.100 --> 00:11:59.259
You should think of that d 1 over x as being a negative amount,

00:11:59.259 --> 00:12:02.320
by the way, since it's decreasing the height of the rectangle.

00:12:03.539 --> 00:12:04.399
And you know what?

00:12:04.840 --> 00:12:07.027
I'm going to leave the last few steps here for you,

00:12:07.027 --> 00:12:09.720
for you to pause and ponder and work out an ultimate expression.

00:12:10.559 --> 00:12:14.120
And once you reason out what d of 1 over x divided by dx should be,

00:12:14.120 --> 00:12:17.838
I want you to compare it to what you would have gotten if you had just

00:12:17.839 --> 00:12:21.820
blindly applied the power rule, purely symbolically, to x to the negative 1.

00:12:23.980 --> 00:12:26.143
And while I'm encouraging you to pause and ponder,

00:12:26.143 --> 00:12:28.519
here's another fun challenge if you're feeling up to it.

00:12:29.059 --> 00:12:33.419
See if you can reason through what the derivative of the square root of x should be.

00:12:36.399 --> 00:12:40.052
To finish things off, I want to tackle one more type of function,

00:12:40.052 --> 00:12:44.259
trigonometric functions, and in particular let's focus on the sine function.

00:12:45.320 --> 00:12:48.230
So for this section I'm going to assume that you're already

00:12:48.230 --> 00:12:51.673
familiar with how to think about trig functions using the unit circle,

00:12:51.673 --> 00:12:54.100
the circle with a radius 1 centered at the origin.

00:12:55.240 --> 00:12:59.152
For a given value of theta, like say 0.8, you imagine yourself

00:12:59.152 --> 00:13:02.878
walking around the circle starting from the rightmost point

00:13:02.878 --> 00:13:06.480
until you've traversed that distance of 0.8 in arc length.

00:13:06.759 --> 00:13:11.717
This is the same thing as saying that the angle right here is exactly theta radians,

00:13:11.717 --> 00:13:13.759
since the circle has a radius of 1.

00:13:14.759 --> 00:13:20.013
Then what sine of theta means is the height of that point above the x-axis,

00:13:20.013 --> 00:13:24.507
and as your theta value increases and you walk around the circle

00:13:24.506 --> 00:13:28.239
your height bobs up and down between negative 1 and 1.

00:13:29.019 --> 00:13:33.615
So when you graph sine of theta versus theta you get this wave pattern,

00:13:33.615 --> 00:13:35.659
the quintessential wave pattern.

00:13:37.600 --> 00:13:40.311
And just from looking at this graph we can start to

00:13:40.311 --> 00:13:43.180
get a feel for the shape of the derivative of the sine.

00:13:44.019 --> 00:13:48.827
The slope at 0 is something positive since sine of theta is increasing there,

00:13:48.827 --> 00:13:54.191
and as we move to the right and sine of theta approaches its peak that slope goes down

00:13:54.191 --> 00:13:54.500
to 0.

00:13:55.720 --> 00:13:58.340
Then the slope is negative for a little while,

00:13:58.340 --> 00:14:03.080
while the sine is decreasing before coming back up to 0 as the sine graph levels out.

00:14:04.460 --> 00:14:07.432
And as you continue thinking this through and drawing it out,

00:14:07.432 --> 00:14:11.173
if you're familiar with the graph of trig functions you might guess that this

00:14:11.173 --> 00:14:13.668
derivative graph should be exactly cosine of theta,

00:14:13.668 --> 00:14:17.264
since all the peaks and valleys line up perfectly with where the peaks and

00:14:17.264 --> 00:14:19.279
valleys for the cosine function should be.

00:14:20.340 --> 00:14:23.910
And spoiler alert, the derivative is in fact the cosine of theta,

00:14:23.909 --> 00:14:27.860
but aren't you a little curious about why it's precisely cosine of theta?

00:14:28.240 --> 00:14:32.158
I mean you could have all sorts of functions with peaks and valleys at the same points

00:14:32.158 --> 00:14:34.365
that have roughly the same shape, but who knows,

00:14:34.365 --> 00:14:38.103
maybe the derivative of sine could have turned out to be some entirely new type of

00:14:38.102 --> 00:14:40.399
function that just happens to have a similar shape.

00:14:41.600 --> 00:14:44.831
Well just like the previous examples, a more exact understanding

00:14:44.831 --> 00:14:48.662
of the derivative requires looking at what the function actually represents,

00:14:48.662 --> 00:14:51.100
rather than looking at the graph of the function.

00:14:52.399 --> 00:14:54.995
So think back to that walk around the unit circle,

00:14:54.995 --> 00:14:58.966
having traversed an arc with length theta and thinking about sine of theta as

00:14:58.966 --> 00:15:00.240
the height of that point.

00:15:01.700 --> 00:15:06.247
Now zoom into that point on the circle and consider a slight nudge of d theta

00:15:06.246 --> 00:15:10.620
along their circumference, a tiny step in your walk around the unit circle.

00:15:11.480 --> 00:15:14.639
How much does that tiny step change the sine of theta?

00:15:15.440 --> 00:15:20.420
How much does this increase d theta of arc length increase the height above the x-axis?

00:15:21.639 --> 00:15:26.181
Well zoomed in close enough, the circle basically looks like a straight line in this

00:15:26.181 --> 00:15:30.777
neighborhood, so let's go ahead and think of this right triangle where the hypotenuse

00:15:30.777 --> 00:15:34.891
of that right triangle represents the nudge d theta along the circumference,

00:15:34.890 --> 00:15:39.539
and that left side here represents the change in height, the resulting d sine of theta.

00:15:40.139 --> 00:15:44.184
Now this tiny triangle is actually similar to this larger triangle here,

00:15:44.184 --> 00:15:48.840
with the defining angle theta and whose hypotenuse is the radius of the circle with

00:15:48.841 --> 00:15:49.340
length 1.

00:15:50.960 --> 00:15:55.940
Specifically this little angle right here is precisely equal to theta radians.

00:15:57.419 --> 00:16:00.519
Now think about what the derivative of sine is supposed to mean.

00:16:01.220 --> 00:16:05.585
It's the ratio between that d sine of theta, the tiny change to the height,

00:16:05.585 --> 00:16:09.320
divided by d theta, the tiny change to the input of the function.

00:16:10.519 --> 00:16:14.052
And from the picture we can see that that's the ratio between the

00:16:14.052 --> 00:16:17.960
length of the side adjacent to the angle theta divided by the hypotenuse.

00:16:18.799 --> 00:16:21.517
Well let's see, adjacent divided by hypotenuse,

00:16:21.518 --> 00:16:26.220
that's exactly what the cosine of theta means, that's the definition of the cosine.

00:16:27.539 --> 00:16:30.222
So this gives us two different really nice ways of

00:16:30.222 --> 00:16:32.959
thinking about how the derivative of sine is cosine.

00:16:33.139 --> 00:16:36.641
One of them is looking at the graph and getting a loose feel for the shape of

00:16:36.642 --> 00:16:40.280
things based on thinking about the slope of the sine graph at every single point.

00:16:41.100 --> 00:16:45.399
And the other is a more precise line of reasoning looking at the unit circle itself.

00:16:47.080 --> 00:16:49.276
For those of you that like to pause and ponder,

00:16:49.275 --> 00:16:52.846
see if you can try a similar line of reasoning to find what the derivative of

00:16:52.846 --> 00:16:54.220
the cosine of theta should be.

00:16:56.320 --> 00:16:59.496
In the next video I'll talk about how you can take derivatives

00:16:59.495 --> 00:17:02.470
of functions who combine simple functions like these ones,

00:17:02.470 --> 00:17:06.000
either as sums or products or function compositions, things like that.

00:17:06.559 --> 00:17:09.643
And similar to this video the goal is going to be to understand each one

00:17:09.643 --> 00:17:13.358
geometrically in a way that makes it intuitively reasonable and somewhat more memorable.