WEBVTT

00:00:14.500 --> 00:00:18.652
In the last videos I talked about the derivatives of simple functions,

00:00:18.652 --> 00:00:22.455
and the goal was to have a clear picture or intuition to hold in

00:00:22.455 --> 00:00:26.199
your mind that actually explains where these formulas come from.

00:00:26.839 --> 00:00:31.387
But most of the functions you deal with in modeling the world involve mixing,

00:00:31.387 --> 00:00:35.176
combining, or tweaking these simple functions in some other way,

00:00:35.176 --> 00:00:39.782
so our next step is to understand how you take derivatives of more complicated

00:00:39.781 --> 00:00:40.539
combinations.

00:00:41.280 --> 00:00:43.995
Again, I don't want these to be something to memorize,

00:00:43.994 --> 00:00:47.599
I want you to have a clear picture in mind for where each one comes from.

00:00:49.520 --> 00:00:53.600
Now, this really boils down to three basic ways to combine functions.

00:00:54.100 --> 00:00:56.591
You can add them together, you can multiply them,

00:00:56.591 --> 00:00:59.780
and you can throw one inside the other, known as composing them.

00:01:00.600 --> 00:01:03.829
Sure, you could say subtracting them, but really that's just

00:01:03.829 --> 00:01:07.219
multiplying the second by negative one and adding them together.

00:01:08.239 --> 00:01:11.164
Likewise, dividing functions doesn't really add anything,

00:01:11.164 --> 00:01:14.844
because that's the same as plugging one inside the function, one over x,

00:01:14.843 --> 00:01:16.759
and then multiplying the two together.

00:01:17.659 --> 00:01:20.664
So really, most functions you come across just involve layering

00:01:20.664 --> 00:01:23.200
together these three different types of combinations,

00:01:23.200 --> 00:01:26.439
though there's not really a bound on how monstrous things can become.

00:01:27.099 --> 00:01:31.298
But as long as you know how derivatives play with just those three combination types,

00:01:31.299 --> 00:01:34.376
you'll always be able to take it step by step and peel through

00:01:34.376 --> 00:01:36.719
the layers for any kind of monstrous expression.

00:01:38.719 --> 00:01:42.540
So the question is, if you know the derivative of two functions,

00:01:42.540 --> 00:01:45.774
what is the derivative of their sum, of their product,

00:01:45.774 --> 00:01:48.420
and of the function composition between them?

00:01:50.319 --> 00:01:54.259
The sum rule is easiest, if somewhat tongue-twisting to say out loud.

00:01:54.840 --> 00:01:58.600
The derivative of a sum of two functions is the sum of their derivatives.

00:01:59.799 --> 00:02:03.700
But it's worth warming up with this example by really thinking through

00:02:03.700 --> 00:02:07.105
what it means to take a derivative of a sum of two functions,

00:02:07.105 --> 00:02:11.170
since the derivative patterns for products and function composition won't

00:02:11.169 --> 00:02:15.619
be so straightforward, and they're going to require this kind of deeper thinking.

00:02:16.699 --> 00:02:21.199
For example, let's think about this function f of x equals sine of x plus x squared.

00:02:22.199 --> 00:02:25.211
It's a function where, for every input, you add together

00:02:25.211 --> 00:02:27.959
the values of sine of x and x squared at that point.

00:02:29.759 --> 00:02:34.048
For example, let's say at x equals 0.5, the height of the sine

00:02:34.049 --> 00:02:38.201
graph is given by this vertical bar, and the height of the x

00:02:38.201 --> 00:02:42.560
squared parabola is given by this slightly smaller vertical bar.

00:02:44.379 --> 00:02:47.319
And their sum is the length you get by just stacking them together.

00:02:48.520 --> 00:02:53.939
For the derivative, you want to ask what happens as you nudge that input slightly,

00:02:53.938 --> 00:02:56.419
maybe increasing it up to 0.5 plus dx.

00:02:57.560 --> 00:03:02.920
The difference in the value of f between those two places is what we call df.

00:03:04.360 --> 00:03:08.978
And when you picture it like this, I think you'll agree that the total

00:03:08.978 --> 00:03:13.271
change in the height is whatever the change to the sine graph is,

00:03:13.270 --> 00:03:18.799
what we might call d sine of x, plus whatever the change to x squared is, d of x squared.

00:03:22.240 --> 00:03:27.540
We know that the derivative of sine is cosine, and remember what that means.

00:03:27.919 --> 00:03:33.299
It means that this little change, d sine of x, is about cosine of x times dx.

00:03:33.780 --> 00:03:37.711
It's proportional to the size of our initial nudge dx,

00:03:37.711 --> 00:03:43.359
and the proportionality constant equals cosine of whatever input we started at.

00:03:43.979 --> 00:03:48.072
Likewise, because the derivative of x squared is 2x,

00:03:48.072 --> 00:03:53.939
the change in the height of the x squared graph is 2x times whatever dx was.

00:03:55.599 --> 00:04:00.475
So rearranging, df divided by dx, the ratio of the tiny change to

00:04:00.475 --> 00:04:04.752
the sum function to the tiny change in x that caused it,

00:04:04.752 --> 00:04:10.079
is indeed cosine of x plus 2x, the sum of the derivatives of its parts.
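
NOTE
A minimal numerical sketch of the sum rule worked out above, assuming Python. The helper name nudge_ratio and the finite nudge dx = 1e-6 are illustrative choices, not something from the video. It compares df/dx for a small nudge at x = 0.5 against cos(x) + 2x; the two should agree closely, though not exactly, since dx is small but not zero.
```python
import math
def f(x):
    # The sum function from the example: sin(x) + x^2
    return math.sin(x) + x**2
def nudge_ratio(func, x, dx=1e-6):
    # Hypothetical helper: the ratio df/dx for a small but finite nudge dx
    return (func(x + dx) - func(x)) / dx
x = 0.5
approx = nudge_ratio(f, x)
exact = math.cos(x) + 2 * x  # sum of the derivatives of the two parts
print(approx, exact)
```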

00:04:11.520 --> 00:04:15.329
But like I said, things are a bit different for products,

00:04:15.329 --> 00:04:19.139
and let's think through why in terms of tiny nudges again.

00:04:20.060 --> 00:04:23.139
In this case, I don't think graphs are our best bet for visualizing things.

00:04:23.819 --> 00:04:27.040
Pretty commonly in math, at a lot of levels of math really,

00:04:27.040 --> 00:04:29.617
if you're dealing with a product of two things,

00:04:29.617 --> 00:04:32.140
it helps to understand it as some kind of area.

00:04:33.079 --> 00:04:36.014
In this case, maybe you try to configure some mental setup

00:04:36.014 --> 00:04:39.000
of a box where the side lengths are sine of x and x squared.

00:04:39.879 --> 00:04:41.040
But what would that mean?

00:04:42.319 --> 00:04:46.606
Well, since these are functions, you might think of those sides as adjustable,

00:04:46.607 --> 00:04:49.972
dependent on the value of x, which maybe you think of as this

00:04:49.971 --> 00:04:52.739
number that you can just freely adjust up and down.

00:04:53.740 --> 00:04:56.812
So to get a feel for what this means, focus on

00:04:56.812 --> 00:05:00.139
that top side, which changes as the function sine of x.

00:05:01.060 --> 00:05:05.305
As you change this value of x up from 0, it increases up to

00:05:05.305 --> 00:05:09.127
a length of 1 as sine of x moves up towards its peak,

00:05:09.127 --> 00:05:13.939
and after that it starts to decrease as sine of x comes down from 1.

00:05:15.100 --> 00:05:18.580
And in the same way, that height there is always changing as x squared.

00:05:20.079 --> 00:05:25.800
So f of x, defined as the product of these two functions, is the area of this box.

00:05:27.060 --> 00:05:30.120
And for the derivative, let's think about how

00:05:30.120 --> 00:05:33.180
a tiny change to x by dx influences that area.

00:05:33.839 --> 00:05:36.279
What is that resulting change in area df?

00:05:39.000 --> 00:05:44.115
Well, the nudge dx caused that width to change by some small d sine of x,

00:05:44.115 --> 00:05:47.919
and it caused that height to change by some d of x squared.

00:05:50.180 --> 00:05:53.649
And this gives us three little snippets of new area,

00:05:53.649 --> 00:05:58.033
a thin rectangle on the bottom whose area is its width, sine of x,

00:05:58.033 --> 00:06:00.259
times its thin height, d of x squared.

00:06:01.779 --> 00:06:06.406
And there's this thin rectangle on the right, whose area is its height,

00:06:06.406 --> 00:06:09.299
x squared, times its thin width, d sine of x.

00:06:10.740 --> 00:06:14.139
And there's also this little bit in the corner, but we can ignore that.

00:06:14.439 --> 00:06:17.856
Its area is ultimately proportional to the square of dx,

00:06:17.857 --> 00:06:22.480
and as we've seen before, that becomes negligible as dx goes to zero.
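
NOTE
A rough sketch of why the corner piece can be ignored, assuming Python and the starting value x = 0.5 (an illustrative choice). For each nudge dx it splits the change in area into the two thin rectangles plus the corner, then prints the corner divided by the square of dx. That ratio should stay roughly constant while the corner itself shrinks much faster than dx does.
```python
import math
x = 0.5
for dx in (0.1, 0.01, 0.001):
    d_sin = math.sin(x + dx) - math.sin(x)   # change in the width, d(sin x)
    d_sq = (x + dx)**2 - x**2                # change in the height, d(x^2)
    df = math.sin(x + dx) * (x + dx)**2 - math.sin(x) * x**2  # total change in area
    two_rectangles = math.sin(x) * d_sq + x**2 * d_sin
    corner = df - two_rectangles             # what is left over: d_sin * d_sq
    print(dx, corner, corner / dx**2)
```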

00:06:23.939 --> 00:06:27.375
I mean, this whole setup is very similar to what I showed last video,

00:06:27.375 --> 00:06:28.699
with the x squared diagram.

00:06:29.459 --> 00:06:32.704
And just like then, keep in mind that I'm using somewhat beefy

00:06:32.704 --> 00:06:35.899
changes here to draw things, just so we can actually see them.

00:06:36.360 --> 00:06:39.818
But in principle, dx is something very very small,

00:06:39.817 --> 00:06:44.699
and that means that d of x squared and d sine of x are also very very small.

00:06:45.980 --> 00:06:51.175
So, applying what we know about the derivative of sine and of x squared,

00:06:51.175 --> 00:06:55.660
that tiny change, d of x squared, is going to be about 2x times dx.

00:06:56.360 --> 00:07:01.580
And that tiny change, d sine of x, well that's going to be about cosine of x times dx.

00:07:02.920 --> 00:07:09.019
As usual, we divide out by that dx to see that the ratio we want, df divided by dx,

00:07:09.019 --> 00:07:12.504
is sine of x times the derivative of x squared,

00:07:12.504 --> 00:07:15.699
plus x squared times the derivative of sine.

00:07:17.959 --> 00:07:21.259
And nothing we've done here is specific to sine or to x squared.

00:07:21.579 --> 00:07:25.359
This same line of reasoning would work for any two functions, g and h.

00:07:27.000 --> 00:07:29.310
And sometimes people like to remember this pattern with

00:07:29.310 --> 00:07:31.539
a certain mnemonic that you kind of sing in your head.

00:07:32.220 --> 00:07:33.680
Left d right, right d left.

00:07:34.399 --> 00:07:38.812
In this example, where we have sine of x times x squared, left d right,

00:07:38.812 --> 00:07:43.778
means you take that left function, sine of x, times the derivative of the right,

00:07:43.778 --> 00:07:44.759
in this case 2x.

00:07:45.480 --> 00:07:48.876
Then you add on right d left, that right function,

00:07:48.875 --> 00:07:52.939
x squared, times the derivative of the left one, cosine of x.

00:07:54.360 --> 00:07:57.271
Now out of context, presented as a rule to remember,

00:07:57.271 --> 00:08:00.019
I think this would feel pretty strange, don't you?

00:08:00.740 --> 00:08:03.381
But when you actually think of this adjustable box,

00:08:03.380 --> 00:08:05.819
you can see what each of those terms represents.

00:08:06.579 --> 00:08:10.971
Left d right is the area of that little bottom rectangle,

00:08:10.971 --> 00:08:15.439
and right d left is the area of that rectangle on the side.
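
NOTE
A small sketch of "left d right, right d left", assuming Python. The function names left_d_right_right_d_left and nudge_ratio, the input x = 1.2, and the nudge size are all hypothetical choices for illustration. It evaluates the product rule for sin(x) times x^2 and compares it with the ratio df/dx for a small finite nudge.
```python
import math
def nudge_ratio(func, x, dx=1e-6):
    # Hypothetical helper: df/dx for a small but finite nudge dx
    return (func(x + dx) - func(x)) / dx
def left_d_right_right_d_left(left, d_left, right, d_right, x):
    # left(x) * right'(x): the thin bottom rectangle, per unit dx
    # right(x) * left'(x): the thin side rectangle, per unit dx
    return left(x) * d_right(x) + right(x) * d_left(x)
x = 1.2
rule = left_d_right_right_d_left(math.sin, math.cos, lambda t: t**2, lambda t: 2 * t, x)
numeric = nudge_ratio(lambda t: math.sin(t) * t**2, x)
print(rule, numeric)
```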

00:08:20.160 --> 00:08:23.847
By the way, I should mention that if you multiply by a constant,

00:08:23.846 --> 00:08:26.739
say 2 times sine of x, things end up a lot simpler.

00:08:27.399 --> 00:08:30.875
The derivative is just the same as the constant multiplied by

00:08:30.875 --> 00:08:34.519
the derivative of the function, in this case 2 times cosine of x.

00:08:35.558 --> 00:08:40.178
I'll leave it to you to pause and ponder and verify that makes sense.
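
NOTE
One way to check the constant-multiple claim numerically, as a sketch assuming Python with an arbitrary input x = 0.7 and nudge dx = 1e-6 (both illustrative). It does not replace thinking it through; it only confirms that the ratio df/dx for 2 sin(x) lands close to 2 cos(x).
```python
import math
def f(x):
    # A constant times a function: 2 * sin(x)
    return 2 * math.sin(x)
x, dx = 0.7, 1e-6
approx = (f(x + dx) - f(x)) / dx   # ratio df/dx for a small finite nudge
exact = 2 * math.cos(x)            # the constant times the derivative of sine
print(approx, exact)
```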

00:08:41.918 --> 00:08:46.533
Aside from addition and multiplication, the other common way to combine functions,

00:08:46.533 --> 00:08:49.201
and believe me, this one comes up all the time,

00:08:49.201 --> 00:08:52.259
is to shove one inside the other, function composition.

00:08:53.220 --> 00:08:56.898
For example, maybe we take the function x squared and shove it

00:08:56.898 --> 00:09:00.460
inside sine of x to get this new function, sine of the quantity x squared.

00:09:01.399 --> 00:09:04.079
What do you think the derivative of that new function is?

00:09:05.299 --> 00:09:09.146
To think this one through, I'll choose yet another way to visualize things,

00:09:09.147 --> 00:09:12.540
just to emphasize that in creative math, we've got lots of options.

00:09:13.320 --> 00:09:18.591
I'll put up three different number lines, the top one is going to hold the value of x,

00:09:18.591 --> 00:09:21.440
the second one is going to hold the value of x squared,

00:09:21.440 --> 00:09:25.500
and the third line is going to hold the value of sine of x squared.

00:09:26.460 --> 00:09:30.310
That is, the function x squared gets you from line 1 to line 2,

00:09:30.309 --> 00:09:33.500
and the function sine gets you from line 2 to line 3.

00:09:34.840 --> 00:09:39.581
As I shift around this value of x, maybe moving it up to the value 3,

00:09:39.581 --> 00:09:45.340
that second value stays pegged to whatever x squared is, in this case moving up to 9.

00:09:46.200 --> 00:09:49.355
That bottom value, being sine of x squared, is

00:09:49.355 --> 00:09:52.580
going to go to whatever sine of 9 happens to be.

00:09:54.899 --> 00:10:00.399
So, for the derivative, let's again start by nudging that x value by some little dx.

00:10:01.539 --> 00:10:04.799
I always think that it's helpful to think of x as starting

00:10:04.799 --> 00:10:07.839
at some actual concrete number, maybe 1.5 in this case.

00:10:08.759 --> 00:10:14.737
The resulting nudge to that second value, the change in x squared caused by such a dx,

00:10:14.738 --> 00:10:15.700
is d of x squared.

00:10:16.960 --> 00:10:21.062
We could expand this like we have before, as 2x times dx,

00:10:21.062 --> 00:10:25.307
which for our specific input would be 2 times 1.5 times dx,

00:10:25.307 --> 00:10:30.120
but it helps to keep things written as d of x squared, at least for now.

00:10:31.019 --> 00:10:36.384
In fact, I'm going to go one step further, give a new name to this x squared,

00:10:36.384 --> 00:10:41.200
maybe h, so instead of writing d of x squared for this nudge, we write dh.

00:10:42.620 --> 00:10:47.259
This makes it easier to think about that third value, which is now pegged at sine of h.

00:10:48.200 --> 00:10:53.680
Its change is d sine of h, the tiny change caused by the nudge dh.

00:10:55.000 --> 00:11:00.134
By the way, the fact that it's moving to the left while the dh bump is going to the right

00:11:00.134 --> 00:11:05.039
just means that this change, d sine of h, is going to be some kind of negative number.

00:11:06.139 --> 00:11:09.639
Once again, we can use our knowledge of the derivative of the sine.

00:11:10.500 --> 00:11:14.419
This d sine of h is going to be about cosine of h times dh.

00:11:15.240 --> 00:11:18.639
That's what it means for the derivative of sine to be cosine.

00:11:19.539 --> 00:11:23.771
Unfolding things, we can replace that h with x squared again,

00:11:23.772 --> 00:11:29.780
so we know that the bottom nudge will have a size of cosine of x squared times d of x squared.

00:11:31.039 --> 00:11:32.480
Let's unfold things even further.

00:11:32.840 --> 00:11:38.100
That intermediate nudge, d of x squared, is going to be about 2x times dx.

00:11:39.059 --> 00:11:41.445
It's always a good habit to remind yourself of

00:11:41.446 --> 00:11:43.680
what an expression like this actually means.

00:11:44.340 --> 00:11:48.578
In this case, where we started at x equals 1.5 up top,

00:11:48.577 --> 00:11:54.512
this whole expression is telling us that the size of the nudge on that third

00:11:54.513 --> 00:12:00.754
line is going to be about cosine of 1.5 squared times 2 times 1.5 times whatever

00:12:00.754 --> 00:12:02.220
the size of dx was.

00:12:02.720 --> 00:12:05.112
It's proportional to the size of dx, and this

00:12:05.111 --> 00:12:07.919
derivative is giving us that proportionality constant.
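
NOTE
A sketch of the three number lines at x = 1.5, assuming Python and a nudge of dx = 1e-6 (an illustrative size). It computes the actual nudge on the second line, the actual nudge on the third line, and the predicted third-line nudge cos(1.5^2) * 2 * 1.5 * dx. The third-line nudge comes out negative, matching the leftward movement described earlier.
```python
import math
x, dx = 1.5, 1e-6
dh = (x + dx)**2 - x**2                          # nudge on the second line, d(x^2)
d_out = math.sin((x + dx)**2) - math.sin(x**2)   # nudge on the third line
predicted = math.cos(x**2) * 2 * x * dx          # cos(1.5^2) * 2 * 1.5 * dx
print(dh, d_out, predicted)
```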

00:12:10.919 --> 00:12:12.559
Notice what we came out with here.

00:12:12.960 --> 00:12:15.855
We have the derivative of the outside function,

00:12:15.855 --> 00:12:19.235
and it's still taking in the unaltered inside function,

00:12:19.235 --> 00:12:23.220
and then multiplying it by the derivative of that inside function.

00:12:25.820 --> 00:12:29.220
Again, there's nothing special about sine of x or x squared.

00:12:29.740 --> 00:12:33.501
If you have any two functions, g of x and h of x,

00:12:33.501 --> 00:12:37.263
the derivative of their composition, g of h of x,

00:12:37.264 --> 00:12:43.659
is going to be the derivative of g evaluated on h, multiplied by the derivative of h.

00:12:47.139 --> 00:12:50.899
This pattern right here is what we usually call the chain rule.
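
NOTE
A sketch of the general pattern, assuming Python. The names chain_rule and nudge_ratio are hypothetical, and the pair g = exp, h = cos is an illustrative choice that does not appear in the video; it is there only to show the rule is not tied to sine and x squared.
```python
import math
def chain_rule(dg, h, dh, x):
    # Derivative of g(h(x)): g' evaluated on h(x), times h'(x)
    return dg(h(x)) * dh(x)
def nudge_ratio(func, x, dx=1e-6):
    # Hypothetical helper: df/dx for a small but finite nudge dx
    return (func(x + dx) - func(x)) / dx
x = 0.8
# g = exp (whose derivative is exp), h = cos (whose derivative is -sin)
rule = chain_rule(math.exp, math.cos, lambda t: -math.sin(t), x)
numeric = nudge_ratio(lambda t: math.exp(math.cos(t)), x)
print(rule, numeric)
```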

00:12:52.039 --> 00:12:57.679
Notice for the derivative of g, I'm writing it as dg dh instead of dg dx.

00:12:58.679 --> 00:13:02.234
On the symbolic level, this is a reminder that the thing you plug

00:13:02.235 --> 00:13:06.060
into that derivative is still going to be that intermediary function h.

00:13:07.019 --> 00:13:09.745
But more than that, it's an important reflection of what

00:13:09.745 --> 00:13:12.519
this derivative of the outer function actually represents.

00:13:13.200 --> 00:13:18.449
Remember, in our three-line setup, when we took the derivative of the sine on

00:13:18.448 --> 00:13:23.899
that bottom line, we expanded the size of that nudge, d sine, as cosine of h times dh.

00:13:24.940 --> 00:13:27.496
This was because we didn't immediately know how

00:13:27.495 --> 00:13:29.840
the size of that bottom nudge depended on x.

00:13:30.419 --> 00:13:32.599
That's kind of the whole thing we were trying to figure out.

00:13:33.259 --> 00:13:37.360
But we could take the derivative with respect to that intermediate variable, h.

00:13:38.100 --> 00:13:41.916
That is, figure out how to express the size of that nudge on the third

00:13:41.916 --> 00:13:45.680
line as some multiple of dh, the size of the nudge on the second line.

00:13:46.580 --> 00:13:50.700
It was only after that that we unfolded further by figuring out what dh was.

00:13:53.320 --> 00:13:58.727
In this chain rule expression, we're saying, look at the ratio of a tiny change in

00:13:58.726 --> 00:14:02.351
g, the final output, to a tiny change in h that caused it,

00:14:02.351 --> 00:14:04.379
h being the value we plug into g.

00:14:05.320 --> 00:14:08.680
Then multiply that by the tiny change in h, divided

00:14:08.679 --> 00:14:11.199
by the tiny change in x that caused it.

00:14:12.299 --> 00:14:15.481
So notice, those dh's cancel out, and they give us a ratio

00:14:15.481 --> 00:14:19.473
between the change in that final output and the change to the input that,

00:14:19.474 --> 00:14:22.280
through a certain chain of events, brought it about.

00:14:23.860 --> 00:14:26.980
And that cancellation of dh is not just a notational trick.

00:14:26.980 --> 00:14:30.350
That is a genuine reflection of what's going on with the

00:14:30.350 --> 00:14:33.899
tiny nudges that underpin everything we do with derivatives.
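
NOTE
A sketch of the dh cancellation with actual finite nudges, assuming Python, the input x = 1.5, and dx = 1e-4 (illustrative values). With real numbers for the nudges, (dg/dh) * (dh/dx) equals dg/dx up to floating-point rounding, because the same dh appears on top and bottom.
```python
import math
x, dx = 1.5, 1e-4
h0, h1 = x**2, (x + dx)**2
dh = h1 - h0                        # nudge on the second line
dg = math.sin(h1) - math.sin(h0)    # nudge on the third line
dg_dh = dg / dh                     # third-line nudge per unit of second-line nudge
dh_dx = dh / dx                     # second-line nudge per unit of first-line nudge
dg_dx = dg / dx                     # third-line nudge per unit of first-line nudge
print(dg_dh * dh_dx, dg_dx)
```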

00:14:36.299 --> 00:14:39.877
So those are the three basic tools to have in your belt to handle

00:14:39.878 --> 00:14:43.240
derivatives of functions that combine a lot of smaller things.

00:14:43.840 --> 00:14:47.379
You've got the sum rule, the product rule, and the chain rule.

00:14:48.399 --> 00:14:51.922
And I'll be honest with you, there is a big difference between knowing

00:14:51.922 --> 00:14:54.551
what the chain rule is and what the product rule is,

00:14:54.551 --> 00:14:58.620
and actually being fluent in applying them in even the hairiest of situations.

00:14:59.480 --> 00:15:03.100
Watching videos, any videos, about the mechanics of calculus is

00:15:03.100 --> 00:15:06.892
never going to substitute for practicing those mechanics yourself,

00:15:06.892 --> 00:15:10.400
and building up the muscles to do these computations yourself.

00:15:11.240 --> 00:15:13.600
I really wish I could offer to do that for you,

00:15:13.600 --> 00:15:17.440
but I'm afraid the ball is in your court, my friend, to seek out the practice.

00:15:18.039 --> 00:15:20.940
What I can offer, and what I hope I have offered,

00:15:20.941 --> 00:15:23.960
is to show you where these rules actually come from.

00:15:24.139 --> 00:15:27.774
To show that they're not just something to be memorized and hammered away,

00:15:27.774 --> 00:15:31.264
but they're natural patterns, things that you too could have discovered

00:15:31.264 --> 00:15:34.560
just by patiently thinking through what a derivative actually means.
