15:56
Visualizing the chain rule and product rule | Chapter 4, Essence of calculus
3Blue1Brown · May 12, 2026
Transcript
0:14
In the last videos I talked about the derivatives of simple functions,
0:18
and the goal was to have a clear picture or intuition to hold in
0:22
your mind that actually explains where these formulas come from.
0:26
But most of the functions you deal with in modeling the world involve mixing,
0:31
combining, or tweaking these simple functions in some other way,
0:35
so our next step is to understand how you take derivatives of more complicated
0:39
combinations.
0:41
Again, I don't want these to be something to memorize,
0:43
I want you to have a clear picture in mind for where each one comes from.
0:49
Now, this really boils down to three basic ways to combine functions.
0:54
You can add them together, you can multiply them,
0:56
and you can throw one inside the other, known as composing them.
1:00
Sure, you could say subtracting them, but really that's just
1:03
multiplying the second by negative one and adding them together.
1:08
Likewise, dividing functions doesn't really add anything,
1:11
because that's the same as plugging one of them into the function one over x,
1:14
and then multiplying the two together.
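To make that reduction concrete, here is a minimal Python sketch. The helper names (`add`, `multiply`, `compose`, `constant`, `reciprocal`, `square`) are illustrative choices, not anything from the video. It builds subtraction and division out of nothing but the three basic combinations.

```python
import math

def add(f, g):
    """Pointwise sum of two functions."""
    return lambda x: f(x) + g(x)

def multiply(f, g):
    """Pointwise product of two functions."""
    return lambda x: f(x) * g(x)

def compose(outer, inner):
    """Plug one function inside the other: outer(inner(x))."""
    return lambda x: outer(inner(x))

def constant(c):
    """The function that ignores its input and returns c."""
    return lambda x: c

def reciprocal(x):
    """The function one over x."""
    return 1.0 / x

def square(x):
    return x * x

def subtract(f, g):
    # f - g: multiply the second by negative one, then add.
    return add(f, multiply(constant(-1.0), g))

def divide(f, g):
    # f / g: plug g inside one over x, then multiply.
    return multiply(f, compose(reciprocal, g))

x = 0.5
print(subtract(math.sin, square)(x), math.sin(x) - x**2)  # the two values agree
print(divide(math.sin, square)(x), math.sin(x) / x**2)    # the two values agree
```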
1:17
So really, most functions you come across just involve layering
1:20
together these three different types of combinations,
1:23
though there's not really a bound on how monstrous things can become.
1:27
But as long as you know how derivatives play with just those three combination types,
1:31
you'll always be able to take it step by step and peel through
1:34
the layers for any kind of monstrous expression.
1:38
So the question is, if you know the derivative of two functions,
1:42
what is the derivative of their sum, of their product,
1:45
and of the function composition between them?
1:50
The sum rule is easiest, if somewhat tongue-twisting to say out loud.
1:54
The derivative of a sum of two functions is the sum of their derivatives.
1:59
But it's worth warming up with this example by really thinking through
2:03
what it means to take a derivative of a sum of two functions,
2:07
since the derivative patterns for products and function composition won't
2:11
be so straightforward, and they're going to require this kind of deeper thinking.
2:16
For example, let's think about this function f of x equals sine of x plus x squared.
2:22
It's a function where, for every input, you add together
2:25
the values of sine of x and x squared at that point.
2:29
For example, let's say at x equals 0.5, the height of the sine
2:34
graph is given by this vertical bar, and the height of the x
2:38
squared parabola is given by this slightly smaller vertical bar.
2:44
And their sum is the length you get by just stacking them together.
2:48
For the derivative, you want to ask what happens as you nudge that input slightly,
2:53
maybe increasing it up to 0.5 plus dx.
2:57
The difference in the value of f between those two places is what we call df.
3:04
And when you picture it like this, I think you'll agree that the total
3:08
change in the height is whatever the change to the sine graph is,
3:13
what we might call d sine of x, plus whatever the change to x squared is, dx squared.
3:22
We know that the derivative of sine is cosine, and remember what that means.
3:27
It means that this little change, d sine of x, is about cosine of x times dx.
3:33
It's proportional to the size of our initial nudge dx,
3:37
and the proportionality constant equals cosine of whatever input we started at.
3:43
Likewise, because the derivative of x squared is 2x,
3:48
the change in the height of the x squared graph is 2x times whatever dx was.
3:55
So rearranging df divided by dx, the ratio of the tiny change to
4:00
the sum function to the tiny change in x that caused it,
4:04
is indeed cosine of x plus 2x, the sum of the derivatives of its parts.
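As a rough numerical check of the stacking picture, the sketch below nudges the input at x = 0.5 and compares the ratio df/dx with cosine of x plus 2x. The nudge size `dx = 1e-6` is an arbitrary small value chosen for illustration.

```python
import math

def f(x):
    return math.sin(x) + x**2

x, dx = 0.5, 1e-6

d_sin = math.sin(x + dx) - math.sin(x)   # change in the sine bar
d_sq = (x + dx)**2 - x**2                # change in the x squared bar
df = f(x + dx) - f(x)                    # change in the stacked height

ratio = df / dx
predicted = math.cos(x) + 2 * x          # sum of the two derivatives

print(df, d_sin + d_sq)   # the total change is the sum of the two changes
print(ratio, predicted)   # these agree up to an error that shrinks with dx
```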
4:11
But like I said, things are a bit different for products,
4:15
and let's think through why in terms of tiny nudges again.
4:20
In this case, I don't think graphs are our best bet for visualizing things.
4:23
Pretty commonly in math, at a lot of levels of math really,
4:27
if you're dealing with a product of two things,
4:29
it helps to understand it as some kind of area.
4:33
In this case, maybe you try to configure some mental setup
4:36
of a box where the side lengths are sine of x and x squared.
4:39
But what would that mean?
4:42
Well, since these are functions, you might think of those sides as adjustable,
4:46
dependent on the value of x, which maybe you think of as this
4:49
number that you can just freely adjust up and down.
4:53
So getting a feel for what this means, focus on
4:56
that top side, which changes as the function sine of x.
5:01
As you change this value of x up from 0, it increases up to
5:05
a length of 1 as sine of x moves up towards its peak,
5:09
and after that it starts to decrease as sine of x comes down from 1.
5:15
And in the same way, that height there is always changing as x squared.
5:20
So f of x, defined as the product of these two functions, is the area of this box.
5:27
And for the derivative, let's think about how
5:30
a tiny change to x by dx influences that area.
5:33
What is that resulting change in area df?
5:39
Well, the nudge dx caused that width to change by some small d sine of x,
5:44
and it caused that height to change by some dx squared.
5:50
And this gives us three little snippets of new area,
5:53
a thin rectangle on the bottom whose area is its width, sine of x,
5:58
times its thin height, dx squared.
6:01
And there's this thin rectangle on the right, whose area is its height,
6:06
x squared, times its thin width, d sine of x.
6:10
And there's also this little bit in the corner, but we can ignore that.
6:14
Its area is ultimately proportional to the square of dx,
6:17
and as we've seen before, that becomes negligible as dx goes to zero.
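To see how quickly that corner stops mattering, here is a small sketch that divides the corner's area by dx for shrinking nudges. The function name `corner_over_dx` and the starting point x = 1.0 are assumptions made for this illustration.

```python
import math

def corner_over_dx(x, dx):
    """Area of the corner piece of the sine-by-x-squared box, divided by dx."""
    d_width = math.sin(x + dx) - math.sin(x)   # d sine of x
    d_height = (x + dx)**2 - x**2              # change in x squared
    return d_width * d_height / dx

x = 1.0
for dx in (1e-1, 1e-2, 1e-3):
    # Each time dx shrinks by a factor of 10, so does the corner's share.
    print(dx, corner_over_dx(x, dx))
```

The two thin rectangles, by contrast, have areas proportional to dx itself, so their contribution to df/dx survives as dx goes to zero.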
6:23
I mean, this whole setup is very similar to what I showed last video,
6:27
with the x squared diagram.
6:29
And just like then, keep in mind that I'm using somewhat beefy
6:32
changes here to draw things, just so we can actually see them.
6:36
But in principle, dx is something very very small,
6:39
and that means that dx squared and d sine of x are also very very small.
6:45
So, applying what we know about the derivative of sine and of x squared,
6:51
that tiny change, dx squared, is going to be about 2x times dx.
6:56
And that tiny change, d sine of x, well that's going to be about cosine of x times dx.
7:02
As usual, we divide out by that dx to see that the ratio we want, df divided by dx,
7:09
is sine of x times the derivative of x squared,
7:12
plus x squared times the derivative of sine.
7:17
And nothing we've done here is specific to sine or to x squared.
7:21
This same line of reasoning would work for any two functions, g and h.
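Here is a hedged sketch of that general pattern in Python. The names `product_rule` and `numeric_derivative` are hypothetical helpers, and the second test pair (x cubed times cosine) is an extra example that is not in the video. The formula is compared against a plain finite-difference estimate.

```python
import math

def product_rule(g, dg, h, dh):
    """Derivative of g(x) * h(x), given each function and its derivative."""
    return lambda x: g(x) * dh(x) + h(x) * dg(x)

def numeric_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, used only as a sanity check."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The example from the box picture: sine of x times x squared.
d_box = product_rule(math.sin, math.cos, lambda x: x**2, lambda x: 2 * x)

# A second pair, chosen here for illustration: x cubed times cosine of x.
d_other = product_rule(lambda x: x**3, lambda x: 3 * x**2,
                       math.cos, lambda x: -math.sin(x))

x = 1.0
print(d_box(x), numeric_derivative(lambda t: math.sin(t) * t**2, x))
print(d_other(x), numeric_derivative(lambda t: t**3 * math.cos(t), x))
```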
7:27
And sometimes people like to remember this pattern with
7:29
a certain mnemonic that you kind of sing in your head.
7:32
Left d right, right d left.
7:34
In this example, where we have sine of x times x squared, left d right,
7:38
means you take that left function, sine of x, times the derivative of the right,
7:43
in this case 2x.
7:45
Then you add on right d left, that right function,
7:48
x squared, times the derivative of the left one, cosine of x.
7:54
Now out of context, presented as a rule to remember,
7:57
I think this would feel pretty strange, don't you?
8:00
But when you actually think of this adjustable box,
8:03
you can see what each of those terms represents.
8:06
Left d right is the area of that little bottom rectangle,
8:10
and right d left is the area of that rectangle on the side.
8:20
By the way, I should mention that if you multiply by a constant,
8:23
say 2 times sine of x, things end up a lot simpler.
8:27
The derivative is just the same as the constant multiplied by
8:30
the derivative of the function, in this case 2 times cosine of x.
8:35
I'll leave it to you to pause and ponder and verify that makes sense.
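One possible way to verify it, sketched with a generic constant c and function g (both symbols are labels introduced here): scaling every output by c scales every tiny change by the same c. In the box picture, a side of constant length never changes, so only one thin rectangle appears.

```latex
d(c \cdot g) = c \cdot g(x + dx) - c \cdot g(x) = c \cdot dg
\quad\Longrightarrow\quad
\frac{d(c \cdot g)}{dx} = c \cdot \frac{dg}{dx},
\qquad \text{e.g. } \frac{d}{dx}\bigl(2 \sin x\bigr) = 2 \cos x .
```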
8:41
Aside from addition and multiplication, the other common way to combine functions,
8:46
and believe me, this one comes up all the time,
8:49
is to shove one inside the other, function composition.
8:53
For example, maybe we take the function x squared and shove it
8:56
inside sine of x to get this new function, sine of x squared.
9:01
What do you think the derivative of that new function is?
9:05
To think this one through, I'll choose yet another way to visualize things,
9:09
just to emphasize that in creative math, we've got lots of options.
9:13
I'll put up three different number lines, the top one is going to hold the value of x,
9:18
the second one is going to hold the x squared,
9:21
and the third line is going to hold the value of sine of x squared.
9:26
That is, the function x squared gets you from line 1 to line 2,
9:30
and the function sine gets you from line 2 to line 3.
9:34
As I shift around this value of x, maybe moving it up to the value 3,
9:39
that second value stays pegged to whatever x squared is, in this case moving up to 9.
9:46
That bottom value, being sine of x squared, is
9:49
going to go to whatever sine of 9 happens to be.
9:54
So, for the derivative, let's again start by nudging that x value by some little dx.
10:01
I always think that it's helpful to think of x as starting
10:04
at some actual concrete number, maybe 1.5 in this case.
10:08
The resulting nudge to that second value, the change in x squared caused by such a dx,
10:14
is dx squared.
10:16
We could expand this like we have before, as 2x times dx,
10:21
which for our specific input would be 2 times 1.5 times dx,
10:25
but it helps to keep things written as dx squared, at least for now.
10:31
In fact, I'm going to go one step further, give a new name to this x squared,
10:36
maybe h, so instead of writing dx squared for this nudge, we write dh.
10:42
This makes it easier to think about that third value, which is now pegged at sine of h.
10:48
Its change is d sine of h, the tiny change caused by the nudge dh.
10:55
By the way, the fact that it's moving to the left while the dh bump is going to the right
11:00
just means that this change, d sine of h, is going to be some kind of negative number.
11:06
Once again, we can use our knowledge of the derivative of the sine.
11:10
This d sine of h is going to be about cosine of h times dh.
11:15
That's what it means for the derivative of sine to be cosine.
11:19
Unfolding things, we can replace that h with x squared again,
11:23
so we know that the bottom nudge will have a size of about cosine of x squared times dx squared.
11:31
Let's unfold things even further.
11:32
That intermediate nudge dx squared is going to be about 2x times dx.
11:39
It's always a good habit to remind yourself of
11:41
what an expression like this actually means.
11:44
In this case, where we started at x equals 1.5 up top,
11:48
this whole expression is telling us that the size of the nudge on that third
11:54
line is going to be about cosine of 1.5 squared times 2 times 1.5 times whatever
12:00
the size of dx was.
12:02
It's proportional to the size of dx, and this
12:05
derivative is giving us that proportionality constant.
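The three-line picture can be followed numerically. This is a minimal sketch starting at x = 1.5, with an arbitrary small nudge `dx = 1e-6`; the variable names simply mirror the three number lines.

```python
import math

x, dx = 1.5, 1e-6

h = x**2                                  # second line: 2.25
dh = (x + dx)**2 - h                      # nudge on the second line, about 2x dx
d_sin = math.sin(h + dh) - math.sin(h)    # nudge on the third line

predicted = math.cos(x**2) * 2 * x * dx   # cosine of 1.5 squared, times 2 times 1.5, times dx

print(dh, 2 * x * dx)     # nearly equal
print(d_sin, predicted)   # nearly equal, and both negative
```

Both numbers on the last line come out negative because cosine of 2.25 is negative, which matches the third value moving left while dh moves right.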
12:10
Notice what we came out with here.
12:12
We have the derivative of the outside function,
12:15
and it's still taking in the unaltered inside function,
12:19
and then multiplying it by the derivative of that inside function.
12:25
Again, there's nothing special about sine of x or x squared.
12:29
If you have any two functions, g of x and h of x,
12:33
the derivative of their composition, g of h of x,
12:37
is going to be the derivative of g evaluated on h, multiplied by the derivative of h.
12:47
This pattern right here is what we usually call the chain rule.
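As a sketch of the general pattern, the helper below (the names `chain_rule` and `numeric_derivative` are hypothetical) takes the derivative of the outer function, the inner function, and the inner function's derivative. The second example, sine of x all squared, is added here only to show the roles of inner and outer swapped.

```python
import math

def chain_rule(dg, h, dh):
    """Derivative of g(h(x)): dg evaluated on h(x), times dh at x."""
    return lambda x: dg(h(x)) * dh(x)

def numeric_derivative(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, used only as a sanity check."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Outer sine, inner x squared: derivative is cos(x^2) * 2x.
d_sin_of_square = chain_rule(math.cos, lambda x: x**2, lambda x: 2 * x)

# Outer squaring, inner sine: derivative is 2 sin(x) * cos(x).
d_square_of_sin = chain_rule(lambda u: 2 * u, math.sin, math.cos)

x = 1.5
print(d_sin_of_square(x), numeric_derivative(lambda t: math.sin(t**2), x))
print(d_square_of_sin(x), numeric_derivative(lambda t: math.sin(t)**2, x))
```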
12:52
Notice for the derivative of g, I'm writing it as dg dh instead of dg dx.
12:58
On the symbolic level, this is a reminder that the thing you plug
13:02
into that derivative is still going to be that intermediary function h.
13:07
But more than that, it's an important reflection of what
13:09
this derivative of the outer function actually represents.
13:13
Remember, in our three line setup, when we took the derivative of the sine on
13:18
that bottom line, we expanded the size of that nudge, d sine, as cosine of h times dh.
13:24
This was because we didn't immediately know how
13:27
the size of that bottom nudge depended on x.
13:30
That's kind of the whole thing we were trying to figure out.
13:33
But we could take the derivative with respect to that intermediate variable, h.
13:38
That is, figure out how to express the size of that nudge on the third
13:41
line as some multiple of dh, the size of the nudge on the second line.
13:46
It was only after that that we unfolded further by figuring out what dh was.
13:53
In this chain rule expression, we're saying, look at the ratio between a tiny change in
13:58
g, the final output, to a tiny change in h that caused it,
14:02
h being the value we plug into g.
14:05
Then multiply that by the tiny change in h, divided
14:08
by the tiny change in x that caused it.
14:12
So notice, those dh's cancel out, and they give us a ratio
14:15
between the change in that final output and the change to the input that,
14:19
through a certain chain of events, brought it about.
14:23
And that cancellation of dh is not just a notational trick.
14:26
That is a genuine reflection of what's going on with the
14:30
tiny nudges that underpin everything we do with derivatives.
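Written out, with g as the outer function and h as the inner one, the expression being described looks roughly like this; the second equation is the sine of x squared example in the same form.

```latex
\frac{d}{dx}\, g\bigl(h(x)\bigr)
  = \frac{dg}{dh}\bigl(h(x)\bigr) \cdot \frac{dh}{dx}(x),
\qquad
\frac{d}{dx} \sin(x^2)
  = \underbrace{\cos(x^2)}_{dg/dh} \cdot \underbrace{2x}_{dh/dx} .
```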
14:36
So those are the three basic tools to have in your belt to handle
14:39
derivatives of functions that combine a lot of smaller things.
14:43
You've got the sum rule, the product rule, and the chain rule.
14:48
And I'll be honest with you, there is a big difference between knowing
14:51
what the chain rule is and what the product rule is,
14:54
and actually being fluent with applying them in even the most hairy of situations.
14:59
Watching videos, any videos, about the mechanics of calculus is
15:03
never going to substitute for practicing those mechanics yourself,
15:06
and building up the muscles to do these computations yourself.
15:11
I really wish I could offer to do that for you,
15:13
but I'm afraid the ball is in your court, my friend, to seek out the practice.
15:18
What I can offer, and what I hope I have offered,
15:20
is to show you where these rules actually come from.
15:24
To show that they're not just something to be memorized and hammered away,
15:27
but they're natural patterns, things that you too could have discovered
15:31
just by patiently thinking through what a derivative actually means.
— end of transcript —