Visualizing the chain rule and product rule | Chapter 4, Essence of calculus

3Blue1Brown · May 12, 2026

Transcript ~2554 words · 15:56

0:14

In the last videos I talked about the derivatives of simple functions,

0:18

and the goal was to have a clear picture or intuition to hold in

0:22

your mind that actually explains where these formulas come from.

0:26

But most of the functions you deal with in modeling the world involve mixing,

0:31

combining, or tweaking these simple functions in some other way,

0:35

so our next step is to understand how you take derivatives of more complicated

0:39

combinations.

0:41

Again, I don't want these to be something to memorize,

0:43

I want you to have a clear picture in mind for where each one comes from.

0:49

Now, this really boils down to three basic ways to combine functions.

0:54

You can add them together, you can multiply them,

0:56

and you can throw one inside the other, known as composing them.

1:00

Sure, you could say subtracting them, but really that's just

1:03

multiplying the second by negative one and adding them together.

1:08

Likewise, dividing functions doesn't really add anything,

1:11

because that's the same as plugging one of them into the function one over x,

1:14

and then multiplying the two together.
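
Written out, these two reductions look roughly like the following. The name r for the function one over x is an assumption introduced here for illustration; it does not appear in the video.

```latex
\[
g(x) - h(x) = g(x) + (-1)\cdot h(x)
\]
\[
\frac{g(x)}{h(x)} = g(x)\cdot r\bigl(h(x)\bigr),
\qquad \text{where } r(u) = \frac{1}{u}
\]
```

So subtraction is a product followed by a sum, and division is a composition followed by a product.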

1:17

So really, most functions you come across just involve layering

1:20

together these three different types of combinations,

1:23

though there's not really a bound on how monstrous things can become.

1:27

But as long as you know how derivatives play with just those three combination types,

1:31

you'll always be able to take it step by step and peel through

1:34

the layers for any kind of monstrous expression.

1:38

So the question is, if you know the derivative of two functions,

1:42

what is the derivative of their sum, of their product,

1:45

and of the function composition between them?

1:50

The sum rule is easiest, if somewhat tongue-twisting to say out loud.

1:54

The derivative of a sum of two functions is the sum of their derivatives.

1:59

But it's worth warming up with this example by really thinking through

2:03

what it means to take a derivative of a sum of two functions,

2:07

since the derivative patterns for products and function composition won't

2:11

be so straightforward, and they're going to require this kind of deeper thinking.

2:16

For example, let's think about this function f of x equals sine of x plus x squared.

2:22

It's a function where, for every input, you add together

2:25

the values of sine of x and x squared at that point.

2:29

For example, let's say at x equals 0.5, the height of the sine

2:34

graph is given by this vertical bar, and the height of the x

2:38

squared parabola is given by this slightly smaller vertical bar.

2:44

And their sum is the length you get by just stacking them together.

2:48

For the derivative, you want to ask what happens as you nudge that input slightly,

2:53

maybe increasing it up to 0.5 plus dx.

2:57

The difference in the value of f between those two places is what we call df.

3:04

And when you picture it like this, I think you'll agree that the total

3:08

change in the height is whatever the change to the sine graph is,

3:13

what we might call d sine of x, plus whatever the change to x squared is, d(x squared).

3:22

We know that the derivative of sine is cosine, and remember what that means.

3:27

It means that this little change, d sine of x, is about cosine of x times dx.

3:33

It's proportional to the size of our initial nudge dx,

3:37

and the proportionality constant equals cosine of whatever input we started at.

3:43

Likewise, because the derivative of x squared is 2x,

3:48

the change in the height of the x squared graph is 2x times whatever dx was.

3:55

So, rearranging, df divided by dx, the ratio of the tiny change to

4:00

the sum function to the tiny change in x that caused it,

4:04

is indeed cosine of x plus 2x, the sum of the derivatives of its parts.
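
As a rough numerical check of this, here is a minimal Python sketch. The nudge size dx = 1e-6 and the variable names are assumptions chosen for illustration; the input 0.5 is the one used above.

```python
import math


def f(x):
    # The sum function from the example: sin(x) + x^2
    return math.sin(x) + x**2


x, dx = 0.5, 1e-6

d_sin = math.sin(x + dx) - math.sin(x)  # change in the sine graph
d_sq = (x + dx) ** 2 - x**2             # change in the x^2 graph
df = f(x + dx) - f(x)                   # change in the stacked height

# The total change is the sum of the two separate changes.
print(df, d_sin + d_sq)

# Dividing by dx should land close to cos(x) + 2x.
print(df / dx, math.cos(x) + 2 * x)
```

The two numbers on each printed line should agree closely, with the match on the second line improving as dx shrinks.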

4:11

But like I said, things are a bit different for products,

4:15

and let's think through why in terms of tiny nudges again.

4:20

In this case, I don't think graphs are our best bet for visualizing things.

4:23

Pretty commonly in math, at a lot of levels of math really,

4:27

if you're dealing with a product of two things,

4:29

it helps to understand it as some kind of area.

4:33

In this case, maybe you try to configure some mental setup

4:36

of a box where the side lengths are sine of x and x squared.

4:39

But what would that mean?

4:42

Well, since these are functions, you might think of those sides as adjustable,

4:46

dependent on the value of x, which maybe you think of as this

4:49

number that you can just freely adjust up and down.

4:53

So to get a feel for what this means, focus on

4:56

that top side, which changes as the function sine of x.

5:01

As you change this value of x up from 0, it increases up to

5:05

a length of 1 as sine of x moves up towards its peak,

5:09

and after that it starts to decrease as sine of x comes down from 1.

5:15

And in the same way, that height there is always changing as x squared.

5:20

So f of x, defined as the product of these two functions, is the area of this box.

5:27

And for the derivative, let's think about how

5:30

a tiny change to x by dx influences that area.

5:33

What is that resulting change in area df?

5:39

Well, the nudge dx caused that width to change by some small d sine of x,

5:44

and it caused that height to change by some d(x squared).

5:50

And this gives us three little snippets of new area,

5:53

a thin rectangle on the bottom whose area is its width, sine of x,

5:58

times its thin height, d(x squared).

6:01

And there's this thin rectangle on the right, whose area is its height,

6:06

x squared, times its thin width, d sine of x.

6:10

And there's also this little bit in the corner, but we can ignore that.

6:14

Its area is ultimately proportional to (dx) squared, the square of the nudge itself,

6:17

and as we've seen before, that becomes negligible as dx goes to zero.
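
To see how small that corner really is, here is a small Python sketch of the box. The function name box_pieces, the starting input 0.5, and the particular dx values are assumptions made for illustration.

```python
import math


def box_pieces(x, dx):
    """Areas of the three new snippets when x is nudged by dx."""
    width, height = math.sin(x), x**2
    d_width = math.sin(x + dx) - width   # change in the top side
    d_height = (x + dx) ** 2 - height    # change in the height
    bottom = width * d_height            # thin rectangle on the bottom
    right = height * d_width             # thin rectangle on the right
    corner = d_width * d_height          # little bit in the corner
    return bottom, right, corner


x = 0.5
for dx in (1e-1, 1e-2, 1e-3):
    bottom, right, corner = box_pieces(x, dx)
    # Compare each contribution to the size of the nudge that caused it.
    print(dx, (bottom + right) / dx, corner / dx)
```

As dx shrinks, the last column heads toward zero while the middle column settles near a fixed value, which is the sense in which the corner can be ignored.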

6:23

I mean, this whole setup is very similar to what I showed last video,

6:27

with the x squared diagram.

6:29

And just like then, keep in mind that I'm using somewhat beefy

6:32

changes here to draw things, just so we can actually see them.

6:36

But in principle, dx is something very very small,

6:39

and that means that d(x squared) and d sine of x are also very very small.

6:45

So, applying what we know about the derivative of sine and of x squared,

6:51

that tiny change, d(x squared), is going to be about 2x times dx.

6:56

And that tiny change, d sine of x, well that's going to be about cosine of x times dx.

7:02

As usual, we divide out by that dx to see that the ratio we want, df divided by dx,

7:09

is sine of x times the derivative of x squared,

7:12

plus x squared times the derivative of sine.

7:17

And nothing we've done here is specific to sine or to x squared.

7:21

This same line of reasoning would work for any two functions, g and h.
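
For two general functions, the box argument can be summarized as follows. This is a sketch of the reasoning above in symbols, with the corner term written out before it is dropped.

```latex
\[
d(g \cdot h) = g\,dh + h\,dg + dg\,dh
\]
\[
\frac{d}{dx}\bigl(g(x)\,h(x)\bigr) = g(x)\,\frac{dh}{dx} + h(x)\,\frac{dg}{dx}
\]
```

The term dg dh is the corner of the box. It is proportional to the square of dx, so it vanishes from the ratio as dx goes to zero.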

7:27

And sometimes people like to remember this pattern with

7:29

a certain mnemonic that you kind of sing in your head.

7:32

Left d right, right d left.

7:34

In this example, where we have sine of x times x squared, left d right,

7:38

means you take that left function, sine of x, times the derivative of the right,

7:43

in this case 2x.

7:45

Then you add on right d left, that right function,

7:48

x squared, times the derivative of the left one, cosine of x.

7:54

Now out of context, presented as a rule to remember,

7:57

I think this would feel pretty strange, don't you?

8:00

But when you actually think of this adjustable box,

8:03

you can see what each of those terms represents.

8:06

Left d right is the area of that little bottom rectangle,

8:10

and right d left is the area of that rectangle on the side.

8:20

By the way, I should mention that if you multiply by a constant,

8:23

say 2 times sine of x, things end up a lot simpler.

8:27

The derivative is just the same as the constant multiplied by

8:30

the derivative of the function, in this case 2 times cosine of x.

8:35

I'll leave it to you to pause and ponder and verify that makes sense.
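
One way to check this, offered as a sketch rather than the only route: treat the constant as a side of the box that never changes. The letter c for the constant is an assumption introduced here.

```latex
\[
d(c \cdot g) = c\,g(x + dx) - c\,g(x) = c\,dg
\quad\Longrightarrow\quad
\frac{d}{dx}\bigl(c\,g(x)\bigr) = c\,\frac{dg}{dx}
\]
```

Since the side of length c has no nudge of its own, only one thin rectangle of new area appears, and there is no corner piece at all.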

8:41

Aside from addition and multiplication, the other common way to combine functions,

8:46

and believe me, this one comes up all the time,

8:49

is to shove one inside the other, function composition.

8:53

For example, maybe we take the function x squared and shove it

8:56

inside sine of x to get this new function, sine of x squared.

9:01

What do you think the derivative of that new function is?

9:05

To think this one through, I'll choose yet another way to visualize things,

9:09

just to emphasize that in creative math, we've got lots of options.

9:13

I'll put up three different number lines, the top one is going to hold the value of x,

9:18

the second one is going to hold the value of x squared,

9:21

and the third line is going to hold the value of sine of x squared.

9:26

That is, the function x squared gets you from line 1 to line 2,

9:30

and the function sine gets you from line 2 to line 3.

9:34

As I shift around this value of x, maybe moving it up to the value 3,

9:39

that second value stays pegged to whatever x squared is, in this case moving up to 9.

9:46

That bottom value, being sine of x squared, is

9:49

going to go to whatever sine of 9 happens to be.

9:54

So, for the derivative, let's again start by nudging that x value by some little dx.

10:01

I always think that it's helpful to think of x as starting

10:04

at some actual concrete number, maybe 1.5 in this case.

10:08

The resulting nudge to that second value, the change in x squared caused by such a dx,

10:14

is d(x squared).

10:16

We could expand this like we have before, as 2x times dx,

10:21

which for our specific input would be 2 times 1.5 times dx,

10:25

but it helps to keep things written as d(x squared), at least for now.

10:31

In fact, I'm going to go one step further, give a new name to this x squared,

10:36

maybe h, so instead of writing d(x squared) for this nudge, we write dh.

10:42

This makes it easier to think about that third value, which is now pegged at sine of h.

10:48

Its change is d sine of h, the tiny change caused by the nudge dh.

10:55

By the way, the fact that it's moving to the left while the dh bump is going to the right

11:00

just means that this change, d sine of h, is going to be some kind of negative number.

11:06

Once again, we can use our knowledge of the derivative of the sine.

11:10

This d sine of h is going to be about cosine of h times dh.

11:15

That's what it means for the derivative of sine to be cosine.

11:19

Unfolding things, we can replace that h with x squared again,

11:23

so we know that the bottom nudge will have a size of cosine of x squared times d(x squared).

11:31

Let's unfold things even further.

11:32

That intermediate nudge d(x squared) is going to be about 2x times dx.

11:39

It's always a good habit to remind yourself of

11:41

what an expression like this actually means.

11:44

In this case, where we started at x equals 1.5 up top,

11:48

this whole expression is telling us that the size of the nudge on that third

11:54

line is going to be about cosine of 1.5 squared times 2 times 1.5 times whatever

12:00

the size of dx was.

12:02

It's proportional to the size of dx, and this

12:05

derivative is giving us that proportionality constant.
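
The three number lines can be followed numerically with a short Python sketch. The nudge size dx = 1e-6 and the variable names are assumptions for illustration; the starting value 1.5 is the one used above.

```python
import math

x, dx = 1.5, 1e-6

h = x**2                                # value on the second line
dh = (x + dx) ** 2 - h                  # nudge on the second line
d_sin = math.sin(h + dh) - math.sin(h)  # nudge on the third line

# Second line: the nudge should be about 2x times dx.
print(dh, 2 * x * dx)

# Third line: about cos(h) times dh, which unfolds to cos(x^2) * 2x * dx.
predicted = math.cos(x**2) * 2 * x * dx
print(d_sin, math.cos(h) * dh, predicted)
```

At this input the third-line nudge comes out negative, because cosine of 1.5 squared is negative, so the bottom value moves left while the second-line bump moves right.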

12:10

Notice what we came out with here.

12:12

We have the derivative of the outside function,

12:15

and it's still taking in the unaltered inside function,

12:19

and then multiplying it by the derivative of that inside function.

12:25

Again, there's nothing special about sine of x or x squared.

12:29

If you have any two functions, g of x and h of x,

12:33

the derivative of their composition, g of h of x,

12:37

is going to be the derivative of g evaluated on h, multiplied by the derivative of h.

12:47

This pattern right here is what we usually call the chain rule.
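
In symbols, with the sine of x squared example alongside it, the pattern reads roughly as:

```latex
\[
\frac{d}{dx}\,g\bigl(h(x)\bigr) = \frac{dg}{dh}\bigl(h(x)\bigr)\cdot\frac{dh}{dx}(x)
\]
\[
\frac{d}{dx}\,\sin\bigl(x^{2}\bigr) = \cos\bigl(x^{2}\bigr)\cdot 2x
\]
```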

12:52

Notice for the derivative of g, I'm writing it as dg dh instead of dg dx.

12:58

On the symbolic level, this is a reminder that the thing you plug

13:02

into that derivative is still going to be that intermediary function h.

13:07

But more than that, it's an important reflection of what

13:09

this derivative of the outer function actually represents.

13:13

Remember, in our three line setup, when we took the derivative of the sine on

13:18

that bottom line, we expanded the size of that nudge, d sine, as cosine of h times dh.

13:24

This was because we didn't immediately know how

13:27

the size of that bottom nudge depended on x.

13:30

That's kind of the whole thing we were trying to figure out.

13:33

But we could take the derivative with respect to that intermediate variable, h.

13:38

That is, figure out how to express the size of that nudge on the third

13:41

line as some multiple of dh, the size of the nudge on the second line.

13:46

It was only after that that we unfolded further by figuring out what dh was.

13:53

In this chain rule expression, we're saying, look at the ratio between a tiny change in

13:58

g, the final output, to a tiny change in h that caused it,

14:02

h being the value we plug into g.

14:05

Then multiply that by the tiny change in h, divided

14:08

by the tiny change in x that caused it.

14:12

So notice, those dh's cancel out, and they give us a ratio

14:15

between the change in that final output and the change to the input that,

14:19

through a certain chain of events, brought it about.

14:23

And that cancellation of dh is not just a notational trick.

14:26

That is a genuine reflection of what's going on with the

14:30

tiny nudges that underpin everything we do with derivatives.

14:36

So those are the three basic tools to have in your belt to handle

14:39

derivatives of functions that combine a lot of smaller things.

14:43

You've got the sum rule, the product rule, and the chain rule.
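
To illustrate how the three rules let you peel through layers, here is a minimal Python sketch. Everything in it (the Fn class, the helpers add, mul, compose, and numeric_slope, and the layered example) is a hypothetical construction for illustration, not something from the video.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fn:
    """A function paired with its derivative, both as plain callables."""
    value: Callable[[float], float]
    slope: Callable[[float], float]


def add(g: Fn, h: Fn) -> Fn:
    # Sum rule: the derivative of a sum is the sum of the derivatives.
    return Fn(lambda x: g.value(x) + h.value(x),
              lambda x: g.slope(x) + h.slope(x))


def mul(g: Fn, h: Fn) -> Fn:
    # Product rule: left d right, plus right d left.
    return Fn(lambda x: g.value(x) * h.value(x),
              lambda x: g.value(x) * h.slope(x) + h.value(x) * g.slope(x))


def compose(g: Fn, h: Fn) -> Fn:
    # Chain rule: derivative of the outside, evaluated on the inside,
    # times the derivative of the inside.
    return Fn(lambda x: g.value(h.value(x)),
              lambda x: g.slope(h.value(x)) * h.slope(x))


sine = Fn(math.sin, math.cos)
square = Fn(lambda x: x * x, lambda x: 2 * x)

# A layered example chosen for illustration: sin(x^2) * x^2 + sin(x)
layered = add(mul(compose(sine, square), square), sine)


def numeric_slope(f: Callable[[float], float], x: float, dx: float = 1e-6) -> float:
    # Symmetric difference quotient, used only as an independent check.
    return (f(x + dx) - f(x - dx)) / (2 * dx)


x = 1.5
print(layered.slope(x), numeric_slope(layered.value, x))
```

The two printed numbers should agree closely. Each helper applies one rule to one layer, so the derivative of the whole expression is assembled step by step.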

14:48

And I'll be honest with you, there is a big difference between knowing

14:51

what the chain rule is and what the product rule is,

14:54

and actually being fluent with applying them in even the most hairy of situations.

14:59

Watching videos, any videos, about the mechanics of calculus is

15:03

never going to substitute for practicing those mechanics yourself,

15:06

and building up the muscles to do these computations yourself.

15:11

I really wish I could offer to do that for you,

15:13

but I'm afraid the ball is in your court, my friend, to seek out the practice.

15:18

What I can offer, and what I hope I have offered,

15:20

is to show you where these rules actually come from.

15:24

To show that they're not just something to be memorized and hammered away,

15:27

but they're natural patterns, things that you too could have discovered

15:31

just by patiently thinking through what a derivative actually means.

— end of transcript —
