Transcript
When I first learned about Taylor series, I definitely didn't appreciate just how important they are. But time and time again they come up in math, physics, and many fields of engineering, because they're one of the most powerful tools that math has to offer for approximating functions.

I think one of the first times this clicked for me as a student was not in a calculus class but a physics class. We were studying a certain problem that had to do with the potential energy of a pendulum, and for that you need an expression for how high the weight of the pendulum is above its lowest point. When you work that out, it comes out to be proportional to 1 minus the cosine of the angle between the pendulum and the vertical. The specifics of the problem we were trying to solve are beside the point here, but what I'll say is that this cosine function made the problem awkward and unwieldy, and made it less clear how pendulums relate to other oscillating phenomena. But if you approximate cos(θ) as 1 − θ²/2, everything falls into place much more easily.

If you've never seen anything like this before, an approximation like that might seem completely out of left field. If you graph cos(θ) along with the function 1 − θ²/2, they do seem rather close to each other, at least for small angles near 0. But how would you even think to make this approximation, and how would you find that particular quadratic?
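To see just how close they are, here's a minimal Python sketch that tabulates both functions for a few small angles (the helper name is an illustrative choice, not anything standard):

```python
import math

def quadratic_approx(theta):
    # The small-angle approximation for cos(theta): 1 - theta^2 / 2
    return 1 - theta**2 / 2

# Compare cos(theta) against the quadratic for a few angles (in radians)
for theta in [0.0, 0.1, 0.25, 0.5, 1.0]:
    exact = math.cos(theta)
    approx = quadratic_approx(theta)
    print(f"theta={theta:4.2f}  cos={exact:.6f}  quad={approx:.6f}  error={abs(exact - approx):.2e}")
```

The error stays tiny near 0 and only becomes noticeable as the angle grows.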
The study of Taylor series is largely about taking non-polynomial functions and finding polynomials that approximate them near some input. The motive here is that polynomials tend to be much easier to deal with than other functions: they're easier to compute, easier to take derivatives of, easier to integrate, just all-around more friendly.
So let's take a look at that function, cos(x), and really take a moment to think about how you might construct a quadratic approximation near x = 0. That is, among all of the possible polynomials that look like c0 + c1·x + c2·x², for some choice of the constants c0, c1, and c2, find the one that most resembles cos(x) near x = 0, whose graph kind of spoons with the graph of cos(x) at that point.

Well, first of all, at the input 0 the value of cos(x) is 1, so if our approximation is going to be any good at all, it should also equal 1 at the input x = 0. Plugging in 0 just results in whatever c0 is, so we can set that equal to 1. This leaves us free to choose the constants c1 and c2 to make this approximation as good as we can, but nothing we do with them will change the fact that the polynomial equals 1 at x = 0.

It would also be good if our approximation had the same tangent slope as cos(x) at this point of interest; otherwise the approximation drifts away from the cosine graph much faster than it needs to. The derivative of cosine is negative sine, and at x = 0 that equals 0, meaning the tangent line is perfectly flat. On the other hand, when you work out the derivative of our quadratic, you get c1 + 2·c2·x. At x = 0, this just equals whatever we choose for c1. So this constant c1 has complete control over the derivative of our approximation around x = 0. Setting it equal to 0 ensures that our approximation also has a flat tangent line at this point. This leaves us free to change c2, but the value and the slope of our polynomial at x = 0 are locked in place to match those of cosine.

The final thing to take advantage of is the fact that the cosine graph curves downward above x = 0; it has a negative second derivative. In other words, even though the rate of change is 0 at that point, the rate of change itself is decreasing around that point. Specifically, since its derivative is −sin(x), its second derivative is −cos(x), and at x = 0 that equals −1.

Now, in the same way that we wanted the derivative of our approximation to match that of cosine, so that their values wouldn't drift apart needlessly quickly, making sure that their second derivatives match will ensure that they curve at the same rate, that the slope of our polynomial doesn't drift away from the slope of cos(x) any more quickly than it needs to. Pulling up the same derivative we had before, and then taking its derivative, we see that the second derivative of this polynomial is exactly 2·c2. So to make sure that this second derivative also equals −1 at x = 0, 2·c2 has to be −1, meaning c2 itself should be negative one half. This gives us the approximation 1 + 0x − (1/2)x².
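Collecting those three conditions in symbols, with p(x) = c0 + c1·x + c2·x²:

```latex
p(0) = c_0 = \cos(0) = 1, \qquad
p'(0) = c_1 = -\sin(0) = 0, \qquad
p''(0) = 2c_2 = -\cos(0) = -1
\quad\Longrightarrow\quad
p(x) = 1 - \tfrac{1}{2}x^2
```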
To get a feel for how good it is, if you estimate cos(0.1) using this polynomial, you'd estimate it to be 0.995, while the true value of cos(0.1) is about 0.9950042. It's a really good approximation!
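As a quick sanity check in Python:

```python
import math

x = 0.1
approx = 1 - x**2 / 2   # the quadratic Taylor polynomial at 0
exact = math.cos(x)
print(approx)                # 0.995
print(exact)                 # 0.9950041652780258
print(abs(exact - approx))   # roughly 4.2e-06
```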
Take a moment to reflect on what just happened. You had three degrees of freedom with this quadratic approximation: the constants c0, c1, and c2. c0 was responsible for making sure that the output of the approximation matches that of cos(x) at x = 0, c1 was in charge of making sure that the derivatives match at that point, and c2 was responsible for making sure that the second derivatives match up. This ensures that the way your approximation changes as you move away from x = 0, and the way that the rate of change itself changes, is as similar as possible to the behaviour of cos(x), given the amount of control you have.
You could give yourself more control by allowing more terms in your polynomial and matching higher-order derivatives. For example, let's say you added on the term c3·x³ for some constant c3. In that case, if you take the third derivative of a cubic polynomial, anything quadratic or smaller goes to 0. As for that last term, after three iterations of the power rule it looks like 1·2·3·c3. On the other hand, the third derivative of cos(x) comes out to sin(x), which equals 0 at x = 0. So to make sure that the third derivatives match, the constant c3 should be 0. Or in other words, not only is 1 − (1/2)x² the best possible quadratic approximation of cosine, it's also the best possible cubic approximation.
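In symbols, with p(x) = 1 − (1/2)x² + c3·x³:

```latex
p'''(x) = 1 \cdot 2 \cdot 3 \cdot c_3 = 6c_3,
\qquad
\frac{d^3}{dx^3}\cos(x) = \sin(x), \quad \sin(0) = 0
\quad\Longrightarrow\quad c_3 = 0
```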
You can make an improvement by adding on a fourth-order term, c4·x⁴. The fourth derivative of cosine is itself, which equals 1 at x = 0. And what's the fourth derivative of our polynomial with this new term? Well, when you keep applying the power rule over and over, with those exponents all hopping down in front, you end up with 1·2·3·4·c4, which is 24·c4. So if we want this to match the fourth derivative of cos(x), which is 1, c4 has to be 1/24. And indeed, the polynomial 1 − (1/2)x² + (1/24)x⁴, which looks like this, is a very close approximation for cos(x) around x = 0. In any physics problem involving the cosine of a small angle, for example, predictions would be almost unnoticeably different if you substituted this polynomial for cos(x).
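Here's a small numerical sketch of that comparison (the sample angles are arbitrary choices):

```python
import math

def taylor4(x):
    # Fourth-order Taylor polynomial for cos(x) around x = 0
    return 1 - x**2 / 2 + x**4 / 24

for x in [0.1, 0.5, 1.0, 1.5]:
    print(f"x={x:3.1f}  cos={math.cos(x):.6f}  taylor4={taylor4(x):.6f}")
```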
Take a step back and notice a few things happening with this process. First of all, factorial terms come up very naturally. When you take n successive derivatives of the function xⁿ, letting the power rule keep cascading on down, what you'll be left with is 1·2·3, on and on up to whatever n is. So you don't simply set the coefficients of the polynomial equal to whatever derivative you want; you have to divide by the appropriate factorial to cancel out this effect. For example, that x⁴ coefficient was the fourth derivative of cosine, which is 1, divided by 4 factorial, 24, giving 1/24.
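Written out, n successive applications of the power rule leave exactly n factorial behind, which is what dividing by the factorial cancels:

```latex
\frac{d^n}{dx^n}\, x^n = n \cdot (n-1) \cdots 2 \cdot 1 = n!
\quad\Longrightarrow\quad
c_n = \frac{f^{(n)}(0)}{n!}
```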
The second thing to notice is that adding on new terms, like this c4·x⁴, doesn't mess up what the old terms should be, and that's really important. For example, the second derivative of this polynomial at x = 0 is still equal to 2 times the second coefficient, even after you introduce higher-order terms. And it's because we're plugging in x = 0, so the second derivative of any higher-order term, which all include an x, will just wash away. The same goes for any other derivative, which is why each derivative of a polynomial at x = 0 is controlled by one and only one of the coefficients.
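For instance, the quartic term's second derivative still carries an x, so it vanishes at 0:

```latex
\frac{d^2}{dx^2}\left(c_4 x^4\right) = 12\, c_4 x^2,
\qquad
12\, c_4 x^2 \,\Big|_{x=0} = 0
```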
If instead you were approximating near an input other than 0, like x = π, in order to get the same effect you would have to write your polynomial in terms of powers of x − π, or whatever input you're looking at. This makes it look noticeably more complicated, but all we're doing is making sure that the point π looks and behaves like 0, so that plugging in x = π will result in a lot of nice cancellation that leaves only one constant.
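For example, centered at x = π, the polynomial takes the form

```latex
p(x) = c_0 + c_1 (x - \pi) + c_2 (x - \pi)^2 + \cdots
```

so plugging in x = π wipes out every term except c0.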
And finally, on a more philosophical level, notice how what we're doing here is basically taking information about the higher-order derivatives of a function at a single point and translating it into information about the value of the function near that point.

You can take as many derivatives of cosine as you want. It follows this nice cyclic pattern: cos(x), −sin(x), −cos(x), sin(x), and then repeat. And the value of each one of these is easy to compute at x = 0; it gives the cyclic pattern 1, 0, −1, 0, and then repeat. Knowing the values of all those higher-order derivatives is a lot of information about cos(x), even though it only involves plugging in a single number, x = 0.

So what we're doing is leveraging that information to get an approximation around this input, and you do it by creating a polynomial whose higher-order derivatives are designed to match up with those of cosine, following this same 1, 0, −1, 0 cyclic pattern. To do that, you just make each coefficient of the polynomial follow that same pattern, but divide each one by the appropriate factorial. Like I mentioned before, this is what cancels out the cascading effect of many power rule applications. The polynomials you get by stopping this process at any point are called Taylor polynomials for cos(x).
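Following that 1, 0, −1, 0 pattern of derivatives and dividing by the factorials, those polynomials look like:

```latex
\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots
```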
More generally, and hence more abstractly, if we were dealing with some function other than cosine, you would compute its derivative, its second derivative, and so on, getting as many terms as you'd like, and you would evaluate each of them at x = 0. Then, for the polynomial approximation, the coefficient of each xⁿ term should be the value of the n-th derivative of the function evaluated at 0, divided by n factorial.
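Written out, that polynomial looks like this:

```latex
f(x) \approx f(0) + \frac{f'(0)}{1!}\,x + \frac{f''(0)}{2!}\,x^2 + \frac{f'''(0)}{3!}\,x^3 + \cdots
```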
This whole rather abstract formula is something you'll likely see in any text or course that touches on Taylor polynomials. And when you see it, think to yourself that the constant term ensures that the value of the polynomial matches the value of f, the next term ensures that the slope of the polynomial matches the slope of the function at x = 0, the next term ensures that the rate at which the slope changes is the same at that point, and so on, depending on how many terms you want. The more terms you choose, the closer the approximation, but the tradeoff is that the polynomial you get is more complicated.
And to make things even more general, if you wanted to approximate near some input other than 0, which we'll call a, you would write this polynomial in terms of powers of x − a, and you would evaluate all the derivatives of f at that input, a. This is what Taylor polynomials look like in their fullest generality.
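In symbols:

```latex
f(x) \approx f(a) + \frac{f'(a)}{1!}\,(x-a) + \frac{f''(a)}{2!}\,(x-a)^2 + \frac{f'''(a)}{3!}\,(x-a)^3 + \cdots
```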
Changing the value of a changes where this approximation is hugging the original function, where its higher-order derivatives will be equal to those of the original function.
One of the simplest meaningful examples of this is the function eˣ around the input x = 0. Computing the derivatives is super nice, as nice as it gets, because the derivative of eˣ is itself, so the second derivative is also eˣ, as is the third, and so on. At the point x = 0, all of these are equal to 1. And what that means is our polynomial approximation should look like 1 + 1·x + (1/2)x² + (1/3!)x³, and so on, depending on how many terms you want. These are the Taylor polynomials for eˣ.
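Written compactly, the n-th order Taylor polynomial for eˣ is:

```latex
e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}
```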
OK, so with that as a foundation, in the spirit of showing you just how connected all the topics of calculus are, let me turn to something kind of fun: a completely different way to understand this second-order term of the Taylor polynomials, but geometrically. It's related to the fundamental theorem of calculus, which I talked about in chapters 1 and 8 if you need a quick refresher.

Like we did in those videos, consider a function that gives the area under some graph between a fixed left point and a variable right point. What we're going to do here is think about how to approximate this area function, not the function for the graph itself, like we've been doing before. Focusing on that area is what's going to make the second-order term pop out.

Remember, the fundamental theorem of calculus says that this graph itself represents the derivative of the area function, and it's because a slight nudge dx to the right bound of the area gives a new bit of area approximately equal to the height of the graph times dx, an approximation that becomes increasingly accurate for smaller and smaller choices of dx. But if you wanted to be more accurate about this change in area, given some change in x that isn't meant to approach 0, you would have to take into account this portion right here, which is approximately a triangle.

Let's name the starting input a, and the nudged input above it x, so that the change is x − a. The base of that little triangle is that change, x − a, and its height is the slope of the graph times x − a. Since this graph is the derivative of the area function, its slope is the second derivative of the area function, evaluated at the input a. So the area of this triangle, one half base times height, is (1/2) times the second derivative of the area function, evaluated at a, multiplied by (x − a)².

And this is exactly what you would see with a Taylor polynomial. If you knew the various derivative information about this area function at the point a, how would you approximate the area at the point x? Well, you have to include all that area up to a, f(a), plus the area of the rectangle here, which is the first derivative times (x − a), plus the area of that little triangle, which is (1/2) times the second derivative times (x − a)². I really like this, because even though it looks a bit messy all written out, each one of the terms has a very clear meaning that you can just point to on the diagram.
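All together, writing f for the area function, the three terms line up with the three pieces of the picture (the labels just name the regions described above):

```latex
f(x) \;\approx\; \underbrace{f(a)}_{\text{area up to } a}
\;+\; \underbrace{f'(a)\,(x-a)}_{\text{rectangle}}
\;+\; \underbrace{\tfrac{1}{2} f''(a)\,(x-a)^2}_{\text{triangle}}
```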
If you wanted, we could call it an end here, and you would have a phenomenally useful tool for approximation, these Taylor polynomials. But if you're thinking like a mathematician, one question you might ask is whether it makes sense to never stop and just add infinitely many terms.

In math, an infinite sum is called a series, so even though one of these approximations with finitely many terms is called a Taylor polynomial, adding all infinitely many terms gives what's called a Taylor series. You have to be really careful with the idea of an infinite series, because it doesn't actually make sense to add infinitely many things; you can only hit the plus button on the calculator so many times. But if you have a series where adding more and more of the terms, which makes sense at each step, gets you increasingly close to some specific value, what you say is that the series converges to that value. Or, if you're comfortable extending the definition of equality to include this kind of series convergence, you'd say that the series as a whole, this infinite sum, equals the value it's converging to.

For example, look at the Taylor polynomial for eˣ and plug in some input, like x = 1. As you add more and more polynomial terms, the total sum gets closer and closer to the value e, so you say that this infinite series converges to the number e, or, what's saying the same thing, that it equals the number e.
In fact, it turns out that if you plug in any other value of x, like x = 2, and look at the values of higher and higher order Taylor polynomials at this value, they will converge towards eˣ, which is e². This is true for any input, no matter how far away from 0 it is, even though these Taylor polynomials are constructed only from derivative information gathered at the input 0. In a case like this, we say that eˣ equals its own Taylor series at all inputs x, which is kind of a magical thing to have happen.
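Here's a minimal Python sketch of that convergence (the term counts are arbitrary choices):

```python
import math

def taylor_exp(x, n_terms):
    # Partial sum of the Taylor series for e^x at 0: sum of x^n / n!
    return sum(x**n / math.factorial(n) for n in range(n_terms))

for n in [2, 4, 8, 16]:
    print(f"{n:2d} terms: x=1 -> {taylor_exp(1, n):.10f}   x=2 -> {taylor_exp(2, n):.10f}")

print(f"    exact: e = {math.e:.10f},  e^2 = {math.e**2:.10f}")
```

With 16 terms the partial sums already agree with e and e² to at least 8 decimal places.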
Even though this is also true for a couple of other important functions, like sine and cosine, sometimes these series only converge within a certain range around the input whose derivative information you're using. If you work out the Taylor series for the natural log of x around the input x = 1, which is built by evaluating the higher-order derivatives of ln(x) at x = 1, this is what it would look like. When you plug in an input between 0 and 2, adding more and more terms of this series will indeed get you closer and closer to the natural log of that input. But outside of that range, even by just a little bit, the series fails to approach anything. As you add on more and more terms, the sum bounces back and forth wildly; it does not, as you might expect, approach the natural log of that value, even though ln(x) is perfectly well defined for inputs above 2. In some sense, the derivative information of ln(x) at x = 1 doesn't propagate out that far.

In a case like this, where adding more terms of the series doesn't approach anything, you say that the series diverges. And that maximum distance between the input you're approximating near and the points where the outputs of these polynomials actually converge is called the radius of convergence for the Taylor series.
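To see both behaviors numerically: the series in question is the standard expansion ln(x) = (x − 1) − (x − 1)²/2 + (x − 1)³/3 − ⋯, and here's a small Python sketch of its partial sums, with 1.5 and 2.5 as sample inputs on either side of that radius:

```python
import math

def ln_taylor(x, n_terms):
    # Partial sum of the Taylor series for ln(x) around x = 1:
    # sum over n >= 1 of (-1)^(n+1) * (x - 1)^n / n
    return sum((-1)**(n + 1) * (x - 1)**n / n for n in range(1, n_terms + 1))

for n in [5, 10, 20, 40]:
    inside = ln_taylor(1.5, n)    # within the radius of convergence
    outside = ln_taylor(2.5, n)   # outside it, the sums blow up
    print(f"{n:2d} terms: ln(1.5) ~ {inside:.6f}   'ln(2.5)' ~ {outside:.2f}")

print(f"    exact: ln(1.5) = {math.log(1.5):.6f},  ln(2.5) = {math.log(2.5):.6f}")
```

The first column settles toward ln(1.5) ≈ 0.405465, while the second swings with ever larger magnitude instead of approaching ln(2.5).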
There remains more to learn about Taylor series. There are many use cases, tactics for placing bounds on the error of these approximations, tests for understanding when series do and don't converge, and for that matter, there remains more to learn about calculus as a whole, and the countless topics not touched by this series. The goal with these videos is to give you the fundamental intuitions that make you feel confident and efficient in learning more on your own, and potentially even rediscovering more of the topic for yourself. In the case of Taylor series, the fundamental intuition to keep in mind as you explore more of what there is, is that they translate derivative information at a single point into approximation information around that point.

Thank you once again to everybody who supported this series. The next series like it will be on probability, and if you want early access as those videos are made, you know where to go.

Thank you.
— end of transcript —