[00:14] When I first learned about Taylor series, I definitely [00:17] didn't appreciate just how important they are. [00:20] But time and time again they come up in math, physics, [00:22] and many fields of engineering because they're one of the most [00:25] powerful tools that math has to offer for approximating functions. [00:30] I think one of the first times this clicked for me as a [00:32] student was not in a calculus class but a physics class. [00:35] We were studying a certain problem that had to do with the potential energy of a [00:40] pendulum, and for that you need an expression for how high the weight of the [00:44] pendulum is above its lowest point, and when you work that out it comes out to be [00:48] proportional to 1 minus the cosine of the angle between the pendulum and the vertical. [00:53] The specifics of the problem we were trying to solve are beside the point here, [00:57] but what I'll say is that this cosine function made the problem awkward and unwieldy, [01:02] and made it less clear how pendulums relate to other oscillating phenomena. [01:07] But if you approximate cosine of theta as 1 minus theta squared over 2, [01:12] everything just falls into place much more easily. [01:16] If you've never seen anything like this before, [01:19] an approximation like that might seem completely out of left field. [01:23] If you graph cosine of theta along with this function, 1 minus theta squared over 2, [01:28] they do seem rather close to each other, at least for small angles near 0, [01:33] but how would you even think to make this approximation, [01:36] and how would you find that particular quadratic? [01:41] The study of Taylor series is largely about taking non-polynomial [01:44] functions and finding polynomials that approximate them near some input. [01:48] The motive here is that polynomials tend to be much easier to deal [01:52] with than other functions: they're easier to compute, [01:55] easier to take derivatives of, easier to integrate, just all-around more friendly. [02:00] So let's take a look at that function, cosine of x, [02:03] and really take a moment to think about how you might construct a quadratic [02:08] approximation near x equals 0. [02:10] That is, among all of the possible polynomials that look like c0 plus c1 [02:16] times x plus c2 times x squared, for some choice of these constants, c0, [02:21] c1, and c2, find the one that most resembles cosine of x near x equals 0, [02:27] whose graph kind of spoons with the graph of cosine x at that point. [02:33] Well, first of all, at the input 0, the value of cosine of x is 1, [02:38] so if our approximation is going to be any good at all, [02:41] it should also equal 1 at the input x equals 0. [02:45] Plugging in 0 just results in whatever c0 is, so we can set that equal to 1. [02:53] This leaves us free to choose constants c1 and c2 to make this [02:56] approximation as good as we can, but nothing we do with them is [03:00] going to change the fact that the polynomial equals 1 at x equals 0. [03:04] It would also be good if our approximation had the same [03:08] tangent slope as cosine x at this point of interest. [03:11] Otherwise the approximation drifts away from the [03:14] cosine graph much faster than it needs to. [03:18] The derivative of cosine is negative sine, and at x equals 0, [03:22] that equals 0, meaning the tangent line is perfectly flat. [03:26] On the other hand, when you work out the derivative of our quadratic, [03:31] you get c1 plus 2 times c2 times x.
[03:35] At x equals 0, this just equals whatever we choose for c1. [03:40] So this constant c1 has complete control over the [03:43] derivative of our approximation around x equals 0. [03:47] Setting it equal to 0 ensures that our approximation [03:49] also has a flat tangent line at this point. [03:53] This leaves us free to change c2, but the value and the slope of our [03:57] polynomial at x equals 0 are locked in place to match that of cosine. [04:04] The final thing to take advantage of is the fact that the cosine graph [04:08] curves downward above x equals 0; it has a negative second derivative. [04:13] Or in other words, even though the rate of change is 0 at that point, [04:17] the rate of change itself is decreasing around that point. [04:21] Specifically, since its derivative is negative sine of x, [04:25] its second derivative is negative cosine of x, and at x equals 0, that equals negative 1. [04:33] Now in the same way that we wanted the derivative of our approximation to [04:37] match that of the cosine so that their values wouldn't drift apart needlessly quickly, [04:41] making sure that their second derivatives match will ensure that they [04:45] curve at the same rate, that the slope of our polynomial doesn't drift [04:49] away from the slope of cosine x any more quickly than it needs to. [04:54] Pulling up the same derivative we had before, and then taking its derivative, [04:59] we see that the second derivative of this polynomial is exactly 2 times c2. [05:04] So to make sure that this second derivative also equals negative 1 at x equals 0, [05:10] 2 times c2 has to be negative 1, meaning c2 itself should be negative 1 half. [05:16] This gives us the approximation 1 plus 0x minus 1 half x squared. [05:23] To get a feel for how good it is, if you estimate cosine of 0.1 using this polynomial, [05:29] you'd estimate it to be 0.995, which is extremely close to the true value of cosine of 0.1, about 0.9950042. [05:36] It's a really good approximation! [05:40] Take a moment to reflect on what just happened. [05:42] You had 3 degrees of freedom with this quadratic approximation, [05:46] the constants c0, c1, and c2. [05:49] c0 was responsible for making sure that the output of the approximation matches that of [05:55] cosine x at x equals 0, c1 was in charge of making sure that the derivatives match at [06:01] that point, and c2 was responsible for making sure that the second derivatives match up. [06:08] This ensures that the way your approximation changes as you move away from x equals 0, [06:14] and the way that the rate of change itself changes, [06:17] is as similar as possible to the behaviour of cosine x, [06:20] given the amount of control you have. [06:24] You could give yourself more control by allowing more terms [06:27] in your polynomial and matching higher order derivatives. [06:30] For example, let's say you added on the term c3 times x cubed for some constant c3. [06:36] In that case, if you take the third derivative of a cubic polynomial, [06:41] anything quadratic or smaller goes to 0. [06:45] As for that last term, after 3 iterations of the power rule, [06:50] it looks like 1 times 2 times 3 times c3. [06:56] On the other hand, the third derivative of cosine x comes out to sine x, [07:01] which equals 0 at x equals 0. [07:03] So to make sure that the third derivatives match, the constant c3 should be 0. [07:09] Or in other words, not only is 1 minus one half x squared the best possible quadratic [07:14] approximation of cosine, it's also the best possible cubic approximation.
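To make that concrete, here is a minimal Python sketch, an editorial addition rather than anything from the video, that compares this quadratic against the real cosine; the function name quad_approx is just an illustrative choice:

```python
import math

def quad_approx(x):
    # c0 = 1, c1 = 0, c2 = -1/2: the value, slope, and second derivative
    # all match those of cos(x) at x = 0.
    return 1 - x**2 / 2

for x in [0.1, 0.5, 1.0]:
    # The approximation is excellent near 0 and drifts as x grows.
    print(f"x = {x}: approx = {quad_approx(x):.7f}, cos = {math.cos(x):.7f}")
```

At x = 0.1 this prints 0.9950000 against the true 0.9950042, the comparison described above.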
[07:21] You can make an improvement by adding on a fourth order term, c4 times x to the fourth. [07:27] The fourth derivative of cosine is itself, which equals 1 at x equals 0. [07:34] And what's the fourth derivative of our polynomial with this new term? [07:38] Well, when you keep applying the power rule over and over, [07:42] with those exponents all hopping down in front, [07:45] you end up with 1 times 2 times 3 times 4 times c4, which is 24 times c4. [07:51] So if we want this to match the fourth derivative of cosine x, [07:56] which is 1, c4 has to be 1 over 24. [07:59] And indeed, the polynomial 1 minus one half x squared plus 1 over 24 times x to the fourth, [08:05] which looks like this, is a very close approximation for cosine x around x equals 0. [08:13] In any physics problem involving the cosine of a small angle, for example, [08:18] predictions would be almost unnoticeably different if you substituted this polynomial [08:23] for cosine of x. [08:26] Take a step back and notice a few things happening with this process. [08:30] First of all, factorial terms come up very naturally here. [08:35] When you take n successive derivatives of the function x to the n, [08:39] letting the power rule keep cascading on down, [08:43] what you'll be left with is 1 times 2 times 3 on and on up to whatever n is. [08:49] So you don't simply set the coefficients of the polynomial equal to whatever derivative value [08:53] you want; you have to divide by the appropriate factorial to cancel out this effect. [08:59] For example, that x to the fourth coefficient was the fourth derivative of cosine, [09:05] 1, but divided by 4 factorial, 24. [09:09] The second thing to notice is that adding on new terms, [09:12] like this c4 times x to the fourth, doesn't mess up what the old terms should be, [09:17] and that's really important. [09:20] For example, the second derivative of this polynomial at x equals 0 is still equal [09:25] to 2 times the second coefficient, even after you introduce higher order terms. [09:30] And it's because we're plugging in x equals 0, [09:33] so the second derivatives of the higher order terms, which all still include an x, [09:38] will just wash away. [09:40] And the same goes for any other derivative, which is why each derivative of a [09:45] polynomial at x equals 0 is controlled by one and only one of the coefficients. [09:52] If instead you were approximating near an input other than 0, like x equals pi, [09:57] in order to get the same effect you would have to write your polynomial in [10:01] terms of powers of x minus pi, or whatever input you're looking at. [10:06] This makes it look noticeably more complicated, [10:09] but all we're doing is making sure that the point pi looks and behaves like 0, [10:13] so that plugging in x equals pi will result in a lot of nice cancellation that [10:18] leaves only one constant. [10:22] And finally, on a more philosophical level, notice how what we're doing here is basically [10:27] taking information about higher order derivatives of a function at a single point, [10:32] and translating that into information about the value of the function near that point. [10:40] You can take as many derivatives of cosine as you want. [10:44] It follows this nice cyclic pattern: cosine of x, [10:47] negative sine of x, negative cosine, sine, and then repeat. [10:52] And the value of each one of these is easy to compute at x equals 0. [10:56] It gives this cyclic pattern: 1, 0, negative 1, 0, and then repeat.
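That cyclic pattern, combined with the factorial division, is enough to build a Taylor polynomial of cosine of any degree. Here is a short Python sketch of the idea; it's an editorial addition, and cos_taylor is an illustrative name, not anything from the video:

```python
import math

def cos_taylor(x, degree):
    # The nth derivative of cos at 0 cycles through 1, 0, -1, 0, ...
    # Each coefficient is that derivative value divided by n!, which
    # cancels the cascading effect of repeated power rule applications.
    pattern = [1, 0, -1, 0]
    return sum(pattern[n % 4] * x**n / math.factorial(n)
               for n in range(degree + 1))

print(cos_taylor(0.5, 4))   # degree-4 polynomial: 0.8776041...
print(math.cos(0.5))        # true value:          0.8775825...
```

Setting degree to 2 recovers the quadratic from earlier, and raising it adds the higher order corrections one at a time.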
[11:02] And knowing the values of all those higher order derivatives is a lot of information [11:07] about cosine of x, even though it only involves plugging in a single number, x equals 0. [11:14] So what we're doing is leveraging that information to get an approximation around this [11:19] input, and you do it by creating a polynomial whose higher order derivatives are designed [11:25] to match up with those of cosine, following this same 1, 0, negative 1, 0 cyclic pattern. [11:31] And to do that, you just make each coefficient of the polynomial follow that [11:35] same pattern, but you have to divide each one by the appropriate factorial. [11:40] Like I mentioned before, this is what cancels out [11:42] the cascading effect of many power rule applications. [11:47] The polynomials you get by stopping this process at [11:50] any point are called Taylor polynomials for cosine of x. [11:53] More generally, and hence more abstractly, if we were dealing with some function [11:58] other than cosine, you would compute its derivative, its second derivative, and so on, [12:03] getting as many terms as you'd like, and you would evaluate each one of them at x equals [12:08] 0. [12:09] Then for the polynomial approximation, the coefficient of each x to the n term should be [12:15] the value of the nth derivative of the function evaluated at 0, [12:20] but divided by n factorial. [12:23] This whole rather abstract formula is something you'll likely [12:27] see in any text or course that touches on Taylor polynomials. [12:31] And when you see it, think to yourself that the constant term ensures that [12:36] the value of the polynomial matches with the value of f, [12:39] the next term ensures that the slope of the polynomial matches the slope [12:43] of the function at x equals 0, the next term ensures that the rate at which [12:48] the slope changes is the same at that point, and so on, [12:51] depending on how many terms you want. [12:54] And the more terms you choose, the closer the approximation, [12:57] but the tradeoff is that the polynomial you'd get would be more complicated. [13:02] And to make things even more general, if you wanted to approximate near some input [13:07] other than 0, which we'll call a, you would write this polynomial in terms of powers [13:12] of x minus a, and you would evaluate all the derivatives of f at that input, a. [13:18] This is what Taylor polynomials look like in their fullest generality. [13:24] Changing the value of a changes where this approximation is hugging the original [13:28] function, where its higher order derivatives will be equal to those of the original [13:33] function. [13:35] One of the simplest meaningful examples of this is [13:38] the function e to the x around the input x equals 0. [13:42] Computing the derivatives is super nice, as nice as it gets, [13:46] because the derivative of e to the x is itself, [13:49] so the second derivative is also e to the x, as is its third, and so on. [13:54] So at the point x equals 0, all of these are equal to 1. [13:59] And what that means is our polynomial approximation should look like [14:05] 1 plus 1 times x plus 1 over 2 times x squared plus 1 over 3 factorial times x cubed, [14:13] and so on, depending on how many terms you want. [14:19] These are the Taylor polynomials for e to the x.
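Written symbolically, the abstract formula alluded to above is the standard one; this rendering is an editorial addition, not a quote from the video:

$$P_N(x) \;=\; \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}\,(x-a)^n \;=\; f(a) \;+\; f'(a)\,(x-a) \;+\; \frac{f''(a)}{2!}\,(x-a)^2 \;+\; \cdots$$

For e to the x around a = 0, every derivative equals 1, so this collapses to $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$, exactly the polynomial just described.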
[14:26] Ok, so with that as a foundation, in the spirit of showing you just how connected all [14:31] the topics of calculus are, let me turn to something kind of fun, [14:34] a completely different way to understand this second order term of the Taylor [14:38] polynomials, but geometrically. [14:41] It's related to the fundamental theorem of calculus, [14:43] which I talked about in chapters 1 and 8 if you need a quick refresher. [14:47] Like we did in those videos, consider a function that gives the area [14:52] under some graph between a fixed left point and a variable right point. [14:56] What we're going to do here is think about how to approximate this area function, [15:00] not the function for the graph itself, like we've been doing before. [15:04] Focusing on that area is what's going to make the second order term pop out. [15:10] Remember, the fundamental theorem of calculus is that this graph itself represents the [15:16] derivative of the area function, and it's because a slight nudge dx to the right bound [15:22] of the area gives a new bit of area approximately equal to the height of the graph times [15:28] dx. [15:30] And that approximation is increasingly accurate for smaller and smaller choices of dx. [15:35] But if you wanted to be more accurate about this change in area, [15:39] given some change in x that isn't meant to approach 0, [15:42] you would have to take into account this portion right here, [15:46] which is approximately a triangle. [15:49] Let's name the starting input a, and the nudged input above it x, so that the change is x-a. [15:58] The base of that little triangle is that change, x-a, [16:02] and its height is the slope of the graph times x-a. [16:08] Since this graph is the derivative of the area function, [16:11] its slope is the second derivative of the area function, evaluated at the input a. [16:18] So the area of this triangle, 1 half base times height, [16:22] is 1 half times the second derivative of this area function, evaluated at a, [16:28] multiplied by (x-a) squared. [16:30] And this is exactly what you would see with a Taylor polynomial. [16:34] If you knew the various derivative information about this area function at the point a, [16:40] how would you approximate the area at the point x? [16:45] Well, you have to include all that area up to a, f of a, [16:49] plus the area of this rectangle here, which is the first derivative times x-a, [16:54] plus the area of that little triangle, which is 1 half times the second derivative [17:00] times (x-a) squared. [17:02] I really like this, because even though it looks a bit messy all written out, [17:06] each one of the terms has a very clear meaning that you can just point to on the diagram (the full sum is written out in one line after this passage). [17:13] If you wanted, we could call it an end here, and you would have a [17:16] phenomenally useful tool for approximations: these Taylor polynomials. [17:21] But if you're thinking like a mathematician, one question you might ask is [17:25] whether or not it makes sense to never stop and just add infinitely many terms. [17:31] In math, an infinite sum is called a series, so even though one of these [17:35] approximations with finitely many terms is called a Taylor polynomial, [17:40] adding all infinitely many terms gives what's called a Taylor series. [17:45] You have to be really careful with the idea of an infinite series, [17:48] because it doesn't actually make sense to add infinitely many things; [17:52] you can only hit the plus button on the calculator so many times.
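As an aside, here is the three-part area sum from that diagram collected into a single line. This is an editorial rendering, writing f for the area function as the passage above does; the terms are the area up to a, the rectangle, and the triangle, respectively:

$$f(x) \;\approx\; f(a) \;+\; f'(a)\,(x-a) \;+\; \frac{1}{2}\,f''(a)\,(x-a)^2$$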
[17:57] But if you have a series where adding more and more of the terms, [18:01] which makes sense at each step, gets you increasingly close to some specific value, [18:06] what you say is that the series converges to that value. [18:10] Or, if you're comfortable extending the definition of equality to [18:14] include this kind of series convergence, you'd say that the series as a whole, [18:19] this infinite sum, equals the value it's converging to. [18:23] For example, look at the Taylor polynomial for e to the x, [18:27] and plug in some input, like x equals 1. [18:31] As you add more and more polynomial terms, the total sum gets closer and [18:36] closer to the value e, so you say that this infinite series converges to the number e, [18:42] or, what amounts to the same thing, that it equals the number e. [18:47] In fact, it turns out that if you plug in any other value of x, like x equals 2, [18:53] and look at the value of the higher and higher order Taylor polynomials at this value, [18:59] they will converge towards e to the x, which is e squared. [19:04] This is true for any input, no matter how far away from 0 it is, [19:08] even though these Taylor polynomials are constructed only from derivative information [19:14] gathered at the input 0. [19:18] In a case like this, we say that e to the x equals its own Taylor series at all inputs x, [19:24] which is kind of a magical thing to have happen. [19:28] Even though this is also true for a couple other important functions, [19:32] like sine and cosine, sometimes these series only converge within a [19:36] certain range around the input whose derivative information you're using. [19:41] If you work out the Taylor series for the natural log of x around the input x equals 1, [19:47] which is built by evaluating the higher order derivatives of the natural [19:51] log of x at x equals 1, this is what it would look like. [19:56] When you plug in an input between 0 and 2, adding more and more terms of this [20:00] series will indeed get you closer and closer to the natural log of that input. [20:06] But outside of that range, even by just a little bit, [20:09] the series fails to approach anything. [20:12] As you add on more and more terms, the sum bounces back and forth wildly. [20:18] It does not, as you might expect, approach the natural log of that value, [20:22] even though the natural log of x is perfectly well defined for inputs above 2. [20:28] In some sense, the derivative information of ln [20:31] of x at x equals 1 doesn't propagate out that far. [20:36] In a case like this, where adding more terms of the series doesn't approach anything, [20:41] you say that the series diverges. [20:44] And that maximum distance between the input you're approximating [20:47] near and points where the outputs of these polynomials actually [20:51] converge is called the radius of convergence for the Taylor series. [20:56] There remains more to learn about Taylor series. [20:59] There are many use cases, tactics for placing bounds on the error of [21:03] these approximations, tests for understanding when series do and don't converge, [21:07] and for that matter, there remains more to learn about calculus as a whole [21:11] and about the countless topics not touched by this series. [21:15] The goal with these videos is to give you the fundamental intuitions [21:19] that make you feel confident and efficient in learning more on your own, [21:23] and potentially even rediscovering more of the topic for yourself.
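To see that radius of convergence in action, here is a small Python sketch, an editorial addition; it uses the standard Taylor series of the natural log around 1, the sum of (-1)^(n+1) (x-1)^n / n, which matches the series described above:

```python
import math

def ln_partial_sum(x, terms):
    # Partial sums of the Taylor series of ln(x) around x = 1:
    # (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
    return sum((-1) ** (n + 1) * (x - 1) ** n / n
               for n in range(1, terms + 1))

for terms in [5, 20, 80]:
    # x = 1.5 lies inside the radius of convergence (distance 1 from a = 1),
    # so the sums settle toward ln(1.5); x = 2.5 lies outside it, and the
    # sums swing back and forth with ever-larger magnitude.
    print(terms, ln_partial_sum(1.5, terms), ln_partial_sum(2.5, terms))

print(math.log(1.5))  # ~0.4054651, for comparison
```

The left column of sums homes in on ln(1.5) while the right column bounces around wildly, which is exactly the divergence described in the passage.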
[21:28] In the case of Taylor series, the fundamental intuition to keep in mind [21:32] as you explore more of what there is, is that they translate derivative [21:36] information at a single point to approximation information around that point. [21:43] Thank you once again to everybody who supported this series. [21:47] The next series like it will be on probability, [21:49] and if you want early access as those videos are made, you know where to go. [22:11] Thank you.