[00:14] When I first learned about Taylor series, I definitely [00:17] didn't appreciate just how important they are. [00:20] But time and time again they come up in math, physics, [00:22] and many fields of engineering because they're one of the most [00:25] powerful tools that math has to offer for approximating functions. [00:30] I think one of the first times this clicked for me as a [00:32] student was not in a calculus class but a physics class. [00:35] We were studying a certain problem that had to do with the potential energy of a [00:40] pendulum, and for that you need an expression for how high the weight of the [00:44] pendulum is above its lowest point, and when you work that out it comes out to be [00:48] proportional to 1 minus the cosine of the angle between the pendulum and the vertical. [00:53] The specifics of the problem we were trying to solve are beside the point here, [00:57] but what I'll say is that this cosine function made the problem awkward and unwieldy, [01:02] and made it less clear how pendulums relate to other oscillating phenomena. [01:07] But if you approximate cosine of theta as 1 minus theta squared over 2, [01:12] everything just falls into place much more easily. [01:16] If you've never seen anything like this before, [01:19] an approximation like that might seem completely out of left field. [01:23] If you graph cosine of theta along with this function, 1 minus theta squared over 2, [01:28] they do seem rather close to each other, at least for small angles near 0, [01:33] but how would you even think to make this approximation, [01:36] and how would you find that particular quadratic? [01:41] The study of Taylor series is largely about taking non-polynomial [01:44] functions and finding polynomials that approximate them near some input. [01:48] The motive here is that polynomials tend to be much easier to deal [01:52] with than other functions: they're easier to compute, [01:55] easier to take derivatives of, easier to integrate, just all-around more friendly. [02:00] So let's take a look at that function, cosine of x, [02:03] and really take a moment to think about how you might construct a quadratic [02:08] approximation near x equals 0. [02:10] That is, among all of the possible polynomials that look like c0 plus c1 [02:16] times x plus c2 times x squared, for some choice of these constants, c0, [02:21] c1, and c2, find the one that most resembles cosine of x near x equals 0, [02:27] whose graph kind of spoons with the graph of cosine x at that point. [02:33] Well, first of all, at the input 0, the value of cosine of x is 1, [02:38] so if our approximation is going to be any good at all, [02:41] it should also equal 1 at the input x equals 0. [02:45] Plugging in 0 just results in whatever c0 is, so we can set that equal to 1. [02:53] This leaves us free to choose constants c1 and c2 to make this [02:56] approximation as good as we can, but nothing we do with them is [03:00] going to change the fact that the polynomial equals 1 at x equals 0. [03:04] It would also be good if our approximation had the same [03:08] tangent slope as cosine x at this point of interest. [03:11] Otherwise the approximation drifts away from the [03:14] cosine graph much faster than it needs to. [03:18] The derivative of cosine is negative sine, and at x equals 0, [03:22] that equals 0, meaning the tangent line is perfectly flat. [03:26] On the other hand, when you work out the derivative of our quadratic, [03:31] you get c1 plus 2 times c2 times x.
[03:35] At x equals 0, this just equals whatever we choose for c1. [03:40] So this constant c1 has complete control over the [03:43] derivative of our approximation around x equals 0. [03:47] Setting it equal to 0 ensures that our approximation [03:49] also has a flat tangent line at this point. [03:53] This leaves us free to change c2, but the value and the slope of our [03:57] polynomial at x equals 0 are locked in place to match that of cosine. [04:04] The final thing to take advantage of is the fact that the cosine graph [04:08] curves downward above x equals 0; it has a negative second derivative. [04:13] Or in other words, even though the rate of change is 0 at that point, [04:17] the rate of change itself is decreasing around that point. [04:21] Specifically, since its derivative is negative sine of x, [04:25] its second derivative is negative cosine of x, and at x equals 0, that equals negative 1. [04:33] Now in the same way that we wanted the derivative of our approximation to [04:37] match that of the cosine so that their values wouldn't drift apart needlessly quickly, [04:41] making sure that their second derivatives match will ensure that they [04:45] curve at the same rate, that the slope of our polynomial doesn't drift [04:49] away from the slope of cosine x any more quickly than it needs to. [04:54] Pulling up the same derivative we had before, and then taking its derivative, [04:59] we see that the second derivative of this polynomial is exactly 2 times c2. [05:04] So to make sure that this second derivative also equals negative 1 at x equals 0, [05:10] 2 times c2 has to be negative 1, meaning c2 itself should be negative 1 half. [05:16] This gives us the approximation 1 plus 0x minus 1 half x squared. [05:23] To get a feel for how good it is, if you estimate cosine of 0.1 using this polynomial, [05:29] you'd estimate it to be 0.995, which is extremely close to the true value of cosine of 0.1, about 0.9950042. [05:36] It's a really good approximation! [05:40] Take a moment to reflect on what just happened. [05:42] You had 3 degrees of freedom with this quadratic approximation, [05:46] the constants c0, c1, and c2. [05:49] c0 was responsible for making sure that the output of the approximation matches that of [05:55] cosine x at x equals 0, c1 was in charge of making sure that the derivatives match at [06:01] that point, and c2 was responsible for making sure that the second derivatives match up. [06:08] This ensures that the way your approximation changes as you move away from x equals 0, [06:14] and the way that the rate of change itself changes, [06:17] is as similar as possible to the behaviour of cosine x, [06:20] given the amount of control you have. [06:24] You could give yourself more control by allowing more terms [06:27] in your polynomial and matching higher order derivatives. [06:30] For example, let's say you added on the term c3 times x cubed for some constant c3. [06:36] In that case, if you take the third derivative of a cubic polynomial, [06:41] anything quadratic or smaller goes to 0. [06:45] As for that last term, after 3 iterations of the power rule, [06:50] it looks like 1 times 2 times 3 times c3. [06:56] On the other hand, the third derivative of cosine x comes out to sine x, [07:01] which equals 0 at x equals 0. [07:03] So to make sure that the third derivatives match, the constant c3 should be 0. [07:09] Or in other words, not only is 1 minus one half x squared the best possible quadratic [07:14] approximation of cosine, it's also the best possible cubic approximation.
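To make that concrete, here is a minimal Python sketch, an editorial addition rather than anything from the video, that compares this quadratic against the real cosine; the function name quad_approx is just an illustrative choice:

```python
import math

def quad_approx(x):
    # c0 = 1, c1 = 0, c2 = -1/2: the value, slope, and second derivative
    # all match those of cos(x) at x = 0.
    return 1 - x**2 / 2

for x in [0.1, 0.5, 1.0]:
    # The approximation is excellent near 0 and drifts as x grows.
    print(f"x = {x}: approx = {quad_approx(x):.7f}, cos = {math.cos(x):.7f}")
```

At x = 0.1 this prints 0.9950000 against the true 0.9950042, the comparison described above.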
[07:21] You can make an improvement by adding on a fourth order term, c4 times x to the fourth. [07:27] The fourth derivative of cosine is itself, which equals 1 at x equals 0. [07:34] And what's the fourth derivative of our polynomial with this new term? [07:38] Well, when you keep applying the power rule over and over, [07:42] with those exponents all hopping down in front, [07:45] you end up with 1 times 2 times 3 times 4 times c4, which is 24 times c4. [07:51] So if we want this to match the fourth derivative of cosine x, [07:56] which is 1, c4 has to be 1 over 24. [07:59] And indeed, the polynomial 1 minus one half x squared plus 1 over 24 times x to the fourth, [08:05] which looks like this, is a very close approximation for cosine x around x equals 0. [08:13] In any physics problem involving the cosine of a small angle, for example, [08:18] predictions would be almost unnoticeably different if you substituted this polynomial [08:23] for cosine of x. [08:26] Take a step back and notice a few things happening with this process. [08:30] First of all, factorial terms come up very naturally here. [08:35] When you take n successive derivatives of the function x to the n, [08:39] letting the power rule keep cascading on down, [08:43] what you'll be left with is 1 times 2 times 3 on and on up to whatever n is. [08:49] So you don't simply set the coefficients of the polynomial equal to whatever derivative value [08:53] you want; you have to divide by the appropriate factorial to cancel out this effect. [08:59] For example, that x to the fourth coefficient was the fourth derivative of cosine, [09:05] 1, but divided by 4 factorial, 24. [09:09] The second thing to notice is that adding on new terms, [09:12] like this c4 times x to the fourth, doesn't mess up what the old terms should be, [09:17] and that's really important. [09:20] For example, the second derivative of this polynomial at x equals 0 is still equal [09:25] to 2 times the second coefficient, even after you introduce higher order terms. [09:30] And it's because we're plugging in x equals 0, [09:33] so the second derivatives of the higher order terms, which all still include an x, [09:38] will just wash away. [09:40] And the same goes for any other derivative, which is why each derivative of a [09:45] polynomial at x equals 0 is controlled by one and only one of the coefficients. [09:52] If instead you were approximating near an input other than 0, like x equals pi, [09:57] in order to get the same effect you would have to write your polynomial in [10:01] terms of powers of x minus pi, or whatever input you're looking at. [10:06] This makes it look noticeably more complicated, [10:09] but all we're doing is making sure that the point pi looks and behaves like 0, [10:13] so that plugging in x equals pi will result in a lot of nice cancellation that [10:18] leaves only one constant. [10:22] And finally, on a more philosophical level, notice how what we're doing here is basically [10:27] taking information about higher order derivatives of a function at a single point, [10:32] and translating that into information about the value of the function near that point. [10:40] You can take as many derivatives of cosine as you want. [10:44] It follows this nice cyclic pattern: cosine of x, [10:47] negative sine of x, negative cosine, sine, and then repeat. [10:52] And the value of each one of these is easy to compute at x equals 0. [10:56] It gives this cyclic pattern: 1, 0, negative 1, 0, and then repeat.
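That cyclic pattern, combined with the factorial division, is enough to build a Taylor polynomial of cosine of any degree. Here is a short Python sketch of the idea; it's an editorial addition, and cos_taylor is an illustrative name, not anything from the video:

```python
import math

def cos_taylor(x, degree):
    # The nth derivative of cos at 0 cycles through 1, 0, -1, 0, ...
    # Each coefficient is that derivative value divided by n!, which
    # cancels the cascading effect of repeated power rule applications.
    pattern = [1, 0, -1, 0]
    return sum(pattern[n % 4] * x**n / math.factorial(n)
               for n in range(degree + 1))

print(cos_taylor(0.5, 4))   # degree-4 polynomial: 0.8776041...
print(math.cos(0.5))        # true value:          0.8775825...
```

Setting degree to 2 recovers the quadratic from earlier, and raising it adds the higher order corrections one at a time.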
[11:02] And knowing the values of all those higher order derivatives is a lot of information [11:07] about cosine of x, even though it only involves plugging in a single number, x equals 0. [11:14] So what we're doing is leveraging that information to get an approximation around this [11:19] input, and you do it by creating a polynomial whose higher order derivatives are designed [11:25] to match up with those of cosine, following this same 1, 0, negative 1, 0 cyclic pattern. [11:31] And to do that, you just make each coefficient of the polynomial follow that [11:35] same pattern, but you have to divide each one by the appropriate factorial. [11:40] Like I mentioned before, this is what cancels out [11:42] the cascading effect of many power rule applications. [11:47] The polynomials you get by stopping this process at [11:50] any point are called Taylor polynomials for cosine of x. [11:53] More generally, and hence more abstractly, if we were dealing with some function [11:58] other than cosine, you would compute its derivative, its second derivative, and so on, [12:03] getting as many terms as you'd like, and you would evaluate each one of them at x equals [12:08] 0. [12:09] Then for the polynomial approximation, the coefficient of each x to the n term should be [12:15] the value of the nth derivative of the function evaluated at 0, [12:20] but divided by n factorial. [12:23] This whole rather abstract formula is something you'll likely [12:27] see in any text or course that touches on Taylor polynomials. [12:31] And when you see it, think to yourself that the constant term ensures that [12:36] the value of the polynomial matches with the value of f, [12:39] the next term ensures that the slope of the polynomial matches the slope [12:43] of the function at x equals 0, the next term ensures that the rate at which [12:48] the slope changes is the same at that point, and so on, [12:51] depending on how many terms you want. [12:54] And the more terms you choose, the closer the approximation, [12:57] but the tradeoff is that the polynomial you'd get would be more complicated. [13:02] And to make things even more general, if you wanted to approximate near some input [13:07] other than 0, which we'll call a, you would write this polynomial in terms of powers [13:12] of x minus a, and you would evaluate all the derivatives of f at that input, a. [13:18] This is what Taylor polynomials look like in their fullest generality. [13:24] Changing the value of a changes where this approximation is hugging the original [13:28] function, where its higher order derivatives will be equal to those of the original [13:33] function. [13:35] One of the simplest meaningful examples of this is [13:38] the function e to the x around the input x equals 0. [13:42] Computing the derivatives is super nice, as nice as it gets, [13:46] because the derivative of e to the x is itself, [13:49] so the second derivative is also e to the x, as is its third, and so on. [13:54] So at the point x equals 0, all of these are equal to 1. [13:59] And what that means is our polynomial approximation should look like [14:05] 1 plus 1 times x plus 1 over 2 times x squared plus 1 over 3 factorial times x cubed, [14:13] and so on, depending on how many terms you want. [14:19] These are the Taylor polynomials for e to the x.
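Written symbolically, the abstract formula alluded to above is the standard one; this rendering is an editorial addition, not a quote from the video:

$$P_N(x) \;=\; \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}\,(x-a)^n \;=\; f(a) \;+\; f'(a)\,(x-a) \;+\; \frac{f''(a)}{2!}\,(x-a)^2 \;+\; \cdots$$

For e to the x around a = 0, every derivative equals 1, so this collapses to $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$, exactly the polynomial just described.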
[14:26] Ok, so with that as a foundation, in the spirit of showing you just how connected all [14:31] the topics of calculus are, let me turn to something kind of fun, [14:34] a completely different way to understand this second order term of the Taylor [14:38] polynomials, but geometrically. [14:41] It's related to the fundamental theorem of calculus, [14:43] which I talked about in chapters 1 and 8 if you need a quick refresher. [14:47] Like we did in those videos, consider a function that gives the area [14:52] under some graph between a fixed left point and a variable right point. [14:56] What we're going to do here is think about how to approximate this area function, [15:00] not the function for the graph itself, like we've been doing before. [15:04] Focusing on that area is what's going to make the second order term pop out. [15:10] Remember, the fundamental theorem of calculus is that this graph itself represents the [15:16] derivative of the area function, and it's because a slight nudge dx to the right bound [15:22] of the area gives a new bit of area approximately equal to the height of the graph times [15:28] dx. [15:30] And that approximation is increasingly accurate for smaller and smaller choices of dx. [15:35] But if you wanted to be more accurate about this change in area, [15:39] given some change in x that isn't meant to approach 0, [15:42] you would have to take into account this portion right here, [15:46] which is approximately a triangle. [15:49] Let's name the starting input a, and the nudged input above it x, so that the change is x-a. [15:58] The base of that little triangle is that change, x-a, [16:02] and its height is the slope of the graph times x-a. [16:08] Since this graph is the derivative of the area function, [16:11] its slope is the second derivative of the area function, evaluated at the input a. [16:18] So the area of this triangle, 1 half base times height, [16:22] is 1 half times the second derivative of this area function, evaluated at a, [16:28] multiplied by (x-a) squared. [16:30] And this is exactly what you would see with a Taylor polynomial. [16:34] If you knew the various derivative information about this area function at the point a, [16:40] how would you approximate the area at the point x? [16:45] Well, you have to include all that area up to a, f of a, [16:49] plus the area of this rectangle here, which is the first derivative times x-a, [16:54] plus the area of that little triangle, which is 1 half times the second derivative [17:00] times (x-a) squared. [17:02] I really like this, because even though it looks a bit messy all written out, [17:06] each one of the terms has a very clear meaning that you can just point to on the diagram (the full sum is written out in one line after this passage). [17:13] If you wanted, we could call it an end here, and you would have a [17:16] phenomenally useful tool for approximations: these Taylor polynomials. [17:21] But if you're thinking like a mathematician, one question you might ask is [17:25] whether or not it makes sense to never stop and just add infinitely many terms. [17:31] In math, an infinite sum is called a series, so even though one of these [17:35] approximations with finitely many terms is called a Taylor polynomial, [17:40] adding all infinitely many terms gives what's called a Taylor series. [17:45] You have to be really careful with the idea of an infinite series, [17:48] because it doesn't actually make sense to add infinitely many things; [17:52] you can only hit the plus button on the calculator so many times.
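As an aside, here is the three-part area sum from that diagram collected into a single line. This is an editorial rendering, writing f for the area function as the passage above does; the terms are the area up to a, the rectangle, and the triangle, respectively:

$$f(x) \;\approx\; f(a) \;+\; f'(a)\,(x-a) \;+\; \frac{1}{2}\,f''(a)\,(x-a)^2$$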
[17:57] But if you have a series where adding more and more of the terms, [18:01] which makes sense at each step, gets you increasingly close to some specific value, [18:06] what you say is that the series converges to that value. [18:10] Or, if you're comfortable extending the definition of equality to [18:14] include this kind of series convergence, you'd say that the series as a whole, [18:19] this infinite sum, equals the value it's converging to. [18:23] For example, look at the Taylor polynomial for e to the x, [18:27] and plug in some input, like x equals 1. [18:31] As you add more and more polynomial terms, the total sum gets closer and [18:36] closer to the value e, so you say that this infinite series converges to the number e, [18:42] or, what amounts to the same thing, that it equals the number e. [18:47] In fact, it turns out that if you plug in any other value of x, like x equals 2, [18:53] and look at the value of the higher and higher order Taylor polynomials at this value, [18:59] they will converge towards e to the x, which is e squared. [19:04] This is true for any input, no matter how far away from 0 it is, [19:08] even though these Taylor polynomials are constructed only from derivative information [19:14] gathered at the input 0. [19:18] In a case like this, we say that e to the x equals its own Taylor series at all inputs x, [19:24] which is kind of a magical thing to have happen. [19:28] Even though this is also true for a couple other important functions, [19:32] like sine and cosine, sometimes these series only converge within a [19:36] certain range around the input whose derivative information you're using. [19:41] If you work out the Taylor series for the natural log of x around the input x equals 1, [19:47] which is built by evaluating the higher order derivatives of the natural [19:51] log of x at x equals 1, this is what it would look like. [19:56] When you plug in an input between 0 and 2, adding more and more terms of this [20:00] series will indeed get you closer and closer to the natural log of that input. [20:06] But outside of that range, even by just a little bit, [20:09] the series fails to approach anything. [20:12] As you add on more and more terms, the sum bounces back and forth wildly. [20:18] It does not, as you might expect, approach the natural log of that value, [20:22] even though the natural log of x is perfectly well defined for inputs above 2. [20:28] In some sense, the derivative information of ln [20:31] of x at x equals 1 doesn't propagate out that far. [20:36] In a case like this, where adding more terms of the series doesn't approach anything, [20:41] you say that the series diverges. [20:44] And that maximum distance between the input you're approximating [20:47] near and points where the outputs of these polynomials actually [20:51] converge is called the radius of convergence for the Taylor series. [20:56] There remains more to learn about Taylor series. [20:59] There are many use cases, tactics for placing bounds on the error of [21:03] these approximations, tests for understanding when series do and don't converge, [21:07] and for that matter, there remains more to learn about calculus as a whole [21:11] and about the countless topics not touched by this series. [21:15] The goal with these videos is to give you the fundamental intuitions [21:19] that make you feel confident and efficient in learning more on your own, [21:23] and potentially even rediscovering more of the topic for yourself.
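To see that radius of convergence in action, here is a small Python sketch, an editorial addition; it uses the standard Taylor series of the natural log around 1, the sum of (-1)^(n+1) (x-1)^n / n, which matches the series described above:

```python
import math

def ln_partial_sum(x, terms):
    # Partial sums of the Taylor series of ln(x) around x = 1:
    # (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
    return sum((-1) ** (n + 1) * (x - 1) ** n / n
               for n in range(1, terms + 1))

for terms in [5, 20, 80]:
    # x = 1.5 lies inside the radius of convergence (distance 1 from a = 1),
    # so the sums settle toward ln(1.5); x = 2.5 lies outside it, and the
    # sums swing back and forth with ever-larger magnitude.
    print(terms, ln_partial_sum(1.5, terms), ln_partial_sum(2.5, terms))

print(math.log(1.5))  # ~0.4054651, for comparison
```

The left column of sums homes in on ln(1.5) while the right column bounces around wildly, which is exactly the divergence described in the passage.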
[21:28] In the case of Taylor series, the fundamental intuition to keep in mind [21:32] as you explore more of what there is, is that they translate derivative [21:36] information at a single point to approximation information around that point. [21:43] Thank you once again to everybody who supported this series. [21:47] The next series like it will be on probability, [21:49] and if you want early access as those videos are made, you know where to go. [22:11] Thank you.