1 00:00:14,500 --> 00:00:18,652 In the last videos I talked about the derivatives of simple functions, 2 00:00:18,652 --> 00:00:22,455 and the goal was to have a clear picture or intuition to hold in 3 00:00:22,455 --> 00:00:26,199 your mind that actually explains where these formulas come from. 4 00:00:26,839 --> 00:00:31,387 But most of the functions you deal with in modeling the world involve mixing, 5 00:00:31,387 --> 00:00:35,176 combining, or tweaking these simple functions in some other way, 6 00:00:35,176 --> 00:00:39,782 so our next step is to understand how you take derivatives of more complicated 7 00:00:39,781 --> 00:00:40,539 combinations. 8 00:00:41,280 --> 00:00:43,995 Again, I don't want these to be something to memorize, 9 00:00:43,994 --> 00:00:47,599 I want you to have a clear picture in mind for where each one comes from. 10 00:00:49,520 --> 00:00:53,600 Now, this really boils down into three basic ways to combine functions. 11 00:00:54,100 --> 00:00:56,591 You can add them together, you can multiply them, 12 00:00:56,591 --> 00:00:59,780 and you can throw one inside the other, known as composing them. 13 00:01:00,600 --> 00:01:03,829 Sure, you could say subtracting them, but really that's just 14 00:01:03,829 --> 00:01:07,219 multiplying the second by negative one and adding them together. 15 00:01:08,239 --> 00:01:11,164 Likewise, dividing functions doesn't really add anything, 16 00:01:11,164 --> 00:01:14,844 because that's the same as plugging one inside the function, one over x, 17 00:01:14,843 --> 00:01:16,759 and then multiplying the two together. 18 00:01:17,659 --> 00:01:20,664 So really, most functions you come across just involve layering 19 00:01:20,664 --> 00:01:23,200 together these three different types of combinations, 20 00:01:23,200 --> 00:01:26,439 though there's not really a bound on how monstrous things can become. 21 00:01:27,099 --> 00:01:31,298 But as long as you know how derivatives play with just those three combination types, 22 00:01:31,299 --> 00:01:34,376 you'll always be able to take it step by step and peel through 23 00:01:34,376 --> 00:01:36,719 the layers for any kind of monstrous expression. 24 00:01:38,719 --> 00:01:42,540 So the question is, if you know the derivative of two functions, 25 00:01:42,540 --> 00:01:45,774 what is the derivative of their sum, of their product, 26 00:01:45,774 --> 00:01:48,420 and of the function composition between them? 27 00:01:50,319 --> 00:01:54,259 The sum rule is easiest, if somewhat tongue-twisting to say out loud. 28 00:01:54,840 --> 00:01:58,600 The derivative of a sum of two functions is the sum of their derivatives. 29 00:01:59,799 --> 00:02:03,700 But it's worth warming up with this example by really thinking through 30 00:02:03,700 --> 00:02:07,105 what it means to take a derivative of a sum of two functions, 31 00:02:07,105 --> 00:02:11,170 since the derivative patterns for products and function composition won't 32 00:02:11,169 --> 00:02:15,619 be so straightforward, and they're going to require this kind of deeper thinking. 33 00:02:16,699 --> 00:02:21,199 For example, let's think about this function f of x equals sine of x plus x squared. 34 00:02:22,199 --> 00:02:25,211 It's a function where, for every input, you add together 35 00:02:25,211 --> 00:02:27,959 the values of sine of x and x squared at that point. 36 00:02:29,759 --> 00:02:34,048 For example, let's say at x equals 0.5, the height of the sine 37 00:02:34,049 --> 00:02:38,201 graph is given by this vertical bar, and the height of the x 38 00:02:38,201 --> 00:02:42,560 squared parabola is given by this slightly smaller vertical bar. 39 00:02:44,379 --> 00:02:47,319 And their sum is the length you get by just stacking them together. 40 00:02:48,520 --> 00:02:53,939 For the derivative, you want to ask what happens as you nudge that input slightly, 41 00:02:53,938 --> 00:02:56,419 maybe increasing it up to 0.5 plus dx. 42 00:02:57,560 --> 00:03:02,920 The difference in the value of f between those two places is what we call df. 43 00:03:04,360 --> 00:03:08,978 And when you picture it like this, I think you'll agree that the total 44 00:03:08,978 --> 00:03:13,271 change in the height is whatever the change to the sine graph is, 45 00:03:13,270 --> 00:03:18,799 what we might call d sine of x, plus whatever the change to x squared is, dx squared. 46 00:03:22,240 --> 00:03:27,540 We know that the derivative of sine is cosine, and remember what that means. 47 00:03:27,919 --> 00:03:33,299 It means that this little change, d sine of x, is about cosine of x times dx. 48 00:03:33,780 --> 00:03:37,711 It's proportional to the size of our initial nudge dx, 49 00:03:37,711 --> 00:03:43,359 and the proportionality constant equals cosine of whatever input we started at. 50 00:03:43,979 --> 00:03:48,072 Likewise, because the derivative of x squared is 2x, 51 00:03:48,072 --> 00:03:53,939 the change in the height of the x squared graph is 2x times whatever dx was. 52 00:03:55,599 --> 00:04:00,475 So rearranging df divided by dx, the ratio of the tiny change to 53 00:04:00,475 --> 00:04:04,752 the sum function to the tiny change in x that caused it, 54 00:04:04,752 --> 00:04:10,079 is indeed cosine of x plus 2x, the sum of the derivatives of its parts. 55 00:04:11,520 --> 00:04:15,329 But like I said, things are a bit different for products, 56 00:04:15,329 --> 00:04:19,139 and let's think through why in terms of tiny nudges again. 57 00:04:20,060 --> 00:04:23,139 In this case, I don't think graphs are our best bet for visualizing things. 58 00:04:23,819 --> 00:04:27,040 Pretty commonly in math, at a lot of levels of math really, 59 00:04:27,040 --> 00:04:29,617 if you're dealing with a product of two things, 60 00:04:29,617 --> 00:04:32,140 it helps to understand it as some kind of area. 61 00:04:33,079 --> 00:04:36,014 In this case, maybe you try to configure some mental setup 62 00:04:36,014 --> 00:04:39,000 of a box where the side lengths are sine of x and x squared. 63 00:04:39,879 --> 00:04:41,040 But what would that mean? 64 00:04:42,319 --> 00:04:46,606 Well, since these are functions, you might think of those sides as adjustable, 65 00:04:46,607 --> 00:04:49,972 dependent on the value of x, which maybe you think of as this 66 00:04:49,971 --> 00:04:52,739 number that you can just freely adjust up and down. 67 00:04:53,740 --> 00:04:56,812 So getting a feel for what this means, focus on 68 00:04:56,812 --> 00:05:00,139 that top side who changes as the function sine of x. 69 00:05:01,060 --> 00:05:05,305 As you change this value of x up from 0, it increases up to 70 00:05:05,305 --> 00:05:09,127 a length of 1 as sine of x moves up towards its peak, 71 00:05:09,127 --> 00:05:13,939 and after that it starts to decrease as sine of x comes down from 1. 72 00:05:15,100 --> 00:05:18,580 And in the same way, that height there is always changing as x squared. 73 00:05:20,079 --> 00:05:25,800 So f of x, defined as the product of these two functions, is the area of this box. 74 00:05:27,060 --> 00:05:30,120 And for the derivative, let's think about how 75 00:05:30,120 --> 00:05:33,180 a tiny change to x by dx influences that area. 76 00:05:33,839 --> 00:05:36,279 What is that resulting change in area df? 77 00:05:39,000 --> 00:05:44,115 Well, the nudge dx caused that width to change by some small d sine of x, 78 00:05:44,115 --> 00:05:47,919 and it caused that height to change by some dx squared. 79 00:05:50,180 --> 00:05:53,649 And this gives us three little snippets of new area, 80 00:05:53,649 --> 00:05:58,033 a thin rectangle on the bottom whose area is its width, sine of x, 81 00:05:58,033 --> 00:06:00,259 times its thin height, dx squared. 82 00:06:01,779 --> 00:06:06,406 And there's this thin rectangle on the right, whose area is its height, 83 00:06:06,406 --> 00:06:09,299 x squared, times its thin width, d sine of x. 84 00:06:10,740 --> 00:06:14,139 And there's also this little bit in the corner, but we can ignore that. 85 00:06:14,439 --> 00:06:17,856 Its area is ultimately proportional to dx squared, 86 00:06:17,857 --> 00:06:22,480 and as we've seen before, that becomes negligible as dx goes to zero. 87 00:06:23,939 --> 00:06:27,375 I mean, this whole setup is very similar to what I showed last video, 88 00:06:27,375 --> 00:06:28,699 with the x squared diagram. 89 00:06:29,459 --> 00:06:32,704 And just like then, keep in mind that I'm using somewhat beefy 90 00:06:32,704 --> 00:06:35,899 changes here to draw things, just so we can actually see them. 91 00:06:36,360 --> 00:06:39,818 But in principle, dx is something very very small, 92 00:06:39,817 --> 00:06:44,699 and that means that dx squared and d sine of x are also very very small. 93 00:06:45,980 --> 00:06:51,175 So, applying what we know about the derivative of sine and of x squared, 94 00:06:51,175 --> 00:06:55,660 that tiny change, dx squared, is going to be about 2x times dx. 95 00:06:56,360 --> 00:07:01,580 And that tiny change, d sine of x, well that's going to be about cosine of x times dx. 96 00:07:02,920 --> 00:07:09,019 As usual, we divide out by that dx to see that the ratio we want, df divided by dx, 97 00:07:09,019 --> 00:07:12,504 is sine of x times the derivative of x squared, 98 00:07:12,504 --> 00:07:15,699 plus x squared times the derivative of sine. 99 00:07:17,959 --> 00:07:21,259 And nothing we've done here is specific to sine or to x squared. 100 00:07:21,579 --> 00:07:25,359 This same line of reasoning would work for any two functions, g and h. 101 00:07:27,000 --> 00:07:29,310 And sometimes people like to remember this pattern with 102 00:07:29,310 --> 00:07:31,539 a certain mnemonic that you kind of sing in your head. 103 00:07:32,220 --> 00:07:33,680 Left d right, right d left. 104 00:07:34,399 --> 00:07:38,812 In this example, where we have sine of x times x squared, left d right, 105 00:07:38,812 --> 00:07:43,778 means you take that left function, sine of x, times the derivative of the right, 106 00:07:43,778 --> 00:07:44,759 in this case 2x. 107 00:07:45,480 --> 00:07:48,876 Then you add on right d left, that right function, 108 00:07:48,875 --> 00:07:52,939 x squared, times the derivative of the left one, cosine of x. 109 00:07:54,360 --> 00:07:57,271 Now out of context, presented as a rule to remember, 110 00:07:57,271 --> 00:08:00,019 I think this would feel pretty strange, don't you? 111 00:08:00,740 --> 00:08:03,381 But when you actually think of this adjustable box, 112 00:08:03,380 --> 00:08:05,819 you can see what each of those terms represents. 113 00:08:06,579 --> 00:08:10,971 Left d right is the area of that little bottom rectangle, 114 00:08:10,971 --> 00:08:15,439 and right d left is the area of that rectangle on the side. 115 00:08:20,160 --> 00:08:23,847 By the way, I should mention that if you multiply by a constant, 116 00:08:23,846 --> 00:08:26,739 say 2 times sine of x, things end up a lot simpler. 117 00:08:27,399 --> 00:08:30,875 The derivative is just the same as the constant multiplied by 118 00:08:30,875 --> 00:08:34,519 the derivative of the function, in this case 2 times cosine of x. 119 00:08:35,558 --> 00:08:40,178 I'll leave it to you to pause and ponder and verify that makes sense. 120 00:08:41,918 --> 00:08:46,533 Aside from addition and multiplication, the other common way to combine functions, 121 00:08:46,533 --> 00:08:49,201 and believe me, this one comes up all the time, 122 00:08:49,201 --> 00:08:52,259 is to shove one inside the other, function composition. 123 00:08:53,220 --> 00:08:56,898 For example, maybe we take the function x squared and shove it 124 00:08:56,898 --> 00:09:00,460 inside sine of x to get this new function, sine of x squared. 125 00:09:01,399 --> 00:09:04,079 What do you think the derivative of that new function is? 126 00:09:05,299 --> 00:09:09,146 To think this one through, I'll choose yet another way to visualize things, 127 00:09:09,147 --> 00:09:12,540 just to emphasize that in creative math, we've got lots of options. 128 00:09:13,320 --> 00:09:18,591 I'll put up three different number lines, the top one is going to hold the value of x, 129 00:09:18,591 --> 00:09:21,440 the second one is going to hold the x squared, 130 00:09:21,440 --> 00:09:25,500 and the third line is going to hold the value of sine of x squared. 131 00:09:26,460 --> 00:09:30,310 That is, the function x squared gets you from line 1 to line 2, 132 00:09:30,309 --> 00:09:33,500 and the function sine gets you from line 2 to line 3. 133 00:09:34,840 --> 00:09:39,581 As I shift around this value of x, maybe moving it up to the value 3, 134 00:09:39,581 --> 00:09:45,340 that second value stays pegged to whatever x squared is, in this case moving up to 9. 135 00:09:46,200 --> 00:09:49,355 That bottom value, being sine of x squared, is 136 00:09:49,355 --> 00:09:52,580 going to go to whatever sine of 9 happens to be. 137 00:09:54,899 --> 00:10:00,399 So, for the derivative, let's again start by nudging that x value by some little dx. 138 00:10:01,539 --> 00:10:04,799 I always think that it's helpful to think of x as starting 139 00:10:04,799 --> 00:10:07,839 at some actual concrete number, maybe 1.5 in this case. 140 00:10:08,759 --> 00:10:14,737 The resulting nudge to that second value, the change in x squared caused by such a dx, 141 00:10:14,738 --> 00:10:15,700 is dx squared. 142 00:10:16,960 --> 00:10:21,062 We could expand this like we have before, as 2x times dx, 143 00:10:21,062 --> 00:10:25,307 which for our specific input would be 2 times 1.5 times dx, 144 00:10:25,307 --> 00:10:30,120 but it helps to keep things written as dx squared, at least for now. 145 00:10:31,019 --> 00:10:36,384 In fact, I'm going to go one step further, give a new name to this x squared, 146 00:10:36,384 --> 00:10:41,200 maybe h, so instead of writing dx squared for this nudge, we write dh. 147 00:10:42,620 --> 00:10:47,259 This makes it easier to think about that third value, which is now pegged at sine of h. 148 00:10:48,200 --> 00:10:53,680 Its change is d sine of h, the tiny change caused by the nudge dh. 149 00:10:55,000 --> 00:11:00,134 By the way, the fact that it's moving to the left while the dh bump is going to the right 150 00:11:00,134 --> 00:11:05,039 just means that this change, d sine of h, is going to be some kind of negative number. 151 00:11:06,139 --> 00:11:09,639 Once again, we can use our knowledge of the derivative of the sine. 152 00:11:10,500 --> 00:11:14,419 This d sine of h is going to be about cosine of h times dh. 153 00:11:15,240 --> 00:11:18,639 That's what it means for the derivative of sine to be cosine. 154 00:11:19,539 --> 00:11:23,771 Unfolding things, we can replace that h with x squared again, 155 00:11:23,772 --> 00:11:29,780 so we know that the bottom nudge will be a size of cosine of x squared times dx squared. 156 00:11:31,039 --> 00:11:32,480 Let's unfold things even further. 157 00:11:32,840 --> 00:11:38,100 That intermediate nudge dx squared is going to be about 2x times dx. 158 00:11:39,059 --> 00:11:41,445 It's always a good habit to remind yourself of 159 00:11:41,446 --> 00:11:43,680 what an expression like this actually means. 160 00:11:44,340 --> 00:11:48,578 In this case, where we started at x equals 1.5 up top, 161 00:11:48,577 --> 00:11:54,512 this whole expression is telling us that the size of the nudge on that third 162 00:11:54,513 --> 00:12:00,754 line is going to be about cosine of 1.5 squared times 2 times 1.5 times whatever 163 00:12:00,754 --> 00:12:02,220 the size of dx was. 164 00:12:02,720 --> 00:12:05,112 It's proportional to the size of dx, and this 165 00:12:05,111 --> 00:12:07,919 derivative is giving us that proportionality constant. 166 00:12:10,919 --> 00:12:12,559 Notice what we came out with here. 167 00:12:12,960 --> 00:12:15,855 We have the derivative of the outside function, 168 00:12:15,855 --> 00:12:19,235 and it's still taking in the unaltered inside function, 169 00:12:19,235 --> 00:12:23,220 and then multiplying it by the derivative of that inside function. 170 00:12:25,820 --> 00:12:29,220 Again, there's nothing special about sine of x or x squared. 171 00:12:29,740 --> 00:12:33,501 If you have any two functions, g of x and h of x, 172 00:12:33,501 --> 00:12:37,263 the derivative of their composition, g of h of x, 173 00:12:37,264 --> 00:12:43,659 is going to be the derivative of g evaluated on h, multiplied by the derivative of h. 174 00:12:47,139 --> 00:12:50,899 This pattern right here is what we usually call the chain rule. 175 00:12:52,039 --> 00:12:57,679 Notice for the derivative of g, I'm writing it as dg dh instead of dg dx. 176 00:12:58,679 --> 00:13:02,234 On the symbolic level, this is a reminder that the thing you plug 177 00:13:02,235 --> 00:13:06,060 into that derivative is still going to be that intermediary function h. 178 00:13:07,019 --> 00:13:09,745 But more than that, it's an important reflection of what 179 00:13:09,745 --> 00:13:12,519 this derivative of the outer function actually represents. 180 00:13:13,200 --> 00:13:18,449 Remember, in our three line setup, when we took the derivative of the sine on 181 00:13:18,448 --> 00:13:23,899 that bottom, we expanded the size of that nudge, d sine, as cosine of h times dh. 182 00:13:24,940 --> 00:13:27,496 This was because we didn't immediately know how 183 00:13:27,495 --> 00:13:29,840 the size of that bottom nudge depended on x. 184 00:13:30,419 --> 00:13:32,599 That's kind of the whole thing we were trying to figure out. 185 00:13:33,259 --> 00:13:37,360 But we could take the derivative with respect to that intermediate variable, h. 186 00:13:38,100 --> 00:13:41,916 That is, figure out how to express the size of that nudge on the third 187 00:13:41,916 --> 00:13:45,680 line as some multiple of dh, the size of the nudge on the second line. 188 00:13:46,580 --> 00:13:50,700 It was only after that that we unfolded further by figuring out what dh was. 189 00:13:53,320 --> 00:13:58,727 In this chain rule expression, we're saying, look at the ratio between a tiny change in 190 00:13:58,726 --> 00:14:02,351 g, the final output, to a tiny change in h that caused it, 191 00:14:02,351 --> 00:14:04,379 h being the value we plug into g. 192 00:14:05,320 --> 00:14:08,680 Then multiply that by the tiny change in h, divided 193 00:14:08,679 --> 00:14:11,199 by the tiny change in x that caused it. 194 00:14:12,299 --> 00:14:15,481 So notice, those dh's cancel out, and they give us a ratio 195 00:14:15,481 --> 00:14:19,473 between the change in that final output and the change to the input that, 196 00:14:19,474 --> 00:14:22,280 through a certain chain of events, brought it about. 197 00:14:23,860 --> 00:14:26,980 And that cancellation of dh is not just a notational trick. 198 00:14:26,980 --> 00:14:30,350 That is a genuine reflection of what's going on with the 199 00:14:30,350 --> 00:14:33,899 tiny nudges that underpin everything we do with derivatives. 200 00:14:36,299 --> 00:14:39,877 So those are the three basic tools to have in your belt to handle 201 00:14:39,878 --> 00:14:43,240 derivatives of functions that combine a lot of smaller things. 202 00:14:43,840 --> 00:14:47,379 You've got the sum rule, the product rule, and the chain rule. 203 00:14:48,399 --> 00:14:51,922 And I'll be honest with you, there is a big difference between knowing 204 00:14:51,922 --> 00:14:54,551 what the chain rule is and what the product rule is, 205 00:14:54,551 --> 00:14:58,620 and actually being fluent with applying them in even the most hairy of situations. 206 00:14:59,480 --> 00:15:03,100 Watching videos, any videos, about the mechanics of calculus is 207 00:15:03,100 --> 00:15:06,892 never going to substitute for practicing those mechanics yourself, 208 00:15:06,892 --> 00:15:10,400 and building up the muscles to do these computations yourself. 209 00:15:11,240 --> 00:15:13,600 I really wish I could offer to do that for you, 210 00:15:13,600 --> 00:15:17,440 but I'm afraid the ball is in your court, my friend, to seek out the practice. 211 00:15:18,039 --> 00:15:20,940 What I can offer, and what I hope I have offered, 212 00:15:20,941 --> 00:15:23,960 is to show you where these rules actually come from. 213 00:15:24,139 --> 00:15:27,774 To show that they're not just something to be memorized and hammered away, 214 00:15:27,774 --> 00:15:31,264 but they're natural patterns, things that you too could have discovered 215 00:15:31,264 --> 00:15:34,560 just by patiently thinking through what a derivative actually means.