1
00:00:07,740 --> 00:00:11,853
The months ahead of you hold within them a lot of hard work, some neat examples,

2
00:00:11,852 --> 00:00:14,949
some not-so-neat examples, beautiful connections to physics,

3
00:00:14,949 --> 00:00:17,387
not-so-beautiful piles of formulas to memorize,

4
00:00:17,388 --> 00:00:20,943
plenty of moments of getting stuck and banging your head into a wall,

5
00:00:20,943 --> 00:00:24,548
a few nice aha moments sprinkled in as well, and some genuinely lovely

6
00:00:24,547 --> 00:00:27,239
graphical intuition to help guide you through it all.

7
00:00:27,629 --> 00:00:31,699
But if the course ahead of you is anything like my first introduction to calculus,

8
00:00:31,699 --> 00:00:34,543
or any of the first courses I've seen in the years since,

9
00:00:34,543 --> 00:00:38,712
there's one topic you will not see, but which I believe stands to greatly accelerate

10
00:00:38,713 --> 00:00:39,399
your learning.

11
00:00:40,280 --> 00:00:44,620
You see, almost all of the visual intuitions from that first year are based on graphs.

12
00:00:45,079 --> 00:00:49,659
The derivative is the slope of a graph, the integral is a certain area under that graph.

13
00:00:50,200 --> 00:00:54,035
But as you generalize calculus beyond functions whose inputs and outputs are

14
00:00:54,034 --> 00:00:58,019
simply numbers, it's not always possible to graph the function you're analyzing.

15
00:01:00,679 --> 00:01:04,653
So if all your intuitions for the fundamental ideas, like derivatives,

16
00:01:04,653 --> 00:01:08,794
are rooted too rigidly in graphs, it can make for a very tall and largely

17
00:01:08,793 --> 00:01:13,158
unnecessary conceptual hurdle between you and the more quote-unquote advanced

18
00:01:13,159 --> 00:01:17,580
topics like multivariable calculus, complex analysis, and differential geometry.

19
00:01:18,840 --> 00:01:22,204
What I want to share with you is a way to think about derivatives,

20
00:01:22,204 --> 00:01:24,716
which I'll refer to as the transformational view,

21
00:01:24,716 --> 00:01:28,433
that generalizes more seamlessly into some of those more general contexts

22
00:01:28,433 --> 00:01:29,640
where calculus comes up.

23
00:01:29,879 --> 00:01:34,859
And then we'll use this alternate view to analyze a fun puzzle about repeated fractions.

24
00:01:35,459 --> 00:01:37,538
But first off, I just want to make sure we're all

25
00:01:37,539 --> 00:01:39,659
on the same page about what the standard visual is.

26
00:01:40,060 --> 00:01:44,936
If you were to graph a function, which simply takes real numbers as inputs and outputs,

27
00:01:44,936 --> 00:01:49,593
one of the first things you learn in a calculus course is that the derivative gives

28
00:01:49,593 --> 00:01:54,138
you the slope of this graph, where what we mean by that is that the derivative of

29
00:01:54,138 --> 00:01:58,240
the function is a new function which for every input x returns that slope.

30
00:01:59,519 --> 00:02:01,978
Now I'd encourage you not to think of this derivative

31
00:02:01,978 --> 00:02:04,439
as slope idea as being the definition of a derivative.

32
00:02:05,000 --> 00:02:07,555
Instead think of it as being more fundamentally about how

33
00:02:07,555 --> 00:02:10,419
sensitive the function is to tiny little nudges around the input.

34
00:02:11,020 --> 00:02:14,058
And the slope is just one way to think about that sensitivity

35
00:02:14,057 --> 00:02:16,900
relevant only to this particular way of viewing functions.

36
00:02:17,340 --> 00:02:19,816
I have not just another video, but a full series on this

37
00:02:19,816 --> 00:02:22,120
topic if it's something you want to learn more about.

38
00:02:22,599 --> 00:02:26,022
The basic idea behind the alternate visual for the derivative is to

39
00:02:26,022 --> 00:02:29,294
think of this function as mapping all of the input points on the

40
00:02:29,294 --> 00:02:32,819
number line to their corresponding outputs on a different number line.

41
00:02:33,400 --> 00:02:36,810
In this context, what the derivative gives you is a measure of how

42
00:02:36,810 --> 00:02:40,219
much the input space gets stretched or squished in various regions.

43
00:02:41,860 --> 00:02:46,673
That is, if you were to zoom in around a specific input and take a look at some

44
00:02:46,673 --> 00:02:51,485
evenly spaced points around it, the derivative of the function at that input is

45
00:02:51,485 --> 00:02:56,599
going to tell you how spread out or contracted those points become after the mapping.

46
00:02:57,939 --> 00:02:59,400
Here, a specific example helps.

47
00:02:59,740 --> 00:03:05,920
Take the function x², which maps 1 to 1, 2 to 4, 3 to 9, and so on.

48
00:03:06,479 --> 00:03:09,219
You can also see how it acts on all of the points in between.

49
00:03:12,719 --> 00:03:16,760
If you were to zoom in on a little cluster of points around the input 1,

50
00:03:16,760 --> 00:03:19,639
and see where they land around the relevant output,

51
00:03:19,639 --> 00:03:23,736
which for this function also happens to be 1, you'd notice that they tend

52
00:03:23,736 --> 00:03:24,900
to get stretched out.

53
00:03:25,759 --> 00:03:29,019
In fact, it roughly looks like stretching out by a factor of 2.

54
00:03:29,659 --> 00:03:35,534
The closer you zoom in, the more this local behavior looks just like multiplying by a

55
00:03:35,534 --> 00:03:41,682
factor of 2. This is what it means for the derivative of x² at the input x equals 1 to be

56
00:03:41,682 --> 00:03:41,819
2.

57
00:03:42,340 --> 00:03:45,400
It's what that fact looks like in the context of transformations.

58
00:03:46,460 --> 00:03:49,731
If you looked at a neighborhood of points around the input 3,

59
00:03:49,731 --> 00:03:52,159
they would get stretched out by a factor of 6.

60
00:03:52,740 --> 00:03:57,439
This is what it means for the derivative of this function at the input 3 to equal 6.

61
00:03:58,979 --> 00:04:03,613
Around the input 1 fourth, a small region tends to get contracted specifically by a

62
00:04:03,614 --> 00:04:08,360
factor of 1 half, and that's what it looks like for a derivative to be smaller than 1.

63
00:04:10,719 --> 00:04:12,599
The input 0 is interesting.

64
00:04:13,120 --> 00:04:15,617
Zooming in by a factor of 10, it doesn't really

65
00:04:15,617 --> 00:04:17,959
look like a constant stretching or squishing.

66
00:04:18,379 --> 00:04:21,680
For one thing, all of the outputs end up on the right, positive side of things.

67
00:04:23,319 --> 00:04:27,694
As you zoom in closer and closer, by 100x, or by 1000x,

68
00:04:27,694 --> 00:04:33,396
it looks more and more like a small neighborhood of points around 0 just

69
00:04:33,396 --> 00:04:39,959
gets collapsed into 0 itself. This is what it looks like for the derivative to be 0.

70
00:04:40,500 --> 00:04:45,019
The local behavior looks more and more like multiplying the whole number line by 0.

71
00:04:45,680 --> 00:04:49,783
It doesn't have to completely collapse everything to a point at a particular zoom level,

72
00:04:49,783 --> 00:04:53,840
instead it's a matter of what the limiting behavior is as you zoom in closer and closer.

73
00:04:55,279 --> 00:04:58,959
It's also instructive to take a look at the negative inputs here.

74
00:05:00,699 --> 00:05:04,536
Things start to feel a little cramped since they collide with where all the positive

75
00:05:04,536 --> 00:05:08,057
input values go, and this is one of the downsides of thinking of functions as

76
00:05:08,057 --> 00:05:08,780
transformations.

77
00:05:09,399 --> 00:05:13,093
But for derivatives, we only really care about the local behavior anyway,

78
00:05:13,093 --> 00:05:15,639
what happens in a small range around a given input.

79
00:05:16,500 --> 00:05:20,189
Here, notice that the inputs in a little neighborhood around, say,

80
00:05:20,189 --> 00:05:24,100
negative 2, don't just get stretched out, they also get flipped around.

81
00:05:24,680 --> 00:05:28,132
Specifically, the action on such a neighborhood looks more

82
00:05:28,132 --> 00:05:31,819
and more like multiplying by negative 4 the closer you zoom in.

83
00:05:32,319 --> 00:05:35,599
This is what it looks like for the derivative of a function to be negative.
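To make the stretch-factor idea concrete, here is a minimal Python sketch (the helper names `square` and `local_stretch` are illustrative, not from the video). It maps two points a tiny distance h on either side of an input and divides how far apart they land by how far apart they started. For x² this ratio works out to 2x, so it should report roughly 2, 6, 0.5, 0 and -4 at the inputs discussed above, with the negative value signalling a flip.

```python
def square(x):
    return x * x

def local_stretch(f, x0, h=1e-6):
    """How much f spreads out (or squishes) points near x0.

    Maps x0 - h and x0 + h through f and divides their output spacing by
    their input spacing 2h. A negative result means the neighborhood is flipped.
    """
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

for x0 in [1, 3, 0.25, 0, -2]:
    print(f"around {x0}: stretch factor ~ {local_stretch(square, x0):.4f}")

# "Zooming in": the one-sided ratio around x0 = 1 gets closer to 2 as h shrinks.
for h in [0.1, 0.01, 0.001]:
    print(h, (square(1 + h) - square(1)) / h)
```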

84
00:05:38,459 --> 00:05:40,951
And I think you get the point, this is all well and good,

85
00:05:40,951 --> 00:05:43,660
but let's see how this is actually useful in solving a problem.

86
00:05:44,259 --> 00:05:48,305
A friend of mine recently asked me a pretty fun question about the infinite

87
00:05:48,305 --> 00:05:52,137
fraction 1 plus 1 divided by 1 plus 1 divided by 1 plus 1 divided by 1,

88
00:05:52,137 --> 00:05:56,182
and clearly you watch math videos online, so maybe you've seen this before,

89
00:05:56,182 --> 00:05:59,961
but my friend's question actually cuts to something you might not have

90
00:05:59,961 --> 00:06:04,220
thought about before, relevant to the view of derivatives we're looking at here.

91
00:06:05,019 --> 00:06:09,661
The typical way you might evaluate an expression like this is to set it equal to x,

92
00:06:09,661 --> 00:06:13,639
and then notice that there is a copy of the full fraction inside itself.

93
00:06:14,699 --> 00:06:18,779
So you can replace that copy with another x, and then just solve for x.

94
00:06:19,439 --> 00:06:24,579
That is, what you want is to find a fixed point of the function 1 plus 1 divided by x.

95
00:06:27,160 --> 00:06:30,970
But here's the thing, there are actually two solutions for x,

96
00:06:30,970 --> 00:06:36,380
two special numbers where 1 plus 1 divided by that number gives you back the same thing.

97
00:06:36,939 --> 00:06:42,949
One is the golden ratio, phi, around 1.618, and the other is negative 0.618,

98
00:06:42,949 --> 00:06:46,540
which happens to be negative 1 divided by phi.
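Written out, the algebra behind those two solutions is short. Multiplying both sides of the fixed-point condition by x (which assumes x is not 0) turns it into a quadratic:

```latex
x = 1 + \frac{1}{x}
\quad\Longrightarrow\quad x^2 - x - 1 = 0
\quad\Longrightarrow\quad x = \frac{1 \pm \sqrt{5}}{2},
\qquad
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618,
\qquad
\frac{1 - \sqrt{5}}{2} = -\frac{1}{\varphi} \approx -0.618
```

Since the product of the two roots of x² - x - 1 is -1, the second root is exactly -1/φ.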

99
00:06:46,959 --> 00:06:49,681
I like to call this other number phi's little brother,

100
00:06:49,682 --> 00:06:52,900
since just about any property that phi has, this number also has.

101
00:06:53,560 --> 00:06:58,413
And this raises the question, would it be valid to say that the infinite

102
00:06:58,413 --> 00:07:03,600
fraction we saw is somehow also equal to phi's little brother, negative 0.618?

103
00:07:04,519 --> 00:07:08,812
Maybe you initially say, obviously not, everything on the left hand side is positive,

104
00:07:08,812 --> 00:07:11,259
so how could it possibly equal a negative number?

105
00:07:12,500 --> 00:07:17,100
Well, first we should be clear about what we actually mean by an expression like this.

106
00:07:17,779 --> 00:07:21,315
One way you could think about it, and it's not the only way,

107
00:07:21,315 --> 00:07:26,185
there's freedom for choice here, is to imagine starting with some constant, like 1,

108
00:07:26,185 --> 00:07:30,939
and then repeatedly applying the function 1 plus 1 divided by x, and then asking,

109
00:07:30,939 --> 00:07:33,259
what does this approach as you keep going?

110
00:07:36,040 --> 00:07:38,552
I mean, certainly symbolically what you get looks more and more

111
00:07:38,552 --> 00:07:41,300
like our infinite fraction, so maybe if you want it to equal a number,

112
00:07:41,300 --> 00:07:43,420
you should ask what this sequence of numbers approaches.

113
00:07:45,120 --> 00:07:48,509
And if that's your view of things, maybe you start off with a negative number,

114
00:07:48,509 --> 00:07:51,300
so it's not so crazy for the whole expression to end up negative.

115
00:07:52,740 --> 00:07:55,836
After all, if you start with negative 1 divided by phi,

116
00:07:55,836 --> 00:07:59,985
then applying this function 1 plus 1 over x, you get back the same number,

117
00:07:59,985 --> 00:08:03,802
negative 1 divided by phi, so no matter how many times you apply it,

118
00:08:03,802 --> 00:08:05,740
you're staying fixed at this value.

119
00:08:07,819 --> 00:08:10,620
But even then, there is one reason you should

120
00:08:10,620 --> 00:08:13,420
view phi as the favorite brother in this pair.

121
00:08:14,019 --> 00:08:19,330
Here, try this, pull up a calculator of some kind, then start with any random number,

122
00:08:19,331 --> 00:08:22,728
and plug it into this function, 1 plus 1 divided by x,

123
00:08:22,728 --> 00:08:28,040
and plug that number into 1 plus 1 over x, and again, and again, and again, and again.

124
00:08:28,480 --> 00:08:33,158
No matter what constant you start with, you eventually end up at 1.618.

125
00:08:33,798 --> 00:08:38,481
Even if you start with a negative number, even one that's really close to phi's

126
00:08:38,481 --> 00:08:43,399
little brother, eventually it shies away from that value and jumps back over to phi.
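The calculator experiment is easy to reproduce. This is a minimal Python sketch, assuming 100 applications is "enough" (an arbitrary choice) and using a hypothetical helper `iterate`. The seeds are arbitrary too, including a negative one and one that starts a billionth away from phi's little brother.

```python
import math

PHI = (1 + math.sqrt(5)) / 2       # golden ratio, ~1.618
LITTLE_BROTHER = -1 / PHI          # the other fixed point, ~-0.618

def f(x):
    return 1 + 1 / x

def iterate(seed, steps=100):
    """Apply f repeatedly, like hitting enter on a calculator over and over."""
    x = seed
    for _ in range(steps):
        x = f(x)
    return x

for seed in [1.0, 7.3, -5.0, 0.3, LITTLE_BROTHER + 1e-9]:
    print(f"seed {seed:+.10f} -> {iterate(seed):.10f}")
```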

127
00:08:50,820 --> 00:08:52,460
So, what's going on here?

128
00:08:52,799 --> 00:08:55,919
Why is one of these fixed points favored above the other one?

129
00:08:56,720 --> 00:09:00,158
Maybe you can already see how the transformational understanding of derivatives

130
00:09:00,158 --> 00:09:03,984
is helpful for understanding this setup, but for the sake of having a point of contrast,

131
00:09:03,984 --> 00:09:07,080
I want to show you how a problem like this is often taught using graphs.

132
00:09:07,919 --> 00:09:11,115
If you were to plug in some random input to this function,

133
00:09:11,115 --> 00:09:14,039
the y value tells you the corresponding output, right?

134
00:09:14,039 --> 00:09:17,862
So to think about plugging that output back into the function,

135
00:09:17,863 --> 00:09:22,050
you might first move horizontally until you hit the line y equals x,

136
00:09:22,049 --> 00:09:26,782
and that's going to give you a position where the x value corresponds to your

137
00:09:26,783 --> 00:09:28,240
previous y value, right?

138
00:09:28,919 --> 00:09:34,553
So then from there, you can move vertically to see what output this new x value has,

139
00:09:34,553 --> 00:09:35,879
and then you repeat.

140
00:09:36,340 --> 00:09:40,598
You move horizontally to the line y equals x to find a point whose x value is the same

141
00:09:40,597 --> 00:09:44,759
as the output you just got, and then you move vertically to apply the function again.
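For comparison, the spiderweb picture can be generated mechanically. This Python sketch (the helper name `cobweb_points` is made up for illustration) returns the corner points of the path: start on the graph above the input, move horizontally to the line y = x, then vertically back to the graph, and repeat. Passing the list to a plotting tool as a polyline would draw the web.

```python
def f(x):
    return 1 + 1 / x

def cobweb_points(func, x0, steps):
    """Corner points of the cobweb path for iterating func starting at x0."""
    x = x0
    points = [(x, func(x))]          # start on the graph, directly above/below x0
    for _ in range(steps):
        y = func(x)
        points.append((y, y))        # horizontal move to the line y = x
        x = y                        # the previous output becomes the next input
        points.append((x, func(x)))  # vertical move back to the graph
    return points

for corner in cobweb_points(f, 1.0, 4):
    print(corner)
```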

142
00:09:45,879 --> 00:09:48,285
Now personally, I think this is kind of an awkward way

143
00:09:48,285 --> 00:09:50,779
to think about repeatedly applying a function, don't you?

144
00:09:51,299 --> 00:09:53,803
I mean, it makes sense, but you kind of have to pause

145
00:09:53,803 --> 00:09:56,539
and think about it to remember which way to draw the lines.

146
00:09:57,120 --> 00:10:01,426
And you can, if you want, think through what conditions make this spiderweb

147
00:10:01,426 --> 00:10:05,280
process narrow in on a fixed point, versus propagating away from it.

148
00:10:05,860 --> 00:10:08,899
In fact, go ahead, pause right now, and try to think it through as an exercise.

149
00:10:09,240 --> 00:10:10,460
It has to do with slopes.

150
00:10:12,019 --> 00:10:15,818
Or if you want to skip the exercise for something that I think gives a much more

151
00:10:15,818 --> 00:10:19,620
satisfying understanding, think about how this function acts as a transformation.

152
00:10:22,279 --> 00:10:24,923
So I'm going to go ahead and start here by drawing a bunch of

153
00:10:24,923 --> 00:10:27,740
arrows to indicate where the various sampled input points will go.

154
00:10:28,320 --> 00:10:31,440
And side note, don't you think this gives a neat emergent pattern?

155
00:10:31,820 --> 00:10:35,020
I wasn't expecting this, but it was cool to see it pop up when animating.

156
00:10:35,019 --> 00:10:38,797
I guess the action of 1 divided by x gives this nice emergent circle,

157
00:10:38,797 --> 00:10:41,279
and then we're just shifting things over by 1.

158
00:10:42,039 --> 00:10:46,621
Anyway, I want you to think about what it means to repeatedly apply some function,

159
00:10:46,621 --> 00:10:48,719
like 1 plus 1 over x, in this context.

160
00:10:50,240 --> 00:10:53,590
Well, after letting it map all of the inputs to their outputs,

161
00:10:53,590 --> 00:10:58,504
you could consider those as the new inputs, and then just apply the same process again,

162
00:10:58,504 --> 00:11:01,519
and then again, and do it however many times you want.

163
00:11:02,580 --> 00:11:06,523
Notice, in animating this with a few dots representing the sample points,

164
00:11:06,523 --> 00:11:11,320
it doesn't take many iterations at all before all of those dots kind of clump in

165
00:11:11,320 --> 00:11:12,000
around 1.618.

166
00:11:14,620 --> 00:11:18,355
Now remember, we know that 1.618 and its little brother,

167
00:11:18,355 --> 00:11:23,860
negative 0.618 on and on, stay fixed in place during each iteration of this process.

168
00:11:24,860 --> 00:11:27,480
But zoom in on a neighborhood around phi.

169
00:11:27,480 --> 00:11:32,788
During the map, points in that region get contracted around phi,

170
00:11:32,788 --> 00:11:39,403
meaning that the function 1 plus 1 over x has a derivative with a magnitude less

171
00:11:39,403 --> 00:11:41,120
than 1 at this input.

172
00:11:41,879 --> 00:11:45,200
In fact, this derivative works out to be around negative 0.38.

173
00:11:46,120 --> 00:11:50,312
So what that means is that each repeated application scrunches the neighborhood

174
00:11:50,312 --> 00:11:54,399
around this number smaller and smaller, like a gravitational pull towards phi.

175
00:11:54,960 --> 00:11:58,620
So now tell me what you think happens in the neighborhood of phi's little brother.

176
00:12:01,320 --> 00:12:05,426
Over there, the derivative actually has a magnitude larger than 1,

177
00:12:05,426 --> 00:12:08,920
so points near the fixed point are repelled away from it.

178
00:12:09,519 --> 00:12:11,598
And when you work it out, you can see that they get

179
00:12:11,599 --> 00:12:13,800
stretched by more than a factor of 2 in each iteration.

180
00:12:14,419 --> 00:12:17,674
They also get flipped around, because the derivative is negative here,

181
00:12:17,674 --> 00:12:20,839
but the salient fact for the sake of stability is just the magnitude.
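Both behaviors can be checked numerically. In this sketch (the helper `error_ratios` is hypothetical), we start a small offset away from a fixed point and record, for each application of 1 + 1/x, the signed ratio of the new distance from the fixed point to the old one. Near phi the ratios hover around -0.38; near phi's little brother they hover around -2.62. Both are negative because the neighborhood gets flipped, but only the second has magnitude above 1.

```python
import math

PHI = (1 + math.sqrt(5)) / 2      # stable fixed point, ~1.618
LITTLE_BROTHER = -1 / PHI         # unstable fixed point, ~-0.618

def f(x):
    return 1 + 1 / x

def error_ratios(fixed_point, offset, steps):
    """Signed ratio (new distance) / (old distance) from fixed_point at each step."""
    x = fixed_point + offset
    ratios = []
    for _ in range(steps):
        x_next = f(x)
        ratios.append((x_next - fixed_point) / (x - fixed_point))
        x = x_next
    return ratios

print(error_ratios(PHI, 1e-2, 5))             # each ratio close to -0.38
print(error_ratios(LITTLE_BROTHER, 1e-8, 5))  # each ratio close to -2.62
```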

182
00:12:23,440 --> 00:12:26,970
Mathematicians would call the right value a stable fixed point,

183
00:12:26,970 --> 00:12:29,360
and the left one is an unstable fixed point.

184
00:12:30,000 --> 00:12:33,408
Something is considered stable if when you perturb it just a little bit,

185
00:12:33,408 --> 00:12:37,100
it tends to come back towards where it started, rather than going away from it.

186
00:12:38,179 --> 00:12:40,777
So what we're seeing is a very useful little fact,

187
00:12:40,778 --> 00:12:45,312
that the stability of a fixed point is determined by whether the magnitude of the function's

188
00:12:45,312 --> 00:12:47,300
derivative there is bigger or smaller than 1.
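The same classification can be read straight off the derivative. Since the derivative of 1 + 1/x is -1/x², a short sketch (the helper names `f_prime` and `stability` are illustrative) evaluates it at each fixed point and compares its magnitude with 1. A magnitude of exactly 1 is not settled by this test alone, so the sketch reports that case as inconclusive.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
LITTLE_BROTHER = -1 / PHI

def f_prime(x):
    """Derivative of f(x) = 1 + 1/x, which is -1/x**2."""
    return -1 / x**2

def stability(x_star):
    """Classify a fixed point of f by the magnitude of f'(x_star)."""
    m = abs(f_prime(x_star))
    if m < 1:
        return "stable"
    if m > 1:
        return "unstable"
    return "inconclusive"  # |f'| == 1: the first-derivative test doesn't decide

for name, x_star in [("phi", PHI), ("little brother", LITTLE_BROTHER)]:
    print(f"{name}: f'({x_star:.4f}) = {f_prime(x_star):.4f} -> {stability(x_star)}")
```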

189
00:12:47,299 --> 00:12:50,479
This explains why phi always shows up in the numerical play,

190
00:12:50,480 --> 00:12:53,922
where you're just hitting enter on your calculator over and over,

191
00:12:53,922 --> 00:12:55,800
but phi's little brother never does.

192
00:12:56,460 --> 00:12:59,620
As to whether or not you want to consider phi's little brother a

193
00:12:59,620 --> 00:13:02,879
valid value of the infinite fraction, well that's really up to you.

194
00:13:03,259 --> 00:13:06,970
Everything we just showed suggests that if you think of this expression

195
00:13:06,970 --> 00:13:10,524
as representing a limiting process, then because every possible seed

196
00:13:10,524 --> 00:13:14,442
value other than phi's little brother gives you a sequence converging to phi,

197
00:13:14,442 --> 00:13:17,740
it does feel silly to put them on equal footing with each other.

198
00:13:18,259 --> 00:13:21,773
But maybe you don't think of it as a limit, maybe the kind of math

199
00:13:21,773 --> 00:13:25,600
you're doing lends itself to treating this as a purely algebraic object,

200
00:13:25,600 --> 00:13:29,220
like the solutions of a polynomial, which simply has multiple values.

201
00:13:30,340 --> 00:13:34,485
Anyway, that's beside the point, and my point here is not that viewing derivatives

202
00:13:34,485 --> 00:13:38,779
as this change in density is somehow better than the graphical intuition on the whole.

203
00:13:39,600 --> 00:13:42,204
In fact, picturing an entire function this way can be

204
00:13:42,203 --> 00:13:44,759
kind of clunky and impractical as compared to graphs.

205
00:13:45,340 --> 00:13:48,221
My point is that it deserves more of a mention in most of the

206
00:13:48,221 --> 00:13:50,918
introductory calculus courses, because it can help make a

207
00:13:50,918 --> 00:13:53,940
student's understanding of the derivative a little more flexible.

208
00:13:54,899 --> 00:13:58,357
Like I mentioned, the real reason I'd recommend you carry this perspective

209
00:13:58,357 --> 00:14:01,816
with you as you learn new topics is not so much for what it does with your

210
00:14:01,817 --> 00:14:05,000
understanding of single variable calculus, it's for what comes after.