WEBVTT

00:00:07.740 --> 00:00:11.853
The months ahead of you hold within them a lot of hard work, some neat examples,

00:00:11.852 --> 00:00:14.949
some not-so-neat examples, beautiful connections to physics,

00:00:14.949 --> 00:00:17.387
not-so-beautiful piles of formulas to memorize,

00:00:17.388 --> 00:00:20.943
plenty of moments of getting stuck and banging your head into a wall,

00:00:20.943 --> 00:00:24.548
a few nice aha moments sprinkled in as well, and some genuinely lovely

00:00:24.547 --> 00:00:27.239
graphical intuition to help guide you through it all.

00:00:27.629 --> 00:00:31.699
But if the course ahead of you is anything like my first introduction to calculus,

00:00:31.699 --> 00:00:34.543
or any of the first courses I've seen in the years since,

00:00:34.543 --> 00:00:38.712
there's one topic you will not see, but which I believe stands to greatly accelerate

00:00:38.713 --> 00:00:39.399
your learning.

00:00:40.280 --> 00:00:44.620
You see, almost all of the visual intuitions from that first year are based on graphs.

00:00:45.079 --> 00:00:49.659
The derivative is the slope of a graph, the integral is a certain area under that graph.

00:00:50.200 --> 00:00:54.035
But as you generalize calculus beyond functions whose inputs and outputs are

00:00:54.034 --> 00:00:58.019
simply numbers, it's not always possible to graph the function you're analyzing.

00:01:00.679 --> 00:01:04.653
So if all your intuitions for the fundamental ideas, like derivatives,

00:01:04.653 --> 00:01:08.794
are rooted too rigidly in graphs, it can make for a very tall and largely

00:01:08.793 --> 00:01:13.158
unnecessary conceptual hurdle between you and the more quote-unquote advanced

00:01:13.159 --> 00:01:17.580
topics like multivariable calculus, complex analysis, and differential geometry.

00:01:18.840 --> 00:01:22.204
What I want to share with you is a way to think about derivatives,

00:01:22.204 --> 00:01:24.716
which I'll refer to as the transformational view,

00:01:24.716 --> 00:01:28.433
that generalizes more seamlessly into some of those more general contexts

00:01:28.433 --> 00:01:29.640
where calculus comes up.

00:01:29.879 --> 00:01:34.859
And then we'll use this alternate view to analyze a fun puzzle about repeated fractions.

00:01:35.459 --> 00:01:37.538
But first off, I just want to make sure we're all

00:01:37.539 --> 00:01:39.659
on the same page about what the standard visual is.

00:01:40.060 --> 00:01:44.936
If you were to graph a function that simply takes real numbers as inputs and outputs,

00:01:44.936 --> 00:01:49.593
one of the first things you learn in a calculus course is that the derivative gives

00:01:49.593 --> 00:01:54.138
you the slope of this graph, where what we mean by that is that the derivative of

00:01:54.138 --> 00:01:58.240
the function is a new function which for every input x returns that slope.

00:01:59.519 --> 00:02:01.978
Now I'd encourage you not to think of this derivative

00:02:01.978 --> 00:02:04.439
as slope idea as being the definition of a derivative.

00:02:05.000 --> 00:02:07.555
Instead, think of it as being more fundamentally about how

00:02:07.555 --> 00:02:10.419
sensitive the function is to tiny little nudges around the input.

00:02:11.020 --> 00:02:14.058
And the slope is just one way to think about that sensitivity

00:02:14.057 --> 00:02:16.900
relevant only to this particular way of viewing functions.

00:02:17.340 --> 00:02:19.816
I have not just another video, but a full series on this

00:02:19.816 --> 00:02:22.120
topic if it's something you want to learn more about.

00:02:22.599 --> 00:02:26.022
The basic idea behind the alternate visual for the derivative is to

00:02:26.022 --> 00:02:29.294
think of this function as mapping all of the input points on the

00:02:29.294 --> 00:02:32.819
number line to their corresponding outputs on a different number line.

00:02:33.400 --> 00:02:36.810
In this context, what the derivative gives you is a measure of how

00:02:36.810 --> 00:02:40.219
much the input space gets stretched or squished in various regions.

00:02:41.860 --> 00:02:46.673
That is, if you were to zoom in around a specific input and take a look at some

00:02:46.673 --> 00:02:51.485
evenly spaced points around it, the derivative of the function at that input is

00:02:51.485 --> 00:02:56.599
going to tell you how spread out or contracted those points become after the mapping.

00:02:57.939 --> 00:02:59.400
Here, a specific example helps.

00:02:59.740 --> 00:03:05.920
Take the function x squared. It maps 1 to 1, 2 to 4, 3 to 9, and so on.

00:03:06.479 --> 00:03:09.219
You can also see how it acts on all of the points in between.

00:03:12.719 --> 00:03:16.760
If you were to zoom in on a little cluster of points around the input 1,

00:03:16.760 --> 00:03:19.639
and see where they land around the relevant output,

00:03:19.639 --> 00:03:23.736
which for this function also happens to be 1, you'd notice that they tend

00:03:23.736 --> 00:03:24.900
to get stretched out.

00:03:25.759 --> 00:03:29.019
In fact, it roughly looks like stretching out by a factor of 2.

00:03:29.659 --> 00:03:35.534
The closer you zoom in, the more this local behavior looks just like multiplying by a

00:03:35.534 --> 00:03:41.819
factor of 2. This is what it means for the derivative of x squared at the input x equals 1 to be 2.

00:03:42.340 --> 00:03:45.400
It's what that fact looks like in the context of transformations.

00:03:46.460 --> 00:03:49.731
If you looked at a neighborhood of points around the input 3,

00:03:49.731 --> 00:03:52.159
they would get stretched out by a factor of 6.

00:03:52.740 --> 00:03:57.439
This is what it means for the derivative of this function at the input 3 to equal 6.

00:03:58.979 --> 00:04:03.613
Around the input 1 fourth, a small region tends to get contracted specifically by a

00:04:03.614 --> 00:04:08.360
factor of 1 half, and that's what it looks like for a derivative to be smaller than 1.
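
NOTE
A minimal sketch of the stretch factors quoted above for x squared, not taken from the video.
The helper name local_stretch and the step size h are assumptions made for illustration.
It compares the spacing of two nearby points after the mapping with their spacing before it.
```python
def local_stretch(f, x, h=1e-6):
    """Estimate how much f stretches a tiny neighborhood around x."""
    # Spacing of the two outputs divided by the spacing of the two inputs.
    return (f(x + h) - f(x - h)) / (2 * h)
def square(x):
    return x * x
for x in (1.0, 3.0, 0.25):
    print(x, round(local_stretch(square, x), 6))
```
Under these assumptions the printed factors should come out close to 2, 6, and 0.5,
matching the stretch by 2 at 1, by 6 at 3, and the contraction by 1 half at 1 fourth.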

00:04:10.719 --> 00:04:12.599
The input 0 is interesting.

00:04:13.120 --> 00:04:15.617
Zooming in by a factor of 10, it doesn't really

00:04:15.617 --> 00:04:17.959
look like a constant stretching or squishing.

00:04:18.379 --> 00:04:21.680
For one thing, all of the outputs end up on the right, positive side of things.

00:04:23.319 --> 00:04:27.694
As you zoom in closer and closer, by 100x, or by 1000x,

00:04:27.694 --> 00:04:33.396
it looks more and more like a small neighborhood of points around 0 just

00:04:33.396 --> 00:04:39.959
gets collapsed into 0 itself. This is what it looks like for the derivative to be 0.

00:04:40.500 --> 00:04:45.019
The local behavior looks more and more like multiplying the whole number line by 0.

00:04:45.680 --> 00:04:49.783
It doesn't have to completely collapse everything to a point at a particular zoom level,

00:04:49.783 --> 00:04:53.840
instead it's a matter of what the limiting behavior is as you zoom in closer and closer.
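
NOTE
A rough numerical sketch of the zooming argument at the input 0, assuming a hypothetical helper
named spread_ratio that is not part of the video. It samples evenly spaced points in a window
around 0 and measures how wide their outputs are relative to the width of the window.
```python
def square(x):
    return x * x
def spread_ratio(f, center, radius):
    """Width of the outputs relative to the width of the input window."""
    # 21 evenly spaced sample points across [center - radius, center + radius].
    samples = [center + radius * k / 10 for k in range(-10, 11)]
    outputs = [f(s) for s in samples]
    return (max(outputs) - min(outputs)) / (2 * radius)
for zoom in (10, 100, 1000):
    print(zoom, spread_ratio(square, 0.0, 1 / zoom))
```
The ratio is never exactly 0 at any one zoom level, but it keeps shrinking toward 0 as the zoom grows,
which is the limiting behavior described here.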

00:04:55.279 --> 00:04:58.959
It's also instructive to take a look at the negative inputs here.

00:05:00.699 --> 00:05:04.536
Things start to feel a little cramped since they collide with where all the positive

00:05:04.536 --> 00:05:08.057
input values go, and this is one of the downsides of thinking of functions as

00:05:08.057 --> 00:05:08.780
transformations.

00:05:09.399 --> 00:05:13.093
But for derivatives, we only really care about the local behavior anyway,

00:05:13.093 --> 00:05:15.639
what happens in a small range around a given input.

00:05:16.500 --> 00:05:20.189
Here, notice that the inputs in a little neighborhood around, say,

00:05:20.189 --> 00:05:24.100
negative 2, don't just get stretched out, they also get flipped around.

00:05:24.680 --> 00:05:28.132
Specifically, the action on such a neighborhood looks more

00:05:28.132 --> 00:05:31.819
and more like multiplying by negative 4 the closer you zoom in.

00:05:32.319 --> 00:05:35.599
This is what it looks like for the derivative of a function to be negative.
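
NOTE
A short worked expansion of the factor of negative 4, writing h for a small nudge away from the input
negative 2. The symbol h is introduced here for illustration and does not appear in the video.
```latex
\begin{align*}
f(x) &= x^2, \\
f(-2 + h) &= 4 - 4h + h^2, \\
f(-2 + h) - f(-2) &= -4h + h^2 \approx -4h \quad \text{for small } h.
\end{align*}
```
The h^2 term becomes negligible as you zoom in, so a nudge of h on the input side turns into a nudge of
roughly negative 4h on the output side: stretched by 4 and flipped.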

00:05:38.459 --> 00:05:40.951
And I think you get the point, this is all well and good,

00:05:40.951 --> 00:05:43.660
but let's see how this is actually useful in solving a problem.

00:05:44.259 --> 00:05:48.305
A friend of mine recently asked me a pretty fun question about the infinite

00:05:48.305 --> 00:05:52.137
fraction 1 plus 1 divided by 1 plus 1 divided by 1 plus 1 divided by 1,

00:05:52.137 --> 00:05:56.182
and clearly you watch math videos online, so maybe you've seen this before,

00:05:56.182 --> 00:05:59.961
but my friend's question actually cuts to something you might not have

00:05:59.961 --> 00:06:04.220
thought about before, relevant to the view of derivatives we're looking at here.

00:06:05.019 --> 00:06:09.661
The typical way you might evaluate an expression like this is to set it equal to x,

00:06:09.661 --> 00:06:13.639
and then notice that there is a copy of the full fraction inside itself.

00:06:14.699 --> 00:06:18.779
So you can replace that copy with another x, and then just solve for x.

00:06:19.439 --> 00:06:24.579
That is, what you want is to find a fixed point of the function 1 plus 1 divided by x.

00:06:27.160 --> 00:06:30.970
But here's the thing, there are actually two solutions for x,

00:06:30.970 --> 00:06:36.380
two special numbers where 1 plus 1 divided by that number gives you back the same thing.

00:06:36.939 --> 00:06:42.949
One is the golden ratio, phi, around 1.618, and the other is negative 0.618,

00:06:42.949 --> 00:06:46.540
which happens to be negative 1 divided by phi.
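
NOTE
The two solutions can be checked with a short derivation. This is the standard quadratic-formula
route, written out here as a sketch rather than quoted from the video.
```latex
\begin{align*}
x &= 1 + \frac{1}{x} \\
x^2 &= x + 1 \qquad (x \neq 0) \\
x^2 - x - 1 &= 0 \\
x &= \frac{1 \pm \sqrt{5}}{2}
\end{align*}
```
The plus sign gives phi, about 1.618, and the minus sign gives about negative 0.618.
The product of the two roots is negative 1, which is why the second root equals negative 1 divided by phi.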

00:06:46.959 --> 00:06:49.681
I like to call this other number phi's little brother,

00:06:49.682 --> 00:06:52.900
since just about any property that phi has, this number also has.

00:06:53.560 --> 00:06:58.413
And this raises the question, would it be valid to say that the infinite

00:06:58.413 --> 00:07:03.600
fraction we saw is somehow also equal to phi's little brother, negative 0.618?

00:07:04.519 --> 00:07:08.812
Maybe you initially say, obviously not, everything on the left hand side is positive,

00:07:08.812 --> 00:07:11.259
so how could it possibly equal a negative number?

00:07:12.500 --> 00:07:17.100
Well, first we should be clear about what we actually mean by an expression like this.

00:07:17.779 --> 00:07:21.315
One way you could think about it, and it's not the only way,

00:07:21.315 --> 00:07:26.185
there's freedom for choice here, is to imagine starting with some constant, like 1,

00:07:26.185 --> 00:07:30.939
and then repeatedly applying the function 1 plus 1 divided by x, and then asking,

00:07:30.939 --> 00:07:33.259
what does this approach as you keep going?

00:07:36.040 --> 00:07:38.552
I mean, certainly symbolically what you get looks more and more

00:07:38.552 --> 00:07:41.300
like our infinite fraction, so maybe if you want it to equal a number,

00:07:41.300 --> 00:07:43.420
you should ask what this sequence of numbers approaches.

00:07:45.120 --> 00:07:48.509
And if that's your view of things, maybe you start off with a negative number,

00:07:48.509 --> 00:07:51.300
so it's not so crazy for the whole expression to end up negative.

00:07:52.740 --> 00:07:55.836
After all, if you start with negative 1 divided by phi,

00:07:55.836 --> 00:07:59.985
then applying this function 1 plus 1 over x, you get back the same number,

00:07:59.985 --> 00:08:03.802
negative 1 divided by phi, so no matter how many times you apply it,

00:08:03.802 --> 00:08:05.740
you're staying fixed at this value.

00:08:07.819 --> 00:08:10.620
But even then, there is one reason you should

00:08:10.620 --> 00:08:13.420
view phi as the favorite brother in this pair.

00:08:14.019 --> 00:08:19.330
Here, try this, pull up a calculator of some kind, then start with any random number,

00:08:19.331 --> 00:08:22.728
and plug it into this function, 1 plus 1 divided by x,

00:08:22.728 --> 00:08:28.040
and plug that number into 1 plus 1 over x, and again, and again, and again, and again.

00:08:28.480 --> 00:08:33.158
No matter what constant you start with, you eventually end up at 1.618.

00:08:33.798 --> 00:08:38.481
Even if you start with a negative number, even one that's really close to phi's

00:08:38.481 --> 00:08:43.399
little brother, eventually it shies away from that value and jumps back over to phi.
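
NOTE
A small sketch of the calculator experiment, assuming the hypothetical helper names step and iterate
and an arbitrary choice of 60 repetitions. The seeds are examples chosen here, not values from the video.
Seeds such as 0 or negative 1 run into a division by zero and are not handled in this sketch.
```python
def step(x):
    """One application of the function 1 + 1/x."""
    return 1 + 1 / x
def iterate(seed, count=60):
    """Apply step repeatedly, starting from seed, and return the final value."""
    x = seed
    for _ in range(count):
        x = step(x)
    return x
PHI = (1 + 5 ** 0.5) / 2
# The last seed sits very close to phi's little brother, yet still drifts away.
for seed in (1.0, 7.3, -5.0, -0.618):
    print(seed, iterate(seed))
```
Each of these seeds should print a value that agrees with phi to many decimal places.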

00:08:50.820 --> 00:08:52.460
So, what's going on here?

00:08:52.799 --> 00:08:55.919
Why is one of these fixed points favored above the other one?

00:08:56.720 --> 00:09:00.158
Maybe you can already see how the transformational understanding of derivatives

00:09:00.158 --> 00:09:03.984
is helpful for understanding this setup, but for the sake of having a point of contrast,

00:09:03.984 --> 00:09:07.080
I want to show you how a problem like this is often taught using graphs.

00:09:07.919 --> 00:09:11.115
If you were to plug in some random input to this function,

00:09:11.115 --> 00:09:14.039
the y value tells you the corresponding output, right?

00:09:14.039 --> 00:09:17.862
So to think about plugging that output back into the function,

00:09:17.863 --> 00:09:22.050
you might first move horizontally until you hit the line y equals x,

00:09:22.049 --> 00:09:26.782
and that's going to give you a position where the x value corresponds to your

00:09:26.783 --> 00:09:28.240
previous y value, right?

00:09:28.919 --> 00:09:34.553
So then from there, you can move vertically to see what output this new x value has,

00:09:34.553 --> 00:09:35.879
and then you repeat.

00:09:36.340 --> 00:09:40.598
You move horizontally to the line y equals x to find a point whose x value is the same

00:09:40.597 --> 00:09:44.759
as the output you just got, and then you move vertically to apply the function again.
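
NOTE
The graph-based procedure above can be sketched as a list of corner points. The function name
cobweb_path and the number of rounds are assumptions for illustration. The sketch only computes
the points and leaves the drawing to whatever plotting tool you prefer.
```python
def step(x):
    """One application of the function 1 + 1/x."""
    return 1 + 1 / x
def cobweb_path(f, seed, rounds=5):
    """Return the corner points (x, y) traced by the graph-based procedure."""
    x = seed
    points = [(x, f(x))]  # start on the graph, above the seed
    for _ in range(rounds):
        y = f(x)
        points.append((y, y))     # move horizontally to the line y = x
        points.append((y, f(y)))  # move vertically back to the graph
        x = y
    return points
for point in cobweb_path(step, 1.0):
    print(point)
```
Connecting consecutive points with line segments gives the spiderweb picture.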

00:09:45.879 --> 00:09:48.285
Now personally, I think this is kind of an awkward way

00:09:48.285 --> 00:09:50.779
to think about repeatedly applying a function, don't you?

00:09:51.299 --> 00:09:53.803
I mean, it makes sense, but you kind of have to pause

00:09:53.803 --> 00:09:56.539
and think about it to remember which way to draw the lines.

00:09:57.120 --> 00:10:01.426
And you can, if you want, think through what conditions make this spiderweb

00:10:01.426 --> 00:10:05.280
process narrow in on a fixed point, versus propagating away from it.

00:10:05.860 --> 00:10:08.899
In fact, go ahead, pause right now, and try to think it through as an exercise.

00:10:09.240 --> 00:10:10.460
It has to do with slopes.

00:10:12.019 --> 00:10:15.818
Or if you want to skip the exercise for something that I think gives a much more

00:10:15.818 --> 00:10:19.620
satisfying understanding, think about how this function acts as a transformation.

00:10:22.279 --> 00:10:24.923
So I'm going to go ahead and start here by drawing a bunch of

00:10:24.923 --> 00:10:27.740
arrows to indicate where the various sampled input points will go.

00:10:28.320 --> 00:10:31.440
And side note, don't you think this gives a neat emergent pattern?

00:10:31.820 --> 00:10:35.020
I wasn't expecting this, but it was cool to see it pop up when animating.

00:10:35.019 --> 00:10:38.797
I guess the action of 1 divided by x gives this nice emergent circle,

00:10:38.797 --> 00:10:41.279
and then we're just shifting things over by 1.

00:10:42.039 --> 00:10:46.621
Anyway, I want you to think about what it means to repeatedly apply some function,

00:10:46.621 --> 00:10:48.719
like 1 plus 1 over x, in this context.

00:10:50.240 --> 00:10:53.590
Well, after letting it map all of the inputs to the outputs,

00:10:53.590 --> 00:10:58.504
you could consider those as the new inputs, and then just apply the same process again,

00:10:58.504 --> 00:11:01.519
and then again, and do it however many times you want.

00:11:02.580 --> 00:11:06.523
Notice, in animating this with a few dots representing the sample points,

00:11:06.523 --> 00:11:12.000
it doesn't take many iterations at all before all of those dots kind of clump in around 1.618.

00:11:14.620 --> 00:11:18.355
Now remember, we know that 1.618 and its little brother,

00:11:18.355 --> 00:11:23.860
negative 0.618 on and on, stay fixed in place during each iteration of this process.

00:11:24.860 --> 00:11:27.480
But zoom in on a neighborhood around phi.

00:11:27.480 --> 00:11:32.788
During the map, points in that region get contracted around phi,

00:11:32.788 --> 00:11:39.403
meaning that the function 1 plus 1 over x has a derivative with a magnitude less

00:11:39.403 --> 00:11:41.120
than 1 at this input.

00:11:41.879 --> 00:11:45.200
In fact, this derivative works out to be around negative 0.38.

00:11:46.120 --> 00:11:50.312
So what that means is that each repeated application scrunches the neighborhood

00:11:50.312 --> 00:11:54.399
around this number smaller and smaller, like a gravitational pull towards phi.

00:11:54.960 --> 00:11:58.620
So now tell me what you think happens in the neighborhood of phi's little brother.

00:12:01.320 --> 00:12:05.426
Over there, the derivative actually has a magnitude larger than 1,

00:12:05.426 --> 00:12:08.920
so points near the fixed point are repelled away from it.

00:12:09.519 --> 00:12:11.598
And when you work it out, you can see that they get

00:12:11.599 --> 00:12:13.800
stretched by more than a factor of 2 in each iteration.

00:12:14.419 --> 00:12:17.674
They also get flipped around, because the derivative is negative here,

00:12:17.674 --> 00:12:20.839
but the salient fact for the sake of stability is just the magnitude.
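
NOTE
A worked check of the two derivative values mentioned above, using the identity that phi squared
equals phi plus 1. The notation f and the symbol \varphi for phi are choices made here.
```latex
\begin{align*}
f(x) &= 1 + \frac{1}{x}, \qquad f'(x) = -\frac{1}{x^2}, \\
f'(\varphi) &= -\frac{1}{\varphi^2} \approx -0.382, \\
f'\!\left(-\tfrac{1}{\varphi}\right) &= -\varphi^2 \approx -2.618.
\end{align*}
```
The first magnitude is below 1, so neighborhoods contract. The second is above 2, so they stretch
by more than a factor of 2, and both signs are negative, so both neighborhoods get flipped.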

00:12:23.440 --> 00:12:26.970
Mathematicians would call this right value a stable fixed point,

00:12:26.970 --> 00:12:29.360
and the left one is an unstable fixed point.

00:12:30.000 --> 00:12:33.408
Something is considered stable if when you perturb it just a little bit,

00:12:33.408 --> 00:12:37.100
it tends to come back towards where it started, rather than going away from it.

00:12:38.179 --> 00:12:40.777
So what we're seeing is a very useful little fact,

00:12:40.778 --> 00:12:45.312
that the stability of a fixed point is determined by whether the magnitude of its

00:12:45.312 --> 00:12:47.300
derivative is bigger or smaller than 1.
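
NOTE
One way to turn this stability test into code, as a sketch under stated assumptions: the helper
names derivative and classify are hypothetical, and the derivative is only estimated numerically
with a central difference.
```python
def step(x):
    """The function 1 + 1/x from the puzzle."""
    return 1 + 1 / x
def derivative(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)
def classify(f, fixed_point):
    """Label a fixed point by the magnitude of the derivative there."""
    magnitude = abs(derivative(f, fixed_point))
    if magnitude < 1:
        return "stable"
    if magnitude > 1:
        return "unstable"
    return "inconclusive"  # magnitude exactly 1: this test alone does not decide
PHI = (1 + 5 ** 0.5) / 2
LITTLE_BROTHER = (1 - 5 ** 0.5) / 2
print(classify(step, PHI), classify(step, LITTLE_BROTHER))
```
With these definitions phi should be labeled stable and its little brother unstable.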

00:12:47.299 --> 00:12:50.479
This explains why phi always shows up in the numerical play,

00:12:50.480 --> 00:12:53.922
where you're just hitting enter on your calculator over and over,

00:12:53.922 --> 00:12:55.800
but phi's little brother never does.

00:12:56.460 --> 00:12:59.620
As to whether or not you want to consider phi's little brother a

00:12:59.620 --> 00:13:02.879
valid value of the infinite fraction, well, that's really up to you.

00:13:03.259 --> 00:13:06.970
Everything we just showed suggests that if you think of this expression

00:13:06.970 --> 00:13:10.524
as representing a limiting process, then because every possible seed

00:13:10.524 --> 00:13:14.442
value other than phi's little brother gives you a sequence converging to phi,

00:13:14.442 --> 00:13:17.740
it does feel silly to put them on equal footing with each other.

00:13:18.259 --> 00:13:21.773
But maybe you don't think of it as a limit, maybe the kind of math

00:13:21.773 --> 00:13:25.600
you're doing lends itself to treating this as a purely algebraic object,

00:13:25.600 --> 00:13:29.220
like the solutions of a polynomial, which simply has multiple values.

00:13:30.340 --> 00:13:34.485
Anyway, that's beside the point, and my point here is not that viewing derivatives

00:13:34.485 --> 00:13:38.779
as this change in density is somehow better than the graphical intuition on the whole.

00:13:39.600 --> 00:13:42.204
In fact, picturing an entire function this way can be

00:13:42.203 --> 00:13:44.759
kind of clunky and impractical as compared to graphs.

00:13:45.340 --> 00:13:48.221
My point is that it deserves more of a mention in most of the

00:13:48.221 --> 00:13:50.918
introductory calculus courses, because it can help make a

00:13:50.918 --> 00:13:53.940
student's understanding of the derivative a little more flexible.

00:13:54.899 --> 00:13:58.357
Like I mentioned, the real reason I'd recommend you carry this perspective

00:13:58.357 --> 00:14:01.816
with you as you learn new topics is not so much for what it does with your

00:14:01.817 --> 00:14:05.000
understanding of single variable calculus, it's for what comes after.