WEBVTT

00:00:01.120 --> 00:00:07.169
Please welcome former director of AI

00:00:04.000 --> 00:00:11.440
at Tesla, Andrej Karpathy.

00:00:07.169 --> 00:00:14.439
[Music]

00:00:11.439 --> 00:00:14.439
Hello.

00:00:14.769 --> 00:00:17.850
[Music]

00:00:19.039 --> 00:00:24.800
Wow, a lot of people here. Hello.

00:00:22.800 --> 00:00:27.199
Um, okay. Yeah. So I'm excited to be

00:00:24.800 --> 00:00:30.560
here today to talk to you about software

00:00:27.199 --> 00:00:32.559
in the era of AI. And I'm told that many

00:00:30.559 --> 00:00:34.399
of you are students: bachelor's,

00:00:32.558 --> 00:00:36.399
master's, PhD, and so on. And you're about

00:00:34.399 --> 00:00:37.759
to enter the industry. And I think it's

00:00:36.399 --> 00:00:38.960
actually like an extremely unique and

00:00:37.759 --> 00:00:41.359
very interesting time to enter the

00:00:38.960 --> 00:00:43.039
industry right now. And I think

00:00:41.359 --> 00:00:47.600
fundamentally the reason for that is

00:00:43.039 --> 00:00:49.920
that software is changing, again.

00:00:47.600 --> 00:00:52.558
And I say again because I actually gave

00:00:49.920 --> 00:00:54.079
this talk already. Um but the problem is

00:00:52.558 --> 00:00:55.198
that software keeps changing. So I

00:00:54.079 --> 00:00:56.719
actually have a lot of material to

00:00:55.198 --> 00:00:58.159
create new talks and I think it's

00:00:56.719 --> 00:01:00.320
changing quite fundamentally. I think

00:00:58.159 --> 00:01:02.000
roughly speaking software has not

00:01:00.320 --> 00:01:04.558
changed much on such a fundamental level

00:01:02.000 --> 00:01:06.879
for 70 years. And then it's changed I

00:01:04.558 --> 00:01:08.560
think about twice quite rapidly in the

00:01:06.879 --> 00:01:09.839
last few years. And so there's just a

00:01:08.560 --> 00:01:12.320
huge amount of work to do, a huge amount

00:01:09.840 --> 00:01:14.159
of software to write and rewrite. So

00:01:12.319 --> 00:01:16.079
let's take a look at maybe the realm of

00:01:14.159 --> 00:01:17.759
software. So if we kind of think of this

00:01:16.079 --> 00:01:20.000
as like the map of software, this is a

00:01:17.759 --> 00:01:21.920
really cool tool called Map of GitHub.

00:01:20.000 --> 00:01:23.359
This is kind of like all the software

00:01:21.920 --> 00:01:24.640
that's written. These are

00:01:23.359 --> 00:01:26.400
instructions to the computer for

00:01:24.640 --> 00:01:28.000
carrying out tasks in the digital space.

00:01:26.400 --> 00:01:30.080
So if you zoom in here, these are all

00:01:28.000 --> 00:01:31.680
different kinds of repositories and this

00:01:30.079 --> 00:01:33.599
is all the code that has been written.

00:01:31.680 --> 00:01:35.840
And a few years ago I kind of observed

00:01:33.599 --> 00:01:37.759
that software was kind of changing

00:01:35.840 --> 00:01:39.680
and there was kind of like a new type of

00:01:37.759 --> 00:01:42.319
software around and I called this

00:01:39.680 --> 00:01:44.640
software 2.0 at the time and the idea

00:01:42.319 --> 00:01:46.798
here was that software 1.0 is the code

00:01:44.640 --> 00:01:48.799
you write for the computer. Software 2.0

00:01:46.799 --> 00:01:50.320
is basically neural networks, and

00:01:48.799 --> 00:01:53.280
in particular the weights of a neural

00:01:50.319 --> 00:01:55.438
network and you're not writing this code

00:01:53.280 --> 00:01:56.879
directly; you are more kind

00:01:55.438 --> 00:01:58.398
of like tuning the datasets, and then

00:01:56.879 --> 00:02:00.879
you're running an optimizer to create

00:01:58.399 --> 00:02:02.560
the parameters of this neural net.

00:02:00.879 --> 00:02:03.599
And I think at the time, neural nets

00:02:02.560 --> 00:02:04.799
were kind of seen as like just a

00:02:03.599 --> 00:02:06.239
different kind of classifier like a

00:02:04.799 --> 00:02:09.039
decision tree or something like that, and

00:02:06.239 --> 00:02:10.239
so I think

00:02:09.038 --> 00:02:12.238
this framing was a lot more

00:02:10.239 --> 00:02:13.520
appropriate. And now actually what we

00:02:12.239 --> 00:02:15.759
have is kind of like an equivalent of

00:02:13.520 --> 00:02:18.080
GitHub in the realm of Software 2.0. And

00:02:15.759 --> 00:02:20.719
I think Hugging Face is basically the

00:02:18.080 --> 00:02:22.400
equivalent of GitHub in Software 2.0.

00:02:20.719 --> 00:02:24.239
And there's also Model Atlas, and you can

00:02:22.400 --> 00:02:25.439
visualize all the code written there. In

00:02:24.239 --> 00:02:28.319
case you're curious, by the way, the

00:02:25.439 --> 00:02:30.878
giant circle, the point in the middle,

00:02:28.318 --> 00:02:32.878
these are the parameters of Flux, the

00:02:30.878 --> 00:02:34.959
image generator. And so anytime someone

00:02:32.878 --> 00:02:37.120
fine-tunes on top of a Flux model, you

00:02:34.959 --> 00:02:39.120
basically create a git commit in this

00:02:37.120 --> 00:02:41.599
space, and you create a different kind

00:02:39.120 --> 00:02:43.599
of an image generator. So basically what

00:02:41.598 --> 00:02:45.919
we have is software 1.0 is the computer

00:02:43.598 --> 00:02:48.719
code that programs a computer. Software

00:02:45.919 --> 00:02:50.719
2.0 are the weights which program neural

00:02:48.719 --> 00:02:53.519
networks. And here's an example: the

00:02:50.719 --> 00:02:55.039
AlexNet image recognizer neural network.

00:02:53.519 --> 00:02:56.400
Now so far all of the neural networks

00:02:55.039 --> 00:02:58.159
that we've been familiar with until

00:02:56.400 --> 00:03:01.680
recently were kind of like

00:02:58.159 --> 00:03:03.439
fixed-function computers: image to categories

00:03:01.680 --> 00:03:05.200
or something like that. And I think

00:03:03.439 --> 00:03:06.719
what's changed and I think is a quite

00:03:05.199 --> 00:03:09.598
fundamental change is that neural

00:03:06.719 --> 00:03:12.158
networks became programmable with large

00:03:09.598 --> 00:03:14.959
language models. And so I see this as

00:03:12.158 --> 00:03:18.000
quite new, unique. It's a new kind of a

00:03:14.959 --> 00:03:19.598
computer, and so in my mind it's

00:03:18.000 --> 00:03:22.158
worth giving it a new designation of

00:03:19.598 --> 00:03:25.679
Software 3.0. And basically your prompts

00:03:22.158 --> 00:03:28.318
are now programs that program the LLM.

00:03:25.680 --> 00:03:30.400
And remarkably, these prompts

00:03:28.318 --> 00:03:33.598
are written in English. So it's kind of

00:03:30.400 --> 00:03:36.799
a very interesting programming language.

00:03:33.598 --> 00:03:37.919
So maybe to summarize the

00:03:36.799 --> 00:03:39.439
difference: if you're doing sentiment

00:03:37.919 --> 00:03:42.479
classification, for example, you can

00:03:39.439 --> 00:03:44.239
imagine writing some amount of Python

00:03:42.479 --> 00:03:46.000
to basically do sentiment

00:03:44.239 --> 00:03:47.840
classification or you can train a neural

00:03:46.000 --> 00:03:50.000
net or you can prompt a large language

00:03:47.840 --> 00:03:51.280
model. So here, this is a few-shot

00:03:50.000 --> 00:03:52.799
prompt and you can imagine changing it

00:03:51.280 --> 00:03:54.640
and programming the computer in a

00:03:52.799 --> 00:03:57.599
slightly different way. So basically we
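
NOTE
Editor's sketch, not from the talk: one rough way to picture the three paradigms on sentiment classification. The word lists, toy dataset, perceptron, and prompt wording are all illustrative assumptions, and no real model or API is called.
```python
# Software 1.0: explicit, hand-written rules (the word lists are made up).
POSITIVE = {"great", "love", "amazing"}
NEGATIVE = {"bad", "hate", "awful"}
def classify_rules(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"
# Software 2.0: the "program" is a set of weights found by an optimizer
# (a toy perceptron here) from a labeled dataset, not written by hand.
TRAIN = [("i love this", 1), ("this is amazing", 1), ("i hate this", -1), ("this is awful", -1)]
def train_weights(data, epochs: int = 5) -> dict:
    weights: dict = {}
    for _ in range(epochs):
        for text, label in data:
            words = text.split()
            score = sum(weights.get(w, 0.0) for w in words)
            if score * label <= 0:  # misclassified: nudge weights toward the label
                for w in words:
                    weights[w] = weights.get(w, 0.0) + label
    return weights
def classify_learned(text: str, weights: dict) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"
# Software 3.0: the "program" is an English few-shot prompt for an LLM.
# Only the prompt text is built; sending it to a model is left out on purpose.
FEW_SHOT = [("I love this movie", "positive"), ("The food was awful", "negative")]
def build_few_shot_prompt(text: str) -> str:
    lines = ["Classify the sentiment of each review as positive or negative."]
    for example, label in FEW_SHOT:
        lines.append(f"Review: {example}\nSentiment: {label}")
    lines.append(f"Review: {text}\nSentiment:")
    return "\n".join(lines)
```
Changing the few-shot examples or the instruction line is the Software 3.0 analogue of editing code: the behavior is reprogrammed in English.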

00:03:54.639 --> 00:03:59.679
have Software 1.0, Software 2.0, and I

00:03:57.598 --> 00:04:01.919
think we're seeing maybe you've seen a

00:03:59.680 --> 00:04:03.519
lot of GitHub code is not just like code

00:04:01.919 --> 00:04:05.438
anymore. There's a bunch of English

00:04:03.519 --> 00:04:07.360
interspersed with code and so I think

00:04:05.438 --> 00:04:09.199
kind of there's a growing category of

00:04:07.360 --> 00:04:10.879
new kind of code. So not only is it a

00:04:09.199 --> 00:04:12.719
new programming paradigm, it's also

00:04:10.878 --> 00:04:14.878
remarkable to me that it's in our native

00:04:12.719 --> 00:04:17.918
language of English. And so when this

00:04:14.878 --> 00:04:20.879
blew my mind a few, I guess, years ago

00:04:17.918 --> 00:04:21.918
now, I tweeted this, and I think it

00:04:20.879 --> 00:04:23.199
captured the attention of a lot of

00:04:21.918 --> 00:04:25.359
people. And this is my currently pinned

00:04:23.199 --> 00:04:28.160
tweet: that remarkably we're now

00:04:25.360 --> 00:04:31.600
programming computers in English. Now,

00:04:28.160 --> 00:04:34.960
when I was at Tesla, we were

00:04:31.600 --> 00:04:37.439
working on the Autopilot, and we

00:04:34.959 --> 00:04:39.918
were trying to get the car to drive, and

00:04:37.439 --> 00:04:41.680
I sort of showed this slide at the time

00:04:39.918 --> 00:04:43.198
where you can imagine that the inputs to

00:04:41.680 --> 00:04:44.639
the car are on the bottom and they're

00:04:43.199 --> 00:04:47.040
going through a software stack to

00:04:44.639 --> 00:04:48.560
produce the steering and acceleration

00:04:47.040 --> 00:04:51.120
and I made the observation at the time

00:04:48.560 --> 00:04:52.720
that there was a ton of C++ code around

00:04:51.120 --> 00:04:54.478
in the autopilot which was the software

00:04:52.720 --> 00:04:56.960
1.0 code, and then there were some neural

00:04:54.478 --> 00:04:58.800
nets in there doing image recognition.

00:04:56.959 --> 00:05:00.879
And I kind of observed that over time,

00:04:58.800 --> 00:05:02.720
as we made the autopilot better

00:05:00.879 --> 00:05:05.839
basically the neural network grew in

00:05:02.720 --> 00:05:08.560
capability and size and in addition to

00:05:05.839 --> 00:05:12.079
that all the C++ code was being deleted

00:05:08.560 --> 00:05:14.560
and a lot of the

00:05:12.079 --> 00:05:16.478
kind of capabilities and functionality

00:05:14.560 --> 00:05:19.038
that was originally written in 1.0 was

00:05:16.478 --> 00:05:20.719
migrated to 2.0. So as an example, a lot

00:05:19.038 --> 00:05:22.639
of the stitching up of information

00:05:20.720 --> 00:05:24.960
across images from the different cameras

00:05:22.639 --> 00:05:26.478
and across time was done by a neural

00:05:24.959 --> 00:05:29.839
network and we were able to delete a lot

00:05:26.478 --> 00:05:32.560
of code and so the software 2.0 stack

00:05:29.839 --> 00:05:34.159
quite literally ate through the software

00:05:32.560 --> 00:05:35.680
stack of the autopilot. So I thought

00:05:34.160 --> 00:05:37.039
this was really remarkable at the time

00:05:35.680 --> 00:05:39.360
and I think we're seeing the same thing

00:05:37.038 --> 00:05:40.800
again where uh basically we have a new

00:05:39.360 --> 00:05:42.479
kind of software and it's eating through

00:05:40.800 --> 00:05:44.400
the stack. We have three completely

00:05:42.478 --> 00:05:45.599
different programming paradigms and I

00:05:44.399 --> 00:05:47.359
think if you're entering the industry

00:05:45.600 --> 00:05:49.360
it's a very good idea to be fluent in

00:05:47.360 --> 00:05:50.800
all of them because they all have slight

00:05:49.360 --> 00:05:53.120
pros and cons and you may want to

00:05:50.800 --> 00:05:54.400
program some functionality in 1.0 or 2.0

00:05:53.120 --> 00:05:55.600
or 3.0. Are you going to train a

00:05:54.399 --> 00:05:57.439
neural net? Are you going to just prompt

00:05:55.600 --> 00:05:59.360
an LLM? Should this be a piece of code

00:05:57.439 --> 00:06:00.560
that's explicit? Et cetera. So we all have to

00:05:59.360 --> 00:06:03.520
make these decisions and actually

00:06:00.560 --> 00:06:06.800
potentially fluidly transition

00:06:03.519 --> 00:06:09.758
between these paradigms. So what I

00:06:06.800 --> 00:06:11.759
wanted to get into now is first I want

00:06:09.759 --> 00:06:13.520
to in the first part talk about LLMs and

00:06:11.759 --> 00:06:15.120
how to kind of like think of this new

00:06:13.519 --> 00:06:17.439
paradigm and the ecosystem and what that

00:06:15.120 --> 00:06:18.720
looks like. Like, what is

00:06:17.439 --> 00:06:20.240
this new computer? What does it look

00:06:18.720 --> 00:06:23.759
like and what does the ecosystem look

00:06:20.240 --> 00:06:25.759
like? I was struck by this quote from

00:06:23.759 --> 00:06:27.520
Andrew Ng, actually many years ago now,

00:06:25.759 --> 00:06:29.439
I think, and I think Andrew is going to

00:06:27.519 --> 00:06:30.639
be speaking right after me. But he

00:06:29.439 --> 00:06:33.360
said at the time, "AI is the new

00:06:30.639 --> 00:06:34.639
electricity," and I do think that it

00:06:33.360 --> 00:06:36.720
kind of captures something very

00:06:34.639 --> 00:06:38.960
interesting in that LLMs certainly feel

00:06:36.720 --> 00:06:41.600
like they have properties of utilities

00:06:38.959 --> 00:06:44.239
right now. So

00:06:41.600 --> 00:06:47.120
LLM labs like OpenAI, Gemini,

00:06:44.240 --> 00:06:48.879
Anthropic, etc., spend capex to train

00:06:47.120 --> 00:06:51.120
the LLMs and this is kind of equivalent

00:06:48.879 --> 00:06:53.038
to building out a grid and then there's

00:06:51.120 --> 00:06:56.399
opex to serve that intelligence over

00:06:53.038 --> 00:06:58.639
APIs to all of us and this is done

00:06:56.399 --> 00:07:00.399
through metered access where we pay per

00:06:58.639 --> 00:07:01.918
million tokens or something like that

00:07:00.399 --> 00:07:03.918
and we have a lot of demands that are

00:07:01.918 --> 00:07:06.240
very utility-like demands out of this

00:07:03.918 --> 00:07:08.959
API: we demand low latency, high uptime,

00:07:06.240 --> 00:07:10.800
consistent quality, etc. In electricity,

00:07:08.959 --> 00:07:12.399
you would have a transfer switch. So you

00:07:10.800 --> 00:07:14.400
can transfer your electricity source

00:07:12.399 --> 00:07:16.799
from, like, grid to solar or battery or

00:07:14.399 --> 00:07:18.560
generator. In LLMs, we have maybe OpenRouter

00:07:16.800 --> 00:07:20.639
and can easily switch between the

00:07:18.560 --> 00:07:23.038
different types of LLMs that exist.

00:07:20.639 --> 00:07:25.038
Because the LLMs are software, they don't

00:07:23.038 --> 00:07:26.719
compete for physical space. So it's okay

00:07:25.038 --> 00:07:28.159
to have basically like six electricity

00:07:26.720 --> 00:07:29.840
providers and you can switch between

00:07:28.160 --> 00:07:31.919
them, right? Because they don't compete

00:07:29.839 --> 00:07:33.679
in such a direct way. And I think what's

00:07:31.918 --> 00:07:36.478
also a little fascinating and we saw

00:07:33.680 --> 00:07:38.800
this in the last few days actually a lot

00:07:36.478 --> 00:07:41.120
of the LLMs went down and people were

00:07:38.800 --> 00:07:42.478
kind of like stuck and unable to work.

00:07:41.120 --> 00:07:43.759
And I think it's kind of fascinating

00:07:42.478 --> 00:07:45.758
to me that when the state-of-the-art

00:07:43.759 --> 00:07:47.759
LLMs go down, it's actually kind of like

00:07:45.759 --> 00:07:49.360
an intelligence brownout in the world.

00:07:47.759 --> 00:07:52.080
It's kind of like when the voltage is

00:07:49.360 --> 00:07:55.120
unreliable in the grid, and the planet

00:07:52.079 --> 00:07:56.719
just gets dumber the more reliance we

00:07:55.120 --> 00:07:58.399
have on these models, which already is

00:07:56.720 --> 00:08:00.800
like really dramatic and I think will

00:07:58.399 --> 00:08:02.239
continue to grow. But LLMs don't only

00:08:00.800 --> 00:08:03.520
have properties of utilities. I think

00:08:02.240 --> 00:08:06.478
it's also fair to say that they have

00:08:03.519 --> 00:08:09.519
some properties of fabs. And the reason

00:08:06.478 --> 00:08:12.240
for this is that the capex required for

00:08:09.519 --> 00:08:14.318
building LLMs is actually quite large.

00:08:12.240 --> 00:08:15.918
It's not just like building some

00:08:14.319 --> 00:08:17.598
power station or something like that,

00:08:15.918 --> 00:08:20.000
right? You're investing a huge amount of

00:08:17.598 --> 00:08:22.478
money, and I think the tech tree

00:08:20.000 --> 00:08:24.399
for the technology is growing quite

00:08:22.478 --> 00:08:26.959
rapidly. So we're in a world where we

00:08:24.399 --> 00:08:28.959
have sort of deep tech trees, research

00:08:26.959 --> 00:08:32.399
and development secrets that are

00:08:28.959 --> 00:08:34.240
centralizing inside the LLM labs.

00:08:32.399 --> 00:08:36.240
But I think the analogy muddies a little

00:08:34.240 --> 00:08:38.158
bit also because as I mentioned this is

00:08:36.240 --> 00:08:40.959
software and software is a bit less

00:08:38.158 --> 00:08:43.038
defensible because it is so malleable.

00:08:40.958 --> 00:08:44.319
And so I think it's just an

00:08:43.038 --> 00:08:46.639
interesting kind of thing to think about

00:08:44.320 --> 00:08:48.160
potentially. There are many

00:08:46.639 --> 00:08:49.600
analogies you can make, like a 4

00:08:48.159 --> 00:08:51.039
nanometer process node maybe is

00:08:49.600 --> 00:08:53.040
something like a cluster with certain

00:08:51.039 --> 00:08:54.799
max FLOPS. You can think about when

00:08:53.039 --> 00:08:56.079
you're using Nvidia GPUs

00:08:54.799 --> 00:08:57.120
and you're only doing the software and

00:08:56.080 --> 00:08:59.120
you're not doing the hardware. That's

00:08:57.120 --> 00:09:00.320
kind of like the fabless model. But if

00:08:59.120 --> 00:09:02.000
you're actually also building your own

00:09:00.320 --> 00:09:03.278
hardware and you're training on TPUs if

00:09:02.000 --> 00:09:05.200
you're Google, that's kind of like the

00:09:03.278 --> 00:09:06.399
Intel model where you own your fab. So I

00:09:05.200 --> 00:09:08.240
think there's some analogies here that

00:09:06.399 --> 00:09:09.759
make sense. But actually I think the

00:09:08.240 --> 00:09:12.480
analogy that makes the most sense

00:09:09.759 --> 00:09:15.278
perhaps is that in my mind LLMs have very

00:09:12.480 --> 00:09:17.759
strong kind of analogies to operating

00:09:15.278 --> 00:09:19.519
systems, in that this is not just

00:09:17.759 --> 00:09:20.958
electricity or water. It's not something

00:09:19.519 --> 00:09:22.959
that comes out of the tap as a

00:09:20.958 --> 00:09:25.919
commodity. These are now

00:09:22.958 --> 00:09:28.719
increasingly complex software ecosystems

00:09:25.919 --> 00:09:30.879
right? So they're not just like simple

00:09:28.720 --> 00:09:32.000
commodities like electricity and it's

00:09:30.879 --> 00:09:33.919
kind of interesting to me that the

00:09:32.000 --> 00:09:36.159
ecosystem is shaping in a very similar

00:09:33.919 --> 00:09:38.559
kind of way where you have a few closed

00:09:36.159 --> 00:09:39.838
source providers like Windows or macOS,

00:09:38.559 --> 00:09:42.719
and then you have an open source

00:09:39.839 --> 00:09:45.519
alternative like Linux. And I think for

00:09:42.720 --> 00:09:47.519
LLMs as well, we have a kind

00:09:45.519 --> 00:09:49.200
of a few competing closed source

00:09:47.519 --> 00:09:51.440
providers, and then maybe the Llama

00:09:49.200 --> 00:09:53.120
ecosystem is currently like maybe a

00:09:51.440 --> 00:09:55.120
close approximation to something that

00:09:53.120 --> 00:09:56.480
may grow into something like Linux.

00:09:55.120 --> 00:09:58.159
Again, I think it's still very early

00:09:56.480 --> 00:09:59.600
because these are just simple LLMs, but

00:09:58.159 --> 00:10:01.120
we're starting to see that these are

00:09:59.600 --> 00:10:02.800
going to get a lot more complicated.

00:10:01.120 --> 00:10:03.919
It's not just about the LLM itself. It's

00:10:02.799 --> 00:10:05.519
about all the tool use and the

00:10:03.919 --> 00:10:07.278
multimodalities and how all of that

00:10:05.519 --> 00:10:09.360
works. And so when I sort of had this

00:10:07.278 --> 00:10:11.200
realization a while back, I tried to

00:10:09.360 --> 00:10:12.800
sketch it out and it kind of seemed to

00:10:11.200 --> 00:10:15.839
me like LLMs are kind of like a new

00:10:12.799 --> 00:10:17.599
operating system, right? So the LLM is a

00:10:15.839 --> 00:10:19.760
new kind of a computer. It's

00:10:17.600 --> 00:10:21.519
kind of like the CPU equivalent.

00:10:19.759 --> 00:10:24.399
The context windows are kind of like the

00:10:21.519 --> 00:10:26.639
memory and then the LLM is orchestrating

00:10:24.399 --> 00:10:29.839
memory and compute for problem

00:10:26.639 --> 00:10:29.839
solving, using all of these

00:10:29.839 --> 00:10:32.639
capabilities here. And so definitely, if

00:10:32.639 --> 00:10:34.320
you look at it, it looks very much like

00:10:34.320 --> 00:10:36.480
an operating system from that perspective.

00:10:36.480 --> 00:10:41.200
A few more analogies. For example,

00:10:38.879 --> 00:10:43.679
if you want to download an app, say I go

00:10:41.200 --> 00:10:46.240
to VS Code and I go to download, you can

00:10:43.679 --> 00:10:50.159
download VS Code and you can run it on

00:10:46.240 --> 00:10:53.120
Windows, Linux, or Mac, in the same way

00:10:50.159 --> 00:10:55.519
as you can take an LLM app like Cursor

00:10:53.120 --> 00:10:57.440
and you can run it on the GPT or Claude or

00:10:55.519 --> 00:10:59.039
Gemini series, right? It's just a

00:10:57.440 --> 00:11:00.720
dropdown. So, it's kind of like similar in

00:10:59.039 --> 00:11:02.399
that way as well.
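
NOTE
Editor's sketch, not from the talk: a rough illustration of the "it's just a dropdown" idea, where the app logic stays fixed and only the selected model backend changes. The backend names and functions are hypothetical stand-ins; no real provider API is called.
```python
from typing import Callable, Dict
# Hypothetical stand-ins for model backends; real ones would call a provider.
def fake_backend_a(prompt: str) -> str:
    return f"[backend-a] {prompt}"
def fake_backend_b(prompt: str) -> str:
    return f"[backend-b] {prompt}"
# The "dropdown": a registry mapping a display name to a backend function.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "backend-a": fake_backend_a,
    "backend-b": fake_backend_b,
}
def run_app(task: str, backend_name: str) -> str:
    """Run the same app logic on whichever backend was selected."""
    backend = BACKENDS[backend_name]
    return backend(f"Please help with: {task}")
```
Swapping "backend-a" for "backend-b" in run_app changes which model answers without touching the app, much like running one program on different operating systems.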

00:11:00.720 --> 00:11:04.320
More analogies that I think strike me:

00:11:02.399 --> 00:11:05.919
we're kind of like in this

00:11:04.320 --> 00:11:09.040
1960s-ish

00:11:05.919 --> 00:11:10.719
era where LLM compute is still very

00:11:09.039 --> 00:11:13.439
expensive for this new kind of a

00:11:10.720 --> 00:11:15.839
computer and that forces the LLMs to be

00:11:13.440 --> 00:11:18.399
centralized in the cloud and we're all

00:11:15.839 --> 00:11:20.320
just sort of thin clients that

00:11:18.399 --> 00:11:22.078
interact with it over the network and

00:11:20.320 --> 00:11:24.160
none of us have full utilization of

00:11:22.078 --> 00:11:26.399
these computers and therefore it makes

00:11:24.159 --> 00:11:28.319
sense to use time sharing where we're

00:11:26.399 --> 00:11:30.000
all just you know a dimension of the

00:11:28.320 --> 00:11:32.000
batch when they're running the computer

00:11:30.000 --> 00:11:33.440
in the cloud. And this is very much what

00:11:32.000 --> 00:11:35.039
computers used to look like during

00:11:33.440 --> 00:11:36.160
this time. The operating systems were in

00:11:35.039 --> 00:11:39.599
the cloud. Everything was streamed

00:11:36.159 --> 00:11:41.519
around and there was batching. And so

00:11:39.600 --> 00:11:42.959
the personal computing revolution

00:11:41.519 --> 00:11:44.560
hasn't happened yet because it's just

00:11:42.958 --> 00:11:46.719
not economical. It doesn't make sense.

00:11:44.559 --> 00:11:48.399
But I think some people are trying. And

00:11:46.720 --> 00:11:50.399
it turns out that Mac minis, for

00:11:48.399 --> 00:11:52.320
example, are a very good fit for some of

00:11:50.399 --> 00:11:53.839
the LLMs, because if you're

00:11:52.320 --> 00:11:55.360
doing batch-one inference, this is all

00:11:53.839 --> 00:11:56.880
super memory bound. So this actually

00:11:55.360 --> 00:11:58.720
works.

00:11:56.879 --> 00:12:00.399
And I think these are some early

00:11:58.720 --> 00:12:02.079
indications maybe of personal computing.

00:12:00.399 --> 00:12:03.519
But this hasn't really happened yet.

00:12:02.078 --> 00:12:05.199
It's not clear what this looks like.

00:12:03.519 --> 00:12:08.078
Maybe some of you get to invent what

00:12:05.200 --> 00:12:10.320
what this is, or how it works, or what

00:12:08.078 --> 00:12:12.159
this should be. Maybe

00:12:10.320 --> 00:12:14.560
one more analogy that I'll mention is

00:12:12.159 --> 00:12:16.480
whenever I talk to ChatGPT or some LLM

00:12:14.559 --> 00:12:18.399
directly in text, I feel like I'm

00:12:16.480 --> 00:12:21.039
talking to an operating system through

00:12:18.399 --> 00:12:22.639
the terminal. Like, it's just

00:12:21.039 --> 00:12:24.719
text. It's direct access to the

00:12:22.639 --> 00:12:26.720
operating system. And I think a GUI

00:12:24.720 --> 00:12:29.680
hasn't yet really been invented in, like,

00:12:26.720 --> 00:12:31.440
a general way. Like, should ChatGPT have a

00:12:29.679 --> 00:12:33.439
GUI different than just text

00:12:31.440 --> 00:12:35.360
bubbles? Certainly some of the apps

00:12:33.440 --> 00:12:38.480
that we're going to go into in a bit

00:12:35.360 --> 00:12:40.240
have a GUI, but there's no, like, GUI

00:12:38.480 --> 00:12:43.440
across all the tasks if that makes

00:12:40.240 --> 00:12:45.519
sense. There are some ways in which

00:12:43.440 --> 00:12:47.440
LLMs are different from kind of

00:12:45.519 --> 00:12:49.839
operating systems in some fairly unique

00:12:47.440 --> 00:12:52.880
ways, and from early computing. And I

00:12:49.839 --> 00:12:54.240
wrote about this one particular

00:12:52.879 --> 00:12:57.120
property that strikes me as very

00:12:54.240 --> 00:12:59.839
different this time around. It's that

00:12:57.120 --> 00:13:02.000
LLMs flip the direction

00:12:59.839 --> 00:13:05.360
of technology diffusion that is

00:13:02.000 --> 00:13:07.039
usually present in technology. So for

00:13:05.360 --> 00:13:09.120
example with electricity, cryptography,

00:13:07.039 --> 00:13:10.639
computing, flight, internet, GPS, lots

00:13:09.120 --> 00:13:12.320
of new transformative technologies that

00:13:10.639 --> 00:13:14.320
have not been around. Typically it is

00:13:12.320 --> 00:13:16.720
the government and corporations that are

00:13:14.320 --> 00:13:18.720
the first users because it's new and

00:13:16.720 --> 00:13:20.720
expensive, etc., and it only later

00:13:18.720 --> 00:13:22.079
diffuses to consumers. But I feel

00:13:20.720 --> 00:13:24.000
like LLMs are kind of like flipped

00:13:22.078 --> 00:13:26.000
around. So maybe with early computers,

00:13:24.000 --> 00:13:29.039
it was all about ballistics and military

00:13:26.000 --> 00:13:30.320
use, but with LLMs, it's all about how

00:13:29.039 --> 00:13:32.000
do you boil an egg or something like

00:13:30.320 --> 00:13:33.600
that. This is certainly like a lot of my

00:13:32.000 --> 00:13:35.600
use. And so it's really fascinating to

00:13:33.600 --> 00:13:37.360
me that we have a new magical computer

00:13:35.600 --> 00:13:38.879
and it's like helping me boil an egg.

00:13:37.360 --> 00:13:40.720
It's not helping the government do

00:13:38.879 --> 00:13:42.159
something really crazy like some

00:13:40.720 --> 00:13:43.839
military ballistics or some special

00:13:42.159 --> 00:13:45.120
technology. Indeed, corporations and

00:13:43.839 --> 00:13:47.200
governments are lagging behind the

00:13:45.120 --> 00:13:48.959
adoption of all of us, of all of these

00:13:47.200 --> 00:13:50.480
technologies. So, it's just backwards

00:13:48.958 --> 00:13:52.399
and I think it informs maybe some of the

00:13:50.480 --> 00:13:53.600
uses of how we want to use this

00:13:52.399 --> 00:13:56.078
technology or like where are some of the

00:13:53.600 --> 00:14:01.040
first apps and so on.

00:13:56.078 --> 00:14:03.679
So, in summary so far: LLM labs fab LLMs. I

00:14:01.039 --> 00:14:06.480
think it's accurate language to use, but

00:14:03.679 --> 00:14:08.559
LLMs are complicated operating systems.

00:14:06.480 --> 00:14:10.240
They're circa 1960s in computing and

00:14:08.559 --> 00:14:11.838
we're redoing computing all over again.

00:14:10.240 --> 00:14:13.839
and they're currently available via time

00:14:11.839 --> 00:14:16.000
sharing and distributed like a utility.

00:14:13.839 --> 00:14:17.360
What is new and unprecedented is that

00:14:16.000 --> 00:14:18.879
they're not in the hands of a few

00:14:17.360 --> 00:14:20.240
governments and corporations. They're in

00:14:18.879 --> 00:14:21.600
the hands of all of us because we all

00:14:20.240 --> 00:14:24.320
have a computer and it's all just

00:14:21.600 --> 00:14:26.639
software, and ChatGPT was beamed down to

00:14:24.320 --> 00:14:28.320
our computers, to billions of people,

00:14:26.639 --> 00:14:30.879
like instantly and overnight, and this is

00:14:28.320 --> 00:14:33.278
insane. And it's kind of insane to me

00:14:30.879 --> 00:14:34.958
that this is the case and now it is our

00:14:33.278 --> 00:14:37.278
time to enter the industry and program

00:14:34.958 --> 00:14:39.679
these computers. This is crazy. So I

00:14:37.278 --> 00:14:42.078
think this is quite remarkable. Before

00:14:39.679 --> 00:14:43.519
we program LLMs, we have to kind of like

00:14:42.078 --> 00:14:45.838
spend some time to think about what

00:14:43.519 --> 00:14:48.320
these things are. And I especially like

00:14:45.839 --> 00:14:50.480
to kind of talk about their psychology.

00:14:48.320 --> 00:14:51.519
So the way I like to think about LLMs is

00:14:50.480 --> 00:14:54.079
that they're kind of like people

00:14:51.519 --> 00:14:56.399
spirits. They are stochastic

00:14:54.078 --> 00:14:58.000
simulations of people. And the

00:14:56.399 --> 00:14:59.839
simulator in this case happens to be an

00:14:58.000 --> 00:15:02.720
autoregressive transformer. So a

00:14:59.839 --> 00:15:04.800
transformer is a neural net, and

00:15:02.720 --> 00:15:06.480
it just kind of goes on the

00:15:04.799 --> 00:15:08.319
level of tokens. It goes chunk chunk

00:15:06.480 --> 00:15:10.159
chunk chunk chunk. And there's an almost

00:15:08.320 --> 00:15:14.720
equal amount of compute for every single

00:15:10.159 --> 00:15:16.958
chunk. And with this simulator, of

00:15:14.720 --> 00:15:19.040
course, basically there are

00:15:16.958 --> 00:15:20.479
some weights involved, and we fit it to

00:15:19.039 --> 00:15:22.159
all of the text that we have on the internet

00:15:20.480 --> 00:15:24.240
and so on. And you end up with this kind

00:15:22.159 --> 00:15:26.240
of a simulator and because it is trained

00:15:24.240 --> 00:15:28.399
on humans, it's got this emergent

00:15:26.240 --> 00:15:30.639
psychology that is humanlike. So the

00:15:28.399 --> 00:15:32.559
first thing you'll notice is of course

00:15:30.639 --> 00:15:34.639
LLMs have encyclopedic knowledge and

00:15:32.559 --> 00:15:36.078
memory, and they can remember lots of

00:15:34.639 --> 00:15:37.600
things, a lot more than any single

00:15:36.078 --> 00:15:39.838
individual human can because they read

00:15:37.600 --> 00:15:41.680
so many things. It actually kind of

00:15:39.839 --> 00:15:43.040
reminds me of this movie Rain Man, which

00:15:41.679 --> 00:15:44.479
I actually really recommend people

00:15:43.039 --> 00:15:46.719
watch. It's an amazing movie. I love

00:15:44.480 --> 00:15:49.199
this movie. And Dustin Hoffman here

00:15:46.720 --> 00:15:51.600
is an autistic savant who has almost

00:15:49.198 --> 00:15:53.278
perfect memory. So he can

00:15:51.600 --> 00:15:55.360
read like a phone book and remember all

00:15:53.278 --> 00:15:57.198
of the names and phone numbers. And I

00:15:55.360 --> 00:15:58.959
kind of feel like LLMs are

00:15:57.198 --> 00:16:00.399
very similar. They can remember SHA

00:15:58.958 --> 00:16:02.479
hashes and lots of different kinds of

00:16:00.399 --> 00:16:04.399
things very very easily. So they

00:16:02.480 --> 00:16:06.240
certainly have superpowers

00:16:04.399 --> 00:16:08.799
in some respects. But they also have a

00:16:06.240 --> 00:16:11.759
bunch of I would say cognitive deficits.

00:16:08.799 --> 00:16:13.120
So they hallucinate quite a bit. Um and

00:16:11.759 --> 00:16:15.278
they kind of make up stuff and don't

00:16:13.120 --> 00:16:17.679
have a very good uh sort of internal

00:16:15.278 --> 00:16:19.360
model of self-knowledge, not sufficient

00:16:17.679 --> 00:16:21.599
at least. And this has gotten better but

00:16:19.360 --> 00:16:22.800
not perfect. They display jagged

00:16:21.600 --> 00:16:24.480
intelligence. So they're going to be

00:16:22.799 --> 00:16:26.000
superhuman in some problem-solving

00:16:24.480 --> 00:16:27.680
domains. And then they're going to make

00:16:26.000 --> 00:16:29.919
mistakes that basically no human will

00:16:27.679 --> 00:16:32.559
make. Like, you know, they will insist

00:16:29.919 --> 00:16:34.240
that 9.11 is greater than 9.9 or that

00:16:32.559 --> 00:16:36.159
there are two Rs in strawberry. These are

00:16:34.240 --> 00:16:38.879
some famous examples but basically there

00:16:36.159 --> 00:16:40.319
are rough edges that you can trip on so

00:16:38.879 --> 00:16:43.278
that's kind of I think also kind of

00:16:40.320 --> 00:16:46.879
unique um they also kind of suffer from

00:16:43.278 --> 00:16:48.078
anterograde amnesia. Um, so, uh, and I think

00:16:46.879 --> 00:16:49.278
I'm alluding to the fact that if you

00:16:48.078 --> 00:16:51.439
have a coworker who joins your

00:16:49.278 --> 00:16:54.159
organization, this coworker will over

00:16:51.440 --> 00:16:55.920
time learn your organization and uh they

00:16:54.159 --> 00:16:57.759
will understand and gain like a huge

00:16:55.919 --> 00:16:59.599
amount of context on the organization

00:16:57.759 --> 00:17:01.120
and they go home and they sleep and they

00:16:59.600 --> 00:17:03.440
consolidate knowledge and they develop

00:17:01.120 --> 00:17:04.640
expertise over time. LLMs don't natively

00:17:03.440 --> 00:17:06.400
do this and this is not something that

00:17:04.640 --> 00:17:09.280
has really been solved in the R&D of

00:17:06.400 --> 00:17:10.559
LLMs, I think. Um, and so context windows

00:17:09.279 --> 00:17:12.000
are really kind of like working memory

00:17:10.558 --> 00:17:13.599
and you have to sort of program the

00:17:12.000 --> 00:17:15.038
working memory quite directly because

00:17:13.599 --> 00:17:17.038
they don't just kind of like get smarter

00:17:15.038 --> 00:17:19.038
by uh by default and I think a lot of

00:17:17.038 --> 00:17:22.240
people get tripped up by the analogies

00:17:19.038 --> 00:17:23.919
uh in this way. Uh in popular culture I

00:17:22.240 --> 00:17:26.078
recommend people watch these two movies

00:17:23.919 --> 00:17:27.759
uh Memento and 50 First Dates. In both of

00:17:26.078 --> 00:17:29.839
these movies, the protagonists, their

00:17:27.759 --> 00:17:32.160
weights are fixed and their context

00:17:29.839 --> 00:17:34.240
windows get wiped every single morning

00:17:32.160 --> 00:17:35.759
and it's really problematic to go to

00:17:34.240 --> 00:17:37.519
work or have relationships when this

00:17:35.759 --> 00:17:39.599
happens, and this happens to LLMs all the

00:17:37.519 --> 00:17:42.319
time. I guess one more thing I would

00:17:39.599 --> 00:17:44.319
point to is security kind of related

00:17:42.319 --> 00:17:46.399
limitations of the use of LLMs. So for

00:17:44.319 --> 00:17:48.240
example, LLMs are quite gullible. Uh

00:17:46.400 --> 00:17:50.798
they are susceptible to prompt injection

00:17:48.240 --> 00:17:52.798
risks. They might leak your data etc.

00:17:50.798 --> 00:17:55.279
And so um and there's many other

00:17:52.798 --> 00:17:57.519
considerations uh security related. So,

00:17:55.279 --> 00:18:00.000
so basically long story short, you have

00:17:57.519 --> 00:18:01.279
to load your you have to load your you

00:18:00.000 --> 00:18:03.200
have to simultaneously think through

00:18:01.279 --> 00:18:05.440
this superhuman thing that has a bunch

00:18:03.200 --> 00:18:07.759
of cognitive deficits and issues. How do

00:18:05.440 --> 00:18:10.640
we and yet they are extremely like

00:18:07.759 --> 00:18:12.400
useful and so how do we program them and

00:18:10.640 --> 00:18:15.759
how do we work around their deficits and

00:18:12.400 --> 00:18:17.440
enjoy their superhuman powers.

00:18:15.759 --> 00:18:18.960
So what I want to switch to now is talk

00:18:17.440 --> 00:18:20.720
about the opportunities of how do we use

00:18:18.960 --> 00:18:22.400
these models and what are some of the

00:18:20.720 --> 00:18:23.519
biggest opportunities. This is not a

00:18:22.400 --> 00:18:24.640
comprehensive list just some of the

00:18:23.519 --> 00:18:26.879
things that I thought were interesting

00:18:24.640 --> 00:18:29.280
for this talk. The first thing I'm kind

00:18:26.880 --> 00:18:32.160
of excited about is what I would call

00:18:29.279 --> 00:18:34.240
partial autonomy apps. So for example,

00:18:32.160 --> 00:18:36.558
let's work with the example of coding.

00:18:34.240 --> 00:18:38.079
You can certainly go to ChatGPT directly

00:18:36.558 --> 00:18:40.960
and you can start copy pasting code

00:18:38.079 --> 00:18:42.399
around and copy pasting bug reports and

00:18:40.960 --> 00:18:44.160
stuff around and getting code and copy

00:18:42.400 --> 00:18:45.440
pasting everything around. Why would you

00:18:44.160 --> 00:18:47.120
why would you do that? Why would you go

00:18:45.440 --> 00:18:48.480
directly to the operating system? It

00:18:47.119 --> 00:18:50.719
makes a lot more sense to have an app

00:18:48.480 --> 00:18:53.759
dedicated for this. And so I think many

00:18:50.720 --> 00:18:56.319
of you uh use uh Cursor. I do as well.

00:18:53.759 --> 00:18:57.759
And uh Cursor is kind of like the thing

00:18:56.319 --> 00:18:59.759
you want instead. You don't want to just

00:18:57.759 --> 00:19:01.440
directly go to ChatGPT. And I

00:18:59.759 --> 00:19:03.759
think Cursor is a very good example of

00:19:01.440 --> 00:19:06.160
an early LLM app that has a bunch of

00:19:03.759 --> 00:19:08.000
properties that I think are um useful

00:19:06.160 --> 00:19:09.679
across all the LLM apps. So in

00:19:08.000 --> 00:19:12.000
particular, you will notice that we have

00:19:09.679 --> 00:19:13.840
a traditional interface that allows a

00:19:12.000 --> 00:19:16.480
human to go in and do all the work

00:19:13.839 --> 00:19:17.839
manually just as before. But in addition

00:19:16.480 --> 00:19:19.360
to that, we now have this LLM

00:19:17.839 --> 00:19:21.918
integration that allows us to go in

00:19:19.359 --> 00:19:23.519
bigger chunks. And so some of the

00:19:21.919 --> 00:19:25.840
properties of LLM apps that I think are

00:19:23.519 --> 00:19:28.079
shared and useful to point out. Number

00:19:25.839 --> 00:19:31.199
one, the LLMs basically do a ton of the

00:19:28.079 --> 00:19:33.199
context management. Um, number two, they

00:19:31.200 --> 00:19:34.960
orchestrate multiple calls to LLMs,

00:19:33.200 --> 00:19:36.960
right? So in the case of Cursor, there's

00:19:34.960 --> 00:19:39.200
under the hood embedding models for all

00:19:36.960 --> 00:19:41.840
your files, the actual chat models,

00:19:39.200 --> 00:19:43.919
models that apply diffs to the code, and

00:19:41.839 --> 00:19:46.079
this is all orchestrated for you. A

00:19:43.919 --> 00:19:48.480
really big one that uh I think also

00:19:46.079 --> 00:19:50.480
maybe not fully appreciated always is

00:19:48.480 --> 00:19:53.120
application specific uh GUI and the

00:19:50.480 --> 00:19:54.558
importance of it. Um because you don't

00:19:53.119 --> 00:19:56.558
just want to talk to the operating

00:19:54.558 --> 00:19:59.038
system directly in text. Text is very

00:19:56.558 --> 00:20:00.480
hard to read, interpret, understand and

00:19:59.038 --> 00:20:03.119
also like you don't want to take some of

00:20:00.480 --> 00:20:05.038
these actions natively in text. So it's

00:20:03.119 --> 00:20:06.798
much better to just see a diff as like

00:20:05.038 --> 00:20:08.480
red and green change and you can see

00:20:06.798 --> 00:20:10.240
what's being added or subtracted. It's

00:20:08.480 --> 00:20:11.919
much easier to just do command Y to

00:20:10.240 --> 00:20:13.120
accept or command N to reject. I

00:20:11.919 --> 00:20:15.520
shouldn't have to type it in text,

00:20:13.119 --> 00:20:17.839
right? So, a GUI allows a human to

00:20:15.519 --> 00:20:20.000
audit the work of these fallible systems

00:20:17.839 --> 00:20:21.759
and to go faster. I'm going to come back

00:20:20.000 --> 00:20:23.839
to this point a little bit uh later as

00:20:21.759 --> 00:20:25.200
well. And the last kind of feature I

00:20:23.839 --> 00:20:27.678
want to point out is that there's what I

00:20:25.200 --> 00:20:29.440
call the autonomy slider. So, for

00:20:27.679 --> 00:20:31.519
example, in Cursor, you can just do tab

00:20:29.440 --> 00:20:33.600
completion. You're mostly in charge. You

00:20:31.519 --> 00:20:36.000
can select a chunk of code and command K

00:20:33.599 --> 00:20:37.918
to change just that chunk of code. You

00:20:36.000 --> 00:20:40.400
can do command L to change the entire

00:20:37.919 --> 00:20:42.159
file. Or you can do command I which just

00:20:40.400 --> 00:20:44.080
you know let it rip do whatever you want

00:20:42.159 --> 00:20:46.400
in the entire repo and that's the sort

00:20:44.079 --> 00:20:48.319
of full autonomy agentic version

00:20:46.400 --> 00:20:50.159
and so you are in charge of the autonomy

00:20:48.319 --> 00:20:53.038
slider and depending on the complexity

00:20:50.159 --> 00:20:54.320
of the task at hand you can uh tune the

00:20:53.038 --> 00:20:57.119
amount of autonomy that you're willing

00:20:54.319 --> 00:20:58.558
to give up uh for that task maybe to

00:20:57.119 --> 00:21:03.038
show one more example of a fairly

00:20:58.558 --> 00:21:04.639
successful LLM app, uh, Perplexity. Um, it

00:21:03.038 --> 00:21:07.200
also has very similar features to what

00:21:04.640 --> 00:21:08.720
I've just pointed out in Cursor. Uh, it

00:21:07.200 --> 00:21:10.960
packages up a lot of the information. It

00:21:08.720 --> 00:21:13.440
orchestrates multiple LLMs. It's got a

00:21:10.960 --> 00:21:15.600
GUI that allows you to audit some of its

00:21:13.440 --> 00:21:17.279
work. So, for example, it will cite

00:21:15.599 --> 00:21:18.959
sources and you can imagine inspecting

00:21:17.279 --> 00:21:20.639
them. And it's got an autonomy slider.

00:21:18.960 --> 00:21:22.319
You can either just do a quick search or

00:21:20.640 --> 00:21:24.320
you can do research or you can do deep

00:21:22.319 --> 00:21:25.678
research and come back 10 minutes later.

00:21:24.319 --> 00:21:27.678
So, this is all just varying levels of

00:21:25.679 --> 00:21:30.159
autonomy that you give up to the tool.

00:21:27.679 --> 00:21:32.000
So, I guess my question is I feel like a

00:21:30.159 --> 00:21:33.520
lot of software will become partially

00:21:32.000 --> 00:21:35.279
autonomous. I'm trying to think through

00:21:33.519 --> 00:21:36.960
like what does that look like? And for

00:21:35.279 --> 00:21:38.960
many of you who maintain products and

00:21:36.960 --> 00:21:40.240
services, how are you going to make your

00:21:38.960 --> 00:21:42.720
products and services partially

00:21:40.240 --> 00:21:45.120
autonomous? Can an LLM see everything

00:21:42.720 --> 00:21:47.038
that a human can see? Can an LLM act in

00:21:45.119 --> 00:21:49.439
all the ways that a human could act? And

00:21:47.038 --> 00:21:50.879
can humans supervise and stay in the

00:21:49.440 --> 00:21:52.320
loop of this activity? Because again,

00:21:50.880 --> 00:21:54.880
these are fallible systems that aren't

00:21:52.319 --> 00:21:56.558
yet perfect. And what does a diff look

00:21:54.880 --> 00:21:58.799
like in Photoshop or something like

00:21:56.558 --> 00:22:00.079
that? You know, and also a lot of the

00:21:58.798 --> 00:22:01.839
traditional software right now, it has

00:22:00.079 --> 00:22:03.359
all these switches and all this kind of

00:22:01.839 --> 00:22:04.720
stuff that's all designed for humans. All

00:22:03.359 --> 00:22:07.759
of this has to change and become

00:22:04.720 --> 00:22:09.519
accessible to LLMs.

00:22:07.759 --> 00:22:11.119
So, one thing I want to stress with a

00:22:09.519 --> 00:22:14.240
lot of these LLM apps that I'm not sure

00:22:11.119 --> 00:22:16.798
gets as much attention as it should is

00:22:14.240 --> 00:22:18.640
um we we're now kind of like cooperating

00:22:16.798 --> 00:22:20.158
with AIs, and usually they are doing the

00:22:18.640 --> 00:22:22.559
generation and we as humans are doing

00:22:20.159 --> 00:22:24.480
the verification. It is in our interest

00:22:22.558 --> 00:22:25.759
to make this loop go as fast as

00:22:24.480 --> 00:22:28.000
possible. So, we're getting a lot of

00:22:25.759 --> 00:22:30.400
work done. There are two major ways that

00:22:28.000 --> 00:22:32.720
I think uh this can be done. Number one,

00:22:30.400 --> 00:22:34.240
you can speed up verification a lot. Um,

00:22:32.720 --> 00:22:36.079
and I think GUIs, for example, are

00:22:34.240 --> 00:22:39.279
extremely important to this because a

00:22:36.079 --> 00:22:41.359
GUI utilizes your computer vision GPU

00:22:39.279 --> 00:22:43.200
in all of our heads. Reading text is

00:22:41.359 --> 00:22:45.759
effortful and it's not fun, but looking

00:22:43.200 --> 00:22:47.440
at stuff is fun and it's it's just a

00:22:45.759 --> 00:22:49.679
kind of like a highway to your brain.

00:22:47.440 --> 00:22:51.679
So, I think GUIs are very useful for

00:22:49.679 --> 00:22:53.600
auditing systems and visual

00:22:51.679 --> 00:22:56.080
representations in general. And number

00:22:53.599 --> 00:22:58.879
two, I would say is we have to keep the

00:22:56.079 --> 00:23:00.639
AI on the leash. We I think a lot of

00:22:58.880 --> 00:23:03.600
people are getting way over excited with

00:23:00.640 --> 00:23:05.840
AI agents and uh it's not useful to me

00:23:03.599 --> 00:23:07.918
to get a diff of 10,000 lines of code to

00:23:05.839 --> 00:23:09.199
my repo. Like I have to I'm still the

00:23:07.919 --> 00:23:11.120
bottleneck, right? Even though that

00:23:09.200 --> 00:23:12.240
10,000 lines come out instantly, I have

00:23:11.119 --> 00:23:15.359
to make sure that this thing is not

00:23:12.240 --> 00:23:16.558
introducing bugs. It's just like and

00:23:15.359 --> 00:23:17.839
that it's doing the correct thing,

00:23:16.558 --> 00:23:22.879
right? And that there's no security

00:23:17.839 --> 00:23:25.439
issues and so on. So um I think that um

00:23:22.880 --> 00:23:28.240
yeah basically you we have to sort of

00:23:25.440 --> 00:23:30.320
like it's in our interest to make the

00:23:28.240 --> 00:23:32.159
the flow of these two go very very fast

00:23:30.319 --> 00:23:33.119
and we have to somehow keep the AI on

00:23:32.159 --> 00:23:35.280
the leash because it gets way too

00:23:33.119 --> 00:23:37.279
overreactive. It's uh it's kind of like

00:23:35.279 --> 00:23:39.200
this. This is how I feel when I do AI

00:23:37.279 --> 00:23:40.879
assisted coding. If I'm just vibe coding,

00:23:39.200 --> 00:23:42.400
everything is nice and great but if I'm

00:23:40.880 --> 00:23:44.720
actually trying to get work done it's

00:23:42.400 --> 00:23:47.280
not so great to have an overreactive uh

00:23:44.720 --> 00:23:48.798
agent doing all this kind of stuff. So

00:23:47.279 --> 00:23:51.119
this slide is not very good. I'm sorry,

00:23:48.798 --> 00:23:53.839
but I guess I'm trying to develop like

00:23:51.119 --> 00:23:55.759
many of you some ways of utilizing these

00:23:53.839 --> 00:23:58.079
agents in my coding workflow and to do

00:23:55.759 --> 00:23:59.839
AI assisted coding. And in my own work,

00:23:58.079 --> 00:24:02.240
I'm always scared to get way too big

00:23:59.839 --> 00:24:04.158
diffs. I always go in small incremental

00:24:02.240 --> 00:24:06.159
chunks. I want to make sure that

00:24:04.159 --> 00:24:09.120
everything is good. I want to spin this

00:24:06.159 --> 00:24:10.799
loop very very fast and um I sort of

00:24:09.119 --> 00:24:13.199
work on small chunks of a single concrete

00:24:10.798 --> 00:24:14.639
thing. Uh and so I think many of you

00:24:13.200 --> 00:24:17.600
probably are developing similar ways of

00:24:14.640 --> 00:24:19.600
working with the with LLMs.

00:24:17.599 --> 00:24:22.240
Um, I also saw a number of blog posts

00:24:19.599 --> 00:24:24.000
that try to develop these best practices

00:24:22.240 --> 00:24:25.359
for working with LLMs. And here's one

00:24:24.000 --> 00:24:26.798
that I read recently and I thought was

00:24:25.359 --> 00:24:28.240
quite good. And it kind of discussed

00:24:26.798 --> 00:24:29.918
some techniques and some of them have to

00:24:28.240 --> 00:24:32.000
do with how you keep the AI on the

00:24:29.919 --> 00:24:34.960
leash. And so, as an example, if you are

00:24:32.000 --> 00:24:36.960
prompting, if your prompt is vague, then

00:24:34.960 --> 00:24:38.880
uh the AI might not do exactly what you

00:24:36.960 --> 00:24:40.240
wanted and in that case, verification

00:24:38.880 --> 00:24:42.080
will fail. You're going to ask for

00:24:40.240 --> 00:24:43.679
something else. If a verification fails,

00:24:42.079 --> 00:24:45.119
then you're going to start spinning. So

00:24:43.679 --> 00:24:46.798
it makes a lot more sense to spend a bit

00:24:45.119 --> 00:24:48.479
more time to be more concrete in your

00:24:46.798 --> 00:24:50.240
prompts which increases the probability

00:24:48.480 --> 00:24:52.079
of successful verification and you can

00:24:50.240 --> 00:24:54.079
move forward. And so I think a lot of us

00:24:52.079 --> 00:24:56.319
are going to end up finding um kind of

00:24:54.079 --> 00:24:57.839
techniques like this. I think in my own

00:24:56.319 --> 00:25:00.079
work as well I'm currently interested in

00:24:57.839 --> 00:25:01.839
uh what education looks like in um

00:25:00.079 --> 00:25:04.480
together with kind of like now that we

00:25:01.839 --> 00:25:07.038
have AI uh and LLMs what does education

00:25:04.480 --> 00:25:09.679
look like? And I think a a large amount

00:25:07.038 --> 00:25:11.440
of thought for me goes into how we keep

00:25:09.679 --> 00:25:13.200
AI on the leash. I don't think it just

00:25:11.440 --> 00:25:14.798
works to go to ChatGPT and be like, "Hey,

00:25:13.200 --> 00:25:16.880
teach me physics." I don't think this

00:25:14.798 --> 00:25:18.798
works because the AI is like gets lost

00:25:16.880 --> 00:25:20.880
in the woods. And so for me, this is

00:25:18.798 --> 00:25:22.639
actually two separate apps. For example,

00:25:20.880 --> 00:25:24.880
there's an app for a teacher that

00:25:22.640 --> 00:25:26.480
creates courses and then there's an app

00:25:24.880 --> 00:25:29.120
that takes courses and serves them to

00:25:26.480 --> 00:25:31.200
students. And in both cases, we now have

00:25:29.119 --> 00:25:32.719
this intermediate artifact of a course

00:25:31.200 --> 00:25:33.840
that is auditable and we can make sure

00:25:32.720 --> 00:25:35.919
it's good. We can make sure it's

00:25:33.839 --> 00:25:37.119
consistent. and the AI is kept on the

00:25:35.919 --> 00:25:40.240
leash with respect to a certain

00:25:37.119 --> 00:25:42.639
syllabus, a certain like um progression

00:25:40.240 --> 00:25:44.159
of projects and so on. And so this is

00:25:42.640 --> 00:25:45.759
one way of keeping the AI on the leash and I

00:25:44.159 --> 00:25:47.760
think has a much higher likelihood of

00:25:45.759 --> 00:25:49.919
working and the AI is not getting lost

00:25:47.759 --> 00:25:51.919
in the woods.

00:25:49.919 --> 00:25:54.480
One more kind of analogy I wanted to

00:25:51.919 --> 00:25:56.159
sort of allude to is I'm not I'm no

00:25:54.480 --> 00:25:57.839
stranger to partial autonomy and I kind

00:25:56.159 --> 00:26:00.240
of worked on this I think for five years

00:25:57.839 --> 00:26:01.918
at Tesla and this is also a partial

00:26:00.240 --> 00:26:03.519
autonomy product and shares a lot of the

00:26:01.919 --> 00:26:05.440
features like for example right there in

00:26:03.519 --> 00:26:07.599
the instrument panel is the GUI of the

00:26:05.440 --> 00:26:09.200
autopilot so it's showing me what the

00:26:07.599 --> 00:26:10.798
what the neural network sees and so on

00:26:09.200 --> 00:26:13.440
and we have the autonomy slider where

00:26:10.798 --> 00:26:15.599
over the course of my tenure there we

00:26:13.440 --> 00:26:18.320
did more and more autonomous tasks for

00:26:15.599 --> 00:26:21.119
the user and maybe the story that I

00:26:18.319 --> 00:26:22.639
wanted to tell very briefly is uh

00:26:21.119 --> 00:26:25.199
actually the first time I drove a

00:26:22.640 --> 00:26:27.278
self-driving vehicle was in 2013 and I

00:26:25.200 --> 00:26:29.120
had a friend who worked at Waymo and uh

00:26:27.278 --> 00:26:31.519
he offered to give me a drive around

00:26:29.119 --> 00:26:33.918
Palo Alto. I took this picture using

00:26:31.519 --> 00:26:35.278
Google Glass at the time and many of you

00:26:33.919 --> 00:26:37.278
are so young that you might not even

00:26:35.278 --> 00:26:39.440
know what that is. Uh but uh yeah, this

00:26:37.278 --> 00:26:40.960
was like all the rage at the time. And

00:26:39.440 --> 00:26:42.960
we got into this car and we went for

00:26:40.960 --> 00:26:45.120
about a 30-minute drive around Palo Alto

00:26:42.960 --> 00:26:46.960
highways uh streets and so on. And this

00:26:45.119 --> 00:26:49.839
drive was perfect. There were zero

00:26:46.960 --> 00:26:52.480
interventions and this was 2013 which is

00:26:49.839 --> 00:26:54.000
now 12 years ago. And it kind of struck

00:26:52.480 --> 00:26:56.159
me because at the time when I had this

00:26:54.000 --> 00:26:59.519
perfect drive, this perfect demo, I felt

00:26:56.159 --> 00:27:00.799
like, wow, self-driving is imminent

00:26:59.519 --> 00:27:03.440
because this just worked. This is

00:27:00.798 --> 00:27:04.879
incredible. Um, but here we are 12 years

00:27:03.440 --> 00:27:07.038
later and we are still working on

00:27:04.880 --> 00:27:09.200
autonomy. Um, we are still working on

00:27:07.038 --> 00:27:10.798
driving agents and even now we haven't

00:27:09.200 --> 00:27:12.880
actually like really solved the problem.

00:27:10.798 --> 00:27:14.960
Like, you may see Waymos going around and

00:27:12.880 --> 00:27:16.799
they look driverless but you know

00:27:14.960 --> 00:27:18.720
there's still a lot of teleoperation and

00:27:16.798 --> 00:27:20.960
a lot of human in the loop of a lot of

00:27:18.720 --> 00:27:22.558
this driving so we still haven't even

00:27:20.960 --> 00:27:24.400
like declared success but I think it's

00:27:22.558 --> 00:27:26.558
definitely like going to succeed at this

00:27:24.400 --> 00:27:29.360
point but it just took a long time and

00:27:26.558 --> 00:27:31.599
so I think like like this is software is

00:27:29.359 --> 00:27:34.719
really tricky I think in the same way

00:27:31.599 --> 00:27:36.480
that driving is tricky and so when I see

00:27:34.720 --> 00:27:38.720
things like oh 2025 is the year of

00:27:36.480 --> 00:27:41.038
agents I get very concerned and I kind

00:27:38.720 --> 00:27:44.079
of feel like you know this is the decade

00:27:41.038 --> 00:27:45.759
of agents and this is going to be quite

00:27:44.079 --> 00:27:47.199
some time. We need humans in the loop.

00:27:45.759 --> 00:27:51.038
We need to do this carefully. This is

00:27:47.200 --> 00:27:52.880
software. Let's be serious here. One

00:27:51.038 --> 00:27:56.079
more kind of analogy that I always think

00:27:52.880 --> 00:27:58.159
through is the Iron Man suit. Uh I think

00:27:56.079 --> 00:28:01.359
this is I always love Iron Man. I think

00:27:58.159 --> 00:28:02.880
it's like so um correct in a bunch of

00:28:01.359 --> 00:28:04.398
ways with respect to technology and how

00:28:02.880 --> 00:28:05.919
it will play out. And what I love about

00:28:04.398 --> 00:28:08.719
the Iron Man suit is that it's both an

00:28:05.919 --> 00:28:10.320
augmentation and Tony Stark can drive it

00:28:08.720 --> 00:28:11.839
and it's also an agent. And in some of

00:28:10.319 --> 00:28:13.599
the movies, the Iron Man suit is quite

00:28:11.839 --> 00:28:15.278
autonomous and can fly around and find

00:28:13.599 --> 00:28:17.278
Tony and all this kind of stuff. And so

00:28:15.278 --> 00:28:19.038
this is the autonomy slider is we can be

00:28:17.278 --> 00:28:21.200
we can build augmentations or we can

00:28:19.038 --> 00:28:23.440
build agents and we kind of want to do a

00:28:21.200 --> 00:28:25.919
bit of both. But at this stage I would

00:28:23.440 --> 00:28:29.120
say working with fallible LLMs and so

00:28:25.919 --> 00:28:31.600
on. I would say you know it's less Iron

00:28:29.119 --> 00:28:33.678
Man robots and more Iron Man suits that

00:28:31.599 --> 00:28:35.119
you want to build. It's less like

00:28:33.679 --> 00:28:36.720
building flashy demos of autonomous

00:28:35.119 --> 00:28:39.678
agents and more building partial

00:28:36.720 --> 00:28:41.919
autonomy products. And these products

00:28:39.679 --> 00:28:43.840
have custom GUIs and UI/UX. And we're

00:28:41.919 --> 00:28:45.520
trying to um and this is done so that

00:28:43.839 --> 00:28:48.158
the generation verification loop of the

00:28:45.519 --> 00:28:49.519
human is very very fast. But we are not

00:28:48.159 --> 00:28:51.278
losing the sight of the fact that it is

00:28:49.519 --> 00:28:52.960
in principle possible to automate this

00:28:51.278 --> 00:28:54.558
work. And there should be an autonomy

00:28:52.960 --> 00:28:55.919
slider in your product. And you should

00:28:54.558 --> 00:28:58.558
be thinking about how you can slide that

00:28:55.919 --> 00:29:01.278
autonomy slider and make your product uh

00:28:58.558 --> 00:29:02.720
sort of um more autonomous over time.

00:29:01.278 --> 00:29:04.240
But this is kind of how I think there's

00:29:02.720 --> 00:29:06.558
lots of opportunities in these kinds of

00:29:04.240 --> 00:29:08.159
products. I want to now switch gears a

00:29:06.558 --> 00:29:09.839
little bit and talk about one other

00:29:08.159 --> 00:29:11.440
dimension that I think is very unique.

00:29:09.839 --> 00:29:12.959
Not only is there a new type of

00:29:11.440 --> 00:29:15.278
programming language that allows for

00:29:12.960 --> 00:29:16.640
autonomy in software but also as I

00:29:15.278 --> 00:29:19.038
mentioned it's programmed in English

00:29:16.640 --> 00:29:20.559
which is this natural interface and

00:29:19.038 --> 00:29:22.240
suddenly everyone is a programmer

00:29:20.558 --> 00:29:24.639
because everyone speaks natural language

00:29:22.240 --> 00:29:26.159
like English. So this is extremely

00:29:24.640 --> 00:29:28.000
bullish and very interesting to me and

00:29:26.159 --> 00:29:29.520
also completely unprecedented. I would

00:29:28.000 --> 00:29:31.440
say it it used to be the case that you

00:29:29.519 --> 00:29:32.879
need to spend five to 10 years studying

00:29:31.440 --> 00:29:35.200
something to be able to do something in

00:29:32.880 --> 00:29:37.120
software. This is not the case anymore.

00:29:35.200 --> 00:29:40.640
So, I don't know if by any chance anyone

00:29:37.119 --> 00:29:42.479
has heard of vibe coding.

00:29:40.640 --> 00:29:44.240
Uh, this this is the tweet that kind of

00:29:42.480 --> 00:29:46.720
like introduced this, but I'm told that

00:29:44.240 --> 00:29:49.599
this is now like a major meme. Um, fun

00:29:46.720 --> 00:29:51.200
story about this is that I've been on

00:29:49.599 --> 00:29:53.519
Twitter for like 15 years or something

00:29:51.200 --> 00:29:56.319
like that at this point and I still have

00:29:53.519 --> 00:29:58.000
no clue which tweet will become viral

00:29:56.319 --> 00:30:00.798
and which tweet like fizzles and no one

00:29:58.000 --> 00:30:01.839
cares. And I thought that this tweet was

00:30:00.798 --> 00:30:03.359
going to be the latter. I don't know. It

00:30:01.839 --> 00:30:05.278
was just like a shower of thoughts. But

00:30:03.359 --> 00:30:06.719
this became like a total meme and I

00:30:05.278 --> 00:30:08.480
really just can't tell. But I guess like

00:30:06.720 --> 00:30:10.558
it struck a chord and it gave a name to

00:30:08.480 --> 00:30:13.278
something that everyone was feeling but

00:30:10.558 --> 00:30:17.278
couldn't quite say in words. So now

00:30:13.278 --> 00:30:18.640
there's a Wikipedia page and everything.

00:30:17.278 --> 00:30:25.919
This is like

00:30:18.640 --> 00:30:27.600
[Applause]

00:30:25.919 --> 00:30:30.720
yeah this is like a major contribution

00:30:27.599 --> 00:30:32.959
now or something like that. So,

00:30:30.720 --> 00:30:34.960
um, so Tom Wolf from Hugging Face shared

00:30:32.960 --> 00:30:37.759
this beautiful video that I really love.

00:30:34.960 --> 00:30:41.720
Um,

00:30:37.759 --> 00:30:41.720
these are kids vibe coding.

00:30:42.640 --> 00:30:46.720
And I find that this is such a wholesome

00:30:44.398 --> 00:30:48.079
video. Like, I love this video. Like,

00:30:46.720 --> 00:30:49.839
how can you look at this video and feel

00:30:48.079 --> 00:30:52.558
bad about the future? The future is

00:30:49.839 --> 00:30:53.918
great.

00:30:52.558 --> 00:30:56.639
I think this will end up being like a

00:30:53.919 --> 00:30:59.200
gateway drug to software development.

00:30:56.640 --> 00:31:02.240
Um, I'm not a doomer about the future of

00:30:59.200 --> 00:31:04.798
the generation and I think yeah, I love

00:31:02.240 --> 00:31:07.120
this video. So, I tried vibe coding a

00:31:04.798 --> 00:31:09.359
little bit uh as well because it's so

00:31:07.119 --> 00:31:10.798
fun. Uh, so vibe coding is so great when

00:31:09.359 --> 00:31:12.398
you want to build something super duper

00:31:10.798 --> 00:31:13.679
custom that doesn't appear to exist and

00:31:12.398 --> 00:31:15.519
you just want to wing it because it's a

00:31:13.679 --> 00:31:18.720
Saturday or something like that. So, I

00:31:15.519 --> 00:31:20.639
built this uh iOS app and I don't I

00:31:18.720 --> 00:31:21.759
can't actually program in Swift, but I

00:31:20.640 --> 00:31:23.360
was really shocked that I was able to

00:31:21.759 --> 00:31:24.720
build like a super basic app and I'm not

00:31:23.359 --> 00:31:27.359
going to explain it. It's really uh

00:31:24.720 --> 00:31:28.720
dumb, but uh I kind of like this was

00:31:27.359 --> 00:31:30.319
just like a day of work and this was

00:31:28.720 --> 00:31:32.319
running on my phone like later that day

00:31:30.319 --> 00:31:33.918
and I was like, "Wow, this is amazing."

00:31:32.319 --> 00:31:35.918
I didn't have to like read through Swift

00:31:33.919 --> 00:31:38.159
for like five days or something like

00:31:35.919 --> 00:31:40.480
that to like get started. I also

00:31:38.159 --> 00:31:41.760
vibe coded this app called MenuGen. And

00:31:40.480 --> 00:31:44.079
this is live. You can try it in

00:31:41.759 --> 00:31:45.440
menugen.app. And I basically had this

00:31:44.079 --> 00:31:46.639
problem where I show up at a restaurant,

00:31:45.440 --> 00:31:48.558
I read through the menu, and I have no

00:31:46.640 --> 00:31:51.600
idea what any of the things are. And I

00:31:48.558 --> 00:31:52.960
need pictures. So this doesn't exist. So

00:31:51.599 --> 00:31:55.918
I was like, "Hey, I'm going to vibe code

00:31:52.960 --> 00:31:58.240
it." So, um, this is what it looks like.

00:31:55.919 --> 00:32:01.440
You go to menugen.app,

00:31:58.240 --> 00:32:03.278
um, and, uh, you take a picture of a of

00:32:01.440 --> 00:32:06.240
a menu and then MenuGen generates the

00:32:03.278 --> 00:32:08.000
images and everyone gets $5 in credits

00:32:06.240 --> 00:32:10.480
for free when you sign up. And

00:32:08.000 --> 00:32:13.759
therefore, this is a major cost center

00:32:10.480 --> 00:32:16.240
in my life. So, this is a negative

00:32:13.759 --> 00:32:17.839
negative uh, revenue app for me right

00:32:16.240 --> 00:32:19.200
now.

00:32:17.839 --> 00:32:21.278
I've lost a huge amount of money on

00:32:19.200 --> 00:32:23.360
MenuGen.

00:32:21.278 --> 00:32:28.159
Okay. But the fascinating thing about

00:32:23.359 --> 00:32:30.240
MenuGen for me is that the code of

00:32:28.159 --> 00:32:32.720
the, the vibe coding part, the code was

00:32:30.240 --> 00:32:35.120
actually the easy part of, of vibe coding

00:32:32.720 --> 00:32:36.480
MenuGen, and most of it actually was when I

00:32:35.119 --> 00:32:37.599
tried to make it real so that you can

00:32:36.480 --> 00:32:39.599
actually have authentication and

00:32:37.599 --> 00:32:41.918
payments and the domain name and a Vercel

00:32:39.599 --> 00:32:44.158
deployment. This was really hard and all

00:32:41.919 --> 00:32:47.120
of this was not code. All of this devops

00:32:44.159 --> 00:32:49.840
stuff was me in the browser clicking

00:32:47.119 --> 00:32:51.518
stuff, and this was an extreme slog and took

00:32:49.839 --> 00:32:54.639
another week. So it was really

00:32:51.519 --> 00:32:57.278
fascinating that I had the MenuGen um

00:32:54.640 --> 00:32:59.278
basically demo working on my laptop in a

00:32:57.278 --> 00:33:01.200
few hours and then it took me a week

00:32:59.278 --> 00:33:02.880
because I was trying to make it real and

00:33:01.200 --> 00:33:05.600
the reason for this is this was just

00:33:02.880 --> 00:33:07.278
really annoying. Um, so for example, if

00:33:05.599 --> 00:33:09.199
you try to add Google login to your web

00:33:07.278 --> 00:33:11.679
page, I know this is very small, but

00:33:09.200 --> 00:33:13.600
just a huge amount of instructions of

00:33:11.679 --> 00:33:15.200
this Clerk library telling me how to

00:33:13.599 --> 00:33:17.519
integrate this. And this is crazy. Like

00:33:15.200 --> 00:33:19.759
it's telling me go to this URL, click on

00:33:17.519 --> 00:33:21.200
this dropdown, choose this, go to this,

00:33:19.759 --> 00:33:22.640
and click on that. And it's like telling

00:33:21.200 --> 00:33:24.880
me what to do. Like a computer is

00:33:22.640 --> 00:33:26.640
telling me the actions I should be

00:33:24.880 --> 00:33:28.640
taking. Like you do it. Why am I doing

00:33:26.640 --> 00:33:31.759
this?

00:33:28.640 --> 00:33:33.840
What the hell?

00:33:31.759 --> 00:33:36.158
I had to follow all these instructions.

00:33:33.839 --> 00:33:39.519
This was crazy. So I think the last part

00:33:36.159 --> 00:33:41.679
of my talk therefore focuses on can we

00:33:39.519 --> 00:33:44.240
just build for agents? I don't want to

00:33:41.679 --> 00:33:46.320
do this work. Can agents do this? Thank

00:33:44.240 --> 00:33:48.640
you.

00:33:46.319 --> 00:33:50.879
Okay. So roughly speaking, I think

00:33:48.640 --> 00:33:53.120
there's a new category of consumer and

00:33:50.880 --> 00:33:55.440
manipulator of digital information. It

00:33:53.119 --> 00:33:57.518
used to be just humans through GUIs or

00:33:55.440 --> 00:34:00.240
computers through APIs. And now we have

00:33:57.519 --> 00:34:02.798
a completely new thing and agents are

00:34:00.240 --> 00:34:04.319
they're computers but they are humanlike

00:34:02.798 --> 00:34:05.599
kind of right they're people spirits

00:34:04.319 --> 00:34:06.720
there's people spirits on the internet

00:34:05.599 --> 00:34:08.319
and they need to interact with our

00:34:06.720 --> 00:34:10.639
software infrastructure like can we

00:34:08.320 --> 00:34:12.960
build for them it's a new thing so as an

00:34:10.639 --> 00:34:15.119
example you can have robots.txt on your

00:34:12.960 --> 00:34:18.320
domain and you can instruct uh or like

00:34:15.119 --> 00:34:19.838
advise I suppose um uh web crawlers on

00:34:18.320 --> 00:34:21.519
how to behave on your website in the

00:34:19.838 --> 00:34:23.358
same way you can have maybe an llms.txt

00:34:21.519 --> 00:34:25.679
file which is just a simple markdown

00:34:23.358 --> 00:34:28.078
that's telling LLMs what this domain is

00:34:25.679 --> 00:34:30.559
about and this is very readable to a to

00:34:28.079 --> 00:34:32.480
an LLM. If it had to instead get the

00:34:30.559 --> 00:34:33.838
HTML of your web page and try to parse

00:34:32.480 --> 00:34:35.679
it, this is very error-prone and

00:34:33.838 --> 00:34:36.799
difficult and will screw it up and it's

00:34:35.679 --> 00:34:38.398
not going to work. So we can just

00:34:36.800 --> 00:34:41.280
directly speak to the LLM. It's worth

00:34:38.398 --> 00:34:42.719
it. Um a huge amount of documentation is

00:34:41.280 --> 00:34:45.599
currently written for people. So you

00:34:42.719 --> 00:34:47.759
will see things like lists and bold and

00:34:45.599 --> 00:34:51.200
pictures and this is not directly

00:34:47.760 --> 00:34:52.800
accessible by an LLM. So I see some of

00:34:51.199 --> 00:34:54.878
the services now are transitioning a lot

00:34:52.800 --> 00:34:57.039
of the their docs to be specifically for

00:34:54.878 --> 00:34:59.440
LLMs. So Vercel and Stripe as an

00:34:57.039 --> 00:35:01.920
example are early movers here but there

00:34:59.440 --> 00:35:04.159
are a few more that I've seen already

00:35:01.920 --> 00:35:06.720
and they offer their documentation in

00:35:04.159 --> 00:35:10.078
markdown. Markdown is super easy for LLMs

00:35:06.719 --> 00:35:12.319
to understand. This is great. Um maybe

00:35:10.079 --> 00:35:14.079
one simple example from from uh my

00:35:12.320 --> 00:35:15.599
experience as well. Maybe some of you

00:35:14.079 --> 00:35:19.360
know 3Blue1Brown. He makes

00:35:15.599 --> 00:35:22.639
beautiful animation videos on YouTube.

00:35:19.360 --> 00:35:22.639
[Applause]

00:35:23.199 --> 00:35:27.439
Yeah, I love this library that he

00:35:25.039 --> 00:35:30.079
wrote, uh, Manim, and I wanted to make my

00:35:27.440 --> 00:35:32.639
own and uh there's extensive

00:35:30.079 --> 00:35:34.000
documentation on how to use Manim and

00:35:32.639 --> 00:35:35.358
so I didn't want to actually read

00:35:34.000 --> 00:35:37.440
through it. So I copy pasted the whole

00:35:35.358 --> 00:35:39.199
thing to an LLM and I described what I

00:35:37.440 --> 00:35:41.440
wanted and it just worked out of the box

00:35:39.199 --> 00:35:43.279
like the LLM just vibe coded me an animation

00:35:41.440 --> 00:35:45.838
exactly what I wanted and I was like wow

00:35:43.280 --> 00:35:48.160
this is amazing. So if we can make docs

00:35:45.838 --> 00:35:51.199
legible to LLMs, it's going to unlock a

00:35:48.159 --> 00:35:52.399
huge amount of um kind of use and um I

00:35:51.199 --> 00:35:55.118
think this is wonderful and should

00:35:52.400 --> 00:35:56.240
should happen more. The other thing I

00:35:55.119 --> 00:35:57.680
wanted to point out is that you do

00:35:56.239 --> 00:35:58.959
unfortunately have to it's not just

00:35:57.679 --> 00:36:00.639
about taking your docs and making them

00:35:58.960 --> 00:36:01.920
appear in markdown. That's the easy

00:36:00.639 --> 00:36:04.719
part. We actually have to change the

00:36:01.920 --> 00:36:06.800
docs because anytime your docs say "click",

00:36:04.719 --> 00:36:09.919
this is bad. An LLM will not be able to

00:36:06.800 --> 00:36:11.519
natively take this action right now. So,

00:36:09.920 --> 00:36:13.519
Vercel, for example, is replacing

00:36:11.519 --> 00:36:15.358
every occurrence of "click" with an

00:36:13.519 --> 00:36:18.239
equivalent curl command that your LLM

00:36:15.358 --> 00:36:19.759
agent could take on your behalf. Um, and

00:36:18.239 --> 00:36:21.358
so I think this is very interesting. And

00:36:19.760 --> 00:36:23.040
then, of course, there's a model context

00:36:21.358 --> 00:36:24.880
protocol from Anthropic. And this is

00:36:23.039 --> 00:36:26.719
also another way, it's a protocol of

00:36:24.880 --> 00:36:28.160
speaking directly to agents as this new

00:36:26.719 --> 00:36:29.679
consumer and manipulator of digital

00:36:28.159 --> 00:36:31.519
information. So, I'm very bullish on

00:36:29.679 --> 00:36:33.519
these ideas. The other thing I really

00:36:31.519 --> 00:36:36.639
like is a number of little tools here

00:36:33.519 --> 00:36:38.719
and there that are helping ingest data

00:36:36.639 --> 00:36:40.159
in, like, very LLM-friendly formats.

00:36:38.719 --> 00:36:42.719
So for example, when I go to a GitHub

00:36:40.159 --> 00:36:44.319
repo like my nanoGPT repo, I can't feed

00:36:42.719 --> 00:36:46.719
this to an LLM and ask questions about

00:36:44.320 --> 00:36:48.880
it uh because it's you know this is a

00:36:46.719 --> 00:36:50.480
human interface on GitHub. So when you

00:36:48.880 --> 00:36:52.320
just change the URL from GitHub to

00:36:50.480 --> 00:36:54.159
Gitingest, then uh this will actually

00:36:52.320 --> 00:36:55.920
concatenate all the files into a single

00:36:54.159 --> 00:36:57.519
giant text and it will create a

00:36:55.920 --> 00:36:59.039
directory structure etc. And this is

00:36:57.519 --> 00:37:01.519
ready to be copy pasted into your

00:36:59.039 --> 00:37:03.440
favorite LLM and you can do stuff. Maybe

00:37:01.519 --> 00:37:05.440
even more dramatic example of this is

00:37:03.440 --> 00:37:08.639
DeepWiki, where it's not just the raw

00:37:05.440 --> 00:37:10.960
content of these files. Uh, this is from

00:37:08.639 --> 00:37:12.879
Devin, but also like they have Devin

00:37:10.960 --> 00:37:14.639
basically do analysis of the GitHub repo

00:37:12.880 --> 00:37:18.000
and Devin basically builds up a whole

00:37:14.639 --> 00:37:19.838
docs uh pages just for your repo and you

00:37:18.000 --> 00:37:22.079
can imagine that this is even more

00:37:19.838 --> 00:37:23.440
helpful to copy paste into your LLM. So

00:37:22.079 --> 00:37:24.960
I love all the little tools that

00:37:23.440 --> 00:37:26.559
basically where you just change the URL

00:37:24.960 --> 00:37:29.519
and it makes something accessible to an

00:37:26.559 --> 00:37:30.719
LLM. So this is all well and great and uh

00:37:29.519 --> 00:37:32.719
I think there should be a lot more of

00:37:30.719 --> 00:37:35.279
it. One more note I wanted to make is

00:37:32.719 --> 00:37:38.000
that it is absolutely possible that in

00:37:35.280 --> 00:37:39.599
the future LLMs will be able to this is

00:37:38.000 --> 00:37:40.800
not even future this is today they'll be

00:37:39.599 --> 00:37:42.640
able to go around and they'll be able to

00:37:40.800 --> 00:37:46.079
click stuff and so on but I still think

00:37:42.639 --> 00:37:48.559
it's very worth, uh, basically meeting LLM

00:37:46.079 --> 00:37:49.920
halfway, LLMs halfway, and making it

00:37:48.559 --> 00:37:51.679
easier for them to access all this

00:37:49.920 --> 00:37:54.400
information uh because this is still

00:37:51.679 --> 00:37:56.639
fairly expensive I would say to use and

00:37:54.400 --> 00:37:58.240
uh a lot more difficult and so I do

00:37:56.639 --> 00:38:00.639
think that lots of software there will

00:37:58.239 --> 00:38:02.159
be a long tail where it won't like adapt

00:38:00.639 --> 00:38:04.480
apps because these are not like live

00:38:02.159 --> 00:38:06.239
player sort of repositories or digital

00:38:04.480 --> 00:38:08.400
infrastructure and we will need these

00:38:06.239 --> 00:38:09.679
tools. Uh but I think for everyone else

00:38:08.400 --> 00:38:11.760
I think it's very worth kind of like

00:38:09.679 --> 00:38:14.639
meeting in some middle point. So I'm

00:38:11.760 --> 00:38:17.119
bullish on both if that makes sense.

00:38:14.639 --> 00:38:18.639
So in summary, what an amazing time to

00:38:17.119 --> 00:38:20.720
get into the industry. We need to

00:38:18.639 --> 00:38:23.039
rewrite a ton of code. A ton of code

00:38:20.719 --> 00:38:25.598
will be written by professionals and by

00:38:23.039 --> 00:38:27.519
coders. These LLMs are kind of like

00:38:25.599 --> 00:38:28.800
utilities, kind of like fabs, but

00:38:27.519 --> 00:38:30.960
they're kind of especially like

00:38:28.800 --> 00:38:34.320
operating systems. But it's so early.

00:38:30.960 --> 00:38:36.079
It's like 1960s of operating systems and

00:38:34.320 --> 00:38:38.960
uh and I think a lot of the analogies

00:38:36.079 --> 00:38:41.599
cross over. Um and these LLMs are kind of

00:38:38.960 --> 00:38:43.358
like these fallible uh you know people

00:38:41.599 --> 00:38:45.599
spirits that we have to learn to work

00:38:43.358 --> 00:38:47.679
with. And in order to do that properly,

00:38:45.599 --> 00:38:48.960
we need to adjust our infrastructure

00:38:47.679 --> 00:38:50.639
towards it. So when you're building

00:38:48.960 --> 00:38:52.800
these LLM apps, I describe some of the

00:38:50.639 --> 00:38:54.719
ways of working effectively with these

00:38:52.800 --> 00:38:57.039
LLMs and some of the tools that make

00:38:54.719 --> 00:38:59.039
that uh kind of possible and how you can

00:38:57.039 --> 00:39:00.800
spin this loop very very quickly and

00:38:59.039 --> 00:39:03.519
basically create partial autonomy

00:39:00.800 --> 00:39:04.880
products and then um yeah, a lot of code

00:39:03.519 --> 00:39:07.199
has to also be written for the agents

00:39:04.880 --> 00:39:09.519
more directly. But in any case, going

00:39:07.199 --> 00:39:10.879
back to the Iron Man suit analogy, I

00:39:09.519 --> 00:39:12.719
think what we'll see over the next

00:39:10.880 --> 00:39:15.920
decade roughly is we're going to take

00:39:12.719 --> 00:39:17.598
the slider from left to right. And I'm

00:39:15.920 --> 00:39:19.358
very interesting. It's going to be very

00:39:17.599 --> 00:39:21.519
interesting to see what that looks like.

00:39:19.358 --> 00:39:25.639
And I can't wait to build it with all of

00:39:21.519 --> 00:39:25.639
you. Thank you.
