Advertisement
39:31
Transcript
0:01
Please welcome former director of AI
0:04
Tesla Andre Carpathy.
0:07
[Music]
0:11
Hello.
0:14
[Music]
0:19
Wow, a lot of people here. Hello.
0:22
Um, okay. Yeah. So I'm excited to be
0:24
here today to talk to you about software
Advertisement
0:27
in the era of AI. And I'm told that many
0:30
of you are students like bachelors,
0:32
masters, PhD and so on. And you're about
0:34
to enter the industry. And I think it's
0:36
actually like an extremely unique and
0:37
very interesting time to enter the
0:38
industry right now. And I think
0:41
fundamentally the reason for that is
0:43
that um software is changing uh again.
0:47
And I say again because I actually gave
Advertisement
0:49
this talk already. Um but the problem is
0:52
that software keeps changing. So I
0:54
actually have a lot of material to
0:55
create new talks and I think it's
0:56
changing quite fundamentally. I think
0:58
roughly speaking software has not
1:00
changed much on such a fundamental level
1:02
for 70 years. And then it's changed I
1:04
think about twice quite rapidly in the
1:06
last few years. And so there's just a
1:08
huge amount of work to do a huge amount
1:09
of software to write and rewrite. So
1:12
let's take a look at maybe the realm of
1:14
software. So if we kind of think of this
1:16
as like the map of software this is a
1:17
really cool tool called map of GitHub.
1:20
Um this is kind of like all the software
1:21
that's written. Uh these are
1:23
instructions to the computer for
1:24
carrying out tasks in the digital space.
1:26
So if you zoom in here, these are all
1:28
different kinds of repositories and this
1:30
is all the code that has been written.
1:31
And a few years ago I kind of observed
1:33
that um software was kind of changing
1:35
and there was kind of like a new type of
1:37
software around and I called this
1:39
software 2.0 at the time and the idea
1:42
here was that software 1.0 is the code
1:44
you write for the computer. Software 2.0
1:46
know are basically neural networks and
1:48
in particular the weights of a neural
1:50
network and you're not writing this code
1:53
directly you are most you are more kind
1:55
of like tuning the data sets and then
1:56
you're running an optimizer to create to
1:58
create the parameters of this neural net
2:00
and I think like at the time neural nets
2:02
were kind of seen as like just a
2:03
different kind of classifier like a
2:04
decision tree or something like that and
2:06
so I think it was kind of like um I
2:09
think this framing was a lot more
2:10
appropriate and now actually what we
2:12
have is kind of like an equivalent of
2:13
GitHub in the realm of software 2.0 And
2:15
I think the hugging face is basically
2:18
equivalent of GitHub in software 2.0.
2:20
And there's also model atlas and you can
2:22
visualize all the code written there. In
2:24
case you're curious, by the way, the
2:25
giant circle, the point in the middle,
2:28
uh these are the parameters of flux, the
2:30
image generator. And so anytime someone
2:32
tunes a on top of a flux model, you
2:34
basically create a git commit uh in this
2:37
space and uh you create a different kind
2:39
of a image generator. So basically what
2:41
we have is software 1.0 is the computer
2:43
code that programs a computer. Software
2:45
2.0 are the weights which program neural
2:48
networks. Uh and here's an example of
2:50
Alexet image recognizer neural network.
2:53
Now so far all of the neural networks
2:55
that we've been familiar with until
2:56
recently where kind of like fixed
2:58
function computers image to categories
3:01
or something like that. And I think
3:03
what's changed and I think is a quite
3:05
fundamental change is that neural
3:06
networks became programmable with large
3:09
language models. And so I I see this as
3:12
quite new, unique. It's a new kind of a
3:14
computer and uh so in my mind it's uh
3:18
worth giving it a new designation of
3:19
software 3.0. And basically your prompts
3:22
are now programs that program the LLM.
3:25
And uh remarkably uh these uh prompts
3:28
are written in English. So it's kind of
3:30
a very interesting programming language.
3:33
Um so maybe uh to summarize the
3:36
difference if you're doing sentiment
3:37
classification for example you can
3:39
imagine writing some uh amount of Python
3:42
to to basically do sentiment
3:44
classification or you can train a neural
3:46
net or you can prompt a large language
3:47
model. Uh so here this is a few short
3:50
prompt and you can imagine changing it
3:51
and programming the computer in a
3:52
slightly different way. So basically we
3:54
have software 1.0 software 2.0 and I
3:57
think we're seeing maybe you've seen a
3:59
lot of GitHub code is not just like code
4:01
anymore. there's a bunch of like English
4:03
interspersed with code and so I think
4:05
kind of there's a growing category of
4:07
new kind of code. So not only is it a
4:09
new programming paradigm, it's also
4:10
remarkable to me that it's in our native
4:12
language of English. And so when this
4:14
blew my mind a few uh I guess years ago
4:17
now I tweeted this and um I think it
4:20
captured the attention of a lot of
4:21
people and this is my currently pinned
4:23
tweet uh is that remarkably we're now
4:25
programming computers in English. Now,
4:28
when I was at uh Tesla, um we were
4:31
working on the uh autopilot and uh we
4:34
were trying to get the car to drive and
4:37
I sort of showed this slide at the time
4:39
where you can imagine that the inputs to
4:41
the car are on the bottom and they're
4:43
going through a software stack to
4:44
produce the steering and acceleration
4:47
and I made the observation at the time
4:48
that there was a ton of C++ code around
4:51
in the autopilot which was the software
4:52
1.0 code and then there was some neural
4:54
nets in there doing image recognition
4:56
and uh I kind of observed that over time
4:58
as we made the autopilot better
5:00
basically the neural network grew in
5:02
capability and size and in addition to
5:05
that all the C++ code was being deleted
5:08
and kind of like was um and a lot of the
5:12
kind of capabilities and functionality
5:14
that was originally written in 1.0 was
5:16
migrated to 2.0. So as an example, a lot
5:19
of the stitching up of information
5:20
across images from the different cameras
5:22
and across time was done by a neural
5:24
network and we were able to delete a lot
5:26
of code and so the software 2.0 stack
5:29
quite literally ate through the software
5:32
stack of the autopilot. So I thought
5:34
this was really remarkable at the time
5:35
and I think we're seeing the same thing
5:37
again where uh basically we have a new
5:39
kind of software and it's eating through
5:40
the stack. We have three completely
5:42
different programming paradigms and I
5:44
think if you're entering the industry
5:45
it's a very good idea to be fluent in
5:47
all of them because they all have slight
5:49
pros and cons and you may want to
5:50
program some functionality in 1.0 or 2.0
5:53
or 3.0. Are you going to train
5:54
neurallet? Are you going to just prompt
5:55
an LLM? Should this be a piece of code
5:57
that's explicit etc. So we all have to
5:59
make these decisions and actually
6:00
potentially uh fluidly trans transition
6:03
between these paradigms. So what I
6:06
wanted to get into now is first I want
6:09
to in the first part talk about LLMs and
6:11
how to kind of like think of this new
6:13
paradigm and the ecosystem and what that
6:15
looks like. Uh like what are what is
6:17
this new computer? What does it look
6:18
like and what does the ecosystem look
6:20
like? Um I was struck by this quote from
6:23
Anduring actually uh many years ago now
6:25
I think and I think Andrew is going to
6:27
be speaking right after me. Uh but he
6:29
said at the time AI is the new
6:30
electricity and I do think that it um
6:33
kind of captures something very
6:34
interesting in that LLMs certainly feel
6:36
like they have properties of utilities
6:38
right now. So
6:41
um LLM labs like OpenAI, Gemini,
6:44
Enthropic etc. They spend capex to train
6:47
the LLMs and this is kind of equivalent
6:48
to building out a grid and then there's
6:51
opex to serve that intelligence over
6:53
APIs to all of us and this is done
6:56
through metered access where we pay per
6:58
million tokens or something like that
7:00
and we have a lot of demands that are
7:01
very utility- like demands out of this
7:03
API we demand low latency high uptime
7:06
consistent quality etc. In electricity,
7:08
you would have a transfer switch. So you
7:10
can transfer your electricity source
7:12
from like grid and solar or battery or
7:14
generator. In LLM, we have maybe open
7:16
router and easily switch between the
7:18
different types of LLMs that exist.
7:20
Because the LLM are software, they don't
7:23
compete for physical space. So it's okay
7:25
to have basically like six electricity
7:26
providers and you can switch between
7:28
them, right? Because they don't compete
7:29
in such a direct way. And I think what's
7:31
also a little fascinating and we saw
7:33
this in the last few days actually a lot
7:36
of the LLMs went down and people were
7:38
kind of like stuck and unable to work.
7:41
And uh I think it's kind of fascinating
7:42
to me that when the state-of-the-art
7:43
LLMs go down, it's actually kind of like
7:45
an intelligence brownout in the world.
7:47
It's kind of like when the voltage is
7:49
unreliable in the grid and uh the planet
7:52
just gets dumber the more reliance we
7:55
have on these models, which already is
7:56
like really dramatic and I think will
7:58
continue to grow. But LLM's don't only
8:00
have properties of utilities. I think
8:02
it's also fair to say that they have
8:03
some properties of fabs. And the reason
8:06
for this is that the capex required for
8:09
building LLM is actually quite large. Uh
8:12
it's not just like building some uh
8:14
power station or something like that,
8:15
right? You're investing a huge amount of
8:17
money and I think the tech tree and uh
8:20
for the technology is growing quite
8:22
rapidly. So we're in a world where we
8:24
have sort of deep tech trees, research
8:26
and development secrets that are
8:28
centralizing inside the LLM labs. Um and
8:32
but I think the analogy muddies a little
8:34
bit also because as I mentioned this is
8:36
software and software is a bit less
8:38
defensible because it is so malleable.
8:40
And so um I think it's just an
8:43
interesting kind of thing to think about
8:44
potentially. There's many analogy
8:46
analogies you can make like a 4
8:48
nanometer process node maybe is
8:49
something like a cluster with certain
8:51
max flops. You can think about when
8:53
you're use when you're using Nvidia GPUs
8:54
and you're only doing the software and
8:56
you're not doing the hardware. That's
8:57
kind of like the fabless model. But if
8:59
you're actually also building your own
9:00
hardware and you're training on TPUs if
9:02
you're Google, that's kind of like the
9:03
Intel model where you own your fab. So I
9:05
think there's some analogies here that
9:06
make sense. But actually I think the
9:08
analogy that makes the most sense
9:09
perhaps is that in my mind LLM have very
9:12
strong kind of analogies to operating
9:15
systems. Uh in that this is not just
9:17
electricity or water. It's not something
9:19
that comes out of the tap as a
9:20
commodity. uh this is these are now
9:22
increasingly complex software ecosystems
9:25
right so uh they're not just like simple
9:28
commodities like electricity and it's
9:30
kind of interesting to me that the
9:32
ecosystem is shaping in a very similar
9:33
kind of way where you have a few closed
9:36
source providers like Windows or Mac OS
9:38
and then you have an open source
9:39
alternative like Linux and I think for u
9:42
neural for LLMs as well we have a kind
9:45
of a few competing closed source
9:47
providers and then maybe the llama
9:49
ecosystem is currently like maybe a
9:51
close approximation to something that
9:53
may grow into something like Linux.
9:55
Again, I think it's still very early
9:56
because these are just simple LLMs, but
9:58
we're starting to see that these are
9:59
going to get a lot more complicated.
10:01
It's not just about the LLM itself. It's
10:02
about all the tool use and the
10:03
multiodalities and how all of that
10:05
works. And so when I sort of had this
10:07
realization a while back, I tried to
10:09
sketch it out and it kind of seemed to
10:11
me like LLMs are kind of like a new
10:12
operating system, right? So the LLM is a
10:15
new kind of a computer. It's sitting
10:17
it's kind of like the CPU equivalent. uh
10:19
the context windows are kind of like the
10:21
memory and then the LLM is orchestrating
10:24
memory and compute uh for problem
10:26
solving um using all of these uh
10:29
capabilities here and so definitely if
10:32
you look at it looks very much like
10:34
operating system from that perspective.
10:36
Um, a few more analogies. For example,
10:38
if you want to download an app, say I go
10:41
to VS Code and I go to download, you can
10:43
download VS Code and you can run it on
10:46
Windows, Linux or or Mac in the same way
10:50
as you can take an LLM app like cursor
10:53
and you can run it on GPT or cloud or
10:55
Gemini series, right? It's just a drop
10:57
down. So, it's kind of like similar in
10:59
that way as well.
11:00
uh more analogies that I think strike me
11:02
is that we're kind of like in this
11:04
1960sish
11:05
era where LLM compute is still very
11:09
expensive for this new kind of a
11:10
computer and that forces the LLMs to be
11:13
centralized in the cloud and we're all
11:15
just uh sort of thing clients that
11:18
interact with it over the network and
11:20
none of us have full utilization of
11:22
these computers and therefore it makes
11:24
sense to use time sharing where we're
11:26
all just you know a dimension of the
11:28
batch when they're running the computer
11:30
in the cloud. And this is very much what
11:32
computers used to look like at during
11:33
this time. The operating systems were in
11:35
the cloud. Everything was streamed
11:36
around and there was batching. And so
11:39
the p the personal computing revolution
11:41
hasn't happened yet because it's just
11:42
not economical. It doesn't make sense.
11:44
But I think some people are trying. And
11:46
it turns out that Mac minis, for
11:48
example, are a very good fit for some of
11:50
the LLMs because it's all if you're
11:52
doing batch one inference, this is all
11:53
super memory bound. So this actually
11:55
works.
11:56
And uh I think these are some early
11:58
indications maybe of personal computing.
12:00
Uh but this hasn't really happened yet.
12:02
It's not clear what this looks like.
12:03
Maybe some of you get to invent what
12:05
what this is or how it works or uh what
12:08
this should what this should be. Maybe
12:10
one more analogy that I'll mention is
12:12
whenever I talk to Chach or some LLM
12:14
directly in text, I feel like I'm
12:16
talking to an operating system through
12:18
the terminal. Like it's just it's it's
12:21
text. It's direct access to the
12:22
operating system. And I think a guey
12:24
hasn't yet really been invented in like
12:26
a general way like should chatt have a
12:29
guey like different than just a tech
12:31
bubbles. Uh certainly some of the apps
12:33
that we're going to go into in a bit
12:35
have guey but there's no like guey
12:38
across all the tasks if that makes
12:40
sense. Um there are some ways in which
12:43
LLMs are different from kind of
12:45
operating systems in some fairly unique
12:47
way and from early computing. And I
12:49
wrote about uh this one particular
12:52
property that strikes me as very
12:54
different uh this time around. It's that
12:57
LLMs like flip they flip the direction
12:59
of technology diffusion uh that is
13:02
usually uh present in technology. So for
13:05
example with electricity, cryptography,
13:07
computing, flight, internet, GPS, lots
13:09
of new transformative technologies that
13:10
have not been around. Typically it is
13:12
the government and corporations that are
13:14
the first users because it's new and
13:16
expensive etc. and it only later
13:18
diffuses to consumer. Uh, but I feel
13:20
like LLMs are kind of like flipped
13:22
around. So maybe with early computers,
13:24
it was all about ballistics and military
13:26
use, but with LLMs, it's all about how
13:29
do you boil an egg or something like
13:30
that. This is certainly like a lot of my
13:32
use. And so it's really fascinating to
13:33
me that we have a new magical computer
13:35
and it's like helping me boil an egg.
13:37
It's not helping the government do
13:38
something really crazy like some
13:40
military ballistics or some special
13:42
technology. Indeed, corporations are
13:43
governments are lagging behind the
13:45
adoption of all of us, of all of these
13:47
technologies. So, it's just backwards
13:48
and I think it informs maybe some of the
13:50
uses of how we want to use this
13:52
technology or like where are some of the
13:53
first apps and so on.
13:56
So, in summary so far, LLM labs LLMs. I
14:01
think it's accurate language to use, but
14:03
LLMs are complicated operating systems.
14:06
They're circa 1960s in computing and
14:08
we're redoing computing all over again.
14:10
and they're currently available via time
14:11
sharing and distributed like a utility.
14:13
What is new and unprecedented is that
14:16
they're not in the hands of a few
14:17
governments and corporations. They're in
14:18
the hands of all of us because we all
14:20
have a computer and it's all just
14:21
software and Chaship was beamed down to
14:24
our computers like billions of people
14:26
like instantly and overnight and this is
14:28
insane. Uh and it's kind of insane to me
14:30
that this is the case and now it is our
14:33
time to enter the industry and program
14:34
these computers. This is crazy. So I
14:37
think this is quite remarkable. Before
14:39
we program LLMs, we have to kind of like
14:42
spend some time to think about what
14:43
these things are. And I especially like
14:45
to kind of talk about their psychology.
14:48
So the way I like to think about LLMs is
14:50
that they're kind of like people
14:51
spirits. Um they are stoastic
14:54
simulations of people. Um and the
14:56
simulator in this case happens to be an
14:58
auto reggressive transformer. So
14:59
transformer is a neural net. Uh it's and
15:02
it just kind of like is goes on the
15:04
level of tokens. It goes chunk chunk
15:06
chunk chunk chunk. And there's an almost
15:08
equal amount of compute for every single
15:10
chunk. Um and um this simulator of
15:14
course is is just is basically there's
15:16
some weights involved and we fit it to
15:19
all of text that we have on the internet
15:20
and so on. And you end up with this kind
15:22
of a simulator and because it is trained
15:24
on humans, it's got this emergent
15:26
psychology that is humanlike. So the
15:28
first thing you'll notice is of course
15:30
uh LLM have encyclopedic knowledge and
15:32
memory. uh and they can remember lots of
15:34
things, a lot more than any single
15:36
individual human can because they read
15:37
so many things. It's it actually kind of
15:39
reminds me of this movie Rainman, which
15:41
I actually really recommend people
15:43
watch. It's an amazing movie. I love
15:44
this movie. Um and Dustin Hoffman here
15:46
is an autistic savant who has almost
15:49
perfect memory. So, he can read a he can
15:51
read like a phone book and remember all
15:53
of the names and phone numbers. And I
15:55
kind of feel like LM are kind of like
15:57
very similar. They can remember Shaw
15:58
hashes and lots of different kinds of
16:00
things very very easily. So they
16:02
certainly have superpowers in some set
16:04
in some respects. But they also have a
16:06
bunch of I would say cognitive deficits.
16:08
So they hallucinate quite a bit. Um and
16:11
they kind of make up stuff and don't
16:13
have a very good uh sort of internal
16:15
model of self-nowledge, not sufficient
16:17
at least. And this has gotten better but
16:19
not perfect. They display jagged
16:21
intelligence. So they're going to be
16:22
superhuman in some problems solving
16:24
domains. And then they're going to make
16:26
mistakes that basically no human will
16:27
make. like you know they will insist
16:29
that 9.11 is greater than 9.9 or that
16:32
there are two Rs in strawberry these are
16:34
some famous examples but basically there
16:36
are rough edges that you can trip on so
16:38
that's kind of I think also kind of
16:40
unique um they also kind of suffer from
16:43
entrograde amnesia um so uh and I think
16:46
I'm alluding to the fact that if you
16:48
have a co-orker who joins your
16:49
organization this co-orker will over
16:51
time learn your organization and uh they
16:54
will understand and gain like a huge
16:55
amount of context on the organization
16:57
and they go home and they sleep and they
16:59
consolidate knowledge and they develop
17:01
expertise over time. LLMs don't natively
17:03
do this and this is not something that
17:04
has really been solved in the R&D of
17:06
LLM. I think um and so context windows
17:09
are really kind of like working memory
17:10
and you have to sort of program the
17:12
working memory quite directly because
17:13
they don't just kind of like get smarter
17:15
by uh by default and I think a lot of
17:17
people get tripped up by the analogies
17:19
uh in this way. Uh in popular culture I
17:22
recommend people watch these two movies
17:23
uh Momento and 51st dates. In both of
17:26
these movies, the protagonists, their
17:27
weights are fixed and their context
17:29
windows gets wiped every single morning
17:32
and it's really problematic to go to
17:34
work or have relationships when this
17:35
happens and this happens to all the
17:37
time. I guess one more thing I would
17:39
point to is security kind of related
17:42
limitations of the use of LLM. So for
17:44
example, LLMs are quite gullible. Uh
17:46
they are susceptible to prompt injection
17:48
risks. They might leak your data etc.
17:50
And so um and there's many other
17:52
considerations uh security related. So,
17:55
so basically long story short, you have
17:57
to load your you have to load your you
18:00
have to simultaneously think through
18:01
this superhuman thing that has a bunch
18:03
of cognitive deficits and issues. How do
18:05
we and yet they are extremely like
18:07
useful and so how do we program them and
18:10
how do we work around their deficits and
18:12
enjoy their superhuman powers.
18:15
So what I want to switch to now is talk
18:17
about the opportunities of how do we use
18:18
these models and what are some of the
18:20
biggest opportunities. This is not a
18:22
comprehensive list just some of the
18:23
things that I thought were interesting
18:24
for this talk. The first thing I'm kind
18:26
of excited about is what I would call
18:29
partial autonomy apps. So for example,
18:32
let's work with the example of coding.
18:34
You can certainly go to chacht directly
18:36
and you can start copy pasting code
18:38
around and copyping bug reports and
18:40
stuff around and getting code and copy
18:42
pasting everything around. Why would you
18:44
why would you do that? Why would you go
18:45
directly to the operating system? It
18:47
makes a lot more sense to have an app
18:48
dedicated for this. And so I think many
18:50
of you uh use uh cursor. I do as well.
18:53
And uh cursor is kind of like the thing
18:56
you want instead. You don't want to just
18:57
directly go to the chash apt. And I
18:59
think cursor is a very good example of
19:01
an early LLM app that has a bunch of
19:03
properties that I think are um useful
19:06
across all the LLM apps. So in
19:08
particular, you will notice that we have
19:09
a traditional interface that allows a
19:12
human to go in and do all the work
19:13
manually just as before. But in addition
19:16
to that, we now have this LLM
19:17
integration that allows us to go in
19:19
bigger chunks. And so some of the
19:21
properties of LLM apps that I think are
19:23
shared and useful to point out. Number
19:25
one, the LLMs basically do a ton of the
19:28
context management. Um, number two, they
19:31
orchestrate multiple calls to LLMs,
19:33
right? So in the case of cursor, there's
19:34
under the hood embedding models for all
19:36
your files, the actual chat models,
19:39
models that apply diffs to the code, and
19:41
this is all orchestrated for you. A
19:43
really big one that uh I think also
19:46
maybe not fully appreciated always is
19:48
application specific uh GUI and the
19:50
importance of it. Um because you don't
19:53
just want to talk to the operating
19:54
system directly in text. Text is very
19:56
hard to read, interpret, understand and
19:59
also like you don't want to take some of
20:00
these actions natively in text. So it's
20:03
much better to just see a diff as like
20:05
red and green change and you can see
20:06
what's being added is subtracted. It's
20:08
much easier to just do command Y to
20:10
accept or command N to reject. I
20:11
shouldn't have to type it in text,
20:13
right? So, a guey allows a human to
20:15
audit the work of these fallible systems
20:17
and to go faster. I'm going to come back
20:20
to this point a little bit uh later as
20:21
well. And the last kind of feature I
20:23
want to point out is that there's what I
20:25
call the autonomy slider. So, for
20:27
example, in cursor, you can just do tap
20:29
completion. You're mostly in charge. You
20:31
can select a chunk of code and command K
20:33
to change just that chunk of code. You
20:36
can do command L to change the entire
20:37
file. Or you can do command I which just
20:40
you know let it rip do whatever you want
20:42
in the entire repo and that's the sort
20:44
of full autonomy agent agentic version
20:46
and so you are in charge of the autonomy
20:48
slider and depending on the complexity
20:50
of the task at hand you can uh tune the
20:53
amount of autonomy that you're willing
20:54
to give up uh for that task maybe to
20:57
show one more example of a fairly
20:58
successful LLM app uh perplexity um it
21:03
also has very similar features to what
21:04
I've just pointed out to in cursor uh it
21:07
packages up a lot of the information. It
21:08
orchestrates multiple LLMs. It's got a
21:10
GUI that allows you to audit some of its
21:13
work. So, for example, it will site
21:15
sources and you can imagine inspecting
21:17
them. And it's got an autonomy slider.
21:18
You can either just do a quick search or
21:20
you can do research or you can do deep
21:22
research and come back 10 minutes later.
21:24
So, this is all just varying levels of
21:25
autonomy that you give up to the tool.
21:27
So, I guess my question is I feel like a
21:30
lot of software will become partially
21:32
autonomous. I'm trying to think through
21:33
like what does that look like? And for
21:35
many of you who maintain products and
21:36
services, how are you going to make your
21:38
products and services partially
21:40
autonomous? Can an LLM see everything
21:42
that a human can see? Can an LLM act in
21:45
all the ways that a human could act? And
21:47
can humans supervise and stay in the
21:49
loop of this activity? Because again,
21:50
these are fallible systems that aren't
21:52
yet perfect. And what does a diff look
21:54
like in Photoshop or something like
21:56
that? You know, and also a lot of the
21:58
traditional software right now, it has
22:00
all these switches and all this kind of
22:01
stuff that's all designed for human. All
22:03
of this has to change and become
22:04
accessible to LLMs.
22:07
So, one thing I want to stress with a
22:09
lot of these LLM apps that I'm not sure
22:11
gets as much attention as it should is
22:14
um we we're now kind of like cooperating
22:16
with AIS and usually they are doing the
22:18
generation and we as humans are doing
22:20
the verification. It is in our interest
22:22
to make this loop go as fast as
22:24
possible. So, we're getting a lot of
22:25
work done. There are two major ways that
22:28
I think uh this can be done. Number one,
22:30
you can speed up verification a lot. Um,
22:32
and I think guies, for example, are
22:34
extremely important to this because a
22:36
guey utilizes your computer vision GPU
22:39
in all of our head. Reading text is
22:41
effortful and it's not fun, but looking
22:43
at stuff is fun and it's it's just a
22:45
kind of like a highway to your brain.
22:47
So, I think guies are very useful for
22:49
auditing systems and visual
22:51
representations in general. And number
22:53
two, I would say is we have to keep the
22:56
AI on the leash. We I think a lot of
22:58
people are getting way over excited with
23:00
AI agents and uh it's not useful to me
23:03
to get a diff of 10,000 lines of code to
23:05
my repo. Like I have to I'm still the
23:07
bottleneck, right? Even though that
23:09
10,00 lines come out instantly, I have
23:11
to make sure that this thing is not
23:12
introducing bugs. It's just like and
23:15
that it's doing the correct thing,
23:16
right? And that there's no security
23:17
issues and so on. So um I think that um
23:22
yeah basically you we have to sort of
23:25
like it's in our interest to make the
23:28
the flow of these two go very very fast
23:30
and we have to somehow keep the AI on
23:32
the leash because it gets way too
23:33
overreactive. It's uh it's kind of like
23:35
this. This is how I feel when I do AI
23:37
assisted coding. If I'm just bite coding
23:39
everything is nice and great but if I'm
23:40
actually trying to get work done it's
23:42
not so great to have an overreactive uh
23:44
agent doing all this kind of stuff. So
23:47
this slide is not very good. I'm sorry,
23:48
but I guess I'm trying to develop like
23:51
many of you some ways of utilizing these
23:53
agents in my coding workflow and to do
23:55
AI assisted coding. And in my own work,
23:58
I'm always scared to get way too big
23:59
diffs. I always go in small incremental
24:02
chunks. I want to make sure that
24:04
everything is good. I want to spin this
24:06
loop very very fast and um I sort of
24:09
work on small chunks of single concrete
24:10
thing. Uh and so I think many of you
24:13
probably are developing similar ways of
24:14
working with the with LLMs.
24:17
Um, I also saw a number of blog posts
24:19
that try to develop these best practices
24:22
for working with LLMs. And here's one
24:24
that I read recently and I thought was
24:25
quite good. And it kind of discussed
24:26
some techniques and some of them have to
24:28
do with how you keep the AI on the
24:29
leash. And so, as an example, if you are
24:32
prompting, if your prompt is vague, then
24:34
uh the AI might not do exactly what you
24:36
wanted and in that case, verification
24:38
will fail. You're going to ask for
24:40
something else. If a verification fails,
24:42
then you're going to start spinning. So
24:43
it makes a lot more sense to spend a bit
24:45
more time to be more concrete in your
24:46
prompts which increases the probability
24:48
of successful verification and you can
24:50
move forward. And so I think a lot of us
24:52
are going to end up finding um kind of
24:54
techniques like this. I think in my own
24:56
work as well I'm currently interested in
24:57
uh what education looks like in um
25:00
together with kind of like now that we
25:01
have AI uh and LLMs what does education
25:04
look like? And I think a a large amount
25:07
of thought for me goes into how we keep
25:09
AI on the leash. I don't think it just
25:11
works to go to chat and be like, "Hey,
25:13
teach me physics." I don't think this
25:14
works because the AI is like gets lost
25:16
in the woods. And so for me, this is
25:18
actually two separate apps. For example,
25:20
there's an app for a teacher that
25:22
creates courses and then there's an app
25:24
that takes courses and serves them to
25:26
students. And in both cases, we now have
25:29
this intermediate artifact of a course
25:31
that is auditable and we can make sure
25:32
it's good. We can make sure it's
25:33
consistent. and the AI is kept on the
25:35
leash with respect to a certain
25:37
syllabus, a certain like um progression
25:40
of projects and so on. And so this is
25:42
one way of keeping the AI on leash and I
25:44
think has a much higher likelihood of
25:45
working and the AI is not getting lost
25:47
in the woods.
25:49
One more kind of analogy I wanted to
25:51
sort of allude to is I'm not I'm no
25:54
stranger to partial autonomy and I kind
25:56
of worked on this I think for five years
25:57
at Tesla and this is also a partial
26:00
autonomy product and shares a lot of the
26:01
features like for example right there in
26:03
the instrument panel is the GUI of the
26:05
autopilot so it's showing me what the
26:07
what the neural network sees and so on
26:09
and we have the autonomy slider where
26:10
over the course of my tenure there we
26:13
did more and more autonomous tasks for
26:15
the user and maybe the story that I
26:18
wanted to tell very briefly is uh
26:21
actually the first time I drove a
26:22
self-driving vehicle was in 2013 and I
26:25
had a friend who worked at Whimo and uh
26:27
he offered to give me a drive around
26:29
Palo Alto. I took this picture using
26:31
Google Glass at the time and many of you
26:33
are so young that you might not even
26:35
know what that is. Uh but uh yeah, this
26:37
was like all the rage at the time. And
26:39
we got into this car and we went for
26:40
about a 30-minute drive around Palo Alto
26:42
highways uh streets and so on. And this
26:45
drive was perfect. There was zero
26:46
interventions and this was 2013 which is
26:49
now 12 years ago. And it kind of struck
26:52
me because at the time when I had this
26:54
perfect drive, this perfect demo, I felt
26:56
like, wow, self-driving is imminent
26:59
because this just worked. This is
27:00
incredible. Um, but here we are 12 years
27:03
later and we are still working on
27:04
autonomy. Um, we are still working on
27:07
driving agents and even now we haven't
27:09
actually like really solved the problem.
27:10
like you may see Whimos going around and
27:12
they look driverless but you know
27:14
there's still a lot of teleoperation and
27:16
a lot of human in the loop of a lot of
27:18
this driving so we still haven't even
27:20
like declared success but I think it's
27:22
definitely like going to succeed at this
27:24
point but it just took a long time and
27:26
so I think like like this is software is
27:29
really tricky I think in the same way
27:31
that driving is tricky and so when I see
27:34
things like oh 2025 is the year of
27:36
agents I get very concerned and I kind
27:38
of feel like you know this is the decade
27:41
of agents and this is going to be quite
27:44
some time. We need humans in the loop.
27:45
We need to do this carefully. This is
27:47
software. Let's be serious here. One
27:51
more kind of analogy that I always think
27:52
through is the Iron Man suit. Uh I think
27:56
this is I always love Iron Man. I think
27:58
it's like so um correct in a bunch of
28:01
ways with respect to technology and how
28:02
it will play out. And what I love about
28:04
the Iron Man suit is that it's both an
28:05
augmentation and Tony Stark can drive it
28:08
and it's also an agent. And in some of
28:10
the movies, the Iron Man suit is quite
28:11
autonomous and can fly around and find
28:13
Tony and all this kind of stuff. And so
28:15
this is the autonomy slider is we can be
28:17
we can build augmentations or we can
28:19
build agents and we kind of want to do a
28:21
bit of both. But at this stage I would
28:23
say working with fallible LLMs and so
28:25
on. I would say you know it's less Iron
28:29
Man robots and more Iron Man suits that
28:31
you want to build. It's less like
28:33
building flashy demos of autonomous
28:35
agents and more building partial
28:36
autonomy products. And these products
28:39
have custom gueies and UIUX. And we're
28:41
trying to um and this is done so that
28:43
the generation verification loop of the
28:45
human is very very fast. But we are not
28:48
losing the sight of the fact that it is
28:49
in principle possible to automate this
28:51
work. And there should be an autonomy
28:52
slider in your product. And you should
28:54
be thinking about how you can slide that
28:55
autonomy slider and make your product uh
28:58
sort of um more autonomous over time.
29:01
But this is kind of how I think there's
29:02
lots of opportunities in these kinds of
29:04
products. I want to now switch gears a
29:06
little bit and talk about one other
29:08
dimension that I think is very unique.
29:09
Not only is there a new type of
29:11
programming language that allows for
29:12
autonomy in software but also as I
29:15
mentioned it's programmed in English
29:16
which is this natural interface and
29:19
suddenly everyone is a programmer
29:20
because everyone speaks natural language
29:22
like English. So this is extremely
29:24
bullish and very interesting to me and
29:26
also completely unprecedented. I would
29:28
say it it used to be the case that you
29:29
need to spend five to 10 years studying
29:31
something to be able to do something in
29:32
software. this is not the case anymore.
29:35
So, I don't know if by any chance anyone
29:37
has heard of vibe coding.
29:40
Uh, this this is the tweet that kind of
29:42
like introduced this, but I'm told that
29:44
this is now like a major meme. Um, fun
29:46
story about this is that I've been on
29:49
Twitter for like 15 years or something
29:51
like that at this point and I still have
29:53
no clue which tweet will become viral
29:56
and which tweet like fizzles and no one
29:58
cares. And I thought that this tweet was
30:00
going to be the latter. I don't know. It
30:01
was just like a shower of thoughts. But
30:03
this became like a total meme and I
30:05
really just can't tell. But I guess like
30:06
it struck a chord and it gave a name to
30:08
something that everyone was feeling but
30:10
couldn't quite say in words. So now
30:13
there's a Wikipedia page and everything.
30:17
This is like
30:18
[Applause]
30:25
yeah this is like a major contribution
30:27
now or something like that. So,
30:30
um, so Tom Wolf from HuggingFace shared
30:32
this beautiful video that I really love.
30:34
Um,
30:37
these are kids vibe coding.
30:42
And I find that this is such a wholesome
30:44
video. Like, I love this video. Like,
30:46
how can you look at this video and feel
30:48
bad about the future? The future is
30:49
great.
30:52
I think this will end up being like a
30:53
gateway drug to software development.
30:56
Um, I'm not a doomer about the future of
30:59
the generation and I think yeah, I love
31:02
this video. So, I tried by coding a
31:04
little bit uh as well because it's so
31:07
fun. Uh, so bike coding is so great when
31:09
you want to build something super duper
31:10
custom that doesn't appear to exist and
31:12
you just want to wing it because it's a
31:13
Saturday or something like that. So, I
31:15
built this uh iOS app and I don't I
31:18
can't actually program in Swift, but I
31:20
was really shocked that I was able to
31:21
build like a super basic app and I'm not
31:23
going to explain it. It's really uh
31:24
dumb, but uh I kind of like this was
31:27
just like a day of work and this was
31:28
running on my phone like later that day
31:30
and I was like, "Wow, this is amazing."
31:32
I didn't have to like read through Swift
31:33
for like five days or something like
31:35
that to like get started. I also
31:38
vipcoded this app called Menu Genen. And
31:40
this is live. You can try it in
31:41
menu.app. And I basically had this
31:44
problem where I show up at a restaurant,
31:45
I read through the menu, and I have no
31:46
idea what any of the things are. And I
31:48
need pictures. So this doesn't exist. So
31:51
I was like, "Hey, I'm going to bite code
31:52
it." So, um, this is what it looks like.
31:55
You go to menu.app,
31:58
um, and, uh, you take a picture of a of
32:01
a menu and then menu generates the
32:03
images and everyone gets $5 in credits
32:06
for free when you sign up. And
32:08
therefore, this is a major cost center
32:10
in my life. So, this is a negative
32:13
negative uh, revenue app for me right
32:16
now.
32:17
I've lost a huge amount of money on
32:19
menu.
32:21
Okay. But the fascinating thing about
32:23
menu genen for me is that the code of
32:28
the v the vite coding part the code was
32:30
actually the easy part of v of v coding
32:32
menu and most of it actually was when I
32:35
tried to make it real so that you can
32:36
actually have authentication and
32:37
payments and the domain name and averal
32:39
deployment. This was really hard and all
32:41
of this was not code. All of this devops
32:44
stuff was in me in the browser clicking
32:47
stuff and this was extreme slo and took
32:49
another week. So it was really
32:51
fascinating that I had the menu genen um
32:54
basically demo working on my laptop in a
32:57
few hours and then it took me a week
32:59
because I was trying to make it real and
33:01
the reason for this is this was just
33:02
really annoying. Um, so for example, if
33:05
you try to add Google login to your web
33:07
page, I know this is very small, but
33:09
just a huge amount of instructions of
33:11
this clerk library telling me how to
33:13
integrate this. And this is crazy. Like
33:15
it's telling me go to this URL, click on
33:17
this dropdown, choose this, go to this,
33:19
and click on that. And it's like telling
33:21
me what to do. Like a computer is
33:22
telling me the actions I should be
33:24
taking. Like you do it. Why am I doing
33:26
this?
33:28
What the hell?
33:31
I had to follow all these instructions.
33:33
This was crazy. So I think the last part
33:36
of my talk therefore focuses on can we
33:39
just build for agents? I don't want to
33:41
do this work. Can agents do this? Thank
33:44
you.
33:46
Okay. So roughly speaking, I think
33:48
there's a new category of consumer and
33:50
manipulator of digital information. It
33:53
used to be just humans through GUIs or
33:55
computers through APIs. And now we have
33:57
a completely new thing and agents are
34:00
they're computers but they are humanlike
34:02
kind of right they're people spirits
34:04
there's people spirits on the internet
34:05
and they need to interact with our
34:06
software infrastructure like can we
34:08
build for them it's a new thing so as an
34:10
example you can have robots.txt on your
34:12
domain and you can instruct uh or like
34:15
advise I suppose um uh web crawlers on
34:18
how to behave on your website in the
34:19
same way you can have maybe lm.txt txt
34:21
file which is just a simple markdown
34:23
that's telling LLMs what this domain is
34:25
about and this is very readable to a to
34:28
an LLM. If it had to instead get the
34:30
HTML of your web page and try to parse
34:32
it, this is very errorprone and
34:33
difficult and will screw it up and it's
34:35
not going to work. So we can just
34:36
directly speak to the LLM. It's worth
34:38
it. Um a huge amount of documentation is
34:41
currently written for people. So you
34:42
will see things like lists and bold and
34:45
pictures and this is not directly
34:47
accessible by an LLM. So I see some of
34:51
the services now are transitioning a lot
34:52
of the their docs to be specifically for
34:54
LLMs. So Versell and Stripe as an
34:57
example are early movers here but there
34:59
are a few more that I've seen already
35:01
and they offer their documentation in
35:04
markdown. Markdown is super easy for LMS
35:06
to understand. This is great. Um maybe
35:10
one simple example from from uh my
35:12
experience as well. Maybe some of you
35:14
know three blue one brown. He makes
35:15
beautiful animation videos on YouTube.
35:19
[Applause]
35:23
Yeah, I love this library. So that he
35:25
wrote uh Manon and I wanted to make my
35:27
own and uh there's extensive
35:30
documentations on how to use manon and
35:32
so I didn't want to actually read
35:34
through it. So I copy pasted the whole
35:35
thing to an LLM and I described what I
35:37
wanted and it just worked out of the box
35:39
like LLM just bcoded me an animation
35:41
exactly what I wanted and I was like wow
35:43
this is amazing. So if we can make docs
35:45
legible to LLMs, it's going to unlock a
35:48
huge amount of um kind of use and um I
35:51
think this is wonderful and should
35:52
should happen more. The other thing I
35:55
wanted to point out is that you do
35:56
unfortunately have to it's not just
35:57
about taking your docs and making them
35:58
appear in markdown. That's the easy
36:00
part. We actually have to change the
36:01
docs because anytime your docs say click
36:04
this is bad. An LLM will not be able to
36:06
natively take this action right now. So,
36:09
Verscell, for example, is replacing
36:11
every occurrence of click with an
36:13
equivalent curl command that your LM
36:15
agent could take on your behalf. Um, and
36:18
so I think this is very interesting. And
36:19
then, of course, there's a model context
36:21
protocol from Enthropic. And this is
36:23
also another way, it's a protocol of
36:24
speaking directly to agents as this new
36:26
consumer and manipulator of digital
36:28
information. So, I'm very bullish on
36:29
these ideas. The other thing I really
36:31
like is a number of little tools here
36:33
and there that are helping ingest data
36:36
that in like very LLM friendly formats.
36:38
So for example, when I go to a GitHub
36:40
repo like my nanoGPT repo, I can't feed
36:42
this to an LLM and ask questions about
36:44
it uh because it's you know this is a
36:46
human interface on GitHub. So when you
36:48
just change the URL from GitHub to get
36:50
ingest then uh this will actually
36:52
concatenate all the files into a single
36:54
giant text and it will create a
36:55
directory structure etc. And this is
36:57
ready to be copy pasted into your
36:59
favorite LLM and you can do stuff. Maybe
37:01
even more dramatic example of this is
37:03
deep wiki where it's not just the raw
37:05
content of these files. uh this is from
37:08
Devon but also like they have Devon
37:10
basically do analysis of the GitHub repo
37:12
and Devon basically builds up a whole
37:14
docs uh pages just for your repo and you
37:18
can imagine that this is even more
37:19
helpful to copy paste into your LLM. So
37:22
I love all the little tools that
37:23
basically where you just change the URL
37:24
and it makes something accessible to an
37:26
LLM. So this is all well and great and u
37:29
I think there should be a lot more of
37:30
it. One more note I wanted to make is
37:32
that it is absolutely possible that in
37:35
the future LLMs will be able to this is
37:38
not even future this is today they'll be
37:39
able to go around and they'll be able to
37:40
click stuff and so on but I still think
37:42
it's very worth u basically meeting LLM
37:46
halfway LLM's halfway and making it
37:48
easier for them to access all this
37:49
information uh because this is still
37:51
fairly expensive I would say to use and
37:54
uh a lot more difficult and so I do
37:56
think that lots of software there will
37:58
be a long tail where it won't like adapt
38:00
apps because these are not like live
38:02
player sort of repositories or digital
38:04
infrastructure and we will need these
38:06
tools. Uh but I think for everyone else
38:08
I think it's very worth kind of like
38:09
meeting in some middle point. So I'm
38:11
bullish on both if that makes sense.
38:14
So in summary, what an amazing time to
38:17
get into the industry. We need to
38:18
rewrite a ton of code. A ton of code
38:20
will be written by professionals and by
38:23
coders. These LLMs are kind of like
38:25
utilities, kind of like fabs, but
38:27
they're kind of especially like
38:28
operating systems. But it's so early.
38:30
It's like 1960s of operating systems and
38:34
uh and I think a lot of the analogies
38:36
cross over. Um and these LMS are kind of
38:38
like these fallible uh you know people
38:41
spirits that we have to learn to work
38:43
with. And in order to do that properly,
38:45
we need to adjust our infrastructure
38:47
towards it. So when you're building
38:48
these LLM apps, I describe some of the
38:50
ways of working effectively with these
38:52
LLMs and some of the tools that make
38:54
that uh kind of possible and how you can
38:57
spin this loop very very quickly and
38:59
basically create partial tunneling
39:00
products and then um yeah, a lot of code
39:03
has to also be written for the agents
39:04
more directly. But in any case, going
39:07
back to the Iron Man suit analogy, I
39:09
think what we'll see over the next
39:10
decade roughly is we're going to take
39:12
the slider from left to right. And I'm
39:15
very interesting. It's going to be very
39:17
interesting to see what that looks like.
39:19
And I can't wait to build it with all of
39:21
you. Thank you.
— end of transcript —
Advertisement