Andrej Karpathy: From Vibe Coding to Agentic Engineering
Sequoia Capital · May 10, 2026
Transcript
0:02
We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He actually helped co-found OpenAI right inside of this office, was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift of making the most complex technical shifts feel both accessible and inevitable. You all know him for having coined the term vibe coding last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.
0:44
>> Yeah, hello. Excited to be here and to kick us off.
0:47
>> Okay. So just a couple months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?
1:00
>> Yeah, a mixture of both for sure. Well, first of all, like many of you, I've been using agentic tools, Claude Code and adjacent things, for a while, maybe over the last year as they came out. They were very good at chunks of code, and sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point where, for me, I was on a break so I had a bit more time, and I think many other people were similar. I just started to notice that with the latest models the chunks came out fine, and then I kept asking for more and it just came out fine, and I can't remember the last time I corrected it. I just trusted the system more and more, and then I was vibe coding. [laughter] So I do think it was a very stark transition. I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing, but you really had to look again as of December, because things had changed fundamentally, especially this agentic, coherent workflow that really started to actually work. It was that realization that had me go down the whole rabbit hole of infinity side projects. My side projects folder is extremely full of lots of random things, and I've been vibe coding all the time. So that kind of happened in December, and I've been looking at the repercussions since.
2:28
>> You've talked a lot about this idea of LLMs as a new computer: that this isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules, software 2.0 was learned weights, and software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe it?
2:50
>> Right, so exactly. In software 1.0 I'm writing code; in software 2.0 I'm programming by creating data sets and training neural networks, so the programming is arranging data sets, objectives, and neural network architectures. And then what happened is that if you train one of these GPT models, or LLMs, on a sufficiently large set of tasks (implicitly, because by training on the internet you have to multitask all the things that are in the data set), these actually become kind of like a programmable computer in a certain sense. So software 3.0 is about your programming turning into prompting: what's in the context window is your lever over the interpreter that is the LLM, which interprets your context and performs computation in digital information space. So that's the transition, and there are a few examples that really drove it home for me that might be instructive.
3:44
For example, when OpenClaw came out, you would expect that installing it would normally be a bash script, a shell script: run the shell script to install OpenClaw. But in order to target lots of different platforms and lots of different kinds of computers you might run OpenClaw on, these shell scripts usually balloon up and become extremely complex. And you're still stuck in a software 1.0 universe of wanting to write the code. The actual OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. It's basically a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. The reason this is so much more powerful is that you're now working in the software 3.0 paradigm, where you don't have to precisely spell out all the individual details of that setup. The agent has its own intelligence that it brings along: it follows the instructions, looks at your environment and your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful. So that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent? That's the programming paradigm now.
4:50
One more example that comes to mind, even more extreme than that, is when I was building MenuGen. MenuGen is this idea where you come to a restaurant and they give you a menu, and there are usually no pictures, so I don't know what any of these things are; usually like 30%, 50% of the items I have no idea about. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense. And so I vibe-coded this app that lets you upload a photo, runs on Vercel, and basically re-renders the menu: it OCRs all the different titles, uses an image generator to get pictures of them, and then shows them to you. And then I saw the software 3.0 version of this, which blew my mind, which is literally: take your photo, give it to Gemini, and say "use Nano Banana to overlay the items onto the menu." And Nano Banana returned an image that is exactly the picture of the menu I took, but with the different menu items actually rendered into the pixels. This blew my mind, because it means all of my MenuGen is spurious. It's working in the old paradigm; that app shouldn't exist. The software 3.0 paradigm is a lot more raw: your neural network is doing more and more of the work, your prompt or context is just the image, the output is an image, and there's no need for any of the app in between. So I think people have to reframe: don't work in the existing paradigm of what already existed and think of this just as a speedup of what exists; new things are available now. And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming becoming faster. This is more general information processing that is automatable now, so it's not even just about code. Previous code worked over structured data, right? You write code over structured data. But, for example, with my LLM knowledge bases project, you get LLMs to create wikis for your organization, or for you in person, etc. This is not even a program. This is not something that could exist before, because there was no code that would create a knowledge base from a bunch of facts. But now you can take these documents, recompile them in a different way, reorder them, and create something new and interesting as a reframing of the data. These are new things that weren't possible. So this is something I keep trying to get back to: not only what existed that is faster now, but new opportunities, things that couldn't be possible before. And I almost think that's more exciting.
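The two MenuGen versions described in this answer can be sketched side by side. Every helper name below is a hypothetical stand-in, not the actual MenuGen code:

```python
# A rough sketch of the two MenuGen approaches described above.
# ocr_titles, generate_image, and multimodal_model are hypothetical
# stand-ins for real OCR / image-generation services.

def ocr_titles(menu_photo: bytes) -> list[str]:
    """Stub: extract dish names from a menu photo (the real app used OCR)."""
    return ["onglet", "burrata", "cacio e pepe"]

def generate_image(title: str) -> str:
    """Stub: return an image URL for a dish name (the real app used an image model)."""
    return f"https://images.example/{title.replace(' ', '-')}.png"

# Software 1.0-style version: an explicit pipeline glued together with code.
def menugen_v1(menu_photo: bytes) -> dict[str, str]:
    return {title: generate_image(title) for title in ocr_titles(menu_photo)}

# Software 3.0 version: the whole pipeline collapses into one model call.
# `multimodal_model` is a placeholder for something like Gemini + Nano Banana.
def menugen_v3(menu_photo: bytes, multimodal_model) -> bytes:
    prompt = "Overlay a generic picture of each dish onto this menu photo."
    return multimodal_model(prompt, image=menu_photo)
```

The point of the contrast is that in v3 the orchestration code disappears; the neural network absorbs the OCR, generation, and layout steps.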
7:37
>> I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression of programming from last October to early January or February this year. If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?
8:08
>> [clears throat] Well, going with the MenuGen example, a lot of this code shouldn't exist; it's just a neural network doing most of the work. I do think the extrapolation looks very weird, because you could basically imagine completely neural computers in a certain sense: a device that takes raw video or audio into what's basically a neural net, and uses diffusion to render a UI that is unique for that moment. And I kind of feel like in the early days of computing, people were actually a little confused as to whether computers would look like calculators or look like neural nets; in the '50s and '60s it was not really obvious which way it would go. Of course we went down the calculator path and ended up building classical computing, and neural nets currently run virtualized on existing computers. But you could imagine a lot of this flipping, so that the neural net becomes the host process and the CPUs become the co-processor. We saw the diagram of how the compute of neural networks is going to take over and become the dominant spend of flops. So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting and using tool use as this historical appendage for certain deterministic tasks, but what's really running the show is the neural nets. So you can imagine something extremely foreign as the extrapolation, but I think we're probably going to get there piece by piece. That progression is TBD, I would say.
9:40
>> [snorts]
9:41
>> I'd like to talk a little bit about this concept of verifiability, the fact that AI will automate faster and more easily in domains where the output can be verified. If that framework is right, what work is about to move much faster than people realize, and what professions do we have that people think are safe but that are actually highly verifiable?
10:02
Yes. So I spent some time writing about verifiability. Basically, traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense. The way this works is that when the frontier labs train these LLMs, they build giant reinforcement learning environments and the models are given verification rewards. Because of the way these models are trained, they end up becoming these jagged entities that really peak in capability in verifiable domains, like math and code and adjacent areas, and stagnate and are a little rough around the edges when things are not in that space. So the reason I wrote about verifiability is that I'm trying to understand why these things are so jagged. Some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution, because some things are significantly more valuable in the economy and end up getting more environments, because the labs want the models to work in those settings. I think code is a good example of that.
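The verification rewards he describes can be illustrated with a toy grader: run a candidate program against known test cases and return the pass rate as a scalar reward. This is only a sketch of the concept, not any lab's actual training code:

```python
# Toy verification reward: score a candidate solution by running it
# against known input/output pairs. Real RL training setups are far
# more elaborate; this only illustrates why code is easy to verify.

def verification_reward(candidate_src: str, tests: list[tuple[int, int]]) -> float:
    """Return the fraction of test cases the candidate's `solve` passes."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define `solve` from the candidate source
        solve = namespace["solve"]
    except Exception:
        return 0.0                       # unparseable / missing function: zero reward
    passed = 0
    for x, expected in tests:
        try:
            if solve(x) == expected:
                passed += 1
        except Exception:
            pass                         # runtime errors just count as failures
    return passed / len(tests)

# Example: reward two candidates for the task "square the input".
tests = [(2, 4), (3, 9), (10, 100)]
good = "def solve(x):\n    return x * x"
bad = "def solve(x):\n    return x + x"
```

A reward like this needs no human in the loop, which is exactly what makes code-like domains cheap to scale RL on.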
11:08
There are probably lots of verifiable environments the labs could think about that happen not to make it into the mix, because they're just not that useful capabilities to have. But to me the big mystery is this. The favorite example for a while was "how many letters are in strawberry," which the models would famously get wrong; it's an example of jaggedness, and the models have patched it by now, I think. But the new one is: "I want to go to a car wash to wash my car, and it's 50 meters away. Should I drive or should I walk?" And state-of-the-art models today will tell you to walk, because it's so close. How is it possible that a state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line [laughter] codebase or find zero-day vulnerabilities, and yet tells me to walk to the car wash? This is insane. And to whatever extent these models remain jagged, it's an indication that, number one, maybe something's slightly off, or, number two, you need to actually be in the loop a little bit, treat them as tools, and stay in touch with what they're doing. So, long story short, all of my writing about verifiability is trying to understand why these things are jagged and whether there's any pattern to it. I think it's some combination of "verifiable" plus "the labs care."
12:25
Maybe one more instructive anecdote: from GPT-3.5 to GPT-4, people noticed that chess improved a lot, and I think a lot of people thought it was just the progression of capabilities. But actually (I think this is public information; I saw it on the internet) a huge amount of chess data made it into the pre-training set, and just because it's in the data distribution, the model improved a lot more than it would have by default. Someone at OpenAI decided to add that data, and now you have a capability that peaked a lot more. That's why I keep stressing this dimension: we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix. You have to explore this thing they give you that has no manual. It works in certain settings but maybe not in others, and you have to explore it a little bit. If you're in the circuits that were part of the RL, you fly; if you're in the circuits that are out of the data distribution, you're going to struggle. You have to figure out which circuits you're in in your application, and if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work, because it's not necessarily going to come out of the LLM out of the box.
13:36
>> I'd love to come back to the concept of jagged intelligence in a little bit. If you are a founder today thinking about building a company, you're trying to solve a problem that you think is tractable, something in a domain that is verifiable, but you look around and you think, "Oh my gosh, the labs have really started getting to escape velocity in the ones that seem most obvious: math, coding, and others." What would your advice be to the founders in the audience?
14:05
Um, so maybe this comes back to the previous question. Let me think. Verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it. So maybe one way to see it is that that remains true even if the labs are not focusing on it directly. If you are in a verifiable setting where you could create these RL environments or examples, that sets you up to potentially do your own fine-tuning, and you might benefit from that. That is fundamentally technology that just works: if you have a huge amount of diverse data sets and RL environments, you can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well. I don't know what all the examples of this might be, but I do think there are some very valuable reinforcement learning environments that people could think of that are not part of the... yeah, I don't want to give away the answer, but there is one domain that I think is very... Oh, okay. Sorry, I don't mean to vague-post on the stage, but there are some examples of this.
15:09
>> On the flip side, what do you think still feels automatable only from a distance?
15:14
>> I do think that ultimately almost everything can be made verifiable to some extent, some things more easily than others. Because even for things like writing, you can imagine having a council of LLM judges and probably get something reasonable out of that kind of approach. So it's more about what's easy or hard. So I do think that ultimately, yeah...
>> Everything. [laughter]
>> Everything is automatable.
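The "council of LLM judges" idea reduces to aggregating several independent scores into one reward. A toy sketch with trivial heuristic judges standing in for real LLM calls (all function names here are made up):

```python
# Toy "council of judges": several independent judges score a piece of
# writing and the council averages them, turning a fuzzy quality
# question into a (noisy) scalar reward. The judges here are trivial
# heuristics standing in for real LLM calls.

def judge_length(text: str) -> float:
    """Prefer pieces that aren't trivially short."""
    return 1.0 if len(text.split()) >= 5 else 0.0

def judge_variety(text: str) -> float:
    """Prefer some vocabulary variety (unique words / total words)."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def council_score(text: str, judges) -> float:
    """Average the judges' scores into one reward signal."""
    scores = [judge(text) for judge in judges]
    return sum(scores) / len(scores)

judges = [judge_length, judge_variety]
```

With real LLM judges the scores are noisier and can be gamed, which is why he frames this as "easier or harder" rather than solved.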
15:45
>> Amazing. Okay. So last year you coined the term vibe coding, and today we're in a world that feels a little bit more serious, more agentic engineering. What do you think is the difference between the two, and what would you actually call what we're in today?
15:57
>> Yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software. The floor rises, everyone can vibe code anything, and that's amazing, incredible. But agentic engineering is about preserving the quality bar of what existed before in professional software. You're not allowed to introduce vulnerabilities due to vibe coding; you're still responsible for your software just as before. But can you go faster? Spoiler: you can. But how do you do that properly? I call it agentic engineering because I do think it's an engineering discipline. You have these agents, which are these spiky entities; they're a bit fallible, a little bit stochastic, but they are extremely powerful. How do you coordinate them to go faster without sacrificing your quality bar, and do that well and correctly? That's the realm of agentic engineering. So I see them as different: one is about raising the floor, and the other is about extrapolating. And what I'm seeing, I think, is that there is a very high ceiling on agentic engineering capability. People used to talk about the 10x engineer previously; I think this is magnified a lot more. 10x is not the speedup you gain. It does seem to me like people who are very good at this peak at a lot more than 10x, from my perspective right now.
17:18
>> I really like that framing. When Sam Altman came to AI Ascent last year, one memorable thing he said was that people of different generations use ChatGPT differently: if you're in your 30s, you use it as a Google search replacement, but if you're in your teens, ChatGPT is your gateway to the internet. What is the parallel in coding today? If we were to watch two people code using OpenClaw, Claude Code, or Codex, one you'd consider mediocre at it and one you would consider fully AI native, how would you describe the difference?
17:51
>> [clears throat] I mean, I think it's just trying to get the most out of the tools that are available: utilizing all of their features, investing in your own setup. Just like previously all engineers were used to getting the most out of the tools they use, whether that's Vim or VS Code, now it's Claude Code or Codex or so on. So investing in your setup and utilizing a lot of the tools available to you; I think it just kind of looks like that. A related thought: a lot of people are maybe hiring for this, because they want to hire strong agentic engineers, and what I'm seeing is that most people have still not refactored their hiring process for agentic engineering capability. If you're giving out puzzles to solve, that's still the old paradigm. I would say hiring has to look like: give someone a really big project and watch them implement it. Let's write, say, a Twitter clone for agents; make it really good, make it really secure; have some agents simulate activity on this Twitter; and then I'm going to use ten Codex 5.4 xhigh instances to try to break this website that you deployed, and they should not be able to break it. So maybe it looks like that, right? Watching people in that setting, building bigger projects and utilizing the tooling, is maybe what I would look at for the most part.
19:29
>> And as agents do more, what human skill do you think becomes more valuable, not less?
19:34
>> Yeah, it's a good question. Well, right now the answer is that the agents are kind of like these intern entities, right? It's remarkable, but you basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. One of my favorite examples of the weirdness of agents is from MenuGen: you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. When you purchased credits, my agent basically assigned them by matching the email address from Stripe to the Google email address. There wasn't a persistent user ID for people; it was trying to match up the email addresses. But you could use different email addresses for your Stripe and your Google, and then it basically would not associate the funds.
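The design decision the human should have enforced here, a stable internal user ID as the join key rather than cross-correlating emails, looks roughly like this (a toy data model, not MenuGen's actual code):

```python
# Toy account model: credits are keyed to a stable internal user ID,
# so the Stripe email and the Google email can differ freely.
import uuid

class Accounts:
    def __init__(self):
        self.users = {}  # user_id -> {"google_email": ..., "credits": int}

    def sign_up(self, google_email: str) -> str:
        user_id = str(uuid.uuid4())      # stable internal identity
        self.users[user_id] = {"google_email": google_email, "credits": 0}
        return user_id

    def purchase(self, user_id: str, stripe_email: str, credits: int) -> None:
        # The Stripe email is accepted but never used as a join key.
        self.users[user_id]["credits"] += credits

accounts = Accounts()
uid = accounts.sign_up("me@gmail.com")
accounts.purchase(uid, "billing@other.com", credits=10)
```

The agent's email-matching version breaks exactly when the two emails differ; keying on `user_id` makes that case a non-event.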
20:26
And so this is the kind of thing these agents will still make mistakes about. Why would you use email addresses to try to cross-correlate the funds? They can be arbitrary; you can use different emails. It's such a weird thing to do. So I think people have to be in charge of the spec, the plan. And I actually don't even like plan mode; I mean, obviously it's very useful, but I think there's something more general here, where you have to work with your agent to design a spec that is very detailed, maybe it's basically the docs, and then get the agents to write the code. You're in charge of the oversight and the top-level categories, but the agents are doing a lot of the work under the hood, and you're not caring about some of the details. As an example, with arrays or tensors in neural networks, there's a ton of little API detail between PyTorch and NumPy and pandas and so on. I already forgot about keepdims versus keepdim, or whether it's dim or axis, or reshape or permute or transpose. I don't remember this stuff anymore, right? Because you don't have to; these are the kinds of details that are handled by the intern, because they have very good recall. But you still have to know, for example, that there's an underlying tensor and an underlying view, and that you can manipulate a view of the same storage, or you can have different storage, which would be less efficient. You still have to understand what this stuff is doing and some of the fundamentals, so that you're not copying memory around unnecessarily and so on, but the details of the APIs are now handed off. You're in charge of the taste, the engineering, the design, that it makes sense, that you're asking for the right things, that you're saying, okay, these have to be unique user IDs that we're going to tie everything to. You're doing some of the design and development, and the agents are doing the fill-in-the-blanks. That's kind of where we are right now, and I think that's what everyone is seeing.
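The fundamental he says you still need, views versus copies of the same storage, is easy to demonstrate in NumPy (the PyTorch equivalents use `dim`/`keepdim` where NumPy uses `axis`/`keepdims`):

```python
# Views share storage with the original array; copies do not.
# Knowing this distinction (rather than memorizing every API name) is
# the part that still matters when delegating the details to an agent.
import numpy as np

a = np.arange(6).reshape(2, 3)   # reshape returns a view when it can
v = a.T                          # transpose is also a view: no data moved
c = a.copy()                     # copy allocates fresh storage

print(np.shares_memory(a, v))    # True: same underlying buffer
print(np.shares_memory(a, c))    # False: independent storage

v[0, 0] = 99                     # writing through the view...
print(a[0, 0])                   # ...mutates the original: 99

# Naming quirks the "intern" can remember for you:
s1 = a.sum(axis=0, keepdims=True)        # NumPy: axis / keepdims
# torch.sum(t, dim=0, keepdim=True)      # PyTorch: dim / keepdim
print(s1.shape)                  # (1, 3)
```

Unnecessary copies are where the "copying memory around" cost he mentions comes from; views avoid it but make writes aliased.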
22:13
>> Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?
22:21
>> Yeah, it's a good question. Okay. I mean, I'm hoping that it improves. I think the reason it doesn't improve right now is, again, that it's not part of the RL; there's probably no aesthetics cost or reward, or it's not good enough, or something like that. When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not super amazing code all the time: it's very bloated, there's a lot of copy-paste, there are awkward abstractions that are brittle, and it works, but it's just really gross. I do hope that this can improve in future models. A good example is this microGPT project, where I was trying to simplify LLM training to be as simple as possible. The models hate this; they can't do it. I kept trying to prompt an LLM to simplify more, simplify more, and it just can't. You feel like you're outside of the RL circuits; it feels like you're pulling teeth, it's not light speed. So I do think people remain in charge of this. But I also think there's nothing fundamental preventing it; it's just that the labs haven't done it yet, almost.
23:30
>> Yeah.
23:31
>> So I'd love to come back to this idea of jagged forms of intelligence. You wrote a bit about this in a very thought-provoking piece about animals versus ghosts. The idea is that we're not building animals, we are summoning ghosts: jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment, things that came about via evolution. Why does that framing matter, and what does it actually change about how you build, deploy, evaluate, or even trust them?
24:08
>> Uh yeah, so yeah, I think the reason I
24:12
wrote about this is because I'm trying
24:13
to wrap my head around what these things
24:15
are, right? Because if you have a good
24:16
model of what they are or are not, then
24:18
you're going to be more competent at uh
24:20
using them. Um and I do think that um I
24:23
don't know if it has I'm not sure if it
24:25
actually has like real power. [laughter]
24:28
I think it's a little bit of
24:29
philosophizing. Um, but I do think that
24:33
um
24:34
I think it's just um coming to terms
24:36
with the fact that these things are not,
24:38
you know, animal intelligences. Like if
24:40
you yell at them, they're not going to
24:41
work better or worse or it doesn't have
24:43
any impact. Um, and uh it's all just
24:46
kind of like these statistical
24:48
simulation circuits where the the
24:50
substrate is pre-training so like
24:53
statistics and then but then there's RL
24:55
bolting on top. So, it kind of like
24:57
grows these appendages and um maybe
25:00
it's just kind of like a mindset that
25:02
I'm coming in with, or what's likely to work
25:04
or not likely to work or how to modify
25:05
it. But I don't actually I don't know
25:07
that I have like here are the five
25:09
obvious outcomes of how to make your
25:11
system better. It's more just being
25:12
suspicious of it and um
25:14
>> figuring out over time.
25:16
>> That's where it starts. Um okay, so you
25:18
are so deep in working with agents that
25:20
don't just chat. They have um real
25:22
permissions. They have local context.
25:24
They actually take action on
25:26
your behalf. What does the world look
25:28
like when we all start to live in that
25:30
world?
25:31
>> Uh yeah, I think a lot of
25:34
people here are probably excited about
25:35
what this, you know, agent-native
25:38
environment looks like and
25:40
everything has to be rewritten.
25:41
Everything is still fundamentally
25:42
written for humans and has to be moved
25:44
around. Most of the time
25:46
when I use uh different frameworks or
25:48
libraries or things like that, they
25:49
still have docs that are fundamentally
25:51
written for humans. This is my favorite
25:53
pet peeve. Like, why are
25:55
people still telling me what to do? Like
25:57
I don't want to do anything. What is the
25:58
thing I should copy paste to my agent?
26:00
[laughter] Like uh so it's just um every
26:02
time I'm told, you know, go to this URL
26:04
or something like that, it's just like
26:06
ah [laughter]
26:07
you know. [snorts] So um everyone is I
26:10
think excited about how do we decompose
26:12
the workloads that need to happen into
26:14
fundamentally sensors over the world,
26:16
actuators over the world. How do we make
26:18
it agent native? Uh basically describe
26:20
it to agents first. Um, and then have a
26:23
lot of automation around
26:27
data structures that are
26:30
very legible to the LLMs. Uh so I think
26:32
um yeah I'm hoping that there's a lot of
26:34
agent-first infrastructure out there
26:36
and that you know for MenuGen famously
26:39
when I wrote the uh, I'm not sure how
26:40
famously, but when I wrote the blog post
26:42
about MenuGen [laughter]
26:44
um a lot of the work a lot of the
26:46
trouble was not even writing the code
26:47
for MenuGen, it was deploying it on
26:48
Vercel, because I had to work with all
26:50
these different services and I had to
26:51
string them up and I had to go to their
26:52
settings and the menus and you know
26:54
configure my DNS and it was just so
26:56
annoying and so that's a good example of
26:59
I would hope that for MenuGen I could
27:01
give a prompt to an LLM, "build MenuGen,"
27:04
and then I didn't have to touch anything
27:05
and it's deployed in that same way on
27:07
the internet. Uh I think that would be a
27:09
good kind of a test for whether or not
27:12
uh a lot of our infrastructure is
27:13
becoming more and more agent native. And
27:14
then ultimately I would say yeah I do
27:17
think we're going towards a world where
27:19
um there's agent representation for
27:21
people and for organizations and um you
27:25
know I'll have my agent talk to your
27:26
agent uh to figure out some of the
27:28
details of our meetings or things
27:30
like that. So, [laughter]
27:33
um I do think that that's uh roughly
27:34
where things are going, but um yeah, I
27:36
think everyone here is excited about
27:37
that.
27:38
>> I really like the visual analogy of
27:40
sensors and actuators. I actually hadn't
27:41
thought of that. That's super
27:42
interesting,
27:43
>> right?
27:43
>> Um okay, I think we have to end on a
27:45
question about education. Um because you
27:47
are probably one of the very best in the
27:49
world at making complex technical
27:51
concepts simple and deeply thoughtful
27:53
about how we design education around it.
27:56
Um, what still remains worth learning
27:59
deeply when intelligence gets cheap as
28:02
we move into the next era of AI?
28:05
>> Yeah. Uh, there was a tweet that blew my
28:07
mind recently and I keep thinking about
28:09
it like every other day. It was
28:10
something along the lines of um, you can
28:12
outsource your thinking but you can't
28:14
outsource your understanding.
28:16
And um,
28:17
>> I think that's really nicely put. So,
28:21
yeah, because I'm still part of
28:23
the system, and somehow
28:25
information still has to make it
28:26
into my brain, and I feel like I'm
28:27
becoming a bottleneck of just even
28:29
knowing what we're trying to build, why
28:30
it's worth doing, how do I direct
28:32
my agents, and so
28:34
on. So I do still think that ultimately
28:37
something has to direct the thinking and
28:39
the processing and so on and um that's
28:43
still kind of fundamentally constrained
28:44
somehow by understanding and this is one
28:46
reason I also was very excited about all
28:47
the LLM knowledge bases because I feel
28:49
like that's a way for me to
28:51
process information and anytime I see a
28:53
different projection of the information, I
28:54
always feel like I gain insight. So
28:56
it's really just a lot of prompts for me
28:58
to do synthetic data generation kind of
29:00
over some fixed data. Uh so I
29:03
really enjoy uh whenever I read an
29:05
article I have my uh you know my wiki
29:06
that's being built up from these
29:07
articles and I love asking questions
29:09
about things. Um, and I think that
29:12
ultimately these are tools to enhance
29:15
understanding in a certain way and this
29:17
is still kind of like a bit of a
29:18
bottleneck, because you can't
29:20
be a good director if you don't
29:22
understand. The LLMs certainly don't
29:25
excel at understanding, so you still are
29:26
uniquely in charge of that. So, uh,
29:28
yeah, I think, uh, tools to that effect,
29:31
I think are incredibly interesting and
29:32
exciting.
29:33
>> I'm excited to be back here in a couple
29:34
years and to see if we've been fully
29:36
automated out of the loop and they
29:38
actually take care of understanding as
29:40
well. Uh, thank you so much for joining
29:41
us, Andrej. We really appreciate it.
29:42
[applause]
— end of transcript —