Andrej Karpathy: Software Is Changing (Again)

0:49

this talk already. Um but the problem is

0:52

that software keeps changing. So I

0:54

actually have a lot of material to

0:55

create new talks and I think it's

0:56

changing quite fundamentally. I think

0:58

roughly speaking software has not

1:00

changed much on such a fundamental level

1:02

for 70 years. And then it's changed I

1:04

think about twice quite rapidly in the

1:06

last few years. And so there's just a

1:08

huge amount of work to do a huge amount

1:09

of software to write and rewrite. So

1:12

let's take a look at maybe the realm of

1:14

software. So if we kind of think of this

1:16

as like the map of software this is a

1:17

really cool tool called map of GitHub.

1:20

Um this is kind of like all the software

1:21

that's written. Uh these are

1:23

instructions to the computer for

1:24

carrying out tasks in the digital space.

1:26

So if you zoom in here, these are all

1:28

different kinds of repositories and this

1:30

is all the code that has been written.

1:31

And a few years ago I kind of observed

1:33

that um software was kind of changing

1:35

and there was kind of like a new type of

1:37

software around and I called this

1:39

software 2.0 at the time and the idea

1:42

here was that software 1.0 is the code

1:44

you write for the computer. Software 2.0

1:46

know are basically neural networks and

1:48

in particular the weights of a neural

1:50

network and you're not writing this code

1:53

directly you are most you are more kind

1:55

of like tuning the data sets and then

1:56

you're running an optimizer to create to

1:58

create the parameters of this neural net

2:00

and I think like at the time neural nets

2:02

were kind of seen as like just a

2:03

different kind of classifier like a

2:04

decision tree or something like that and

2:06

so I think it was kind of like um I

2:09

think this framing was a lot more

2:10

appropriate and now actually what we

2:12

have is kind of like an equivalent of

2:13

GitHub in the realm of software 2.0 And

2:15

I think the hugging face is basically

2:18

equivalent of GitHub in software 2.0.

2:20

And there's also model atlas and you can

2:22

visualize all the code written there. In

2:24

case you're curious, by the way, the

2:25

giant circle, the point in the middle,

2:28

uh these are the parameters of flux, the

2:30

image generator. And so anytime someone

2:32

tunes a on top of a flux model, you

2:34

basically create a git commit uh in this

2:37

space and uh you create a different kind

2:39

of a image generator. So basically what

2:41

we have is software 1.0 is the computer

2:43

code that programs a computer. Software

2:45

2.0 are the weights which program neural

2:48

networks. Uh and here's an example of

2:50

Alexet image recognizer neural network.

2:53

Now so far all of the neural networks

2:55

that we've been familiar with until

2:56

recently where kind of like fixed

2:58

function computers image to categories

3:01

or something like that. And I think

3:03

what's changed and I think is a quite

3:05

fundamental change is that neural

3:06

networks became programmable with large

3:09

language models. And so I I see this as

3:12

quite new, unique. It's a new kind of a

3:14

computer and uh so in my mind it's uh

3:18

worth giving it a new designation of

3:19

software 3.0. And basically your prompts

3:22

are now programs that program the LLM.

3:25

And uh remarkably uh these uh prompts

3:28

are written in English. So it's kind of

3:30

a very interesting programming language.

3:33

Um so maybe uh to summarize the

3:36

difference if you're doing sentiment

3:37

classification for example you can

3:39

imagine writing some uh amount of Python

3:42

to to basically do sentiment

3:44

classification or you can train a neural

3:46

net or you can prompt a large language

3:47

model. Uh so here this is a few short

3:50

prompt and you can imagine changing it

3:51

and programming the computer in a

3:52

slightly different way. So basically we

3:54

have software 1.0 software 2.0 and I

3:57

think we're seeing maybe you've seen a

3:59

lot of GitHub code is not just like code

4:01

anymore. there's a bunch of like English

4:03

interspersed with code and so I think

4:05

kind of there's a growing category of

4:07

new kind of code. So not only is it a

4:09

new programming paradigm, it's also

4:10

remarkable to me that it's in our native

4:12

language of English. And so when this

4:14

blew my mind a few uh I guess years ago

4:17

now I tweeted this and um I think it

4:20

captured the attention of a lot of

4:21

people and this is my currently pinned

4:23

tweet uh is that remarkably we're now

4:25

programming computers in English. Now,

4:28

when I was at uh Tesla, um we were

4:31

working on the uh autopilot and uh we

4:34

were trying to get the car to drive and

4:37

I sort of showed this slide at the time

4:39

where you can imagine that the inputs to

4:41

the car are on the bottom and they're

4:43

going through a software stack to

4:44

produce the steering and acceleration

4:47

and I made the observation at the time

4:48

that there was a ton of C++ code around

4:51

in the autopilot which was the software

4:52

1.0 code and then there was some neural

4:54

nets in there doing image recognition

4:56

and uh I kind of observed that over time

4:58

as we made the autopilot better

5:00

basically the neural network grew in

5:02

capability and size and in addition to

5:05

that all the C++ code was being deleted

5:08

and kind of like was um and a lot of the

5:12

kind of capabilities and functionality

5:14

that was originally written in 1.0 was

5:16

migrated to 2.0. So as an example, a lot

5:19

of the stitching up of information

5:20

across images from the different cameras

5:22

and across time was done by a neural

5:24

network and we were able to delete a lot

5:26

of code and so the software 2.0 stack

5:29

quite literally ate through the software

5:32

stack of the autopilot. So I thought

5:34

this was really remarkable at the time

5:35

and I think we're seeing the same thing

5:37

again where uh basically we have a new

5:39

kind of software and it's eating through

5:40

the stack. We have three completely

5:42

different programming paradigms and I

5:44

think if you're entering the industry

5:45

it's a very good idea to be fluent in

5:47

all of them because they all have slight

5:49

pros and cons and you may want to

5:50

program some functionality in 1.0 or 2.0

5:53

or 3.0. Are you going to train

5:54

neurallet? Are you going to just prompt

5:55

an LLM? Should this be a piece of code

5:57

that's explicit etc. So we all have to

5:59

make these decisions and actually

6:00

potentially uh fluidly trans transition

6:03

between these paradigms. So what I

6:06

wanted to get into now is first I want

6:09

to in the first part talk about LLMs and

6:11

how to kind of like think of this new

6:13

paradigm and the ecosystem and what that

6:15

looks like. Uh like what are what is

6:17

this new computer? What does it look

6:18

like and what does the ecosystem look

6:20

like? Um I was struck by this quote from

6:23

Anduring actually uh many years ago now

6:25

I think and I think Andrew is going to

6:27

be speaking right after me. Uh but he

6:29

said at the time AI is the new

6:30

electricity and I do think that it um

6:33

kind of captures something very

6:34

interesting in that LLMs certainly feel

6:36

like they have properties of utilities

6:38

right now. So

6:41

um LLM labs like OpenAI, Gemini,

6:44

Enthropic etc. They spend capex to train

6:47

the LLMs and this is kind of equivalent

6:48

to building out a grid and then there's

6:51

opex to serve that intelligence over

6:53

APIs to all of us and this is done

6:56

through metered access where we pay per

6:58

million tokens or something like that

7:00

and we have a lot of demands that are

7:01

very utility- like demands out of this

7:03

API we demand low latency high uptime

7:06

consistent quality etc. In electricity,

7:08

you would have a transfer switch. So you

7:10

can transfer your electricity source

7:12

from like grid and solar or battery or

7:14

generator. In LLM, we have maybe open

7:16

router and easily switch between the

7:18

different types of LLMs that exist.

7:20

Because the LLM are software, they don't

7:23

compete for physical space. So it's okay

7:25

to have basically like six electricity

7:26

providers and you can switch between

7:28

them, right? Because they don't compete

7:29

in such a direct way. And I think what's

7:31

also a little fascinating and we saw

7:33

this in the last few days actually a lot

7:36

of the LLMs went down and people were

7:38

kind of like stuck and unable to work.

7:41

And uh I think it's kind of fascinating

7:42

to me that when the state-of-the-art

7:43

LLMs go down, it's actually kind of like

7:45

an intelligence brownout in the world.

7:47

It's kind of like when the voltage is

7:49

unreliable in the grid and uh the planet

7:52

just gets dumber the more reliance we

7:55

have on these models, which already is

7:56

like really dramatic and I think will

7:58

continue to grow. But LLM's don't only

8:00

have properties of utilities. I think

8:02

it's also fair to say that they have

8:03

some properties of fabs. And the reason

8:06

for this is that the capex required for

8:09

building LLM is actually quite large. Uh

8:12

it's not just like building some uh

8:14

power station or something like that,

8:15

right? You're investing a huge amount of

8:17

money and I think the tech tree and uh

8:20

for the technology is growing quite

8:22

rapidly. So we're in a world where we

8:24

have sort of deep tech trees, research

8:26

and development secrets that are

8:28

centralizing inside the LLM labs. Um and

8:32

but I think the analogy muddies a little

8:34

bit also because as I mentioned this is

8:36

software and software is a bit less

8:38

defensible because it is so malleable.

8:40

And so um I think it's just an

8:43

interesting kind of thing to think about

8:44

potentially. There's many analogy

8:46

analogies you can make like a 4

8:48

nanometer process node maybe is

8:49

something like a cluster with certain

8:51

max flops. You can think about when

8:53

you're use when you're using Nvidia GPUs

8:54

and you're only doing the software and

8:56

you're not doing the hardware. That's

8:57

kind of like the fabless model. But if

8:59

you're actually also building your own

9:00

hardware and you're training on TPUs if

9:02

you're Google, that's kind of like the

9:03

Intel model where you own your fab. So I

9:05

think there's some analogies here that

9:06

make sense. But actually I think the

9:08

analogy that makes the most sense

9:09

perhaps is that in my mind LLM have very

9:12

strong kind of analogies to operating

9:15

systems. Uh in that this is not just

9:17

electricity or water. It's not something

9:19

that comes out of the tap as a

9:20

commodity. uh this is these are now

9:22

increasingly complex software ecosystems

9:25

right so uh they're not just like simple

9:28

commodities like electricity and it's

9:30

kind of interesting to me that the

9:32

ecosystem is shaping in a very similar

9:33

kind of way where you have a few closed

9:36

source providers like Windows or Mac OS

9:38

and then you have an open source

9:39

alternative like Linux and I think for u

9:42

neural for LLMs as well we have a kind

9:45

of a few competing closed source

9:47

providers and then maybe the llama

9:49

ecosystem is currently like maybe a

9:51

close approximation to something that

9:53

may grow into something like Linux.

9:55

Again, I think it's still very early

9:56

because these are just simple LLMs, but

9:58

we're starting to see that these are

9:59

going to get a lot more complicated.

10:01

It's not just about the LLM itself. It's

10:02

about all the tool use and the

10:03

multiodalities and how all of that

10:05

works. And so when I sort of had this

10:07

realization a while back, I tried to

10:09

sketch it out and it kind of seemed to

10:11

me like LLMs are kind of like a new

10:12

operating system, right? So the LLM is a

10:15

new kind of a computer. It's sitting

10:17

it's kind of like the CPU equivalent. uh

10:19

the context windows are kind of like the

10:21

memory and then the LLM is orchestrating

10:24

memory and compute uh for problem

10:26

solving um using all of these uh

10:29

capabilities here and so definitely if

10:32

you look at it looks very much like

10:34

operating system from that perspective.

10:36

Um, a few more analogies. For example,

10:38

if you want to download an app, say I go

10:41

to VS Code and I go to download, you can

10:43

download VS Code and you can run it on

10:46

Windows, Linux or or Mac in the same way

10:50

as you can take an LLM app like cursor

10:53

and you can run it on GPT or cloud or

10:55

Gemini series, right? It's just a drop

10:57

down. So, it's kind of like similar in

10:59

that way as well.

11:00

uh more analogies that I think strike me

11:02

is that we're kind of like in this

11:04

1960sish

11:05

era where LLM compute is still very

11:09

expensive for this new kind of a

11:10

computer and that forces the LLMs to be

11:13

centralized in the cloud and we're all

11:15

just uh sort of thing clients that

11:18

interact with it over the network and

11:20

none of us have full utilization of

11:22

these computers and therefore it makes

11:24

sense to use time sharing where we're

11:26

all just you know a dimension of the

11:28

batch when they're running the computer

11:30

in the cloud. And this is very much what

11:32

computers used to look like at during

11:33

this time. The operating systems were in

11:35

the cloud. Everything was streamed

11:36

around and there was batching. And so

11:39

the p the personal computing revolution

11:41

hasn't happened yet because it's just

11:42

not economical. It doesn't make sense.

11:44

But I think some people are trying. And

11:46

it turns out that Mac minis, for

11:48

example, are a very good fit for some of

11:50

the LLMs because it's all if you're

11:52

doing batch one inference, this is all

11:53

super memory bound. So this actually

11:55

works.

11:56

And uh I think these are some early

11:58

indications maybe of personal computing.

12:00

Uh but this hasn't really happened yet.

12:02

It's not clear what this looks like.

12:03

Maybe some of you get to invent what

12:05

what this is or how it works or uh what

12:08

this should what this should be. Maybe

12:10

one more analogy that I'll mention is

12:12

whenever I talk to Chach or some LLM

12:14

directly in text, I feel like I'm

12:16

talking to an operating system through

12:18

the terminal. Like it's just it's it's

12:21

text. It's direct access to the

12:22

operating system. And I think a guey

12:24

hasn't yet really been invented in like

12:26

a general way like should chatt have a

12:29

guey like different than just a tech

12:31

bubbles. Uh certainly some of the apps

12:33

that we're going to go into in a bit

12:35

have guey but there's no like guey

12:38

across all the tasks if that makes

12:40

sense. Um there are some ways in which

12:43

LLMs are different from kind of

12:45

operating systems in some fairly unique

12:47

way and from early computing. And I

12:49

wrote about uh this one particular

12:52

property that strikes me as very

12:54

different uh this time around. It's that

12:57

LLMs like flip they flip the direction

12:59

of technology diffusion uh that is

13:02

usually uh present in technology. So for

13:05

example with electricity, cryptography,

13:07

computing, flight, internet, GPS, lots

13:09

of new transformative technologies that

13:10

have not been around. Typically it is

13:12

the government and corporations that are

13:14

the first users because it's new and

13:16

expensive etc. and it only later

13:18

diffuses to consumer. Uh, but I feel

13:20

like LLMs are kind of like flipped

13:22

around. So maybe with early computers,

13:24

it was all about ballistics and military

13:26

use, but with LLMs, it's all about how

13:29

do you boil an egg or something like

13:30

that. This is certainly like a lot of my

13:32

use. And so it's really fascinating to

13:33

me that we have a new magical computer

13:35

and it's like helping me boil an egg.

13:37

It's not helping the government do

13:38

something really crazy like some

13:40

military ballistics or some special

13:42

technology. Indeed, corporations are

13:43

governments are lagging behind the

13:45

adoption of all of us, of all of these

13:47

technologies. So, it's just backwards

13:48

and I think it informs maybe some of the

13:50

uses of how we want to use this

13:52

technology or like where are some of the

13:53

first apps and so on.

13:56

So, in summary so far, LLM labs LLMs. I

14:01

think it's accurate language to use, but

14:03

LLMs are complicated operating systems.

14:06

They're circa 1960s in computing and

14:08

we're redoing computing all over again.

14:10

and they're currently available via time

14:11

sharing and distributed like a utility.

14:13

What is new and unprecedented is that

14:16

they're not in the hands of a few

14:17

governments and corporations. They're in

14:18

the hands of all of us because we all

14:20

have a computer and it's all just

14:21

software and Chaship was beamed down to

14:24

our computers like billions of people

14:26

like instantly and overnight and this is

14:28

insane. Uh and it's kind of insane to me

14:30

that this is the case and now it is our

14:33

time to enter the industry and program

14:34

these computers. This is crazy. So I

14:37

think this is quite remarkable. Before

14:39

we program LLMs, we have to kind of like

14:42

spend some time to think about what

14:43

these things are. And I especially like

14:45

to kind of talk about their psychology.

14:48

So the way I like to think about LLMs is

14:50

that they're kind of like people

14:51

spirits. Um they are stoastic

14:54

simulations of people. Um and the

14:56

simulator in this case happens to be an

14:58

auto reggressive transformer. So

14:59

transformer is a neural net. Uh it's and

15:02

it just kind of like is goes on the

15:04

level of tokens. It goes chunk chunk

15:06

chunk chunk chunk. And there's an almost

15:08

equal amount of compute for every single

15:10

chunk. Um and um this simulator of

15:14

course is is just is basically there's

15:16

some weights involved and we fit it to

15:19

all of text that we have on the internet

15:20

and so on. And you end up with this kind

15:22

of a simulator and because it is trained

15:24

on humans, it's got this emergent

15:26

psychology that is humanlike. So the

15:28

first thing you'll notice is of course

15:30

uh LLM have encyclopedic knowledge and

15:32

memory. uh and they can remember lots of

15:34

things, a lot more than any single

15:36

individual human can because they read

15:37

so many things. It's it actually kind of

15:39

reminds me of this movie Rainman, which

15:41

I actually really recommend people

15:43

watch. It's an amazing movie. I love

15:44

this movie. Um and Dustin Hoffman here

15:46

is an autistic savant who has almost

15:49

perfect memory. So, he can read a he can

15:51

read like a phone book and remember all

15:53

of the names and phone numbers. And I

15:55

kind of feel like LM are kind of like

15:57

very similar. They can remember Shaw

15:58

hashes and lots of different kinds of

16:00

things very very easily. So they

16:02

certainly have superpowers in some set

16:04

in some respects. But they also have a

16:06

bunch of I would say cognitive deficits.

16:08

So they hallucinate quite a bit. Um and

16:11

they kind of make up stuff and don't

16:13

have a very good uh sort of internal

16:15

model of self-nowledge, not sufficient

16:17

at least. And this has gotten better but

16:19

not perfect. They display jagged

16:21

intelligence. So they're going to be

16:22

superhuman in some problems solving

16:24

domains. And then they're going to make

16:26

mistakes that basically no human will

16:27

make. like you know they will insist

16:29

that 9.11 is greater than 9.9 or that

16:32

there are two Rs in strawberry these are

16:34

some famous examples but basically there

16:36

are rough edges that you can trip on so

16:38

that's kind of I think also kind of

16:40

unique um they also kind of suffer from

16:43

entrograde amnesia um so uh and I think

16:46

I'm alluding to the fact that if you

16:48

have a co-orker who joins your

16:49

organization this co-orker will over

16:51

time learn your organization and uh they

16:54

will understand and gain like a huge

16:55

amount of context on the organization

16:57

and they go home and they sleep and they

16:59

consolidate knowledge and they develop

17:01

expertise over time. LLMs don't natively

17:03

do this and this is not something that

17:04

has really been solved in the R&D of

17:06

LLM. I think um and so context windows

17:09

are really kind of like working memory

17:10

and you have to sort of program the

17:12

working memory quite directly because

17:13

they don't just kind of like get smarter

17:15

by uh by default and I think a lot of

17:17

people get tripped up by the analogies

17:19

uh in this way. Uh in popular culture I

17:22

recommend people watch these two movies

17:23

uh Momento and 51st dates. In both of

17:26

these movies, the protagonists, their

17:27

weights are fixed and their context

17:29

windows gets wiped every single morning

17:32

and it's really problematic to go to

17:34

work or have relationships when this

17:35

happens and this happens to all the

17:37

time. I guess one more thing I would

17:39

point to is security kind of related

17:42

limitations of the use of LLM. So for

17:44

example, LLMs are quite gullible. Uh

17:46

they are susceptible to prompt injection

17:48

risks. They might leak your data etc.

17:50

And so um and there's many other

17:52

considerations uh security related. So,

17:55

so basically long story short, you have

17:57

to load your you have to load your you

18:00

have to simultaneously think through

18:01

this superhuman thing that has a bunch

18:03

of cognitive deficits and issues. How do

18:05

we and yet they are extremely like

18:07

useful and so how do we program them and

18:10

how do we work around their deficits and

18:12

enjoy their superhuman powers.

18:15

So what I want to switch to now is talk

18:17

about the opportunities of how do we use

18:18

these models and what are some of the

18:20

biggest opportunities. This is not a

18:22

comprehensive list just some of the

18:23

things that I thought were interesting

18:24

for this talk. The first thing I'm kind

18:26

of excited about is what I would call

18:29

partial autonomy apps. So for example,

18:32

let's work with the example of coding.

18:34

You can certainly go to chacht directly

18:36

and you can start copy pasting code

18:38

around and copyping bug reports and

18:40

stuff around and getting code and copy

18:42

pasting everything around. Why would you

18:44

why would you do that? Why would you go

18:45

directly to the operating system? It

18:47

makes a lot more sense to have an app

18:48

dedicated for this. And so I think many

18:50

of you uh use uh cursor. I do as well.

18:53

And uh cursor is kind of like the thing

18:56

you want instead. You don't want to just

18:57

directly go to the chash apt. And I

18:59

think cursor is a very good example of

19:01

an early LLM app that has a bunch of

19:03

properties that I think are um useful

19:06

across all the LLM apps. So in

19:08

particular, you will notice that we have

19:09

a traditional interface that allows a

19:12

human to go in and do all the work

19:13

manually just as before. But in addition

19:16

to that, we now have this LLM

19:17

integration that allows us to go in

19:19

bigger chunks. And so some of the

19:21

properties of LLM apps that I think are

19:23

shared and useful to point out. Number

19:25

one, the LLMs basically do a ton of the

19:28

context management. Um, number two, they

19:31

orchestrate multiple calls to LLMs,

19:33

right? So in the case of cursor, there's

19:34

under the hood embedding models for all

19:36

your files, the actual chat models,

19:39

models that apply diffs to the code, and

19:41

this is all orchestrated for you. A

19:43

really big one that uh I think also

19:46

maybe not fully appreciated always is

19:48

application specific uh GUI and the

19:50

importance of it. Um because you don't

19:53

just want to talk to the operating

19:54

system directly in text. Text is very

19:56

hard to read, interpret, understand and

19:59

also like you don't want to take some of

20:00

these actions natively in text. So it's

20:03

much better to just see a diff as like

20:05

red and green change and you can see

20:06

what's being added is subtracted. It's

20:08

much easier to just do command Y to

20:10

accept or command N to reject. I

20:11

shouldn't have to type it in text,

20:13

right? So, a guey allows a human to

20:15

audit the work of these fallible systems

20:17

and to go faster. I'm going to come back

20:20

to this point a little bit uh later as

20:21

well. And the last kind of feature I

20:23

want to point out is that there's what I

20:25

call the autonomy slider. So, for

20:27

example, in cursor, you can just do tap

20:29

completion. You're mostly in charge. You

20:31

can select a chunk of code and command K

20:33

to change just that chunk of code. You

20:36

can do command L to change the entire

20:37

file. Or you can do command I which just

20:40

you know let it rip do whatever you want

20:42

in the entire repo and that's the sort

20:44

of full autonomy agent agentic version

20:46

and so you are in charge of the autonomy

20:48

slider and depending on the complexity

20:50

of the task at hand you can uh tune the

20:53

amount of autonomy that you're willing

20:54

to give up uh for that task maybe to

20:57

show one more example of a fairly

20:58

successful LLM app uh perplexity um it

21:03

also has very similar features to what

21:04

I've just pointed out to in cursor uh it

21:07

packages up a lot of the information. It

21:08

orchestrates multiple LLMs. It's got a

21:10

GUI that allows you to audit some of its

21:13

work. So, for example, it will site

21:15

sources and you can imagine inspecting

21:17

them. And it's got an autonomy slider.

21:18

You can either just do a quick search or

21:20

you can do research or you can do deep

21:22

research and come back 10 minutes later.

21:24

So, this is all just varying levels of

21:25

autonomy that you give up to the tool.

21:27

So, I guess my question is I feel like a

21:30

lot of software will become partially

21:32

autonomous. I'm trying to think through

21:33

like what does that look like? And for

21:35

many of you who maintain products and

21:36

services, how are you going to make your

21:38

products and services partially

21:40

autonomous? Can an LLM see everything

21:42

that a human can see? Can an LLM act in

21:45

all the ways that a human could act? And

21:47

can humans supervise and stay in the

21:49

loop of this activity? Because again,

21:50

these are fallible systems that aren't

21:52

yet perfect. And what does a diff look

21:54

like in Photoshop or something like

21:56

that? You know, and also a lot of the

21:58

traditional software right now, it has

22:00

all these switches and all this kind of

22:01

stuff that's all designed for human. All

22:03

of this has to change and become

22:04

accessible to LLMs.

22:07

So, one thing I want to stress with a

22:09

lot of these LLM apps that I'm not sure

22:11

gets as much attention as it should is

22:14

um we we're now kind of like cooperating

22:16

with AIS and usually they are doing the

22:18

generation and we as humans are doing

22:20

the verification. It is in our interest

22:22

to make this loop go as fast as

22:24

possible. So, we're getting a lot of

22:25

work done. There are two major ways that

22:28

I think uh this can be done. Number one,

22:30

you can speed up verification a lot. Um,

22:32

and I think guies, for example, are

22:34

extremely important to this because a

22:36

guey utilizes your computer vision GPU

22:39

in all of our head. Reading text is

22:41

effortful and it's not fun, but looking

22:43

at stuff is fun and it's it's just a

22:45

kind of like a highway to your brain.

22:47

So, I think guies are very useful for

22:49

auditing systems and visual

22:51

representations in general. And number

22:53

two, I would say is we have to keep the

22:56

AI on the leash. We I think a lot of

22:58

people are getting way over excited with

23:00

AI agents and uh it's not useful to me

23:03

to get a diff of 10,000 lines of code to

23:05

my repo. Like I have to I'm still the

23:07

bottleneck, right? Even though that

23:09

10,00 lines come out instantly, I have

23:11

to make sure that this thing is not

23:12

introducing bugs. It's just like and

23:15

that it's doing the correct thing,

23:16

right? And that there's no security

23:17

issues and so on. So um I think that um

23:22

yeah basically you we have to sort of

23:25

like it's in our interest to make the

23:28

the flow of these two go very very fast

23:30

and we have to somehow keep the AI on

23:32

the leash because it gets way too

23:33

overreactive. It's uh it's kind of like

23:35

this. This is how I feel when I do AI

23:37

assisted coding. If I'm just bite coding

23:39

everything is nice and great but if I'm

23:40

actually trying to get work done it's

23:42

not so great to have an overreactive uh

23:44

agent doing all this kind of stuff. So

23:47

this slide is not very good. I'm sorry,

23:48

but I guess I'm trying to develop like

23:51

many of you some ways of utilizing these

23:53

agents in my coding workflow and to do

23:55

AI assisted coding. And in my own work,

23:58

I'm always scared to get way too big

23:59

diffs. I always go in small incremental

24:02

chunks. I want to make sure that

24:04

everything is good. I want to spin this

24:06

loop very very fast and um I sort of

24:09

work on small chunks of single concrete

24:10

thing. Uh and so I think many of you

24:13

probably are developing similar ways of

24:14

working with the with LLMs.

24:17

Um, I also saw a number of blog posts

24:19

that try to develop these best practices

24:22

for working with LLMs. And here's one

24:24

that I read recently and I thought was

24:25

quite good. And it kind of discussed

24:26

some techniques and some of them have to

24:28

do with how you keep the AI on the

24:29

leash. And so, as an example, if you are

24:32

prompting, if your prompt is vague, then

24:34

uh the AI might not do exactly what you

24:36

wanted and in that case, verification

24:38

will fail. You're going to ask for

24:40

something else. If a verification fails,

24:42

then you're going to start spinning. So

24:43

it makes a lot more sense to spend a bit

24:45

more time to be more concrete in your

24:46

prompts which increases the probability

24:48

of successful verification and you can

24:50

move forward. And so I think a lot of us

24:52

are going to end up finding um kind of

24:54

techniques like this. I think in my own

24:56

work as well I'm currently interested in

24:57

uh what education looks like in um

25:00

together with kind of like now that we

25:01

have AI uh and LLMs what does education

25:04

look like? And I think a a large amount

25:07

of thought for me goes into how we keep

25:09

AI on the leash. I don't think it just

25:11

works to go to chat and be like, "Hey,

25:13

teach me physics." I don't think this

25:14

works because the AI is like gets lost

25:16

in the woods. And so for me, this is

25:18

actually two separate apps. For example,

25:20

there's an app for a teacher that

25:22

creates courses and then there's an app

25:24

that takes courses and serves them to

25:26

students. And in both cases, we now have

25:29

this intermediate artifact of a course

25:31

that is auditable and we can make sure

25:32

it's good. We can make sure it's

25:33

consistent. and the AI is kept on the

25:35

leash with respect to a certain

25:37

syllabus, a certain like um progression

25:40

of projects and so on. And so this is

25:42

one way of keeping the AI on leash and I

25:44

think has a much higher likelihood of

25:45

working and the AI is not getting lost

25:47

in the woods.

25:49

One more kind of analogy I wanted to

25:51

sort of allude to is I'm not I'm no

25:54

stranger to partial autonomy and I kind

25:56

of worked on this I think for five years

25:57

at Tesla and this is also a partial

26:00

autonomy product and shares a lot of the

26:01

features like for example right there in

26:03

the instrument panel is the GUI of the

26:05

autopilot so it's showing me what the

26:07

what the neural network sees and so on

26:09

and we have the autonomy slider where

26:10

over the course of my tenure there we

26:13

did more and more autonomous tasks for

26:15

the user and maybe the story that I

26:18

wanted to tell very briefly is uh

26:21

actually the first time I drove a

26:22

self-driving vehicle was in 2013 and I

26:25

had a friend who worked at Whimo and uh

26:27

he offered to give me a drive around

26:29

Palo Alto. I took this picture using

26:31

Google Glass at the time and many of you

26:33

are so young that you might not even

26:35

know what that is. Uh but uh yeah, this

26:37

was like all the rage at the time. And

26:39

we got into this car and we went for

26:40

about a 30-minute drive around Palo Alto

26:42

highways uh streets and so on. And this

26:45

drive was perfect. There was zero

26:46

interventions and this was 2013 which is

26:49

now 12 years ago. And it kind of struck

26:52

me because at the time when I had this

26:54

perfect drive, this perfect demo, I felt

26:56

like, wow, self-driving is imminent

26:59

because this just worked. This is

27:00

incredible. Um, but here we are 12 years

27:03

later and we are still working on

27:04

autonomy. Um, we are still working on

27:07

driving agents and even now we haven't

27:09

actually like really solved the problem.

27:10

like you may see Whimos going around and

27:12

they look driverless but you know

27:14

there's still a lot of teleoperation and

27:16

a lot of human in the loop of a lot of

27:18

this driving so we still haven't even

27:20

like declared success but I think it's

27:22

definitely like going to succeed at this

27:24

point but it just took a long time and

27:26

so I think like like this is software is

27:29

really tricky I think in the same way

27:31

that driving is tricky and so when I see

27:34

things like oh 2025 is the year of

27:36

agents I get very concerned and I kind

27:38

of feel like you know this is the decade

27:41

of agents and this is going to be quite

27:44

some time. We need humans in the loop.

27:45

We need to do this carefully. This is

27:47

software. Let's be serious here. One

27:51

more kind of analogy that I always think

27:52

through is the Iron Man suit. Uh I think

27:56

this is I always love Iron Man. I think

27:58

it's like so um correct in a bunch of

28:01

ways with respect to technology and how

28:02

it will play out. And what I love about

28:04

the Iron Man suit is that it's both an

28:05

augmentation and Tony Stark can drive it

28:08

and it's also an agent. And in some of

28:10

the movies, the Iron Man suit is quite

28:11

autonomous and can fly around and find

28:13

Tony and all this kind of stuff. And so

28:15

this is the autonomy slider is we can be

28:17

we can build augmentations or we can

28:19

build agents and we kind of want to do a

28:21

bit of both. But at this stage I would

28:23

say working with fallible LLMs and so

28:25

on. I would say you know it's less Iron

28:29

Man robots and more Iron Man suits that

28:31

you want to build. It's less like

28:33

building flashy demos of autonomous

28:35

agents and more building partial

28:36

autonomy products. And these products

28:39

have custom gueies and UIUX. And we're

28:41

trying to um and this is done so that

28:43

the generation verification loop of the

28:45

human is very very fast. But we are not

28:48

losing the sight of the fact that it is

28:49

in principle possible to automate this

28:51

work. And there should be an autonomy

28:52

slider in your product. And you should

28:54

be thinking about how you can slide that

28:55

autonomy slider and make your product uh

28:58

sort of um more autonomous over time.

29:01

But this is kind of how I think there's

29:02

lots of opportunities in these kinds of

29:04

products. I want to now switch gears a

29:06

little bit and talk about one other

29:08

dimension that I think is very unique.

29:09

Not only is there a new type of

29:11

programming language that allows for

29:12

autonomy in software but also as I

29:15

mentioned it's programmed in English

29:16

which is this natural interface and

29:19

suddenly everyone is a programmer

29:20

because everyone speaks natural language

29:22

like English. So this is extremely

29:24

bullish and very interesting to me and

29:26

also completely unprecedented. I would

29:28

say it it used to be the case that you

29:29

need to spend five to 10 years studying

29:31

something to be able to do something in

29:32

software. this is not the case anymore.

29:35

So, I don't know if by any chance anyone

29:37

has heard of vibe coding.

29:40

Uh, this this is the tweet that kind of

29:42

like introduced this, but I'm told that

29:44

this is now like a major meme. Um, fun

29:46

story about this is that I've been on

29:49

Twitter for like 15 years or something

29:51

like that at this point and I still have

29:53

no clue which tweet will become viral

29:56

and which tweet like fizzles and no one

29:58

cares. And I thought that this tweet was

30:00

going to be the latter. I don't know. It

30:01

was just like a shower of thoughts. But

30:03

this became like a total meme and I

30:05

really just can't tell. But I guess like

30:06

it struck a chord and it gave a name to

30:08

something that everyone was feeling but

30:10

couldn't quite say in words. So now

30:13

there's a Wikipedia page and everything.

30:17

This is like

30:18

[Applause]

30:25

yeah this is like a major contribution

30:27

now or something like that. So,

30:30

um, so Tom Wolf from HuggingFace shared

30:32

this beautiful video that I really love.

30:34

Um,

30:37

these are kids vibe coding.

30:42

And I find that this is such a wholesome

30:44

video. Like, I love this video. Like,

30:46

how can you look at this video and feel

30:48

bad about the future? The future is

30:49

great.

30:52

I think this will end up being like a

30:53

gateway drug to software development.

30:56

Um, I'm not a doomer about the future of

30:59

the generation and I think yeah, I love

31:02

this video. So, I tried by coding a

31:04

little bit uh as well because it's so

31:07

fun. Uh, so bike coding is so great when

31:09

you want to build something super duper

31:10

custom that doesn't appear to exist and

31:12

you just want to wing it because it's a

31:13

Saturday or something like that. So, I

31:15

built this uh iOS app and I don't I

31:18

can't actually program in Swift, but I

31:20

was really shocked that I was able to

31:21

build like a super basic app and I'm not

31:23

going to explain it. It's really uh

31:24

dumb, but uh I kind of like this was

31:27

just like a day of work and this was

31:28

running on my phone like later that day

31:30

and I was like, "Wow, this is amazing."

31:32

I didn't have to like read through Swift

31:33

for like five days or something like

31:35

that to like get started. I also

31:38

vipcoded this app called Menu Genen. And

31:40

this is live. You can try it in

31:41

menu.app. And I basically had this

31:44

problem where I show up at a restaurant,

31:45

I read through the menu, and I have no

31:46

idea what any of the things are. And I

31:48

need pictures. So this doesn't exist. So

31:51

I was like, "Hey, I'm going to bite code

31:52

it." So, um, this is what it looks like.

31:55

You go to menu.app,

31:58

um, and, uh, you take a picture of a of

32:01

a menu and then menu generates the

32:03

images and everyone gets $5 in credits

32:06

for free when you sign up. And

32:08

therefore, this is a major cost center

32:10

in my life. So, this is a negative

32:13

negative uh, revenue app for me right

32:16

now.

32:17

I've lost a huge amount of money on

32:19

menu.

32:21

Okay. But the fascinating thing about

32:23

menu genen for me is that the code of

32:28

the v the vite coding part the code was

32:30

actually the easy part of v of v coding

32:32

menu and most of it actually was when I

32:35

tried to make it real so that you can

32:36

actually have authentication and

32:37

payments and the domain name and averal

32:39

deployment. This was really hard and all

32:41

of this was not code. All of this devops

32:44

stuff was in me in the browser clicking

32:47

stuff and this was extreme slo and took

32:49

another week. So it was really

32:51

fascinating that I had the menu genen um

32:54

basically demo working on my laptop in a

32:57

few hours and then it took me a week

32:59

because I was trying to make it real and

33:01

the reason for this is this was just

33:02

really annoying. Um, so for example, if

33:05

you try to add Google login to your web

33:07

page, I know this is very small, but

33:09

just a huge amount of instructions of

33:11

this clerk library telling me how to

33:13

integrate this. And this is crazy. Like

33:15

it's telling me go to this URL, click on

33:17

this dropdown, choose this, go to this,

33:19

and click on that. And it's like telling

33:21

me what to do. Like a computer is

33:22

telling me the actions I should be

33:24

taking. Like you do it. Why am I doing

33:26

this?

33:28

What the hell?

33:31

I had to follow all these instructions.

33:33

This was crazy. So I think the last part

33:36

of my talk therefore focuses on can we

33:39

just build for agents? I don't want to

33:41

do this work. Can agents do this? Thank

33:44

you.

33:46

Okay. So roughly speaking, I think

33:48

there's a new category of consumer and

33:50

manipulator of digital information. It

33:53

used to be just humans through GUIs or

33:55

computers through APIs. And now we have

33:57

a completely new thing and agents are

34:00

they're computers but they are humanlike

34:02

kind of right they're people spirits

34:04

there's people spirits on the internet

34:05

and they need to interact with our

34:06

software infrastructure like can we

34:08

build for them it's a new thing so as an

34:10

example you can have robots.txt on your

34:12

domain and you can instruct uh or like

34:15

advise I suppose um uh web crawlers on

34:18

how to behave on your website in the

34:19

same way you can have maybe lm.txt txt

34:21

file which is just a simple markdown

34:23

that's telling LLMs what this domain is

34:25

about and this is very readable to a to

34:28

an LLM. If it had to instead get the

34:30

HTML of your web page and try to parse

34:32

it, this is very errorprone and

34:33

difficult and will screw it up and it's

34:35

not going to work. So we can just

34:36

directly speak to the LLM. It's worth

34:38

it. Um a huge amount of documentation is

34:41

currently written for people. So you

34:42

will see things like lists and bold and

34:45

pictures and this is not directly

34:47

accessible by an LLM. So I see some of

34:51

the services now are transitioning a lot

34:52

of the their docs to be specifically for

34:54

LLMs. So Versell and Stripe as an

34:57

example are early movers here but there

34:59

are a few more that I've seen already

35:01

and they offer their documentation in

35:04

markdown. Markdown is super easy for LMS

35:06

to understand. This is great. Um maybe

35:10

one simple example from from uh my

35:12

experience as well. Maybe some of you

35:14

know three blue one brown. He makes

35:15

beautiful animation videos on YouTube.

35:19

[Applause]

35:23

Yeah, I love this library. So that he

35:25

wrote uh Manon and I wanted to make my

35:27

own and uh there's extensive

35:30

documentations on how to use manon and

35:32

so I didn't want to actually read

35:34

through it. So I copy pasted the whole

35:35

thing to an LLM and I described what I

35:37

wanted and it just worked out of the box

35:39

like LLM just bcoded me an animation

35:41

exactly what I wanted and I was like wow

35:43

this is amazing. So if we can make docs

35:45

legible to LLMs, it's going to unlock a

35:48

huge amount of um kind of use and um I

35:51

think this is wonderful and should

35:52

should happen more. The other thing I

35:55

wanted to point out is that you do

35:56

unfortunately have to it's not just

35:57

about taking your docs and making them

35:58

appear in markdown. That's the easy

36:00

part. We actually have to change the

36:01

docs because anytime your docs say click

36:04

this is bad. An LLM will not be able to

36:06

natively take this action right now. So,

36:09

Verscell, for example, is replacing

36:11

every occurrence of click with an

36:13

equivalent curl command that your LM

36:15

agent could take on your behalf. Um, and

36:18

so I think this is very interesting. And

36:19

then, of course, there's a model context

36:21

protocol from Enthropic. And this is

36:23

also another way, it's a protocol of

36:24

speaking directly to agents as this new

36:26

consumer and manipulator of digital

36:28

information. So, I'm very bullish on

36:29

these ideas. The other thing I really

36:31

like is a number of little tools here

36:33

and there that are helping ingest data

36:36

that in like very LLM friendly formats.

36:38

So for example, when I go to a GitHub

36:40

repo like my nanoGPT repo, I can't feed

36:42

this to an LLM and ask questions about

36:44

it uh because it's you know this is a

36:46

human interface on GitHub. So when you

36:48

just change the URL from GitHub to get

36:50

ingest then uh this will actually

36:52

concatenate all the files into a single

36:54

giant text and it will create a

36:55

directory structure etc. And this is

36:57

ready to be copy pasted into your

36:59

favorite LLM and you can do stuff. Maybe

37:01

even more dramatic example of this is

37:03

deep wiki where it's not just the raw

37:05

content of these files. uh this is from

37:08

Devon but also like they have Devon

37:10

basically do analysis of the GitHub repo

37:12

and Devon basically builds up a whole

37:14

docs uh pages just for your repo and you

37:18

can imagine that this is even more

37:19

helpful to copy paste into your LLM. So

37:22

I love all the little tools that

37:23

basically where you just change the URL

37:24

and it makes something accessible to an

37:26

LLM. So this is all well and great and u

37:29

I think there should be a lot more of

37:30

it. One more note I wanted to make is

37:32

that it is absolutely possible that in

37:35

the future LLMs will be able to this is

37:38

not even future this is today they'll be

37:39

able to go around and they'll be able to

37:40

click stuff and so on but I still think

37:42

it's very worth u basically meeting LLM

37:46

halfway LLM's halfway and making it

37:48

easier for them to access all this

37:49

information uh because this is still

37:51

fairly expensive I would say to use and

37:54

uh a lot more difficult and so I do

37:56

think that lots of software there will

37:58

be a long tail where it won't like adapt

38:00

apps because these are not like live

38:02

player sort of repositories or digital

38:04

infrastructure and we will need these

38:06

tools. Uh but I think for everyone else

38:08

I think it's very worth kind of like

38:09

meeting in some middle point. So I'm

38:11

bullish on both if that makes sense.

38:14

So in summary, what an amazing time to

38:17

get into the industry. We need to

38:18

rewrite a ton of code. A ton of code

38:20

will be written by professionals and by

38:23

coders. These LLMs are kind of like

38:25

utilities, kind of like fabs, but

38:27

they're kind of especially like

38:28

operating systems. But it's so early.

38:30

It's like 1960s of operating systems and

38:34

uh and I think a lot of the analogies

38:36

cross over. Um and these LMS are kind of

38:38

like these fallible uh you know people

38:41

spirits that we have to learn to work

38:43

with. And in order to do that properly,

38:45

we need to adjust our infrastructure

38:47

towards it. So when you're building

38:48

these LLM apps, I describe some of the

38:50

ways of working effectively with these

38:52

LLMs and some of the tools that make

38:54

that uh kind of possible and how you can

38:57

spin this loop very very quickly and

38:59

basically create partial tunneling

39:00

products and then um yeah, a lot of code

39:03

has to also be written for the agents

39:04

more directly. But in any case, going

39:07

back to the Iron Man suit analogy, I

39:09

think what we'll see over the next

39:10

decade roughly is we're going to take

39:12

the slider from left to right. And I'm

39:15

very interesting. It's going to be very

39:17

interesting to see what that looks like.

39:19

And I can't wait to build it with all of

39:21

you. Thank you.

More from Y Combinator

Trending Transcripts