Please welcome former Director of AI at Tesla, Andrej Karpathy.

[Music]

Hello. Wow, a lot of people here. Hello. Okay, so I'm excited to be here today to talk to you about software in the era of AI. I'm told that many of you are students, bachelors, masters, PhDs and so on, and you're about to enter the industry. I think it's actually an extremely unique and very interesting time to enter the industry right now, and fundamentally the reason for that is that software is changing again. I say "again" because I actually gave this talk already, but the problem is that software keeps changing, so I actually have a lot of material for new talks. And I think it's changing quite fundamentally. Roughly speaking, software had not changed much on such a fundamental level for 70 years, and then it changed, I think, about twice quite rapidly in the last few years. So there's just a huge amount of work to do, a huge amount of software to write and rewrite.

So let's take a look at the realm of software. If we think of this as the map of software, there's a really cool tool called Map of GitHub. This is kind of all the software that's written: instructions to the computer for carrying out tasks in the digital space. If you zoom in here, these are all different kinds of repositories, and this is all the code that has been written. A few years ago I observed that software was changing and there was a new type of software around, and I called this Software 2.0 at the time. The idea was that Software 1.0 is the code you write for the computer.
Software 2.0, as you know, is basically neural networks, and in particular the weights of a neural network. You're not writing this code directly; you are more tuning the datasets and then running an optimizer to create the parameters of the neural net. At the time, neural nets were seen as just a different kind of classifier, like a decision tree or something like that, and so I think this framing was a lot more appropriate. And now we actually have an equivalent of GitHub in the realm of Software 2.0: I think Hugging Face is basically the equivalent of GitHub in Software 2.0. There's also Model Atlas, and you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle, those are the parameters of Flux, the image generator. Anytime someone fine-tunes on top of a Flux model, you basically create a git commit in this space, and you create a different kind of image generator.

So basically, Software 1.0 is the computer code that programs a computer. Software 2.0 is the weights, which program neural networks; AlexNet, the image-recognition neural network, is an example. Now, so far all the neural networks we were familiar with until recently were kind of fixed-function computers, image to categories or something like that. What's changed, and I think it's a quite fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique; it's a new kind of computer, and so in my mind it's worth giving it a new designation of Software 3.0. Basically, your prompts are now programs that program the LLM, and remarkably, these prompts are written in English.
So it's kind of a very interesting programming language. Maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some amount of Python to do sentiment classification, or you can train a neural net, or you can prompt a large language model. Here, this is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way.

So basically we have Software 1.0 and Software 2.0, and I think we're seeing that a lot of GitHub code is not just code anymore; there's a bunch of English interspersed with code, and so there's a growing category of a new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. When this blew my mind, a few years ago now, I tweeted this, and I think it captured the attention of a lot of people. This is my currently pinned tweet: remarkably, we're now programming computers in English.
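To make the three paradigms concrete, here is a minimal sketch of the same sentiment-classification task written all three ways. This is an illustration, not code from the talk; the word lists, the toy training data, and the `llm` callable are all assumptions.

```python
# Software 1.0 vs 2.0 vs 3.0 on one task (illustrative sketch).

# --- 1.0: explicit code you write for the computer ---
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate"}

def sentiment_1_0(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

# --- 2.0: tune a dataset, run an optimizer, ship the weights ---
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I love this", "this is terrible", "great stuff", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]
vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

def sentiment_2_0(text: str) -> str:
    return clf.predict(vec.transform([text]))[0]

# --- 3.0: program the LLM with an English few-shot prompt ---
FEW_SHOT = """Classify the sentiment of each review as positive or negative.
Review: "I love this" -> positive
Review: "this is terrible" -> negative
Review: "{review}" ->"""

def sentiment_3_0(text: str, llm) -> str:
    # `llm` stands in for any text-completion call to a hosted model
    return llm(FEW_SHOT.format(review=text)).strip()
```

Changing the few-shot examples changes the program, without touching any weights or any explicit logic.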
Now, when I was at Tesla, we were working on the Autopilot, and we were trying to get the car to drive. I showed this slide at the time, where you can imagine that the inputs to the car are on the bottom and they're going through a software stack to produce the steering and acceleration. I made the observation at the time that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. I observed that over time, as we made the Autopilot better, the neural network basically grew in capability and size, and in addition to that, all the C++ code was being deleted. A lot of the capabilities and functionality that was originally written in 1.0 was migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras, and across time, was done by a neural network, and we were able to delete a lot of code. So the Software 2.0 stack quite literally ate through the software stack of the Autopilot.

I thought this was really remarkable at the time, and I think we're seeing the same thing again, where basically we have a new kind of software and it's eating through the stack. We have three completely different programming paradigms, and if you're entering the industry, it's a very good idea to be fluent in all of them, because they all have slight pros and cons. You may want to program some functionality in 1.0, 2.0, or 3.0. Are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of explicit code? We all have to make these decisions and potentially fluidly transition between these paradigms.
So what I want to get into now, in the first part, is LLMs: how to think of this new paradigm, and what the ecosystem looks like. What is this new computer? What does it look like, and what does the ecosystem look like?

I was struck by this quote from Andrew Ng, actually, many years ago now, and I think Andrew is going to be speaking right after me. He said at the time, "AI is the new electricity," and I do think it captures something very interesting, in that LLMs certainly feel like they have properties of utilities right now. LLM labs like OpenAI, Gemini, Anthropic, etc. spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: we demand low latency, high uptime, consistent quality, etc. In electricity, you would have a transfer switch, so you can transfer your electricity source between grid, solar, battery, or generator. For LLMs, we have maybe OpenRouter, and we can easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space, so it's okay to have basically six electricity providers and switch between them; they don't compete in such a direct way.

What's also a little fascinating, and we saw this in the last few days actually: a lot of the LLMs went down, and people were kind of stuck and unable to work. It's fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of an intelligence brownout in the world.
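As a concrete picture of that transfer switch, here is a minimal sketch of swapping intelligence "providers" behind one interface. It assumes OpenRouter's OpenAI-compatible endpoint and the `openai` Python package; the model names and key are placeholders.

```python
# One client, swappable intelligence providers (illustrative sketch).
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible API
    api_key="YOUR_OPENROUTER_KEY",
)

def ask(prompt: str, model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Flip the "transfer switch" by changing one string:
print(ask("Why do LLMs feel like utilities?", "openai/gpt-4o"))
print(ask("Why do LLMs feel like utilities?", "anthropic/claude-3.5-sonnet"))
```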
It's kind of like when the voltage is unreliable in the grid: the planet just gets dumber, the more reliance we have on these models, which is already really dramatic and I think will continue to grow.

But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs. The reason is that the capex required for building an LLM is actually quite large. It's not just building some power station or something like that; you're investing a huge amount of money, and the tech tree for the technology is growing quite rapidly. So we're in a world where we have deep tech trees, research and development secrets, that are centralizing inside the LLM labs. But the analogy muddies a little bit, because as I mentioned, this is software, and software is a bit less defensible because it is so malleable. It's just an interesting thing to think about. There are many analogies you can make: a 4-nanometer process node is maybe something like a cluster with a certain max flops. When you're using NVIDIA GPUs and you're only doing the software and not the hardware, that's kind of like the fabless model. But if you're also building your own hardware and training on TPUs, if you're Google, that's kind of like the Intel model, where you own your fab. So there are some analogies here that make sense.

But actually, I think the analogy that makes the most sense is that, in my mind, LLMs have very strong analogies to operating systems. This is not just electricity or water; it's not something that comes out of the tap as a commodity.
These are now increasingly complex software ecosystems, not just simple commodities like electricity. And it's kind of interesting to me that the ecosystem is shaping up in a very similar way, where you have a few closed-source providers, like Windows or macOS, and then you have an open-source alternative, like Linux. For LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. Again, I think it's still very early, because these are just simple LLMs, but we're starting to see that these are going to get a lot more complicated. It's not just about the LLM itself; it's about all the tool use and the multimodalities and how all of that works.

When I had this realization a while back, I tried to sketch it out, and it seemed to me like LLMs are kind of like a new operating system. The LLM is a new kind of computer; it's kind of like the CPU equivalent. The context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving, using all of these capabilities. So if you look at it from that perspective, it definitely looks very much like an operating system.

A few more analogies. For example, if you want to download an app, say I go to VS Code and go to download: you can download VS Code and run it on Windows, Linux, or Mac. In the same way, you can take an LLM app like Cursor and run it on GPT or Claude or the Gemini series; it's just a dropdown. So it's kind of similar in that way as well.
More analogies that strike me: we're kind of in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer, and that forces the LLMs to be centralized in the cloud. We're all just thin clients that interact with it over the network, and none of us has full utilization of these computers, so it makes sense to use time-sharing, where we're all just a dimension of the batch when they're running the computer in the cloud. This is very much what computers used to look like during this time: the operating systems were in the cloud, everything was streamed around, and there was batching. So the personal computing revolution hasn't happened yet, because it's just not economical; it doesn't make sense. But I think some people are trying, and it turns out that Mac Minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference, it's all super memory-bound, so this actually works. I think these are some early indications, maybe, of personal computing, but this hasn't really happened yet. It's not clear what it looks like. Maybe some of you get to invent what this is, or how it works, or what it should be.
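A back-of-the-envelope sketch of why batch-one decoding is memory-bound: each new token has to stream roughly every weight through memory once, so memory bandwidth, not flops, sets the ceiling. The numbers below are illustrative, not hardware specs from the talk.

```python
# Rough upper bound on batch-one decoding speed (illustrative numbers).
weight_bytes = 8e9      # e.g. an 8B-parameter model at ~1 byte per weight
bandwidth = 120e9       # bytes/s, the order of a small unified-memory machine

print(f"~{bandwidth / weight_bytes:.0f} tokens/s ceiling")  # ~15 tokens/s
```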
Maybe one more analogy I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. It's just text; it's direct access to the operating system. And I think a GUI hasn't yet really been invented in a general way. Should ChatGPT have a GUI different from just text bubbles? Certainly some of the apps that we're going to go into in a bit have GUIs, but there's no GUI across all the tasks, if that makes sense.

There are some ways in which LLMs are different from operating systems, and from early computing, in some fairly unique ways. I wrote about one particular property that strikes me as very different this time around: LLMs flip the direction of technology diffusion that is usually present in technology. For example, with electricity, cryptography, computing, flight, the internet, GPS, lots of new transformative technologies, it is typically the government and corporations that are the first users, because the technology is new and expensive, and only later does it diffuse to consumers. But I feel like LLMs are flipped around. Maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how you boil an egg or something like that; this is certainly a lot of my use. It's really fascinating to me that we have a new magical computer and it's helping me boil an egg. It's not helping the government do something really crazy, like military ballistics or some special technology. Indeed, corporations and governments are lagging behind all of us in the adoption of these technologies. It's just backwards, and I think it informs some of the first uses and first apps of this technology.

So, in summary so far: LLM labs fab LLMs, and I think that's accurate language to use, but LLMs are complicated operating systems. They're circa-1960s in computing, and we're redoing computing all over again. They're currently available via time-sharing and distributed like a utility.
What is new and unprecedented is that they're not in the hands of a few governments and corporations; they're in the hands of all of us, because we all have a computer and it's all just software. ChatGPT was beamed down to our computers, to billions of people, instantly and overnight, and this is insane. It's kind of insane to me that this is the case, and now it is our time to enter the industry and program these computers. This is crazy; I think this is quite remarkable.

Before we program LLMs, we have to spend some time thinking about what these things are, and I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits: they are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer. A transformer is a neural net, and it goes on the level of tokens; it goes chunk, chunk, chunk, and there's an almost equal amount of compute for every single chunk. This simulator is basically some weights, and we fit it to all of the text that we have on the internet and so on. You end up with this kind of simulator, and because it is trained on humans, it's got this emergent psychology that is humanlike.
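To make the "chunk, chunk, chunk" picture concrete, here is a toy autoregressive loop: the simulator emits one token per step, and each step does a fixed amount of work regardless of what has been generated so far. The stand-in model is an assumption for illustration; in a real LLM, `toy_next_token` is a trained transformer.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

def toy_next_token(context: list[str]) -> str:
    # A stand-in for the transformer: fixed work per step, whatever the context.
    rng = random.Random(len(context))  # toy, deterministic "weights"
    return rng.choice(VOCAB)

def generate(prompt: list[str], n_tokens: int) -> list[str]:
    context = list(prompt)
    for _ in range(n_tokens):              # one "chunk" per iteration
        context.append(toy_next_token(context))
    return context

print(" ".join(generate(["the"], 8)))
```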
The first thing you'll notice is that LLMs have encyclopedic knowledge and memory. They can remember lots of things, a lot more than any single individual human can, because they read so many things. It actually reminds me of the movie Rain Man, which I really recommend people watch; it's an amazing movie, I love this movie. Dustin Hoffman plays an autistic savant who has almost perfect memory: he can read a phone book and remember all of the names and phone numbers. And I feel like LLMs are very similar: they can remember SHA hashes and lots of different kinds of things very, very easily. So they certainly have superpowers in some respects.

But they also have a bunch of, I would say, cognitive deficits. They hallucinate quite a bit; they make up stuff and don't have a very good internal model of self-knowledge, not sufficient at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're going to be superhuman in some problem-solving domains, and then they're going to make mistakes that basically no human would make. They will insist that 9.11 is greater than 9.9, or that there are two Rs in "strawberry"; these are some famous examples. Basically, there are rough edges that you can trip on, and that's also kind of unique.

They also suffer from anterograde amnesia. I'm alluding to the fact that if you have a coworker who joins your organization, this coworker will, over time, learn your organization; they will understand and gain a huge amount of context on the organization, and they go home and they sleep and they consolidate knowledge and develop expertise over time. LLMs don't natively do this, and this is not something that has really been solved in the R&D of LLMs. Context windows are really kind of working memory, and you have to program that working memory quite directly, because the models don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend people watch two movies: Memento and 50 First Dates.
In both of these movies, the protagonists' weights are fixed and their context windows get wiped every single morning, and it's really problematic to go to work or have relationships when this happens. For LLMs, this happens all the time.

One more thing I would point to is security-related limitations of the use of LLMs. For example, LLMs are quite gullible: they are susceptible to prompt injection risks, they might leak your data, etc., and there are many other security-related considerations. So basically, long story short, you have to simultaneously think of this as a superhuman thing that also has a bunch of cognitive deficits and issues, and yet it is extremely useful. So how do we program these models, work around their deficits, and enjoy their superhuman powers?

What I want to switch to now is the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things I thought were interesting for this talk. The first thing I'm kind of excited about is what I would call partial autonomy apps. For example, let's work with the example of coding. You can certainly go to ChatGPT directly and start copy pasting code around, copy pasting bug reports and stuff around, and getting code and copy pasting everything around. But why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app dedicated to this. I think many of you use Cursor; I do as well. Cursor is kind of the thing you want instead; you don't want to just go directly to the ChatGPT app.
And I think Cursor is a very good example of an early LLM app that has a bunch of properties that are useful across all LLM apps. In particular, you will notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before; but in addition to that, we now have this LLM integration that allows us to go in bigger chunks. So, some of the properties of LLM apps that I think are shared and useful to point out:

Number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs. In the case of Cursor, under the hood there are embedding models for all your files, the actual chat models, and models that apply diffs to the code, and this is all orchestrated for you. A really big one that I think is maybe not always fully appreciated is the application-specific GUI and its importance. You don't just want to talk to the operating system directly in text: text is very hard to read, interpret, and understand, and you don't want to take some of these actions natively in text. It's much better to just see a diff as red and green changes, so you can see what's being added and subtracted; it's much easier to do Command+Y to accept or Command+N to reject than to type it in text. A GUI allows a human to audit the work of these fallible systems and go faster. I'm going to come back to this point a little later as well.

The last feature I want to point out is what I call the autonomy slider. For example, in Cursor you can just do tab completion, where you're mostly in charge; you can select a chunk of code and use Command+K to change just that chunk; you can use Command+L to change the entire file.
Or you can use Command+I, which lets it rip and do whatever it wants in the entire repo; that's the full-autonomy, agentic version. So you are in charge of the autonomy slider, and depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task.

To show one more example of a fairly successful LLM app: Perplexity. It has very similar features to what I've just pointed out in Cursor. It packages up a lot of the information; it orchestrates multiple LLMs; it's got a GUI that allows you to audit some of its work (for example, it will cite sources, and you can imagine inspecting them); and it's got an autonomy slider: you can do a quick search, or you can do research, or you can do deep research and come back 10 minutes later. These are all just varying levels of autonomy that you give up to the tool.

So I guess my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For many of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human could act? And can humans supervise and stay in the loop of this activity? Because again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop, or something like that? Also, a lot of traditional software right now has all these switches and all this kind of stuff designed for humans. All of this has to change and become accessible to LLMs.
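One hedged sketch of what "accessible to LLMs" could mean in practice: take an action that today is wired to a button, and also expose it as a described tool that a model can discover and call. All the names here (the `crop_image` action, the schema layout) are hypothetical illustrations, not an API from the talk.

```python
import json

def crop_image(left: int, top: int, width: int, height: int) -> str:
    """Crop the current image to the given rectangle."""
    return f"cropped to {width}x{height} at ({left},{top})"

# The same action, described so an LLM can find and invoke it:
TOOLS = {
    "crop_image": {
        "fn": crop_image,
        "description": crop_image.__doc__,
        "parameters": {"left": "int", "top": "int", "width": "int", "height": "int"},
    },
}

def dispatch(tool_call: str) -> str:
    # Execute a JSON tool call emitted by the model.
    call = json.loads(tool_call)
    return TOOLS[call["name"]]["fn"](**call["arguments"])

print(dispatch('{"name": "crop_image", "arguments": '
               '{"left": 0, "top": 0, "width": 640, "height": 480}}'))
```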
So, one thing I want to stress with a lot of these LLM apps, and I'm not sure it gets as much attention as it should: we're now kind of cooperating with AIs, and usually they are doing the generation and we as humans are doing the verification. It is in our interest to make this loop go as fast as possible, so we're getting a lot of work done. There are two major ways this can be done.

Number one, you can speed up verification a lot. I think GUIs, for example, are extremely important to this, because a GUI utilizes the computer vision GPU in all of our heads. Reading text is effortful and not fun, but looking at stuff is fun; it's kind of a highway to your brain. So GUIs are very useful for auditing systems, and visual representations in general.

Number two, I would say, is that we have to keep the AI on the leash. I think a lot of people are getting way over-excited with AI agents, and it's not useful to me to get a diff of 10,000 lines of code to my repo. I'm still the bottleneck, right? Even though those 10,000 lines come out instantly, I have to make sure that this thing is not introducing bugs, that it's doing the correct thing, and that there are no security issues, and so on. So basically, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on the leash, because it gets way too overreactive. This is how I feel when I do AI-assisted coding: if I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. So this slide is not very good.
I'm sorry, but I guess, like many of you, I'm trying to develop some ways of utilizing these agents in my coding workflow and to do AI-assisted coding. In my own work, I'm always scared to get way-too-big diffs. I always go in small incremental chunks: I want to make sure that everything is good, I want to spin this loop very, very fast, and I work on small chunks of a single concrete thing. I think many of you are probably developing similar ways of working with LLMs.

I also saw a number of blog posts that try to develop best practices for working with LLMs. Here's one that I read recently and thought was quite good; it discusses some techniques, and some of them have to do with how you keep the AI on the leash. As an example, if your prompt is vague, then the AI might not do exactly what you wanted, and in that case verification will fail; you're going to ask for something else, and if verification fails, you're going to start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, and you can move forward. I think a lot of us are going to end up finding techniques like this; a simple sketch of the loop is below.
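Here is a minimal sketch of that generation-verification loop: ask for one small, concrete change, verify it automatically, and feed failures back instead of accepting a giant diff. The `llm` and `run_tests` callables are assumed stand-ins, not real APIs.

```python
def generate_verify_loop(llm, run_tests, task, max_rounds=3):
    prompt = ("Make ONE small change: " + task +
              "\nTouch as few lines as possible.")
    for _ in range(max_rounds):
        patch = llm(prompt)            # generation: the AI's half of the loop
        ok, report = run_tests(patch)  # verification: kept fast and automatic
        if ok:
            return patch               # a small, passing diff the human can audit
        prompt += f"\nThat attempt failed verification:\n{report}\nTry again."
    return None  # escalate to the human instead of spinning forever
```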
In my own work as well, I'm currently interested in what education looks like now that we have AI and LLMs, and a large amount of thought for me goes into how we keep the AI on the leash. I don't think it just works to go to ChatGPT and say, "Hey, teach me physics." I don't think this works, because the AI gets lost in the woods. For me, this is actually two separate apps. For example, there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. In both cases, we now have this intermediate artifact of a course that is auditable: we can make sure it's good, we can make sure it's consistent, and the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. This is one way of keeping the AI on the leash, and I think it has a much higher likelihood of working; the AI is not getting lost in the woods.

One more analogy I wanted to allude to: I'm no stranger to partial autonomy. I worked on this for about five years at Tesla, and the Autopilot is also a partial autonomy product that shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees and so on, and we had the autonomy slider, where over the course of my tenure we did more and more autonomous tasks for the user.

Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to take me for a drive around Palo Alto. I took this picture using Google Glass at the time; many of you are so young that you might not even know what that is, but it was all the rage at the time. We got into this car and went for about a 30-minute drive around Palo Alto, highways, streets, and so on, and this drive was perfect: there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked.
This is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents, and even now we haven't actually really solved the problem. You may see Waymos going around and they look driverless, but there's still a lot of teleoperation and a lot of human-in-the-loop in a lot of this driving. So we still haven't even declared success, though I think it's definitely going to succeed at this point; it just took a long time. Software is really tricky, I think, in the same way that driving is tricky. So when I see things like "oh, 2025 is the year of agents," I get very concerned, and I kind of feel like, you know, this is the decade of agents, and this is going to take quite some time. We need humans in the loop; we need to do this carefully. This is software; let's be serious here.

One more analogy that I always think through is the Iron Man suit. I always loved Iron Man; I think it's so correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation, Tony Stark can drive it, and it's also an agent: in some of the movies, the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff. And this is the autonomy slider: we can build augmentations, or we can build agents, and we kind of want to do a bit of both. But at this stage, working with fallible LLMs and so on, I would say it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents, and more about building partial autonomy products. These products have custom GUIs and UI/UX.
And this is done so that the generation-verification loop of the human is very, very fast. But we are not losing sight of the fact that it is, in principle, possible to automate this work. There should be an autonomy slider in your product, and you should be thinking about how you can slide it over time and make your product more autonomous. This is kind of how I think about it, and there are lots of opportunities in these kinds of products.
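Just to make that concrete, here is a tiny sketch of what an autonomy slider could look like in code. This is purely illustrative; the level names and descriptions are hypothetical, not from any real product:

    # A purely illustrative "autonomy slider": each level hands the AI a
    # bigger chunk of work, and makes the human's verification step coarser.
    from enum import Enum

    class AutonomyLevel(Enum):
        SUGGEST = "AI proposes a completion; the human accepts or rejects each one"
        EDIT_SPAN = "AI rewrites only the span the human selected"
        EDIT_FILE = "AI rewrites a whole file; the human reviews the diff"
        AGENT = "AI plans and executes multi-step work; the human audits the result"

    def verification_step(level: AutonomyLevel) -> str:
        # Sliding right trades per-step human oversight for throughput.
        return level.value

    for level in AutonomyLevel:
        print(f"{level.name}: {verification_step(level)}")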
I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for autonomy in software, but as I mentioned, it's programmed in English, which is this natural interface. Suddenly everyone is a programmer, because everyone speaks a natural language like English. This is extremely bullish and very interesting to me, and also completely unprecedented. It used to be the case that you needed to spend five to ten years studying something to be able to do something in software. That is not the case anymore. So, I don't know if by any chance anyone has heard of vibe coding.

This is the tweet that introduced the term, but I'm told it's now a major meme. A fun story about this: I've been on Twitter for something like 15 years at this point, and I still have no clue which tweet will become viral and which tweet fizzles with no one caring. I thought this tweet was going to be the latter; it was just a shower thought. But it became a total meme, and I really can't tell why. I guess it struck a chord and gave a name to something that everyone was feeling but couldn't quite say in words. So now there's a Wikipedia page and everything.

[Applause]

Yeah, this is like a major contribution now, or something like that. So, Tom Wolf from Hugging Face shared this beautiful video that I really love: these are kids vibe coding. I find it such a wholesome video. How can you look at this video and feel bad about the future? The future is great. I think this will end up being like a gateway drug to software development. I'm not a doomer about the future of this generation. Yeah, I love this video.

So I tried vibe coding a little bit as well, because it's so much fun. Vibe coding is so great when you want to build something super duper custom that doesn't appear to exist and you just want to wing it because it's a Saturday or something. So I built this iOS app, and I can't actually program in Swift, but I was really shocked that I was able to build a super basic app. I'm not going to explain it, it's really dumb, but it was just a day of work, and it was running on my phone later that day. I was like, wow, this is amazing. I didn't have to read through Swift for five days or something to get started.

I also vibe-coded this app called MenuGen, and it's live; you can try it at menugen.app. I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the dishes are, and I need pictures. This didn't exist, so I was like, "Hey, I'm going to vibe code it." This is what it looks like: you go to menugen.app, you take a picture of a menu, and then MenuGen generates the images. Everyone gets $5 in credits for free when they sign up, and therefore this is a major cost center in my life. This is a negative-revenue app for me right now; I've lost a huge amount of money on MenuGen.
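The core of a MenuGen-style feature can be surprisingly small, plausibly just one image-API call per dish. Here's a minimal sketch assuming the OpenAI Images API; the model choice, prompt, and function name are my own illustration, not MenuGen's actual internals:

    # Hypothetical core of a MenuGen-style app: one generated image per dish.
    # Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def image_url_for_dish(dish_name: str) -> str:
        response = client.images.generate(
            model="dall-e-3",
            prompt=f"A photorealistic photo of the restaurant dish: {dish_name}",
            n=1,
            size="1024x1024",
        )
        return response.data[0].url

    print(image_url_for_dish("shakshuka"))

Every call like this costs real money, which is exactly why handing out $5 of free credits adds up.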
Okay. But the fascinating thing about MenuGen for me is that the code was actually the easy part of vibe coding it. Most of the work came when I tried to make it real, so that you can actually have authentication and payments and the domain name and a Vercel deployment. This was really hard, and none of it was code: all of this DevOps stuff was me in the browser clicking things, and it was extremely slow and took another week. So it was really fascinating that I had the MenuGen demo basically working on my laptop in a few hours, and then it took me a week to make it real, because that part was just really annoying. For example, if you try to add Google login to your web page, I know this is very small, but there's a huge amount of instructions from this Clerk library telling me how to integrate it. And this is crazy: it's telling me, go to this URL, click on this dropdown, choose this, go to this, click on that. It's telling me what to do. A computer is telling me the actions I should be taking. You do it! Why am I doing this? What the hell? I had to follow all these instructions. This was crazy. So the last part of my talk therefore focuses on: can we just build for agents? I don't want to do this work. Can agents do this? Thank you.

Okay. So roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs or computers through APIs. And now we have a completely new thing. Agents are computers, but they are human-like, right? They're people spirits on the internet, and they need to interact with our software infrastructure. Can we build for them? It's a new thing. So as an example, you can have robots.txt on your domain, and you can instruct, or I suppose advise, web crawlers on how to behave on your website. In the same way, you can have an llms.txt file, which is just simple markdown telling LLMs what this domain is about. This is very readable to an LLM. If it had to instead get the HTML of your web page and try to parse it, that's error-prone and difficult; it will screw it up, and it's not going to work. So we can just directly speak to the LLM. It's worth it.
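For flavor, a minimal llms.txt could look something like this, loosely following the community llms.txt proposal (a title, a one-line summary, then links). I'm using MenuGen as the example here, and the URLs are made up:

    # MenuGen

    > MenuGen turns a photo of a restaurant menu into generated images of
    > every dish, so you can see what you are about to order.

    ## Docs

    - [Quickstart](https://menugen.app/docs/quickstart.md): sign up and scan
      your first menu (hypothetical URL)
    - [Pricing](https://menugen.app/docs/pricing.md): how credits work
      (hypothetical URL)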
A huge amount of documentation is currently written for people, so you will see things like lists and bold text and pictures, and this is not directly accessible to an LLM. I see some services now transitioning a lot of their docs to be specifically for LLMs. Vercel and Stripe, as an example, are early movers here, and there are a few more I've seen already: they offer their documentation in markdown. Markdown is super easy for LLMs to understand. This is great.

Maybe one simple example from my experience as well. Some of you may know 3Blue1Brown; he makes beautiful animation videos on YouTube.

[Applause]

Yeah, I love this library that he wrote, Manim. I wanted to make my own animation, and there's extensive documentation on how to use Manim, but I didn't want to actually read through it. So I copy-pasted the whole thing into an LLM and described what I wanted, and it just worked out of the box. The LLM vibe-coded me an animation that was exactly what I wanted, and I was like, wow, this is amazing.
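And the scale of program we're talking about is tiny: a Manim scene is just a short Python class. Here's a generic sketch of what such code looks like, not the actual animation I made:

    # A minimal Manim (Community edition) scene: draw a circle, then morph
    # it into a square. Render with: manim -pql scene.py ShapeMorph
    from manim import Circle, Create, Scene, Square, Transform

    class ShapeMorph(Scene):
        def construct(self):
            circle = Circle()
            square = Square()
            self.play(Create(circle))             # animate drawing the circle
            self.play(Transform(circle, square))  # morph it into the square
            self.wait()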
So if we can make docs legible to LLMs, it's going to unlock a huge amount of use, and I think this is wonderful and should happen more. The other thing I wanted to point out is that, unfortunately, it's not just about taking your docs and making them appear in markdown; that's the easy part. You actually have to change the docs, because any time your docs say "click here," that's bad: an LLM will not be able to natively take that action right now. So Vercel, for example, is replacing every occurrence of "click" with an equivalent curl command that your LLM agent could run on your behalf. I think this is very interesting.
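Just as a made-up illustration of that substitution (this is not quoted from Vercel's docs; the endpoint and payload are illustrative): where a page for humans says "click New Project and enter a name," a page for agents can instead give the equivalent command:

    # Hypothetical agent-friendly replacement for a "click New Project" step.
    curl -X POST "https://api.vercel.com/v9/projects" \
      -H "Authorization: Bearer $VERCEL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name": "my-app"}'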
And then, of course, there's the Model Context Protocol from Anthropic. This is another way, a protocol for speaking directly to agents as this new consumer and manipulator of digital information. So I'm very bullish on these ideas.

The other thing I really like is the number of little tools here and there that help ingest data in very LLM-friendly formats. For example, when I go to a GitHub repo, like my nanoGPT repo, I can't feed that page to an LLM and ask questions about it, because it's a human interface on GitHub. But when you just change the URL from github to gitingest, it will concatenate all the files into a single giant text file, create a directory structure, and so on, and this is ready to be copy-pasted into your favorite LLM so you can do stuff.
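As I understand the trick, the whole interface is the URL swap, something like:

    # My understanding of the Gitingest trick: same path, swapped domain,
    # and the resulting page serves an LLM-ready text dump of the repo.
    repo_url = "https://github.com/karpathy/nanoGPT"
    ingest_url = repo_url.replace("github.com", "gitingest.com")
    print(ingest_url)  # https://gitingest.com/karpathy/nanoGPT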
Maybe an even more dramatic example of this is DeepWiki, where it's not just the raw content of the files: this is from Devin, and they have Devin do an analysis of the GitHub repo and basically build up whole docs pages just for your repo. You can imagine that this is even more helpful to copy-paste into your LLM. So I love all the little tools where you basically just change the URL and it makes something accessible to an LLM. This is all well and great, and I think there should be a lot more of it.

One more note I wanted to make: it is absolutely possible that in the future, and this is not even the future, this is today, LLMs will be able to go around and click things and so on. But I still think it's very worth basically meeting LLMs halfway and making it easier for them to access all this information, because clicking around is still fairly expensive, I would say, and a lot more difficult. So I do think there will be a long tail of software that won't adapt, because these are not live-player repositories or digital infrastructure, and for those we will need the tools that can click. But for everyone else, I think it's very worth meeting in some middle point. So I'm bullish on both, if that makes sense.

So in summary, what an amazing time to get into the industry. We need to rewrite a ton of code, and a ton of code will be written by professionals and by vibe coders. These LLMs are kind of like utilities, kind of like fabs, but especially like operating systems. And it's so early, it's like the 1960s of operating systems, and I think a lot of the analogies cross over. These LLMs are kind of like fallible people spirits that we have to learn to work with, and in order to do that properly, we need to adjust our infrastructure toward them. So when you're building these LLM apps, I've described some of the ways of working effectively with these LLMs, some of the tools that make that possible, how you can spin this loop very, very quickly, and how to basically create partial autonomy products. And then, yeah, a lot of code also has to be written for the agents more directly. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade, roughly, is us taking that slider from left to right. It's going to be very interesting to see what that looks like, and I can't wait to build it with all of you. Thank you.