Please welcome former Director of AI at Tesla, Andrej Karpathy.

[Music]

Hello. Wow, a lot of people here. Hello. Okay, so I'm excited to be here today to talk to you about software in the era of AI. I'm told that many of you are students, bachelors, masters, PhDs and so on, and you're about to enter the industry. I think it's actually an extremely unique and very interesting time to enter the industry right now, and fundamentally the reason for that is that software is changing again. I say "again" because I actually gave this talk already, but the problem is that software keeps changing, so I actually have a lot of material for new talks. And I think it's changing quite fundamentally. Roughly speaking, software had not changed much on such a fundamental level for 70 years, and then it changed, I think, about twice quite rapidly in the last few years. So there's just a huge amount of work to do, a huge amount of software to write and rewrite.

So let's take a look at the realm of software. If we think of this as the map of software, there's a really cool tool called Map of GitHub. This is kind of all the software that's written: instructions to the computer for carrying out tasks in the digital space. If you zoom in here, these are all different kinds of repositories, and this is all the code that has been written. A few years ago I observed that software was changing and there was a new type of software around, and I called this Software 2.0 at the time. The idea was that Software 1.0 is the code you write for the computer.
Software 2.0, as you know, is basically neural networks, and in particular the weights of a neural network. You're not writing this code directly; you are more tuning the datasets and then running an optimizer to create the parameters of the neural net. At the time, neural nets were seen as just a different kind of classifier, like a decision tree or something like that, and so I think this framing was a lot more appropriate. And now we actually have an equivalent of GitHub in the realm of Software 2.0: I think Hugging Face is basically the equivalent of GitHub in Software 2.0. There's also Model Atlas, and you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle, those are the parameters of Flux, the image generator. Anytime someone fine-tunes on top of a Flux model, you basically create a git commit in this space, and you create a different kind of image generator.

So basically, Software 1.0 is the computer code that programs a computer. Software 2.0 is the weights, which program neural networks; AlexNet, the image-recognition neural network, is an example. Now, so far all the neural networks we were familiar with until recently were kind of fixed-function computers, image to categories or something like that. What's changed, and I think it's a quite fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique; it's a new kind of computer, and so in my mind it's worth giving it a new designation of Software 3.0. Basically, your prompts are now programs that program the LLM, and remarkably, these prompts are written in English.
So it's kind of a very interesting programming language. Maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some amount of Python to do sentiment classification, or you can train a neural net, or you can prompt a large language model. Here, this is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way.

So basically we have Software 1.0 and Software 2.0, and I think we're seeing that a lot of GitHub code is not just code anymore; there's a bunch of English interspersed with code, and so there's a growing category of a new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. When this blew my mind, a few years ago now, I tweeted this, and I think it captured the attention of a lot of people. This is my currently pinned tweet: remarkably, we're now programming computers in English.
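To make the three paradigms concrete, here is a minimal sketch of the same sentiment-classification task written all three ways. This is an illustration, not code from the talk; the word lists, the toy training data, and the `llm` callable are all assumptions.

```python
# Software 1.0 vs 2.0 vs 3.0 on one task (illustrative sketch).

# --- 1.0: explicit code you write for the computer ---
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate"}

def sentiment_1_0(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

# --- 2.0: tune a dataset, run an optimizer, ship the weights ---
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I love this", "this is terrible", "great stuff", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]
vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

def sentiment_2_0(text: str) -> str:
    return clf.predict(vec.transform([text]))[0]

# --- 3.0: program the LLM with an English few-shot prompt ---
FEW_SHOT = """Classify the sentiment of each review as positive or negative.
Review: "I love this" -> positive
Review: "this is terrible" -> negative
Review: "{review}" ->"""

def sentiment_3_0(text: str, llm) -> str:
    # `llm` stands in for any text-completion call to a hosted model
    return llm(FEW_SHOT.format(review=text)).strip()
```

Changing the few-shot examples changes the program, without touching any weights or any explicit logic.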
Now, when I was at Tesla, we were working on the Autopilot, and we were trying to get the car to drive. I showed this slide at the time, where you can imagine that the inputs to the car are on the bottom and they're going through a software stack to produce the steering and acceleration. I made the observation at the time that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. I observed that over time, as we made the Autopilot better, the neural network basically grew in capability and size, and in addition to that, all the C++ code was being deleted. A lot of the capabilities and functionality that was originally written in 1.0 was migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras, and across time, was done by a neural network, and we were able to delete a lot of code. So the Software 2.0 stack quite literally ate through the software stack of the Autopilot.

I thought this was really remarkable at the time, and I think we're seeing the same thing again, where basically we have a new kind of software and it's eating through the stack. We have three completely different programming paradigms, and if you're entering the industry, it's a very good idea to be fluent in all of them, because they all have slight pros and cons. You may want to program some functionality in 1.0, 2.0, or 3.0. Are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of explicit code? We all have to make these decisions and potentially fluidly transition between these paradigms.
So what I want to get into now, in the first part, is LLMs: how to think of this new paradigm, and what the ecosystem looks like. What is this new computer? What does it look like, and what does the ecosystem look like?

I was struck by this quote from Andrew Ng, actually, many years ago now, and I think Andrew is going to be speaking right after me. He said at the time, "AI is the new electricity," and I do think it captures something very interesting, in that LLMs certainly feel like they have properties of utilities right now. LLM labs like OpenAI, Gemini, Anthropic, etc. spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: we demand low latency, high uptime, consistent quality, etc. In electricity, you would have a transfer switch, so you can transfer your electricity source between grid, solar, battery, or generator. For LLMs, we have maybe OpenRouter, and we can easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space, so it's okay to have basically six electricity providers and switch between them; they don't compete in such a direct way.

What's also a little fascinating, and we saw this in the last few days actually: a lot of the LLMs went down, and people were kind of stuck and unable to work. It's fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of an intelligence brownout in the world.
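As a concrete picture of that transfer switch, here is a minimal sketch of swapping intelligence "providers" behind one interface. It assumes OpenRouter's OpenAI-compatible endpoint and the `openai` Python package; the model names and key are placeholders.

```python
# One client, swappable intelligence providers (illustrative sketch).
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible API
    api_key="YOUR_OPENROUTER_KEY",
)

def ask(prompt: str, model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Flip the "transfer switch" by changing one string:
print(ask("Why do LLMs feel like utilities?", "openai/gpt-4o"))
print(ask("Why do LLMs feel like utilities?", "anthropic/claude-3.5-sonnet"))
```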
It's kind of like when the voltage is unreliable in the grid: the planet just gets dumber, the more reliance we have on these models, which is already really dramatic and I think will continue to grow.

But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs. The reason is that the capex required for building an LLM is actually quite large. It's not just building some power station or something like that; you're investing a huge amount of money, and the tech tree for the technology is growing quite rapidly. So we're in a world where we have deep tech trees, research and development secrets, that are centralizing inside the LLM labs. But the analogy muddies a little bit, because as I mentioned, this is software, and software is a bit less defensible because it is so malleable. It's just an interesting thing to think about. There are many analogies you can make: a 4-nanometer process node is maybe something like a cluster with a certain max flops. When you're using NVIDIA GPUs and you're only doing the software and not the hardware, that's kind of like the fabless model. But if you're also building your own hardware and training on TPUs, if you're Google, that's kind of like the Intel model, where you own your fab. So there are some analogies here that make sense.

But actually, I think the analogy that makes the most sense is that, in my mind, LLMs have very strong analogies to operating systems. This is not just electricity or water; it's not something that comes out of the tap as a commodity.
These are now increasingly complex software ecosystems, not just simple commodities like electricity. And it's kind of interesting to me that the ecosystem is shaping up in a very similar way, where you have a few closed-source providers, like Windows or macOS, and then you have an open-source alternative, like Linux. For LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. Again, I think it's still very early, because these are just simple LLMs, but we're starting to see that these are going to get a lot more complicated. It's not just about the LLM itself; it's about all the tool use and the multimodalities and how all of that works.

When I had this realization a while back, I tried to sketch it out, and it seemed to me like LLMs are kind of like a new operating system. The LLM is a new kind of computer; it's kind of like the CPU equivalent. The context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving, using all of these capabilities. So if you look at it from that perspective, it definitely looks very much like an operating system.

A few more analogies. For example, if you want to download an app, say I go to VS Code and go to download: you can download VS Code and run it on Windows, Linux, or Mac. In the same way, you can take an LLM app like Cursor and run it on GPT or Claude or the Gemini series; it's just a dropdown. So it's kind of similar in that way as well.
More analogies that strike me: we're kind of in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer, and that forces the LLMs to be centralized in the cloud. We're all just thin clients that interact with it over the network, and none of us has full utilization of these computers, so it makes sense to use time-sharing, where we're all just a dimension of the batch when they're running the computer in the cloud. This is very much what computers used to look like during this time: the operating systems were in the cloud, everything was streamed around, and there was batching. So the personal computing revolution hasn't happened yet, because it's just not economical; it doesn't make sense. But I think some people are trying, and it turns out that Mac Minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference, it's all super memory-bound, so this actually works. I think these are some early indications, maybe, of personal computing, but this hasn't really happened yet. It's not clear what it looks like. Maybe some of you get to invent what this is, or how it works, or what it should be.
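A back-of-the-envelope sketch of why batch-one decoding is memory-bound: each new token has to stream roughly every weight through memory once, so memory bandwidth, not flops, sets the ceiling. The numbers below are illustrative, not hardware specs from the talk.

```python
# Rough upper bound on batch-one decoding speed (illustrative numbers).
weight_bytes = 8e9      # e.g. an 8B-parameter model at ~1 byte per weight
bandwidth = 120e9       # bytes/s, the order of a small unified-memory machine

print(f"~{bandwidth / weight_bytes:.0f} tokens/s ceiling")  # ~15 tokens/s
```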
Maybe one more analogy I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. It's just text; it's direct access to the operating system. And I think a GUI hasn't yet really been invented in a general way. Should ChatGPT have a GUI different from just text bubbles? Certainly some of the apps that we're going to go into in a bit have GUIs, but there's no GUI across all the tasks, if that makes sense.

There are some ways in which LLMs are different from operating systems, and from early computing, in some fairly unique ways. I wrote about one particular property that strikes me as very different this time around: LLMs flip the direction of technology diffusion that is usually present in technology. For example, with electricity, cryptography, computing, flight, the internet, GPS, lots of new transformative technologies, it is typically the government and corporations that are the first users, because the technology is new and expensive, and only later does it diffuse to consumers. But I feel like LLMs are flipped around. Maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how you boil an egg or something like that; this is certainly a lot of my use. It's really fascinating to me that we have a new magical computer and it's helping me boil an egg. It's not helping the government do something really crazy, like military ballistics or some special technology. Indeed, corporations and governments are lagging behind all of us in the adoption of these technologies. It's just backwards, and I think it informs some of the first uses and first apps of this technology.

So, in summary so far: LLM labs fab LLMs, and I think that's accurate language to use, but LLMs are complicated operating systems. They're circa-1960s in computing, and we're redoing computing all over again. They're currently available via time-sharing and distributed like a utility.
What is new and unprecedented is that they're not in the hands of a few governments and corporations; they're in the hands of all of us, because we all have a computer and it's all just software. ChatGPT was beamed down to our computers, to billions of people, instantly and overnight, and this is insane. It's kind of insane to me that this is the case, and now it is our time to enter the industry and program these computers. This is crazy; I think this is quite remarkable.

Before we program LLMs, we have to spend some time thinking about what these things are, and I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits: they are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer. A transformer is a neural net, and it goes on the level of tokens; it goes chunk, chunk, chunk, and there's an almost equal amount of compute for every single chunk. This simulator is basically some weights, and we fit it to all of the text that we have on the internet and so on. You end up with this kind of simulator, and because it is trained on humans, it's got this emergent psychology that is humanlike.
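To make the "chunk, chunk, chunk" picture concrete, here is a toy autoregressive loop: the simulator emits one token per step, and each step does a fixed amount of work regardless of what has been generated so far. The stand-in model is an assumption for illustration; in a real LLM, `toy_next_token` is a trained transformer.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

def toy_next_token(context: list[str]) -> str:
    # A stand-in for the transformer: fixed work per step, whatever the context.
    rng = random.Random(len(context))  # toy, deterministic "weights"
    return rng.choice(VOCAB)

def generate(prompt: list[str], n_tokens: int) -> list[str]:
    context = list(prompt)
    for _ in range(n_tokens):              # one "chunk" per iteration
        context.append(toy_next_token(context))
    return context

print(" ".join(generate(["the"], 8)))
```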
The first thing you'll notice is that LLMs have encyclopedic knowledge and memory. They can remember lots of things, a lot more than any single individual human can, because they read so many things. It actually reminds me of the movie Rain Man, which I really recommend people watch; it's an amazing movie, I love this movie. Dustin Hoffman plays an autistic savant who has almost perfect memory: he can read a phone book and remember all of the names and phone numbers. And I feel like LLMs are very similar: they can remember SHA hashes and lots of different kinds of things very, very easily. So they certainly have superpowers in some respects.

But they also have a bunch of, I would say, cognitive deficits. They hallucinate quite a bit; they make up stuff and don't have a very good internal model of self-knowledge, not sufficient at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're going to be superhuman in some problem-solving domains, and then they're going to make mistakes that basically no human would make. They will insist that 9.11 is greater than 9.9, or that there are two Rs in "strawberry"; these are some famous examples. Basically, there are rough edges that you can trip on, and that's also kind of unique.

They also suffer from anterograde amnesia. I'm alluding to the fact that if you have a coworker who joins your organization, this coworker will, over time, learn your organization; they will understand and gain a huge amount of context on the organization, and they go home and they sleep and they consolidate knowledge and develop expertise over time. LLMs don't natively do this, and this is not something that has really been solved in the R&D of LLMs. Context windows are really kind of working memory, and you have to program that working memory quite directly, because the models don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend people watch two movies: Memento and 50 First Dates.
In both of these movies, the protagonists' weights are fixed and their context windows get wiped every single morning, and it's really problematic to go to work or have relationships when this happens. For LLMs, this happens all the time.

One more thing I would point to is security-related limitations of the use of LLMs. For example, LLMs are quite gullible: they are susceptible to prompt injection risks, they might leak your data, etc., and there are many other security-related considerations. So basically, long story short, you have to simultaneously think of this as a superhuman thing that also has a bunch of cognitive deficits and issues, and yet it is extremely useful. So how do we program these models, work around their deficits, and enjoy their superhuman powers?

What I want to switch to now is the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things I thought were interesting for this talk. The first thing I'm kind of excited about is what I would call partial autonomy apps. For example, let's work with the example of coding. You can certainly go to ChatGPT directly and start copy pasting code around, copy pasting bug reports and stuff around, and getting code and copy pasting everything around. But why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app dedicated to this. I think many of you use Cursor; I do as well. Cursor is kind of the thing you want instead; you don't want to just go directly to the ChatGPT app.
And I think Cursor is a very good example of an early LLM app that has a bunch of properties that are useful across all LLM apps. In particular, you will notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before; but in addition to that, we now have this LLM integration that allows us to go in bigger chunks. So, some of the properties of LLM apps that I think are shared and useful to point out:

Number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs. In the case of Cursor, under the hood there are embedding models for all your files, the actual chat models, and models that apply diffs to the code, and this is all orchestrated for you. A really big one that I think is maybe not always fully appreciated is the application-specific GUI and its importance. You don't just want to talk to the operating system directly in text: text is very hard to read, interpret, and understand, and you don't want to take some of these actions natively in text. It's much better to just see a diff as red and green changes, so you can see what's being added and subtracted; it's much easier to do Command+Y to accept or Command+N to reject than to type it in text. A GUI allows a human to audit the work of these fallible systems and go faster. I'm going to come back to this point a little later as well.

The last feature I want to point out is what I call the autonomy slider. For example, in Cursor you can just do tab completion, where you're mostly in charge; you can select a chunk of code and use Command+K to change just that chunk; you can use Command+L to change the entire file.
Or you can use Command+I, which lets it rip and do whatever it wants in the entire repo; that's the full-autonomy, agentic version. So you are in charge of the autonomy slider, and depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task.

To show one more example of a fairly successful LLM app: Perplexity. It has very similar features to what I've just pointed out in Cursor. It packages up a lot of the information; it orchestrates multiple LLMs; it's got a GUI that allows you to audit some of its work (for example, it will cite sources, and you can imagine inspecting them); and it's got an autonomy slider: you can do a quick search, or you can do research, or you can do deep research and come back 10 minutes later. These are all just varying levels of autonomy that you give up to the tool.

So I guess my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For many of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human could act? And can humans supervise and stay in the loop of this activity? Because again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop, or something like that? Also, a lot of traditional software right now has all these switches and all this kind of stuff designed for humans. All of this has to change and become accessible to LLMs.
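One hedged sketch of what "accessible to LLMs" could mean in practice: take an action that today is wired to a button, and also expose it as a described tool that a model can discover and call. All the names here (the `crop_image` action, the schema layout) are hypothetical illustrations, not an API from the talk.

```python
import json

def crop_image(left: int, top: int, width: int, height: int) -> str:
    """Crop the current image to the given rectangle."""
    return f"cropped to {width}x{height} at ({left},{top})"

# The same action, described so an LLM can find and invoke it:
TOOLS = {
    "crop_image": {
        "fn": crop_image,
        "description": crop_image.__doc__,
        "parameters": {"left": "int", "top": "int", "width": "int", "height": "int"},
    },
}

def dispatch(tool_call: str) -> str:
    # Execute a JSON tool call emitted by the model.
    call = json.loads(tool_call)
    return TOOLS[call["name"]]["fn"](**call["arguments"])

print(dispatch('{"name": "crop_image", "arguments": '
               '{"left": 0, "top": 0, "width": 640, "height": 480}}'))
```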
So, one thing I want to stress with a lot of these LLM apps, and I'm not sure it gets as much attention as it should: we're now kind of cooperating with AIs, and usually they are doing the generation and we as humans are doing the verification. It is in our interest to make this loop go as fast as possible, so we're getting a lot of work done. There are two major ways this can be done.

Number one, you can speed up verification a lot. I think GUIs, for example, are extremely important to this, because a GUI utilizes the computer vision GPU in all of our heads. Reading text is effortful and not fun, but looking at stuff is fun; it's kind of a highway to your brain. So GUIs are very useful for auditing systems, and visual representations in general.

Number two, I would say, is that we have to keep the AI on the leash. I think a lot of people are getting way over-excited with AI agents, and it's not useful to me to get a diff of 10,000 lines of code to my repo. I'm still the bottleneck, right? Even though those 10,000 lines come out instantly, I have to make sure that this thing is not introducing bugs, that it's doing the correct thing, and that there are no security issues, and so on. So basically, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on the leash, because it gets way too overreactive. This is how I feel when I do AI-assisted coding: if I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. So this slide is not very good.
I'm sorry, but I guess, like many of you, I'm trying to develop some ways of utilizing these agents in my coding workflow and to do AI-assisted coding. In my own work, I'm always scared to get way-too-big diffs. I always go in small incremental chunks: I want to make sure that everything is good, I want to spin this loop very, very fast, and I work on small chunks of a single concrete thing. I think many of you are probably developing similar ways of working with LLMs.

I also saw a number of blog posts that try to develop best practices for working with LLMs. Here's one that I read recently and thought was quite good; it discusses some techniques, and some of them have to do with how you keep the AI on the leash. As an example, if your prompt is vague, then the AI might not do exactly what you wanted, and in that case verification will fail; you're going to ask for something else, and if verification fails, you're going to start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, and you can move forward. I think a lot of us are going to end up finding techniques like this; a simple sketch of the loop is below.
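Here is a minimal sketch of that generation-verification loop: ask for one small, concrete change, verify it automatically, and feed failures back instead of accepting a giant diff. The `llm` and `run_tests` callables are assumed stand-ins, not real APIs.

```python
def generate_verify_loop(llm, run_tests, task, max_rounds=3):
    prompt = ("Make ONE small change: " + task +
              "\nTouch as few lines as possible.")
    for _ in range(max_rounds):
        patch = llm(prompt)            # generation: the AI's half of the loop
        ok, report = run_tests(patch)  # verification: kept fast and automatic
        if ok:
            return patch               # a small, passing diff the human can audit
        prompt += f"\nThat attempt failed verification:\n{report}\nTry again."
    return None  # escalate to the human instead of spinning forever
```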
In my own work as well, I'm currently interested in what education looks like now that we have AI and LLMs, and a large amount of thought for me goes into how we keep the AI on the leash. I don't think it just works to go to ChatGPT and say, "Hey, teach me physics." I don't think this works, because the AI gets lost in the woods. For me, this is actually two separate apps. For example, there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. In both cases, we now have this intermediate artifact of a course that is auditable: we can make sure it's good, we can make sure it's consistent, and the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. This is one way of keeping the AI on the leash, and I think it has a much higher likelihood of working; the AI is not getting lost in the woods.

One more analogy I wanted to allude to: I'm no stranger to partial autonomy. I worked on this for about five years at Tesla, and the Autopilot is also a partial autonomy product that shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees and so on, and we had the autonomy slider, where over the course of my tenure we did more and more autonomous tasks for the user.

Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to take me for a drive around Palo Alto. I took this picture using Google Glass at the time; many of you are so young that you might not even know what that is, but it was all the rage at the time. We got into this car and went for about a 30-minute drive around Palo Alto, highways, streets, and so on, and this drive was perfect: there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked.
This is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents, and even now we haven't actually really solved the problem. You may see Waymos going around and they look driverless, but there's still a lot of teleoperation and a lot of human-in-the-loop in a lot of this driving. So we still haven't even declared success, though I think it's definitely going to succeed at this point; it just took a long time. Software is really tricky, I think, in the same way that driving is tricky. So when I see things like "oh, 2025 is the year of agents," I get very concerned, and I kind of feel like, you know, this is the decade of agents, and this is going to take quite some time. We need humans in the loop; we need to do this carefully. This is software; let's be serious here.

One more analogy that I always think through is the Iron Man suit. I always loved Iron Man; I think it's so correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation, Tony Stark can drive it, and it's also an agent: in some of the movies, the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff. And this is the autonomy slider: we can build augmentations, or we can build agents, and we kind of want to do a bit of both. But at this stage, working with fallible LLMs and so on, I would say it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents, and more about building partial autonomy products. These products have custom GUIs and UI/UX.
And this is done so that the generation-verification loop of the human is very, very fast. But we are not losing sight of the fact that it is, in principle, possible to automate this work. There should be an autonomy slider in your product, and you should be thinking about how you can slide it over time and make your product more autonomous. This is kind of how I think about it, and there are lots of opportunities in these kinds of products.
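Just to make that concrete, here is a tiny sketch of what an autonomy slider could look like in code. This is purely illustrative; the level names and descriptions are hypothetical, not from any real product:

    # A purely illustrative "autonomy slider": each level hands the AI a
    # bigger chunk of work, and makes the human's verification step coarser.
    from enum import Enum

    class AutonomyLevel(Enum):
        SUGGEST = "AI proposes a completion; the human accepts or rejects each one"
        EDIT_SPAN = "AI rewrites only the span the human selected"
        EDIT_FILE = "AI rewrites a whole file; the human reviews the diff"
        AGENT = "AI plans and executes multi-step work; the human audits the result"

    def verification_step(level: AutonomyLevel) -> str:
        # Sliding right trades per-step human oversight for throughput.
        return level.value

    for level in AutonomyLevel:
        print(f"{level.name}: {verification_step(level)}")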
I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for autonomy in software, but as I mentioned, it's programmed in English, which is this natural interface. Suddenly everyone is a programmer, because everyone speaks a natural language like English. This is extremely bullish and very interesting to me, and also completely unprecedented. It used to be the case that you needed to spend five to ten years studying something to be able to do something in software. That is not the case anymore. So, I don't know if by any chance anyone has heard of vibe coding.

This is the tweet that introduced the term, but I'm told it's now a major meme. A fun story about this: I've been on Twitter for something like 15 years at this point, and I still have no clue which tweet will become viral and which tweet fizzles with no one caring. I thought this tweet was going to be the latter; it was just a shower thought. But it became a total meme, and I really can't tell why. I guess it struck a chord and gave a name to something that everyone was feeling but couldn't quite say in words. So now there's a Wikipedia page and everything.

[Applause]

Yeah, this is like a major contribution now, or something like that. So, Tom Wolf from Hugging Face shared this beautiful video that I really love: these are kids vibe coding. I find it such a wholesome video. How can you look at this video and feel bad about the future? The future is great. I think this will end up being like a gateway drug to software development. I'm not a doomer about the future of this generation. Yeah, I love this video.

So I tried vibe coding a little bit as well, because it's so much fun. Vibe coding is so great when you want to build something super duper custom that doesn't appear to exist and you just want to wing it because it's a Saturday or something. So I built this iOS app, and I can't actually program in Swift, but I was really shocked that I was able to build a super basic app. I'm not going to explain it, it's really dumb, but it was just a day of work, and it was running on my phone later that day. I was like, wow, this is amazing. I didn't have to read through Swift for five days or something to get started.

I also vibe-coded this app called MenuGen, and it's live; you can try it at menugen.app. I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the dishes are, and I need pictures. This didn't exist, so I was like, "Hey, I'm going to vibe code it." This is what it looks like: you go to menugen.app, you take a picture of a menu, and then MenuGen generates the images. Everyone gets $5 in credits for free when they sign up, and therefore this is a major cost center in my life. This is a negative-revenue app for me right now; I've lost a huge amount of money on MenuGen.
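The core of a MenuGen-style feature can be surprisingly small, plausibly just one image-API call per dish. Here's a minimal sketch assuming the OpenAI Images API; the model choice, prompt, and function name are my own illustration, not MenuGen's actual internals:

    # Hypothetical core of a MenuGen-style app: one generated image per dish.
    # Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def image_url_for_dish(dish_name: str) -> str:
        response = client.images.generate(
            model="dall-e-3",
            prompt=f"A photorealistic photo of the restaurant dish: {dish_name}",
            n=1,
            size="1024x1024",
        )
        return response.data[0].url

    print(image_url_for_dish("shakshuka"))

Every call like this costs real money, which is exactly why handing out $5 of free credits adds up.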
Okay. But the fascinating thing about MenuGen for me is that the code was actually the easy part of vibe coding it. Most of the work came when I tried to make it real, so that you can actually have authentication and payments and the domain name and a Vercel deployment. This was really hard, and none of it was code: all of this DevOps stuff was me in the browser clicking things, and it was extremely slow and took another week. So it was really fascinating that I had the MenuGen demo basically working on my laptop in a few hours, and then it took me a week to make it real, because that part was just really annoying. For example, if you try to add Google login to your web page, I know this is very small, but there's a huge amount of instructions from this Clerk library telling me how to integrate it. And this is crazy: it's telling me, go to this URL, click on this dropdown, choose this, go to this, click on that. It's telling me what to do. A computer is telling me the actions I should be taking. You do it! Why am I doing this? What the hell? I had to follow all these instructions. This was crazy. So the last part of my talk therefore focuses on: can we just build for agents? I don't want to do this work. Can agents do this? Thank you.

Okay. So roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs or computers through APIs. And now we have a completely new thing. Agents are computers, but they are human-like, right? They're people spirits on the internet, and they need to interact with our software infrastructure. Can we build for them? It's a new thing. So as an example, you can have robots.txt on your domain, and you can instruct, or I suppose advise, web crawlers on how to behave on your website. In the same way, you can have an llms.txt file, which is just simple markdown telling LLMs what this domain is about. This is very readable to an LLM. If it had to instead get the HTML of your web page and try to parse it, that's error-prone and difficult; it will screw it up, and it's not going to work. So we can just directly speak to the LLM. It's worth it.
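For flavor, a minimal llms.txt could look something like this, loosely following the community llms.txt proposal (a title, a one-line summary, then links). I'm using MenuGen as the example here, and the URLs are made up:

    # MenuGen

    > MenuGen turns a photo of a restaurant menu into generated images of
    > every dish, so you can see what you are about to order.

    ## Docs

    - [Quickstart](https://menugen.app/docs/quickstart.md): sign up and scan
      your first menu (hypothetical URL)
    - [Pricing](https://menugen.app/docs/pricing.md): how credits work
      (hypothetical URL)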
A huge amount of documentation is currently written for people, so you will see things like lists and bold text and pictures, and this is not directly accessible to an LLM. I see some services now transitioning a lot of their docs to be specifically for LLMs. Vercel and Stripe, as an example, are early movers here, and there are a few more I've seen already: they offer their documentation in markdown. Markdown is super easy for LLMs to understand. This is great.

Maybe one simple example from my experience as well. Some of you may know 3Blue1Brown; he makes beautiful animation videos on YouTube.

[Applause]

Yeah, I love this library that he wrote, Manim. I wanted to make my own animation, and there's extensive documentation on how to use Manim, but I didn't want to actually read through it. So I copy-pasted the whole thing into an LLM and described what I wanted, and it just worked out of the box. The LLM vibe-coded me an animation that was exactly what I wanted, and I was like, wow, this is amazing.
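And the scale of program we're talking about is tiny: a Manim scene is just a short Python class. Here's a generic sketch of what such code looks like, not the actual animation I made:

    # A minimal Manim (Community edition) scene: draw a circle, then morph
    # it into a square. Render with: manim -pql scene.py ShapeMorph
    from manim import Circle, Create, Scene, Square, Transform

    class ShapeMorph(Scene):
        def construct(self):
            circle = Circle()
            square = Square()
            self.play(Create(circle))             # animate drawing the circle
            self.play(Transform(circle, square))  # morph it into the square
            self.wait()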
So if we can make docs legible to LLMs, it's going to unlock a huge amount of use, and I think this is wonderful and should happen more. The other thing I wanted to point out is that, unfortunately, it's not just about taking your docs and making them appear in markdown; that's the easy part. You actually have to change the docs, because any time your docs say "click here," that's bad: an LLM will not be able to natively take that action right now. So Vercel, for example, is replacing every occurrence of "click" with an equivalent curl command that your LLM agent could run on your behalf. I think this is very interesting.
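Just as a made-up illustration of that substitution (this is not quoted from Vercel's docs; the endpoint and payload are illustrative): where a page for humans says "click New Project and enter a name," a page for agents can instead give the equivalent command:

    # Hypothetical agent-friendly replacement for a "click New Project" step.
    curl -X POST "https://api.vercel.com/v9/projects" \
      -H "Authorization: Bearer $VERCEL_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name": "my-app"}'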
And then, of course, there's the Model Context Protocol from Anthropic. This is another way, a protocol for speaking directly to agents as this new consumer and manipulator of digital information. So I'm very bullish on these ideas.

The other thing I really like is the number of little tools here and there that help ingest data in very LLM-friendly formats. For example, when I go to a GitHub repo, like my nanoGPT repo, I can't feed that page to an LLM and ask questions about it, because it's a human interface on GitHub. But when you just change the URL from github to gitingest, it will concatenate all the files into a single giant text file, create a directory structure, and so on, and this is ready to be copy-pasted into your favorite LLM so you can do stuff.
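As I understand the trick, the whole interface is the URL swap, something like:

    # My understanding of the Gitingest trick: same path, swapped domain,
    # and the resulting page serves an LLM-ready text dump of the repo.
    repo_url = "https://github.com/karpathy/nanoGPT"
    ingest_url = repo_url.replace("github.com", "gitingest.com")
    print(ingest_url)  # https://gitingest.com/karpathy/nanoGPT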
Maybe an even more dramatic example of this is DeepWiki, where it's not just the raw content of the files: this is from Devin, and they have Devin do an analysis of the GitHub repo and basically build up whole docs pages just for your repo. You can imagine that this is even more helpful to copy-paste into your LLM. So I love all the little tools where you basically just change the URL and it makes something accessible to an LLM. This is all well and great, and I think there should be a lot more of it.

One more note I wanted to make: it is absolutely possible that in the future, and this is not even the future, this is today, LLMs will be able to go around and click things and so on. But I still think it's very worth basically meeting LLMs halfway and making it easier for them to access all this information, because clicking around is still fairly expensive, I would say, and a lot more difficult. So I do think there will be a long tail of software that won't adapt, because these are not live-player repositories or digital infrastructure, and for those we will need the tools that can click. But for everyone else, I think it's very worth meeting in some middle point. So I'm bullish on both, if that makes sense.

So in summary, what an amazing time to get into the industry. We need to rewrite a ton of code, and a ton of code will be written by professionals and by vibe coders. These LLMs are kind of like utilities, kind of like fabs, but especially like operating systems. And it's so early, it's like the 1960s of operating systems, and I think a lot of the analogies cross over. These LLMs are kind of like fallible people spirits that we have to learn to work with, and in order to do that properly, we need to adjust our infrastructure toward them. So when you're building these LLM apps, I've described some of the ways of working effectively with these LLMs, some of the tools that make that possible, how you can spin this loop very, very quickly, and how to basically create partial autonomy products. And then, yeah, a lot of code also has to be written for the agents more directly. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade, roughly, is us taking that slider from left to right. It's going to be very interesting to see what that looks like, and I can't wait to build it with all of you. Thank you.