We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He helped co-found OpenAI right inside this office, was the one who got Autopilot working at Tesla back in the day, and he has a rare gift for making the most complex technical shifts feel both accessible and inevitable. You all know him for having coined the term "vibe coding" last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.
>> Yeah, hello. Excited to be here and to kick us off.
>> Okay. So, just a couple of months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?
>> Uh, yeah, a mixture of both for sure.
Well, first of all, I guess like many of you, I've been using agentic tools, Claude Code and adjacent things, for a while, maybe over the last year as they came out. They were very good at chunks of code, and sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point where, for me, I was on a break so I had a bit more time, and I think many other people were similar. I just started to notice that with the latest models, the chunks just came out fine, and then I kept asking for more and it just came out fine, and I can't remember the last time I corrected it. I just trusted the system more and more, and then I was vibe coding.
[laughter]
And so it was a very stark transition. I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing.
But you really had to look again, and you had to look as of December, because things have changed fundamentally, especially this agentic, coherent workflow that really started to actually work. And so it was that realization that had me go down the whole rabbit hole of, you know, infinity side projects. My side projects folder is extremely full of lots of random things, and I'm just vibe coding all the time. So that kind of happened in December, I would say, and I've been looking at the repercussions of it since.
>> You've talked a lot about this idea of LLMs as a new computer, that it isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules, software 2.0 was learned weights, software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe it?
>> Right. So, yeah, exactly.
So in software 1.0, I'm writing code. In software 2.0, I'm programming by creating data sets and training neural networks, so the programming is arranging data sets, and maybe some objectives and neural network architectures. And then what happened is that if you train one of these GPT models, or LLMs, on a sufficiently large set of tasks (implicitly, because by training on the internet you have to multitask everything that's in the data set), these actually become kind of a programmable computer in a certain sense. So software 3.0 is about your programming now turning into prompting, and what's in the context window is your lever over the interpreter that is the LLM, which interprets your context and performs computation in the digital information space. So that's the transition, and I think there are a few examples that really drove it home for me, and maybe those might be instructive.
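The 1.0-versus-3.0 contrast above can be sketched in a few lines. This is an illustrative toy, not any real API: `call_llm` is a hypothetical hook for an LLM completion call, stubbed here so the sketch runs offline.

```python
# Sketch: the same task in software 1.0 vs. software 3.0 terms.
# `call_llm` / `stub_llm` are illustrative stand-ins, not a real model API.

def sentiment_1_0(text: str) -> str:
    """Software 1.0: the programmer spells out explicit rules in code."""
    negative_words = {"bad", "terrible", "awful"}
    words = set(text.lower().split())
    return "negative" if words & negative_words else "positive"

# Software 3.0: the "program" is the prompt; the LLM is the interpreter,
# and whatever is in the context window is what it computes over.
SENTIMENT_PROMPT = (
    "Reply with exactly one word, positive or negative, "
    "for the sentiment of this text:\n{text}"
)

def sentiment_3_0(text: str, call_llm) -> str:
    return call_llm(SENTIMENT_PROMPT.format(text=text))

# Offline stub standing in for a real model call.
def stub_llm(prompt: str) -> str:
    return "negative" if "terrible" in prompt else "positive"

print(sentiment_1_0("the soup was terrible"))            # negative
print(sentiment_3_0("the soup was terrible", stub_llm))  # negative
```

In the 1.0 version the behavior lives in the rules; in the 3.0 version the behavior lives in the prompt text, which is exactly the "piece of text you hand to the interpreter" framing.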
So for example, when OpenClaw came out, when you want to install OpenClaw, you would expect that normally this is a bash script, a shell script: run the shell script to install OpenClaw. But in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on, these shell scripts usually balloon up and become extremely complex. And you're still stuck in a software 1.0 universe of wanting to write the code. Whereas the actual OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. So basically it's a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. And the reason this is a lot more powerful is that you're now working in the software 3.0 paradigm, where you don't have to precisely spell out all the individual details of that setup.
The agent has its own intelligence that it packages up, and it follows the instructions, looks at your environment and your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful, right? So I think that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent? That's the programming paradigm now. One more example that comes to mind, even more extreme than that, is when I was building MenuGen. MenuGen is this idea where you come to a restaurant, they give you a menu, and there are usually no pictures, so I don't know what any of these things are. Usually like 30% of the items I have no idea what they are. 50%. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense.
And so I vibe-coded this app that lets you upload a photo, and it does all this stuff, and it runs on Vercel. It basically re-renders the menu and gives you all the items: it OCRs the different titles, uses an image generator to get pictures of them, and then shows it to you. And then I saw the software 3.0 version of this, which blew my mind, which is literally just: take your photo, give it to Gemini, and say, use Nano Banana to overlay the things onto the menu. And Nano Banana basically returned an image that is exactly the picture of the menu I took, but it actually put into the pixels, it rendered, the different things on the menu. And this blew my mind, because all of my MenuGen is spurious. It's working in the old paradigm. That app shouldn't exist. The software 3.0 paradigm is a lot more raw.
Your neural network is just doing more and more of the work, your prompt or context is just the image, the output is an image, and there's no need to have any of the app in between. So I think people have to reframe: don't work in the existing paradigm of what already existed, and don't just think about this as a speedup of what exists. Actually, new things are available now. And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming and programming becoming faster. This is more general information processing that is automatable now, so it's not even just about code. Previous code worked over structured data, right? You write code over structured data. But, for example, with my LLM knowledge bases project, you get LLMs to create wikis for your organization, or for you in person, etc. This is not even a program.
This is not something that could exist before, because there was no code that would create a knowledge base based on a bunch of facts. But now you can take these documents, recompile them in a different way, reorder them, and create something that is new and interesting as a reframing of the data. And so these are new things that weren't possible. This is something I keep trying to come back to: not only what existed that is faster now, but the new opportunities, things that couldn't be possible before. And I almost think that that's more exciting.
>> I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression of programming from last October to early January, February this year.
If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?
>> [clears throat] Well, going with the example of MenuGen, a lot of this code shouldn't exist, and it's just the neural network doing most of the work. I do think the extrapolation looks very weird, because you could imagine completely neural computers in a certain sense. Imagine a device that takes raw video or audio into what's basically a neural net, and it uses diffusion to render a UI that is unique for that moment, in a certain sense. And I kind of feel like in the early days of computing, people were actually a little bit confused as to whether computers would look like calculators or like neural nets, and in the '50s and '60s it was not really
obvious which way it would go. Of course we went down the calculator path and ended up building classical computing, and now neural nets are running virtualized on existing computers. But you could imagine, I think, that a lot of this will flip: the neural net becomes the host process, and the CPUs become the co-processor. We saw the diagram of how the compute of neural networks is going to take over and become the dominant spend of flops. So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting, and they're using tool use as this historical appendage for some kinds of deterministic tasks, but what's really running the show is these neural nets. So you can imagine something extremely foreign as the extrapolation, but I think we're going to get there sort of piece by piece. That progression is TBD, I would say.
>> [snorts]
>> I'd like to talk a little bit about this concept of verifiability, the fact that AI will automate faster and more easily the domains where the output can be verified. If that framework is right, what work is about to move much faster than people realize, and what professions do we have that people think are safe but that are actually highly verifiable?
>> Uh, yes. So I spent some time writing about verifiability. Basically, traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense. The way this works is that when frontier labs are training these LLMs, these are giant reinforcement learning environments.
So they are given verification rewards, and because of the way these models are trained, they end up progressing into these jagged entities that really peak in capability in verifiable domains like math and code and adjacent areas, and stagnate, and are a little bit rough around the edges, when things are not in that space. So the reason I wrote about verifiability is that I'm trying to understand why these things are so jagged. Some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution, because some things are simply more valuable in the economy and end up getting more environments created for them, because the labs wanted to work in those settings. I think code is a good example of that.
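The point about verification rewards can be made concrete with a toy sketch. A reward is "verifiable" when it is itself a program, so it can be computed automatically at scale with no human grader. These reward functions are illustrative assumptions about how a trainer might score outputs, not any lab's actual implementation.

```python
# Toy verification rewards of the kind an RL environment might use.
# Function names and task formats are illustrative assumptions.

def math_reward(model_answer: str, ground_truth: int) -> float:
    """Verifiable: exact-match check against a known answer."""
    try:
        return 1.0 if int(model_answer.strip()) == ground_truth else 0.0
    except ValueError:
        return 0.0  # unparseable output earns nothing

def code_reward(model_code: str, tests) -> float:
    """Verifiable: fraction of unit tests the generated `solve` passes."""
    scope = {}
    try:
        exec(model_code, scope)  # a real trainer would sandbox this
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            passed += scope["solve"](*args) == expected
        except Exception:
            pass
    return passed / len(tests)

print(math_reward(" 42 ", 42))  # 1.0
print(code_reward("def solve(x): return 2 * x", [((3,), 6), ((0,), 0)]))  # 1.0
```

Nothing like this exists for "was that a tasteful essay" or "should I drive to the car wash," which is one way to see why capability peaks where the reward can be a program.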
There are probably lots of verifiable environments they could think about that happen not to make it into the mix, because they're just not that useful capabilities to have. But to me the big mystery is this. The favorite example for a while was "how many letters are in strawberry," and the models would famously get it wrong; it's an example of jaggedness. The models have patched this now, I think, but the new one is: I want to go to a car wash to wash my car, and it's 50 meters away. Should I drive or should I walk? And state-of-the-art models today will tell you to walk, because it's so close. How is it possible that a state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase [laughter] or find zero-day vulnerabilities, and yet tells me to walk to the car wash? This is insane.
And to whatever extent these models remain jagged, it's an indication that, number one, maybe something's slightly off, or, number two, you need to actually be in the loop a little bit: you need to treat them as tools, and you do have to stay in touch with what they're doing. So, long story short, all of my writing about verifiability is just trying to understand why these things are jagged. Is there any pattern to it? I think it's some combination of "verifiable" plus "the labs care." Maybe one more instructive anecdote: from GPT-3.5 to GPT-4, people noticed that chess improved a lot, and I think a lot of people thought, oh well, it's just the progression of capabilities. But actually, and I think this is public information, I saw it on the internet, a huge amount of chess data made it into the pre-training set, and just because it's in the data distribution, the model improved a lot more than it would have by default.
So someone at OpenAI decided to add this data, and now you have a capability that peaked a lot more. And that's why I keep stressing this dimension of it: we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix. You have to actually explore this thing they give you that has no manual. It works in certain settings, but maybe not in others, and you have to explore it a little bit. If you're in the circuits that were part of the RL, you fly. And if you're in circuits that are out of the data distribution, you're going to struggle, and you have to figure out which circuits you're in, in your application. And if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work, because it's not necessarily going to come out of the LLM out of the box.
>> I'd love to come back to the concept of jagged intelligence in a little bit.
If you are a founder today thinking about building a company, you are trying to solve a problem that you think is tractable, something in a domain that is verifiable, but you look around and you think, "Oh my gosh, the labs have really started getting to escape velocity in the ones that seem most obvious: math, coding, and others." What would your advice be to the founders in the audience?
>> So I think that comes back to the previous question. Let me think. Verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it. So maybe one way to see it is that that remains true even if the labs are not focusing on it directly. If you are in a verifiable setting where you could create these RL environments or examples, that sets you up to potentially do your own fine-tuning, and you might benefit from that. That is fundamentally technology that just works.
You can pull a lever if you have a huge amount of diverse data sets of RL environments, etc. You can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well. So I don't know what the examples of this might be, but I do think there are some very valuable reinforcement learning environments that people could think of that are not part of the... Yeah, I don't want to give away the answer, but there is one domain that I think is very... Oh, okay, sorry, I don't mean to vaguepost on the stage, but there are some examples of this.
>> On the flip side, what do you think still feels automatable only from a distance?
>> I do think that ultimately almost everything can be made verifiable to some extent, some things more easily than others. Because even for things like writing and so on, you can imagine having a council of LLM judges and probably getting something reasonable out of that kind of approach.
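The council-of-judges idea can be sketched as follows. The judges here are offline stubs with made-up rubrics, standing in for what would be separate scored model calls against a rubric prompt; averaging their scores turns a "soft" domain like writing into something semi-verifiable.

```python
# Hedged sketch: a council of LLM judges for a non-verifiable domain.
# Each lambda is a stub for a real judge-model call; rubrics are invented.

def council_score(text: str, judges) -> float:
    """Average of independent judge scores, each in [0, 1]."""
    scores = [judge(text) for judge in judges]
    return sum(scores) / len(scores)

judges = [
    lambda t: 0.9 if len(t.split()) > 5 else 0.4,  # stub "development" judge
    lambda t: 0.8 if t[:1].isupper() else 0.5,     # stub "mechanics" judge
    lambda t: 0.7,                                 # stub "style" judge
]

draft = "Verifiable rewards explain much of the jaggedness we observe."
print(round(council_score(draft, judges), 2))  # 0.8
```

The resulting scalar can then serve as a (noisier) reward signal of the same shape as the exact-match rewards used in clearly verifiable domains.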
So 469 00:15:33,360 --> 00:15:40,320 it's more about what's easy or hard. Um 470 00:15:36,639 --> 00:15:42,000 so I do think that ultimately um uh 471 00:15:40,320 --> 00:15:43,199 yeah, I think uh 472 00:15:42,000 --> 00:15:45,679 >> everything [laughter] 473 00:15:43,198 --> 00:15:47,599 >> everything is automatable. 474 00:15:45,679 --> 00:15:49,278 >> Amazing. Okay. Um, so last year you 475 00:15:47,600 --> 00:15:50,800 coined the term vibe coding and today 476 00:15:49,278 --> 00:15:52,958 we're in a world that feels a little bit 477 00:15:50,799 --> 00:15:54,159 more serious, more agentic engineering. 478 00:15:52,958 --> 00:15:55,359 What do you think is the difference 479 00:15:54,159 --> 00:15:57,360 between the two and what would you 480 00:15:55,360 --> 00:15:59,120 actually call what we're in today? 481 00:15:57,360 --> 00:16:01,120 >> Uh, yeah. So I would say vibe coding is 482 00:15:59,120 --> 00:16:03,039 about raising the floor for everyone in 483 00:16:01,120 --> 00:16:05,120 terms of what they can do in software. 484 00:16:03,039 --> 00:16:06,639 So the floor rises, everyone can vibe 485 00:16:05,120 --> 00:16:08,639 code anything and that's amazing, 486 00:16:06,639 --> 00:16:10,079 incredible. But then I would say agentic 487 00:16:08,639 --> 00:16:11,919 engineering is about preserving the 488 00:16:10,078 --> 00:16:13,838 quality bar of what existed before in 489 00:16:11,919 --> 00:16:15,919 professional software. So you're not 490 00:16:13,839 --> 00:16:18,880 allowed to introduce vulnerabilities due 491 00:16:15,919 --> 00:16:20,319 to vibe coding. Um you're 492 00:16:18,879 --> 00:16:22,639 still responsible for your software just 493 00:16:20,320 --> 00:16:24,800 as before, but can you go faster? And 494 00:16:22,639 --> 00:16:26,240 spoiler is you can, but how do 495 00:16:24,799 --> 00:16:28,240 you do that properly?
And so to me 496 00:16:26,240 --> 00:16:29,600 agentic engineering, I call it that 497 00:16:28,240 --> 00:16:31,198 because I do think it's kind of like an 498 00:16:29,600 --> 00:16:32,480 engineering discipline. You have these 499 00:16:31,198 --> 00:16:33,838 agents which are these like spiky 500 00:16:32,480 --> 00:16:35,759 entities. They're a bit fallible, a little 501 00:16:33,839 --> 00:16:37,680 bit stochastic, but they are extremely 502 00:16:35,759 --> 00:16:39,839 powerful. It's how do you 503 00:16:37,679 --> 00:16:42,479 coordinate them to go faster without 504 00:16:39,839 --> 00:16:46,000 sacrificing your quality bar and doing 505 00:16:42,480 --> 00:16:48,879 that well and correctly um is the 506 00:16:46,000 --> 00:16:50,078 realm of agentic engineering um so I 507 00:16:48,879 --> 00:16:51,759 kind of see them as different, like 508 00:16:50,078 --> 00:16:53,599 one is about maybe raising 509 00:16:51,759 --> 00:16:55,360 the floor and the other is about um you 510 00:16:53,600 --> 00:16:58,159 know extrapolating and what I'm seeing I 511 00:16:55,360 --> 00:17:01,199 think is there is a very high ceiling on 512 00:16:58,159 --> 00:17:02,719 agentic engineer uh capability and you 513 00:17:01,198 --> 00:17:04,720 know people used to talk about the 10x 514 00:17:02,720 --> 00:17:08,558 engineer previously I think that this is 515 00:17:04,720 --> 00:17:11,759 magnified a lot more 10x is not uh 516 00:17:08,558 --> 00:17:13,519 the speed up you gain. Um and I think uh 517 00:17:11,759 --> 00:17:16,160 it does seem to me like people who are 518 00:17:13,519 --> 00:17:18,078 very good at this um peak a lot more 519 00:17:16,160 --> 00:17:18,558 than 10x uh from my perspective 520 00:17:18,078 --> 00:17:21,279 right now. 521 00:17:18,558 --> 00:17:23,519 >> I really like that framing.
Um one thing 522 00:17:21,279 --> 00:17:25,199 that when Sam Altman came to AIN last 523 00:17:23,519 --> 00:17:27,199 year, one memorable thing he said was 524 00:17:25,199 --> 00:17:29,200 that people of different generations use 525 00:17:27,199 --> 00:17:31,200 ChatGPT differently. So if you're in your 526 00:17:29,200 --> 00:17:32,798 30s, you use it as a Google search 527 00:17:31,200 --> 00:17:35,200 replacement. But if you're in your 528 00:17:32,798 --> 00:17:37,440 teens, ChatGPT is your gateway to the 529 00:17:35,200 --> 00:17:39,279 internet. What is the parallel here in 530 00:17:37,440 --> 00:17:42,640 coding today? If we were to watch two 531 00:17:39,279 --> 00:17:45,359 people code using OpenClaw, Claude Code, 532 00:17:42,640 --> 00:17:47,840 Codex, one you'd consider mediocre at 533 00:17:45,359 --> 00:17:49,599 it and one you would consider fully AI 534 00:17:47,839 --> 00:17:51,591 native. How would you describe the 535 00:17:49,599 --> 00:17:51,678 difference? 536 00:17:51,592 --> 00:17:53,600 >> [clears throat] 537 00:17:51,679 --> 00:17:55,038 >> I mean I think it's just trying to get 538 00:17:53,599 --> 00:17:56,798 the most out of the tools that are 539 00:17:55,038 --> 00:17:59,679 available, utilizing all of their 540 00:17:56,798 --> 00:18:02,160 features, investing into your own um kind 541 00:17:59,679 --> 00:18:03,440 of setup. Uh so just like previously all 542 00:18:02,160 --> 00:18:04,480 the engineers are used to basically 543 00:18:03,440 --> 00:18:06,558 getting the most out of the tools you 544 00:18:04,480 --> 00:18:09,519 use, whether it's vim or VS Code or now 545 00:18:06,558 --> 00:18:13,038 it's you know Claude Code or Codex or so 546 00:18:09,519 --> 00:18:16,400 on. So um just investing into your setup 547 00:18:13,038 --> 00:18:18,558 um and um utilizing a lot of the you 548 00:18:16,400 --> 00:18:20,798 know uh tools that are available to you.
549 00:18:18,558 --> 00:18:23,119 Um and I think it just kind of looks 550 00:18:20,798 --> 00:18:26,798 like that. I do think that um maybe a 551 00:18:23,119 --> 00:18:29,839 related thought is um a lot of people 552 00:18:26,798 --> 00:18:31,918 are maybe hiring um for this right 553 00:18:29,839 --> 00:18:34,639 because they want to hire strong agentic 554 00:18:31,919 --> 00:18:37,280 engineers. I do think that um what I'm 555 00:18:34,640 --> 00:18:39,440 seeing is that uh the you know most 556 00:18:37,279 --> 00:18:41,918 people have still not refactored their 557 00:18:39,440 --> 00:18:44,240 um their hiring process for agentic 558 00:18:41,919 --> 00:18:46,400 engineer capability right like if you're 559 00:18:44,240 --> 00:18:48,240 giving out puzzles to solve and this is 560 00:18:46,400 --> 00:18:50,000 still the old paradigm I would say that 561 00:18:48,240 --> 00:18:52,400 hiring has to look like give me 562 00:18:50,000 --> 00:18:53,839 a really big project and see someone 563 00:18:52,400 --> 00:18:57,280 implement that big project like let's 564 00:18:53,839 --> 00:18:59,038 write say a Twitter clone uh for agents 565 00:18:57,279 --> 00:19:01,519 and then uh make it really good make it 566 00:18:59,038 --> 00:19:03,839 really secure and then have some agents 567 00:19:01,519 --> 00:19:06,639 uh simulate some activity uh on this 568 00:19:03,839 --> 00:19:09,038 Twitter and then I'm going to use 10 569 00:19:06,640 --> 00:19:12,960 Codex 5.4 xhigh to try to break 570 00:19:09,038 --> 00:19:15,440 um uh this website that 571 00:19:12,960 --> 00:19:16,640 you deployed and they're going to try to 572 00:19:15,440 --> 00:19:18,320 basically break it and they should not 573 00:19:16,640 --> 00:19:20,000 be able to break it. And so maybe it 574 00:19:18,319 --> 00:19:21,678 looks like that, right?
And so yeah, 575 00:19:20,000 --> 00:19:25,038 watching people in that setting and 576 00:19:21,679 --> 00:19:26,559 building bigger uh projects and uh 577 00:19:25,038 --> 00:19:28,400 utilizing the tooling is maybe 578 00:19:26,558 --> 00:19:29,038 what I would uh look at for the most 579 00:19:28,400 --> 00:19:31,280 part. 580 00:19:29,038 --> 00:19:33,679 >> And as agents do more, what human skill 581 00:19:31,279 --> 00:19:34,879 do you think becomes more valuable, not 582 00:19:33,679 --> 00:19:37,038 less? 583 00:19:34,880 --> 00:19:39,440 >> Uh so um yeah, it's a good question. I 584 00:19:37,038 --> 00:19:40,558 think um well right now the answer is 585 00:19:39,440 --> 00:19:44,480 that the agents are kind of like these 586 00:19:40,558 --> 00:19:46,960 intern entities right so it's remarkable 587 00:19:44,480 --> 00:19:48,558 um you basically still have to be in 588 00:19:46,960 --> 00:19:50,400 charge of the aesthetics, the 589 00:19:48,558 --> 00:19:52,480 judgment, the taste and a little bit of 590 00:19:50,400 --> 00:19:54,559 oversight maybe one of my favorite 591 00:19:52,480 --> 00:19:57,679 examples of like the weirdness of 592 00:19:54,558 --> 00:20:00,558 agents is um for MenuGen uh you sign 593 00:19:57,679 --> 00:20:02,559 up with a Google account but you 594 00:20:00,558 --> 00:20:04,160 um purchase credits using a Stripe 595 00:20:02,558 --> 00:20:06,319 account and both of them have email 596 00:20:04,160 --> 00:20:08,400 addresses and my agent actually tried to 597 00:20:06,319 --> 00:20:10,879 basically 598 00:20:08,400 --> 00:20:13,038 um like when you purchase credits, it 599 00:20:10,880 --> 00:20:15,760 assigned it using the email address from 600 00:20:13,038 --> 00:20:18,000 Stripe to the Google email address like 601 00:20:15,759 --> 00:20:20,319 there wasn't a persistent user ID 602 00:20:18,000 --> 00:20:21,599 uh for people, it was trying to 603 00:20:20,319 --> 00:20:22,720 match up
the email addresses, but you 604 00:20:21,599 --> 00:20:24,480 could use a different email address for 605 00:20:22,720 --> 00:20:26,798 your Stripe and your Google and 606 00:20:24,480 --> 00:20:28,240 basically it would not associate the funds. 607 00:20:26,798 --> 00:20:29,918 And so this is the kind of thing that 608 00:20:28,240 --> 00:20:31,519 these agents still will make mistakes 609 00:20:29,919 --> 00:20:33,038 about is like why would you use email 610 00:20:31,519 --> 00:20:34,558 addresses to try to cross-correlate the 611 00:20:33,038 --> 00:20:36,720 funds? They can be arbitrary. You can 612 00:20:34,558 --> 00:20:39,038 use different emails, etc. Like this is 613 00:20:36,720 --> 00:20:40,480 such a weird thing to do. So I think 614 00:20:39,038 --> 00:20:43,519 people have to be in charge of this 615 00:20:40,480 --> 00:20:46,000 spec, this plan. And um I actually don't 616 00:20:43,519 --> 00:20:47,359 even like the plan mode. I would, I 617 00:20:46,000 --> 00:20:48,240 mean obviously it's very useful, but I 618 00:20:47,359 --> 00:20:49,599 think there's something more general 619 00:20:48,240 --> 00:20:51,440 here where you have to work with your 620 00:20:49,599 --> 00:20:53,599 agent to design a spec that is very 621 00:20:51,440 --> 00:20:55,360 detailed and maybe it's uh maybe 622 00:20:53,599 --> 00:20:56,959 basically the docs and then get the 623 00:20:55,359 --> 00:20:58,719 agents to write them and you're in 624 00:20:56,960 --> 00:21:00,480 charge of the oversight and the top 625 00:20:58,720 --> 00:21:02,480 level categories, but the agents are 626 00:21:00,480 --> 00:21:04,000 doing a lot of the work under the hood. And 627 00:21:02,480 --> 00:21:05,839 um so I think you're not caring about 628 00:21:04,000 --> 00:21:09,200 some of the details. So as an example 629 00:21:05,839 --> 00:21:11,519 also with um arrays or tensors in neural 630 00:21:09,200 --> 00:21:13,279 networks.
Um there's a ton of details 631 00:21:11,519 --> 00:21:14,960 between PyTorch and NumPy and all the 632 00:21:13,279 --> 00:21:17,279 different like pandas and so on for all 633 00:21:14,960 --> 00:21:18,960 the different little API details. And I 634 00:21:17,279 --> 00:21:20,639 I already forgot about the keep dims 635 00:21:18,960 --> 00:21:22,798 versus keep dim or whether it's dim or 636 00:21:20,640 --> 00:21:24,000 axis or reshape or permute or transpose. 637 00:21:22,798 --> 00:21:25,440 I don't remember this stuff anymore, 638 00:21:24,000 --> 00:21:26,640 right? Because you don't have to. This 639 00:21:25,440 --> 00:21:28,000 is the kind of details that are handled 640 00:21:26,640 --> 00:21:30,000 by the intern because they have very 641 00:21:28,000 --> 00:21:32,079 good recall and but you still have to 642 00:21:30,000 --> 00:21:33,679 know for example that um you know 643 00:21:32,079 --> 00:21:35,279 there's underlying tensor there's an 644 00:21:33,679 --> 00:21:37,200 underlying view and then you can 645 00:21:35,279 --> 00:21:38,319 manipulate view of the same storage or 646 00:21:37,200 --> 00:21:40,080 you can have different storage which 647 00:21:38,319 --> 00:21:41,519 would be less efficient and so you still 648 00:21:40,079 --> 00:21:43,439 have to have an understanding of what 649 00:21:41,519 --> 00:21:45,759 this stuff is doing and some of the 650 00:21:43,440 --> 00:21:47,600 fundamentals um so that you're not 651 00:21:45,759 --> 00:21:50,798 copying memory around unnecessarily and 652 00:21:47,599 --> 00:21:53,279 so on but uh the details of the APIs are 653 00:21:50,798 --> 00:21:55,599 now handed off so it um you're in charge 654 00:21:53,279 --> 00:21:57,038 of the taste the engineering the design 655 00:21:55,599 --> 00:21:58,240 um and that it makes sense and that 656 00:21:57,038 --> 00:21:59,519 you're asking for the right things and 657 00:21:58,240 --> 00:22:01,279 that you're saying that okay that these 658 00:21:59,519 --> 
00:22:03,918 have to be unique user IDs that we're 659 00:22:01,279 --> 00:22:06,079 going to tie everything to um and so 660 00:22:03,919 --> 00:22:07,360 you're doing some of the design and 661 00:22:06,079 --> 00:22:08,879 development and the engineers are doing 662 00:22:07,359 --> 00:22:10,158 the fill in the blanks and that's 663 00:22:08,880 --> 00:22:11,600 currently kind of like where we are and 664 00:22:10,159 --> 00:22:13,679 I think that's what everyone of course 665 00:22:11,599 --> 00:22:15,359 is seeing I think right now 666 00:22:13,679 --> 00:22:18,559 >> do you think there's a chance that this 667 00:22:15,359 --> 00:22:20,079 um taste and judgment matters less over 668 00:22:18,558 --> 00:22:21,359 time or will the ceiling just keep 669 00:22:20,079 --> 00:22:22,720 rising 670 00:22:21,359 --> 00:22:25,439 >> um yeah it's a good question I would 671 00:22:22,720 --> 00:22:28,319 Okay. 672 00:22:25,440 --> 00:22:30,240 Um, I mean, I'm hoping that the that it 673 00:22:28,319 --> 00:22:31,519 improves. I think probably the reason it 674 00:22:30,240 --> 00:22:33,200 doesn't improve right now is again, it's 675 00:22:31,519 --> 00:22:36,558 not part of the RL. There's probably no 676 00:22:33,200 --> 00:22:39,840 aesthetics cost or reward or it's not 677 00:22:36,558 --> 00:22:41,200 good enough or something like that. Um, 678 00:22:39,839 --> 00:22:42,480 I do think that when you actually look 679 00:22:41,200 --> 00:22:44,480 at the code, sometimes I get a little 680 00:22:42,480 --> 00:22:46,079 bit of a heart attack because it's not 681 00:22:44,480 --> 00:22:47,120 like super amazing code necessarily all 682 00:22:46,079 --> 00:22:48,480 the time and it's very bloaty and 683 00:22:47,119 --> 00:22:50,239 there's a lot of copy paste and there's 684 00:22:48,480 --> 00:22:52,480 awkward abstractions that are brittle 685 00:22:50,240 --> 00:22:55,759 and like it works but it's just really 686 00:22:52,480 --> 00:22:57,839 gross. 
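The tensor point made earlier — the API trivia can be delegated to the agent, but the storage-versus-view fundamental cannot — is easy to make concrete. A small sketch in NumPy (the PyTorch names differ exactly as described: `keepdim`, `dim`, `permute`):

```python
import numpy as np

# API trivia an agent can recall for you: NumPy spells it keepdims/axis,
# PyTorch spells it keepdim/dim, and NumPy's transpose ~ PyTorch's permute.
a = np.arange(6).reshape(2, 3)
col_sums = a.sum(axis=0, keepdims=True)   # shape (1, 3), not (3,)

# The fundamental you still need: a view shares the underlying storage.
v = a.reshape(3, 2)            # reshape of a contiguous array: a view, no copy
v[0, 0] = 99
print(a[0, 0])                 # 99: the write shows through the shared storage
print(np.shares_memory(a, v))  # True

# A copy gets its own storage, which costs extra memory traffic.
c = a.flatten()                # flatten always returns a copy
c[0] = -1
print(a[0, 0])                 # still 99: separate storage
print(np.shares_memory(a, c))  # False
```

Knowing which operations alias storage and which copy it is what lets you avoid moving memory around unnecessarily, even when the agent is the one typing the calls.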
Um, and I do hope that this 687 00:22:55,759 --> 00:22:59,839 can improve in future models. Um, a good 688 00:22:57,839 --> 00:23:02,079 example also is this uh you know 689 00:22:59,839 --> 00:23:04,639 microGPT project where I was trying to 690 00:23:02,079 --> 00:23:06,639 simplify uh LLM training to be as simple 691 00:23:04,640 --> 00:23:08,799 as possible. The models hate this. They 692 00:23:06,640 --> 00:23:10,960 can't do it. I kept 693 00:23:08,798 --> 00:23:13,599 trying to prompt an LLM to simplify more, 694 00:23:10,960 --> 00:23:15,519 simplify more, and it just can't. You feel 695 00:23:13,599 --> 00:23:18,240 like you're outside of the RL circuits. 696 00:23:15,519 --> 00:23:20,240 It feels like you're obviously you know 697 00:23:18,240 --> 00:23:23,759 you're pulling teeth. It's not like 698 00:23:20,240 --> 00:23:25,120 light speed. So I think um I do think 699 00:23:23,759 --> 00:23:26,400 that people still remain in charge 700 00:23:25,119 --> 00:23:27,599 of this. But I do think that there's 701 00:23:26,400 --> 00:23:28,640 nothing fundamental again that's 702 00:23:27,599 --> 00:23:30,399 preventing it. It's just the labs 703 00:23:28,640 --> 00:23:31,038 haven't done it yet almost. 704 00:23:30,400 --> 00:23:33,360 >> Yeah. 705 00:23:31,038 --> 00:23:36,480 >> So I'd love to come back to this idea of 706 00:23:33,359 --> 00:23:38,158 uh jagged forms of intelligence. You 707 00:23:36,480 --> 00:23:39,519 wrote a little bit about this with a 708 00:23:38,159 --> 00:23:42,640 very thought-provoking piece around 709 00:23:39,519 --> 00:23:44,400 animals versus ghosts. Um, and the idea 710 00:23:42,640 --> 00:23:46,799 is that we're not building animals, we 711 00:23:44,400 --> 00:23:48,559 are summoning ghosts.
Um, and these are 712 00:23:46,798 --> 00:23:51,440 jagged forms of intelligence that are 713 00:23:48,558 --> 00:23:54,000 shaped by data and reward functions, but 714 00:23:51,440 --> 00:23:57,038 not by intrinsic motivation or fun or 715 00:23:54,000 --> 00:24:00,000 curiosity or empowerment. Uh, things 716 00:23:57,038 --> 00:24:02,879 that kind of came about via evolution. 717 00:24:00,000 --> 00:24:04,480 um why does that framing matter and what 718 00:24:02,880 --> 00:24:07,120 does it actually change about how you 719 00:24:04,480 --> 00:24:08,960 build and deploy and evaluate or even 720 00:24:07,119 --> 00:24:12,558 trust them? 721 00:24:08,960 --> 00:24:13,759 >> Uh yeah, so yeah, I think the reason I 722 00:24:12,558 --> 00:24:15,200 wrote about this is because I'm trying 723 00:24:13,759 --> 00:24:16,640 to wrap my head around what these things 724 00:24:15,200 --> 00:24:18,319 are, right? Because if you have a good 725 00:24:16,640 --> 00:24:20,080 model of what they are or are not, then 726 00:24:18,319 --> 00:24:23,759 you're going to be more competent at uh 727 00:24:20,079 --> 00:24:25,918 using them. Um and I do think that um I 728 00:24:23,759 --> 00:24:28,558 don't know if it has I'm not sure if it 729 00:24:25,919 --> 00:24:29,520 actually has like real power. [laughter] 730 00:24:28,558 --> 00:24:33,119 I think it's a little bit of 731 00:24:29,519 --> 00:24:34,798 philosophizing. Um, but I do think that 732 00:24:33,119 --> 00:24:36,879 um 733 00:24:34,798 --> 00:24:38,639 I think it's just um coming to terms 734 00:24:36,880 --> 00:24:40,080 with the fact that these things are not, 735 00:24:38,640 --> 00:24:41,278 you know, animal intelligences. Like if 736 00:24:40,079 --> 00:24:43,199 you yell at them, they're not going to 737 00:24:41,278 --> 00:24:46,798 work better or worse or it doesn't have 738 00:24:43,200 --> 00:24:48,159 any impact. 
Um, and uh it's all just 739 00:24:46,798 --> 00:24:50,960 kind of like these statistical 740 00:24:48,159 --> 00:24:53,200 simulation circuits where the 741 00:24:50,960 --> 00:24:55,519 substrate is pre-training so like 742 00:24:53,200 --> 00:24:57,919 statistics, but then there's RL 743 00:24:55,519 --> 00:25:00,400 bolted on top. So, it kind of like 744 00:24:57,919 --> 00:25:02,159 adds these appendages and um maybe 745 00:25:00,400 --> 00:25:04,080 it's just kind of like a mindset of what 746 00:25:02,159 --> 00:25:05,840 I'm coming into or what's likely to work 747 00:25:04,079 --> 00:25:07,759 or not likely to work or how to modify 748 00:25:05,839 --> 00:25:09,359 it. But I don't know 749 00:25:07,759 --> 00:25:11,278 that I have like here are the five 750 00:25:09,359 --> 00:25:12,639 obvious outcomes of how to make your 751 00:25:11,278 --> 00:25:14,640 system better. It's more just being 752 00:25:12,640 --> 00:25:16,480 suspicious of it and um 753 00:25:14,640 --> 00:25:18,400 >> figuring out over time. 754 00:25:16,480 --> 00:25:20,000 >> That's where it starts. Um okay, so you 755 00:25:18,400 --> 00:25:22,559 are so deep in working with agents that 756 00:25:20,000 --> 00:25:24,880 don't just chat. They have um real 757 00:25:22,558 --> 00:25:26,240 permissions. They have local context. 758 00:25:24,880 --> 00:25:28,240 They actually take action on 759 00:25:26,240 --> 00:25:30,079 your behalf. What does the world look 760 00:25:28,240 --> 00:25:31,278 like when we all start to live in that 761 00:25:30,079 --> 00:25:34,000 world? 762 00:25:31,278 --> 00:25:35,599 >> Uh yeah, I think a lot of 763 00:25:34,000 --> 00:25:38,240 people here are probably excited about 764 00:25:35,599 --> 00:25:40,240 what this agent uh you know native 765 00:25:38,240 --> 00:25:41,359 agentic environment looks like and 766 00:25:40,240 --> 00:25:42,480 everything has to be rewritten.
Everything is still fundamentally 768 00:25:42,480 --> 00:25:46,798 written for humans and has to be moved 769 00:25:44,558 --> 00:25:48,319 around. Still, most of the time 770 00:25:46,798 --> 00:25:49,679 when I use uh different frameworks or 771 00:25:48,319 --> 00:25:51,359 libraries or things like that, they 772 00:25:49,679 --> 00:25:53,120 still have docs that are fundamentally 773 00:25:51,359 --> 00:25:55,678 written for humans. This is my favorite 774 00:25:53,119 --> 00:25:57,038 pet peeve. Like I don't uh why are 775 00:25:55,679 --> 00:25:58,400 people still telling me what to do? Like 776 00:25:57,038 --> 00:26:00,227 I don't want to do anything. What is the 777 00:25:58,400 --> 00:26:02,880 thing I should copy paste to my agent? 778 00:26:00,227 --> 00:26:04,798 [laughter] Like uh so it's just um every 779 00:26:02,880 --> 00:26:06,000 time I'm told, you know, go to this URL 780 00:26:04,798 --> 00:26:07,359 or something like that, it's just like 781 00:26:06,000 --> 00:26:10,319 ah [laughter] 782 00:26:07,359 --> 00:26:12,240 you know. [snorts] So um everyone is I 783 00:26:10,319 --> 00:26:14,079 think excited about how do we decompose 784 00:26:12,240 --> 00:26:16,159 the workloads that need to happen into 785 00:26:14,079 --> 00:26:18,240 fundamentally sensors over the world, 786 00:26:16,159 --> 00:26:20,080 actuators over the world. How do we make 787 00:26:18,240 --> 00:26:23,359 it agent native? Uh basically describe 788 00:26:20,079 --> 00:26:27,839 it to agents first. Um and then have a 789 00:26:23,359 --> 00:26:30,158 lot of automation around um you know 790 00:26:27,839 --> 00:26:32,959 data structures that are 791 00:26:30,159 --> 00:26:34,400 very legible to the LLMs.
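One early stab at "docs written for agents, not humans" already exists: the llms.txt convention, a plain-markdown index served at a site's root specifically for an agent to ingest whole. A hypothetical example (the product and URLs here are made up for illustration):

```text
# ExampleDeploy

> Deploy a static or serverless app from a git repo and get a public URL.

## Docs
- [Quickstart](https://exampledeploy.dev/llms/quickstart.md): init, deploy, preview
- [DNS and domains](https://exampledeploy.dev/llms/dns.md): records, verification

## CLI
- [Command reference](https://exampledeploy.dev/llms/cli.md): `deploy up`, `deploy domains add`
```

The point is that an agent can fetch a single URL and get the whole task surface as plain text, instead of a human clicking through settings pages and menus.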
Uh so I think 792 00:26:32,960 --> 00:26:36,960 um yeah I'm hoping that there's a lot of 793 00:26:34,400 --> 00:26:39,038 agent first um infrastructure out there 794 00:26:36,960 --> 00:26:40,960 and that you know for MenuGen famously 795 00:26:39,038 --> 00:26:42,400 when I wrote the uh well, I'm not sure how 796 00:26:40,960 --> 00:26:44,159 famously, but when I wrote the blog post 797 00:26:42,400 --> 00:26:46,240 about MenuGen [laughter] 798 00:26:44,159 --> 00:26:47,440 um a lot of the 799 00:26:46,240 --> 00:26:48,720 trouble was not even writing the code 800 00:26:47,440 --> 00:26:50,240 for MenuGen, it was deploying it on 801 00:26:48,720 --> 00:26:51,440 Vercel because I had to work with all 802 00:26:50,240 --> 00:26:52,640 these different services and I had to 803 00:26:51,440 --> 00:26:54,960 string them up and I had to go to their 804 00:26:52,640 --> 00:26:56,720 settings and the menus and you know 805 00:26:54,960 --> 00:26:59,759 configure my DNS and it was just so 806 00:26:56,720 --> 00:27:01,759 annoying and so that's a good example of 807 00:26:59,759 --> 00:27:04,480 I would hope that for MenuGen I could 808 00:27:01,759 --> 00:27:05,839 give a prompt to an LLM, build MenuGen, 809 00:27:04,480 --> 00:27:07,839 and then I didn't have to touch anything 810 00:27:05,839 --> 00:27:09,678 and it's deployed in that same way on 811 00:27:07,839 --> 00:27:12,158 the internet. Uh I think that would be a 812 00:27:09,679 --> 00:27:13,360 good kind of a test for whether or not 813 00:27:12,159 --> 00:27:14,960 uh a lot of our infrastructure is 814 00:27:13,359 --> 00:27:17,278 becoming more and more agent native.
And 815 00:27:14,960 --> 00:27:19,360 then ultimately I would say yeah I do 816 00:27:17,278 --> 00:27:21,278 think we're going towards a world where 817 00:27:19,359 --> 00:27:25,519 um there's agent representation for 818 00:27:21,278 --> 00:27:26,960 people and for organizations and um you 819 00:27:25,519 --> 00:27:28,720 know I'll have my agent talk to your 820 00:27:26,960 --> 00:27:30,798 agent uh to figure out some of the 821 00:27:28,720 --> 00:27:33,038 details of our meetings or things 822 00:27:30,798 --> 00:27:34,798 like that. So, [laughter] 823 00:27:33,038 --> 00:27:36,720 um I do think that that's uh roughly 824 00:27:34,798 --> 00:27:37,839 where things are going, but um yeah, I 825 00:27:36,720 --> 00:27:38,240 think everyone here is excited about 826 00:27:37,839 --> 00:27:40,000 that. 827 00:27:38,240 --> 00:27:41,679 >> I really like the visual analogy of 828 00:27:40,000 --> 00:27:42,640 sensors and actuators. I actually hadn't 829 00:27:41,679 --> 00:27:43,038 thought of that. That's super 830 00:27:42,640 --> 00:27:43,440 interesting, 831 00:27:43,038 --> 00:27:45,359 >> right? 832 00:27:43,440 --> 00:27:47,679 >> Um okay, I think we have to end on a 833 00:27:45,359 --> 00:27:49,359 question about education. Um because you 834 00:27:47,679 --> 00:27:51,200 are probably one of the very best in the 835 00:27:49,359 --> 00:27:53,519 world at making complex technical 836 00:27:51,200 --> 00:27:56,319 concepts simple and deeply thoughtful 837 00:27:53,519 --> 00:27:59,759 about how we design education around it. 838 00:27:56,319 --> 00:28:02,480 Um, what still remains worth learning 839 00:27:59,759 --> 00:28:05,440 deeply when intelligence gets cheap as 840 00:28:02,480 --> 00:28:07,759 we move into the next era of AI? 841 00:28:05,440 --> 00:28:09,200 >> Yeah. Uh, there was a tweet that blew my 842 00:28:07,759 --> 00:28:10,558 mind recently and I keep thinking about 843 00:28:09,200 --> 00:28:12,640 it like every other day.
It was 844 00:28:10,558 --> 00:28:14,240 something along the lines of um, you can 845 00:28:12,640 --> 00:28:16,640 outsource your thinking but you can't 846 00:28:14,240 --> 00:28:17,679 outsource your understanding. 847 00:28:16,640 --> 00:28:21,278 And um, 848 00:28:17,679 --> 00:28:23,519 >> I think that's really nicely put. So 849 00:28:21,278 --> 00:28:25,119 yeah, because I'm still part of 850 00:28:23,519 --> 00:28:26,720 the system and 851 00:28:25,119 --> 00:28:27,918 somehow information still has to make it 852 00:28:26,720 --> 00:28:29,278 into my brain and I feel like I'm 853 00:28:27,919 --> 00:28:30,799 becoming a bottleneck of just even 854 00:28:29,278 --> 00:28:32,880 knowing what are we trying to build, why 855 00:28:30,798 --> 00:28:34,639 is it worth doing, uh how do I 856 00:28:32,880 --> 00:28:37,840 direct my agents and so 857 00:28:34,640 --> 00:28:39,759 on so I do still think that ultimately 858 00:28:37,839 --> 00:28:43,278 something has to direct the thinking and 859 00:28:39,759 --> 00:28:44,720 the processing and so on and um that's 860 00:28:43,278 --> 00:28:46,240 still kind of fundamentally constrained 861 00:28:44,720 --> 00:28:47,679 somehow by understanding and this is one 862 00:28:46,240 --> 00:28:49,599 reason I also was very excited about all 863 00:28:47,679 --> 00:28:51,360 the LLM knowledge bases because I feel 864 00:28:49,599 --> 00:28:53,199 like that's a way for me to 865 00:28:51,359 --> 00:28:54,959 process information and anytime I see a 866 00:28:53,200 --> 00:28:56,798 different projection onto information, I 867 00:28:54,960 --> 00:28:58,720 always like feel like I gain insight. So 868 00:28:56,798 --> 00:29:00,319 it's really just a lot of prompts for me 869 00:28:58,720 --> 00:29:03,360 to do synthetic data generation kind of 870 00:29:00,319 --> 00:29:05,038 over some fixed data.
Uh so I 871 00:29:03,359 --> 00:29:06,719 really enjoy uh whenever I read an 872 00:29:05,038 --> 00:29:07,759 article I have my uh you know my wiki 873 00:29:06,720 --> 00:29:09,519 that's being built up from these 874 00:29:07,759 --> 00:29:12,640 articles and I love asking questions 875 00:29:09,519 --> 00:29:15,119 about things, um and I think that 876 00:29:12,640 --> 00:29:17,278 ultimately these are tools to enhance 877 00:29:15,119 --> 00:29:18,558 understanding in a certain way and this 878 00:29:17,278 --> 00:29:20,079 is still kind of like a bit of a 879 00:29:18,558 --> 00:29:22,879 bottleneck because then you can't 880 00:29:20,079 --> 00:29:25,359 be a good director, 881 00:29:22,880 --> 00:29:26,960 uh because the LLMs certainly don't 882 00:29:25,359 --> 00:29:28,959 excel at understanding; you still are 883 00:29:26,960 --> 00:29:31,038 uniquely in charge of that. So, uh, 884 00:29:28,960 --> 00:29:32,558 yeah, I think, uh, tools to that effect, 885 00:29:31,038 --> 00:29:33,200 I think are incredibly interesting and 886 00:29:32,558 --> 00:29:34,558 exciting. 887 00:29:33,200 --> 00:29:36,159 >> I'm excited to be back here in a couple 888 00:29:34,558 --> 00:29:38,480 years and to see if we've been fully 889 00:29:36,159 --> 00:29:40,000 automated out of the loop and they 890 00:29:38,480 --> 00:29:41,440 actually take care of understanding as 891 00:29:40,000 --> 00:29:42,930 well. Uh, thank you so much for joining 892 00:29:41,440 --> 00:29:44,950 us, Andrej. We really appreciate it. 893 00:29:42,930 --> 00:29:44,950 [applause]