[00:01] Please welcome former director of AI at Tesla, Andrej Karpathy.

[Music]

[00:19] Hello. Wow, a lot of people here. Okay. So I'm excited to be here today to talk to you about software in the era of AI. I'm told that many of you are students — bachelors, masters, PhD and so on — and you're about to enter the industry. I think it's actually an extremely unique and very interesting time to enter the industry right now. Fundamentally, the reason for that is that software is changing, again. And I say again because I actually gave this talk already, but the problem is that software keeps changing, so I have a lot of material for new talks. And I think it's changing quite fundamentally. Roughly speaking, software had not changed much on such a fundamental level for 70 years, and then it changed, I think, about twice quite rapidly in the last few years. So there's just a huge amount of work to do, a huge amount of software to write and rewrite.

[01:12] So let's take a look at the realm of software. If we think of this as the map of software — this is a really cool tool called Map of GitHub — this is kind of like all the software that's written. These are instructions to the computer for carrying out tasks in the digital space. If you zoom in, these are all different kinds of repositories, and this is all the code that has been written. A few years ago I observed that software was changing and there was a new type of software around, and I called it Software 2.0 at the time. The idea was that Software 1.0 is the code you write for the computer. Software 2.0 is basically neural networks — in particular, the weights of a neural network. You're not writing this code directly; you are more tuning the data sets and then running an optimizer to create the parameters of the neural net. At the time, neural nets were kind of seen as just a different kind of classifier, like a decision tree or something like that, so I think this framing was a lot more appropriate.

[02:10] And now we actually have an equivalent of GitHub in the realm of Software 2.0: I think Hugging Face is basically the GitHub of Software 2.0, and there's also Model Atlas, where you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle — those are the parameters of Flux, the image generator. Anytime someone tunes on top of a Flux model, you basically create a git commit in this space, and you create a different kind of image generator.

[02:41] So basically: Software 1.0 is the computer code that programs a computer.
[02:45] Software 2.0 are the weights, which program neural networks — here's an example of the AlexNet image recognizer neural network. Now, so far, all of the neural networks we were familiar with until recently were kind of like fixed-function computers: image to categories, or something like that. What's changed, and I think is a quite fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique — it's a new kind of computer — and so in my mind it's worth giving it a new designation of Software 3.0. Basically, your prompts are now programs that program the LLM, and, remarkably, these prompts are written in English. It's a very interesting programming language.

[03:33] Maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some amount of Python to do it, or you can train a neural net, or you can prompt a large language model. Here is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way. So we have Software 1.0 and Software 2.0, and — maybe you've seen that a lot of GitHub code is not just code anymore; there's a bunch of English interspersed with code — I think there's a growing category of a new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. When this blew my mind, a few years ago now, I tweeted it, and I think it captured the attention of a lot of people. My currently pinned tweet is that, remarkably, we're now programming computers in English.

[04:28] Now, when I was at Tesla, we were working on Autopilot, trying to get the car to drive. I showed this slide at the time: you can imagine that the inputs to the car are on the bottom, and they go through a software stack to produce the steering and acceleration. I made the observation that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. Over time, as we made the Autopilot better, the neural network grew in capability and size, and in addition all the C++ code was being deleted — a lot of the capabilities and functionality originally written in 1.0 migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras and across time was done by a neural network, and we were able to delete a lot of code. The Software 2.0 stack quite literally ate through the software stack of the Autopilot. I thought this was really remarkable at the time, and I think we're seeing the same thing again, where we have a new kind of software and it's eating through the stack.
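To make the sentiment classification example concrete, here is a minimal sketch of the same task in each paradigm. This is hypothetical illustration code, not from the talk; `model`, `featurize`, and `llm` are placeholders for a trained classifier and a text-completion function.

```python
# Software 1.0: explicit rules, written by hand.
POSITIVE = {"great", "love", "loved", "excellent"}
NEGATIVE = {"terrible", "hate", "awful", "waste"}

def sentiment_1_0(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative"

# Software 2.0: the "code" is learned weights, fit by an optimizer on labeled data.
# (assumes some already-trained classifier `model` and feature extractor `featurize`)
def sentiment_2_0(text: str, model, featurize) -> str:
    return model.predict([featurize(text)])[0]

# Software 3.0: the program is a few-shot prompt, written in English.
PROMPT = """Classify the sentiment of the review as positive or negative.

Review: "I loved this movie." -> positive
Review: "Total waste of time." -> negative
Review: "{review}" ->"""

def sentiment_3_0(review: str, llm) -> str:
    # `llm` is any text-completion function (a placeholder, not a specific API).
    return llm(PROMPT.format(review=review)).strip()
```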
[05:42] We now have three completely different programming paradigms, and if you're entering the industry, I think it's a very good idea to be fluent in all of them, because they each have their pros and cons. You may want to program some functionality in 1.0, 2.0, or 3.0: are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of explicit code? We all have to make these decisions, and potentially fluidly transition between these paradigms.

[06:06] So what I want to get into now: in the first part, I want to talk about LLMs and how to think about this new paradigm and its ecosystem — what is this new computer, what does it look like, and what does the ecosystem look like? I was struck by a quote from Andrew Ng, actually many years ago now — and I think Andrew is going to be speaking right after me — who said at the time that AI is the new electricity. I do think it captures something very interesting, in that LLMs certainly feel like they have properties of utilities right now.

[06:41] The LLM labs — OpenAI, Gemini, Anthropic, and so on — spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: low latency, high uptime, consistent quality, and so on. In electricity, you would have a transfer switch, so you can transfer your electricity source between grid, solar, battery, or generator; with LLMs we have something like OpenRouter, which lets you easily switch between the different LLMs that exist. Because LLMs are software, they don't compete for physical space, so it's okay to have, basically, six electricity providers and switch between them — they don't compete in such a direct way.

[07:31] What's also a little fascinating — and we saw this in the last few days — is that when a lot of the LLMs went down, people were kind of stuck and unable to work. It's fascinating to me that when the state-of-the-art LLMs go down, it's actually like an intelligence brownout in the world; it's like when the voltage is unreliable in the grid. The planet just gets dumber the more reliance we have on these models, which is already really dramatic and I think will continue to grow.
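As a minimal sketch of that transfer-switch idea, here is what switching between intelligence providers behind a single interface might look like. Everything here is a placeholder: the provider names and functions are made up, and this only illustrates the switching pattern, not how OpenRouter actually works.

```python
# Hypothetical sketch: a tiny "transfer switch" that routes the same prompt to
# whichever provider is currently up. The providers are placeholder functions,
# not real APIs.
from typing import Callable

Provider = Callable[[str], str]

def make_router(providers: dict[str, Provider], order: list[str]) -> Provider:
    def complete(prompt: str) -> str:
        for name in order:                      # try providers in priority order
            try:
                return providers[name](prompt)  # first one that answers wins
            except Exception:
                continue                        # "brownout": fall through to the next
        raise RuntimeError("all providers are down")
    return complete

# Usage, with stand-in providers:
providers = {
    "provider_a": lambda p: "(answer from provider A)",
    "provider_b": lambda p: "(answer from provider B)",
}
llm = make_router(providers, order=["provider_a", "provider_b"])
print(llm("Explain a transfer switch in one sentence."))
```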
[08:00] But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs. The reason is that the capex required to build an LLM is actually quite large — it's not just building a power station or something like that. You're investing a huge amount of money, and I think the tech tree for the technology is growing quite rapidly. So we're in a world where we have deep tech trees, with research and development secrets centralizing inside the LLM labs. But the analogy muddies a little bit, because, as I mentioned, this is software, and software is a bit less defensible because it is so malleable. Still, it's an interesting thing to think about. There are many analogies you can make: a 4-nanometer process node is maybe something like a cluster with a certain max FLOPS. When you're using Nvidia GPUs and you're only doing the software, not the hardware, that's kind of like the fabless model; but if you're also building your own hardware and training on TPUs, as Google does, that's kind of like the Intel model, where you own your fab. So there are some analogies here that make sense.

[09:08] But actually, the analogy that makes the most sense, in my mind, is that LLMs have very strong analogies to operating systems. This is not just electricity or water; it's not something that comes out of the tap as a commodity. These are increasingly complex software ecosystems, not simple commodities. And it's interesting to me that the ecosystem is shaping up in a very similar way, where you have a few closed-source providers, like Windows or macOS, and then an open-source alternative, like Linux. For LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. It's still very early, because these are just simple LLMs, but they're going to get a lot more complicated — it's not just about the LLM itself, it's about all the tool use and the multimodalities and how all of that works.

[10:07] When I had this realization a while back, I tried to sketch it out, and it seemed to me that LLMs are kind of like a new operating system. The LLM is a new kind of computer — it's kind of like the CPU equivalent, the context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving using all of these capabilities. So from that perspective, it looks very much like an operating system.
[10:36] A few more analogies: for example, if you want to download an app — say I go to VS Code and download it — you can run it on Windows, Linux, or Mac, in the same way that you can take an LLM app like Cursor and run it on GPT or Claude or the Gemini series. It's just a dropdown, so it's similar in that way as well.

[11:00] Another analogy that strikes me is that we're in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer. That forces the LLMs to be centralized in the cloud, and we're all just thin clients that interact with them over the network. None of us has full utilization of these computers, so it makes sense to use time-sharing, where we're all just a dimension of the batch when they run the computer in the cloud. This is very much what computers used to look like during that era: the operating systems were in the cloud, everything was streamed around, and there was batching. The personal computing revolution hasn't happened yet, because it's just not economical — it doesn't make sense. But some people are trying, and it turns out that Mac minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference it's all super memory-bound, so this actually works. These are maybe some early indications of personal computing, but it hasn't really happened yet, and it's not clear what it looks like. Maybe some of you will get to invent what this is or how it works.

[12:10] Maybe one more analogy I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal — it's text, direct access to the operating system. I think a GUI hasn't really been invented yet in a general way. Should ChatGPT have a GUI, different from just text bubbles? Certainly some of the apps we'll get into in a bit have GUIs, but there's no GUI across all the tasks, if that makes sense.

[12:43] There are some ways in which LLMs are different from operating systems, and from early computing, in fairly unique ways. I wrote about one particular property that strikes me as very different this time around: LLMs flip the usual direction of technology diffusion. With electricity, cryptography, computing, flight, the internet, GPS — lots of new transformative technologies — typically it's the government and corporations that are the first users, because the technology is new and expensive, and it only later diffuses to consumers. But I feel like LLMs are flipped around.
[13:22] So maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how you boil an egg or something like that — that's certainly a lot of my use. It's really fascinating to me that we have a new magical computer and it's helping me boil an egg; it's not helping the government do ballistics or some special military technology. Indeed, corporations and governments are lagging behind all of us in the adoption of these technologies. It's just backwards, and I think that informs some of the uses of this technology and where some of the first apps will be.

[13:56] So, in summary so far: I think it's accurate to say that LLMs are complicated operating systems. They're circa-1960s in computing terms, and we're redoing computing all over again. They're currently available via time-sharing and distributed like a utility. What is new and unprecedented is that they're not in the hands of a few governments and corporations — they're in the hands of all of us, because we all have a computer and it's all just software. ChatGPT was beamed down to our computers, to billions of people, instantly and overnight, and this is insane. And now it is our time to enter the industry and program these computers. This is quite remarkable.

[14:39] Before we program LLMs, we have to spend some time thinking about what these things are. I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits: they are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer. A transformer is a neural net that operates on the level of tokens — chunk, chunk, chunk — with an almost equal amount of compute for every single chunk. The simulator is basically some weights, and we fit it to all of the text we have on the internet and so on, and you end up with this kind of simulator. Because it is trained on humans, it has this emergent psychology that is humanlike.

[15:28] The first thing you'll notice is that LLMs have encyclopedic knowledge and memory. They can remember lots of things — a lot more than any single individual human can, because they've read so many things. It actually reminds me of the movie Rain Man, which I really recommend people watch; it's an amazing movie, I love it. Dustin Hoffman plays an autistic savant who has almost perfect memory — he can read a phone book and remember all of the names and phone numbers. And I feel like LLMs are very similar.
[15:58] They can remember SHA hashes and lots of different kinds of things very easily, so they certainly have superpowers in some respects. But they also have a bunch of, I would say, cognitive deficits. They hallucinate quite a bit; they make up stuff and don't have a very good internal model of self-knowledge — not sufficient, at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're going to be superhuman in some problem-solving domains, and then they're going to make mistakes that basically no human would make — like insisting that 9.11 is greater than 9.9, or that there are two Rs in "strawberry". These are some famous examples, but basically there are rough edges you can trip on, and that's also kind of unique.

[16:43] They also suffer from anterograde amnesia. What I'm alluding to is that if a co-worker joins your organization, this co-worker will, over time, learn your organization; they gain a huge amount of context, they go home and sleep, they consolidate knowledge, and they develop expertise over time. LLMs don't natively do this, and this is not something that has really been solved in the R&D of LLMs. Context windows are really kind of like working memory, and you have to program that working memory quite directly, because they don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend two movies: Memento and 50 First Dates. In both of these movies, the protagonists' weights are fixed and their context windows get wiped every single morning, and it's really problematic to go to work or have relationships when this happens — and it happens to LLMs all the time.

[17:39] One more thing I would point to is the security-related limitations of using LLMs. For example, LLMs are quite gullible: they are susceptible to prompt injection risks, they might leak your data, and so on — and there are many other security-related considerations. So, long story short, you have to simultaneously think about this superhuman thing that also has a bunch of cognitive deficits and issues. And yet they are extremely useful. So how do we program them, work around their deficits, and enjoy their superhuman powers?

[18:15] What I want to switch to now is the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things I thought were interesting for this talk. The first thing I'm excited about is what I would call partial autonomy apps.
[18:32] So, for example, let's work with the example of coding. You can certainly go to ChatGPT directly and start copy-pasting code around, copy-pasting bug reports around, and getting code back and pasting everything around. But why would you do that — why would you go directly to the operating system? It makes a lot more sense to have an app dedicated to this. I think many of you use Cursor; I do as well, and Cursor is the kind of thing you want instead of going directly to ChatGPT. I think Cursor is a very good example of an early LLM app with a bunch of properties that are useful across all LLM apps.

[19:08] In particular, you'll notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before; but in addition to that, we now have LLM integration that allows us to work in bigger chunks. Some of the properties of LLM apps that I think are shared and worth pointing out: number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs — in the case of Cursor, there are, under the hood, embedding models for all your files, the actual chat models, and models that apply diffs to the code, and this is all orchestrated for you. A really big one that is maybe not always fully appreciated is the application-specific GUI and its importance. You don't just want to talk to the operating system directly in text: text is very hard to read, interpret, and understand, and you don't want to take some of these actions natively in text. It's much better to just see a diff as red and green changes and see what's being added and removed; it's much easier to hit Cmd+Y to accept or Cmd+N to reject than to type it out in text. A GUI allows a human to audit the work of these fallible systems and to go faster — I'll come back to this point a bit later.

[20:23] The last feature I want to point out is what I call the autonomy slider. In Cursor, you can just do tab completion, where you're mostly in charge; you can select a chunk of code and use Cmd+K to change just that chunk; you can use Cmd+L to change the entire file; or you can use Cmd+I, which lets it rip on the entire repo — that's the fully autonomous, agentic version. You are in charge of the autonomy slider, and depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task.
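As a rough sketch of what that kind of LLM-app plumbing might look like, here is a hypothetical outline of context management, multi-model orchestration, and an autonomy slider. This is not Cursor's actual implementation; `embed`, `chat`, and `apply_diff` are placeholders for real model calls.

```python
# Hypothetical sketch of an LLM-app backend: context management, multiple model
# calls, and an autonomy slider. embed(), chat(), and apply_diff() are placeholders.
from enum import Enum

class Autonomy(Enum):
    TAB_COMPLETE = 1    # suggest a few tokens; the human stays fully in charge
    EDIT_SELECTION = 2  # rewrite just the highlighted chunk
    EDIT_FILE = 3       # rewrite one file
    AGENT = 4           # let it rip across the whole repo

FILES_PER_LEVEL = {Autonomy.TAB_COMPLETE: 1, Autonomy.EDIT_SELECTION: 1,
                   Autonomy.EDIT_FILE: 3, Autonomy.AGENT: 10}

def dot(a, b):
    # placeholder similarity between two embedding vectors
    return sum(x * y for x, y in zip(a, b))

def assist(request: str, repo: dict, level: Autonomy, embed, chat, apply_diff):
    # 1. Context management: pick only the most relevant files for this request.
    query = embed(request)
    ranked = sorted(repo, key=lambda path: -dot(query, embed(repo[path])))
    context = {path: repo[path] for path in ranked[:FILES_PER_LEVEL[level]]}
    # 2. Orchestration: one model proposes an edit, another applies it as a diff.
    proposal = chat(f"Request: {request}\n\nRelevant files:\n{context}")
    # 3. The GUI would then show this as a red/green diff for the human to audit.
    return apply_diff(repo, proposal)
```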
[20:57] Maybe to show one more example of a fairly successful LLM app: Perplexity. It has very similar features to what I just pointed out in Cursor. It packages up a lot of the information, it orchestrates multiple LLMs, and it's got a GUI that allows you to audit some of its work — for example, it will cite sources and you can imagine inspecting them. And it's got an autonomy slider: you can do a quick search, or you can do research, or you can do deep research and come back ten minutes later. These are all just varying levels of autonomy that you give up to the tool.

[21:27] So my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For those of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways a human could act? And can humans supervise and stay in the loop of this activity? Because, again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop, or something like that? A lot of traditional software right now has all these switches and controls designed for humans — all of this has to change and become accessible to LLMs.

[22:07] One thing I want to stress about a lot of these LLM apps, which I'm not sure gets as much attention as it should, is that we're now cooperating with AIs. Usually they are doing the generation and we, as humans, are doing the verification. It is in our interest to make this loop go as fast as possible, so we get a lot of work done. There are two major ways this can be done. Number one, you can speed up verification a lot. GUIs, for example, are extremely important here, because a GUI utilizes the computer-vision "GPU" in our heads. Reading text is effortful and not fun, but looking at stuff is fun — it's a kind of highway to your brain. So GUIs are very useful for auditing systems, and visual representations in general. Number two, we have to keep the AI on the leash. A lot of people are getting way overexcited with AI agents, and it's not useful to me to get a diff of 10,000 lines of code to my repo. I'm still the bottleneck, right? Even though those 10,000 lines come out instantly, I have to make sure this thing is not introducing bugs, that it's doing the correct thing, and that there are no security issues, and so on. So basically, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on the leash, because it gets way too overreactive. This is how I feel when I do AI-assisted coding.
[23:39] If I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. This slide is not very good, I'm sorry, but I guess I'm trying, like many of you, to develop ways of utilizing these agents in my coding workflow and to do AI-assisted coding. In my own work, I'm always scared of getting way too big diffs. I always go in small, incremental chunks; I want to make sure that everything is good; I want to spin this loop very, very fast; and I work on small chunks of a single concrete thing. I think many of you are probably developing similar ways of working with LLMs.

[24:17] I also saw a number of blog posts that try to develop best practices for working with LLMs. Here's one I read recently that I thought was quite good, and it discusses some techniques, some of which have to do with how you keep the AI on the leash. As an example: if your prompt is vague, the AI might not do exactly what you wanted, and in that case verification will fail; you'll ask for something else, and if verification fails again, you start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, so you can move forward. I think a lot of us are going to end up finding techniques like this.

[24:56] In my own work, I'm currently interested in what education looks like now that we have AI and LLMs, and a large amount of the thought for me goes into how we keep the AI on the leash. I don't think it works to just go to ChatGPT and say, "Hey, teach me physics." I don't think this works, because the AI gets lost in the woods. For me, this is actually two separate apps: for example, there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. In both cases, we now have this intermediate artifact of a course that is auditable; we can make sure it's good and consistent, and the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. This is one way of keeping the AI on the leash, and I think it has a much higher likelihood of working — the AI is not getting lost in the woods.
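Here is a minimal sketch of that small-chunk, generate-verify loop. It is hypothetical: `llm_propose_patch`, `run_tests`, and `human_review` stand in for a real coding agent, a test runner, and the human doing the audit.

```python
# Hypothetical sketch: keep the AI on a leash by working in small, verifiable
# increments. llm_propose_patch, run_tests, and human_review are placeholders.

def ai_assisted_change(task: str, llm_propose_patch, run_tests, human_review,
                       max_attempts: int = 3) -> bool:
    """Ask for one small, concrete change; verify it; only then move on."""
    prompt = (
        f"Task: {task}\n"
        "Make the smallest change that accomplishes this. "
        "Touch at most one file and keep the diff under ~50 lines."
    )
    for _ in range(max_attempts):
        patch = llm_propose_patch(prompt)          # generation (the AI's job)
        if not run_tests(patch):                   # cheap automatic check first
            prompt += "\nThe previous attempt failed the tests; try again."
            continue
        if human_review(patch):                    # the human verifies a small diff
            return True                            # accepted: commit and move on
        prompt += "\nThe previous attempt was rejected in review; be more conservative."
    return False                                   # give up and take over manually
```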
[25:49] One more analogy I wanted to allude to: I'm no stranger to partial autonomy — I worked on this for about five years at Tesla. The Autopilot is also a partial autonomy product, and it shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees and so on. And we had the autonomy slider, where, over the course of my tenure there, we did more and more autonomous tasks for the user.

[26:18] Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to give me a drive around Palo Alto. I took this picture using Google Glass at the time — many of you are so young that you might not even know what that is, but it was all the rage back then. We got into this car and went for about a 30-minute drive around Palo Alto, highways, streets, and so on, and the drive was perfect — there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked; this is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents, and even now we haven't actually fully solved the problem. You may see Waymos going around, and they look driverless, but there's still a lot of teleoperation and a lot of humans in the loop of a lot of this driving. So we still haven't even declared success — I think it's definitely going to succeed at this point, but it just took a long time. And so I think software is really tricky, in the same way that driving is tricky. When I see things like "2025 is the year of agents," I get very concerned, and I kind of feel like, you know, this is the decade of agents, and it's going to take quite some time. We need humans in the loop. We need to do this carefully. This is software — let's be serious here.

[27:51] One more analogy I always think about is the Iron Man suit. I've always loved Iron Man; I think it's correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation — Tony Stark can drive it — and it's also an agent: in some of the movies, the suit is quite autonomous and can fly around and find Tony and all this kind of stuff. This is the autonomy slider: we can build augmentations or we can build agents, and we kind of want to do a bit of both. But at this stage, working with fallible LLMs and so on,
[28:29] I would say it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents and more about building partial autonomy products. These products have custom GUIs and UI/UX, and this is done so that the generation-verification loop of the human is very, very fast. But we're not losing sight of the fact that it is, in principle, possible to automate this work: there should be an autonomy slider in your product, and you should be thinking about how you can slide that autonomy slider and make your product more autonomous over time. That's how I think about it, and there are lots of opportunities in these kinds of products.

[29:06] I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for autonomy in software but, as I mentioned, it's programmed in English, which is this natural interface. Suddenly everyone is a programmer, because everyone speaks a natural language like English. This is extremely bullish and very interesting to me, and also completely unprecedented. It used to be the case that you needed to spend five to ten years studying something to be able to do something in software; that is not the case anymore.

[29:35] So, I don't know if anyone here has heard of vibe coding? This is the tweet that introduced it, but I'm told this is now a major meme. A fun story about this: I've been on Twitter for something like 15 years at this point, and I still have no clue which tweet will become viral and which tweet fizzles and no one cares. I thought this tweet was going to be the latter — it was just a shower thought — but it became a total meme, and I really just can't tell. I guess it struck a chord and gave a name to something that everyone was feeling but couldn't quite say in words. So now there's a Wikipedia page and everything.

[30:18] [Applause]

This is like a major contribution now, or something like that. Tom Wolf from Hugging Face shared this beautiful video that I really love: these are kids vibe coding. I find it such a wholesome video — how can you look at this video and feel bad about the future? The future is great. I think this will end up being a gateway drug to software development. I'm not a doomer about the future of this generation. I love this video.

[31:04] So I tried vibe coding a little bit as well, because it's so fun. Vibe coding is so great when you want to build something super duper custom that doesn't appear to exist, and you just want to wing it because it's a Saturday or something like that.
[31:15] So I built this iOS app. I can't actually program in Swift, but I was really shocked that I was able to build a super basic app — I'm not going to explain it, it's really dumb — but it was just a day of work, and it was running on my phone later that day, and I was like, wow, this is amazing. I didn't have to read through Swift for five days or something to get started.

[31:38] I also vibe coded this app called MenuGen, and it's live — you can try it at menugen.app. I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the things are, and I need pictures. This didn't exist, so I thought, hey, I'm going to vibe code it. This is what it looks like: you go to menugen.app, you take a picture of a menu, and then MenuGen generates the images. Everyone gets $5 in credits for free when they sign up, and therefore this is a major cost center in my life. This is a negative-revenue app for me right now; I've lost a huge amount of money on MenuGen.

[32:21] But the fascinating thing about MenuGen, for me, is that the code — the vibe coding part — was actually the easy part. Most of the work came when I tried to make it real, so that you can actually have authentication and payments and the domain name and a Vercel deployment. That was really hard, and all of it was not code: all of this DevOps stuff was me in the browser, clicking stuff, and it was extremely slow and took another week. It was really fascinating that I had the MenuGen demo basically working on my laptop in a few hours, and then it took me a week to make it real, because it was just really annoying. For example, if you try to add Google login to your web page, there's a huge amount of instructions from this Clerk library telling me how to integrate it. And it's crazy — it's telling me: go to this URL, click on this dropdown, choose this, go to this, click on that. A computer is telling me the actions I should be taking. Like — you do it! Why am I doing this? What the hell? I had to follow all these instructions; it was crazy. So the last part of my talk, therefore, focuses on: can we just build for agents? I don't want to do this work. Can agents do this? Thank you.

[33:46] Okay. So, roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs, or computers through APIs.
[33:57] And now we have a completely new thing: agents. They're computers, but they're humanlike — people spirits on the internet — and they need to interact with our software infrastructure. Can we build for them? It's a new thing. As an example, you can have robots.txt on your domain to instruct — or advise, I suppose — web crawlers on how to behave on your website. In the same way, you can have maybe an llms.txt file, which is just simple markdown telling LLMs what this domain is about, and that is very readable to an LLM. If it instead had to get the HTML of your web page and try to parse it, that's very error-prone and difficult; it will screw it up, and it's not going to work. So we can just directly speak to the LLM — it's worth it.

[34:41] A huge amount of documentation is currently written for people, so you'll see things like lists and bold and pictures, and this is not directly accessible by an LLM. I see some services now transitioning a lot of their docs to be specifically for LLMs — Vercel and Stripe, as examples, are early movers here, but there are a few more I've seen already — and they offer their documentation in markdown. Markdown is super easy for LLMs to understand. This is great.

[35:10] Maybe one simple example from my own experience: maybe some of you know 3Blue1Brown — he makes beautiful animation videos on YouTube. [Applause] Yeah, I love the library that he wrote, Manim, and I wanted to make my own animation. There's extensive documentation on how to use Manim, and I didn't want to actually read through it, so I copy-pasted the whole thing into an LLM, described what I wanted, and it just worked out of the box — the LLM vibe coded me an animation, exactly what I wanted, and I was like, wow, this is amazing. If we can make docs legible to LLMs, it's going to unlock a huge amount of use, and I think this is wonderful and should happen more.

[35:55] The other thing I wanted to point out is that, unfortunately, it's not just about taking your docs and making them appear in markdown — that's the easy part. We actually have to change the docs, because anytime your docs say "click," that's bad: an LLM will not be able to natively take this action right now. Vercel, for example, is replacing every occurrence of "click" with an equivalent curl command that your LLM agent could run on your behalf. I think this is very interesting. And then, of course, there's the Model Context Protocol from Anthropic, which is another way — a protocol for speaking directly to agents as this new consumer and manipulator of digital information. So I'm very bullish on these ideas.
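As a minimal sketch of the llms.txt idea, here is how an agent might pull such a file into its context instead of scraping HTML. The domain is a placeholder, and the file contents shown in the comment are only a rough guess at the emerging format.

```python
# Hypothetical sketch: an agent reading a site's llms.txt instead of scraping HTML.
# The domain is a placeholder; the exact filename and format vary by site.
import urllib.request

def fetch_llms_txt(domain: str) -> str:
    # The file is plain markdown written for LLMs, roughly along these lines:
    #   # Example Docs
    #   > API documentation for example.com, summarized for language models.
    #   - /quickstart.md: minimal integration example
    #   - /api.md: endpoint reference (curl commands, not "click here" steps)
    with urllib.request.urlopen(f"https://{domain}/llms.txt") as resp:
        return resp.read().decode("utf-8")

# The fetched markdown can then be pasted straight into a prompt as context:
# context = fetch_llms_txt("docs.example.com")
# prompt = context + "\n\nUsing the docs above, write code that creates a widget."
```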
[36:31] The other thing I really like is the number of little tools here and there that help ingest data in very LLM-friendly formats. For example, when I go to a GitHub repo, like my nanoGPT repo, I can't feed it to an LLM and ask questions about it, because it's a human interface on GitHub. But when you just change the URL from github to gitingest, it will concatenate all the files into a single giant text, create a directory structure, and so on, and that is ready to be copy-pasted into your favorite LLM so you can do stuff with it. Maybe an even more dramatic example of this is DeepWiki, where it's not just the raw content of the files — this is from Devin — they have Devin do an analysis of the GitHub repo and basically build up whole docs pages just for your repo, and you can imagine that this is even more helpful to copy-paste into your LLM. I love all the little tools where you just change the URL and something becomes accessible to an LLM. This is all well and good, and I think there should be a lot more of it.

[37:32] One more note I wanted to make is that it is absolutely possible that in the future — and this is not even the future, this is today — LLMs will be able to go around and click stuff and so on. But I still think it's very worth basically meeting LLMs halfway and making it easier for them to access all this information, because clicking around is still fairly expensive, I would say, and a lot more difficult. And I do think that for lots of software there will be a long tail that won't adapt, because these are not live-player repositories or digital infrastructure, and we will need those tools there. But for everyone else, I think it's very worth meeting at some middle point. So I'm bullish on both, if that makes sense.

[38:14] So, in summary: what an amazing time to get into the industry. We need to rewrite a ton of code, and a ton of code will be written by professionals and by coders. These LLMs are kind of like utilities, kind of like fabs, but especially like operating systems — and it's so early; it's like the 1960s of operating systems, and I think a lot of the analogies cross over. These LLMs are kind of like fallible people spirits that we have to learn to work with, and in order to do that properly, we need to adjust our infrastructure towards them. When you're building these LLM apps, I described some of the ways of working effectively with LLMs, some of the tools that make that possible, how you can spin this loop very, very quickly, and basically create partial autonomy products. And then a lot of code also has to be written for the agents more directly. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade, roughly, is that we're going to take that slider from left to right.
[39:17] It's going to be very interesting to see what that looks like, and I can't wait to build it with all of you. Thank you.