[00:01] Please welcome former director of AI at Tesla, Andrej Karpathy.

[Music]

[00:19] Hello. Wow, a lot of people here. Okay. So I'm excited to be here today to talk to you about software in the era of AI. I'm told that many of you are students — bachelors, masters, PhD and so on — and you're about to enter the industry. I think it's actually an extremely unique and very interesting time to enter the industry right now. Fundamentally, the reason for that is that software is changing, again. And I say again because I actually gave this talk already, but the problem is that software keeps changing, so I have a lot of material for new talks. And I think it's changing quite fundamentally. Roughly speaking, software had not changed much on such a fundamental level for 70 years, and then it changed, I think, about twice quite rapidly in the last few years. So there's just a huge amount of work to do, a huge amount of software to write and rewrite.

[01:12] So let's take a look at the realm of software. If we think of this as the map of software — this is a really cool tool called Map of GitHub — this is kind of like all the software that's written. These are instructions to the computer for carrying out tasks in the digital space. If you zoom in, these are all different kinds of repositories, and this is all the code that has been written. A few years ago I observed that software was changing and there was a new type of software around, and I called it Software 2.0 at the time. The idea was that Software 1.0 is the code you write for the computer. Software 2.0 is basically neural networks — in particular, the weights of a neural network. You're not writing this code directly; you are more tuning the data sets and then running an optimizer to create the parameters of the neural net. At the time, neural nets were kind of seen as just a different kind of classifier, like a decision tree or something like that, so I think this framing was a lot more appropriate.

[02:10] And now we actually have an equivalent of GitHub in the realm of Software 2.0: I think Hugging Face is basically the GitHub of Software 2.0, and there's also Model Atlas, where you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle — those are the parameters of Flux, the image generator. Anytime someone tunes on top of a Flux model, you basically create a git commit in this space, and you create a different kind of image generator.

[02:41] So basically: Software 1.0 is the computer code that programs a computer.
[02:45] Software 2.0 are the weights, which program neural networks — here's an example of the AlexNet image recognizer neural network. Now, so far, all of the neural networks we were familiar with until recently were kind of like fixed-function computers: image to categories, or something like that. What's changed, and I think is a quite fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique — it's a new kind of computer — and so in my mind it's worth giving it a new designation of Software 3.0. Basically, your prompts are now programs that program the LLM, and, remarkably, these prompts are written in English. It's a very interesting programming language.

[03:33] Maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some amount of Python to do it, or you can train a neural net, or you can prompt a large language model. Here is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way. So we have Software 1.0 and Software 2.0, and — maybe you've seen that a lot of GitHub code is not just code anymore; there's a bunch of English interspersed with code — I think there's a growing category of a new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. When this blew my mind, a few years ago now, I tweeted it, and I think it captured the attention of a lot of people. My currently pinned tweet is that, remarkably, we're now programming computers in English.

[04:28] Now, when I was at Tesla, we were working on Autopilot, trying to get the car to drive. I showed this slide at the time: you can imagine that the inputs to the car are on the bottom, and they go through a software stack to produce the steering and acceleration. I made the observation that there was a ton of C++ code in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. Over time, as we made the Autopilot better, the neural network grew in capability and size, and in addition all the C++ code was being deleted — a lot of the capabilities and functionality originally written in 1.0 migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras and across time was done by a neural network, and we were able to delete a lot of code. The Software 2.0 stack quite literally ate through the software stack of the Autopilot. I thought this was really remarkable at the time, and I think we're seeing the same thing again, where we have a new kind of software and it's eating through the stack.
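To make the sentiment classification example concrete, here is a minimal sketch of the same task in each paradigm. This is hypothetical illustration code, not from the talk; `model`, `featurize`, and `llm` are placeholders for a trained classifier and a text-completion function.

```python
# Software 1.0: explicit rules, written by hand.
POSITIVE = {"great", "love", "loved", "excellent"}
NEGATIVE = {"terrible", "hate", "awful", "waste"}

def sentiment_1_0(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative"

# Software 2.0: the "code" is learned weights, fit by an optimizer on labeled data.
# (assumes some already-trained classifier `model` and feature extractor `featurize`)
def sentiment_2_0(text: str, model, featurize) -> str:
    return model.predict([featurize(text)])[0]

# Software 3.0: the program is a few-shot prompt, written in English.
PROMPT = """Classify the sentiment of the review as positive or negative.

Review: "I loved this movie." -> positive
Review: "Total waste of time." -> negative
Review: "{review}" ->"""

def sentiment_3_0(review: str, llm) -> str:
    # `llm` is any text-completion function (a placeholder, not a specific API).
    return llm(PROMPT.format(review=review)).strip()
```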
[05:42] We now have three completely different programming paradigms, and if you're entering the industry, I think it's a very good idea to be fluent in all of them, because they each have their pros and cons. You may want to program some functionality in 1.0, 2.0, or 3.0: are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of explicit code? We all have to make these decisions, and potentially fluidly transition between these paradigms.

[06:06] So what I want to get into now: in the first part, I want to talk about LLMs and how to think about this new paradigm and its ecosystem — what is this new computer, what does it look like, and what does the ecosystem look like? I was struck by a quote from Andrew Ng, actually many years ago now — and I think Andrew is going to be speaking right after me — who said at the time that AI is the new electricity. I do think it captures something very interesting, in that LLMs certainly feel like they have properties of utilities right now.

[06:41] The LLM labs — OpenAI, Gemini, Anthropic, and so on — spend capex to train the LLMs, which is kind of equivalent to building out a grid, and then there's opex to serve that intelligence over APIs to all of us. This is done through metered access, where we pay per million tokens or something like that, and we have a lot of very utility-like demands of this API: low latency, high uptime, consistent quality, and so on. In electricity, you would have a transfer switch, so you can transfer your electricity source between grid, solar, battery, or generator; with LLMs we have something like OpenRouter, which lets you easily switch between the different LLMs that exist. Because LLMs are software, they don't compete for physical space, so it's okay to have, basically, six electricity providers and switch between them — they don't compete in such a direct way.

[07:31] What's also a little fascinating — and we saw this in the last few days — is that when a lot of the LLMs went down, people were kind of stuck and unable to work. It's fascinating to me that when the state-of-the-art LLMs go down, it's actually like an intelligence brownout in the world; it's like when the voltage is unreliable in the grid. The planet just gets dumber the more reliance we have on these models, which is already really dramatic and I think will continue to grow.
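As a minimal sketch of that transfer-switch idea, here is what switching between intelligence providers behind a single interface might look like. Everything here is a placeholder: the provider names and functions are made up, and this only illustrates the switching pattern, not how OpenRouter actually works.

```python
# Hypothetical sketch: a tiny "transfer switch" that routes the same prompt to
# whichever provider is currently up. The providers are placeholder functions,
# not real APIs.
from typing import Callable

Provider = Callable[[str], str]

def make_router(providers: dict[str, Provider], order: list[str]) -> Provider:
    def complete(prompt: str) -> str:
        for name in order:                      # try providers in priority order
            try:
                return providers[name](prompt)  # first one that answers wins
            except Exception:
                continue                        # "brownout": fall through to the next
        raise RuntimeError("all providers are down")
    return complete

# Usage, with stand-in providers:
providers = {
    "provider_a": lambda p: "(answer from provider A)",
    "provider_b": lambda p: "(answer from provider B)",
}
llm = make_router(providers, order=["provider_a", "provider_b"])
print(llm("Explain a transfer switch in one sentence."))
```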
[08:00] But LLMs don't only have properties of utilities; I think it's also fair to say that they have some properties of fabs. The reason is that the capex required to build an LLM is actually quite large — it's not just building a power station or something like that. You're investing a huge amount of money, and I think the tech tree for the technology is growing quite rapidly. So we're in a world where we have deep tech trees, with research and development secrets centralizing inside the LLM labs. But the analogy muddies a little bit, because, as I mentioned, this is software, and software is a bit less defensible because it is so malleable. Still, it's an interesting thing to think about. There are many analogies you can make: a 4-nanometer process node is maybe something like a cluster with a certain max FLOPS. When you're using Nvidia GPUs and you're only doing the software, not the hardware, that's kind of like the fabless model; but if you're also building your own hardware and training on TPUs, as Google does, that's kind of like the Intel model, where you own your fab. So there are some analogies here that make sense.

[09:08] But actually, the analogy that makes the most sense, in my mind, is that LLMs have very strong analogies to operating systems. This is not just electricity or water; it's not something that comes out of the tap as a commodity. These are increasingly complex software ecosystems, not simple commodities. And it's interesting to me that the ecosystem is shaping up in a very similar way, where you have a few closed-source providers, like Windows or macOS, and then an open-source alternative, like Linux. For LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. It's still very early, because these are just simple LLMs, but they're going to get a lot more complicated — it's not just about the LLM itself, it's about all the tool use and the multimodalities and how all of that works.

[10:07] When I had this realization a while back, I tried to sketch it out, and it seemed to me that LLMs are kind of like a new operating system. The LLM is a new kind of computer — it's kind of like the CPU equivalent, the context windows are kind of like the memory, and the LLM is orchestrating memory and compute for problem solving using all of these capabilities. So from that perspective, it looks very much like an operating system.
[10:36] A few more analogies: for example, if you want to download an app — say I go to VS Code and download it — you can run it on Windows, Linux, or Mac, in the same way that you can take an LLM app like Cursor and run it on GPT or Claude or the Gemini series. It's just a dropdown, so it's similar in that way as well.

[11:00] Another analogy that strikes me is that we're in this 1960s-ish era, where LLM compute is still very expensive for this new kind of computer. That forces the LLMs to be centralized in the cloud, and we're all just thin clients that interact with them over the network. None of us has full utilization of these computers, so it makes sense to use time-sharing, where we're all just a dimension of the batch when they run the computer in the cloud. This is very much what computers used to look like during that era: the operating systems were in the cloud, everything was streamed around, and there was batching. The personal computing revolution hasn't happened yet, because it's just not economical — it doesn't make sense. But some people are trying, and it turns out that Mac minis, for example, are a very good fit for some of the LLMs, because if you're doing batch-one inference it's all super memory-bound, so this actually works. These are maybe some early indications of personal computing, but it hasn't really happened yet, and it's not clear what it looks like. Maybe some of you will get to invent what this is or how it works.

[12:10] Maybe one more analogy I'll mention: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal — it's text, direct access to the operating system. I think a GUI hasn't really been invented yet in a general way. Should ChatGPT have a GUI, different from just text bubbles? Certainly some of the apps we'll get into in a bit have GUIs, but there's no GUI across all the tasks, if that makes sense.

[12:43] There are some ways in which LLMs are different from operating systems, and from early computing, in fairly unique ways. I wrote about one particular property that strikes me as very different this time around: LLMs flip the usual direction of technology diffusion. With electricity, cryptography, computing, flight, the internet, GPS — lots of new transformative technologies — typically it's the government and corporations that are the first users, because the technology is new and expensive, and it only later diffuses to consumers. But I feel like LLMs are flipped around.
[13:22] So maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how you boil an egg or something like that — that's certainly a lot of my use. It's really fascinating to me that we have a new magical computer and it's helping me boil an egg; it's not helping the government do ballistics or some special military technology. Indeed, corporations and governments are lagging behind all of us in the adoption of these technologies. It's just backwards, and I think that informs some of the uses of this technology and where some of the first apps will be.

[13:56] So, in summary so far: I think it's accurate to say that LLMs are complicated operating systems. They're circa-1960s in computing terms, and we're redoing computing all over again. They're currently available via time-sharing and distributed like a utility. What is new and unprecedented is that they're not in the hands of a few governments and corporations — they're in the hands of all of us, because we all have a computer and it's all just software. ChatGPT was beamed down to our computers, to billions of people, instantly and overnight, and this is insane. And now it is our time to enter the industry and program these computers. This is quite remarkable.

[14:39] Before we program LLMs, we have to spend some time thinking about what these things are. I especially like to talk about their psychology. The way I like to think about LLMs is that they're kind of like people spirits: they are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer. A transformer is a neural net that operates on the level of tokens — chunk, chunk, chunk — with an almost equal amount of compute for every single chunk. The simulator is basically some weights, and we fit it to all of the text we have on the internet and so on, and you end up with this kind of simulator. Because it is trained on humans, it has this emergent psychology that is humanlike.

[15:28] The first thing you'll notice is that LLMs have encyclopedic knowledge and memory. They can remember lots of things — a lot more than any single individual human can, because they've read so many things. It actually reminds me of the movie Rain Man, which I really recommend people watch; it's an amazing movie, I love it. Dustin Hoffman plays an autistic savant who has almost perfect memory — he can read a phone book and remember all of the names and phone numbers. And I feel like LLMs are very similar.
[15:58] They can remember SHA hashes and lots of different kinds of things very easily, so they certainly have superpowers in some respects. But they also have a bunch of, I would say, cognitive deficits. They hallucinate quite a bit; they make up stuff and don't have a very good internal model of self-knowledge — not sufficient, at least. This has gotten better, but it's not perfect. They display jagged intelligence: they're going to be superhuman in some problem-solving domains, and then they're going to make mistakes that basically no human would make — like insisting that 9.11 is greater than 9.9, or that there are two Rs in "strawberry". These are some famous examples, but basically there are rough edges you can trip on, and that's also kind of unique.

[16:43] They also suffer from anterograde amnesia. What I'm alluding to is that if a co-worker joins your organization, this co-worker will, over time, learn your organization; they gain a huge amount of context, they go home and sleep, they consolidate knowledge, and they develop expertise over time. LLMs don't natively do this, and this is not something that has really been solved in the R&D of LLMs. Context windows are really kind of like working memory, and you have to program that working memory quite directly, because they don't just get smarter by default. I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend two movies: Memento and 50 First Dates. In both of these movies, the protagonists' weights are fixed and their context windows get wiped every single morning, and it's really problematic to go to work or have relationships when this happens — and it happens to LLMs all the time.

[17:39] One more thing I would point to is the security-related limitations of using LLMs. For example, LLMs are quite gullible: they are susceptible to prompt injection risks, they might leak your data, and so on — and there are many other security-related considerations. So, long story short, you have to simultaneously think about this superhuman thing that also has a bunch of cognitive deficits and issues. And yet they are extremely useful. So how do we program them, work around their deficits, and enjoy their superhuman powers?

[18:15] What I want to switch to now is the opportunities: how do we use these models, and what are some of the biggest opportunities? This is not a comprehensive list, just some of the things I thought were interesting for this talk. The first thing I'm excited about is what I would call partial autonomy apps.
[18:32] So, for example, let's work with the example of coding. You can certainly go to ChatGPT directly and start copy-pasting code around, copy-pasting bug reports around, and getting code back and pasting everything around. But why would you do that — why would you go directly to the operating system? It makes a lot more sense to have an app dedicated to this. I think many of you use Cursor; I do as well, and Cursor is the kind of thing you want instead of going directly to ChatGPT. I think Cursor is a very good example of an early LLM app with a bunch of properties that are useful across all LLM apps.

[19:08] In particular, you'll notice that we have a traditional interface that allows a human to go in and do all the work manually, just as before; but in addition to that, we now have LLM integration that allows us to work in bigger chunks. Some of the properties of LLM apps that I think are shared and worth pointing out: number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs — in the case of Cursor, there are, under the hood, embedding models for all your files, the actual chat models, and models that apply diffs to the code, and this is all orchestrated for you. A really big one that is maybe not always fully appreciated is the application-specific GUI and its importance. You don't just want to talk to the operating system directly in text: text is very hard to read, interpret, and understand, and you don't want to take some of these actions natively in text. It's much better to just see a diff as red and green changes and see what's being added and removed; it's much easier to hit Cmd+Y to accept or Cmd+N to reject than to type it out in text. A GUI allows a human to audit the work of these fallible systems and to go faster — I'll come back to this point a bit later.

[20:23] The last feature I want to point out is what I call the autonomy slider. In Cursor, you can just do tab completion, where you're mostly in charge; you can select a chunk of code and use Cmd+K to change just that chunk; you can use Cmd+L to change the entire file; or you can use Cmd+I, which lets it rip on the entire repo — that's the fully autonomous, agentic version. You are in charge of the autonomy slider, and depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task.
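As a rough sketch of what that kind of LLM-app plumbing might look like, here is a hypothetical outline of context management, multi-model orchestration, and an autonomy slider. This is not Cursor's actual implementation; `embed`, `chat`, and `apply_diff` are placeholders for real model calls.

```python
# Hypothetical sketch of an LLM-app backend: context management, multiple model
# calls, and an autonomy slider. embed(), chat(), and apply_diff() are placeholders.
from enum import Enum

class Autonomy(Enum):
    TAB_COMPLETE = 1    # suggest a few tokens; the human stays fully in charge
    EDIT_SELECTION = 2  # rewrite just the highlighted chunk
    EDIT_FILE = 3       # rewrite one file
    AGENT = 4           # let it rip across the whole repo

FILES_PER_LEVEL = {Autonomy.TAB_COMPLETE: 1, Autonomy.EDIT_SELECTION: 1,
                   Autonomy.EDIT_FILE: 3, Autonomy.AGENT: 10}

def dot(a, b):
    # placeholder similarity between two embedding vectors
    return sum(x * y for x, y in zip(a, b))

def assist(request: str, repo: dict, level: Autonomy, embed, chat, apply_diff):
    # 1. Context management: pick only the most relevant files for this request.
    query = embed(request)
    ranked = sorted(repo, key=lambda path: -dot(query, embed(repo[path])))
    context = {path: repo[path] for path in ranked[:FILES_PER_LEVEL[level]]}
    # 2. Orchestration: one model proposes an edit, another applies it as a diff.
    proposal = chat(f"Request: {request}\n\nRelevant files:\n{context}")
    # 3. The GUI would then show this as a red/green diff for the human to audit.
    return apply_diff(repo, proposal)
```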
[20:57] Maybe to show one more example of a fairly successful LLM app: Perplexity. It has very similar features to what I just pointed out in Cursor. It packages up a lot of the information, it orchestrates multiple LLMs, and it's got a GUI that allows you to audit some of its work — for example, it will cite sources and you can imagine inspecting them. And it's got an autonomy slider: you can do a quick search, or you can do research, or you can do deep research and come back ten minutes later. These are all just varying levels of autonomy that you give up to the tool.

[21:27] So my question is: I feel like a lot of software will become partially autonomous, and I'm trying to think through what that looks like. For those of you who maintain products and services: how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways a human could act? And can humans supervise and stay in the loop of this activity? Because, again, these are fallible systems that aren't yet perfect. What does a diff look like in Photoshop, or something like that? A lot of traditional software right now has all these switches and controls designed for humans — all of this has to change and become accessible to LLMs.

[22:07] One thing I want to stress about a lot of these LLM apps, which I'm not sure gets as much attention as it should, is that we're now cooperating with AIs. Usually they are doing the generation and we, as humans, are doing the verification. It is in our interest to make this loop go as fast as possible, so we get a lot of work done. There are two major ways this can be done. Number one, you can speed up verification a lot. GUIs, for example, are extremely important here, because a GUI utilizes the computer-vision "GPU" in our heads. Reading text is effortful and not fun, but looking at stuff is fun — it's a kind of highway to your brain. So GUIs are very useful for auditing systems, and visual representations in general. Number two, we have to keep the AI on the leash. A lot of people are getting way overexcited with AI agents, and it's not useful to me to get a diff of 10,000 lines of code to my repo. I'm still the bottleneck, right? Even though those 10,000 lines come out instantly, I have to make sure this thing is not introducing bugs, that it's doing the correct thing, and that there are no security issues, and so on. So basically, it's in our interest to make the flow of these two go very, very fast, and we have to somehow keep the AI on the leash, because it gets way too overreactive. This is how I feel when I do AI-assisted coding.
[23:39] If I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff. This slide is not very good, I'm sorry, but I guess I'm trying, like many of you, to develop ways of utilizing these agents in my coding workflow and to do AI-assisted coding. In my own work, I'm always scared of getting way too big diffs. I always go in small, incremental chunks; I want to make sure that everything is good; I want to spin this loop very, very fast; and I work on small chunks of a single concrete thing. I think many of you are probably developing similar ways of working with LLMs.

[24:17] I also saw a number of blog posts that try to develop best practices for working with LLMs. Here's one I read recently that I thought was quite good, and it discusses some techniques, some of which have to do with how you keep the AI on the leash. As an example: if your prompt is vague, the AI might not do exactly what you wanted, and in that case verification will fail; you'll ask for something else, and if verification fails again, you start spinning. So it makes a lot more sense to spend a bit more time being more concrete in your prompts, which increases the probability of successful verification, so you can move forward. I think a lot of us are going to end up finding techniques like this.

[24:56] In my own work, I'm currently interested in what education looks like now that we have AI and LLMs, and a large amount of the thought for me goes into how we keep the AI on the leash. I don't think it works to just go to ChatGPT and say, "Hey, teach me physics." I don't think this works, because the AI gets lost in the woods. For me, this is actually two separate apps: for example, there's an app for a teacher that creates courses, and then there's an app that takes courses and serves them to students. In both cases, we now have this intermediate artifact of a course that is auditable; we can make sure it's good and consistent, and the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. This is one way of keeping the AI on the leash, and I think it has a much higher likelihood of working — the AI is not getting lost in the woods.
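Here is a minimal sketch of that small-chunk, generate-verify loop. It is hypothetical: `llm_propose_patch`, `run_tests`, and `human_review` stand in for a real coding agent, a test runner, and the human doing the audit.

```python
# Hypothetical sketch: keep the AI on a leash by working in small, verifiable
# increments. llm_propose_patch, run_tests, and human_review are placeholders.

def ai_assisted_change(task: str, llm_propose_patch, run_tests, human_review,
                       max_attempts: int = 3) -> bool:
    """Ask for one small, concrete change; verify it; only then move on."""
    prompt = (
        f"Task: {task}\n"
        "Make the smallest change that accomplishes this. "
        "Touch at most one file and keep the diff under ~50 lines."
    )
    for _ in range(max_attempts):
        patch = llm_propose_patch(prompt)          # generation (the AI's job)
        if not run_tests(patch):                   # cheap automatic check first
            prompt += "\nThe previous attempt failed the tests; try again."
            continue
        if human_review(patch):                    # the human verifies a small diff
            return True                            # accepted: commit and move on
        prompt += "\nThe previous attempt was rejected in review; be more conservative."
    return False                                   # give up and take over manually
```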
[25:49] One more analogy I wanted to allude to: I'm no stranger to partial autonomy — I worked on this for about five years at Tesla. The Autopilot is also a partial autonomy product, and it shares a lot of these features. For example, right there in the instrument panel is the GUI of the Autopilot, showing me what the neural network sees and so on. And we had the autonomy slider, where, over the course of my tenure there, we did more and more autonomous tasks for the user.

[26:18] Maybe the story I wanted to tell very briefly: the first time I drove a self-driving vehicle was in 2013. I had a friend who worked at Waymo, and he offered to give me a drive around Palo Alto. I took this picture using Google Glass at the time — many of you are so young that you might not even know what that is, but it was all the rage back then. We got into this car and went for about a 30-minute drive around Palo Alto, highways, streets, and so on, and the drive was perfect — there were zero interventions. And this was 2013, which is now 12 years ago. It struck me, because at the time, when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent, because this just worked; this is incredible. But here we are, 12 years later, and we are still working on autonomy. We are still working on driving agents, and even now we haven't actually fully solved the problem. You may see Waymos going around, and they look driverless, but there's still a lot of teleoperation and a lot of humans in the loop of a lot of this driving. So we still haven't even declared success — I think it's definitely going to succeed at this point, but it just took a long time. And so I think software is really tricky, in the same way that driving is tricky. When I see things like "2025 is the year of agents," I get very concerned, and I kind of feel like, you know, this is the decade of agents, and it's going to take quite some time. We need humans in the loop. We need to do this carefully. This is software — let's be serious here.

[27:51] One more analogy I always think about is the Iron Man suit. I've always loved Iron Man; I think it's correct in a bunch of ways with respect to technology and how it will play out. What I love about the Iron Man suit is that it's both an augmentation — Tony Stark can drive it — and it's also an agent: in some of the movies, the suit is quite autonomous and can fly around and find Tony and all this kind of stuff. This is the autonomy slider: we can build augmentations or we can build agents, and we kind of want to do a bit of both. But at this stage, working with fallible LLMs and so on,
[28:29] I would say it's less Iron Man robots and more Iron Man suits that you want to build. It's less about building flashy demos of autonomous agents and more about building partial autonomy products. These products have custom GUIs and UI/UX, and this is done so that the generation-verification loop of the human is very, very fast. But we're not losing sight of the fact that it is, in principle, possible to automate this work: there should be an autonomy slider in your product, and you should be thinking about how you can slide that autonomy slider and make your product more autonomous over time. That's how I think about it, and there are lots of opportunities in these kinds of products.

[29:06] I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for autonomy in software but, as I mentioned, it's programmed in English, which is this natural interface. Suddenly everyone is a programmer, because everyone speaks a natural language like English. This is extremely bullish and very interesting to me, and also completely unprecedented. It used to be the case that you needed to spend five to ten years studying something to be able to do something in software; that is not the case anymore.

[29:35] So, I don't know if anyone here has heard of vibe coding? This is the tweet that introduced it, but I'm told this is now a major meme. A fun story about this: I've been on Twitter for something like 15 years at this point, and I still have no clue which tweet will become viral and which tweet fizzles and no one cares. I thought this tweet was going to be the latter — it was just a shower thought — but it became a total meme, and I really just can't tell. I guess it struck a chord and gave a name to something that everyone was feeling but couldn't quite say in words. So now there's a Wikipedia page and everything.

[30:18] [Applause]

This is like a major contribution now, or something like that. Tom Wolf from Hugging Face shared this beautiful video that I really love: these are kids vibe coding. I find it such a wholesome video — how can you look at this video and feel bad about the future? The future is great. I think this will end up being a gateway drug to software development. I'm not a doomer about the future of this generation. I love this video.

[31:04] So I tried vibe coding a little bit as well, because it's so fun. Vibe coding is so great when you want to build something super duper custom that doesn't appear to exist, and you just want to wing it because it's a Saturday or something like that.
[31:15] So I built this iOS app. I can't actually program in Swift, but I was really shocked that I was able to build a super basic app — I'm not going to explain it, it's really dumb — but it was just a day of work, and it was running on my phone later that day, and I was like, wow, this is amazing. I didn't have to read through Swift for five days or something to get started.

[31:38] I also vibe coded this app called MenuGen, and it's live — you can try it at menugen.app. I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the things are, and I need pictures. This didn't exist, so I thought, hey, I'm going to vibe code it. This is what it looks like: you go to menugen.app, you take a picture of a menu, and then MenuGen generates the images. Everyone gets $5 in credits for free when they sign up, and therefore this is a major cost center in my life. This is a negative-revenue app for me right now; I've lost a huge amount of money on MenuGen.

[32:21] But the fascinating thing about MenuGen, for me, is that the code — the vibe coding part — was actually the easy part. Most of the work came when I tried to make it real, so that you can actually have authentication and payments and the domain name and a Vercel deployment. That was really hard, and all of it was not code: all of this DevOps stuff was me in the browser, clicking stuff, and it was extremely slow and took another week. It was really fascinating that I had the MenuGen demo basically working on my laptop in a few hours, and then it took me a week to make it real, because it was just really annoying. For example, if you try to add Google login to your web page, there's a huge amount of instructions from this Clerk library telling me how to integrate it. And it's crazy — it's telling me: go to this URL, click on this dropdown, choose this, go to this, click on that. A computer is telling me the actions I should be taking. Like — you do it! Why am I doing this? What the hell? I had to follow all these instructions; it was crazy. So the last part of my talk, therefore, focuses on: can we just build for agents? I don't want to do this work. Can agents do this? Thank you.

[33:46] Okay. So, roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs, or computers through APIs.
[33:57] And now we have a completely new thing: agents. They're computers, but they're humanlike — people spirits on the internet — and they need to interact with our software infrastructure. Can we build for them? It's a new thing. As an example, you can have robots.txt on your domain to instruct — or advise, I suppose — web crawlers on how to behave on your website. In the same way, you can have maybe an llms.txt file, which is just simple markdown telling LLMs what this domain is about, and that is very readable to an LLM. If it instead had to get the HTML of your web page and try to parse it, that's very error-prone and difficult; it will screw it up, and it's not going to work. So we can just directly speak to the LLM — it's worth it.

[34:41] A huge amount of documentation is currently written for people, so you'll see things like lists and bold and pictures, and this is not directly accessible by an LLM. I see some services now transitioning a lot of their docs to be specifically for LLMs — Vercel and Stripe, as examples, are early movers here, but there are a few more I've seen already — and they offer their documentation in markdown. Markdown is super easy for LLMs to understand. This is great.

[35:10] Maybe one simple example from my own experience: maybe some of you know 3Blue1Brown — he makes beautiful animation videos on YouTube. [Applause] Yeah, I love the library that he wrote, Manim, and I wanted to make my own animation. There's extensive documentation on how to use Manim, and I didn't want to actually read through it, so I copy-pasted the whole thing into an LLM, described what I wanted, and it just worked out of the box — the LLM vibe coded me an animation, exactly what I wanted, and I was like, wow, this is amazing. If we can make docs legible to LLMs, it's going to unlock a huge amount of use, and I think this is wonderful and should happen more.

[35:55] The other thing I wanted to point out is that, unfortunately, it's not just about taking your docs and making them appear in markdown — that's the easy part. We actually have to change the docs, because anytime your docs say "click," that's bad: an LLM will not be able to natively take this action right now. Vercel, for example, is replacing every occurrence of "click" with an equivalent curl command that your LLM agent could run on your behalf. I think this is very interesting. And then, of course, there's the Model Context Protocol from Anthropic, which is another way — a protocol for speaking directly to agents as this new consumer and manipulator of digital information. So I'm very bullish on these ideas.
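As a minimal sketch of the llms.txt idea, here is how an agent might pull such a file into its context instead of scraping HTML. The domain is a placeholder, and the file contents shown in the comment are only a rough guess at the emerging format.

```python
# Hypothetical sketch: an agent reading a site's llms.txt instead of scraping HTML.
# The domain is a placeholder; the exact filename and format vary by site.
import urllib.request

def fetch_llms_txt(domain: str) -> str:
    # The file is plain markdown written for LLMs, roughly along these lines:
    #   # Example Docs
    #   > API documentation for example.com, summarized for language models.
    #   - /quickstart.md: minimal integration example
    #   - /api.md: endpoint reference (curl commands, not "click here" steps)
    with urllib.request.urlopen(f"https://{domain}/llms.txt") as resp:
        return resp.read().decode("utf-8")

# The fetched markdown can then be pasted straight into a prompt as context:
# context = fetch_llms_txt("docs.example.com")
# prompt = context + "\n\nUsing the docs above, write code that creates a widget."
```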
[36:31] The other thing I really like is the number of little tools here and there that help ingest data in very LLM-friendly formats. For example, when I go to a GitHub repo, like my nanoGPT repo, I can't feed it to an LLM and ask questions about it, because it's a human interface on GitHub. But when you just change the URL from github to gitingest, it will concatenate all the files into a single giant text, create a directory structure, and so on, and that is ready to be copy-pasted into your favorite LLM so you can do stuff with it. Maybe an even more dramatic example of this is DeepWiki, where it's not just the raw content of the files — this is from Devin — they have Devin do an analysis of the GitHub repo and basically build up whole docs pages just for your repo, and you can imagine that this is even more helpful to copy-paste into your LLM. I love all the little tools where you just change the URL and something becomes accessible to an LLM. This is all well and good, and I think there should be a lot more of it.

[37:32] One more note I wanted to make is that it is absolutely possible that in the future — and this is not even the future, this is today — LLMs will be able to go around and click stuff and so on. But I still think it's very worth basically meeting LLMs halfway and making it easier for them to access all this information, because clicking around is still fairly expensive, I would say, and a lot more difficult. And I do think that for lots of software there will be a long tail that won't adapt, because these are not live-player repositories or digital infrastructure, and we will need those tools there. But for everyone else, I think it's very worth meeting at some middle point. So I'm bullish on both, if that makes sense.

[38:14] So, in summary: what an amazing time to get into the industry. We need to rewrite a ton of code, and a ton of code will be written by professionals and by coders. These LLMs are kind of like utilities, kind of like fabs, but especially like operating systems — and it's so early; it's like the 1960s of operating systems, and I think a lot of the analogies cross over. These LLMs are kind of like fallible people spirits that we have to learn to work with, and in order to do that properly, we need to adjust our infrastructure towards them. When you're building these LLM apps, I described some of the ways of working effectively with LLMs, some of the tools that make that possible, how you can spin this loop very, very quickly, and basically create partial autonomy products. And then a lot of code also has to be written for the agents more directly. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade, roughly, is that we're going to take that slider from left to right.
[39:17] It's going to be very interesting to see what that looks like, and I can't wait to build it with all of you. Thank you.