We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He helped co-found OpenAI right inside of this office, was the one who got Autopilot working at Tesla back in the day, and he has a rare gift for making the most complex technical shifts feel both accessible and inevitable. You all know him for having coined the term "vibe coding" last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.
>> Yeah, hello. Excited to be here and to kick us off.
>> Okay. So, just a couple of months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?
>> Uh, yeah, a mixture of both, for sure. Well, first of all, I guess, like many of you, I've been using agentic tools, Claude Code and adjacent things, for a while, maybe over the last year as they came out. They were very good at producing chunks of code; sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point where, for me, I was on a break, so I had a bit more time.
I think many other people were similar, and I just started to notice that with the latest models the chunks just came out fine. Then I kept asking for more and it just came out fine, and I can't remember the last time I corrected it. And then I just trusted the system more and more, and then I was vibe coding. [laughter] And so it was a very stark transition, I think. I tried to stress this on Twitter, or X, because a lot of people experienced AI last year as a ChatGPT-adjacent thing, but you really had to look again as of December, because things have changed fundamentally, especially on this agentic, coherent workflow that really started to actually work. And so it was that realization that had me go down the whole rabbit hole of, you know, infinity side projects. My side projects folder is extremely full of random things, and I'm just vibe coding all the time. So that kind of happened in December, I would say, and I've been looking at the repercussions of it since.
>> You've talked a lot about this idea of LLMs as a new computer, that it isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules, software 2.0 was learned weights, software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe it?
>> Right. So, yeah, exactly.
In software 1.0, I'm writing code. In software 2.0, I'm actually programming by creating data sets and training neural networks, so the programming is arranging data sets and maybe some objectives and neural network architectures. And then what happened is that if you train one of these GPT models, or LLMs, on a sufficiently large set of tasks, implicitly, because by training on the internet you have to multitask everything that's in the data set, they actually become a kind of programmable computer in a certain sense. So software 3.0 is about your programming turning into prompting: what's in the context window is your lever over the interpreter that is the LLM, which interprets your context and performs computation in digital information space. So that's the transition, and there are a few examples that really drove it home for me, which might be instructive. For example, when OpenClaw came out: when you want to install OpenClaw, you would normally expect a bash script, a shell script. You run the shell script to install OpenClaw. But in order to target the many different platforms and types of computers you might run OpenClaw on, these shell scripts usually balloon up and become extremely complex. And you're still stuck in a software 1.0 universe of wanting to write the code.
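To make the 1.0/2.0/3.0 distinction concrete, here is a toy sketch (an editor's illustration, not from the talk): the same trivial task handled by a hand-written rule, by a parameter fit to examples, and by a prompt handed to a hypothetical LLM client.

```python
# Toy contrast between the three paradigms on one task: deciding
# whether a number is "large" (meaning > 10).

# Software 1.0: the programmer writes the rule explicitly.
def is_large_v1(x):
    return x > 10

# Software 2.0: the programmer curates examples and an objective; the
# "program" is a learned parameter (here, a threshold fit to the data).
def fit_threshold(examples):
    """examples: list of (x, label) pairs; pick the most accurate threshold."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(x for x, _ in examples):
        acc = sum((x > t) == y for x, y in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

THRESHOLD = fit_threshold([(2, False), (5, False), (12, True), (30, True)])

def is_large_v2(x):
    return x > THRESHOLD

# Software 3.0: the "program" is text in the context window, handed to
# an LLM interpreter. `llm` is a hypothetical client, shown only to
# make the shape concrete; no such object is defined here.
PROMPT_V3 = "Answer YES if the number is large (greater than 10), else NO."
# answer = llm.complete(PROMPT_V3 + f"\nNumber: {x}")
```

Note that the 2.0 version's behavior depends entirely on the examples it was fit to, which is the point: the data set is the program.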
And actually the OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. So it's basically a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. The reason this is so much more powerful is that you're now working in the software 3.0 paradigm, where you don't have to precisely spell out all the individual details of the setup. The agent brings its own intelligence: it follows the instructions, looks at your environment, your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful, right? So that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent? That's the programming paradigm now. One more example that comes to mind, even more extreme than that, is when I was building MenuGen. MenuGen is this idea where you come to a restaurant and they give you a menu, and there are usually no pictures, so I don't know what any of these things are. Usually 30% of the items, maybe 50%, I have no idea about. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like, in a generic sense.
And so I vibe coded this app that basically lets you upload a photo and does all this stuff. It runs on Vercel, and it re-renders the menu: it uses OCR to extract all the different titles, uses an image generator to get pictures of them, and then shows them to you. And then I saw the software 3.0 version of this, which blew my mind, which is literally: just take your photo, give it to Gemini, and say, use Nano Banana to overlay the items onto the menu. And Nano Banana returned an image that is exactly the picture of the menu that I took, but with the different dishes rendered right into the pixels. This blew my mind, because it means all of my MenuGen is spurious. It's working in the old paradigm; that app shouldn't exist. The software 3.0 paradigm is a lot more raw: your neural network is doing more and more of the work, your prompt or context is just the image, the output is an image, and there's no need for any of the app in between. So I think people have to reframe, not work in the existing paradigm of what already existed and just think of this as a speedup of what exists. Actually, new things are available now. And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming getting faster.
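The contrast between the two MenuGen designs can be sketched in a few lines (an editor's illustration with hypothetical function names; the real app and the Gemini/Nano Banana call are of course more involved).

```python
# Hypothetical sketch of the two MenuGen designs. `ocr_titles`,
# `gen_image`, and `multimodal_edit` stand in for the real services
# (an OCR step, an image generator, and a multimodal model such as
# Gemini with Nano Banana); they are passed in so the shapes are clear.

def menugen_v1(menu_photo, ocr_titles, gen_image):
    """Old paradigm: the app orchestrates an explicit pipeline."""
    titles = ocr_titles(menu_photo)            # 1. extract dish names
    return {t: gen_image(t) for t in titles}   # 2. one image per dish

def menugen_v3(menu_photo, multimodal_edit):
    """Software 3.0: the whole app collapses into one prompted call."""
    return multimodal_edit(
        menu_photo,
        "Overlay a representative picture of each dish onto this menu.",
    )
```

In the v3 version there is no pipeline left to maintain; the orchestration the app used to do is absorbed into the model call.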
This is more general information processing that is automatable now, so it's not even just about code. Previous code worked over structured data, right? You write code over structured data. But, for example, with my LLM knowledge base project, you get LLMs to create wikis for your organization, or for you personally, and so on. This is not even a program; it's not something that could exist before, because there was no code that would create a knowledge base from a bunch of facts. But now you can take these documents, basically recompile them in a different way, reorder them, and create something new and interesting as a reframing of the data. These are new things that weren't possible. So this is something I keep trying to come back to: not only what existing thing can we do faster now, but what new opportunities are there, things that just weren't possible before. I almost think that's more exciting.
>> I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression of programming from last October to early January, February of this year. If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?
>> Um, well, going with the example of MenuGen, I guess: a lot of this code shouldn't exist, and it's just a neural network doing most of the work. I do think the extrapolation looks very weird, because you could basically imagine completely neural computers in a certain sense. Imagine a device that takes raw video or audio into what's basically a neural net and uses diffusion to render a UI that is unique for that moment, in a certain sense. I kind of feel like in the early days of computing people were actually a little confused as to whether computers would look like calculators or like neural nets. In the '50s and '60s it was not really obvious which way it would go, and of course we went down the calculator path and ended up building classical computing, and neural nets are currently running virtualized on top of existing computers. But you could imagine a lot of this flipping: the neural net becomes the host process and the CPUs become the co-processor. We saw the diagram of how intelligence compute, the compute of neural networks, is going to take over and become the dominant spend of FLOPs. So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting and using tool use as this historical appendage for certain kinds of deterministic tasks, but what's really running the show is these neural nets.
So you can imagine something extremely foreign as the extrapolation, but I think we're probably going to get there piece by piece. And that progression is TBD, I would say.
>> I'd like to talk a little bit about this concept of verifiability, the fact that AI will automate faster and more easily in domains where the output can be verified. If that framework is right, what work is about to move much faster than people realize, and what professions do people think are safe that are actually highly verifiable?
>> Uh, yes. So I spent some time writing about verifiability, and basically, traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense. The way this works is that when frontier labs are training these LLMs, these are giant reinforcement learning environments. The models are given verification rewards, and because of the way they are trained, they end up progressing into these jagged entities that really peak in capability in verifiable domains like math and code and adjacent areas, and kind of stagnate, and are a little rough around the edges, when things are not in that space. So the reason I wrote about verifiability is that I'm trying to understand why these things are so jagged.
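What a "verification reward" looks like can be made concrete with a minimal sketch (an editor's illustration of the idea, not a frontier-lab implementation): the environment can score a candidate automatically, and that automatic scorer is exactly what makes a domain RL-friendly.

```python
# Two toy verifier rewards. Verifiable domains are the ones where a
# function like this exists; where it doesn't, RL has nothing to climb.

def math_reward(candidate: str, ground_truth: int) -> float:
    """Exact-match check on a numeric answer."""
    try:
        return 1.0 if int(candidate.strip()) == ground_truth else 0.0
    except ValueError:
        return 0.0

def code_reward(candidate_src: str, tests) -> float:
    """Fraction of unit tests a candidate program passes."""
    scope = {}
    try:
        exec(candidate_src, scope)  # define the candidate's functions
    except Exception:
        return 0.0
    passed = 0
    for t in tests:
        try:
            passed += bool(t(scope))
        except Exception:
            pass
    return passed / len(tests)
```

A model trained against rewards like these will sharpen exactly where the checks exist, which is one way to read the jaggedness described above.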
Some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution. Some things are simply more valuable in the economy and end up generating more environments, because the labs want the models to work in those settings. Code is a good example of that. There are probably lots of verifiable environments they could think of that happen not to make it into the mix, because they're just not that useful as capabilities. But to me the big mystery is this. The favorite example for a while was "how many letters are in strawberry," which the models would famously get wrong; it's an example of jaggedness. The models have patched that by now, I think, but the new one is: I want to go to a car wash to wash my car, and it's 50 meters away. Should I drive or should I walk? And state-of-the-art models today will tell you to walk, because it's so close. How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase [laughter] or find zero-day vulnerabilities, and yet tell me to walk to the car wash? This is insane. And to whatever extent these models remain jagged, it's an indication that, number one, maybe something's slightly off, or, number two, you need to actually be in the loop a little bit. You need to treat them as tools, and you do have to stay in touch with what they're doing.
So, long story short, all of my writing about verifiability is just trying to understand why these things are jagged, and whether there's any pattern to it. I think it's some combination of "verifiable" plus "the labs care." Maybe one more instructive anecdote: from GPT-3.5 to GPT-4, people noticed that chess improved a lot, and many people thought, oh, it's just the progression of capabilities. But actually, and I think this is public information, I think I saw it on the internet, a huge amount of chess data made it into the pre-training set, and just because it's in the data distribution, the model improved a lot more than it would have by default. Someone at OpenAI decided to add that data, and now you have a capability that peaked a lot more. That's why I keep stressing this dimension of it: we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix. You have to actually explore this thing they give you that has no manual. It works in certain settings, but maybe not in others, and you have to explore it a little bit. If you're in the circuits that were part of the RL, you fly. If you're in the circuits that are outside the data distribution, you're going to struggle, and you have to figure out which circuits you're in for your application. And if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work, because it's not necessarily going to come out of the LLM out of the box.
>> I'd love to come back to the concept of jagged intelligence in a little bit. If you are a founder today thinking about building a company, you are trying to solve a problem that you think is tractable, something in a domain that is verifiable. But you look around and you think, oh my gosh, the labs have really started getting to escape velocity in the domains that seem most obvious: math, coding, and others. What would your advice be to the founders in the audience?
>> Um, so I think that comes back to the previous question. Let me think. Verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it. So maybe one way to see it is that this remains true even if the labs are not focusing on it directly. If you are in a verifiable setting where you can create these RL environments, or examples, then that sets you up to potentially do your own fine-tuning, and you might benefit from that. That is fundamentally technology that just works: if you have a huge amount of diverse RL environments and data sets, you can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well. So I don't know what all the examples of this might be, but I do think there are some very valuable reinforcement learning environments that people could think of that are not part of the... Yeah, I don't want to give away the answer, but there is one domain that I think is very... Oh, okay.
Sorry, I don't mean to vague-post on the stage, but there are some examples of this.
>> On the flip side, what do you think still feels automatable only from a distance?
>> I do think that ultimately almost everything can be made verifiable to some extent, some things more easily than others. Even for things like writing, you can imagine having a council of LLM judges and probably getting something reasonable out of that kind of approach. So it's more about what's easy or hard. So I do think that ultimately, yeah...
>> Everything. [laughter]
>> Everything is automatable.
>> Amazing. Okay. So last year you coined the term vibe coding, and today we're in a world that feels a little more serious, more agentic engineering. What do you think is the difference between the two, and what would you actually call what we're in today?
>> Uh, yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software. The floor rises, everyone can vibe code anything, and that's amazing, incredible. But then I would say agentic engineering is about preserving the quality bar of what existed before in professional software. You're not allowed to introduce vulnerabilities due to vibe coding. You're still responsible for your software, just as before, but can you go faster? And the spoiler is you can. But how do you do that properly? I call it agentic engineering because I do think it's an engineering discipline.
You have these agents, which are these spiky entities. They're a bit fallible, a little bit stochastic, but they are extremely powerful. How do you coordinate them to go faster without sacrificing your quality bar? Doing that well and correctly is the realm of agentic engineering. So I see them as different: one is about raising the floor, and the other is about extrapolating. And what I'm seeing, I think, is that there is a very high ceiling on agentic engineering capability. People used to talk about the 10x engineer; I think this is magnified a lot more. 10x is not the speedup you gain. It seems to me that people who are very good at this peak a lot more than 10x, from my perspective right now.
>> I really like that framing. One memorable thing Sam Altman said when he came to AIN last year was that people of different generations use ChatGPT differently. If you're in your 30s, you use it as a Google search replacement, but if you're in your teens, ChatGPT is your gateway to the internet. What is the parallel in coding today? If we were to watch two people code using OpenClaw, Claude Code, or Codex, one you'd consider mediocre at it and one you'd consider fully AI native, how would you describe the difference?
>> I mean, I think it's about getting the most out of the tools that are available: utilizing all of their features, investing in your own setup.
Just like previously, all engineers are used to getting the most out of the tools they use, whether it's Vim or VS Code, and now it's Claude Code or Codex and so on. So: investing in your setup and utilizing a lot of the tools available to you. I think it just looks like that. A related thought is that a lot of people are hiring for this right now, because they want to hire strong agentic engineers, and what I'm seeing is that most people have still not refactored their hiring process for agentic engineering capability. If you're giving out puzzles to solve, that's still the old paradigm. I would say hiring has to look like: give someone a really big project and watch them implement it. Let's write, say, a Twitter clone for agents, and make it really good, make it really secure, and then have some agents simulate activity on this Twitter. And then I'm going to use ten Codex 5.4 agents at xhigh to try to break this website you deployed, and they should not be able to break it. Maybe it looks like that, right? So watching people in that setting, building bigger projects and utilizing the tooling, is what I would look at for the most part.
>> And as agents do more, what human skill do you think becomes more valuable, not less?
>> Uh, so, yeah, it's a good question.
I think, well, right now the answer is that the agents are kind of like these intern entities, right? It's remarkable, but you basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. One of my favorite examples of the weirdness of agents: for MenuGen, you sign up with a Google account, but you purchase credits through Stripe, and both of them have email addresses. When you purchase credits, my agent assigned them by matching the email address from Stripe to the Google email address. There wasn't a persistent user ID; it was trying to match up the email addresses. But you could use a different email address for Stripe than for Google, and then it would simply fail to associate the funds. And this is the kind of mistake these agents still make: why would you use email addresses to cross-correlate the funds? They can be arbitrary; you can use different emails, and so on. It's such a weird thing to do. So I think people have to be in charge of the spec, the plan. And I actually don't even like plan mode. I mean, obviously it's very useful, but I think there's something more general here: you have to work with your agent to design a spec that is very detailed, maybe it's basically the docs, and then get the agents to write the code. You're in charge of the oversight and the top-level categories, but the agents are doing a lot of the work under the hood.
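The billing bug described above is easy to reproduce in miniature (hypothetical data shapes, not MenuGen's actual code): matching by email silently drops the purchase as soon as the two emails differ, while keying both records to a persistent user ID does not.

```python
# Toy version of the agent's mistake: linking a Stripe purchase to a
# Google login by email. The emails are arbitrary and may differ, so
# the robust design keys both records to one persistent user ID.

def credit_by_email(accounts, purchase):
    """Fragile: works only if the Stripe and Google emails match."""
    for user in accounts:
        if user["google_email"] == purchase["stripe_email"]:
            user["credits"] += purchase["credits"]
            return user
    return None  # emails differ -> funds never associated

def credit_by_user_id(accounts, purchase):
    """Robust: the purchase carries the app's own persistent user ID."""
    for user in accounts:
        if user["user_id"] == purchase["user_id"]:
            user["credits"] += purchase["credits"]
            return user
    return None
```

This is the kind of design decision ("these have to be unique user IDs we tie everything to") that the interview argues still belongs to the human.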
And so you're not caring about some of the details anymore. As an example, with arrays or tensors in neural networks, there are a ton of details across PyTorch and NumPy and pandas, all the different little API surfaces. I've already forgotten about keepdims versus keepdim, whether it's dim or axis, reshape or permute or transpose. I don't remember this stuff anymore, right? Because you don't have to. These are the kinds of details handled by the intern, because they have very good recall. But you still have to know, for example, that there's an underlying tensor, that there's an underlying view, and that you can manipulate a view of the same storage or have separate storage, which would be less efficient. You still have to understand what this stuff is doing, some of the fundamentals, so that you're not copying memory around unnecessarily and so on. But the details of the APIs are now handed off. You're in charge of the taste, the engineering, the design, making sure it makes sense, that you're asking for the right things, that you're saying, okay, these have to be unique user IDs that we're going to tie everything to. You're doing some of the design and development, and the agents are doing the fill-in-the-blanks. That's currently where we are, and I think that's what everyone is seeing right now.
>> Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?
>> Um, yeah, it's a good question. I
would... okay. I mean, I'm hoping that it improves. I think the reason it doesn't improve right now is, again, that it's not part of the RL. There's probably no aesthetics cost or reward, or it's not good enough, or something like that. When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not super amazing code all the time: it's very bloated, there's a lot of copy-paste, there are awkward abstractions that are brittle, and it works, but it's just really gross. I do hope this can improve in future models. A good example is also this microGPT project, where I was trying to simplify LLM training to be as simple as possible. The models hate this. They can't do it. I kept trying to prompt an LLM to simplify more, simplify more, and it just can't. You feel like you're outside of the RL circuits. You're obviously pulling teeth; it's not light speed. So I do think that people remain in charge of this. But I also think there's nothing fundamental preventing it; it's almost just that the labs haven't done it yet.
>> Yeah.
>> So I'd love to come back to this idea of jagged forms of intelligence. You wrote a little bit about this in a very thought-provoking piece about animals versus ghosts. The idea is that we're not building animals, we are summoning ghosts.
And these are jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment, the things that came about via evolution. Why does that framing matter, and what does it actually change about how you build, deploy, evaluate, or even trust them?
>> Yeah, I think the reason I wrote about this is that I'm trying to wrap my head around what these things are, because if you have a good model of what they are or are not, then you're going to be more competent at using them. I'm not sure the framing actually has, like, real power. [laughter] It's a little bit of philosophizing. But I think it's about coming to terms with the fact that these things are not animal intelligences. If you yell at them, they're not going to work better or worse; it doesn't have any impact. It's all just these statistical simulation circuits, where the substrate is pre-training, so statistics, and then there's RL bolted on top, which kind of grows appendages on it. So maybe it's just a mindset for what I'm getting into, what's likely to work or not likely to work, and how to modify it. I don't have, like, five obvious takeaways for how to make your system better. It's more about being suspicious of it and
>> figuring it out over time.
>> That's where it starts. Okay, so you are deep into working with agents that don't just chat. They have real permissions, they have local context, they actually take action on your behalf. What does the world look like when we all start to live in that world?
>> Yeah, I think a lot of people here are excited about what this agent-native environment looks like, and everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved over. Most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve: why are people still telling me what to do? I don't want to do anything. What is the thing I should copy-paste to my agent? [laughter] Every time I'm told, go to this URL or something like that, it's just, ah. [laughter] So everyone, I think, is excited about how we decompose the workloads that need to happen into, fundamentally, sensors over the world and actuators over the world. How do we make it agent-native? Basically, describe it to agents first, and then have a lot of automation around data structures that are very legible to the LLMs.
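A toy sketch of what "describe it to agents first" might look like in practice: the service's surface is split into sensors (read-only queries) and actuators (state-changing actions), then rendered as one machine-legible manifest an agent can ingest instead of human-oriented docs. All service and tool names below are invented for illustration; this is not any real API.

```python
import json

# Hypothetical sketch of an "agent-first" service description: sensors read
# the world without side effects, actuators act on it, and one rendered
# manifest is the thing you hand (or copy-paste) to your agent.

SERVICE_MANIFEST = {
    "service": "menu_hosting",  # hypothetical service name
    "sensors": [  # read the world, no side effects
        {
            "name": "get_deployment_status",
            "args": {"site_id": "string"},
            "returns": "one of: building | live | failed",
        },
    ],
    "actuators": [  # act on the world, side effects
        {
            "name": "deploy_site",
            "args": {"repo_url": "string", "domain": "string"},
            "returns": "site_id",
        },
        {
            "name": "set_dns_record",
            "args": {"domain": "string", "record_type": "string", "value": "string"},
            "returns": "ok | error",
        },
    ],
}


def render_for_agent(manifest: dict) -> str:
    """Render the manifest as a single agent-legible block of text."""
    lines = [f"Service: {manifest['service']}"]
    for kind in ("sensors", "actuators"):
        lines.append(f"{kind}:")
        for tool in manifest[kind]:
            lines.append(
                f"  {tool['name']}({json.dumps(tool['args'])}) -> {tool['returns']}"
            )
    return "\n".join(lines)


print(render_for_agent(SERVICE_MANIFEST))
```

One nice property of the sensor/actuator split is that an agent can tell at a glance which calls are safe to retry freely and which ones change the world.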
So I'm hoping there's a lot of agent-first infrastructure out there. Famously, when I wrote the blog post about MenuGen, or I'm not sure how famously [laughter], a lot of the trouble was not even writing the code for MenuGen; it was deploying it on Vercel, because I had to work with all these different services, string them up, go into their settings and menus, configure my DNS, and it was just so annoying. So that's a good example. I would hope that I could give a prompt to an LLM, build MenuGen, and then I didn't have to touch anything and it's deployed in that same way on the internet. I think that would be a good test for whether a lot of our infrastructure is becoming more and more agent-native. And then ultimately, I do think we're going toward a world where there's agent representation for people and for organizations, and, you know, I'll have my agent talk to your agent to figure out some of the details of our meetings, or things like that. [laughter] So I do think that's roughly where things are going, and I think everyone here is excited about that.
>> I really like the visual analogy of sensors and actuators; I actually hadn't thought of that. That's super interesting.
>> Right.
>> Okay, I think we have to end on a question about education.
Because you are probably one of the very best in the world at making complex technical concepts simple, and you're deeply thoughtful about how we design education around them. What still remains worth learning deeply when intelligence gets cheap, as we move into the next era of AI?
>> Yeah. There was a tweet that blew my mind recently, and I keep thinking about it every other day. It was something along the lines of: you can outsource your thinking, but you can't outsource your understanding.
>> I think that's really nicely put.
>> Yeah, because I'm still part of the system; information still has to make it into my brain somehow. I feel like I'm becoming the bottleneck of just knowing what we are trying to build, why it's worth doing, how I direct my agents, and so on. So I do still think that ultimately something has to direct the thinking and the processing, and that's still fundamentally constrained by understanding. This is one reason I was also very excited about all the LLM knowledge bases, because I feel like that's a way for me to process information, and any time I see a different projection onto information, I feel like I gain insight. So it's really just a lot of prompts for me to do a kind of synthetic data generation over some fixed data.
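The "different projections onto the same information" idea can be sketched as a tiny prompt generator over one fixed article. The projection names and prompt wording below are invented for illustration and aren't from any particular tool.

```python
# Hypothetical sketch of "synthetic data generation over some fixed data":
# the same article text is re-projected through several different prompts,
# each a different view onto the same underlying information.

PROJECTIONS = {
    "summary": "Summarize the key claims of this article in five bullet points.",
    "qa": "Write ten question-answer pairs that test understanding of this article.",
    "flashcards": "Turn the main definitions in this article into flashcards.",
    "critique": "List the strongest counterarguments to this article's thesis.",
}


def projection_prompts(article_text: str) -> dict[str, str]:
    """Build one LLM prompt per projection over the same fixed article text."""
    return {
        name: f"{instruction}\n\n---\n{article_text}"
        for name, instruction in PROJECTIONS.items()
    }


# Each prompt would be sent to an LLM; here we just show the instructions.
for name, prompt in projection_prompts("LLMs are a new kind of computer ...").items():
    print(f"{name}: {prompt.splitlines()[0]}")
```

Each prompt asks for a different projection of the same source, which is the sense in which the knowledge base is built from synthetic data over fixed data.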
So I really enjoy that. Whenever I read an article, I have my wiki that's being built up from these articles, and I love asking questions about things. I think that ultimately these are tools to enhance understanding in a certain way, and understanding is still a bit of a bottleneck, because you can't be a good director without it. The LLMs certainly don't excel at understanding; you are still uniquely in charge of that. So yeah, I think tools to that effect are incredibly interesting and exciting.
>> I'm excited to be back here in a couple of years and to see if we've been fully automated out of the loop and they actually take care of understanding as well. Thank you so much for joining us, Andrej. We really appreciate it. [applause]