Structured Agentic Software Engineering

Structured Agentic Software Engineering Round 2

Transcription provided by Huntsville AI Transcribe

Quick thing, we do this every other week. And then in the middle of one of those, we’ll do a paper review, which will be next week.

And that’ll be all virtual.

I think that’s about it. Let’s hop over to Josh and get into it. All right, so today’s going to be kind of a — oh, OK. All right, kind of a different session than we’ve done in the past. We’ve been talking about a lot of software engineering papers and things in our past paper review series. Generally in there, we’re going very, very in depth, into the theory and the theoretical systems that might be there. And I kind of wanted to take some time to show some of the things that we’ve been talking about in that series.

Some of the things we’ve talked through are things like: how do you deal with identity with agents, and define what they’re capable of?

What’s their reputation? Are they able to be trusted? Where do they come from?

All that sort of stuff. So we have some material in here about that, and we’re really looking at a whole Golden Corral buffet of things today. And so this should be something where we’re talking a lot.

The questions that you have, we can kind of work through different parts of the problem set.

So one of those things is going to be the agentic identity stuff with agent-to-agent. That’s what we have implemented inside of here. Then we also are going to cover some stuff related to memory. So there’s lots of things that you can do with memory. I’ve got a pretty detailed thing in this repo, where I basically started this repo this weekend, put a whole bunch of stuff in it, and every single one of my specs I saved to Git. And as I iterated on those, I have those things evolving over time, and I can show how I’m using it to create a database of the design decisions that happen with the repository. Let me look at some of the stuff that I did there. I can talk through why I did the things I did.

If you’ve done things, we can talk about things that you think might be helpful.

That’s kind of a vital thing for us. That’s one thing. That’s our September paper.

We’re also going to talk a little bit about some MCP servers — Playwright — we have some tools and stuff like that. There’s just kind of a whole gamut. There are lots of things we can talk through.

I also have stuff related to — I think we’ve gone back and forth about — Arize Phoenix. So telemetry, OpenTelemetry, I have that implemented in here as well. So you can kind of track costs and MLOps sort of stuff. But yeah, I’m just going to kind of go through and show the actual application that we have.

There is a GitHub repo for this, where I have pretty much just, oh, it helps if I don’t look up there.

Yeah, okay. One second, we’re gonna get the screen share going, if I can remember how to use Zoom.

Okay, there we go. All right, so we have this thing called the AI Eng Sandbox.

There’s a link in the Discord if you care to look at any of this stuff at some point.

And so part of this is this little app thing. This thing is really kind of messy. That was the intent. Nice software environments where everything’s perfectly set up is not what we’re working with, generally.

So I make sure to put in things that were conflicting advice, things that would trip the agent up, different stuff like that.

So we kind of have this sandbox here for us to go through.

We have a whole big thing of files here. We’re going to talk a lot tonight about Claude Code.

That’s going to be our main driver for a lot of stuff. We also have Codex. And so I’m going to show some of the features that Claude Code has available and how you use those with its slash commands, but also how it doesn’t really matter.

I can just use Codex with this framework and still get most of the use cases.

So the Claude Code stuff is just nice and quick.

We’re going to talk a little bit about how to build up a sort of reference layer for your models, so that they can stop hallucinating what these different things are, and instead use the tools that are available to go fetch stuff and build documentation over time. You can then reference that documentation, so that you can easily clear things out, start fresh, and do multiple things.

So we’ll talk a little bit about that. And then Yeah, so much stuff.

So looking here at what we’ve got: I have this little thing that goes through all those files I talked about, and then also some other imported things.

Okay, this little agent thing where we have the ability to ask it some stuff — you know, the normal LLM chat sort of thing. So it does SSE.

So we kind of got some somewhat complex stuff going on here.

and then sort of our Arize Phoenix thing. And just to give you kind of a quick view of what this looks like, this is basically something where we can open it up, and it’s doing a run right now.

And I can go in and I can see exactly where it’s going in and running tools and doing different things.

I can see that here’s my system prompt that I have fed into the thing and then track things like costs over time.

You can add all sorts of different metadata to these things as much as you want, but I kind of accept the normal stuff. So yeah, that is the big area. So there’s open floor that we can talk about a sort of that, or we can jump in and maybe just try and add a feature to this thing.

What would you guys be interested in seeing?

And if you want to add a feature for some of the things you might be interested in.

Don’t everybody go at once.

Well, do you have it?

Why don’t you take a demo a little bit further?

Okay.

What can it actually run right now? Kind of show it off, and then we can come up with a feature list based on what features you currently have.

Sure.

So I have the ability to kind of do these different prompts and that’s fine.

So inside of here, I have the ability to do a basic search that might work.

Let’s see, I could say maybe look for “Arize.”

It’s just doing a BM25 best-match search here to go fetch things out of here.
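For anyone who hasn’t seen it, BM25 (“best match 25”) is a bag-of-words relevance score: term frequency damped by document length, weighted by inverse document frequency. A minimal self-contained sketch — the corpus here is made up, not the repo’s actual documents:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # document frequency: how many docs contain each term
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # term frequency saturates via k1; b normalizes by doc length
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "arize phoenix observability setup",
    "docker compose for the web server",
    "phoenix tracing with open telemetry",
]
print(bm25_scores("phoenix observability", docs))
```

The first document should score highest (both query terms), the second zero (neither term).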

And so why don’t I go in and look at, so this Phoenix integration thing.

So I’m going to go through and try and click that. So that could be one thing is that we could fix that defect. All right, so yeah, I see what’s happened here. Okay, so we’ve gotten this thing, I have clicked on this sort of thing right here, and that’s great. Let’s do that. God, I’ve got this running thing.

Sure, well, let’s go ahead and let it do that.

Okay, so I’m going to go ahead and pop this up — let me reset it real quick. And I’m gonna pop up Claude Code through the command line. There’s lots of different ways you can do this.

This is how I’m gonna do it. And so one of the things here is that we found this defect.

And I’m going to use this slash command that I have called /defect here. And what this gives you is the ability, whenever I put this right here, to feed this thing straight into context. And as part of this command, I have some metadata attached to it. The stuff right up here is mostly for me, to keep things organized whenever I set the spec up.

There are things that Claude Code itself does use. So this allowed-tools thing actually is meaningful to the system, and it tells it what tools are allowed for this command.

But then it will then take all of this stuff, it rips this stuff out of here.

So that does go into its context. And what this is, is what’s basically called a meta prompt.

So it is a prompt that is going to make another prompt.

And that’s going to do it.

And I have it spec out kind of what my defect is.

It’s basically an issue template for the agents.

And I said, you know, I’ve got this sort of format that I want you to do.
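As a concrete sketch, a command like this lives as a markdown file under `.claude/commands/` — the frontmatter fields (`description`, `allowed-tools`) and the `$ARGUMENTS` placeholder are Claude Code conventions, but the specific tool names and template sections below are my own illustration, not the exact file from the repo:

```markdown
---
description: Turn a bug report into a defect spec
allowed-tools: Read, Grep, Write
---
Create a defect spec under specs/ for the issue described below.
Use this format: Summary, Steps to Reproduce, Expected vs. Actual,
Root Cause Analysis, Proposed Resolution.
Reproduce the issue with the Playwright MCP server before writing the RCA.

Issue: $ARGUMENTS
```

Invoking `/defect <description>` substitutes the description into `$ARGUMENTS` and feeds the whole body into context as the meta prompt.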

And it’s going to do that. And while it’s doing that, we can look through things. And so I’m going to say here — first, I’m going to check if it has the MCP server enabled. It does. All right. So I’m going to tell it to go do /defect and use your MCP server to go to this address.

And then click on, what do we click on?

Phoenix integration?

Yeah, Phoenix, this, actually, I’m just gonna skip ahead.

I don’t need to do that, really. Yeah, let’s go ahead. Playwright. And I click on this, and: classify RCA and resolution for the bug.

All right, so we’re going to let it do that.

You can see here it’s got something right here called defect is running. It’s a right there playwright.

If you guys don’t know what Playwright is, it’s a browser automation engine — a way for it to pop a browser up, and it can run headless or not.

And so you can see there that it’s popped this thing up over here, and it’s now seeing the bug.

So now I don’t have to argue with it. I’ve convinced it that this bug exists. I just cut 30 minutes out of my life that I don’t have to spend, because I’m giving it the tools to go ask the environment. The big thing is that when you’re doing this, you want to start thinking about: what does the agentic layer look like for your environment? How am I reconfiguring my development environment so that I don’t have to use words to convince these things of what is going on? And the nice thing about this is it’s probably able to see what it’s getting back. So right here, it’s taking a screenshot.

This is one of the biggest things with Playwright. It’s nice is that we’re accessing multiple modalities.

I’m just going to say I want you to be able to take a screenshot all the time.

So I won’t have to say yes to this every time. That’s what we’re here for. And so it’s going to ask for a bunch of permissions. Let’s see: network requests.

That’s fine. I don’t care. So it’s going to go through and start iterating on this thing and get some understanding about what the problem is. “Stewing” instead of “thinking” — is that something you did, or is that a clue? No, that’s not me. I think my favorite one is “flibbertigibbeting” or something like that.

Yeah, that’s just some cute stuff that they put in there. It bears no actual meaning on what it’s doing. Yeah, so you can see it’s just kind of going and letting loose. And while it’s doing that, let’s get rid of some of these old things. I like what you’ve done as far as making your own set of reference material, because I lost a few hours a week or so ago. The thing I was trying to fix was using Google Test, and it found the version of Google Test that’s online, which is about three years ahead of where I’m actually at.

And it kept trying to put it back, you know, to use something newer. I wound up fixing it by telling it exactly what version, but if I had done this, it could have made that a whole lot easier.

Yeah. So I can go kind of into that area of stuff right now too. And so we’ve got all of our references here.

You can see it’s talking about this whole bits and bobs here.

And I just kind of put them inside of this references folder under .cloud.

See there, it’s kind of finished out that spec. I’m going to show this real quick. And so what is it? We’re trying to access the observability thing, Arize Phoenix.

And so this is kind of one of what it has is like, you know, links out.

So it’s almost like a link tree where it’s giving a little bit of context.

But I’ve actually got another command that says, go read this reference and then go fetch these materials that are relevant to you.

So it’s just about chaining these sorts of things together. And then you don’t have to convince it that GPT-5 exists, because you have the documentation right there. OK, so now we can see here that it’s generated out the spec file.

And this whole concept like specs, spec related engineering, there’s a whole bunch of different things that are doing this.

I think it’s a pretty good one as far as, you know, working with these tools.

I think there are these things called BMAD and Spec Kit.

It’s all a bunch of fancy words that people are using to try to make you give them money. But just write markdown files, save them, and keep up with them. There you go. That’s the whole thing.

And so it’s kind of — navigating here, it’s putting all the information here, and that’s great. Just because of the time, I’m going to assume that it’s mostly right. It’s a pretty easy problem. We’re going to talk about something very specific now that’s really, really useful for leveling up your ability to work with these things and making them effective. That is: if we look right now, it’s just gone off and done a whole bunch of stuff. I’m going to look at the context of where it is.

Right now, it’s got a 200K token overall context.

that’s available to us.

It’s about halfway through. Now, are you guys familiar with the concept of context rot?

The idea that not all tokens are the same.

So the more tokens that you have — you know, it might be able to go up to a million tokens — but the more tokens there are, the more it degrades its ability to attend to things.

Because it’s still just doing a sequence-to-sequence next-token prediction.

The ability to effectively make semantic connections as you add more things in — that decreases.

Now, it might be able to do this thing called needle-in-a-haystack and pick out one little fact, maybe, but that synthesis of information gets lower fidelity.

And so part of the reason that we do this spec sort of shuffling stuff is that I can do this right now, which is I’m just going to clear it.

And now my context is empty.

Besides, let’s see, this little MCP sort of thing.

Let’s see, what is taking up all my space?

Oh, they got this little buffer.

Okay, well, that’s fine.

And so I’m now going to go into this implement sort of phase.

And so this is a different prompt.

And this whole prompts idea is that it takes in my other prompt.

And so I have this — what was it? I guess I should have looked at what that spec was. I think I behaved and cleared this out. So you see, I’ve got an archive of all my old specs. We can talk about that a little bit later, how you would use that.

I can just now go here, and I’m going to let it go into plan mode, just because it might use its plan sub-agent to explore. I do this just because I want to — sometimes it has these little sub-agents, and sometimes it won’t go into them if it doesn’t think it needs to.

I want it to go in there just so we can see it right now. And so what it’s going to do is — I have this implement sub-command, and this is kind of a second phase.

The other one was really long, had all this verbiage about all this sort of stuff.

This one’s just like, hey, do that.

And after you do that, I want you to do two very important things, which is to run ruff and mypy — basically my linters. So some sort of static analysis tooling to tell it, you know, are you able to build, you have a bunch of cruft here that I don’t care about — and get that into the feedback loop.

So I’m not accumulating technical debt.
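A minimal implement-phase command in the same style might look like this (the allowed-tools entries follow Claude Code’s `Bash(cmd:*)` permission syntax; paths and exact wording are illustrative — the point is the lint feedback loop at the end):

```markdown
---
description: Implement an approved spec, then run the linters
allowed-tools: Read, Edit, Write, Bash(ruff:*), Bash(mypy:*)
---
Implement the spec at $ARGUMENTS.

When you believe the change is complete:
1. Run `ruff check .` and fix everything it reports.
2. Run `mypy .` and fix everything it reports.
Lint diagnostics are not optional and must not be suppressed.
```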

And so let’s see.

So right now you can see here, it’s got a small plan thing going on.

and it’s going to go and explore the code base, and it’s going to do a bunch of things inside of here.

It’s doing that in what’s called sub-agent. So sub-agents are an interesting thing that’s kind of emerging, and think of them almost like they’re a context bubble, where it’s going to send this sort of stuff into the sub-agent, and it’s going to give it a prompt. So think about where we are in the inception land right now, is that I have given it a defect prompt that has created a spec prompt. And then I’ve given it an implement prompt that’s feeding that into that.

And now it is coming up with this sub agent prompt.

It’s feeding those three things into this, which is an entire bubble that’s sitting out there. And this thing is going to respond to the main agent, the orchestration agent, and then it’s going to present a plan to me and I’m going to accept or reject that plan. So your agent is prompting its own agent. It is.

That’s absolutely what it is right now.

So it is actually, it generates a prompt for this.

And that is defined in — actually, I don’t have that prompt.

It’s a prompt Claude generates.

So it isn’t even something I defined.

But it works. That’s the big thing — because all this is, is prompts. Let’s see.

So it’s asking to look at tansec.

That makes sense.

Okay. All right. Are you able to see the spec it wrote for the sub-agent, or the result that came from the sub-agent?

That’s the one place I’ve struggled with the most. I don’t know what Claude holds on to, whatever those specs are, and you don’t have any idea of what’s gone into it or anything that’s come out of it. Even for your own sub-agents, you can’t. Right? No, yeah. It’s something with Claude Code specifically. Now, sometimes I’ll put a client or something like that in between, but there’s no easy bypass.

I’ve also output it to a file.

You can do that. I’ve also done stuff where you can hook up — we’re talking about Phoenix — so you can hook up OTel, which is OpenTelemetry. So I can hook that up and make it spit stuff out too. They really don’t want you to do it. It’s very hard. So you’ve got to hook through a bunch of things and look into the actual OTel spec. But yeah, the reason why is the same thing that happened with DeepSeek and OpenAI. You know, if I’m able to access all of its internal reasoning traces, I can use it as a teacher model and distill from it.

So there’s a good reason why, but it just makes it annoying for me as a user.

Because I want to train my own agents. It’s really hard to develop your own agents when you can’t see what’s going in and out. Because the chance I get it right the first time is basically zero.

With this, there’s just hardly any way to know.
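For reference, wiring Claude Code to an OTel collector is mostly environment variables. These names match what I recall from Claude Code’s monitoring docs, but verify against the current docs; the endpoint assumes a local OTLP collector:

```shell
# Opt Claude Code in to telemetry and point it at a local OTLP collector.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
# assumption: a collector (e.g. the one Phoenix scrapes) listening on 4317
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```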

The lucky benefit is that generally the solution is not to use sub-agents.

And I think that’s true even if we were able to see it.

You really need to be thinking about how you can use — there are so many jokes about prompt engineers and all that sort of stuff, but prompts really are the atomic element for what we’re talking about.

How do I just kind of take these little spec prompts sort of things and use that as my first solution?

Now, don’t go for the nail gun.

Sometimes a hammer’s fine. This is your screwdriver, so just use this whenever you can. So Josh, in regards to multiple agents here, have you tried smaller, mini agents? Does this require GPT? No, not at all. You can use others. You obviously have to scale whatever agent you are using to its capability. So there are a few models that can probably do some aspects of this sort of thing.

So I think like Devstral, or GPT-OSS 120B, is probably capable of doing some sorts of things like this.

And then some of the Qwen models, some of those other models as well, GLM and all that.

You said 120B, not 20B? 120B — for sure not 20B. Now, I’m sure that you could do some task-level stuff with the 20B, but once you start getting into chained sort of things, it’s going to fall off a good bit. But in general, I would say that for this sort of stuff, you are best with a Codex or a Claude sort of thing right now. That’ll be very different, I think, in about six months. All right, so we can see now, it spit out this plan. You can see there, it used 57 tokens. The nice thing now is that those 57 tokens are not in my context.

Yeah, 57 tokens. Sorry.

57 thousand tokens.

And those are now not in my context.

And so it just went out and did all that planning. And if that was in my context, I’m done — you know, I’m already rotted. But it’s able to kick that out and hand back probably just 2K tokens that I need to keep going, and it’s explored everything.

And so this is how we get away from the hallucinations and things like that, where it’s actually going and crawling the code.

So everything’s really about recycling that stuff out. And so now here I have Claude asking me — hopefully, well, I’ll let it redirect. Yes, I have a sneaking suspicion — so, I wrote this on Windows. This smells to me like a Mac-related permissions problem. So we’ll see what it’s doing there.

And so I’m actually going to go over here into, I can stop looking up there.

Okay. Yeah.

All right. So I’m going to go in here and I’m going to show some of the other capabilities that we want.

So I’m going to go into Claude again. And so here’s another sort of thing that I think is really useful. So I have this great reference thing.

And so I don’t want to go out and necessarily find all these sorts of things.

So I have this specific prompt that really says, I want to go look at this topic.

I want to go get the links to these things.

So tell me like an overview of this thing and go kind of do the link crawls.

But what I’m trying to cut back on is its ability to assert “this is what this reference is” and just fill it in.

I don’t want it to give me a guide.

I want it to give me a link tree that I can query appropriately, because I trust it to give me links, to a certain extent.

So I’m going to do here: create a reference for Mac file and path permissions, and issues with static file routing.

So that’s kind of a suspicion that something that I kind of think might be related here.

And I’m just gonna tell it to go out and do that — but see, before you write the reference, ask me some questions. All right, so this is going to send it into a mode when it comes back — it might do it before, it might do it after — where it’s going to ask about things it probably can’t resolve on its own, or things that are a judgment call. And you can actually set these agents up with tool calling.

You can give it a tool that says ask for clarification.

And if they’re trained for it, then they’ll do that. And actually, it seems like it’s done that already, but I think it’s actually probably smarter.

So the context of the reference, it’s, are we working on a specific application? So yeah, I’ll just say Docker. I can’t see it.

What’s the, sure, yeah. Just gonna ask me some stuff, web server, level of depth.

Sure, let’s kind of do a balance.

So it’s asking a bunch of things and now it’s going to feed that context in. So it’s kind of narrow it search a little bit.

So you can see this kind of like in deep search and deep research.

So they’ll normally do these things. It’s very useful to add this stuff to your development environment too. Especially whenever you’re feeding it, you know, these big long prompts, a lot of context, feeding it into a code base with multiple authors that might have written conflicting advice — which one do you care about?

Because sometimes they don’t line up. And that can be for a lot of different reasons. And yeah, this is a good way to kind of handle that. So we’re going to let that go and do its thing. And now I’m going to, what even is this?

Oh, OK.

This is one where I left it just because it’s full.

Put it on purpose. Stop looking at it there. OK. So, okay, so it’s doing stuff. So let’s see, what has Claude been up to over here?

Does he need a reference?

Let’s see if he can kind of just figure it out.

So he’s added some sort of path resolution utility.

I don’t know about all that, but it doesn’t matter, because it knows how to test. That’s the big thing: we give it access to Playwright, and in the defect spec, it has said how to reproduce this problem.

And so if it goes completely off into its own area and this isn’t going to work,

it’s going to know that.

Now, sometimes Claude will get sassy with you and say, well, I can’t fix it, so I’m just going to run the linting. We’ve got to keep an eye on that. They’re not always all the way there. “But it’s a cosmetic error.” It’s cosmetic to it — to us, it’s production. And so you’ll see in my CLAUDE.md that there is a “do not ignore lint diagnostic errors; they are not meant to be bypassed.”

So have you had any luck with getting it not to get stuck in a cycle?

Yeah, I don’t really have much. Like, hey, if you try the same thing three times in a row and it doesn’t work, we want you to try a different type of solution.

Yeah. So I definitely do.

So I talked about GPT-OSS 120B. I absolutely have that problem with that.

So with that model, you need it.

Sonnet, I have found, doesn’t need that. Codex does. So Codex, I found, has a lot of issues with that. Especially, it’ll get really, you know, locked in on something. All right. So it is declaring:

manual testing required.

Okay. So I’m going to remind it. All right. So we’re going to kind of nudge it a little bit.

Yeah. Yeah, I am actually right. Thank you. All right. So it’s going to try and pull it up. This bodes well — I’m not seeing a red error.

All right, do you notice that, yeah, it’s not really loading, is it?

Okay. Yeah, I think, oh, okay, that’s good.

Okay, I think it might have actually solved it.

All right, so we can see some of this other sort of stuff here.

What do you guys say?

Are you happy for yourself? All right, so let’s try and see. Take a picture now.

So one thing, this is a big thing with MCP servers.

So a lot of people talk about MCP servers. You know, they’ve got bees knees, they’re great.

The thing with MCP servers that you’ve got to remember, is that they fill up your context like, whoa.

Now, it can be sometimes worth it to use them anyways, but you really should only enable them whenever you want to use them.

And so one of the things that I have here — so it’s gonna blow up a whole bunch of tokens — is that you can actually conditionally load MCPs. Unfortunately, this is something you have to do at the file level right now, so I have them as config files, and you can kind of turn them off and go back in.

that’s something that you can do with these sorts of things.

Hopefully eventually we’ll be able to turn these off, you know, inline. As it goes, I’ve been trying to find a project that’s able to do, like, tool sets — I think that’s what people generally call them — where I can conditionally add them in and out.

But man, it’s buggy.

So it just, it’s not a good way of doing it at scale. There’s always something wrong with it.

I was running into issues getting, I needed to have one setup that used certain variables or environment variables. And there’s really no way to do that to get it to drop that into a file.

What do you mean? Just whatever? Drop environment variables in the file?

Yeah, I had something for a, try to remember what it was.

I was accessing GitHub with their MCP, and it needs your auth token or whatever to drop in the header it pushes over.

And I was trying to throw that in an environment variable, but it won’t expand the environment variable when it’s coming out of that MCP config file.

I wound up having to write it by hand, doing the same thing. It wasn’t hard. It was annoying. Why can’t I put dollar-sign-bracket, here’s my variable, and you figure it out?
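One workaround sketch for that: render the config yourself before handing it to the client. This assumes a template file with `${VAR}` placeholders and a client that reads the rendered output — file names are hypothetical:

```python
import json
import os
import string

def expand_mcp_config(path_in, path_out):
    """Expand ${VAR} placeholders in an MCP config template using the
    current environment, validate the result is JSON, and write it out.
    Raises KeyError if a referenced variable is not set (fail loudly
    rather than shipping a literal '${GITHUB_TOKEN}' in a header)."""
    with open(path_in) as f:
        raw = f.read()
    rendered = string.Template(raw).substitute(os.environ)
    json.loads(rendered)  # fail fast if the rendered text is not valid JSON
    with open(path_out, "w") as f:
        f.write(rendered)

# Template might look like (hypothetical):
# {"mcpServers": {"github": {"headers": {"Authorization": "Bearer ${GITHUB_TOKEN}"}}}}
```

Run it at shell startup (or from a wrapper script) and point the client at the rendered file.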

They’re definitely still figuring a lot of stuff out.

Because Claude Code, I remember, was one of those.

I think both Claude Code and MCP were things that they just kind of dropped.

And everyone’s like, yeah, that’s it.

We like that. So now it’s a project.

I guess this is my job now.

So there’s lots of little things like that in both of those. But it’s really nice as far as features.

They’ve got it now.

They’re kind of ahead. I like Codex for a lot of things, but its features suck.

We’ve unlocked this.

Good job, Claude. Let’s save this off for him. You can see now a little bit of, I’ve made a little UI. You can see some of the metadata that I store on these things for content for these.

This one’s looking at the Pydantic AI, Arize Phoenix stuff.

And just think about, as you’re gathering these things, how am I going to sort through these things over not just months, but years and possibly decades of projects of what are the tools that I have used, what are the specs that I have used, that work with different versions of the models, and these are the outputs. And what are the things that are going to matter as I’m managing the software experience or environment over that period of time? And how am I going to make it traceable?

And so there’s just some metadata that makes sense to me for this sort of stuff where I think this is a valid document, it’s stable, it’s a guide, it’s about observability. I’ve got some tags that I care about this sort of thing, version, sources. So this kind of makes sure it’s keyed in.

Why do I put this at the top?

A lot of times with markdown files, they’ll throw this all the way down at the bottom.

That’s kind of the standard way; it’s how markdown files really work.

But I want this in my YAML metadata at the top.

so I can kind of manage this stuff over time.
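For illustration, the frontmatter on one of these reference docs might look like this — every field name here is my own convention for the metadata described above, not a standard:

```yaml
---
title: Arize Phoenix + Pydantic AI observability
status: stable          # draft | stable | deprecated
type: guide             # guide | link-tree | decision-record
topics: [observability, telemetry]
version: 2025-10
sources:
  - https://docs.arize.com/phoenix
related:
  - references/phoenix-docker-compose.md
---
```

Keeping it as YAML at the top means a script (or the agent itself) can parse it without reading the whole document.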

And then another thing that I have here, and we’ll see if this one works, because this is one of the, okay, wow, okay, it really did get it. So this is a related document where I have links to other documents inside of my library. So it’s starting to kind of build out a graph structure for my reference documentation where it can kind of group, you know, explore things, you know, it’s doing that agent search. And so I’ve got this Phoenix Docker Compose development.

It’s got all its information.

And so I start here. It’s going to start reading the first 100 lines of the file.

These things read incrementally, in streams, things like that.

And so we can know that there’s somewhere else that can go if I’m having trouble with the semantic conventions for open telemetry.

I can click here and hope that it doesn’t error out. So maybe there are some issues still. Too bad, Claude.

Yeah, that’s what I was going to ask next.

Since you already gave it a way to check one specifically, could you just take that same prompt, say you have access to the full site, check every link the same way, and apply the same fix?

So we can do that.

Or I can do this. I figured you’d already have one.

So this kind of gets into another thing that’s very useful: if you have sorts of things like this, don’t just rely on prompts. You can still write code.

Isn’t that crazy?

So I have these two specs that, you know, this is something that’s pretty normal.

Just because it’s relative file routing, I’m relying on it to do this sort of thing.

So I wanted to have the ability to go check this sort of stuff out.
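A checker like that can be a few lines of ordinary Python. This sketch assumes relative links between markdown files, skips external URLs and anchors, and the directory layout is hypothetical:

```python
import re
from pathlib import Path

# matches the target of an inline markdown link, stopping at ')' or '#'
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def broken_links(root):
    """Return (file, target) pairs for relative markdown links under
    `root` whose target file does not exist on disk."""
    root = Path(root)
    broken = []
    for md in root.rglob("*.md"):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if "://" in target:  # skip external URLs
                continue
            # resolve relative to the file that contains the link
            if not (md.parent / target).exists():
                broken.append((str(md), target))
    return broken

# e.g. for bad in broken_links(".claude/references"): print(bad)
```

Run it in CI or as a pre-commit hook, and the agent (or you) gets a deterministic list instead of clicking links one at a time.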

Let’s see, use, I think this might be my fault.

Let’s make sure it’s able to do things. It’s not using it. I’m shocked. It just doesn’t want to use uv for that, and it really fights you on that for some reason. All right. Yeah. So you can see now that this probably happened whenever — I know exactly what happened.

So I archived a whole bunch of specs all at once, because I had not done that until about five minutes before this meeting. And it’s now broken everything, because these are manual links — it’s probably just that. That’s what’s broken everything. So: can you fix it?

Please.

So does “please” ever do anything?

No, it just makes people laugh. Actually, I do recall a paper where politeness actually hurts. Yes.

I just have two comments.

One is about the MCP thing. I actually have a lot of good results from just having it use CLI tools.

And so like I actually tend towards that over MCPs.

The majority of the time, the only time that I really use them is for browser use. So, like, the Chrome DevTools MCP is the one that I tend towards. Cursor also has browser use now, so it has it built into the IDE to pop it open and look at stuff.

It’s actually better, I think, with the GitHub CLI than the GitHub MCP. Yeah, exactly. If you want to use that, if it’s installed, it will default to that anyway.

It’s really good.

You can have it look up CI runs and results, which is good. Because it never does my linting. So I do that in CI, and then I’m like, hey, is CI failing? It errored on linting. And it’s like, yeah, it’s the thing I tell you every time, but OK. The thing with the MCPs is they give this long, long listing of, like, here’s all the tool calls. If you don’t have to do that, don’t do that.

Yeah. I started, you know, when MCP came out, I was on it on day one, you know, I was in the Discord talking to people, yeah, let’s get this together.

So I went through that phase, and I am now strictly Playwright. That’s the only MCP I run.

Legitimately. Maybe Firecrawl, but not when doing development. If I’m, you know, using my VS Code to do market research, I might use some stuff, but you know. And I also use the Notion MCP to allow it to connect to my company’s Notion, so I can, like, paste a Notion document link to organizational docs. But you can actually go into the MCP and disable all the tools other than, like, view document.

And that will keep the MCPs like overhead kind of low.

But the other comment I wanted to make is about the writing of the code.

A lot of times, I’ll have the agent actually build, like, shell scripts for me to do things that are repeated things I know it’s going to need to do later.

I was recently building a Prometheus exporter, and I needed to have a copy of the outputted metrics committed to my Git repo just so that people can see, like, this is what it looks like.

So I’ll have a script that will load the app, scrape it, and then dump it to a file, so that every time it makes a change to the structure of the metrics it will just rerun that instead of having to refigure out how to do that thing. The last thing I would mention is credentials are always a problem, and so I actually have all the commands in my READMEs and stuff wrapped in `op run`, the 1Password CLI. That will let it connect to my 1Password and pull down secrets, and it actually doesn’t see them in the output of the command or anything like that, but it knows that it needs to use that to, like, pass environment variables to actually run the things.
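A hypothetical sketch of the “scrape and commit a snapshot” idea: the normalization step below strips sample values from Prometheus exposition text so the committed file only changes when the metric structure changes. The real script, app, and endpoint aren’t shown in the talk, so the names here are assumptions:

```python
import re

# One sample line: metric name, optional {labels}, then value [timestamp].
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{.*\})?)\s+\S+.*$')

def normalize_metrics(exposition: str) -> str:
    """Replace sample values with a placeholder so the committed snapshot
    only changes when the metric *structure* changes."""
    out = []
    for line in exposition.splitlines():
        if line.startswith("#") or not line.strip():
            out.append(line)  # keep HELP/TYPE comments and blanks verbatim
            continue
        m = SAMPLE_RE.match(line)
        out.append(m.group(1) + " <value>" if m else line)
    return "\n".join(out) + "\n"
```

In the described workflow, a wrapper would fetch the exporter’s `/metrics`, run this over it, and write the result to a tracked file the agent reruns after every structural change.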

And I did the same thing, but with AWS Secrets Manager, which is the AWS equivalent.

Also, HashiCorp Vault. HashiCorp, yep.

Yeah, if you run around print.

That was one of the things that we covered in our identity session. So really thinking about zero trust for AI is very important.

So how do we deal with these things?

You know, if an AI agent gets compromised through a prompt injection, these things are super brutal. You want that credential to last for as short a time as possible, and you want a kill switch.

So that helps.

We’ve talked about the environment variables.

Use an organizational AI so that it doesn’t leak your stuff into the… Yeah, that would be a good start. Yeah, turn off the “training on my data” thing. All right, so it has created a spec to fix my laziness. Let’s go ahead and do the thing we know to do, which is clear that out. Luckily, it should have some information about that command to run it again.

So we’re going to let it do that.

I’m going to be lazy this time.

Just let it kind of choose what to do. So let’s go check back in. The other one with the metadata spec.

So it’s got a plan.

And okay, it’s going to create some information for this.

Let’s see what it does there.

And yeah, so you just kind of do this. This is kind of the flow of stuff. I think that covers most of it. I guess any other topics, kind of one-offs? Are you doing this all in just one repo right now? Yeah, it’s just one.

If you haven’t tried using Git worktrees: I dabble with it. It’s definitely cool. So like your implement task, for example, mine starts with: create a worktree based on the feature branch name you have.

And so you kick it off and it’ll create a worktree in a subfolder and then go do its work over there, so you can start working on the next spec, and you find another bug and you’re like, hey, you.

You go do that, and you kick another one off, and it gets its own little tree.

The only thing you have to be careful with is then you’ve got about three or four half-completed code bases, and now you have to go check more and more. You’re like, how many PRs do I have to review now?

I’ve seen a lot of people doing basically a Monte Carlo tree search sort of thing with that, where they’ll have one defect, and they’ll have five agents all go solve that defect, and then go take and see which ones they like from it. You can do some stuff there for some search. In the latest version of Cursor, they actually built it into the IDE to do multi-agent runs. It’s been built into Codex Cloud for some time. That sounds very cool. To me, that sounds like the way things will probably go as time goes on. There you’re tying your time to the cost. Of course, Codex wants you to do it as long as you’re on API.

So John, speaking of cost, can we add a feature?

Can we add a feature?

Sure.

So under the dashboard, can we do something like an analytical pie chart on your cost?

Sure.

I think that should be possible.

All right. What do you do? We’re not going to have you do that. I’m changing my mind. Although it sounds like a good idea. All right. So let’s see.

We want to do a feature.

And so what we probably need to do is I’m going to set it into plan mode. And so we have Arize.

Yeah.

OK. Review the Arize Phoenix API and determine how to query overall costs to add a chart to the dashboard.

Let’s go help it out.

Just find out what it is. Okay, set root to… Can you drag my folders and files from VS Code into the box?

See a lot. I don’t know.

We haven’t done this in a while. You’ve got to push the button that doesn’t look like a light switch. Oh, right.

Yeah.

All right. So yeah, okay. So let’s try and do that. So I don’t think so. I think that’s one of the things that does still annoy me. I always have to use paths. Like, if I want to do an image and drag it in, I have to drag it into a local file and then copy the path, and then I can feed it in.

Not great.

So yeah, I would like it if I did not have to do that.

And as you put your documents under the .claude folder under references, is that a folder Claude knows about, or just somewhere you put it? It’s just to kind of keep it tight. You know, otherwise I have, you know, .claude and .agents and AI docs and AI repo docs, and it just kind of sprawls out.

I don’t think there’s a really good way of doing it.

I definitely have to check out the worktrees. Oh, when we early on rolled out Claude in our organization, I turned around one day and I’ve got a junior-level software developer that has four different clones of the repository with four different agents.

Basically, he had his own group of interns going off.

The interns have interns. Yes. It was great.

I learned a lot.

What is the UX of using Git work trees?

What kind is that?

Are you familiar with just Git work trees from the CLI?

I’ve never used it before.

Okay. A work tree is just checking out another branch without having to go clone the whole repo again, since you’ve already got everything local.

And so what I do is I create a trees folder just in my root and add that to .gitignore so that it won’t try to commit it.

And then I just tell it, you know, here’s the thing, man: when you start, start by reading the document, decide what the feature branch name is going to be, create a worktree in the trees folder for it, change directory into that, and do your work there.

And that way it’s just working in that work tree.
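The worktree flow described above can be sketched as a small helper. The trees/ layout follows the description; the function name and interface are illustrative assumptions, not the actual spec prompt:

```python
import subprocess
from pathlib import Path

def create_worktree(repo: str, branch: str, base: str = "HEAD") -> Path:
    """Create trees/<branch> as a git worktree on a new branch and return
    its path; the agent then cds in and does its work there, so each spec
    gets its own branch and checkout."""
    tree = Path(repo) / "trees" / branch
    tree.parent.mkdir(exist_ok=True)
    # Equivalent to: git worktree add -b <branch> trees/<branch> <base>
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, str(tree), base],
        check=True,
    )
    return tree
```

As mentioned, the trees/ folder should be listed in .gitignore so the checkouts themselves never get committed.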

And I can look at my trees folder and see, like, each of my specs: each feature has a branch, and each one is its own worktree. That way you can push it up and do a PR and not worry about it. Because otherwise you get one long-running branch, and, like, I wanted to fix this other thing, and I’m like, crap, you’re already using my thing, I have to go clone it again and pick a new name. It’s just easier if they always spawn off into a worktree. What I usually use it for is: I’m in the main view working on a plan for the specs, while I’ve got somebody working in a worktree. I’ve actually been exploring Jujutsu. Have you ever heard of that? It’s a different type of VCS that’s, like, separate.

It’s like, I don’t know, it was built by some folks at Google to kind of solve some of the problems they felt like they had with Git. And it has some different mental models, one of them being that there is no index. So, like, as soon as you make a change to a file, it is captured by Jujutsu.

And so is that similar to what Mercurial does then?

I remember working with that a while back and it blew my mind. Yeah, and so… Automatic branch creation, that’s what I need. Thousands of branches, I don’t know what any of these things are.

Yeah, so like it’s supposed to really help with doing things like multi-branch workflows and doing branches off of branches and managing, you know, stacked changes and stuff.

But yeah. I mean, it was not made for this sort of scenario where I have one defect and I have 10 solutions.

It’s just not, it’s a different form factor.

So it makes sense. My only concern would be, you know, whether they’re able to handle it before it’s in their training data sort of thing. Even if it gets bad, if it’s a normal distribution of bad that it knows, how long until it can do the new thing?

Yeah, so it might be? No, no, no. I found Claude surprisingly bad at dealing with merge conflicts.

If there’s a merge conflict, I just stop and I’m like, no, no, no, don’t touch it. Because I thought it would be easy. It’s like, oh, you understand what the code does. It’s clearly marked.

Here’s what was in one branch, the other branch.

I don’t think I’ve ever seen it do a merge conflict.

that was anything other than trivial, right?

If you give it too much rope, too, it’ll just… The other rage quit that it does is whenever the tests stop passing. And it’s like, it finds lints that maybe broke before I reset its context. It’s the square button that does not look like a light switch, to the right, other wall. It’s a square button, push the button.

There you go.

All right. Now we’re good for the rest of the night. But yeah, so a lot of times it’ll start trying to stash the entire thing, or, like, git checkout. I was like, no, this was not part of my spec. I did not do this. So I am not responsible for this lint. We’re doing real engineering.

That’s where putting in CIs works for me.

Because no matter what I do, it’ll always be like, well, that wasn’t related to my change.

Yeah. Put it in CI, and you say, hey, push, open your PR, and check CI.

Right, right.

You know, then it has to get that in its face. Yeah, well, yeah. Because it just wants to get to that end. You think about these things. It’s RL.

It’s trained on getting to the end and saying, I have completed my task. Green, I get to live, you know, essentially. And so if I’m saying, hey, you caused this lint, it’s like, no, no, no, no. If you start over again and say, hey, go fix lint issues, your job is to fix lints, it’s like, good, I’m on it. I stopped shooting for opening the PR when it thinks it’s done. I clear it out, and it’ll open the PR, it’ll check the thing and be like, oh, I must have forgotten that. But like you said about the please, and avoiding saying that it’s wrong: I have found some interesting results. I’ve threatened to fire an AI, because some of it maybe gets through.

I feel like once I’m there, I’m already lost and it’s just catharsis.

I just know in the back of my mind, I need to stop arguing with my keyboard right now. The responses and changes of direction have been interesting. It is interesting, yes.

For sure.

Interesting.

All right. So I guess another topic that you’ve touched on here, but it’s worth reiterating.

Creating feedback loops: just like how you allow the agents to do good work, otherwise they will bring you their work and you’re like, stop it. Right. Like, they do the work, but you have to actually give them the ability to see the output of their own work. If they can do that, they will iterate on it. But if you don’t give them that ability, then they’re just stuck. So anything you can do to, like, you know, hook a linter in, or like you were doing earlier with the browser, giving them the ability to see the web page.

That makes a huge difference in the workflow.

The most important thing that you can do to make this work is think about validation. Think about it for right now, for the end of the day, for the end of the week, the end of the month. How are you going to build those things and get them to feed back? You know, it’s the same thing we’ve said about testing for decades and decades. It’s just so much more important now, because you’re trying to essentially run a team of engineers, and how are you going to validate those things? We can finally do test-driven development. You can, but that’s not a golden ticket. Because I don’t have to do it; it’s the agents that are doing it. Well, the AIs will write an entire test suite for you in one go and turn around and write the entire implementation, because it’s already figured it out in its little robot brain. And so it doesn’t really matter which order to write it in. But if you do have them write the tests, then at least it can see flaws in its own reasoning and be able to fix them without you having to tell it. And also along those same lines of the feedback loop is, like, the scripts and stuff you’re talking about.

There are a bunch of places there to insert just a little bit of feedback, too.

So like I was using it for Ansible the other day, and it always wants to run prepare when it hasn’t run create.

And after having to tell it, like, three or four times, I gave it a script that checks, and if it tries to do it, it spits back out: you have to run create before prepare. And it just doesn’t have that problem anymore, because that error message is relevant.

It doesn’t just tell it no.

It says, this is what you have to do for the next step. Or for the uv thing: I got tired of it trying to run python directly. I’ve got a hook now that if it tries to run a command that starts with python, it rejects it. It says: always use uv.

It used to just run python.

I never have to tell it anymore.

It’ll just run python, I see the red error come up, and it’ll be like, OK, uv. Yes, it usually corrects itself very quickly.

Well, it’s because I put in the message.

The message is: only use uv for Python commands.

That’s the small response I give it.

Because I’ve said this half a dozen times, and I’m going to say it every time you try to run python. I’m the co-worker that actually made his git command an alias to a function so that if it’s git commit, it does a prompt.

You know, are you sure?

It makes the AI get stuck, so that if it tries to run it, it won’t succeed. It’ll get stuck. And there are ways that you can, yeah, like you were saying, put in hooks, change aliases in your shell to use one command instead of another. So I guess for your command, you can just make it not even work, you know? Like alias python to uv.

Yeah, that’s true.

You can do it with just an alias, and you’re like, hey, never do that, or I’m just going to make it do the right thing for you. All right. I use make files a lot. And so I’ll have, like, make build, and I’ll just point the AI to that, and it likes to use them. Yeah, I found it works really well. So I have a make here too.

It can be hard on Windows.

If you’re on Windows, you can find something else, or you just work around it like that. I’ve had some pretty good success with Taskfile, the Go one. I also use Taskfile as well.

The thing I like about Taskfile is you can have bash scripts inline.

You don’t have to go out to separate bash scripts.

So you’re just like, I want to do this, and you can be like, hey, there’s not a task for that, you just wrap a bash script around it. And then, you know, it does it the same way every time, at least.

All right. Yeah. Yeah. With the bash scripts, one thing I don’t like is doing them inline in make, just because I usually have so much.

In my make files, I do usually have it go off to a scripts directory, and I’ll have things like run-all scripts.

I’ll have things like MyPy where it does all the MyPy on all my packages individually and then writes an aggregate.

So I can just grab things off the top and have it pipe those things into a log directory.

Because then, if you’re starting to think about that sort of thing, I can say I have eight packages in my monorepo and four of them have issues. I can create a spec for each one of those, and then have four different agents go off and do it, and that kind of helps you keep your context recycled. That’s what I always think about: how you get the context down and reset it clean, so it doesn’t get confused. I thought I saw something where if a makefile was too long that it just wouldn’t.
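The per-package check pattern being described, run a tool in each package, log each result, and aggregate a summary, might look roughly like this. The package names, log layout, and command are assumptions; in the real setup `cmd` would be something like `["mypy", "."]`:

```python
import subprocess
from pathlib import Path

def run_checks(packages: list[str], cmd: list[str],
               log_dir: str = "logs") -> dict[str, bool]:
    """Run `cmd` inside each package directory, writing one log per package
    plus an aggregate summary, so an agent (or you) can grab results off
    the top without rerunning everything."""
    logs = Path(log_dir)
    logs.mkdir(parents=True, exist_ok=True)
    results: dict[str, bool] = {}
    for pkg in packages:
        proc = subprocess.run(cmd, cwd=pkg, capture_output=True, text=True)
        (logs / f"{Path(pkg).name}.log").write_text(proc.stdout + proc.stderr)
        results[pkg] = proc.returncode == 0
    summary = "\n".join(f"{'PASS' if ok else 'FAIL'} {pkg}"
                        for pkg, ok in results.items())
    (logs / "summary.log").write_text(summary + "\n")
    return results
```

The summary file is what makes the fan-out work: each failing package can become its own spec for a fresh agent with clean context.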

Sorry, I brought that up in the talk a few weeks ago.

I was running into trouble with a long makefile. It was just disregarding it. Yeah, yeah, for sure. That’s a totally good idea. It’s just context, you know: how well is it going to pick something out of the middle of it? And, you know, maybe there’s something in there that isn’t so useful anymore. It’s hard for it to see.

You know, it’s easy for me to look at this and get an idea of what’s in my make file right now.

And so if like, for some instance, you know, I take out, you know, this generate API command.

So I’m not doing that, you know, the dynamic design.

Uh, it’s easy to see that there. But if I have all of these things inline, this is a 500-line file, reading that for a little, you know, five-file project.

So, uh, yeah.

Okay.

Good.

So, like that.

These are all essentially using the same… have you seen any major differences between, like, Cursor versus GitHub Copilot versus Claude Code versus some of these other ones?

Yeah. So the other one, I didn’t do it tonight, but I do have Codex too, and I use them pretty much half and half for my stuff. There are a lot of things… it’s hard to get into the things that I use Codex for in a situation like this. It’s usually stuff that’s really, really persnickety.

You know, Sonnet sometimes just gets so, I don’t know how to say it, emotional. It kind of gets up on its haunches about things, and it’s certain about how it wants to do stuff. Codex has none of those problems. It’ll go and just robotically do stuff. So if I have something like, I want to refactor… so I did this one. This is kind of one of the last things. I have this one spec where, it’s called, you know, this thing where I had a Biome upgrade, where I took it from basically a basic Biome config to doing all the settings. We’re doing everything strict. You can only do 50 lines in a function.

Everything’s too far.

Maybe not what I’d do in a normal project, but I wanted to force it into something. And Sonnet would not do it. It’s just like, this is not necessary. You do not need to keep lines at 50. I don’t care, I’m telling you to do it. And it would get partway and go a little bit further, and it’s like, oh, my context limits, I can’t do it, I’m just going to put this off and we’re going to do it simple. And Codex has no problem.

And it might be because it has longer context. I think it does have like 250.

So there’s things like that that make it a lot better.

I can’t think of why I would not use both of those, kind of half and half, right now. I can’t think of a single reason why I would use Gemini at this exact moment. Now, in a week or two, apparently they’re about to do Gemini 3, and I’m sure that will be quite a bit different.

So I’m sure that I will be saying something different about them.

Is Codex using GPT-5, or is it using something else? It is using GPT-5-Codex. So they’ve got a fine-tune, kind of like what they do with Devstral.

OK, so it is trying to be coding-specific.

Yeah, it’s a fine-tune. Uh, there are things that it sucks at.

Uh, there are a lot of, you know… generally I will never give Codex web search tasks or the documentation tasks that I have.

No.

Well, it’s very focused, has no imagination, can’t design as much.

You actually have to enable its ability to web search.

Yeah. Yeah. Yeah. And Claude has it straight up out of the box. Did we lose something?

Here’s some.

I still see it on the screen.

We’re getting recorded. Yeah, that’s fair.

See, you know, one of the reasons I was asking about this: I don’t get to Claude Code nearly as much as you do, but I use GitHub Copilot. I like it for my stuff. I can switch between Sonnet and all the different models. But I’ve actually not run into the linting errors that y’all are mentioning. Like, mine will miss almost everything, but it will still notice, okay, there’s a linting error here, let me go back and fix it. And I’m sure that’s more just, like, the framework that, you know, Microsoft has put around GitHub Copilot. It could be more tightly tied into VS Code. Exactly.

I was just curious how those might be different, but it makes sense that there would be differences between them. Yeah, one of the things that we were talking about too is, you know, if the models stayed exactly the same as where we are right now.

What you’re talking about, at least just from improvements to Claude Code, Codex, you know, clients, we’re going to still see major improvements just as we change our systems. Same models, no progress in them whatsoever. But as we’re, you know, looking at our tools, but also our environments: how we organize our projects, keep our artifacts up to date, kind of update the things that it has access to.

That’s going to help all this stuff for, I think, a good long while.

We’ve kind of got room to grow. The models have been coming out faster than we’ve reached the limits. Absolutely, yes.

I was actually going to ask how you were doing the archive-on-completion, but it sounds like you’re doing it like me, and importantly, cleaning up, and then occasionally… No, no, I’m very guilty. That’s the only problem I have with flat folders like that in the repo: I end up looking at it like, oh crap, I’ve got 15 in here. Yeah. Oh, most of those are done already. I think as time goes along, you’re going to be starting to look at some of the RAG techniques. So things like putting the full file in cold storage, keeping the summaries at the top. You can certainly think about that sort of thing as you go along. I think that sort of stuff is very useful. Hey Josh, I got a question about something you said earlier. When you were talking about the difference between Claude and Codex, and how Claude was having some issues with constraints you were trying to put in, like 50 lines or whatever.

Was it just ignoring the constraint, or was it saying I can’t do it? It was kind of quibbling about it. I don’t know. It was not ignoring it. Oftentimes it doesn’t even get enough steps in, and it tries to kind of wriggle its way out of it. I think that’s the best way I can describe it.

Yeah, yeah. It’ll just argue that it’s not valid. It’s not. It’s not important right now. It’s a minor linting issue. That’s one of the things it’ll do. You’ll see that on anything it struggles with. That’s why you’ll see the linting excuses, from what you’ve watched. Yeah, it’s not super important. You want to do it? No. It’s not important.

Yeah, it’s an excuse. Yeah, it’s a hallucination, essentially. There’s a gap in what it’s able to do, so it’s inventing the most reasonable excuse, you know, because Claude is helpful. So it’s always going to be helpful. It can’t deny that. So reality has to bend.

Yeah, the dog ate my homework. Yeah. I think, the flaky test: it passes, and then it’s like, oh yeah, I’m good to go. Yeah. Let’s disable it. Do you have a behavior like that?

The number of times it’s run my tests and they passed with skip markers, and I was like, no, maybe not.

Yeah.

I’m like, write me this nice test, and it changes it.

If true equals true, then, I was like, well, yeah, that’s going to pass. Or it’ll just try to talk away the fails in CI, because CI is not configured correctly. I’m like, what? No?

Yeah.

It’s like, that’s a test that it made itself; it can’t do that. Yeah. One of the things, I think, especially when you get those hard-to-spot parts it wants to deal with, is thinking about, you know, parsers.

So like if you’re doing TypeScript, with the Babel parser, you can get access to the abstract syntax tree.

So I’ll actually have a whole suite of linting tools that I have that checks for things like over-guarded exceptions, where it’s basically doing a try-catch.

And on any exception, it just passes.

You know, so it just kind of kind of swallows things.

I think as we go along, things like that are going to be more important.

You know, checking for areas where it’s kind of hedging its bets.

It’s trying to get to that green, and it’s laying the technical debt in. It’s going to be very useful to understand things like abstract syntax trees. So I would suggest looking into that if you don’t know what that is, because the eight hours you spend learning it might save you eight weeks at some point.
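The over-guarded-exception check is straightforward with Python’s `ast` module; this is a sketch of the idea (the actual suite described above also targets TypeScript via its parser, and would flag more patterns than this):

```python
import ast

def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of `except` handlers that catch everything and
    do nothing: the 'get to green' pattern that silently swallows errors."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # Bare `except:` or `except Exception/BaseException:` counts as broad.
        broad = node.type is None or (
            isinstance(node.type, ast.Name)
            and node.type.id in ("Exception", "BaseException")
        )
        # A body that is only `pass` or `...` is a no-op handler.
        noop = all(
            isinstance(stmt, ast.Pass)
            or (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and stmt.value.value is Ellipsis)
            for stmt in node.body
        )
        if broad and noop:
            flagged.append(node.lineno)
    return flagged
```

Run as part of the validation suite, a check like this catches the hedge at review time instead of letting it land as technical debt.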

I’ve been waiting for… Crush was the one that came out, a coding agent.

It wasn’t very good, so I stopped using it for a while. But one of the big things was they were hooking up to the LSP so that you could actually learn more about the code than just grepping through it. I was kind of surprised nobody else has done that yet. I thought that would be pretty powerful. I was waiting for it, so I didn’t get that update. On the topic of different agents’ strengths: if you use a bunch of them, you’ll find that you just prefer how some of them work over others. Like, you know, Claude talks to you more. I mean, you may just find that you’re like, you know, Codex, yeah, it’s good, but I like it when my agent talks to me more, or explains what it’s doing.

Whereas Codex doesn’t, really.

Like, you tell it what to do and it kind of… It’s like an emo engineer. He goes to a corner, and he doesn’t say anything, and then he comes back and he’s like, here, man. And you ask a follow-up, and he’s like, nah. I don’t. I don’t need that. But Claude’s like the helpful engineer, and it’s like, let me talk to you as I go. One downside, I guess, to using, like, Claude Code is you’re effectively buying into that ecosystem, and, like, if you’re at work, you’ll have a really hard time convincing your boss that you need Claude Code and Codex and, you know, this other thing. But if you convince them to get Cursor, you get basically all of them. And so it gives you the ability to kind of move between them. And when new models are added, they almost always get added, like, day one in Cursor.

So that’s nice.

And cursor also actually just recently added a CLI tool.

So it has some ability to do something similar to this UX, even outside of an IDE.

And so you can use it like, like I personally use the CLI when I’m like in between projects.

I can actually move, like, patterns from project A to project B without having to open, like, an IDE that has a global view.

I use it on servers.

You can, like, turn it on to, like, the yolo mode or whatever.

So it’s come a long way.

It’s learned a lot of lessons from Claude Code, for sure.

Yeah. Speaking of Cursor: is this VS Code, or…?

This is VS Code, but I’ve used cursor.

I heard that they had a 2.0 recently that has kind of gotten some people who were burned to come back.

So that might be good.

See, I think a cursor is good. I mean, they’re all good.

You need to be able to be flexible. I think that’s the big thing. This is not the time to lock yourself into one thing. You need to be able to… I don’t think Gemini’s worth crap right now. In two weeks, I might be using it every day again. That’s something we messed up through the year.

Normally at the end of the year, we’ll do this again in December.

We’ll do an end-of-the-year kind of

walk through and talk through. Here’s the main things we covered.

Here’s the stuff that didn’t exist the year before.

Let’s do this year.

We should have been keeping

a running list of our favorite models.

Because I remember when somebody said, is it Gemini 2.0 or 2.5. In the Discord, we might be able to scroll up and just come up with, over the year,

here was the model du jour.

I can look through my bank statements from whichever site.

We need to start working towards wrap-up. It’s just, like, that’s kind of how it works. Yeah, well, thanks for letting me in. It’s still working on the addition. It is. Yes, it is.

Let’s see.

I’ll get it to go ahead and… We don’t have to leave. I’m just saying, if you did plan to leave here at 7 or 7:15, it is that time. It is that time.

Do you have a rough estimate for what all this costs?

I’m assuming this dashboard just doesn’t pop up.

Rough estimate for… So let’s see both of these are not on EDI.

I’m using GPT-5 nano, so probably, like, a cent.

So what it would be tracking costs for is the thing I have. Oh, the question is from all you’re asking. Yeah. Do you have nanos actually? Nanos? Even smalls.

Mini is not very expensive.

Yeah, I have found, just to kind of give you an idea of where these are: GPT-OSS-120B is better than nano, actually.

But it doesn’t have multimodal.

That’s one thing.

But it’s still better than the free version of GPT-5.

Really?

Yeah. I’ve been very surprised. Because when GPT-OSS-120B came out with that Harmony format, I think it was a weird chat template, and it came out, and I ran my evals on it, and I was like, this thing is not worth crap.

But I came back to it this last month, and it’s very, very… at low reasoning it’s pretty much the worst model out there, but if you push it up to high, it’s pretty much the best model.

Let’s go and do it stuff.

Cool. Let’s see. I don’t know.

Okay, so that’s going to try and run its end-to-end script. So you did mention using a lot of these. Are you doing a lot of the subscription? Yeah, well, I have a Max. At work we just use the API for a lot of it.

We like burning money.

That’s great.

Yeah. The Claude API is pretty nuts.

I saw that they were giving $1,000 of credits in the browser until the 18th or something like that.

I figured it was one day, and I was using like $100, but they topped it up every day. I’ve seen a lot of people complain about the Max cutting off a lot sooner than they expected, but I’ve never hit it. Does it actually work really well? I don’t know. I think those people are not doing any of the things that we’ve been talking about as far as recycling context; they’re just hammering and hammering and hammering MCP servers, I think. I’ve pretty much used it constantly this past week, and for the first time since I’ve subbed, I’ve hit the weekly limit, about two hours before the reset. I’m running multiple windows. Yeah, most of them.

Okay, so again, they dialed it back some because some people were running multiple windows 24/7. Yeah, there’s a leaderboard up, and you know people are trying to game it. It’s like they’re not reviewing any of this first. I just saw out of the corner of my eye that Biome ignores are trying to pop up in my… Okay.

Ignores? Lint ignores? Yeah. I want to keep going. But you can see it’s starting to do this sort of stuff. You can also follow Sourcegraph, their CLI.

It’s called Amp.

They take a different approach.

You don’t have the option to select the model. They kind of curate what model is best for what task, and then you ask it to do certain things. The most canonical example is you ask the Oracle, and it will farm out that task to GPT-5 to do that thing.

I think it’s like, I don’t know if it’s pro or not, but probably not.

It’s probably just GPT-5, but maybe GPT-5 Codex.

But then, like, for normal everyday tasks, it uses Sonnet 4.5.

And they take it upon themselves to kind of figure out what’s going to give you the best experience.

And then their CLI just does that. It’s automatic routing that decides based on what you want to do.
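The curated-routing idea being described could be sketched roughly like this. The task categories and model names here are illustrative assumptions, not Amp’s actual routing table:

```typescript
// Hypothetical sketch of task-based model routing, loosely inspired by
// the Amp approach described above. The task names and model choices
// are assumptions for illustration, not Amp's real internals.
type Task = "oracle" | "edit" | "search";

const MODEL_FOR_TASK: Record<Task, string> = {
  oracle: "gpt-5",            // deep reasoning: reviews, brainstorming
  edit: "claude-sonnet-4.5",  // everyday heavy-lifting code edits
  search: "small-fast-model", // quick repo lookups
};

// The CLI picks the model; the user only names the task.
function route(task: Task): string {
  return MODEL_FOR_TASK[task];
}

console.log(route("oracle")); // "gpt-5"
```

The point is the inversion of control: instead of the user choosing a model per request, the tool maps intent to a curated model and can swap the mapping as better models ship.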

Yeah, it also has this concept of the librarian. So you can ask the librarian to find the information about this code.

And it will search using an index and be really fast and really accurate at finding data.

Either in your repo or… online and stuff. So what was that called? Ampcode.

Yeah, but I guess my point of that was you can kind of watch what these people do and which models they choose as their best-in-class, and then kind of follow that pattern.

For example, I tend to use GPT-5 for doing code reviews and brainstorming-type tasks, but then I’m using Sonnet for doing a lot of the heavy-lifting code because it’s so fast.

Yeah. Take something for sure.

Yeah. It’s going to get stuck here. Thank you, Josh. Yeah. Nice to meet you. So this repo is kind of like your personal knowledge management tool.

Not really.

I mean, it’s mostly just for this demo.

It’s just for this demo. Yeah, I wanted something that was fairly complex, had some stuff in it, a monorepo. But that’s what the app kind of does: it manages your cloud stuff and has these tinker tools for agent-to-agent and stuff. We’re talking about context, and we’re kind of working through this stuff, so we can still kind of keep on what we’re about. Yeah, I think probably something like this makes sense for folks to have, but what that’s going to look like, I wouldn’t try to make that call for somebody else.

It’d be very difficult. Yeah, so it’s now stuck at a point where I don’t feel like dealing with that pnpm thing right now, to be quite honest, but it’s something that I need to fix: node modules in the Docker container build. So one thing I’ve noticed that’s so annoying is convincing these things that the Docker container has hot reload, and that you can put things in pnpm and it’s going to restart again; just convincing it that it’s not a cache issue, it’s not something else.

This is what we’re talking about.

It’s like, ah, this is not a problem. Anything to do with Docker seems to be a hole of just excuses. Anytime you start hearing back, “well, I think it was probably a caching issue,” it doesn’t have to do with what I’m trying to get it to do. It’s like, oh, the cache issue, I need to tear it down and start over. No, no, you don’t. It’s not a cache issue. Try something else. There’s still lots of little things like that.
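For the record, the usual shape of the fix for the node_modules hot-reload fight is a bind mount for the source plus an anonymous volume masking node_modules, so the container keeps its own installed dependencies while file changes still flow in. A minimal compose sketch, where the service name and paths are made up for illustration:

```yaml
# Minimal docker-compose sketch for hot reload in a pnpm monorepo.
# Service name and paths are hypothetical.
services:
  web:
    build: .
    command: pnpm dev           # dev server with file watching
    volumes:
      - ./:/app                 # bind-mount source so edits show up in the container
      - /app/node_modules       # anonymous volume so the container's installed
                                # node_modules isn't shadowed by the host's
    environment:
      CHOKIDAR_USEPOLLING: "1"  # file watchers often need polling inside Docker
```

With something like this in place, a source edit triggers the dev server’s restart without rebuilding the image, which is exactly the behavior the agent keeps refusing to believe exists.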