Cline-OpenRouter

Cline + OpenRouter

Transcription provided by Huntsville AI Transcribe

So a little bit of housekeeping type stuff and just some updates from the last couple of weeks.

The AI Huntsville Task Force, which is separate from different from Huntsville AI, which is always fun. We just flipped the names.

They are now on LinkedIn, so you can go follow them on LinkedIn. I’ll probably post something at some point to say, hey, if you’re following our page, you probably ought to follow them too, because that’s actual city task force, nonprofit. But if you want to do the nonprofit kind of stuff, go there. It will also highly irritate the troll that they have also picked up now. So yes, the person that tried to trademark the name Huntsville AI, started all this stuff, is now claiming there’s a lot of interesting things. So yes, let’s sue the city of Huntsville. Great idea. It feels like they have the money over a trademark that you already did tonight. Anyway, it’s wild. So that’s a thing.

Through AI Huntsville, we did a community-based AI 101 session a couple of weeks ago. That was Thursday afternoon at the Chamber of Commerce. Deloitte came in and did a pretty good presentation. It was pretty good.

Really, really kind of interactive, kind of surface level, here’s some AI kind of stuff that you might come across in your day-to-day life or in your job. Here’s how I used it recently. It was really good. We had, I think, over 100 people sign up, and I think we had about 50 show up, which is, it’s a Thursday acronym. I don’t know how much to read into that, but we did a survey beforehand and afterward to get some more information about what they like, what people didn’t like, what people are looking for. I’m going to actually do the next session for them in August. It’s interesting when you try to do something with a committee. And yeah, kind of like, well, we need to do that.

Well, I do this every week. Sure, well, let’s go.

So every other week turned into August.

So at that point, I may reach out on the Discord channel and look for some ideas and things like that.

What I’m planning on doing is putting myself in the context of a small business owner that’s having a sale on a particular day. What kind of tools do I have to come up with some marketing materials, some imagery, some other kinds of things, some emails, whatever? What does that look like today versus what it was in the past? And just kind of just do a walkthrough. Okay, here’s a way that is useful.

That’s the thought right now.

Maybe come up with some other things, but really more, how do I use the things? Still trying to figure it out. Yeah. I gave a presentation two weeks ago to my company’s leadership team of about 110, sort of a kind of intro to AI.

And I can’t believe how many people came up and said, I’ve never used it.

That was amazing. Oh, I’m going to go back and now write my emails. That was the breakthrough. That was the breakthrough. Oh my gosh.

It’s going to make me sound so much better with my emails. I’m not kidding. This is to a room of business leaders who don’t know what’s out there. And so my only word of caution is whatever you’re going to do, cut the complexity in half because it was mind blowing. Yeah. What we usually do, the trick, I don’t want to say it’s a trick.

Usually what we do is I normally combine two things. One of them is a little bit of history that we didn’t just make this up. This has been ongoing for a long time and somebody just made it popular real quick. So that’s one side. The other side is what I call the magic wand.

You can talk about stuff and people are like, Hey, I keep here. You just show them something real quick and go, well, hey, you can do that. Let me summarize an email. Let me do this on the screen.

Just do it. It’s not hard, whatever. And it just kind of removes that whole, this is hard or whatever. I had them all.

I had pre-printed off prompts, beginner, intermediate, advanced.

All of them use it.

So even the people are like afraid of AI.

I mean, I can’t for sure say they did, but kind of gave them like beginner level for them and then advanced level, taking advantage of deep research and things like that. It was awesome. I mean, just getting them to use it was like the eye opener. What was the, did you take questions? What was the what?

If you ask the questions, did you get any weird or surprising?

Did you open the floor up or just present?

No, we ran out of time. Okay. I had two more days of those people and they didn’t get anything really bizarre. Okay. Now we did, early this year, we did a present, same kind of presentation, but over three sessions for a group called Learning Quest that’s in town that is retired, 90 something percent retired folks.

But in Huntsville, we’re talking about retired attorneys and retired physicists and retired realtors and retired teachers. Oh my gosh. I’ve never been in a room with that many people smarter than, I mean, just like, wow.

And some of the questions that they had weren’t at a technical level at all.

Nearly all of them were at a philosophical or an ethical or a, well, how do I know I can trust it? Right. You can’t. That’s the answer. You can’t. It’s like hiring a teenager. And at the time, it was, I got up because they were like, well, well, how do I know if it’s going to tell me the truth? I’m like, well, my truth or your truth? Like, what do you mean? Question, show of hands, should Alabama have been in the college football playoffs? Half hands go up. The truth. The other hand, not truth.

Like, here you go. So it was a really interesting kind of thing. But yeah, that type of thing, there’s so much, it’s kind of what the AI Huntsville task force is really about, is trying to figure out as a community, how do we get the general population more aware of what’s available and how to use it, you know, that kind of thing. And then this group is more for the technician side or the let’s build stuff or let’s tinker or let’s do something, you know, things. And then with Josh, the paper series is more on the research layer, even lower than that of what makes these things tick.

So I think we got a lot covered.

Our problem is we got, there’s a lot more appetite for this and for other stuff. I mean, we could, if we did this at lunchtime, it would be full. If we did this on Friday night, it might be full, you know.

It’s just so many, yeah, there’s a lot of people here. Try to think if that’s that. What are they trying to answer? What if they question, are they trying to answer for the community or are they just stirring the pot, trying to get it interesting? The main concern is that the fundamental way that knowledge workers operate is going to change. Because if your job is to take information in one form and put it into another form, I get news. That’s probably not going to have a personal need to take this to your, you know, pretty quick. So the community is concerned more for the job force that may change than the employers are worried about it. Yeah, to an extent, because if you look at the load that it would put on the city itself or on any city, really, we just happen to have a very high concentration of people that do systems engineering or development or, you know, we’re not heavy on the building, the physical things or service things or, you know, things that other areas might be able to absorb more. So, I mean, that’s one thing.

The other side of it is they’re looking at a, how should we change our policy for education? What does it mean? Do I need kids to use chat GPT? They actually have a different one, not that one that they can, if they’re allowed to use, even though the kids have their phone next to them, they can go look up the other answer. And then you’ve got the, well, we’re taking critical thinking out or whatever. And I’m like, well, actually you kind of move the critical thinking out of which model do I use?

You know, so they’re trying to figure that out.

Do we want to attract companies here from an AI perspective?

If that’s so, how do you do that?

What does that even mean?

You know, what does local mean in the AI marketplace?

I mean, okay.

That’s kind of what I was thinking for them going, where do they try to protect? Yeah. And you’re saying they’re trying to protect a working force that’s paying the taxes. Yeah. That’s really what they’re trying to do in a way. And also they’re trying to figure out from a, can they, and they’re doing the same thing other places are doing. Hey, how can we reduce our cost using other AI stuff?

The latest thing they’ve got, I think they got the contract done.

There’s a company out of Birmingham that provides a way to put a camera on a truck and then figure out whether there were code violations as this thing drives down the street. Not hard. All right. We could build something like that in about a day.

And then they figured out that, wait a minute, we have a couple of trucks that drive every street of the city. They’re garbage trucks. Every week they go down the street, you know what I mean? So that’s what they’re looking at doing. So you’ve got, and again, back to the, we may not replace the people, but the job the person is doing might change because they’ve got, I think six to eight code inspectors in the city whose job it is to drive around and notice all of these things and write up reports and say, did they fix the thing that they said they were going to fix?

And well, that’s all probably about to be automated with maybe one person.

I mean, it’s how things are moving, especially things that are easy.

Something I’d love if we had a conversation sometime, but mentioning critical thinking and education and AI, it always, it sounds funny to me because like we just said, to use AI, you need critical thinking skills to get the best results out of it. So it seems more like critical thinking is more important than you need to encourage that along with AI. Yeah.

Or if you’re interested in getting the right results. Right.

That’s the- In schools it’s usually more free thinking. It was very bold. I don’t know. Well, one of the things they actually showed at the AI 101 was basically I’m going to type in an outline or whatever, and I’m going to generate an email and boom, now I have a professional-sounding email. What they didn’t cover is me on the receiving end taking that email, using AI to summarize it and get to the same freaking bulletin points they put in to start with. So you could have just shook the bullet points. We don’t even need the AI. The flower you speak with is interesting to me. So it’s interesting, et cetera. So with that, I think that’s- Oh, other thing we got Tom Plunkett coming, trying to remember what day I’ve got him lined up for. He has made it through a lot of the Nvidia certification stuff, but all their exams, all that kind of stuff. And he’s going to come talk about what that experience was like, what kind of stuff, because that’s a fairly decent certificate to go walk in somewhere with. So that’ll be good. And Tom’s fun.

He’s got a data science slash attorney slash author slash, you know what I mean?

He’s done all the things. Yeah. So with that, we will hop into the main part of the presentation.

We’ve been talking about agent kind of things pretty much this year, and we’ll probably keep talking about agent kind of things for most of the year. Hopefully we’ve moved past the point of like the year before where everything was a rag, and the year before everything was language model, and this year everything’s an agent. Go figure. I’m not sure what next year’s going to be, but- Real lots of agents.

Yeah. So one of the things we did a couple of weeks ago, a month ago or so, was using open hands Delve to kind of hook up something like an agent in our GitHub stack to where I could attach it to a- I could basically mention it in a GitHub issue of, hey, agent, can you go do this, this? And it would pick it up and go try to do things. It would create things, create a branch, create code changes, possibly act as an extra person to do a peer review. I’m actually trying to figure out if I can get that at work. So some of the easy things like, hey, does this meet the Java coding standards, blah, blah, blah, blah. It’s really good at that.

That kind of thing where I can take the load off of people that I don’t really need a person to do that if I’ve got that. I’d rather have a person looking to see, does this build the thing the requirement says it’s supposed to build?

But that thing was more like you hit save or whatever and it goes off and it does its thing and then 15 minutes later or whenever it was, it’ll come back.

So it’s pretty good from an asynchronous standpoint, but it was still kind of funky to try to work with. And then I think it was Josh that mentioned, hey, have you seen client? And I’m like, well, no, but it sounds fun. So we went and looked at that. About the same time I came across the open router thing, I screwed up the name of it for a couple of times until I figured out, I got on and created an account, did all the things. So going from zero to having client installed through open router, picked a model and went and did a couple of things, took about 15 minutes and it’s pretty interesting to play with.

So we’ll do a little bit of just walk through kind of what it is.

By the way, everything on here was done at about 15 seconds earlier today, my client itself.

And we looked at, we’ll look at that, they call them tasks, what do you want me to do?

And so I told it, hey, look in this directory, I need a presentation, name it this, it’s about client and open router, follow the same template as the other presentations in the folder.

And this is what it came up with, which is pretty close to what I normally would put up here.

So yeah, and it is not worded the way I word things. So it is not using the same kind of voice I would normally use. It’s pretty, I can look at this data, right?

The most interesting thing about Klein is that is a, initially it started off as a command line utility, CLI, I’m not sure what the NE stood for, but you can actually load it up and use it for bash.

It’s kind of weird that way.

I think you set up your API token and other stuff and you just started asking it to do stuff. It’s kind of interesting. And then they got it in a plugin for VS code, which I use VS code all the time. So it’s fine with me. And it’s pretty interesting.

It feels like it is, I would almost say it’s similar to pair programming and the way and the interactiveness that it works, which is kind of cool.

It can generate code, it can generate text files, depending on which model you use, you can do image stuff, you can do all kinds of live.

I think you can recognize images. I don’t know that we’re in the generating images part of that, but if you’ve got screenshots of stuff, or if you had a design diagram or something, it’s good at that.

Refactoring code, explaining code, like it says, and much more, exclamation point.

That’s how you know I didn’t write it.

So seamless integration of VS code, and that’s pretty spot on.

I wish they could change their layout slightly, but that may just be that I don’t quite know what I’m doing yet, because I’ve got a couple of hours in on this so far.

Some of the things it can do are pretty powerful and interesting.

Easy to use interface.

Yeah, I mean it’s getting from zero to where I’m actually using it and doing something was super easy to do.

Their documentation is nearly high level enough that it’s geared towards non-coders, as in it walks you through, hey, here’s a sample tech stack you can pick to build an app.

It’s got a this kind of thing, JS or reactor or something on the front end.

It’s used this data, but whatever, and you just pull in this kind of a thing to tell it what you want to build, what your tech stack is, and then you tell it what kind of app you want.

It goes and doesn’t.

So that was good for people that don’t already have a, for me, I’ve already got a project I want to use it with. How do I do that?

So started playing with that. There’s a couple of things that aren’t here that are fairly new. As far as the workflow goes, I think that was in one of the more recent versions.

They’ve got the three different things.

Hey Phil, come on in.

So there are three different things that you can look at that I don’t have implemented here. If you have a client ignore folder, like a .clientignore similar to a gitignore, if you’ve got stuff that you don’t want it to look at in your project, in other words, whatever it’s going to look at, it may pick up and then shove over to this LLM and cost you money because those are tokens. So if you’ve got stuff you want to keep out, you can do that. There’s also a way to control the context of what parts of things you want it to look at. I haven’t got too far into know how to do that yet.

Right now, it’s just, I haven’t looking at everything that’s in my repo because I don’t have giant repos yet.

I’m still looking for something. I haven’t pulled an Eclipse repo that’s got a couple hundred thousand lines of code or something like that to point it out and see what happens.

But that’d be fun.

The other thing, the two other things, one of them is you can give it a set of rules, which are, they implement that in markdown files. Let me keep talking for a minute and then we’ll go look at their website and see what that looks like. So they implement the rules in markdown files as far as use this kind of, like if I wanted this to be a highly technical report, you know what I mean?

If I’ve got something I want it to always do, I could have made a rule and do it that way.

If you need to follow the MLA standard, you could probably put that in a rule, which pretty much winds up, I almost see it as an extension of a prompt kind of a thing. And then we were talking earlier before it started about does their model, is it set to try to align to that specific kind of a format? Because I don’t know if you’ve ever argued with an LLM to try to get it to do what you want and it just ignores you. If they’ve trained it to kind of align to follow that, I could see that being a little more effective than me trying to come up with a magic prompt that makes it do what I want it to instead of what it wants to do. And then the third, kind of the third leg of that stool is a workflow.

You can actually, if you’ve got tasks or things that you find that you’ve been doing over and over again in this, you can actually put that list of kind of whatever you’ve been prompting in, you can actually drop that into a markdown file in a workflow directory.

And if you want to run it, you just slash name of the file and it will pick that up and just go do whatever all that stuff is.

So if you wanted to, I’m trying to think of some of the stuff that you could do. We didn’t cover MCP because I didn’t get too far into that, but it can also do things like if you’ve got GitHub, install it on command line where you can just do get gh and tell it to go create a merge request or go look at this, go look at pull requests, I guess, for GitHub or issue or something. You could actually, I could see a workflow like in a team environment where if I’m going to work a particular issue, for us, the first thing we do is I’ve got the issue, I pull the name of the issue and some text out of there.

I create my branch name with the same issue number and the branch name.

If it’s a feature, I do it this way.

If it’s a defect, I name it this way.

Then I do some initial stuff and I automatically go create the pull request because for us, we do the pull request when we create the branch. So you can actually kind of, teams can kind of see what other teams are doing and review kind of as we go. That kind of thing, I could take the stuff out of the issue, go ahead and drop it in the pull request as far as that goes.

I could probably figure out what kind of approach I should use to test this change I’m about to make based on the issue. I might be able to go ahead and generate a unit test that fails using a test first approach kind of thing. We all like that thought, but when it comes to actually doing it, it takes time.

I could see dropping that in the workflow where I just say, hey, this workflow, this issue, go.

And 30 seconds later, I would have a branch, a merge request with information, things like that.

So that’s something that I’m still trying to get into and see how that plays out.

To show you, let me see if I can actually find, here’s my email sent out.

I think I’ve got this for docs.

My deficit was blah, blah, blah, somewhere down here. So this is basically an example of a rule you might have for a project. Maybe your directory structure layout.

If you’ve got documentation, put it in this folder.

If you’ve got tests, put them in this folder. If you’ve got, here’s a template that I want you to follow for other kinds of documentation.

Code style, patterns, test standards, stuff like that.

One place I’ve seen this is a tech stack kind of approaches or things that, like if you have chosen like day job, we use a Google test for our C++ framework stuff.

So if I wanted to say, hey, if I want a unit test, I don’t want to have to explain every time I want to write a unit test, I would use this framework.

So that would go in a role. So that kind of consistency kind of thing. I’m a little interested to see how this plays at more of a team or an enterprise kind of a level. Do you build the rules first or do you just see how things are going and identify what the rules ought to be after you’ve already been in it for some period of time? That would be kind of neat. So for them, you take your rules, you write a .client rules folder, and you drop your stuff in there.

And there’s thing.

So yeah, they got a little complicated. I don’t know that I, here’s a rules bank. Okay, great.

I don’t know the same thing. Okay, great. I don’t know.

This sounds like a solution in search for problems.

Yeah, I think they’re trying to make it work. It gives me vibes of all the vibe coder that’s just jamming their thing over and over again, trying to make it work. And the rules I have found that these don’t always work because you’re tacking this thing on that’s super, super important. Then a hundred thousand lines of code inside of your code base.

And it’s arguing between those.

You really need a better validation mechanism than this.

But I mean, it’s nice, especially if you’re starting out small tasks, you’re not doing the whole code base, I think you can do. But yeah, you know, it needs something a little bit foreign.

Folder approach, useExchange, blah, blah, blah. And these are new enough that I think this was like in the last month. As far as what the 3.17 version or whatever, there’s a .17 version included in the workspace, the workflow code.

Yeah, 16.

It was like last weekend.

It was recent. So I was a little upset at first. I read this like, this is great. Let me go find some examples.

And I can’t find any examples. I go back, oh, this happened Friday.

I don’t like to be that far on the front end of things.

Like for other people to go find things.

And I just kind of report on what’s cool and what happened.

You’re looking for a bullet proof result?

Oh, no, no, no.

I’m looking for a silver bullet proof result.

And then the workflows part, they kind of walk you through some of those.

And again, this is so new, it doesn’t really tell you much. I did find a couple of places where people had gone out and done, you know, tried to build some of their own.

And I was just in places. I love this one.

Because I’ve seen this one.

Do not be lazy. Don’t truncate code. You know, somewhere in some of the system prompts of these, I bet there were some prompting to try to make it be as concise as possible to say all tokens.

And I’ve actually seen things where it’s building things.

I’m like, there should be an end to this function.

Save the tokens.

I climb here by pleasure.

What is this?

Oh, that’s great.

See, don’t do that. Let’s go look at this one.

Are you talking about the current federal budget? No. That’s not very useful. And that’s definitely not any useful. So, hopefully, the community at large comes up with some way to, if they want to do workflows, they want to do rules.

I expect to see, that’s kind of what we were looking at earlier, thinking about maybe some kind of a, I don’t want to say marketplace.

I don’t think you pay for this stuff. But let’s say, I should be able to get to a point, go, okay, I’m using this and an app that’s got a React front end and it’s using maybe a MyGo database or whatnot. Can somebody, does anybody have like a set of rules and a set of workflows or things that they normally use?

80%, whatever.

Can I go pull those?

You know, and okay, now I’ve got something good enough.

I’ve got a new person on my project or something.

It’s at least gets them going. But that’s another thing for Klein that is a little over my head, and I haven’t tried it yet. You can actually pick different models.

They have kind of a plan and an act, kind of an approach.

You can pick a different model for plan than you pick for the act.

And they’ve done some interesting stuff to see what kinds of models are better for you.

And they’ve done some interesting stuff to see what kinds of models are better at planning, what kinds of models, which models are better at acting and things like that. You do need a decent model to do good things, which is not a new concept here. Please don’t go point this at a 3B model. Well, you can, and you will get things.

Maybe you could do, I don’t know.

Right now, the thing, where it really, I’m trying to remember which model it was.

They’ve been working on this for a while. And I think it was, I don’t know, was it the Sonnet, one of the Sonnet models? Yeah, Sonnet 3.5.

It just, oh my gosh, this is incredibly better than anything.

Yeah, and after using it, I can actually, I can almost feel, and I can’t imagine their surprise when they went from a, I don’t know what they were using before that, but it got way better, way fast. And so that’s, we’re actually, the one I’m going to use is Sonnet 3.7 from Claude to do both the plan and the act because I’m not quite, I don’t want to say smart up, but I haven’t put the time in to see what the difference is if I care about splitting them up between the plan, you know, circle and the act circle.

It works way well enough for what I’m doing. The other thing we’re going to talk about is OpenRouter. So you install Klein, Visual Studio, you can just go into Visual Studio, go to the extension marketplace, find Klein, install it.

It’s going to ask you for a, which kind of a provider you want for an LLM. And in this case, I really like this thing right now. So OpenRouter, there are a couple little nits and picks that you, you know, if you want. But it lets me from one place switch between a Google Gemma 3 model that I’ve got set up right now, Claude and Anthropic Claude, sign it.

I think I’m on 3.7, you can go back to 3.5. Use Grok, use Grok, you know, without having to have a different account with Anthropic and then a different account with Google and then a different account with whoever Grok is. And then, you know, open AI, hey, here’s all the open AI. And it’ll also give you a, and I’m still not quite sure how some of this is set up.

There are some free models on here that you can use for free.

And they’re not, they’re not just cheap small models.

Some of these are, you know, some of the larger models you want, but latency might be the best. It might be a little longer. But if you’re working on something and if you’re on a, you’d have to be on a super, super shoestring budget because I’m putting $5 into this thing and I’m still, I don’t know what I’ve got left, but… Well, you’re probably not using Claude for Opus either. I can burn money.

Let me see.

I noticed that.

Yeah. Oh!

Oh, it’ll get those tokens out real fast.

All right.

Loves to think. So… How am I going to spend all this money? Let’s see.

Let’s see.

So even with the… 3.7 is still going to be a little bit pricey probably.

Yeah, 3.7 is the one I was using at $3 per million tokens. And then I said, forget that. Let’s go with Gemini.

I was going to say Gemini will give you some free time for a while.

Gemini Flash, the 05 version is so good.

I think it’s as good as Sonic 3.7. It’s like nothing.

Gemini 2.5?

Yeah, the Flash, the most recent version.

They fixed the tool call. Oh, I need the… What am I looking for?

Gemini 3.

I want the 27.

It’s 05.20.

There it is, free.

There’s a free one.

15 cents a million. Geez. Yeah. Yeah, so here we’re at 10 cents.

So yeah, but even this one is pretty good.

So I think I spent… So let me go ahead and hop into… Let’s see. Let’s go look at credits. I have $3.08 worth of credits, whatever that means.

Let’s bring $2.

And I swear, I feel like a kid at the fair. We want to ride stuff and they just give you tickets. And this ride is eight tickets. And I’m like, I don’t even know what that means anymore.

That’s where we are as a community. So yeah. How much did it go?

I don’t know.

It was some credits. That’s a token. So I’m still waiting until I can bid my next project in credits. I will give a software engineering director a BOE based on credits.

Well, I really like the use crypto option.

We just want a free Bitcoin contract.

How many Doge does it take to buy one credit?

All right.

So initially we will look and see what kind of interaction this is.

I’m going to close.

So this was me trying to build… So the entire presentation is going to be about Bitcoin.

So the entire presentation that we just walked through cost a penny and a half for it to go build or whatnot.

So starting off, let me see if I can… Just trying to see if it will show me what the actual… Yeah. So I basically asked to create a markdown file, 25 directory presentation. I’m doing a client and open router, same template, blah, blah, blah.

Here’s the file name, outline, should include a basic overview, open router, blah, blah, blah. Oh, I didn’t check to see if it’s got a section at the end for questions.

We’ll see if it actually followed through. And then it went and it did all the things. Let me close that back out. And so what’s really interesting… And then I had to go say, no, I really need an image at the top that’s got the same logo we always use. It wasn’t good enough to go find that that was in all of the other ones we had done and go, hey, do that. I would have given an example to another AI and tell it to write the prompt for you.

It just would have been easier. To write the prompt for the… Yeah. So it would have said, here’s your icon, here’s the error. Well, if I had a rules file that said all presentations will have the structure, will have this image… I don’t know why I didn’t think of that.

That’s a perfect example.

Again, it’s kind of… I know.

I call it the POSIX approach.

There’s an interview with Linus Torvalds a long time ago where there was a lot of think going on as far as what kind of… What the definition of something should be.

And he said he worked backwards.

He found the stuff that was already working and he just wrote down what it was. You already had an implementation of it. So you didn’t have to go building it. But it was a common already instead of trying to make a new thing and then make it common. Which nothing’s ever common by itself. So I think that’s more along the grassroots approach of use the thing a couple of times. Then you figure out that, oh, okay, so let me… I keep telling it to do this. Let me make it a rule. And there’s… I did figure out it says in the documentation, it’s like, oh, there’s a rules tab where you can put the rules in. And I’m up here looking for, okay, this is a server history account settings, which settings are basically open router key.

What model do you want to use?

You can do some other stuff in here as well. What I did realize is tabs also down here. Where you can make rules and you can make workflows and it’ll go put them into your folder structure in case you don’t know how to do folders kind of thing. So that is using it to build a presentation. That’s great.

Let me flip over to something else. I was working… transcribe. Where’s the code base for the transcription engine that we run?

And I will see if it loads up something interesting.

What can I do for you?

So one of the first things I did with this, I was using a different version of CUDA.

I had updated to CUDA 12, something like that. And for some reason, I had some stupid library in here that was using CUDA 8 or CUDA 9. I was on one and there was something using the other. Like, can you please figure this out?

It ripped through every dependency I had and then found some kind of a… Let’s see.

Oh, that’s way after I actually had it fix the problem. Faster Whisper package by this. Somewhere in here, it actually had gone out… I’m trying to find it. It found Faster Whisper. Yeah, it found it. There was somewhere… Maybe not. Yeah, I’ll do that.

Okay. I don’t know if it was this one or another.

It had actually reached out and found a place and the Faster Whisper had an issue on it for where this thing was, where it was talking about going from CSIS, I can’t think of the name, library that changed over.

And so it was like, well, I did this and this.

And based on this issue on the repo for Faster Whisper, there’s a conversation on this.

So this might be a pretty spot on the money for exactly what I would have had to do personally to go do it. And it did cost me 45 cents, but this is when I’m still using the CLOG 3.7 expensive one. So there was that. And we’ll ask you to do something in a second because we’re getting a little short on time. But let me close that one.

This one was more interesting. So let me walk you through some of this repo.

Everything’s Dockerized.

I’ve got the main transcription piece. Actually, where is that Dockerflutter? Oh, there it is.

So the main transcription piece, I already have a full set of unit tests that check.

I mean, this thing does database interaction and it does interaction with Stripe.

It does interaction with SES. It does things where it’s pulling from an S3 bucket.

And then it’s running things through a model.

So I’ve got unit tests for all of that.

Pretty straightforward because I write all the, you know, it’s class-based.

It might be the right word.

Not hard to test. On the other hand, I have this thing, the user interface, the back end part of that, the front end, front, back, whatever, is in React.

But I’m still using Flask just to serve things up. Flask is notoriously kind of funky to try to unit test because there’s this other library it uses to say, hey, when you hit this URL, call this function and pass this info over and do these things.

And it’s got a session behind it, you know, all that kind of stuff.

So last night, I sat down and said, can you build me a unit test for this? Put it on this branch.

You know, do this thing. I think I’m still on that branch. Let me get a Flask test.

So what I want to do just to kind of show you what this is like. Yeah, whatever. What am I trying to do?

Go back to dev. I’m going to switch this back to if I find my little icon thing again. Go back to settings. Go back to. Anthropic.

Claude.

3.7.

Sonic.

Why not?

Let’s go big.

Let’s save. So I’m pretty much going to go find this.

I’m wondering, can I rerun the same thing?

It’ll destroy your history. That’s fine. You probably can’t if you go. So I changed my branch back to dev so it doesn’t have any of that stuff in here. Let’s see. History.

I don’t see a way to. Otherwise, I’m just going to start doing the same.

Basically, I’m going to do the same thing. You can copy the task if you go to the top. I’ll pop that back out again. This? Yeah, you see the copy button?

Yeah, this guy. Copy task. OK. Just to give you an idea what this feels, what the feel is here.

So let me close on this out.

So what we got is this big one giant file that is the, you know, the server part.

So I’m just going to tell it, hey, create a set of unit tests using PyTest for the flash gap in this directory.

Create a branch name, blah, blah, blah.

Featured to actually, I already have that branch.

It would fail.

That’s true. I say go. You’re going to see it go off and make an API request where it’s actually packaging up all of its stuff. Here’s where it’s supposedly thinking about things and, you know, it’s OK. It uses Amazon SES, which I do, and it uses, I use Stripe in the back. Let me actually explain. That’s good. So it’s going to go start writing some things using PyTest. Similar to if I hadn’t, and this is one of our normal conversation points, is it’s a lot easier to use the AI and the models when you actually have some understanding of what it is they’re giving you back. So I’ve done this before, so I kind of know that, hey, that’s a test config. It is. It can do things.

And it says, OK, it wants to create a new file. Here’s the file that it wants to create.

I can save it.

I can reject it. You know, things like that.

I’m going to go ahead and save it.

So that’s the config. I’ve got to go with the Friantra’s name over there so we can see your dollars ticking up.

Unimportant. Totally important. We need to see how much you’re spending.

I would.

I don’t know how to move that.

I’ll say it should be right behind that, I think.

It might disconnect when you’re in it. Yeah. Show me you can, like, slide it over.

Isn’t that where the dollars tick up in that corner?

Yeah.

I can. Don’t want you to run out of credits. You can flip what? No, exactly.

So that isn’t here.

Oh, interesting.

So it’s 0.17.

So $0.17 so far.

So it’s got this test app right now.

If I save that.

And so now it’s going to go write one to go test the routes. And the interesting thing is that in the test code I wrote, I’ve never come across the patch part where it can take an object or an instance then overwrite certain things.

I just want to patch these couple places.

I wish I had felt that. It would have made my other tests a lot better. So it’s still rocking and rolling. We’ll run it after we get it done. Okay. So, Josh, well, that’s going on the client rules, right? Where you got rule or the big bank of rules or whatever they’re calling that. Something like this.

Put in testing rules, right?

Strategies, things like that, and a theory that should do better.

Yeah, I can.

I usually, for mine, I put, like, do not mock anything. I don’t want any patching. You can’t think of a way to do it with this setup and don’t do it. Especially Sonic. Sonic loves to mock crap. We’re testing nothing, essentially. But yeah, you can do that. The client is a little bit better than Cursor because they actually passed in the context.

So do you have a pretty robust client rule?

Not really.

I have.

I’ve had multiple subdirectories and all the special things.

So apply here, apply there.

It’s better to just do it intentionally. I want this rule and this rule. You’ve got to have the intentionality of the thing.

For now. So the less rules, the more intentional?

Well, you’ve got to think about what if your two rules don’t, they clash? You know, stuff like that.

It’s not going to be able to figure that out. It’s going to do weird stuff. Just make all your rules spying. Chaos. It’s now building the README to put into the test folder to explain how to run the tests. Man, that one cost you 16 cents. The README. I’m in 51 cents now, by the way.

New file for, okay, make it a package.

Sure.

Yeah.

When I finish, which I’ll run it, but in case it updated my requirements in file and add the stuff I’m going to need. Oh, this is different. Last night, it did not include PyTest-flask. Yeah, that’s what I was thinking. You said it needed Flask. There’s a package for it. Yeah. It found it. It has all the pictures. This time.

Yeah. It didn’t know about it last night. I learned a little bit over an hour. I thought I could do better.

Who says you’re not training it on user data?

The reason the cost is so high, you can see what the tokens are.

It’s doing a cache, which basically has the embeddings off on the server, which decreases the cost, but still.

646,000 tokens already.

Here’s my test that it went back and updated after it went through.

That was kind of interesting. It created a config, a COPS test, whatever this file is, to start with, because it knows it needs one to go build the PyTest stuff. And then it actually goes and does all the actual tests that goes back to the thing it already created to go add more stuff to it. I’m all kind of wondering, would it cost me less if it would have just waited?

So seeing how different this is, I’m going to be interested to see how much of it actually drops. Because last night, I think I hit like out of the box, it was like 60% test coverage of my code.

I think I had like eight tests that failed. And I can show you what actually I won’t show you because this is recorded in live.

I have a .env file that has all of my actual values for tables, certain keys, other kinds that it doesn’t have access to because they’re not actually in this repo.

Because I know you can get ignore them.

I don’t actually trust myself enough to put things in even the same directory structure as my repo.

So I go somewhere else, you know, and do that.

It tried to commit it.

Yeah. I told it to create this branch.

Okay. I told it what branch name and put it here.

That’s cool.

And it did. And it’s like, okay, what else?

So I’m at a dollar and seven cents for that. Activate. Let’s go see what happens.

The other thing I need to do.

Test.

I have a test environment here.

Docker Flask.

Nope. Where was that?

Flask.

Not sure this had issues with getting the right packages installed. So out of the box, there are more tests now than were yesterday, too. So the interesting thing that I did yesterday, a lot of the things that are failing that I found were pretty quick fixes because it tried to guess what I had as my authentication key.

It’s in my environment file. It didn’t get it right. So I put the right stuff in or pull it from the environment. Now I was able to clear a lot of those pretty quick. But just looking at the… So I’ve got one test for Socket.

IO, one test for routes, one test for the app part config, and a full up readme that actually walks you through.

Here’s the test we did.

Here’s what they test.

Here’s how to install PyTest Flask.

Here’s how to… And this is right.

Side question.

Yeah. If you were to run Opus 4, would some of that fail?

Would it go away?

Is it smarter?

I don’t think you can find the things that are hidden in my environment file.

So it doesn’t even know.

That might be a good test.

If it could guess the secret things I have in a different file.

I need better guessing.

I need better secret things. I think in general, 4 would be more likely to not write the test.

It knows it can’t write.

That would be the difference. Generally, the 4 ones… I was talking to somebody today. They’ve been trying to train Claw to ask follow up questions whenever it doesn’t have information. And 4 is another generation past it.

So it’s a bit better.

Yeah. Some of the rules that I’ve seen actually have statements in there.

If you need additional information, ask.

Things like that.

Please do not code.

Tell me what you’re going to do first. That’s one of my standard go-tos for Claw prompting for 3.7 and for 4.

Don’t start on this yet.

Present your plan.

Ask me for clarification.

That’s what’s really great about the plan mode for this one, too.

When I’m in Cursor, I do the same thing.

They take away the edit tool. They can’t edit things. It can go call things. It can call servers. It can search. It can do rag. It can do whatever. You just can’t edit the file. It tries to, and it can’t. And then it gives up and keeps doing its job.

Which is good.

So plan and act at the same cost. You’re still paying extra. Oh, absolutely.

Yeah.

But you’re just getting the plan figured out before you’re letting it go wild. Right. It’s not allowed. It can’t. Just five code yellow.

Well, I think the settings in there, you can block a lot of the automation, too.

Right. On the client settings. Yeah. Sure. So at some point, we’re going to have an AI model that five codes off of another.

It’s going to be fun. But just looking through, OK, I’m at a dollar and seven cents. I mean, this is not a small amount of work. So we were sitting talking before this.

The set of unit tests I’ve got for the other pieces I had to write by hand.

If I had to do this by hand, probably three or four hours.

Just going and figuring out what are the external functions I need to call?

How many do I need to do?

What kind of boundary testing do I need on this? Are you actually running unit tests or are you just checking for code coverage? I’m actually running unit tests. But yeah, I don’t have a requirement to be a hundred percent.

That might actually be where this might. If you had some crazy requirement because you’re working to go from a contract that says you have to be X number or whatever, this might actually be able to figure out different ways to cover that. You’ll never meet this one.

Don’t test that. Go to this one. It might, but some of the things that we pass on are due to the amount of work it would take to write the test framework to test this one little thing.

But if I’ve got a thing here that could just go whack it out pretty quick and it’s like, I’ll find a way to test this thing. You know, it may need to call this function, this thing, this thing, this and then get it in this state so that now I can actually insert this and cause an exception or something.

That might be where, you know, there’s been a lot of places where we don’t write the test for that because it’s so much effort to get the object in a particular state where it’s got these other kind of properties where I can get it into that exception case.

If you needed, oh, that might be interesting as well.

Some of the places I’ve seen useful things for this would be if I’ve got an algorithm that is doing some kind of calculational things and you may have certain places in this particular algorithm that you need to check, maybe the center point, maybe the edges, maybe very small numbers, maybe very big numbers.

I mean, I’ve seen some interesting things blow up that you wouldn’t expect, blow up in a digital sense.

I can see this being able to kind of figure out where some of those might be that we don’t normally just jump to that conclusion.

But overall, as a side project where I am the sole developer, to trade four hours of my life for a dollar and seven cents is not a hard call.

I can probably give it the additional information that’s in my ENV file. I go with this new info, now go update with these actual values for my test environment file.

And it may clear a lot of these.

I have to circle back on that one.

So with that said, I want to go check one more thing and see if this thing actually updates in real time.

Well, I open the floor for questions.

It is there.

Yeah. So I promise not to vibe present the whole thing for the rest of the year, but it is kind of hard to pass up on a, well, hey, can you make your presentation? Okay, here, 15 seconds later, post it. But it doesn’t really require a lot of thought. But again, back to the other thing, if you don’t actually have to work in to know whether what it’s telling you is right or not, you know, it’s still going to get you very far off some of that. Comments, cries of heresy. This is pretty much it. Oh, and it’s open source. If you want to go look at the code for client, it’s on, you know.

You want to talk about the, it does reach back a lot. There are things you can turn off. I don’t know if I’ve got that on here. Yeah, I think you can disable it off in settings, but there is lots of telemetry stuff for little tiny actions. Yeah. So, if you’re using this on something private or whatever, you’re going to want to turn off the anonymous reporting on things. Or put it in a place where it can’t get out. Oh, which is where I live. So, there’s that.

But this is most likely, especially flipping over to the Gemma 3 model.

That’s 10 cents per million tokens instead of $3.

And I haven’t, I haven’t tried any of the harder stuff like we just did with Gemma 3. So, well, my player has some of that. It’s not going to go so well. See what happens. But then again, back to open router, being able just to say, okay, which one do you want to try for that thing?

Okay, now we’re on this one.

Now we’re on, you know, I don’t know if it drops that into the task. You know, I mean, I don’t know if it’s, when I run a task, I don’t know if it’s got the info for what model I was using when it ran the task.

That might be a… Josh, you’re saying that Gemini is the best cheap one?

Huh?

The Gemini Flash.

Oh, Gemini 0520 is king for value, like by a lot.

So, that would be the one that if you’re going to VibeCo something in here, use that one.

Yeah, if that was my money.

Yeah. Somebody else’s money. Right. Export, I haven’t tried export yet.

That might be interesting.

It can also be good to do Claude to plan.

Yeah. And then do that to Vibe, you know. So, plan is, and each one’s set to its own, like plan is always Claude and act is always… You can do that. You can set it manually as like you need to, you know, anytime you think about like, you know, long range planning, you know, PRDs, different stuff like that, doing specs and stuff like that, then it could be good for that.

Have you seen this?

So, this whole piece up here, you can actually highlight over it.

Oh, yeah.

Yeah, I think it’s cool.

What? I didn’t know that.

Yeah.

So, terminal command, yellow is a read a file.

So, this response, blue is going to be a new file.

Yeah, and it’s showing you the raw function call, so it usually doesn’t.

Right.

That’s what a function call is, that little JSON object.

So, super fun stuff.

Try to get anybody on… Let me check the chat over here, see if we’ve got any client and tools like it kind of have most of the tokens cached.

Question from Jack. Actually, that might be it if I just do it over here. Client is very good about it, but obviously, the provider has to have caching enabled. So, that’s like mainly your big three that do it.

But I mean, client is really good because you see how much context it leaves in there.

You know, a cursor rule will compress it and summarize it and do stuff like that that makes it start losing degradation.

Yeah. I like the clarity of what I’m spending and where. Like you were talking about earlier, maybe I just use Sonnet for Plan and then something else that’s cheaper for Act. I wouldn’t even know that if I didn’t have the granularity of what… Especially if you’re going to take something like this and push it out to an enterprise layer of teams and you’ve got people doing all kinds of things.

Yeah. Oh, the other thing I like about OpenRouter is I don’t have to give it a budget because it’s just got what I gave it.

You know, it’s got $5 to spend.

There’s your budget. I don’t have to worry about my card. They do have a way that you can set it up to automatically top off or something, which was… Right.

Or top up.

I guess that’s a different way to say it, but… You haven’t hit the zero before yet.

I haven’t hit it yet. We’ll see what happens. So one of the things that’s nice about the ongoing cost is like a tool like Replit and CEO is just commenting about this yesterday. Like if you use the Replit agent, it’s 25 cents each time it logs a checkpoint. And that checkpoint might be like, hey, change this color. 25 cents.

Hey, move this over here.

25 cents. Whereas they also have an assistant, which is like still AI, but it’s only a nickel apiece. So there’s this like, okay, well, I’m going to need an agent for this. I’m going to use assistant for that. And he’s talking about going to a model like this. So like, let’s just charge for the complexity of the task.

And that makes sense because then there’s transparency and you can start learning like, do I want to manually do this?

Do I want to have the automation do this?

You start learning how much things cost. You’re like, oh, I might just do this myself or I can run a command for a penny. You go for it. And I posted on LinkedIn when I reshared the event thing.

The concept I’m still kind of working through here is initially way back in the day, we started off this MBSC thing.

We’re going to put everything in a model and it’s going to generate all over again.

That sucked. No, I agree. But we finally settled on a point where I put enough stuff in. I generate the code.

I didn’t want to write the first place.

I don’t write file storage and read from file, etc. And getters and, you know, copy this option, you know, or, you know what I mean?

Let’s put our effort into the behavior parts of things or the structural parts.

And I’m at a point now where a lot of these code agents, things like that, I’d like them to do things that I really didn’t want to do.

Writing unit tests isn’t fun. Not a lot of fun. Unless you’re writing unit tests for somebody else’s code. That’s when it gets a little fun.

You can try to find the one way that they didn’t protect that object or something.

So next thoughts on next up.

We’re looking at thinking about pedantic agents. Those are kind of interesting.

I can’t remember which it was on TLDR today.

On somebody else.

Mistral now has an agent’s API.

Oh, yeah.

Why?

You know, it’s like, so I’m sure we’ll have to go see what Mistral thinks agent API should be versus pedantic versus open AI versus, you know, the other ones.

That one.

Not sure whether to care or whether to, you know, I normally wait and see if it’s just a flash or if anything catches.

So again, Tom coming to talk.

I think you had mentioned. Yeah. Well, the question. On.

Hold on.

Props for mid journey, that kind of thing.

Josh. I was wondering if you’d be interested. I had a couple of people talk about it. Oh yeah. That’d be fun. We haven’t done an image one for like a little while.

If you want to partner up on it. Cause I know you’re do so much to that.

Yeah, sure.

I’m not sure what I picked for that. It’ll be early July. And now that we did something else at work and I found out that you can just like drop $50 at Yellow Hammer and they will like cordon off parts of that thing. So the last time we did social, we had some kind of a, I’m gonna say a school mom group move in in our territory and start running something.

And I mean, they arranged the tables to keep people out. And I was not about to go up against a bunch of school moms. I knew that battle would be lost as soon as I tried. So we wound up at the big long table.

You better stand up for your, you better stand up, Joy. I’m too smart for that, man. I’m not standing up for Madison Mott. Ain’t happening. Hey, they do it with small children. They’re elusing you. Oh yeah. They know they’re doing it. So that’s kind of the thoughts coming up. If you, I think I’ve got a tentative schedule through the summer and into August. So things are looking pretty much, not all agents, we should just get it right. I think we can do at least every other thing we can stop with agents, but they are kind of where our movement is right now.

Is there anybody in here that’s an Arduino with us?

I was a few years ago, but it’s been so long.

Is there something specific?

Let’s close this out first and then we’ll do that. I gotta hit stop report. Actually, first off, thanks for coming. Those that are new, if you don’t mind, or you found us through this somehow, say you’re on the mailing list or have got forwarded it to you.

So anyway, thanks for coming. Let me stop recording first.