Agentic AI

AI Agents with HuggingFace

Transcription provided by Huntsville AI Transcribe

So what we wound up doing was finding a dataset called DIBaS.

It's a bunch of microscopy data for bacteria in 20 different categories, and some of those have subcategories. Some of them, like staph, have around eight subcategories. So we used a model from Microsoft, their BiomedCLIP.

Stand by, let me see if this, okay. It was the BiomedCLIP model off of Hugging Face. We tried to use ChatGPT to get us some code to fine-tune it. First we tried to use it to classify these things as-is and it was terrible, like zero percent accurate. So then we tried to find some code to fine-tune it.

We tried to run it and it didn't work at all. And then we had Ben come in and, how would you say what you did? You rip off the head and then retrain it? Yeah, just rip off the head and train a new head on it. So it was a head gasket replacement on our model. On some of the classifications we were getting 98-plus confidence scores; some of the others were around 56 to 60. I think you ran it for like four epochs, and we had 630 files total to train on.

So not a lot, but it worked and we did some things with it. It's probably still live somewhere. Oh yeah, it's still up. Yes, this was our submission, along with a little QR code so the judges could take a picture and look at it on their phones as well. That was kind of neat. So you can pick one of these, like whatever this thing is. The other hard thing about entering any kind of challenge like this is that you can't pronounce the words. So this one, whatever this is, Coli3, you submit, it comes back and it says yes, 96%, and it might be some of the others.

So that was fun. Looking at the teams that won, I've come up with a new approach for next time. If you want to win a challenge, and it's an AI-based thing but you're not at an AI-based conference, and you're just trying to figure out how to place, go find the state of the art. You don't have to build anything.

They don't know what the state of the art is. So half of this is more educational. We could have done just BiomedCLIP itself. It's got some really cool demos, really cool things where it'll show, for example, some of the training it was built on: here are all of these lung scans, and here are the ones that have nodules on them, along with a mask of where the nodule is. All of that exists and has for a couple of years, but they don't know it. So that might be a thing for next time, and maybe submit two things.

One of trying to build something like we did, and another just to educate on the state of the art for what's already out there. One of the teams that won took a ResNet-50 and trained it on identifying spots on corn that were fungus. Okay, cool. It's a thing. But anyway, that's what we did.

It was fun. We got a t-shirt. I think Jeff got a hoodie because he signed up before February or whatever. It was $10 to sign up, so not a huge outlay, and for next year we might see how many folks we can get into it.

So that was fun. Another thing before we get into the agentic stuff: has anybody heard of the American Planning Association? They contacted me. The North Alabama chair works for the city of Huntsville as a city planner. They reached out saying, hey, we do this lunch-and-learn thing and we keep getting asked to go learn AI stuff, so can you come talk to us about AI stuff? And I'm like, I can, but I don't know what you do. So we've had some back and forth. I did some initial digging to try to figure out what kinds of AI things are used for city planning and whatnot, and came across a couple of things from other government grants that have been going on. So we've got some material to cover, but I asked, hey, can I bring some other folks from the group if they're interested in learning about what y'all do? And they're like, sure.

So if, I’ll get more info out when we get closer.

It'll be in the middle of the day. Not sure where they were looking at holding it. But if you're interested in that kind of thing, let me know and we'll get you on the list to go. Did they say anything about smart cities when they were talking about it? They got some kind of grant for, I think it was Holmes Avenue, something specific to that. For the last year there's been a ton of research into using AI to power smart city infrastructure, so there are a lot of studies you can find from places like Boston and Chicago. So that might help them out some.

Okay.

And then I was trying to go, what else?

There was one, might have been on AIcrowd.

There was a competition a while back trying to use reinforcement learning for buildings and energy use, something like that. We looked at it, and it was another one where they tried to make you use a specific gym environment, but it was just broken. Yeah. You can do the task, but I can't use this thing you're making us use. Anyway, that was one where you get kind of excited about it a little bit and then run into, like you do sometimes with these hackathons, you download the data and figure out, oh, this is not what you get.

Well, this isn’t going to work. So there’s that.

So that's the preliminary info before the Hugging Face thing. We'll hop over and look at some of their content.

The reason we're talking about it: Hugging Face came out a while back with this AI Agents course. It's in like four different units. I think they've published several pieces of unit two right now.

And some of us are just kind of interested. It's put together in a way that's really, really approachable, even if you don't know a whole lot about agents, or a whole lot about LLMs, or what some of this stuff is.

And then it kind of goes a little further into detail and stuff like that.

It's kind of backwards from the fast.ai courses we used to look at. fast.ai was: hey, get something, make it useful, do it, and then start peeling the onion and telling you, okay, this layer is doing this, this layer is doing that. So you learn, but you've already built something. This one is a little more: well, let me explain what it does and how to do it and some in-depth things, and then towards the end of unit one you're finally putting your hands on some stuff. So it is a little different, but it's pretty good.

So all of the stuff I've got in this section is more of my links into unit one from going through some things.

They did have a pretty good description: an AI agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective; it combines reasoning, planning, and the execution of actions to fulfill tasks. So with that, we'll jump over into some of their pieces.

This is actually how it's laid out. You've got this big navigation over here on the left where you can jump around, which has been pretty useful. I've gotten into some pieces and actually had to go back through and see, hey, what were they talking about here?

What was a specific part?

I did get all the way through unit one and get the little certificate that it wanted me to post on LinkedIn.

Oh, I can probably bring up the quiz from unit one. Just as a survey: each question has four possible answers.

Two out of the four are obviously wrong.

So you got the three. Yeah. Three.

Yeah.

So it's like you've got a 50-50 shot at just about every question.

But anyway, it is at kind of a beginner level, at least to start with. The first part, the introduction to agents, is a really good way to break it down: what it is, how it works.

Of course, they all have LLMs in the back end of it somewhere.

They walk you through the think, act, observe kind of iterative approach.

And then the main part that you get into later is tools and actions. LLMs operate on text: text in, text out. They can do other things if you provide them other actions they can take, and those other actions need to use text as the input and hand something back out. So they go through that, get you all excited, and they mention the certificate you can get, which is what it looks like; it's got the Hugging Face thing on it. So they get through: what is an agent?

Let me go back to my piece.

So they talk about the LLM as the brain, basically being able to understand natural language. If an agent is going to be talking to the general public, it's going to need to understand how the general public talks or types, depending on what you're doing. And then there's one of the main pieces the models have only recently gotten good at; I'm not sure you could do the stuff we're looking at now two or three years ago.

I’m trying to figure out when the reasoning chain of thought stuff actually started coming in.

Probably about September, August last year. Okay. So before that, before the think-step-by-step paper and some things like that actually dropped, you could have built tools, but you needed an LLM on the back end that could take a problem, break it down into steps, and then figure out, hey, to do this step I have this other tool I can call to get more information, like a web search or something like that.

This wouldn't have even been, I don't know that this would have been possible. You probably could have prompted it in a certain way to make it do a thing. You had to fine-tune the function calling into them. It wasn't until around the Llama 3.1 time period that it started becoming common for all of them to do it.

Right.

So there are like ways to do it, but it always sucked.

I mean, it just depends. Some of it, I think, still sucks for me because I haven't quite gotten my head wrapped around some pieces of it, and we'll get to some of that later. But then there's the reasoning step: break whatever you're asking me to do down into steps, then figure out, okay, for this step I need to call this tool, because you've told me about the tool in some kind of massive system prompt, which we'll get into in a minute. Then observe the reply from the tool, the results from whatever tool you called, and figure out: did it work? If so, maybe I keep going.

If it didn’t work, well maybe I go see if there’s another tool that looks like this one.

And I can imagine getting to a point where it’s nearly like walking into my garage and opening my toolbox and going, well, I got three kinds of screwdrivers.

And that’s just the Phillips heads, you know what I mean?

And then I could see it getting to the point where there are so many little individual bespoke tools that do little things.

It could get unwieldy.

So, and then interact with the environment, get the information to figure out what’s going on.

This next part is probably a good introductory session on its own.

If you were talking to somebody who was just trying to get a basic understanding of what an LLM is. We don't have to go through a lot of this, but they walk through encoders, decoders, sequence-to-sequence, and they throw the tokenizer playground out there, just to get an understanding of: I put this text in, I get these numbers out.

What does that mean?

It actually is a good way to look at it.

Was that actually showing me? This one was pretty much word for word; I thought it would break "actually" up. Compound word. Do I have a compound word?

I’m running 300.

Put some numbers in there.

I mean, like actual numbers.

Oh, actual numbers.

Yeah, put 9.1 is greater than 9.11.

Put the greater-than sign in: 9.1 is greater than 9.11, strawberries. So it didn't split "running" either, or the 300 miles, or the 9-point-whatever with the dots. Yeah, fun stuff. But it's an interesting one when you're trying to explain tokens.

The text before getting into this was pretty good. They talk about special tokens like end-of-sentence and end-of-text, and how each model under the covers has its own set; some of them share, I'm not quite sure which. It makes me kind of wonder how we ended up with so many different kinds at the same time, but I get it.
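For reference, here's a minimal sketch of what that tokenizer playground is doing under the covers with the transformers library; the GPT-2 tokenizer is just an example, since every model family has its own tokenizer and special tokens.

```python
# A minimal sketch of tokenization: text in, subword tokens and integer IDs out.
# GPT-2 is a placeholder model; swap in whatever tokenizer you want to inspect.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "I'm running 300 miles. 9.1 is greater than 9.11. Strawberries."
print(tokenizer.tokenize(text))        # the text split into subword tokens
print(tokenizer.encode(text))          # the same tokens as integer IDs
print(tokenizer.special_tokens_map)    # e.g. the end-of-text token for this model
```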

So you don't have to do it by hand, but they talk about predicting the next token, things like that. And beam search, they do a pretty good job talking about that as well: okay, I need to predict the next word.

Let me go ahead and predict a few continuations, then figure out which of these bigger pieces fits best, and then move to the next step: generate more, figure out which one is the next best fit, and so on. So that was pretty good. They talk about attention too, and it's actually not a bad way of thinking about it: all words are not created equal. In a sentence, some of them are just filler; other words actually matter. And then it's about figuring out how to look at just the words that matter.
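A rough sketch of that idea with transformers, comparing plain greedy decoding to beam search; the model and prompt are placeholders, not anything from the course.

```python
# Greedy decoding picks the single top token each step; beam search keeps several
# candidate continuations alive and returns the best-scoring one overall.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=10)
beams = model.generate(**inputs, max_new_tokens=10, num_beams=4, early_stopping=True)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(beams[0], skip_special_tokens=True))
```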

Prompting is important, yes.

They go through how they're trained, how you use them, how you run them, and how they're used in AI agents, which is kind of the key here: being able to understand and generate human language, because we're all interacting with these things either through text, or through voice translated to text, or voice direct, I guess, with some of these interesting models that are voice in, voice out. So that was that piece. They also did a really good job on the tools part. The way they talk about it, a tool needs to be the smallest thing you can make that does one thing, with a clear output and a clear set of inputs or attributes. That's the best shot you have at getting the model to call your tool.

It needs to be clear what kind of information goes in and what you need out.

They also talk about complementing the power of an LLM.

In other words, don't make a tool that does the same thing the LLM already does. Do the things that it has trouble with or needs help with, like providing some kind of web search to go find current data on something, in addition to its training data. Yeah, so, I don't know if this is right.

If all of them will just hallucinate out of the box what the weather currently is in Paris, they might; but if you've got tools that can actually do that, it should use your tool first instead of hallucinating. So here's what they've got defined as what a tool should be: it's got to have a textual description, because of course these LLMs are text in, text out; that's what they understand.

So you remember when you were learning how to program, they kept talking about writing header comment blocks and stuff like that, and some of you are probably working on programs that require comment blocks and things like that.

This is why.

This is like remembering that another developer will be using your code.

You’re now writing code and an LLM is supposed to be able to rip through this comment block and figure out what it does and how to use it.

So yes, arguments with typings, outputs with typings if you can.

I’m trying to find my cursor.

There it is. So you actually teach the LLM about the tools in the prompt.

It’s a big system prompt thing.

Let me skip down. How do you give tools to an LLM?

So kind of like if you wanted to implement a tool for a calculator.
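A minimal sketch of the kind of calculator tool being described, using nothing but plain Python; the function name and the rendered description here are illustrative, not the course's exact wording.

```python
# A small, typed function with a clear docstring: the shape of a "tool".
def calculator(a: int, b: int) -> int:
    """Multiply two integers and return the result.

    Args:
        a: the first integer
        b: the second integer
    """
    return a * b

# The LLM never imports this. What it actually sees is a textual description
# injected into the system prompt, something along these lines:
TOOL_DESCRIPTION = (
    "Tool name: calculator\n"
    "Description: Multiply two integers and return the result.\n"
    "Arguments: a (int), b (int)\n"
    "Output: int"
)
```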

And this is where I still have an open question about the stuff you provide as the tools.

Do they actually have to be compilable Python code or is the model looking at what the code does and inferring that?

It infers the argument.

So you can give it natural language and it will kind of extract the function calling.

It’s figuring out how to take that into structure.

Yeah, I’ve had good luck with pseudocode.

So when you're writing functions, and we'll get to this part later, I ran out of tokens on the thing I was using.

So it looks like it needs to be compilable code, because all the stuff in here, in this course anyway, is Python based.

So you’re going to be writing Python functions to do x, y, and z. But I think you could write just about whatever you wanted as long as it doesn’t crash the application code that we’re putting this thing into.

Is there a better way?

If it’s going to call the function, like what’s the use case where you’re not writing real code?

I don’t, what I’m saying is I don’t know that it calls the function.

I think we’re just using the… So there’s an executor that usually runs under the hood.

So all these things, they always have an executor and it basically, there’s some sort of entry point where it takes the output and puts it into that scenario.

The reason why you always want to type it: they say you can optionally type it, but I wouldn't consider that optional, because that's what forces the structure in.

So it’s very loose, you know, it’s a house of cards as far as how all of it works, but they’re starting to get very consistent with this sort of stuff.

So the way that I’ve explained it before is as a developer or person, you can see that calculator function and you know what it does.

Did you compile it?

No.

You looked at it and you know what it does.

You’ve seen enough code, you’ve seen enough stuff that you’re, okay, well this multiplies two numbers.

Let me multiply the two numbers. And you know, it’s really, really interesting how some of it plays out.

One of the big things too lately is if you’re not a coder, asking the LLM to code, to find the solution you’re asking it for and then to execute that code.

Right.

And it automatically makes it run through all that and gives you a better answer most of the time.

Yeah. It's like you could drop in and say, hey, I'm trying to build a function to multiply two numbers.

Can you give that to me in Python?

And yeah, it would have more success generating the code than doing it inline, because there's usually a mapping.

There is code usually that it’s mapping to.

Right.

It just doesn’t do that at runtime, you know, inside of its own system.

Yeah. Okay, I’m with you now.

You're saying it looks at the headers to know what code to write, but it does execute on the external system.

Yeah, yeah, there is a mapping at some point out there.

Yeah, for sure. That’s what I was trying to figure out. Why would you have pseudocode in the function?

Because eventually it’s going to pass your data, you have to run it on.

Yeah, yeah, right. That was the weirdest thing: I'm writing functions, but these are not the functions you're looking for. These aren't the ones it's going to call. It's going to take this and use it to do its own interpretation of the function.

It’s very good at generating out the functions, but not like executing the symbolic logic internally.

It doesn’t have its own Python interpreter.

Right.

So they go through basically how that would turn into code; we'll skip into some of that.

So where am I at?

640.

Okay.

Are there any that will actually call a compiler with compiler output and try to fix it?

I’m sure.

Yeah, I think if you look at some of the coding agents that have been specially built, they probably have special things to do that.

So they've got this whole agents class, from, I'm not even sure how to pronounce it.

I don't know if it's "small agents" or "smol agents". Okay. Well, back to the bacteria things, I couldn't pronounce any of those either. "Smol" sounds more like a good Southern agent. It's a smol agent. So they've created some wrappers and some things where you can use annotations and such to actually take a lot of the extra text out of these things. Oh, wrong thing. There we go.

So I've got some decorated code, and they use this to inject things into the system prompts.
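A sketch of what that decorated code looks like with the smolagents @tool decorator; get_weather here is a made-up stand-in rather than a real weather API. The decorator scrapes the docstring and the type hints into exactly the kind of textual tool description shown earlier, and the library injects that into the big system prompt for you.

```python
# A sketch of a smolagents-style tool. The docstring's Args section and the type
# hints are what get turned into the tool description the LLM actually sees.
from smolagents import tool

@tool
def get_weather(location: str) -> str:
    """Return a short description of the current weather in a location.

    Args:
        location: the city to look up, e.g. "New York"
    """
    # placeholder instead of a real API call
    return f"The weather in {location} is sunny, 20 C."
```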

Let me, before I go the hard way, we'll hop over to their think, act, observe cycle. Their breakdown, in their form of doing it, is to actually have the LLM take whatever it is you're trying to ask, break that down, and see what kind of actions it may need to take in order to do what you're asking. The other kind of mental jump I had to make, or, I don't know the right way to say it: a lot of times I've been using the LLM stuff just to give me answers to questions, or hey, I'm trying to figure this out, what is this?

If you think of an agent, an agent is something that does something on your behalf.

So: hey, book an appointment, add such-and-such for three o'clock on Friday if they're available; if not, look at my schedule and see when the next available slot might be.

You know, things where you want it to actually do things. Part of that gets kind of scary on the doing-things-that-may-cost-money side.

That’s still a, that’s still kind of an interesting, interesting piece.

So there's the thought, which figures out the action; the action, which is doing the action; and the observation, which is figuring out what the reply or result was, or an exception it may catch, something like that. Of course they've got some nice graphics that I figured I would show you instead of trying to rebuild them.

It’s got arrows between boxes and then probably goes back.

The other fun part: you have to tell it when to stop, which I think is built into the system prompt if you use their library. In other words, which answer is good enough? Should it keep looking for the next tool to get a better answer?

All right, so we got that.

They talk about Alfred, the weather agent, asking about the current weather in New York. It goes through some internal reasoning, and it finds there's a tool called get weather whose description probably says something about getting the weather in a location. So then it acts, and it gets feedback.

And now that it's got that, it can answer.

The final action.

So the other thing I learned: if it makes an iteration through trying an action and that action didn't succeed, it gets that back and has to go try the next action. Basically, this thing is just building more and more context into a giant system prompt, and it just keeps iterating.

And I’ve seen some of the descriptions of, well, actually I will plug the Hugging Face Discord. They’ve got a really active channel running along with this course.

So if you are trying to do something and you’re having problems, your first step is probably hit their Discord, drop a search at the top.

But some of the stuff I saw on their channel are people trying to build some fairly interesting agents.

And the more steps and the more kinds of things you take, the more you're just taking the previous actions and observations and stacking it all into the next prompt you're about to send.

And then sticking that onto the next system prompt, and so on, and they're blowing out the context window trying to do this with some of the smaller models.

So anyway, that was kind of interesting.
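Here's a plain-Python sketch of that cycle with no agent library at all. call_llm, parse_tool_call, and the tools dict are hypothetical stand-ins; the point is just that every observation gets stacked back into the context until something says stop.

```python
# A bare-bones think / act / observe loop. The helpers passed in are hypothetical:
# call_llm(messages) returns the model's text reply, and parse_tool_call(reply)
# returns something like {"name": "get_weather", "args": {...}} or None.
def run_agent(task, call_llm, parse_tool_call, tools, max_steps=5):
    messages = [
        {"role": "system", "content": "You are an agent. Here are your tools: ..."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):                      # you have to tell it when to stop
        reply = call_llm(messages)                  # think: model reasons and picks an action
        tool_call = parse_tool_call(reply)
        if tool_call is None:
            return reply                            # no tool requested: treat reply as the final answer
        try:
            observation = tools[tool_call["name"]](**tool_call["args"])   # act
        except Exception as exc:
            observation = f"Tool failed: {exc}"     # failures go back in so it can try another tool
        # observe: stack the exchange onto the growing context for the next iteration
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped after max_steps without a final answer."
```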

The thing that really got me was the part where they walk through doing this without any libraries: just describe, in a system prompt, what this tool is, how to call it, all that kind of stuff. You get down in here, and they want you to set up a serverless inference piece with the prompt "The capital of France is", and if you actually run this by itself you wind up with "The capital of France is Paris", blah, blah, blah.

And it's just going to keep repeating and repeating based on how they've got it set up, until it hits max new tokens.

That’s great.
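A sketch of that raw serverless call, assuming the huggingface_hub InferenceClient and a placeholder model id; with no chat template or stop condition the model just keeps completing text until it hits max_new_tokens.

```python
# Raw text completion against a serverless endpoint. Needs an HF token with
# inference credits; the model id is a placeholder.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

output = client.text_generation("The capital of France is", max_new_tokens=100)
print(output)   # tends to ramble: "...Paris. The capital of Italy is Rome. The capital of..."
```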

Then they flip over to the chat kind of approach, where you've actually got messages, basically an API-based thing, instead of writing out your own full text of what the prompt should be. They talk about that.
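The same question through the chat-style API, again a sketch with a placeholder model id; the client applies the model's chat template and special tokens for you, so you get one bounded answer instead of endless completion.

```python
# Chat-style call: structured messages in, a single assistant reply out.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

response = client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=50,
)
print(response.choices[0].message.content)
```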

So then they start getting into: okay, let's write our own prompt to do a tool-calling kind of thing, because you've got a tool called get weather and the input is a location, like New York, things like that. You can actually run this thing and it works. It's expanded into the whole beginning-token, end-token form, things like that.

It's quite verbose, and trying to figure out how to take the thing you want to do and get it into this format is pretty crazy. So what they do, and we'll hop into this, is they get this thing called smolagents, where they define, think of it as a template for making your own functions, along with a library that lets you combine a bunch of functions and basically generates the system prompt on the other side.

And then you tell it run with this model.

And so at that point you've got kind of a framework that's tied to an LLM on the back end, and you've got your smolagents setup on the front end.

So you wind up hooking it up to your interface, a chat interface or whatever interface you've got.

You've got the input coming in, you hand it to this smolagents piece to start with, and it uses the model you identify to go try to answer your question or do your task, and then come back to you. So that's fun and all.

It goes through how they created one of these Hugging Face Spaces, and they want you to go duplicate or clone the space with their template.

I did all of that. We’ll show you that in just a second because we’re getting close to time. They defined a couple of tools already.

The easy one is get current time in a time zone, where they look up a place, go find out what time it is there, and return it.

That works out of the box.

As part of the course, they want you to take this custom tool and make it do something interesting. What I wound up doing was taking a string input, and then the integer I used is how many times to append the string to itself to make a bigger string; something easy. And then they're using a Qwen2.5-Coder-32B-Instruct model. This is basically their application code where they create the LLM piece.

This is the CodeAgent that's part of the smolagents library, and you give it the model, the list of tools, and how many steps you want it to be able to take.

So in other words, if it needs to fail over, the first tool didn't work, let me go to the next one.

They didn’t really go into what some of the other pieces mean. And then throw it into Gradio. Gradio?

Gradio?

Anybody have a preference?

I’ve been saying it like eight different ways.

That’s the correct place.

That sounds right up my alley.
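Here's a rough sketch of what that template app wires together. Class names follow the smolagents library and the course template as I understand them (newer versions may rename things), so treat it as a sketch rather than the exact file.

```python
# A CodeAgent hooked to a serverless model and a couple of tools.
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """Return the current local time in a given timezone.

    Args:
        timezone: a tz database name, e.g. "America/Chicago"
    """
    from datetime import datetime
    import pytz
    return datetime.now(pytz.timezone(timezone)).strftime("%Y-%m-%d %H:%M:%S")

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")  # the instruct model mentioned above

agent = CodeAgent(
    model=model,
    tools=[get_current_time_in_timezone],
    max_steps=6,   # how many think/act/observe iterations before it gives up
)

# The course template wraps this in a Gradio chat UI via its own Gradio_UI.py;
# the library also ships a GradioUI helper that does roughly the same thing:
# from smolagents import GradioUI
# GradioUI(agent).launch()
```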

And so then you basically run this in a Space and it does things. So I did. Let me close out a bunch of these windows and get back to where mine was. One of the first things I learned was that if you create a Hugging Face Space and then navigate away from it, it gets kind of hard to find. I didn't find a good way to say, here are my Spaces; you go to Spaces and it shows you all the Spaces.

If you just happen to search for your name, it'll show you all of yours. But I always have to go to my profile to find some of my stuff. It's like, why is this hard?

I was obviously just here. I would show you this running, but we’ll get into that, why that’s not in a minute.

So what I wound up doing, let me drop into the code.

Sorry, I had to restart my machine right before we got into this. I also learned how to use Hugging Face Spaces; I'd never played with them before. It creates a Git repo in its own little space with its own little runtime, and you can clone the repo, make code changes, and commit your code changes.

It’ll restart your little runtime thing and have at it.

The free tier gives you two CPUs and 16 GB of RAM, something like that. So here's what I wound up doing. I don't know if this is readable; I don't have Phil here to tell me how big it needs to be on the screen.

Can you read that?

Okay.

All right, good deal. So what I wound up doing was a custom tool that basically says, hey, append this argument a bunch of times and then return it.

I did bring in the DuckDuckGo search tool, and then it’s using the model.

You give it your big list of tools and then say go.
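A sketch of that setup: the custom repeat-a-string tool plus the built-in DuckDuckGo search tool handed to the agent as one tool list. The names and arguments here are mine, not the exact code in the space.

```python
# Custom tool + built-in search tool, wired into one CodeAgent.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, tool

@tool
def repeat_string(text: str, times: int) -> str:
    """Append a string to itself a number of times and return the result.

    Args:
        text: the string to repeat
        times: how many copies of the string to concatenate
    """
    return text * times

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
agent = CodeAgent(model=model, tools=[repeat_string, DuckDuckGoSearchTool()], max_steps=6)

# agent.run("Repeat the word 'bacteria' six times.")   # spends inference credits when uncommented
```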

Unfortunately, this is a pretty sorry demo, because I ran into an issue: I have exceeded the monthly credits for inference providers. I exceeded that limit in less than 10 minutes last night. So you can go to a pro level, and they will 20x it, so I guess I would get 200 minutes, maybe.

So the pro level is $9 a month and they give you $2 worth of credits, for whatever that is. I'm like, why can't I use all $9 toward the credits? So where I'm at, I ran out of time. Where I'm at now: instead of using this Hugging Face API model, which routes everything back to Hugging Face, you can actually provide a different model that reaches out to Claude or OpenAI or your own hosted thing, if you've got it exposed. That's my next step; in order to make it through Unit 2, I'm going to have to figure out how to get this thing somewhere I can run it. (There's a sketch of that swap a little further down.)

I was able to run it several times. The interesting thing is it took my tool and tried to use it for the take-the-string-and-repeat-it task. I gave it a string and the number six, print the string six times, and it said the answer is the string six times. Not what I wanted.

So my basic experience so far: it's been a little interesting trying to wrap my head around it, but right now this feels like writing a function and then trying to figure out the right way to get an LLM to call your function.

And some of it feels like programming, some of it feels like psychology with the model.

So hey, I built this nice function and I’m trying to figure out the right way to explain my function and to explain the parameters and all so that it actually gets used the way I did it.
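For the provider swap mentioned above, here's a hedged sketch of what that looks like; the class names come from the smolagents docs and may differ by version, and the endpoints, model ids, and keys are placeholders you'd supply yourself.

```python
# Swapping the Hugging Face serverless model for another provider.
import os
from smolagents import CodeAgent, LiteLLMModel, OpenAIServerModel

# Anthropic (or anything else LiteLLM supports):
claude = LiteLLMModel(
    model_id="anthropic/claude-3-5-sonnet-latest",
    api_key=os.environ["ANTHROPIC_API_KEY"],
)

# Any OpenAI-compatible endpoint, including a self-hosted server:
local = OpenAIServerModel(
    model_id="my-local-model",                 # hypothetical model name
    api_base="http://localhost:8000/v1",       # hypothetical endpoint
    api_key="not-needed-for-local",
)

agent = CodeAgent(model=claude, tools=[], max_steps=6)
```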

So it's been a little interesting. Let me go back and check the chat side, because I know Lauren was working through some of this as well. All right, yeah, I'll look through that. So you've got sound, that's good. I do need to make sure I get enough of the folks that are online my cell phone number, that way you can just blow me up if I start doing something like that.

Lauren, did you have any kind of thoughts on, I know you’ve gotten a decent way through?

I’m not sure if you ran into the same kinds of issues I did or something different. I don’t know if you’re able to talk with us or not. All right, well, I will keep talking for a little bit. So I did the class.

I haven't done the hands-on piece of it, but I did read all the way through the text. I thought it was really well written. I was a little surprised; I thought there was going to be more in the way of recorded lectures, and it was mostly written, but that was okay. I read most of Unit 1 on my phone while my wife was watching something on TV. But I'm just kind of scrolling through.

It’s really easy to navigate and whatnot. I actually like that more than a video describing things and stuff, because I could bookmark where I was.

I reopened it on that tab and now I'm back to where I was and I can keep moving through. Some of the graphics they did were really good at explaining things. Yeah, if you're familiar with AI, you could do the reading part of Unit 1 in an hour or two; the hands-on part is going to take more time. Yeah, that's what I like about the text. I'm curious if I can make an agent complete the training on agents. It seems like it would be easy to run through. You'd be trained, not you. But if I train it to train itself, then... Make an agent build other agents. I'm curious, if you're trying to switch off of Hugging Face to something else, whether you're going to find that your tools don't work. It's been several months since I was really looking at agents, but one of the big barriers was that, for a while, not everybody supported it. It was function calling, and then it was tools.

Every provider had a slightly different format.

So you get a tool and you’re like, cool, what does Gemini do?

Oh, Gemini doesn't use that format; let me rewrite the tool format, the function call, for Gemini correctly. One of the reasons they're pushing the Hugging Face Spaces and such is that they've got a lot of different models you can run inference on.

And you can switch real easily between Qwen and Claude. I mean, there's all kinds of stuff you can do, which may mitigate that to an extent.

But that's something where we could, if I can figure out how to make it call a different provider, get away from that.

Somebody that doesn’t just use OpenAI, because that’s simple enough, right?

It's like, use Hugging Face or OpenRouter, but then you go to Anthropic or Gemini, which are just stubbornly not using the same API.

You can use OpenAI with Gemini.

With Gemini, you know?

I did not know that.

Yeah, the only one that's a holdout is Anthropic, because they really want you to use their MCP server approach, which is objectively better, but it's different. Right. It would be nice if you could just make that pluggable, finally.

I think Anthropic is going to hold out and make everybody’s lives hell, but it’s… I hope they win.

It’s kind of a Mac versus Windows.

It is, yeah. Betamax versus VHS. Yeah, in a way, MCP does feel a whole lot more like programming, where you are actually tying things together on purpose, rather than just kind of hoping that the right thing gets called or the right inference gets hit. Any other thoughts?

So I plan on doing this again after I make it through unit two and do some of the other stuff.

One of my developers was pointing out that Cursor supports MCP. So if you do want to write a little agent or something and you implement MCP, you can pull it into things like Cursor.

So it’s becoming more pluggable.

I thought that was neat. Yeah, like if you're using Cursor for coding, there's a GitHub MCP plugin; you just pop it in and you can talk to it about repos or issues. The latest version, I think, now has that built in natively.

It doesn’t? Yeah.

Okay.

And you can still post. You don't even have to use Claude Desktop now; there are bridges and stuff like that. There's one that I use all the time called MCP bridge. It takes the Anthropic sort of protocol and puts an OpenAI API interface in front of it, which completely eliminates that, you know, so I would like to be able to take that out.

So I can reuse my stack, but that eliminates everything.

So I can have one bridge that goes to all of my MCP servers. Some of them are official.

Some of them are like one file, you know, servers.

So you have an OpenAI-to-MCP bridge.

So yeah, it’s an official bridge in the middle.

But then I can go Gemini, OpenAI, Anthropic from like one chain.

Yeah.

Call all three and, you know, merge it back together, a bunch of stuff like that. It does the SSE protocol.

So, server-sent events, and it kind of uses that along with standard IO.

Okay.

Works, does the thing. So you can use it with Gemini. You get 1500 free Gemini calls a day.
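The pattern being described is one OpenAI-style client with a different base_url per backend. The local bridge URL below is hypothetical, and the Gemini endpoint and model name should be checked against Google's current docs.

```python
# Same client library, different backends behind different base URLs.
import os
from openai import OpenAI

# Through a local bridge running on your machine (hypothetical port):
bridge = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Straight at Gemini's OpenAI-compatible beta endpoint:
gemini = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)

reply = gemini.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(reply.choices[0].message.content)
```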

So it’s pretty cool.

Yeah, I use Gemini a lot because it came with my phone this year.

So yeah, if you look at OpenAI endpoints, they have a beta endpoint that does almost all of the tool calling stuff.

I didn’t know that.

We've been working very hard to get that, because of their OAuth nonsense, like Google, all their authentication stuff; but it goes through that once you're on. So you can use Gemini until you run out? Until you get to that 1,500. You get Flash Thinking.

I mean Flash, 2.0 Flash, very solid.

So any other comments on the agent stuff? I feel a little bit smarter and a little bit dumber at the same time. It's a different way of trying to think through things sometimes, and it's still more me trying to get my head wrapped around it in the right way. But the thing I'd like to do when I'm done is be able to build an agent. Nobody makes it this far into the videos we post of these things, so I don't have to worry about somebody stealing my idea.

One of the hardest things I've had to do is reserve a room at the Many Glacier lodge in Glacier National Park. That's because they charge you for the first night only and will refund 100% of your money if you cancel up to 24 hours prior. So people just reserve the heck out of it in case they might be able to go. The good news is a lot of these get canceled through the year. So if you had something that could go scrape their UI (they don't have an API or anything), it's going to have to go try to find: I need the same room for four nights straight.

As soon as you get it, send me a text or whatever.

I’m not saying book it because I’m not quite that trusting yet.

But that is something I should be able to build an agent to do. So that's kind of my internal goal: find something I would do myself if I had time to do it every day or every hour. I'm not sure how often to try to get this thing to run.

But if I had an agent doing something automated that I would otherwise have to do myself, that's a pretty clear win.

It’s 1500 times a day.

I guess I could.

You’re using an LLM to do this.

So you're going to have to do vision calls for all your web pages. It's going to be pricey. Maybe. Can you make it fire every time Nancy Pelosi makes a stock trade?

I could.

If you could tell; I don't think she makes her own trades. But if you could build a congressional stock index. There's definitely an app for that right now that just came out.

I’ve totally seen people talk about that on X. You can just follow Nancy Pelosi’s husband’s trades.

You figure out who's trading instead of her, and then just have the system automatically trade.

Right.

Oh, I know who you’re talking about.

Yeah.

If you did want to take an agent and do something like you’re talking about, the very first thing you’re going to do is plug it into a browser.

Right. So is there a browser plugin people like, or would you just use one of the libraries in Python or whatnot to do a headless browser?

Like, how would anybody approach that?

What was the question?

Like, if you were going to try to do something, like booking something on a website, would you try to actually drive a browser?

Does somebody have like a plugin or a proxy that does that?

Or would you just pull in just the headless Chrome browser stuff you can pull into like Python?

See, a lot of people use Selenium.

Yeah. Yeah. I mean, that’s the one I see the most often, but there are other ways of doing it.
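A minimal headless-Chrome sketch of that scraping step with Selenium; the URL and the availability check are placeholders, and a real version would have to handle the login and cookie flow that comes up below.

```python
# Fetch a page headlessly and look for a keyword; the agent (or a text message)
# would take over from here. URL and check are placeholders.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/lodging-availability")   # placeholder URL
    page_text = driver.find_element(By.TAG_NAME, "body").text
    if "available" in page_text.lower():
        print("Looks like something opened up; send the text message here.")
finally:
    driver.quit()
```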

I’ve done that without an agent.

So I'm kind of wondering why you'd need an agent for it now. It was a good question. Which one had the tool that would actually take over your browser and do some interesting... It's Computer Use, yeah. Claude has Computer Use. There are some open source versions of it now, and there are MCP servers that do it. I was wondering if there's an MCP server.

Yes, absolutely. Yeah, okay.

Absolutely. So you do that, or do you… Yeah, it’s… Yeah, and the stuff I’m looking at, I don’t think it has a whole lot of images.

You could probably look at the exact return from the page in text and figure out what's open or not open based on that. The last time I tried to do something similar without agents, it was caching and cookies that were the problem.

You couldn't just go to the website; you had to go through their auth flow and all that jazz, so if you could drive a real browser, it might make that easier. Yeah, the issue is with images. Talking about tool servers, that stuff is actually more together than the multimodal image APIs for all these different models. Every single model will be completely different; they have different tokens, different places to put them.

That's the biggest hot mess to me, the multimodal stuff. So you thought the start and stop tokens were bad between models?

It does drive me crazy, bouncing between different models and then being like, what do you accept?

Do you take PDFs? Do you take images? Which of these models still takes an image?

My first question is always: how do I prompt you?

You know, and it works okay, but let’s… I’ll stop recording and we can start shutting the meetup part of this down.