
AI Workloads with Runpod

Transcription provided by Huntsville AI Transcribe

So… Anyway, we'll just carry on, and later if we need to ask questions and that kind of thing, we can. Everyone here and online has been here before, so am I going to do the whole spiel? I guess a couple of things before we get started: looking at doing a paper review next week, looking at doing another social the week after that.

Probably going to try for Yellowhammer again, especially if I can get in on a lazy night.

That was pretty nice. If not there, we'll find someplace. There's always Stovehouse; yeah, we can always go back to Stovehouse and do that one. Looking for additional speakers or topics, you know, after that.

So if you've got things or topics you want to talk about, let me know. I know we've got... see, Tony, they come sometimes; I can't remember the name of the company, Through or something like that, that does a lot of VR. There were also some things that popped up recently on an IEEE spec: there was something in a spec for URLs that include location encoding, things like that, so you could have something available at a URL just based on its location. It's kind of neat. And next week is agentic engineering environments, Josh. So is that agentic engineering environments for running agents, or engineering environments for making agents? I don't know. Josh, could you hear that? I can. Can you hear me, I guess, is the question. Yeah. Yes. Okay, cool. So it is mostly on how you basically set up a co-engineering environment for agents, dealing with them inside of your environment. It would also cover how to make them as well, though, since it's kind of a lot of the same stuff, but that's mostly it. Thank you. Yeah, so there's a couple of topics. Some of it leading into, you know, it kind of gets crazy towards the end of the year, so we normally schedule one thing in December just to end the year and cover a quick roundup.

But here’s all the stuff we talked about.

Then, of course, another tagline: here's all the stuff we talked about that didn't exist until this year, which is normally most all of it. So that's pretty good.

There are a couple of things coming up from the AI Huntsville perspective. Their website is live, but I'm not sure how stable it is. As soon as I find it stable, I will link to it, and hopefully stop getting messages intended for them, but they always find me.

That’ll help.

The other thing they're looking at, from the workforce committee: we're looking at doing a couple of sessions toward the end of this year, morning coffee and conversation, just AI topics, things like that.

More of a discussion, panel, whatever, still looking for that.

I'll let you know dates, times, and details as soon as I find out.

Anyway, that's all for that. And of course, going into next year, January time frame, we've got this big AI Symposium thing, which a lot of us are also doing workshops and other stuff for. Some of the workshops from early this year in January are posted on YouTube if you go look; mine is not. I need to go figure out what happened with that one, because the one that was right before me is posted and the one after me is not. So I don't know if they're still working their way through stuff or if they had some issue.

I just now remembered to look, and it says AI Symposium 2025, J. Langley and Dr. William McBride, on YouTube. Okay. Yesterday. And that was last year. Yeah, 2025. Right, they just got to it. That was this year. Okay.

Yeah, it’s the top hit on Google if you search HSP. Oh, nice. I might be famous.

Because they just posted it. Yay. Someone says: thumbs up and subscribe now. I've been waiting on that one to hit, because that was one where... I get asked a decent amount, how do you keep up with AI stuff? And the answer is, good luck, give up now; but if you want to keep trying, here's some thoughts. Anyway, there's that. That'll be early next year. I still don't know what the cost and all that is; I'm sure we'll find out more as it gets closer. We could set them up with the automated reports and just swamp them every week with everything that's changed so they can kind of get a feel.

Maybe. Just the PR, you know, the official stuff, nothing internal. Next year is all about agents. I thought this year was the year of agents. This year was human in the loop; next year is agents.

The year before that, I don't know what it was. But I think they're expecting, you know, more than they had last year, based on early signups, things like that. So I think last year was 450 to 500-ish, and they're expecting more than that.

They got some really good speakers. They are splitting some tracks, so you've got the popular, high-level stuff as well as more of a technical track on how to do things. So it'll be good. Anyway, there's that.

But tonight, we're talking about AI workloads with RunPod, because that's what I've been working on and that's what I have.

So some things that are pretty interesting.

that were much easier to get working than I thought, with a little help from my friend Cline, which just took what I had and said, okay, here's a RunPod version of it. I was like, okay, let me try that. And it mostly worked. I was like, wow, this is crazy. What I wanted to start with is the current AWS architecture for this transcription service.

This is stuff we've covered before in these kinds of sessions.

I’ll see you on the chat. All right. Yes. OK.

I’m going to stop sharing and see if it will actually show this whiteboard up here.

Oh, the one that says Wi-Fi on it?

Yeah, the one that says Wi-Fi. I think so. If you're online, it needs to know what our Wi-Fi here is. So what I've got, what we're doing right now, what we've had for this transcription service: I had a web... wow. That's right, this might not go very far. There we go.

A web UI.

And this thing was on AWS Fargate. Everything's Docker containers flying all over the place. So I had this guy going out to a load balancer; here's my internet gateway. Just following a lot of the things you see if you read AWS tutorials on how to get started, how to host a website or app, anything like that.

For a long time, we actually had the transcription service also running in Fargate itself.

And then when Faster Whisper came out, it was able to run on much lower hardware.

I actually moved it into a Lambda.

So super, super cheap.

So I've got the actual service.

I’m just going to call it that.

That guy is running in a Lambda.

Over here I've got an SQS queue; I'm already drawing stuff in the wrong place, but this guy talks to it and this guy receives from it, all that kind of stuff. Fun things. I've got an S3 bucket. So what happens is you upload a recording, you do some stuff here, it drops into S3, and that triggers my Lambda to run over here. I forgot, I've also got a DynamoDB table, because after you authenticate with your email address and pay with Stripe, it grabs your token, shoves a little record in here, shoves a file down; this guy reads the file and double-checks against what's in DynamoDB as a second check. That way you can't just automatically drop files in and expect it to run; you can't spoof it that easily. I'm sure you could figure something out, but... So there's that. Then this guy runs and updates this.

This shoves a transcript back into a separate S3 bucket, deletes this guy, and cleans up after itself. This gets pushed back to the UI, and SES is their email service, so it winds up emailing you the transcript. So, a lot of fun. It is complicated for what it does. The other thing they don't really tell you a lot about when you get started: they'll tell you how much money this thing costs, and it's not a lot. This thing, the web UI, was probably $5 a month, maybe less; I've got it running really, really cheap. And if all of a sudden this starts getting hit, it'll spin up another one, whatever, it scales, no big deal. It'll tell you this costs money, but you pay like pennies for a million messages; it doesn't even rank. Also, I forget where I put this other thing: there's also an Amazon Elastic Container Registry where your Docker stuff goes.

They really push you to use that instead of using Docker Hub because it’s much faster to load from here into any of these services because they’re all in the same kind of cloud, whatever.

This costs money, not a lot, but based on how many images you have, the size of the images you have, all that kind of stuff. DynamoDB costs money, but again, it’s based on how many transactions I’ve got. This costs money, again, based on how many transactions. Both of these are based on storage.

Not that big of a deal. Plus, if you're on AWS, moving data from S3 to any of these things that exist inside of AWS is free; they nail you on getting data in and out of AWS instead. The thing that wasn't really apparent: this costs money, and this costs money. And overall there's a VPC, a virtual private cloud, that holds all of these things. Then you wind up setting up your subnets, because you have to have at least two to have a load balancer, and then you've got to have the port for here.

You've got to know which ports go where.

I mean, there’s a ton of setup.

We wound up using Terraform for the whole thing, which was really a great idea, because when I needed to tear it all down, it was: delete these three files, say terraform plan, terraform apply, and boom, it's gone. If I want it back, I can do a git reset --hard, terraform plan, terraform apply, and it's back. So, with lopping all of this stuff off: the web UI, I pulled off of Fargate, and now it's hosted on DigitalOcean.

Five dollars a month for basically the same thing.

And I don't need this, I don't need this, I don't need this. It'll also load balance for me if things start to happen and it spins up; no big deal. This guy is what we moved over to RunPod, which is what we're talking about tonight.

And it actually, pretty much, I’m still using these pieces for now.

I’m not using Lambda.

I'm still using SQS and S3, but basically I got rid of all of this.

And I got rid of the ECR, because now I just shove stuff into Docker Hub like a normal person. This is DigitalOcean. And the service is now RunPod.

I've got it set up so it's still Docker; everything is still Docker containers. Almost nothing changed with either one of these except the actual interface.

Actually, nothing changed with this guy.

I just rehosted the same Docker image and made it an app with DigitalOcean. Make this one an app, please; and it said, okay. The other thing it does: I said, okay, also make it transcribe.hsv.ai, and it said, okay, I'll make a certificate and host it, all that kind of stuff. With Amazon, I had to create my own certificate in AWS, which would bug me every year to redo, and then I had to change the CNAME at my DNS registrar to point to the new thing. Not hard at all.

The weird thing is, when I was running in a Lambda, it was so slow that I wound up adding this; that's where the SQS came from.

I was streaming, using that as a way to stream the transcription back to the UI so it shows up as it’s going and you can actually see what’s going on.

It’s kind of a lot of fun. When I first moved over to RunPod, I was doing the same kind of a thing except with RunPod, I have access to GPU runners.

So now, instead of a Lambda running on the slowest shared CPU it can find, which would typically hit the 15-minute timeout if you gave it anything longer than about 45 minutes worth of audio...

This thing right now chugs through about an hour's worth of audio in about two and a half minutes. So the streaming part gets really weird, because it's shoving more stuff at you than you have any chance to clear. So I'm going to get rid of that one. And with it, there is also a way to stream directly back from the service.

I haven't gotten that working yet, because I had to spend some time doing slides for some other thing next week.

But we'll talk about some of that, and it might really, really simplify this. So far, I'm really happy with the SES service from Amazon. If you know of any other way to send email extremely cheap, let me know; and the emails are all authenticated through the right kind of stuff,

so they don't wind up in your spam box. That was half the battle with that. So there's that.

And then S3: I can't really think of a better way right now for just shoving the files out there and then being able to get to them later somewhere else.
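Put in code terms, the Lambda side of that whiteboard flow is roughly this sketch; the bucket, table, and email names are made up for illustration, and the real service differs in the details:

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
ses = boto3.client("ses")

# Hypothetical names, for illustration only
TOKEN_TABLE = "transcribe-tokens"
OUTPUT_BUCKET = "transcripts-out"


def run_transcription(audio_path):
    # Stand-in for the real Faster Whisper call (covered later)
    from faster_whisper import WhisperModel
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _ = model.transcribe(audio_path)
    return " ".join(seg.text for seg in segments)


def lambda_handler(event, context):
    # The S3 upload event triggers this Lambda
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Double-check the upload against the token written at payment time,
    # so files dropped straight into the bucket don't get processed
    token = key.split("/")[0]
    item = dynamodb.Table(TOKEN_TABLE).get_item(Key={"token": token}).get("Item")
    if not item:
        return {"status": "rejected"}

    # Pull the audio down, transcribe it, push the transcript to the
    # separate output bucket, and clean up the upload
    local_path = f"/tmp/{key.split('/')[-1]}"
    s3.download_file(bucket, key, local_path)
    transcript = run_transcription(local_path)
    s3.put_object(Bucket=OUTPUT_BUCKET, Key=f"{token}.txt", Body=transcript)
    s3.delete_object(Bucket=bucket, Key=key)

    # SES emails the finished transcript back to the address on the record
    ses.send_email(
        Source="transcribe@example.com",
        Destination={"ToAddresses": [item["email"]]},
        Message={
            "Subject": {"Data": "Your transcript"},
            "Body": {"Text": {"Data": transcript}},
        },
    )
    return {"status": "ok"}
```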

But anyway, let’s jump back to your screen and get out of that.

And the marker was a little invisible.

People watching.

Oh, OK.

Sorry.

Yes. All right, so again: why move away? AWS cost was one of the main things. Looking at Cost Explorer, right now this thing was running just over 60 bucks a month, and $15 of that is hsv.ai. I have another Lightsail site up for some photography stuff I do. And another fun fact I just learned: if you have a Lightsail instance, even if you turn it off, they still charge you for it.

So I had three.

One had been off for months because I migrated from one version of WordPress up to another, and I found the best way to do that is to keep my old instance while migrating, so I have the old one to turn back on if it gets screwed up, which a lot of times the first shot does. So anyway, that was fun. But if you actually look at the breakdown of this: load balancing, $16 a month; the actual ECS was running up there too. And again, when I first set this up over two years ago, these costs weren't this expensive, but over time it starts to climb and climb.

You don’t really notice it unless you actually take this graph and break it down and say, hey, show me three years worth. And it’s like, oh my gosh, this used to be $25 a month.

And again, none of this is super, super expensive once you get down into the SNS, the SES, the S3, the Lambda stuff, you know.

But just paying for the... oh, the ECR, that's because I had a bunch of images up there.

But even that's a couple of bucks. So again, we're not talking about huge amounts of money, and it's not like I have a ton of people using this at the moment. I use it weekly. I have a couple of other people who hop on and use it when they have something and don't have an easy way to do it, though most places now, you have easier ways to do transcription. Yeah. So when you look at your Elastic Load Balancing, you know, it never drops below about $15 a month.

Is there a way you can bump up a static feed and pay less on the load balancing?

Not that I can find.

And they won't let you do it without one; I had to have the load balancer to actually get through to the internet gateway. I can't just go directly, I know. Or maybe you could; I just couldn't figure it out.

You could go directly through API gateway?

Maybe.

If I did that, I couldn't get to some of it. It could be because I was running Fargate at the time for the service, and it really wants the thing that kicks off another service to be in the same VPC.

And honestly, it might only be a couple dollars saved a month.

I was just thinking if you were pushing something bigger through here.

Right. Yeah, the other thing is, removing all the complexity was really nice. That was... trying to remember where I'm at. You pay per compute unit that you use, so as long as you have it up, you'll be paying, and that's why it's virtually the same every month. Yeah. Like you said, the convenience is worth a few dollars a month. I thought, you could pay for a whole virtual server for that much and put anything on it, instead of having this much for that, this much for that.

The other thing was GPU quotas.

I have fought with Amazon for I don't know how long. I really fought.

I just gave up, because you ask for a quota increase for this or that, and I'm not even at an account level where they even reply to it. It's like, that's great: you have GPUs, but nobody can use them. Or no ordinary people can use them. So that's why we kind of shifted towards, let's see if I can run this in a Lambda and actually do something unique and really, really cut my cost.

With RunPod, you just go figure out what kind of GPU you want to run this on.

And there's a massive selection.

We'll get into the endpoint stuff in a minute. But the interface is similar. Running with a Lambda, you just define a function that takes an event, which gets converted into a dictionary by their service, and a context, which I haven't found anyone using.

Yeah, there’s things in it.

I don’t know if any of it’s useful.

I can’t find examples where they actually use the context.

So it’s just kind of, anyway.

The RunPod version of that just gets rid of the context.

You still get an event that winds up being a dictionary.

With the exposed API, the exposed endpoint, you just pass it a JSON payload that gets converted into this event that you wind up receiving on the inside.

Super easy.
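Side by side, the two handler shapes come out roughly like this; do_work and the payload keys are placeholders rather than the real code:

```python
# AWS Lambda style: the service hands you an event dict plus a context
# object (which I haven't found much use for)
def lambda_handler(event, context):
    return do_work(event)


# RunPod serverless style: no context, and the JSON you POST to the
# endpoint, e.g. {"input": {"bucket": "...", "key": "..."}}, shows up
# under event["input"]
def handler(event):
    job_input = event["input"]
    return do_work(job_input)


def do_work(payload):
    # Placeholder for the actual transcription logic
    return {"status": "ok", "received": payload}
```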

Not hard. Let me drop in and jump into an endpoint configuration to kind of show you what this looks like.

So serverless.

Right now I’ve got, and this was another interesting piece. Let me see if I can make this go away.

Which you can’t see, but there’s a window in front of half of what I’m looking at.

You can deploy the same image; it was so easy to deploy stuff.

I went ahead and created two of them. The stuff I've got has a couple of environment variables that it uses for S3 buckets, which table to use, and everything, and I've just got a different environment file for test versus production. So all I did was load the same Docker image up here and register two different endpoints, one to use for testing and one for production; they just have different parameters, different environment variables, things that set up what they're doing. If I look at the development one... let's see, workers are actually managed... let's go to edit first.

So pretty much, right now I've got it set to pick a 16-gigabyte GPU if it finds one first, and otherwise hop up to a 24 gig.

If you want a 180 gig, you can do that too. It's just kind of crazy what you can do, especially at the level that I'm at.

I’m not running multi-GPU workloads or anything like that, you know. You could probably do something like that with this if you wanted to.

I’m just not quite sure how.

So I’ve got… Right now I’m at 16… I don’t even know how to calculate that.

Oh, it looks like a penny per minute for 16 gigabytes.

Sounds kind of crazy. Yeah. That’s not… I know.

So… There’s, so you can pick what kind of things you want. You can tell it how many workers. And you can’t just spin it up when the server gets it.

Too bad you can't just spin it up when your server gets it, when somebody wants something. That's exactly what this is doing. It's not idling, costing you anything. No, it's not. At all.

The active workers part, there's some... we'll talk about it in a minute, but there's the cold start issue: somebody hits it, and now I've got to have a service that's ready. And this is the other part I'll talk about, from a security perspective as well: these are running all over the world. If you've got a GPU cluster or something, you can go connect it up to RunPod and they will use your cluster to run their stuff and pay you for your compute time. Of course, seeing what the prices are, they ain't paying you much.

But if you happen to have idle compute time that you're not using for anything, and you have to keep the thing powered on anyway, you might as well do something with it.

But just know that your data and all of the stuff that you’re putting into this is running on someone else’s machine with who knows what kind of security in place or if they have anything like that in place. And we’re not only talking about cyber security, we’re talking about physical security. This could be running in my basement for all you know. With no VPN or anything.

Right.

Anyway, you can set up for how long you want things to stay up after a job finishes.

In other words, if you are actually running under heavy load and you want to keep things from idling down and being spun back up, you can actually tell it; you know, you can tune all of this.

As far as the GPUs, those are all NVIDIA GPUs?

You can be fairly specific as to what kind of GPU you have to have.

The ones that I’ve seen are all NVIDIAs.

I was just curious about that, because sometimes some of the models don't like certain GPUs; they don't run well without having something between them, and then they don't perform well.

Yeah. And what I've got in the code for this transcription service: Faster Whisper provides a way to do batched inference. That's why it's so freaking fast: it chops the audio up and then shoves it all through at the same time. I've got a batch size that's based on a query for the available RAM on the GPU, and I use that to adjust the batch size. That way, whatever I'm running on, I get the most out of it, because if I'm going to pay for the bigger card, I want the least time.
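Roughly, that batch-size adjustment looks like this sketch; the VRAM thresholds and model name here are made up, not the exact values in the service:

```python
import torch
from faster_whisper import WhisperModel, BatchedInferencePipeline


def pick_batch_size(default=24):
    # Scale the batch size by however much VRAM the worker landed on
    if not torch.cuda.is_available():
        return default
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 40:
        return 64
    if vram_gb >= 24:
        return 32
    return default


# Batched inference chops the audio up and pushes chunks through together
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
pipeline = BatchedInferencePipeline(model=model)
segments, info = pipeline.transcribe("meeting.mp3", batch_size=pick_batch_size())
text = " ".join(segment.text for segment in segments)
```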

Right.

I'm at least going to get it done faster, and maybe, you know, it might actually be done for the same cost.

Not quite sure.

Or less, because it took fewer seconds, maybe. Right. Environment variables: these are all pretty easy.

So we’ve got the name here.

RunPod lets you set up secrets, which apply across your whole account or your whole project, and then you can reference those in any configuration you want.

So you’ll see anything that’s got an actual key in it is secret.

Anything that’s just dumb like the name of a bucket.

I don’t really care.

Yeah, things like that. Batch size starts off at 24 and then adjusts based on which kind of GPU I'm on.

Docker configuration: that's the image I'm using, and what kind of disk storage I want.

You can actually specify what kind of CUDA versions you allow.

you know, things like that. That’s the kind of thing that pretty much tells me I’m running on an NVIDIA GPU.

You could set up, there’s models at the bottom, right?

What kind of GPU types you want to allow, which data centers you want to pick from. You know, if I wanted to only run this in the US, I could do that, assuming I trust that when they say US they really mean US.

So, that kind of thing. It wasn't really hard to set up; again, I got there pretty quick, a lot faster than I expected anyway. Environment variables, we covered those. Cold start times.

When I was on AWS, either with Fargate or Lambda, I mean, we were at about five minutes. That was part of the reason for using SES to email you back: you just kick it off, you shut it down, and then you get an email about 30 minutes later with your stuff in it.

I didn't really see too many people just sitting there waiting on it; it's not like, I've got to have it now. Email it to me later. But now, with the way you can do some things with the cold start, you can really tune it. There are some things where, if you have sporadic use like I have, you're just going to have to deal with it, because it's going to have to load the Docker image, which can sometimes be a bit much. Their documentation has you capped at a 10-gigabyte Docker image, which is still fairly large until you start loading models inside of your image. I have found that that 10 gigabytes is not a hard limit.

So what I’m running is a little over 10.

I think it’s over 10.

I’ve done ones over 10.

I’m still playing around a little bit to see because right now the image spins up and then reaches out to load the model from Hugging Face and then kicks off.

which costs a bit of time on the initial spin-up. But then you've got the offset of that.

If I put the model in the Docker image, that's going to take longer to load the image. But if we trigger the keep-them-active kind of thing, at least I'm not pulling the model across every time. I'm trying to think how many gigs the actual Faster Whisper model is.
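If I do bake the model in, it's basically a couple of lines at image build time; something like this, run from a Dockerfile RUN step, where the model name and path are just examples:

```python
# download_model.py, run during `docker build` so the weights live in the
# image instead of being pulled from Hugging Face at every cold start
from faster_whisper import WhisperModel

# download_root pins the model cache to a path that ships with the image
WhisperModel("large-v3", device="cpu", compute_type="int8",
             download_root="/models")
```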

So there's that. If you actually go look at workers, you can see... actually, let's look at metrics.

Oh I’m on there.

I'm on the list. Yeah, let's go back and look at the other endpoint, for production, because this is the one that's actually hooked up, that I've been playing around with lately. Hourly... I think both runs were probably in the same hour. So I ran two different pieces, or two different... what's the right way to say that?

Two different transcriptions already. Scale strategy. Let’s go look at workers.

So right now it's got two that are kind of active or idle, and then it's got three more that it'll go ahead and spill over to if those two are in use.

It’ll tell you what version of your, you know, of your container is actually loaded.

Let’s play around this a little bit.

Let me start over.

I will need to remember to change this afterwards. So now, this is what you see in the email every time. This is free for you; it costs me like five cents, so don't worry about it. But I will have to change this after I post this online. It's already set up so that if you don't have the authentication token and you enter your email, and it doesn't find you already in the database, it'll give you one for free, just to let you try it out. langley@hsv.ai. Then I'm going to pick a file.

What I'm going to upload is actually from when Tom did the NVIDIA certification talk. Right now, this takes a minute because I'm on some not-great Wi-Fi, as some of you know. Okay, it's starting the process, so we'll flip back over to RunPod. Excuse the stalling; it must be the Wi-Fi. So I saw one worker already pick it up, and it is running. So we're going to pick this guy.

It gives you some actual info; you know, it's an A4500.

I can look at the logs and see what it’s actually doing.

Currently it's loading a model, loading the paragraph model. The next thing you'll see somewhere in here is: okay, grab the file, pulling it from S3, starting the transcription. It should pop up and tell me... and I've got another window in the way here. I'm giving up on that anyway. It doesn't show up on your screen, but there's a thing in the way that I can't see. Okay, so it knows it's got almost an hour's worth.

So from the cold start perspective, from when I hit upload to when I'm actually looking at logs of this thing running was about as fast as I could get over there.

It should tell me that it's... okay, so it's English, probability 99 percent.

the VAD filter removes some silent parts.

Go back to telemetry. I'm not sure if this is actually... I don't know if it's already done. Yeah, the automatically-flipping-through part, I think it updates every 10 seconds. I'm going to get rid of this whole thing.

But another interesting thing that I haven’t seen before, if you want to SSH into your container that’s running on somebody else’s machine, have at it.

I thought that was kind of odd.

Landon would never let me do that. But if you're troubleshooting something, and you want it to spin up and wait on some other kind of thing so you can actually hop in and inspect what's going on and check what's what, that's a thing. Oh, it's super helpful. Yeah. Is that because of that double VPN bandwidth sharing?

No, I meant you'd get off your subnet, maybe. Yeah, I don't know. Anyway, this thing is already... I think I messed it up somewhere. One of the things I'm looking at now... I think that actually did it.

I should see it if I go back over and find the right email address.

Nope, not that one.

I swear Zoom always puts stuff right over the top.

It’s probably gonna want my… Anyway, this is pretty much it. If you hit the little Gemini star, you can do summarize. I can. I actually used to have a summarization piece built into this that would hallucinate crazily.

and would summarize things and then go add additional info; it was nuts.

Gemini is a little better, actually a lot better.

Let’s just say that you have the little star next to the gear so you can hit it, right? And it’ll summarize what’s on the page.

Yeah, it should.

I’m even going to type it wrong. Yeah, me too. It figures it out. Oh, come on.

I do have access.

Oh, down at the bottom, share current page. Click.

Now do it.

Switch account to current page. Oh, OK. Because I couldn’t see your page. No, this is a weird thing. I’m logged in right now as my personal account. And this is not a personal email address. This is a different workspace.

But you usually can’t see it unless you’ve already shared it with Gemini where you hit share page.

It'll just say it can't. I always ask, can you read the page I'm looking at? No.

I’m going to hit share page.

And it's like, okay, here's what's on your page. Yes. I've been using it pretty well, except...

It may be sharing your Gmail with Gemini on the same account. It probably is.

So you never have to answer or say anything. Right. So, I am using a different model to figure out where the paragraph breaks are.

So all I get is a bunch of text, and what it's doing is loading each sentence in and asking: is this sentence a different subject than the previous sentences?

And if it's over like a 70% chance, I go ahead and insert a paragraph break.

And it’s fairly good from what I’ve seen.
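What that paragraph model boils down to is roughly this sketch; I'm assuming sentence embeddings with a similarity cutoff here, and the model name and the threshold are stand-ins rather than the exact ones in the service:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice


def add_paragraph_breaks(sentences, cutoff=0.3):
    # Compare each sentence to the previous one; when the similarity drops
    # below the cutoff, treat it as a subject change and start a new paragraph
    embeddings = embedder.encode(sentences)
    paragraphs, current = [], [sentences[0]]
    for prev, cur, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        if cos_sim(prev, cur).item() < cutoff:
            paragraphs.append(" ".join(current))
            current = []
        current.append(sentence)
    paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```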

So that was the endpoint part of it.

Let’s see what else I had on this.

So cold start times, interesting.

From a RunPod perspective, they have a really, really active Discord server with probably a couple of thousand people on it live at any given time.

If you've got questions about, hey, I'm trying to do this, odds are high that you're not the only person out there trying to figure out how to run that particular workload on RunPod.

The other thing is that they have a lot of pre-built workers.

If you wanted to set up an endpoint that runs Gemma 3 70, there's already one that does that.

You just follow the tutorial, you set it up, and now it gives you an API, and you take your RunPod key; you have a token that you can use to give you access and keep others out. There are a couple of things already spun up. The first time I think we hit RunPod was for ComfyUI, doing, I think it was video generation, back in February or something this year when we did that one. We were looking at Cline and some of the others and had a self-hosted model in RunPod using that; they have that out of the box now if you want to play around with it. So a lot of your open models, you could do with that. I'm kind of curious to see what it would be like trying to host an endpoint here and then hit it from a chat server or something.

Like if I wanted to put something on HSV.AI as a, hey, ask us a question or something and then bounce it across kind of back to the RAG type thing.

But again, then I'd have to figure out a database, you know; you're back to some of the same stuff. So if I go back over to RunPod, you can see I still have $25 available.

Can you start for free?

Huh?

Can you start for free?

A little bit, but not much at all.

But again, you can like load five bucks to it.

I mean, it’s.

So you can set it to automatically add $25 or whatever when it gets below a certain number.

They don’t have a top level budget and my first rule is never ever set up an account without a budget.

A hard budget, as in, shut the whole thing down.

I was gonna say, I'd like it to run out of money and have everything quit. I know. I know. The card it's connected to has a low limit.

Yeah, that's the way you control it. Right. So currently the only option is to just watch that, or have some kind of alert set up so it tells me if it gets low, and then I go add more to it, because I refuse to have it auto-do anything as far as money goes. Another shortfall or negative: moving the UI over to DigitalOcean, it was really easy to find a Terraform setup that goes and says, hey, set this up, make it available at this address, load this image, and go.

And it did.

I can't find anything close to that with RunPod.

There are API things, but it's not quite as easy as, here is a template or a file that I control in source control. And, you know, you saw how many environment variables there were; I had to do each one of those by hand.

It sucked. You know, the longest part of the setup was actually entering all of those environment variables to do all the things. And then, logging.

This one gets interesting.

I know there are a lot of services you can use for this stuff.

I've got logs broken out by worker ID and all of this, but it's all hosted on RunPod.

I'm not quite sure how long they keep them. It's not really good for searching; it kind of works. But if you're running into any kind of issue and trying to troubleshoot something, this sucks pretty bad for figuring out what's going on. With Amazon, I was using their CloudWatch service, which I don't have on the board here, but it was yet another thing. And of course, you've got ephemeral containers that come up and go away later, so capturing the logs and keeping those for traceability matters, especially when I'm accepting payment and doing things like that. If somebody needs a refund or whatever, I need to be able to actually walk through and see what happened. First of all, yeah, keep your refund, I don't care. But for myself, I want to trace it back to see what happened. Let me see what Charlie had in the comments. Yeah, that does sound right: DigitalOcean actually has a logging kind of thing that they do. I may want to put both in the same place and figure out how to go that route; that way my UI parts and my other stuff are in the same place.

Let’s see what else.

And that was it.

But, going from something that was... first off, going from Fargate to Lambda wasn't that big of a deal at all. And now, going from Lambda over to a RunPod endpoint was super easy. It was a little weird.

I thought, this can't be right. You know, it's got to be harder than this, but it really was that easy. Any questions, or any cries of heresy, or things I should have done differently, or anything?

The Docker image was similar to what you deployed before, right? It's what you were using for Lambda when you moved it over to RunPod?

It ended up being virtually the same.

Close. Was there any weird RunPod-specific stuff? Because I know for some other services, they have specific container images or specific things you have to add around it.

The weirdness was getting rid of all the crap I had added to run in Lambda. To run in Lambda, they make you use a read-only file system, so you have to get all your stuff in the right place, except for /tmp, which you can write to. So you've got to figure out how to get all your environment stuff, like downloading a Hugging Face model; you've got to set the thing up so it all gets shoved into /tmp, or it'll die.
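Concretely, the Lambda workaround amounted to pointing every cache at /tmp before anything tries to write; a sketch, since the exact variables depend on what your libraries cache:

```python
import os

# Lambda's filesystem is read-only except for /tmp, so push every cache
# there before importing anything that wants to download model files
os.environ.setdefault("HF_HOME", "/tmp/hf")
os.environ.setdefault("XDG_CACHE_HOME", "/tmp/cache")

from faster_whisper import WhisperModel

# download_root keeps the Whisper model files under /tmp as well
model = WhisperModel("large-v3", device="cpu", compute_type="int8",
                     download_root="/tmp/models")
```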

It was easier to troubleshoot things with RunPod, because they give you... your main... actually, let me show you. Oh, let's get out of here.

Shoot. So we were playing with Google's cloud the other day, and they give you like $300 to start with.

I have no idea what their costs are after that. So it’s kind of like a, come on.

Sure, I’ll give you $300.

Just give me your credit card number.

Yeah, I'll show you what that looks like. And of course, RunPod's tutorials are really, really good.

I haven't come across one yet where I didn't just type everything in like they said and have it work.

That’s pretty cool.

That’s very rare today.

I know.

So your main basically calls runpod.serverless.start, and you tell it your handler function.

You can also run this if you have the RunPod SDK installed in your environment.

You can run this locally.

Instead of it being in a Docker container, and it will do the same things it would do if it were Dockerized.
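Stripped down, that main is just a few lines; the handler body here is only an echo placeholder:

```python
import runpod


def handler(event):
    # event["input"] is the JSON payload posted to the endpoint
    job_input = event["input"]
    # ... the actual transcription work goes here ...
    return {"status": "ok", "echo": job_input}


# Hand the SDK your handler and it takes care of pulling jobs. With the
# runpod SDK installed, running this file directly exercises the same
# handler on your own machine, outside of Docker.
runpod.serverless.start({"handler": handler})
```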

And then, after you spin it up as a Docker container, that one was a little bit weird, because they add some of their own environment variables and some other stuff to do their hosting. It was still a whole lot easier than trying to troubleshoot Lambda running Docker stuff. Just for the people who are not as Docker-literate: the main machine is one big container, and then everything underneath that is a smaller Docker container, so they can move machine to machine and things like that around AWS. The container is built so that... I'm trying to think of the right way to explain this. So it's like Docker in Docker. It's like their version of... they could be running this inside of Docker.

They could be running this on bare metal; I have no way to know. Right, because it's all virtual. You don't know. And I don't care. And they just run their version of another Docker instance, keep you in that Docker instance, and then charge you based on what you turn on and turn off.

Kind of, sort of, yeah. It's all kind of built off of the same idea. Kind of back to the original ISO shipping container.

There's a book called The Box that's pretty interesting.

You know, shipping was changed almost overnight just by the adoption of the ISO container. All boxes are the same.

Well, you got different kind of size boxes, but everything is the same. They have the same connection points.

They have the same, you know... and lo and behold, you've got container ships, and now you've got containers on trains and containers on trucks. So that's the physical version of it. This is just the virtual version of the same thing.

I define it, it's got a known set of endpoints, entry points, exit points.

It works this way.

Well, if you want to run it on a Mac, you run it on a Mac; if you want to run it on Linux, you run it on Linux, right?

Yeah. Can I just get confused between KVMs and dockers and all the different ways of making different kinds of containers or different kinds of companies? Oh yeah.

Right?

And it's like, well, it could be a KVM running Docker underneath, or it could be... right. This could be running in Docker.

It could be running in Kubernetes.

It could be running. I don’t even know. Yeah.

Like you said, you don't really care, because as long as your section works and you can control your section... my section, that's the other thing. That's why I also need to make sure that before I exit this container, I delete all of the stuff that I have gone and created, because the drive it's on could be mounted somewhere locally. I don't want to leave stuff laying around if I don't have to. There's not a charge if you leave it laying around? It's not a charge, but you don't want your transcription out there. It's a security problem, I understand. Yes, that's the problem that bugs me the most. I mean, if you hit the transcription website, it tells you: I don't keep anything. You can use the old data address trick to slow everything down.

I also don't want to pay money to store stuff that I don't have to, when it doesn't actually benefit me any. So it's been pretty good. Let me see if there are any questions online.

It does seem like you could keep your models on your own drive if you set that up. All right, well, with that, I am going to stop the recording as soon as I figure out how again.