Faster Whisper

Transcription provided by Huntsville AI Transcribe

So just about everything we’ve done is on GitHub.

You can see like the previous sessions we’ve done this year going back to 2018, but it does get a little sparse the further back you go. When we first started, I think we met like five times the first year, and then eventually we got into a rhythm. So what we’re talking about tonight is something called Faster Whisper. If you don’t know, we actually have a transcription service. And of course, if you are on here and watching this live, if you do drop in an auth token, it will let you transcribe stuff for free.

So how about it?

If I start to lose too much money that way, I will change the auth token.

So I also follow my own advice and there is a quota on that budget.

We’ll not exceed what I’m willing to give away and stuff. But anyway, it’s a quick and dirty transcription service that uses Whisper from OpenAI behind the scenes. So when this other library called Faster Whisper dropped, it got my attention. Josh, you might have been the first one to mention it to me. As in, hey, this might be a thing, which half of what we do in AI these days is hey, this might be a thing. A lot of times it’s not, or a lot of times it is until the next thing comes along and it’s really hard to catch up. The most interesting thing I came across is this library itself: it’s built around the Whisper model from OpenAI, and it does something similar to what we were doing with llama.cpp, where it has a quantization.

It is a little more stripped down from what I’ve been able to see, because with llama.cpp I can run an 8-bit quantization, or a 5-bit, 6-bit, 4-bit, doing a whole bunch of different things. This one seems to be a wrapper around CTranslate2, but it seems to be a little more opinionated.

It did some work and it gave you like three options to choose from, which is good for me because I don’t have time to research the 800 parameters we could go into with some of these models.

So let me jump over to CTranslate2 real quick. I don’t know if I dropped that in as a link.

I’m not quite sure who OpenNMT is as far as the organization goes.

The interesting thing here is that they cover a wide range of models.

It’s not just the GPT style; between the encoder-decoder models and the encoder-only models, there’s all kinds of things.

So they’re fast, efficient.

So this might be where Faster Whisper was limiting you to only a couple of types, because it seems like CTranslate2 itself limits you to certain precision levels. I haven’t done a whole lot of work to figure out how much of that carries over. One of the things that llama.cpp would do is it had ways to quantize different parts of the transformer in different ways; I’m going to screw up the terms, but it would do a couple of layers at like a five-bit quantization and some others at a four-bit quantization, and find that it didn’t lose the additional precision, or accuracy I guess is the right word there. Doing things like automatically figuring out if you’re running on CPU or on a GPU, that was kind of neat. These models are, again, well, they’re quantized, so they are smaller.

If you go look at the Faster Whisper page where their repo is, you actually can see some of their memory usage; they’ve done some comparisons between the OpenAI Whisper and their version using 16-bit float or 8-bit integer, along with some timing.

One reason I do kind of like what they were doing: they actually spun this thing up on a specific AWS instance, limited to just that, to get more of an apples-to-apples comparison, instead of, you know, hey, we ran this on some super specific hardware that you can’t replicate. It’s hard to follow up sometimes on things like that.

They even link to the actual git commit.

I’m going with the test that’s front.

I’m taking a poll.

It’s open.

Yeah. That was an intelligent session. Oh, this is an artificial, this is actual intelligence. Natural intelligence. What’s beam size?

I was going to open that, but I can actually cover that right now.

Beam size.

Beam size. If you’ve ever thought of a breadth-first search: I’m looking for the next token, because a lot of these are generative models.

So what it’ll actually do is, instead of just generating one option and then moving to that next option and generating the next one, if the beam size is five, it’ll actually generate five different options and then five off of those.

And then five off of those, and then figure out which one of those beams it wants to keep, instead of just looking at the next token and evaluating whether that one token makes sense.

Now I’ve got these five different phrases, and it’s a whole lot easier looking at a phrase and saying, does that actually make sense, you know, than just one word at a time. A bigger beam size will cost you more in performance at that point.

OpenAI’s Whisper model by default, their beam size is one. But you do have a decent amount of difference if you run a beam size of five. I’ve been playing around with five versus one. And we’ll jump into that in a little bit.

I’ve tried a bunch of different options on this with the different models.

All from the exact same recording; I think it was the February AI talk we did where we talked about the different competitions, like AIcrowd versus Kaggle versus, you know, some of the other stuff. I’ve got a 30-minute segment of me talking, which is bad enough. And I ran it through all these different models, and I’ve got the transcriptions in text form in a way that I can actually bring them up and compare.

One of the things I found: sometimes there’s only one in particular that will get the word Kaggle correct.

The others, one of them had it as like champion or caption or something. Anyway, it’s been interesting.

Especially when you get into the areas of language that aren’t common terms in the text these models were trained on, which half of the AI stuff falls into that category sometimes. But that’s beam size.

They’ve actually got some others. I haven’t played with the distilled Whisper yet. But of course, I also don’t have CUDA here, you know; I don’t have a 3090, I have to use something from work for that. I was trying to find again the installation.

Super simple.

It was easier than when I was pulling in llama.cpp, where I had to tell it what version of CUDA I had, what version of this I had, and then pull things separately to use it.

Basically you import it, you tell it what model you want, what device you’re running on, and whether you want to use a 16-bit floating point, or you can use an int8, or there’s actually another one, int8_float16, that I’m still not exactly sure about. But that’s how you get it to use int8 while on a GPU.

So that was really efficient because it has to take it down and then put it back up.

It was weird. So you can give it a beam size and tell it to transcribe the file. I’ve so far I’ve used MP3s.

I’ve used, there’s an M4A audio file I actually download from Zoom. There’s an M4A from the video as well. I’ve tried all the different things I’ve just got laying around and it’s been able to just rip through all of them. The interesting thing here that was different is it will give you back segments and info, and you can actually grab what language it thinks it is.

You can grab the probability that guess was based on.

It doesn’t actually start doing the transcription until you start iterating over the segments, which was weird for me, but it worked pretty well, because I was trying to print my... anyway. So apparently with int8_float16, the weights are int8 but the compute is done in float16, or something along those lines.
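Roughly, the basic usage looks something like this; a minimal sketch based on what was just described, with the file name as a placeholder:

```python
from faster_whisper import WhisperModel

# Pick a model size, the device, and the compute/quantization type.
model = WhisperModel("small", device="cuda", compute_type="int8_float16")

# transcribe() returns a lazy generator of segments plus an info object.
segments, info = model.transcribe("meetup_audio.mp3", beam_size=5)

print(info.language, info.language_probability)

# The actual transcription work happens as you iterate the segments.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```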

Is that what you’re doing?

Yeah.

I know it was faster because I tried both.

Let’s see.

I haven’t tried the actual multiple languages.

So multi-segment language detection.

I haven’t tried that. The batched one I stayed away from, just initially, because the license is different, and it was a little weird as to what you’re pulling in that’s different that would make them have to change the license. Probably whatever this VAD model or something is, is probably a BSD license or something, and they had to license it that way. Let’s see. And again, I didn’t have time really to get into the actual distilled piece. You can get word-level timestamps, which is a bit interesting. I may play around with that.
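The word-level timestamps are just a flag on the transcribe call; a quick sketch:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

# word_timestamps=True attaches per-word timing to each segment.
segments, _ = model.transcribe("meetup_audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print(f"[{word.start:.2f}s -> {word.end:.2f}s] {word.word}")
```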

One of the things I’ve been looking at for the transcription is specifically for the meetups that we have now. I found that the ones I’ve got with a transcription that I throw into the post on the website wind up getting hits, because certain words happen to be on that page or whatever. You could easily also flip that into some type of a search engine type thing, where if you’re looking for a term, not only can I tell you where it is on a page or whatever, I could actually do a link that jumps into the video at exactly that point in time.

So it’s kind of fun to think about.

I’m still not quite sure.

Oh, the VAD.

Okay, I know what that is. We’d run across that sometime when we were looking at the, gosh, I already forgot what they called it, the stems thing.

We were trying to do that.

Gosh, it was an audio.

It was an audio competition to try to take an audio file and convert it and pull out the drum track, the vocal track, the keyboard track, and other. Again, we were doing okay to start off with, and then I found out we were pushing like one submission a day to this competition, because it takes so long to get all this stuff trained and whatnot. Some of the other folks competing were pushing like 28 times a day. So, you know, they obviously had some beefy horsepower somewhere. It was pretty clear after like the first week that there was not a chance to keep up.

Anyway, they covered some logging things, which was nice. I don’t know if you’ve ever tried to pull in just a Python library and all of a sudden your logging output is filled with stuff from the library.

Just imported, without you really wanting it in there. So it is nice of them to give you a clue as to a good way to say, hey, I want this library to log at the debug level. I think I’ll do that.
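That looks something along these lines, configuring the library’s own logger:

```python
import logging

logging.basicConfig()

# Turn up (or down) just the faster-whisper logger instead of everything.
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
```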

So anyway, one of the things that I was going to do was pull up a little code snippet that I’ve been running.

Let me see if I can get this to where it’s readable.

Maybe.

Oh, it has to go.

Okay. So this is about 15 seconds ahead. Drop it out. And then let me make this small. So you can forget the time limit.

It’s not a problem.

It’s a problem. It’s not a problem. Then let me make this small. Okay.

You can ignore the bit up at the top; that was something else I was working with. I found a way, now that this particular transcription model gives me a segment of text at a time, to improve the transcription website. Right now it transcribes the whole file, takes the transcript, and gives that to me when it’s done. With this, I can actually stream text back to the website as it’s transcribing.

So that’s an interesting thing. I’m using an Amazon SQS queue.

Which sounds weird when I say it out loud, anyway.

It’s hard to pronounce and sound like... anyway, whatever. It’s a queue, topics, well, you know, a producer-consumer type thing. So I had that in here, playing around with it. But that’s not really the topic of tonight. I also played around a little bit with this: you can have Faster Whisper use a certain number of CPU threads if you want. By default it uses whatever you have available. So I’ve actually gotten slightly better performance by restricting the number of CPUs.

Probably because I’m also running other stuff at the moment on this laptop. So maybe by making sure it stays a little constrained, I may have actually removed some resource contention it was running into. So I was playing around with that some.
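That limit is just a constructor argument; a quick sketch, with the thread count picked arbitrarily:

```python
from faster_whisper import WhisperModel

# Restrict the CPU run to four threads instead of everything the laptop has.
model = WhisperModel("small", device="cpu", compute_type="int8", cpu_threads=4)
```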

So I’ve got a file from, again, 2024-02-08, which would have been... yeah, that was the one where we were talking about different challenges.

I was playing around with the different model names that they’ve got. So, similar to Whisper from OpenAI,

you’ve got a tiny model, a base model, a small model, mediums, and large.

And they’ve got three different versions of large, because they’ve been kind of tuning it over time.

Each one of those is a different size on disk, and each one of those is a different size network.

I’m not quite sure the parameter count for each one.

I should have probably looked that up.

So anyway, I grab the speech file, and then I’m also logging the transcripts using the model name, device, and compute type in the file name, so I can naturally compare them after we’re done. Then of course I call WhisperModel.

Tell it to transcribe.

I’m currently using a beam size of one across the board to stay consistent.

And then doing basically the same stuff that was in there.

You know, like on their main readme page.

You’ll see something a little different where I’m grabbing the date-time stamp and then doing encoding.

This is something I was doing to actually generate a key I could use for that SQS queue.

So you don’t have to actually worry about that; I had it commented out for the timing runs I was doing. So I’ve got it opening my log file as a file.

And then of course it doesn’t actually start doing the transcription until you actually start iterating over segments.
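Reconstructing roughly what’s on screen; a sketch, where the file names and the model list are assumptions, not the actual script:

```python
from faster_whisper import WhisperModel

audio_file = "2024-02-08-meetup.mp3"
device, compute_type = "cpu", "int8"

for model_name in ["tiny", "base", "small", "medium", "large-v3"]:
    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_file, beam_size=1)

    # Encode the run settings into the file name so the outputs are easy to compare later.
    out_name = f"transcript_{model_name}_{device}_{compute_type}.txt"
    with open(out_name, "w") as out:
        for segment in segments:      # the transcription actually happens here
            print(segment.text)       # echo to the console as it goes
            out.write(segment.text + "\n")
```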

I’m actually printing out the text. I’m printing to the console, I don’t know why I’m not using the logger, but printing to the console as I go and then writing into the file. And then that’s it. So let me go back and check what I was using, because I’ll show you another thing I ran into as well. So let me go CPU. And we’ll actually do...

Let me go with the base.

English. They also have English-only versions of some of these that are slightly faster. Of course, I am speaking English, well, mostly, and it’s also a little more accurate. So if I run this now...

It’s actually a wrapper around that CTranslate2, and it will also reach back to Hugging Face and pull down whatever model it is for you, which will get you into trouble in the next thing I’ll show you.

So.

It’s.

Do you find some phones.

Yeah. Is this for debate last night. No.

No.

I may do that.

Oh, the time stamps from the metadata of the file or is this for handling the type sense.

So within the file, where that phrase was in the timeline of the file itself.

Okay. You’re doing, you’re doing audio.

So audio to text. I gave it an audio file and it’s telling me what was in it. And it’s telling you, like, say you have a White Zombie song, it’ll tell you the times and which one was on the CD? Yeah, which times.

What’s a train union a car train union there.

I don’t know.

Okay, we’ll go back in real quick.

Okay, we’ll go back and run it again; go ahead, kill this one.

So from what I can tell, this equates to some of the stuff I had done initially with the OpenAI version of Whisper: anything smaller than small

is pretty much unusable, unless you’re just looking for some of the words that were probably mentioned.

So small would be like, what, tokens?

The small is how big that model is, how big the network inside it is.

Okay.

And what’s interesting is the parts where you see the word capital instead of Kaggle.

If I read this and I go back and listen to my own session, it makes sense. I actually talk this way. Oh, it’s like those sketchy translations on the Instagram videos. Pretty much. Can this do live streaming? It’s close to it. It could, with the actual large, if you actually have a GPU.

And you’re running the large model, it can actually keep up with real time. Which is a little different from the OpenAI one.

Actually, I think it can go faster than real time with the large.

It just batches in 30-second chunks.

So, say, for a live one, one problem is it would be a little bit behind.

I’m not sure if there’s a way to do that with this one.

There is an add-on where they kind of operate that way.

Yeah, but yeah. So still, by itself it does not do that. Yeah, that’s what I was wondering about with the timestamps, because if you have a really long file, assuming you had enough compute, you could break that up into smaller files.

Yeah, it’ll let you do that. Yeah, it takes a spectrogram of that 30-second chunk and that’s how they do it. Okay, cool. So then, like the OpenAI API for Whisper,

you actually have to break up the file yourself and ship segments over, and you get text back.

So you have to do a lot of that, you know, on top. Let me kill this one. So, just to show you it works. The other thing I ran into, while I have this window open, I’ll show you what I ran into.

If I change this over to CUDA, which runs on the GPU, and I change this over to...

I forgot what the... what?

What is the...

Float16.

See the first thing I’m going to get is actually an error.

Which is written up on their issue board, but nobody has really quite gotten into fixing it yet, even though, I mean, yeah, there are a bunch of workarounds. So what I have to do is go figure out where this library is.

It’s in there somewhere. I just happened to have installed something that uses that particular version of CUDA before.

So now I can.

Now there’s better ways to do this, but this is the one I know off the top of my head.

So I’m going to do it this way.

I’m going to export the library path, LD_LIBRARY_PATH,

pointing it at that directory, and then back.

And I’m going to go back to where I was. Whoops.

And run the same thing again.
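For reference, the workaround that gets passed around for this, a sketch assuming the CUDA libraries came in through the pip-installed NVIDIA wheels, is to find those directories from Python and point LD_LIBRARY_PATH at them:

```python
# Locate the cuBLAS / cuDNN libraries that the nvidia-* pip packages installed.
# (Assumes those wheels are what provides libcudnn on this machine.)
import os
import nvidia.cublas.lib
import nvidia.cudnn.lib

print(os.path.dirname(nvidia.cublas.lib.__file__))
print(os.path.dirname(nvidia.cudnn.lib.__file__))

# Then, before running the transcription script:
#   export LD_LIBRARY_PATH=<those two directories, colon separated>:$LD_LIBRARY_PATH
```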

So now it should find that library and be able to actually load and get going. I probably lost an hour just trying to figure out what the heck. But it’s not the first time I’ve been in a mode trying to use Python that has C libraries underneath. So it’s a little tricky. The interesting thing, I believe this one probably has...

Let’s see if it actually picked up Kaggle.

Not quite sure where that was. No, it still thinks capital. One of the models, I don’t remember if it was the medium or something, actually knew the word was Kaggle.

The other cool thing about Whisper itself, that was different from the previous things we’d tried before Whisper came out: we’ve done sessions on DeepSpeech, we had done sessions on, I’m trying to remember what that was. We had George in the group, George is deaf, and we were working through some, you know, we wanted to do the lyrics thing where we were streaming videos and all of that. We were trying to basically build our own transcription thing because their video service did not include captions. So we were trying to build our own, and it was hard. It was so bad we didn’t really use it. I think he was on mute because he does that too. He did. Okay. There’s a video we have somewhere if you want to look at it.

Yeah. Right. Yes.

I always thought I could hear.

I couldn’t talk. No, he can’t hear. And that’s where life changed rapidly. That was the neat thing, because you could talk into the phone and he could read it, and the community thing, that’s how we all communicate. Oh, have you heard of Arab Boile?

No. I think it’s a Canadian. It’s got certain polls.

Yes. I have a few folks through from that.

That’s pretty funny.

So with that, I want to flip over to wherever I had this thing.

So I did some runs using the int8, the float16 on the GPU, and the int8_float16.

That’s in the text post: running the same file on the same laptop while watching college football games.

So CPU time, walking from tiny all the way up to the biggest, you know, your range is from 41 seconds on up.

And this is a 30 minute audio file.

Of course, like I said, tiny and base, pretty much,

you’re not going to get much useful out of them. The small English and the small are actually usable.

It’s close enough that if you’re willing to forgive the couple of words it got wrong or whatever, it’s fine.

And you’re basically nearly 10 times faster than real time as far as that goes. Even the large model on a CPU, while slower than real time, I still can’t believe it actually finished. I tried to run on the GPU on this laptop, which has like 4 gig of GPU memory, and it says no.

The medium ran fine.

Everything else, when I got into the tiny one, all I got was exclamation points. Something in there is just so screwed up. It’s like, why are you doing this kind of thing? Then trying to run it with the int8...

It actually wouldn’t give me anything on the two small models.

But the interesting thing for me was, even running at integer eight with the large version three of this thing,

I’m still running faster than, you know... and that was on this laptop.

There is no way I’m touching that with the actual OpenAI Whisper model, which was pretty interesting. And the thing that really got me was, when I was using the OpenAI Whisper model, I had to have transformers installed. I mean, I had to have a boatload of stuff installed; I had to have ffmpeg and the related packages to actually do some things before I sent it over to that model. And so that’s one of the reasons why I have that built into the transcription service that I have right now. There’s actually a repo for it. Also, if you join up on the Huntsville AI GitHub, this is not public public, but if you join the org, you can see it. It’s nothing super duper. So my actual requirements file looks something like this: botocore, ffmpeg-python, you know, I mean, it’s a fairly large list of things that I’ve got to have to run that, compared to Faster Whisper.

I actually had this in the container as well.

That’s my requirements file.

It’s also, and I’m not currently pre-loading the model, I ran into some issues which we’ll cover in a second, but it occurred to me that right now, when I have this thing running in AWS, I have a container shoved up into Amazon’s container registry. It’s around...

I think it’s around a gig and a half. Let me see if I’ve got that here.

Let me go back over and start Docker.

I just look it up. Where was that?

Oh, my bad.

This is where I actually let them stand by.

Okay. So the actual transcription model.

This is the, this was the big one.

You’re seeing it, it wasn’t a given. The one I’ve got right now for the Whisper one is the one that’s a gig and a half here, for Faster Whisper. The one I have tagged down here as transcribe, that’s the one that has OpenAI’s Whisper model in it; that’s over a four-gig image.

So when I got this thing and I’m using Amazon Fargate to actually deploy this thing and hook into it and spin it up when I need to.

Because that was the key that let me run the transcription service without paying a boatload of money. It’s not actually doing anything until you upload a file. When you do, it drops in an S3 bucket, it kicks off the container, spins it up, actually does the transcription, sends the file back, and then it all goes away. Which also helps from the security standpoint, because I’m not keeping anything, because I also don’t want to pay to keep the things. Spinning up that four-gig image in Fargate can take three to five minutes. You can imagine, as a user sitting in front of a website trying to get something transcribed, it’s like three to five minutes before it even starts to do the transcription.

It’s not great. But then again, it’s cheap. And so that’s where I was trying to find the middle ground, in that if you don’t need it like right now, it’ll send it to your email when it’s done.

You can kick it off and walk away instead of sitting there watching. So anyway, what got me: when this thing is now at a gig and a half, that probably spins up a whole lot faster. And then it hit me that, well, this is small enough, I could run this in a Lambda instead of actually having to run it in a Fargate task. Because if I run it in Fargate, now I’ve got to have a provisioned network to attach it to.

I’ve got to have it.

And because of that, I have subnets attached to it. I have to have security groups.

There’s like a load of stuff.

Just from trying to get this thing to run. If you actually want to see all of that, if you actually join up, all of that is in Terraform, where it deploys. I’ve actually got it hooked up now. Yeah, here’s all the things with Terraform, and it’s actually set up to run.

If I merge a change to any of these pieces back to the Dev branch, it automatically runs a pipeline, recomputes, does all the things, shoves it back up to AWS and restarts that service.

Yeah, I’d have to have myself cruising with Terraform.

Yes.

It’s a love hate relationship. I do like that I can run Terraform plan and it will tell me what, you know, what’s there, what needs to be there and all.

I use it similar to how I use git.

I do a bunch of stuff and I check stuff out, and then I do something similar to a git reset --hard. It’s like, I have no idea what I did to this thing; put it back like it used to be. It’s great for that. So I actually wound up putting this thing into a Lambda, which, amazingly enough, actually works. It actually worked pretty well. So I thought I’d cover that real quick in case somebody else is running into something similar. I can nearly see the trend here; there’s a couple of things they could do differently with Lambda, and it wouldn’t surprise me if they do at some point, because several years ago they had a limit on how big the container image you could use was; they had certain limits that they’ve changed over time. It would be interesting to see them change some of these as well. If they ever make a GPU available with Lambda, then we’re off to the races.

I mean, it wouldn’t have to be a big one.

They could take some of these old GPUs they’ve got sitting around somewhere. Yeah. What’s the key numerical thing that’s going on inside? Of which one?

What’s what’s happening?

What’s the key?

So the Fourier transform is a table lookup.

It’s a VNLP.

It’s a NLP counter.

It’s a in-carrier.

So this is going through a transformer’s block.

So it’s taking the attention over a spectrogram of 30-second chunks.

And then it’s deciding, it has two models, an encoder and a decoder that are both doing attention on that at the same time.

One at the raw audio going in and one for the tokens coming out.

And so the encoder is taking that into a latent space and then running it through all of its different layers.

And then the decoder is doing the same thing, out to whatever the tokens are.

Is it conceivable that could be put into an FPGA or a set of FPGAs rather than the GPU?

Could it be pre-incorporated into an FPGA? Because if you’re using 8-bit integer arithmetic, it’s very simple to do the 8-bit arithmetic operations in an FPGA. I think it’s the size of it.

As in how many of these you’re doing chain together.

That may be prohibitive to get an FPGA that would fit that.

Anyway.

If you have a specific use case, it would be interesting to see if you could fine-tune a tiny or a small model.

Or even to specific audio that you care about. Because you don’t need to know everything.

You just need to know some specific subset. You reduce the number of tokens in its vocabulary and fine-tune it.

You may be able to get better performance out of it. But it’s going to be very limited in use case in that sense.

It’s like an IoT thing or something like that.

For a tiny one, if it only needs to understand yes or no or page a person, then maybe it’s okay. When we had Scott here from Massive, Tashacore, I always mess up, there’s part of the name we always mess up. But he was talking about a need for having actually this kind of audio transcription, but on the space station. You can imagine the power requirement that would be. Is there a way to do this? Because they were trying to figure out how to take certain verbiage or certain language or certain things from astronauts and automatically detect whether there’s a stress level or, you know, think sentiment analysis on voice.

Instead of actually transcribing the text, you might be better off just going direct on the audio. Anyway, because I could say the same thing when I’m mad as I say when I’m not mad, just because I’ve been working in the corporate world for long enough to know not to say things that are in my head sometimes. So what I ran into is, even though Faster Whisper has a much smaller footprint, it’s still too big to run directly as a Lambda, even with the layers and all, because it’s still over their 250 meg size limit. So of course, they also have a way where, instead of using it directly, you just create a Docker image and spin the Docker image up, and it’s really not hard at all.

If you’re used to using Docker and used to building images and stuff. They even provide a large set of AWS base images that probably have most of the base libraries you’re looking for.

In my case, I went with, I think, a Python slim 3.10.

I think.

Let’s see where that was, up here somewhere.

Oh, 3.12.

And then my Lambda function, which we’ll cover part of in a second; I’ve got a lambda handler.

And my Dockerfile says, hey, call lambda_function.lambda_handler as the entry point. Not hard really at all, except for a couple of gotchas, which we’ll cover in a second.
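For context, a handler like that typically has roughly this shape; a sketch rather than the actual project code, with the bucket handling, model size, and key names as assumptions:

```python
# lambda_function.py
import boto3
from faster_whisper import WhisperModel

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Triggered by an S3 upload: pull the audio down, transcribe it, push the text back.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    audio_path = "/tmp/input-audio"      # /tmp is the only writable path in Lambda
    s3.download_file(bucket, key, audio_path)

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, beam_size=1)
    text = "".join(segment.text for segment in segments)

    s3.put_object(Bucket=bucket, Key=key + ".txt", Body=text.encode("utf-8"))
    return {"language": info.language}
```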

But the other thing you do when you create a lambda, you tell it, you don’t tell it how many cores you want, you tell it how much memory you want, and then they compute how many cores based on that much memory.

So for my case, I want as many cores as I could get, and I definitely don’t need that much memory. So you can imagine how I’d like to set it up.

There’s some other things I started looking at.

When I thought I cared a lot about it, but I didn’t.

While it’s not that big of a deal for me, there’s also, when you’re running this thing, this kind of life cycle it runs through.

Where the first time that you trigger it, you’ll see a little init phase in a log.

And for me, the init phase was 10 seconds.

So at that point.

It’s a 10-second init, but then after the first time, it’s there; the next time you trigger it, it doesn’t have to do that. So if the init phase was like a minute, I probably would start digging into it, but it wasn’t enough for me to worry about that much. Also, if you’re interested to see what they put into their image, I mean, it’s pretty interesting. You can actually go look through their set of base images, and that was kind of neat.

I like it when they’re transparent like that. So the other hidden constraint that I’ve marked in here.

So you get your Docker image, you run it, you send stuff to it, and it’s doing its thing. You think it’s working. You shove it up and you create a Lambda. The first thing it does is complain about some read-only files. Well, I just ran this locally.

What’s the deal?

When AWS loads your container in Lambda, they do it with a read-only file system except for /tmp.

That is the only directory that you can actually write things to.

So your normal Python libraries, especially Huggingface, they just love to put stuff in a cache directory underneath your home somewhere, you know, or in the Python install.
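So anything that wants to write has to get pointed at /tmp instead; a rough sketch of what that means in practice (the cache environment variable is the part worth double checking for whatever libraries you pull in):

```python
import os

# Hugging Face and friends default to caching under the home directory,
# which is read-only inside Lambda, so push everything into /tmp instead.
os.environ["HF_HOME"] = "/tmp/hf"

from faster_whisper import WhisperModel

# download_root points faster-whisper's model download at a writable spot.
model = WhisperModel("small", device="cpu", compute_type="int8", download_root="/tmp/models")
```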

Oh, yeah. Do I?

No, it’s okay. So in order to test this, the magic Docker command is docker run with the --read-only flag,

plus a volume you mount to your own /tmp. Then it will run your Docker container the same way that it runs on AWS.

And you can actually define these nifty little things.

So, you know, with that, you can actually exec into the image as it’s running.

Go cd into your different directories, try things by hand. You can troubleshoot a whole lot better that way. So, with that... the other thing that got me when I was running it initially as a Lambda: I wasn’t seeing anything coming out of the logger.

Even though I had set it. If I would upload something and I wrote to the error level, I would see it.

If I wrote to the info level, I wouldn’t.

And even my normal thing is, hey, set your basicConfig to info, and you should get it after that.

When AWS, when Lambda loads this thing in, it connects a handler to the logger so that it can output over this thing called CloudWatch, which you’ll also know if you ever play with that. So by the time I call basicConfig on this, it’s already past that whole point.

So what I have to do now, and this isn’t just with a container, it’s if you’re going to run anything in Lambda:

if you want to get your logging set up, first off, you need to check to see if the logger already has handlers.

And if so, get the base logger and set its level.

Otherwise, you can do your normal thing.
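Something along these lines:

```python
import logging

logger = logging.getLogger()
if logger.handlers:
    # Lambda has already attached its own handler (wired to CloudWatch),
    # so basicConfig() would be a no-op; just set the level on the root logger.
    logger.setLevel(logging.INFO)
else:
    # Running locally: the normal setup works fine.
    logging.basicConfig(level=logging.INFO)
```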

So that took a little time to figure out. The other thing: when you actually go create a Lambda in the Amazon console, it’ll ask what container you want to run it with.

So it’ll let you actually look up your list of containers in your registry; you can pick this one, it’ll spin it up, you can do things. That’s great.

And then you go change stuff: I want to change my function, it’s not doing what I want, I’ll load my new container, and I trigger my Lambda again. And I lost a solid half a day probably trying to figure out, what the heck, I’m making changes, but it just doesn’t seem like anything is working. Well, your Lambda is still using your old image; when it ran and did its init phase, it grabbed the image and actually instantiated that. Until you go update your image on the function, until you go touch that thing and say, now use the new image, it’s still using the old one. It’s almost like you changed your code and forgot to recompile. Which I’ve done more times than I’d like to admit as well. That cost me a good bit of time. And then the other thing was actually a Faster Whisper piece. When you actually go into the Lambda function...
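The fix is just repointing the function at the new image after every push, for example from boto3; the function name and image URI here are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# After pushing a new image to the registry, the function keeps using the old one
# until you explicitly point it at the new image URI.
lambda_client.update_function_code(
    FunctionName="transcribe",  # placeholder
    ImageUri="123456789012.dkr.ecr.us-east-1.amazonaws.com/transcribe:latest",  # placeholder
)
```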

In this case, where was this.

It was nice enough that I can tell it, when I tell it what model to use, please use /tmp as your download root. At that point, that’s all I had to do; I didn’t have to tell it anything else. There’s also a way, with Whisper or Faster Whisper...

You can actually give it a directory that you pick.

So if I wanted to, as with what I was playing around with initially looking at timing,

I could actually put the model in the container itself, in the Docker image, and push it out so it doesn’t have to go reach out anywhere at all. If you’re in an air-gapped system, that’s going to be mandatory. So you can do things like... let me go back to my other window. There it was. There we go.

So for here, what I was trying to do was use model.bin, which is what they name the models that they download.

And I gave the full path to where model.bin is, and it kept complaining that something about the format of the Hugging Face repo was wrong. I’m doing exactly what the error is telling me, and I know the file is right there. What it wants isn’t the full path to the model binary; what it wants is the full path to the directory that contains the model binary, which is totally not obvious. That took a minute.
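In other words, something like this; the local path is just an example:

```python
from faster_whisper import WhisperModel

# Wrong: pointing at the binary itself triggers the "invalid repo format" style error.
# model = WhisperModel("/opt/models/faster-whisper-small/model.bin")

# Right: pass the directory that contains model.bin.
model = WhisperModel("/opt/models/faster-whisper-small")
```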

No, but let me actually go. I can text on the third-party host, or they don’t have a phone; I don’t know where that goes. That probably was a problem. So the other thing I wanted to cover real quick: after we stop the video stuff, I’ll log into AWS and show you kind of what that looks like. I have all of these transcripts. So I was able to do things like look at the difference between the CPU int8 and the base int8, you know, just doing compare selected.

I thought that would go side by side but maybe I might notice.

Oh, I started. I should have done what I did.

And if I some I definitely haven’t messed with since then. This one. So I have a brother.

Basically I need to re-sort my transcript files.

So the medium from CUDA, I was checking to see if the CUDA part produced something different.

I’m having trouble remembering the size of each one. Let’s try medium versus small.

Definitely got these differences there. By the way, the small is 244 million parameters and the medium is 769 million.

Okay.

And the large, it’s 1.5 billion or something. Now I’m finding a small one that I haven’t blown away by doing something stupid.

Let’s try. Let’s try this. Okay. According.

I really wish it would start doing the pop up thing.

Yeah, let’s do this one first. Let’s give you this one. Capital, versus this is the one that actually knew that what we were talking about was Kaggle.

So occasionally you see it actually pick up a better, better form.

So now this one knows that we’re talking about Google Colab instead of co-lab.

So if I can find a way to step up to the medium model, that will probably get me over the small.

The other thing that I haven’t played around with yet is actually using a large language model behind the transcription. So you do the transcription first, and then you use a large language model to find segments in this text that are out of context or that don’t match the rest. Yeah, I mean, there’s a couple of things you could do to actually go find those things and then possibly correct them later. This isn’t intended to be the word-for-word transcript you would need in a court case or something. This is more of a, hey, I’ve got this recording from my grandmother talking about, you know, growing up in Mississippi. I actually haven’t done that one. Yeah, actually they wrote it down, so I don’t have to try to transcribe it. But some of that might be, you know, an interesting piece to follow on. So that is what you should do.

You should take... they just started open sourcing the raw audio and text. True. So they just released one of those, so you could take that, feed in the raw audio, and see how well it’s doing against it. You could probably make some fine-tune data from that; interesting things there. Yeah, Whisper actually has a fairly good mechanism for fine-tuning their models; I mean, they provide that with their different sizes. You can go down on the transcript,

you could say, here is my audio, here is my transcript, these words were at these times, now go fine-tune. And they’re smaller models.

You don’t have to have a giant horsepower thing to run that on.

So if you actually did have some number of transcripts... you can actually go pay companies to transcribe things for you. I did a pricing at one point; we were somewhere between, some were kind of crazy, but more likely two and a half, you know, dollars a minute worth of cost to transcribe speech. But they are also guaranteeing you that your word error rate is small.

You could look at doing something like that to bootstrap enough transcripts to then go fine-tune a model and then run it.

You’re saying small size models.

And then, you know, something like a T2 small or T3 large.

I don’t know. Right now I’m running it on six cores with 10 gigs of RAM.

Which you’re just selecting, you know, and then Amazon somewhere allocates something for that.

No, I mean, that is what I say when I tell Amazon about the function. When I register it as a function, I have to tell it what memory I need, and based on that, Amazon allocates some number of cores. If I didn’t want to run it in a Lambda and actually wanted to set up an EC2 instance or something and have it look at those, just shoot files across, that would work.

But I did the pricing and currently to transcribe that 30 minute segment audio would cost me roughly five cents.

So there’s here’s that.

Of course that doesn’t include all of the other infrastructure, you know, that you have to have. It’s definitely cheaper to go Lambda than Fargate. There are several things; we did this like last year sometime, taking this thing and spinning it up as a serverless type approach in AWS.

You pay for the compute that you use.

You get nickel-and-dimed on a lot of other things. It’s like, oh, you need a connection to S3, or oh, you need a connection to the Internet, or oh, you need a security group, oh, you need a this. There are some things that are free, and then there are other things where you look at your cost at the end of the month and it’s like, wait a minute. And it’s nothing super duper expensive. But if you’re trying to get a bare-bones thing, all of a sudden you figure out you’re paying $10 a month just for the Internet piece that connects me to my stuff.

Yeah, or oh, you wanted a static IP.

You want to turn off my account.

It was still like $30. Right. So, that’s what I’ve got so far. I’ll actually now disconnect and log into the, you know, into the... before I do that, let me go make sure nobody actually popped on.