Mixture of Experts: Harnessing the Hidden Architecture of GPT4

Recording of Huntsville AI Meetup

Transcription provided by Huntsville AI Transcribe

Yeah, sure. That’s fine.

Yep. No worries. Yeah, actually I’m going to copy it before I post new things. Okay. All right. Thank you. Thank you for recording. So let’s go back over here. All right. So while we were going around, I got a gauge on the folks in the room.

So it seems like we have a pretty wide range of experience levels.

So we’ll be going through kind of from the ground up.

We’ll cover what mixtures of experts are, along with some basic primers on the core concepts around neural networks and feed-forward layers, which are central to the mixture of experts architecture.

And so yeah, this is going to be kind of all over the place, so feel free to raise your hand and ask questions as we go through. The idea here is that you don’t have to understand every single thing in this talk. This is mostly a “here are all the different things for you to go look at.” This is a giant emerging area right now.

This is a new-ish architecture that’s also extremely old, but it’s now coming back around, everybody’s actually trying it out, and it’s changing in new ways almost every week at this point.

So this talk is called “Mixture of Experts: Harnessing the Hidden Architecture of GPT-4.”

The reason behind that title is that GPT-4 is supposed to be using this architecture, which has led to the flurry of activity around it.

GPT-4 was released in March, and a lot has happened with this architectural pattern since then.

So can we go to the next slide?


All right, so what we’re going to hit first is what mixture of experts is, then why do I care?

How do they work?

How don’t they work? And how can I learn more? Let’s go to the next one.

So the first thing to understand with the mixture of experts is that it’s very deeply tied to the scaling laws of neural networks.

Scaling laws, which basically means... how many of you guys are familiar with “The Bitter Lesson”? It’s a short essay, or a dirge, basically saying that we spent all this time working on machine learning algorithms, tuning things, doing clever things, and all of that doesn’t matter.

You just need to make the model bigger, get more compute, and increase your budget. That’s it. All the things that you care about, that people spend years and years of their lives on right now: the bitter lesson is that they don’t matter. You just throw more compute at it. Mixture of experts is us trying to be clever and work around that to sneak a few cycles ahead. So you can see here (we’ll talk about this a bunch along the way) that we care about three things: compute, which correlates with FLOPs and things of that nature; dataset size, where there’s a lot going on right now; and parameters. Parameters are what we mean when we say something like “this is Llama 7B” or “GPT-3.5 is a 250-billion-parameter model.”

That sort of thing is what we talk about as far as model size, which is a big deal in mixture of experts.
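For a rough sense of how compute, data, and parameters tie together, a widely used rule of thumb from the scaling-law literature puts the training compute of a dense transformer at about 6 × N × D FLOPs, where N is the parameter count and D is the number of training tokens. The sketch below just applies that approximation in Python; the 7B-parameter and 1-trillion-token sizes are illustrative assumptions, not figures from the talk.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute with the C ~= 6 * N * D rule of thumb.

    Roughly 2 FLOPs per parameter per token for the forward pass
    and about 4 for the backward pass.
    """
    return 6.0 * num_params * num_tokens


# Illustrative sizes only: a 7B-parameter model trained on 1 trillion tokens.
flops_7b = training_flops(7e9, 1e12)
# Doubling the parameters at a fixed dataset size doubles the compute.
flops_14b = training_flops(14e9, 1e12)

print(f"{flops_7b:.2e}")     # 4.20e+22
print(flops_14b / flops_7b)  # 2.0
```

Because compute grows linearly with N at a fixed D, making the model bigger is exactly what drives the bill up, which is the pressure mixture of experts tries to relieve.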

So we can go to the next slide here.

The next major concept tied to mixture of experts is the feed-forward neural network.

This is basically the bread and butter of what makes a lot of these neural network architectures work: it is a layer of artificial neurons where things come into it and go out of it in a different, transformed state.

And then there are billions and trillions of dollars riding on changing what goes in and what comes out, essentially. That’s the AI boom that’s happening right now. So when we talk about something like the transformer architecture, that’s basically a special series of neural network layers.

So this is the feed-forward neural network, and the mixture of experts is really just a very specific kind of feed-forward layer. Let’s go to the next one. So how do feed-forward networks work?

If you guys haven’t seen this before and you’re interested in transformers and large language models, this will become a very, very familiar graphic.

This is from the transformer paper, “Attention Is All You Need,” and it defines the core transformer network. So you have inputs going in.

They go into what is called attention, which is just a kind of layer. Then there’s some normalization, which basically evens out the values flowing through the network. Then it goes into a feed-forward layer, gets evened out again, and does it all again.

It does it again and again, as many times as you have layers. That’s what it is.

So the network receives the data, and each hidden layer performs a computation and passes its outputs to the next layer.

A hidden layer is essentially any layer between the input and the output.

A very easy way of saying it is that it’s a little bit deep in the network and not directly on the output.

The output layer produces the final result.

Within every layer, each neuron is connected to every neuron in the next layer, so it’s fully connected from layer to layer, and those connections have numeric weights that are tuned as the network is trained. The things inside the layers are what gets trained when we talk about training models. Every neuron receives the inputs, multiplies them by its weights, and shifts the values along.

And that’s basically how you get your final outputs as tensors. So that’s what a feed-forward network is.
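To make the “stuff goes in, stuff comes out” description concrete, here is a minimal position-wise feed-forward layer in plain Python: a linear projection up, a ReLU, and a linear projection back down, the same shape as the feed-forward sublayer in the transformer diagram. The tiny weight values are made up for illustration; a real model learns them during training and uses a tensor library rather than Python lists.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[output_index][input_index]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: every input feeds every output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Zero out negative activations."""
    return [max(0.0, v) for v in x]


def feed_forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Expand to a hidden layer, apply the nonlinearity, project back down."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


# Hypothetical toy weights: 2 inputs -> 3 hidden neurons -> 2 outputs.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b2 = [0.0, 0.5]

print(feed_forward([2.0, 3.0], w1, b1, w2, b2))  # [5.0, 0.5]
```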

Any questions at this point?

Essentially, stuff goes in and stuff comes out. Yeah, this all kind of comes out as magic. All right. So when you say a mathematical function and you’ve got words, what does it do to the words to make a mathematical function apply to them?


So basically, there is an encoder and a decoder on either side.

And it depends on what sort of model you’re using.

But basically there is a text embedding where it takes whatever your input words are.

If you’re doing an input query, it turns those words into numbers, vectors.

Those vectors depend on what sort of encoding strategy you’re using.

Then it has a decoder on the other end that will pull it out.

And depending on your model, there are a whole bunch of different ways it comes out with that final output.

But the same word, say “layer,” going in is always represented by the same set of inputs to the system.

Whether you have an atmospheric layer, a skin layer, or a clothing layer, the word “layer” always has the same representation going in.
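The point that the same word always enters the network with the same representation can be shown with a toy embedding table: each vocabulary entry maps to one fixed vector, so the lookup at the input is deterministic. The vocabulary and vector values below are invented for illustration; real embeddings are learned (or taken from a pretrained model) and have hundreds or thousands of dimensions.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table; in a real model these values are trained.
embedding_table: Dict[str, List[float]] = {
    "the": [0.1, 0.0, 0.2],
    "skin": [0.7, 0.3, -0.1],
    "layer": [0.4, -0.5, 0.9],
    "<unk>": [0.0, 0.0, 0.0],  # fallback for words not in the vocabulary
}


def embed(words: List[str]) -> List[List[float]]:
    """Map each word to its vector; unknown words fall back to <unk>."""
    return [embedding_table.get(w, embedding_table["<unk>"]) for w in words]


a = embed(["the", "skin", "layer"])
b = embed(["the", "clothing", "layer"])
# "layer" gets the identical input vector in both phrases, whatever surrounds it.
print(a[2] == b[2])  # True
```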

So the layers are not concepts within the network? Well, I’m using “layer” as an example word. Okay. Okay.

Just as I should.

Oh, right.

So it depends.

So that’s why I say it really does matter on what’s called an embedding, which is a very special set of vectors.

Sometimes you can have stock embeddings.

Sometimes you can train your own embeddings, which is a lot of work. Basically, say I’m embedding a PDF: whatever my strategy is for doing that, it’s going to detect the relationships between these things, the semantic meaning, based on my batches. So there’s not a hard answer. It really depends on what you’re doing. Okay. All right. Thanks. Is there a chance to actually use something like Dagger or Kubernetes to have a neural thing on one layer and a deep learning thing on another layer?

Okay. So like a distributed sort of thing? Yeah. Yeah. Is this within that same architecture, or is it another layer up from that?

It’s all a custom software layer.

Are you talking about mixture of experts or neural networks in general? There are some people doing distributed computing with mixture of experts, including somebody doing something really interesting: basically leveraging WebGPU, whatever that thing is, hosting each expert on one Chrome laptop and distributing it out that way. To me it’s a novelty, because you’re adding a lot of network traffic, and everything that’s going on right now is basically about how you can cut down the distance to your routing function.

So you can do that if you’re constrained, and I’m sure there is some scale at which it makes sense if you have enough idle compute. I think plan nine is working on something around this.

I guess for some, yeah, yeah.

There are use cases, but it’s complex. You really have to need it. But yeah, there are some people doing that sort of thing, so it’s worth looking into.

All right, why don’t we move forward?

So next we’ll go into what mixture of experts is specifically.

It is a neural architecture that consists of an ensemble of specialized sub models called experts that work together to optimize the overall performance.

Basically, it’s also a recursive layer.

It is a layer that contains a mixture of a whole bunch of different layers, which are basically smaller models with their own weights that get switched between depending on your input vectors. And so you see here, this is just a normal linear layer.

So this would be where you’re doing something like a ReLU, any of those normal sorts of things.

A feed-forward layer.

And then you have each one of these tiny models. So for a seven-billion-parameter model, we’ll sometimes see these experts be something like 225 megabytes each, instead of another seven billion parameters.

So what you’re really doing is sharding your model size among a large number of different experts and then activating them on demand. That gives you a larger overall model, while at runtime and during training you’re only activating certain subsets of it. That’s the idea behind this sort of thing.
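The sharding idea comes down to simple arithmetic: total parameters grow with the number of experts, while the parameters actually used for each token grow only with the number of experts activated. This sketch counts weights for one layer of feed-forward experts (an up-projection and a down-projection each, biases ignored); the dimensions, the 8 experts, and the choice of 2 active experts per token are hypothetical.

```python
from typing import Tuple


def ffn_params(d_model: int, d_hidden: int) -> int:
    """Weights in one feed-forward expert: up-projection plus down-projection, no biases."""
    return 2 * d_model * d_hidden


def moe_param_counts(d_model: int, d_hidden: int, num_experts: int, active_experts: int) -> Tuple[int, int]:
    """Return (total expert weights, expert weights used per token) for one MoE layer."""
    per_expert = ffn_params(d_model, d_hidden)
    return num_experts * per_expert, active_experts * per_expert


# Hypothetical layer: 8 experts, each token routed to 2 of them.
total, active = moe_param_counts(d_model=4096, d_hidden=16384, num_experts=8, active_experts=2)
print(total, active, active / total)  # 1073741824 268435456 0.25
```

Adding experts raises `total` but leaves `active` (and so the per-token compute) unchanged, which is the decoupling the talk keeps coming back to.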

If we go to the next slide, we’ll talk about the gating network. This is the very critical piece: for the most part, all of the architectural differences between mixture of experts variants live in this gating network.

It’s also called a routing function.

And it’s the major difference between things like Switch Transformers, sparse models, mixture of LoRAs, all those sorts of things.

It is essentially another layer, a trainable thing.

So it learns, along with the rest of your model, which experts from your pool of experts are the most appropriate to activate based on the input query.
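A gating network can be as small as one trainable weight matrix: each expert gets a row of weights, the token is scored against every row, and a softmax turns the scores into routing probabilities. Below is a minimal sketch assuming such a linear router; the router weights and the “accounting/HR/operations” labels are hypothetical placeholders, and in practice the weights are learned jointly with the experts.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_probs(token: Vector, router_weights: List[Vector]) -> Vector:
    """Linear gating: dot the token with one weight row per expert, then softmax."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    return softmax(logits)


# Hypothetical router for 3 experts over 2-dimensional token vectors.
# These weights would normally be trained along with the experts.
router_weights = [
    [1.0, 0.0],    # expert 0, e.g. "accounting"
    [0.0, 1.0],    # expert 1, e.g. "HR"
    [-1.0, -1.0],  # expert 2, e.g. "operations"
]

probs = router_probs([0.2, 1.5], router_weights)
print([round(p, 3) for p in probs])  # expert 1 gets the largest share
```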

And so we’ll see in a few further slides that you might have, for instance, expert one trained on accounting, expert two trained on HR, expert three trained on operations and logistics, that sort of thing. Another very interesting thing is that you’ll have, say, expert four trained on safety, on alignment, on the different security things that we might care about. So you have those sorts of things. We had a pop-up from somebody about the Zoom chat and full screen. And we had a comment from Matthew just a second ago asking what language mixture of experts is being driven by. I mean, it’s Python.

Python is where a lot of the implementation of this is, but there are always going to be implementations in C and related languages as well.

So, yeah, I’m assuming that’s what the question is.

Yeah. Any other questions on this slide? We’ll hit all of these subcomponents as we go on.

So is it truly burrowing there’s no image.

Oh yeah, this image. Yeah, it’s completely multimodal. Actually, one of the architectures we’ll go through does not work on decoder-only models like GPT; it only works on encoder models. The other thing that was a little counterintuitive to me:

I was thinking that with mixture of experts you’d have experts trained separately on their own tasks, and then you’d combine them together and route between them. There is one that’s like that. Okay. Yes, that’s actually the one that I work on: mixture of LoRAs, mixture of adapters. But for this one in particular, the general approach is to train the whole model together, as is.

Yeah, right.

And that’s how you get the gating trained in the same kind of way. I can imagine if you have your router trained differently from your experts, you’d route things to the wrong expert. You’re going to be asking your HR person how to do software design. Yeah. The ones that are out there in production use right now are what’s called the sparse model, which is all trained together.

All right, so the motivations and goals of MoE. This is a bunch of complex stuff, so where’s the payoff? There’s a very big one, and it’s why we care about this despite all the complexity: we can increase our model capacity without a proportional increase in cost.

Going back to that very first slide on scaling laws: all the other things don’t matter.

The only thing that actually matters is increasing those three parameters.

This is finding a way to decouple the one we like the least, cost (FLOPs, or whatever your computational budget is), from increasing our model capacity, as well as being able to increase the distribution of our dataset by splitting it across the experts.

So that’s really the big payoff.

So we decouple our parameters.

And the other big thing is that, if we fix compute, we reduce inference time while scaling our model size, because we’re no longer running every part of the model for each input.

So if we take a 70B model and split it into 10 experts of 7B each,

we’re only going to be running inference on that one 7B expert.

So it’s going to be a faster inference speed.

There are other ways of getting this, and I think this one is becoming less interesting over time as a benefit, because there are other things like quantization, vLLM, and some of that PagedAttention sort of stuff that’s going on. But it’s a nice benefit that comes along with this. Model capacity, though, that’s the real payoff.

So this is the benefit of decoupling parameters, decoupling money.


That’s the thing. Have you ever tried to run Falcon 180B?

You can’t.

But you could potentially run something equivalent to that on a 4090

if it were implemented as a mixture of experts.

So here’s kind of the big fun thing around a mixture of experts.

Everybody knows GPT-4 is using this architecture.

They won’t say it.

They won’t confirm it, but it’s what they’re using. Everybody knows it; nobody knows any of the actual details. It’s one of those things, you know. The model very likely has about 1.8 trillion parameters.

We thought GPT-3.5 was somewhere near 250 billion.

And we think this thing is essentially eight GPT-3.5-sized models stacked in a box together, of which one is the safety function and there are seven others.

So if you look at the 7 or 15 or whatever it is, it’s very likely that GPT-4V is just one of these with a vision encoder expert sitting on top with a high weight.

If you look at what all the people at OpenAI have been doing in the run-up to this, it’s been stuff around this.

And I’ll talk about this. I think a little bit later on.

Have you guys heard about the Arrakis build of GPT? I’ll talk about it a little once we’ve seen one of the architectures. Basically, they were putting out all this stuff this month about a new model they were releasing that was going to be cheaper and everything else. And then they couldn’t make it work.

So they very quietly pulled it all back, and Arrakis is no longer being released. The reason seemed to be something we lay out later in here, the soft MoE architecture: everybody thought this thing was going to work, it didn’t work, and they pulled their launch back. We’ll go into that a little further on. Just know that’s the big thing there: you can make big stuff with it. So the MoE layer contains the expert networks.

Each expert is like a fully connected neural network, so you basically have a neural network of neural networks. There may be up to thousands of experts; I’ve seen one paper where they have over 1000 networks. That’s not real.

That’s good. Do you have to have all these experts loaded up at once? There are lots of people trying to make it dynamic right now. It’s actually something I was working on today, doing that through a FastAPI application.

So yes, it would be very nice if you didn’t have to do it at startup. Most things do it at startup right now.

So they load all of your experts together.

And it’s slow.

It is very slow.

But you don’t have to do it on every inference, just at load. And here, 1000 is one of the numbers.

We generally see 8 to 32.

I’ve seen four in some examples.

That’s a really round one.

We start getting a benefit at around four.

There are some projects I’ve seen successfully do up to like 256.

But there are diminishing returns as you go up in size.

So the gating network selects a sparse combination of experts per input.

We’ll go into what that means later.

It also learns a specialized routing function during training. All right, let’s go forward.

All right, so whenever we’re talking about the gating network, this is what determines which experts to use for each input.

And the input here is generally taken token by token.

It will vary depending on what gating network you’re using.

But mostly we use the token choice sort of thing where we’re going by token.

The outputs are a set of routing probabilities, one per expert.

You see that here when we’re talking about the inputs and outputs, these are the scores that come out.

And some of them will get normalized and how they get normalized differs on the architecture.

Basically everybody uses softmax to determine the probabilities for each expert. Additional logic is usually needed to reduce the dense computation, and how they do that differs between models. I have the big architecture diagrams, so we can see how each one uses that final softmax to get the weights.

Basically, it’s a function to pull big, big, big numbers down to small numbers so that we can reason about them effectively.

It’s almost like a sigmoid.

It’s pretty simple.

Just think of it as taking a string of numbers anywhere between negative and positive infinity and squeezing them all into the range between zero and one.
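Here is that squashing step as a plain-Python softmax. Subtracting the largest score before exponentiating does not change the result mathematically but keeps very large scores from overflowing; the example scores are made up.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Squash arbitrary real-valued scores into values between 0 and 1 that sum to 1."""
    # Shifting every score by the max leaves the ratios unchanged,
    # but keeps math.exp from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1000.0, 999.0, 995.0]  # math.exp(1000.0) on its own would raise OverflowError
probs = softmax(scores)
print([round(p, 4) for p in probs])
print(round(sum(probs), 10))  # 1.0
```

The ordering of the scores is preserved, so the highest-scoring expert still gets the largest probability.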

This is the first one.

This is the og, at least of the current batch.

This is sparse activation. Sparse mixture of experts is very likely at least close to what OpenAI is using, though they’re obviously doing some crazy special stuff themselves.

This might have been the first paper I came across. It’s the keystone paper, the one that started this; there’s a whole bunch of other papers, but it’s the most popular. I think it’s from 2017, which is a little funny because this is brand-new stuff, but it’s based on a 2017 paper that somebody pulled out and went, oh yeah, this is useful now. Well, if you look at 2017 to 2021, there’s a whole bunch of small, specific OpenAI papers, all in the run-up, like here’s this little subcomponent, here’s this little subcomponent.

There’s a good bibliography.

Yeah, I wish I could send it out with the email.

Yes. If you go through that and then go through all of their bibliographies, you’ll have most of everything.

I do include a few of them here, but it’s a big list. Yeah, that’s right. There was a healthy list on Facebook. Yeah, that’s your primer; that’s basically the intro right now. So, sparse activation.

The big thing with this one is, it’s a top K. So it takes the experts that have the highest probability.

Usually it’s two to four is kind of what we generally see.

The big thing here is that you’re basically taking a set of the experts and then combining their outputs, weighted by the gate, to get your final output.

So you’re using an ensemble of the experts to come up with your final output. That’s really the big thing in this one, and it’s what we’ve been looking at already.
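A minimal sketch of top-k gating in the spirit of the 2017 sparsely-gated mixture-of-experts paper: pick the k highest-scoring experts, renormalize their gate scores with a softmax over just those k, and take the weighted sum of their outputs. The original paper also adds tunable noise to the gate logits before selecting the top k, which is omitted here; the stand-in experts and router scores are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(token: Vector, logits: Vector, experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Run only the k highest-scoring experts and blend their outputs by gate weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    weights = softmax([logits[i] for i in top])
    output = [0.0] * len(token)
    for w, i in zip(weights, top):
        expert_out = experts[i](token)  # unselected experts are never called
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output


# Hypothetical stand-in experts and router scores for one token.
experts = [
    lambda v: [x * 2.0 for x in v],
    lambda v: [x + 1.0 for x in v],
    lambda v: [0.0 for _ in v],
    lambda v: [-x for x in v],
]
logits = [2.0, 2.0, -1.0, 0.5]
print(top_k_moe([1.0, 3.0], logits, experts, k=2))  # [2.0, 5.0]
```

With k=2, experts 0 and 1 tie for the top scores, so each gets weight 0.5 and the output is the average of `[2.0, 6.0]` and `[2.0, 4.0]`.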

We can go to the next one here.

So, the issues with this one, the one we have the most information on. The main reason we don’t just say “cool, sparse activation is good, let’s all end it and go home” is that there are problems.

Number one is the problem of token dropping. Since we’re routing token by token, say this one expert is super awesome, everybody likes this expert, and the router discovers that if it sends things to this expert, the users like it.

So it gets all the tokens, and it might get more than it’s allowed to have, since we constrain the number of tokens that can go to each expert.

And how does this thing deal with that?

It just drops some of them on the floor. That’s all sparse mixture of experts does. Not great. So that’s kind of a problem.
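Token dropping can be sketched with a fixed per-expert capacity: tokens claim slots in order, and once an expert is full, later tokens routed to it are dropped. In Switch-Transformer-style implementations a dropped token typically skips the expert computation and just passes through the residual connection. The router decisions and capacity below are hypothetical.

```python
from typing import Dict, List, Tuple


def dispatch_with_capacity(assignments: List[int], num_experts: int, capacity: int) -> Tuple[Dict[int, List[int]], List[int]]:
    """Assign tokens (in order) to their chosen expert until that expert is full.

    Returns (kept, dropped): kept maps expert -> token indices it will process,
    dropped lists the token indices that overflowed their expert's capacity.
    """
    kept: Dict[int, List[int]] = {e: [] for e in range(num_experts)}
    dropped: List[int] = []
    for token_idx, expert in enumerate(assignments):
        if len(kept[expert]) < capacity:
            kept[expert].append(token_idx)
        else:
            dropped.append(token_idx)  # no room left: this token skips the expert
    return kept, dropped


# Hypothetical router decisions for 6 tokens; expert 0 is the "popular" one.
assignments = [0, 0, 1, 0, 0, 2]
kept, dropped = dispatch_with_capacity(assignments, num_experts=3, capacity=2)
print(kept)     # {0: [0, 1], 1: [2], 2: [5]}
print(dropped)  # [3, 4]
```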

It also happens during training, which is where you’re more often going to hit those expert limits, because (for those who don’t train models) whenever you’re training, you basically batch things up to your token limit. So say I have a 4096-token context size and a whole bunch of data examples that are 200 tokens long; I might pack a whole bunch of those together and train on them at that length. That reduces the training time, and we do it for a lot of different reasons.

And it’s a problem in sparse mixture of experts if you don’t do some other things to counteract it, but those add their own complications. I think for more context on “token”: a token is not like a character; it’s a meaningful subset that the model has been trained on. It means a concept; there’s a neuron that’s going to be tied to it through its history of training. So “hope” might be a token and “in” might be a token; they’re subsets of words.

I don’t know if you have a better explanation.

I think it’s sort of like stemming. You might break up a certain word. Yeah, that’s true. So if you have a past-tense word, it might break it up into the base word and the past-tense part. It could be something like that.

It’s just like a unit of input.
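One simple way to see how words become subword tokens is greedy longest-match segmentation, roughly the idea behind WordPiece-style tokenizers. The tiny vocabulary below is hypothetical; production tokenizers (BPE, WordPiece, SentencePiece) learn vocabularies of tens of thousands of pieces from data, and their exact splits will differ.

```python
from typing import List, Set


def greedy_tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            tokens.append(word[start])  # unknown character: emit it on its own
            start += 1
    return tokens


vocab = {"hope", "hop", "ing", "walk", "ed", "s"}  # hypothetical vocabulary
print(greedy_tokenize("walked", vocab))  # ['walk', 'ed']
print(greedy_tokenize("hoping", vocab))  # ['hop', 'ing']
```

The past-tense example from the discussion shows up directly: “walked” becomes a base piece plus an “ed” piece, and each piece is a separate token that a router could in principle send to a different expert.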

So it’s not inconceivable that it would send two parts of a word to different experts, though it’s very unlikely just because of how we do things.

But also, if you’re using something like the OpenAI service, you’re probably paying per token for some things.

So sometimes you might send it a word thinking it’s one token, and it might not be one token.

That’s part of what we were looking at before with some of the Llama pieces and some of these other networks: how we get some of these scoped down is part of the reason for doing some of that. I mean, even with what we did last time, where we set up the system prompt saying “you are this person with this persona, answering this way,” all of that counts toward your token length.

So if you ship that across every time, you’re paying for the same thing you’ve already said, every time. There’s some cost there, but the token is the atom of large language models.

That is a better way to put it. Well, it’s worth mentioning when I was playing with these different models trying to put it together, you know, see which channels would cross talk.

When I lost tokens when age differently, like it was different levels of like corruption and model just with the token expires.

We had to go read, read the token at the cloud, we’re going to pull everything through. Interesting.

Yeah, I’m not sure what that was, but this is definitely not that. When people talk about the context length, about tokens, about memory or feeding stuff in, it really comes back to this. It’s all tied to the limits on tokens, dataset size, compute, and parameter size. The more tokens you have, the more memory you need to hold them and the longer it takes, all that sort of stuff.

So the big change happening right now is this: originally, a context length of 512 tokens was the big thing for a long time.

And everybody was really happy when it went up to 1,024 and then 2,048.

Now it’s mostly around 4,096. As you get it larger, there are more tasks and more things that can be done with it.

But it’s all tied to scaling.

Alright, so moving on here, routing algorithms are not GPU optimized for a lot of the sparse activation patterns.

I put it third here, but this is really the big one: routing is deterministic based on the batch and not on the sequence, which is really weird. Like I said before, say I have a whole bunch of training examples of length 200 and I’m batching those all together.

What this thing does is deterministic based on that whole batch. If I take one sequence out and put another one in, because I’m shuffling around the training data to keep it fresh and keep it from overfitting, the routing comes out completely different for every single one of those sequences, all 20 of them, not just the one that was added. Being deterministic per sequence would mean that it would do those 19 the same

and only do the new one differently. Sparse activation does not do that.
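The batch dependence is easy to reproduce with capacity-limited routing: whether a given token is processed or dropped depends on which other tokens share its batch. In this hypothetical sketch, tokens claim expert slots in batch order, so swapping the neighboring tokens changes the outcome for a token whose own input and router decision never changed.

```python
from typing import Dict, List, Tuple


def routed_tokens(batch: List[Tuple[str, int]], capacity: int) -> List[str]:
    """Return the names of tokens that get an expert slot.

    Each batch entry is (token_name, chosen_expert); slots are claimed in batch order.
    """
    load: Dict[int, int] = {}
    processed: List[str] = []
    for name, expert in batch:
        if load.get(expert, 0) < capacity:
            load[expert] = load.get(expert, 0) + 1
            processed.append(name)
    return processed


# Token "x" and its router decision (expert 0) are identical in both batches.
batch_a = [("p", 1), ("q", 1), ("x", 0)]
batch_b = [("r", 0), ("s", 0), ("x", 0)]  # different neighbors, all wanting expert 0

print("x" in routed_tokens(batch_a, capacity=2))  # True
print("x" in routed_tokens(batch_b, capacity=2))  # False
```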

So it is unstable, not fully differentiable, which means that we can’t successfully back propagate which is basically how we train the neural network.

There are discontinuities in the outputs, and training often diverges for no apparent reason: suddenly your loss will just blow up and you don’t know what’s going on.

It’s a black box.

It’s black magic.

Nobody knows how it works. Some wizard came and gave it to us essentially what this feels like whenever you get a good sparse model.

Oh, can someone remind me to push the button in about five minutes, before the lights turn off again?

I’m training mechanism work through it. Oh yeah. All right, so gating strategy.

So the next one is Switch Transformers.

This is kind of the new one.

Or I guess it’s a little bit past hot.

There are a lot of successful implementations of this one right now; it has emerged, and people are using it a bit now. Switch Transformers changes things: before, it was top-k. Now it just picks one guy.

Who’s the best guy?

I’m going to use that. That simplifies a whole bunch of stuff.

It has selective precision. There’s also a slower learning-rate warmup, and basically these decrease the time and cost it takes to train these things.

It has higher expert regularization, which basically means your training and your inference are spread more evenly across all of your experts. We’ll go over what happens when you don’t do that a little later on. It’s a really big problem in sparse models, where basically one expert gets really, really well trained and everybody else is on the bench. It also has the ability to do parallel experts.
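One common way to encourage that evenness is the auxiliary load-balancing loss described in the Switch Transformer paper: α · N · Σᵢ fᵢ · Pᵢ, where N is the number of experts, fᵢ is the fraction of tokens whose top choice is expert i, and Pᵢ is the average router probability given to expert i. It equals α when both are perfectly uniform and grows when one expert hogs the tokens. A plain-Python sketch; the router outputs are hypothetical and α defaults to 0.01 here.

```python
from typing import List


def load_balancing_loss(router_probs: List[List[float]], alpha: float = 0.01) -> float:
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i.

    router_probs[t][i] is the router probability of expert i for token t.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    # f_i: fraction of tokens whose top-1 expert is i.
    counts = [0] * num_experts
    for probs in router_probs:
        counts[max(range(num_experts), key=probs.__getitem__)] += 1
    f = [c / num_tokens for c in counts]
    # P_i: mean router probability assigned to expert i.
    p = [sum(probs[i] for probs in router_probs) / num_tokens for i in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Hypothetical router outputs for 4 tokens over 2 experts.
balanced = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
collapsed = [[0.9, 0.1], [0.9, 0.1], [0.8, 0.2], [0.8, 0.2]]  # expert 0 gets everything

print(load_balancing_loss(balanced))   # about 0.01
print(load_balancing_loss(collapsed))  # about 0.017
```

Added to the main training loss, this term nudges the router away from the “one expert plays, everybody else is on the bench” failure.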

I haven’t played a lot with that, so I can’t speak a ton about it, but it’s one of the things they claim.

So the big thing here is that it outperforms sparse MoE at lower capacity factors, capacity factor basically meaning the average number of experts each token is going to see during training or inference. As you decrease the capacity factor, it still outperforms, even below one, which is kind of interesting.
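In the Switch Transformer paper, the capacity factor is defined through expert capacity: (tokens per batch ÷ number of experts) × capacity factor. A factor of 1.0 gives each expert exactly an even share of the batch, and a factor below 1.0 means that even perfectly balanced top-1 routing has to drop some tokens. The batch size and expert count below are hypothetical, and rounding up is an assumption of this sketch; implementations differ on rounding.

```python
import math


def expert_capacity(tokens_per_batch: int, num_experts: int, capacity_factor: float) -> int:
    """Slots per expert: an even share of the batch scaled by the capacity factor (rounded up)."""
    return math.ceil(tokens_per_batch / num_experts * capacity_factor)


# Hypothetical batch: 1024 tokens routed across 8 experts.
for cf in (0.5, 1.0, 1.25, 2.0):
    cap = expert_capacity(1024, 8, cf)
    # With top-1 routing, fewer than 1024 total slots guarantees some tokens are dropped.
    print(cf, cap, cap * 8)
```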

How do you get below one expert?

Yeah. It’s half of an expert.

Normal person. Okay. About to 90.

All right, so you can see here, the big thing is the router: it chooses one, only that one goes in, and it normalizes. Conceptually it’s very easy.
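A minimal sketch of Switch-style top-1 routing, assuming a softmax router: the token goes to the single highest-probability expert, and that expert’s output is multiplied by its gate probability, which keeps the router in the gradient path during training. The experts and router scores are hypothetical stand-ins.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_layer(token: Vector, logits: Vector, experts: List[Callable[[Vector], Vector]]) -> Vector:
    """Top-1 routing: only the best expert runs, and its output is scaled by its gate value."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    gate = probs[best]  # multiplying by the gate gives the router something to learn from
    return [gate * y for y in experts[best](token)]


# Hypothetical stand-in experts and router scores for one token.
experts = [
    lambda v: [x * 2.0 for x in v],
    lambda v: [x + 1.0 for x in v],
]
out = switch_layer([1.0, 2.0], [1.0, 0.0], experts)
print(out)  # expert 0 wins; its output [2.0, 4.0] is scaled by its gate value (~0.731)
```

Compared with the top-k sketch for sparse MoE, there is no blending across experts at all, which is where the simplification comes from.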

This is a constrained version of sparse where it only picks one.

Top one.

So I have a serious question.

Okay, if you select the expert. And it goes through and you get a result.

Is there any effort to update all the experts on a regular basis as to who does what better?

Only during training, not during inference. So if you’re using ChatGPT, it’s not, unless they’re doing training.

So basically all of that learning stuff shuts off as soon as you’re using it in production, because it’s expensive and slow. All right. Right.

I think that’s about it here. Any other questions on Switch Transformers? I think we have a little bit of additional information on the next slide.

Okay, so Switch is really quite easy and seems to work. People like it. There are a bunch of people that have successful implementations of that one.

This one’s kind of the new hotness.

So this is soft activation.

You might have heard this called soft MOE.

So in.

Yeah, I’m going to go ahead and get out of the way. So each expert gets s slots, where each slot is a linear combination of as many tokens as it wants. A linear combination basically takes every token, weights it, and adds them together. And so basically that’s done with an einsum.

Einsum is a weird notation that exists because Einstein got tired of writing out vector summations by hand, back in the 1910s.

So he basically said, I’ll just put them together and squash them. That’s roughly what it is.

Now we use that for this particular model and a lot of different things.
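Einstein’s convention is simply that an index repeated on one side of an equation is summed over, so the summation sign can be dropped. Applied to the “slots are weighted combinations of tokens” idea, with tokens X (n tokens by d features) and dispatch weights D (tokens by slots), the slot inputs can be written as below; the symbol names are labels chosen for this illustration. In NumPy the same contraction would be `np.einsum("ts,td->sd", D, X)`.

```latex
\tilde{X}_{sd} \;=\; \sum_{t=1}^{n} D_{ts}\, X_{td}
\qquad \text{(Einstein notation: } \tilde{X}_{sd} = D_{ts} X_{td}\text{, summing over the repeated index } t\text{)}
```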

It’s merging all of the different things together and saying, okay, everything that went to this expert, that’s what your weight is. And then it basically takes your original weighting. So say we have a router with top-five routing, and it picks the top five activations and whatever it got.

So say expert one got 0.7, expert two got 0.2, and expert three got 0.1.

And so you then multiply those weights on the other end by whatever that activation was.

And so it’s kind of doing a weird, you’re getting a funky vectorized blend inside of your latent space.
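As described in the Soft MoE paper (Puigcerver et al., “From Sparse to Soft Mixtures of Experts”), the router computes one set of token-to-slot logits and applies two softmaxes to it: a softmax over tokens gives each slot its dispatch weights (each slot input is a weighted average of every token), and a softmax over slots gives each token its combine weights (each output token is a weighted average of every slot output). The loop-based sketch below follows that description; the sizes, `phi` values, and stand-in experts are hypothetical, and a real implementation would use a few tensor einsums instead of Python loops. Because each slot mixes in every token of the sequence, including later ones, this is also why it does not drop straight into a causal decoder.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Expert = Callable[[List[float]], List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe(tokens: Matrix, phi: Matrix, experts: List[Expert], slots_per_expert: int) -> Matrix:
    """Soft MoE forward pass for one sequence.

    tokens: n x d inputs.
    phi: d x num_slots slot parameters (learned in a real model),
         where num_slots = len(experts) * slots_per_expert.
    """
    n, d = len(tokens), len(tokens[0])
    num_slots = len(phi[0])
    if num_slots != len(experts) * slots_per_expert:
        raise ValueError("phi must have one column per slot")
    # Token-to-slot logits (n x num_slots); the same logits feed both softmaxes.
    logits = [[sum(tokens[t][k] * phi[k][s] for k in range(d)) for s in range(num_slots)]
              for t in range(n)]
    # Dispatch: for each slot, a softmax over tokens (dispatch[s][t]).
    dispatch = [softmax([logits[t][s] for t in range(n)]) for s in range(num_slots)]
    # Combine: for each token, a softmax over slots (combine[t][s]).
    combine = [softmax(logits[t]) for t in range(n)]
    # Each slot input is a weighted average of every token, so no token is dropped.
    slot_inputs = [[sum(dispatch[s][t] * tokens[t][k] for t in range(n)) for k in range(d)]
                   for s in range(num_slots)]
    # Consecutive slots belong to the same expert.
    slot_outputs = [experts[s // slots_per_expert](slot_inputs[s]) for s in range(num_slots)]
    out_dim = len(slot_outputs[0])
    # Each output token is a weighted average of every slot output.
    return [[sum(combine[t][s] * slot_outputs[s][k] for s in range(num_slots)) for k in range(out_dim)]
            for t in range(n)]


# Hypothetical sizes: 3 tokens of dimension 2, 2 experts with 1 slot each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phi = [[1.0, -1.0], [0.0, 1.0]]  # d x num_slots; made-up placeholders for learned values
experts: List[Expert] = [lambda v: [2.0 * x for x in v], lambda v: [x + 1.0 for x in v]]
out = soft_moe(tokens, phi, experts, slots_per_expert=1)
print(len(out), len(out[0]))  # 3 2: one output per input token
```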

It’s the bottom that’s unmarked that you just have to know this is a button.

Yes. And so that’s soft MOE.

Yeah, it works.

If you want to figure out why it works, it’s kind of funky, but it does work for the models where it’s pertinent, which are the encoder models and not the decoder models.

So the soft activation reduces the sparse.

I don’t have that here.

I don’t.

I think I have that on the next slide.

So I’ll cover that that second one a little bit later.

And so the big thing here is that there are a lot of benefits to doing this from a performance standpoint. This is almost as much of a jump in performance as we got going from a dense model to a sparse one. A dense model is where all your weights are there and all of them get used.

That’s your Llama 70B, all those sorts of things: the dense model, with nothing weird happening to it. A sparse model is basically a generalization of a dense model, one case where you’re using a very narrow subset of it in order to get performance benefits.

This is that same level of jump up again; sparse routing is basically what you get when you take the temperature for your experts down to zero.

And so soft MoE is another level of generalization above the sparse MoE.

And the big thing here is that it’s deterministic at the sequence level, which is almost as big a deal: you put one set of tokens in, and it’s going to return the same thing no matter what else is in the batch.

All right, we can go to the next one.

Is there any risk of dropping slots, the way a sparse model drops tokens?

No, it’s not really dropping tokens, because it’s just stacking all the tokens together.

Not as far as I know I’ve not heard that.

However, there is a reason why that’s less of a problem, which we’ll get to on the next slide. It has to do with... no, it doesn’t really drop tokens.

And it’s not for the reason that you’d want it to be.
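For contrast, here is why sparse MoE layers can drop tokens: each expert gets a fixed capacity, in the style of the Switch Transformer’s capacity factor (tokens per batch divided by number of experts, times a capacity factor), and tokens routed to a full expert are skipped. This is a minimal top-1 sketch; the first-come-first-served order and the function name `assign_with_capacity` are assumptions for illustration.

```python
import math
import numpy as np

def assign_with_capacity(expert_choice, n_experts, capacity_factor):
    """Top-1 routing with a fixed per-expert capacity.

    expert_choice: (tokens,) index of the expert each token picked.
    Returns a boolean mask of tokens that fit; the rest are dropped
    (in Switch-style layers they bypass the expert through the residual path).
    """
    tokens = len(expert_choice)
    capacity = math.ceil(tokens / n_experts * capacity_factor)
    used = np.zeros(n_experts, dtype=int)
    kept = np.zeros(tokens, dtype=bool)
    for t, e in enumerate(expert_choice):   # assumption: earlier tokens claim slots first
        if used[e] < capacity:
            used[e] += 1
            kept[t] = True
    return kept, capacity

choice = np.array([0, 0, 0, 0, 0, 1, 2, 0])  # expert 0 is over-subscribed
kept, cap = assign_with_capacity(choice, n_experts=4, capacity_factor=1.0)
print(cap, kept)  # capacity 2: tokens 2, 3, 4 and 7 are dropped
```

Because capacity is shared across the batch, whether a given token survives depends on what else is in the batch, which is exactly the behavior Soft MoE avoids.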

And so here we see the benefits of the soft activation strategy.

It’s doing this on ImageNet, which is an image dataset.

So this is basically a vision transformer, which is what CLIP and BLIP and all those sorts of things use, saying I can look at an image, segment it, and say that’s a dog.

So, the dog on the sidewalk, that sort of thing. It’s that sort of task. Soft MoE is in blue and dense is in red.

It’s just it outperforms.

We do see it trail off up here towards the edge. The intuition here is that this is likely because of labeling issues in ImageNet.

Obviously it could also genuinely be trailing off, but we’re expecting pretty much everything to trail off at that point, because things are just mislabeled in there. The dataset itself is bad.

But it’s the test everybody uses, ImageNet-1K I think, as you can see here.

Over here you can see the same thing: soft MoE just kind of outperforming everything.

We have expert choice, which we haven’t talked about yet, but it’s kind of more niche. Token choice is basically everything we’ve been talking about up to this point, which is sparse MoE. The other one is Switch, and then dense is like LLaVA.

I’m curious about your test.


It was just something that I was playing with already.

I couldn’t tell you anything about that. I invented this.

The set of specs I was looking for for an image generation.

It gave me those specs back. The first time, it gave me tokens to just play with, and I used all of them on this particular set of words and labels.

And a little while later, it gave me another hundred, hundred and fifty that I could use for the tokens. I used the same set of words, put it in there, and what I got back for the image was much more realistic.

Okay. So getting more specific and getting better results after more training seems to fly in the face of what you’ve got.

But you’re still getting better results. What I’m saying is that the results trail off: diminishing returns, not reduced returns.

So the more you throw at it, the less extra you get.

But you don’t get less. Anything else on this one? S, B, L, H?


What is that?


So that is actually, I believe, the model size, the same sizes used for CLIP, which is contrastive learning.

So it’s text and images.

There’s a data set out there.

That is basically a mixture of text and images, where the model learns to treat a matching caption and image as similar vectors.
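The "treat matching text and images as similar vectors" idea is a contrastive objective. Below is a minimal NumPy sketch of the symmetric loss described in the CLIP paper: normalize both embeddings, score every image against every caption, and apply cross-entropy so each image picks its own caption and vice versa. The random embeddings, the fixed temperature of 0.07 (CLIP learns this value), and the function names are assumptions for illustration.

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each matrix is a matching pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                          # cosine similarity of every pair
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()    # each image -> its caption
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()    # each caption -> its image
    return (loss_i2t + loss_t2i) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
aligned = img + 0.05 * rng.normal(size=(8, 16))   # captions embedded near their images
shuffled = aligned[::-1]                          # captions paired with the wrong images
print(clip_style_loss(img, aligned) < clip_style_loss(img, shuffled))  # True
```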

So it comes in different sizes, and they go up in size as you go.

So S/16, B/16, L/16.

That’s small, base, and large.

And then huge.

And then the biggest one is a lowercase g. That’s giant, or whatever it is, but it’s a lowercase g, so you’ll see lowercase g out there.

That’s actually the largest one.


It might be. I don’t know. Somebody thought they were clever; that’s basically what it was. All right, lowercase g. So that’s soft activation.

Who knows what a decoder-only model and an encoder-only model are?

Are those terms you guys are familiar with?

We’ve covered it once or twice.

Okay. We really jump all the way in.


So the biggest thing to know about a decoder-only model is that it doesn’t know anything about what comes next.

During training, it always masks everything ahead of the next token.

It only knows about the current token being generated and everything before it. Things that use decoder-only:

GPT, all of your language models, pretty much anything like that is decoder-only. Things that have an encoder: Stable Diffusion has an encoder.

And then there are the audio transformers.

Things of that nature, where the model knows about everything that exists in the sequence.

Then there are encoder-decoder models like T5, if anybody remembers T5.
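The difference between decoder-only and encoder models comes down to the attention mask. Here is a minimal single-head attention sketch in NumPy (no learned projections, which is a simplification): with `causal=True`, position t can only attend to positions up to t, so changing a later token never changes earlier outputs; without the mask, every position sees the whole sequence.

```python
import numpy as np

def attention(q, k, v, causal):
    """Single-head scaled dot-product attention over one sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Decoder-only: position t may only look at positions <= t.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
x2 = x.copy()
x2[-1] += 1.0                      # change only the last token
for causal in (True, False):
    a = attention(x, x, x, causal)
    b = attention(x2, x2, x2, causal)
    print(causal, np.allclose(a[:-1], b[:-1]))
# True True    (decoder: earlier positions never see the change)
# False False  (encoder: every position sees the whole sequence)
```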

So the problem right now is that soft MoE does not work with decoder models.

It doesn’t work with GPT.
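As we understand it, the core issue (one the Soft MoE paper itself raises as a limitation) is that each slot is a softmax-weighted average over the whole sequence, so even the first position’s output depends on tokens a decoder is not allowed to see yet. This small self-contained check builds only the slot inputs; `soft_moe_slots` and the random `phi` are hypothetical names for illustration.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_slots(x, phi):
    """Slot inputs of a Soft MoE layer: each slot mixes *all* tokens."""
    dispatch = softmax(x @ phi, axis=0)   # normalized over the whole sequence
    return dispatch.T @ x

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 4))
phi = rng.normal(size=(4, 3))
x_future = x.copy()
x_future[-1] += 1.0                       # edit only the last (future) token
before = soft_moe_slots(x, phi)
after = soft_moe_slots(x_future, phi)
print(np.allclose(before, after))         # False: every slot, and so every output, sees the future
```

Compare this with the causal mask, which guarantees earlier positions are untouched by later tokens.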

So the question that comes up on every single paper anybody releases on MoE or soft MoE architectures is: does it work with decoder-only models?

They say no, and everybody turns around and walks away.

That’s pretty much what’s happening. The thing is, for vision transformers, things like GPT-4V, and there’s this new LLaVA model,

there’s a whole bunch of stuff happening in vision transformers right now.

It’s great for that. It’s being used for that. And it’s probably part of the reason why there’s a big jump in performance there.

So rather than considering a single token, is there any effort to consider a multi-dimensional token?

Instead of a single one.

So you’re saying, like, have the model know about the dog, and the concept that it’s both a linguistic thing and also a visual thing.

That would be an example.

Yeah, yeah, that is what CLIP is.

Sorry, it’s really hard to explain. CLIP and vision transformers are weird, because they essentially use the same machinery. We’re essentially learning that there’s not that much difference between language and visual information, which is something for all the philosophers to chew on, and it gets into a bunch of weird stuff if you want to get into it. But basically, if you train it in a certain way, you can really stretch what that stuff means to a good extent, especially if you have a good training set. So, yes. And soft MoE is a great thing for that.

So here’s my hot take.

I have one hot take for this talk. I think they were trying to get soft MoE to work with decoding.

And they couldn’t do it. That’s basically my read, looking at all the people who were on that project and all the hype: they were saying huge cost savings for everybody, inference is going to be so much cheaper, we’re going to be able to roll out all of this stuff. You look at the papers they’re writing, and you look at the big failures that happened, all the merge requests that got closed around the September time period. A lot of things point to: they tried this, they shot for the moon, and they couldn’t do it. That’s my hot take, completely unfounded other than intuition and being on Discord way too much.

Well, they’re also fighting against the quantization type stuff we showed last time, where we ran a 13B model on this laptop with no GPU. You know, people have been working over and over to get models onto small hardware.

And this is a different approach to that.
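Some rough arithmetic shows why quantization is what puts a 13B model on a laptop. This counts the weights alone; the KV cache, activations, and the per-block scale factors that real quantization formats store are ignored, so actual files run somewhat larger.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone (no KV cache, activations, or format overhead)."""
    return n_params * bits_per_param / 8 / 1e9

n = 13e9  # a 13B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(n, bits):.1f} GB")
# 16-bit: 26.0 GB
#  8-bit: 13.0 GB
#  4-bit: 6.5 GB
```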


So, yeah, they had to go this way. And the big thing too is the timing of when they announced that it failed: take the delta from when soft MoE was released, which was about May,

to when they announced it failed, and it’s about six months, the time period of a big training run. All the timetables kind of match up: oh, they saw this thing from the open source community and said, I’m going to take it and reap all the benefits. And they couldn’t do it. Whoops.

But they’re still doing pretty good. All right, so that’s soft MoE. What’s next? I think we’re almost at the end here.

The next one, I don’t know a damn thing about other than that a lot of people seem to be talking about it. I always hear expert choice. But the concept is simple.

Instead of choosing based on the token like everything else, it has each expert choose whatever tokens it wants.

And so you activate all of your experts, and then you have them choose the tokens that they want, with some sort of top-k. Yeah, top-k is a tunable parameter.

And yeah, I’m assuming this is very expensive, but it probably gives really good results.
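A minimal sketch of expert-choice routing as we read the "Mixture-of-Experts with Expert Choice Routing" paper: compute token-to-expert affinities, then flip the selection so each expert takes its top-`capacity` tokens. The capacity formula (tokens times a capacity factor, divided by experts), the random inputs, and the function name are assumptions for illustration.

```python
import numpy as np

def expert_choice_routing(x, router_w, capacity):
    """Each expert picks its `capacity` highest-scoring tokens.

    x: (tokens, d); router_w: (d, n_experts).
    Returns (n_experts, capacity) token indices and their gate weights.
    """
    logits = x @ router_w
    scores = np.exp(logits - logits.max(axis=1, keepdims=True))
    scores /= scores.sum(axis=1, keepdims=True)          # token-to-expert affinities
    idx = np.argsort(-scores.T, axis=1)[:, :capacity]    # per expert: its best tokens
    gates = np.take_along_axis(scores.T, idx, axis=1)    # weights used to combine outputs
    return idx, gates

rng = np.random.default_rng(0)
tokens, d, n_experts = 8, 4, 4
capacity_factor = 2                                   # hypothetical
capacity = tokens * capacity_factor // n_experts      # 4 tokens per expert
idx, gates = expert_choice_routing(rng.normal(size=(tokens, d)),
                                   rng.normal(size=(d, n_experts)), capacity)
counts = np.bincount(idx.ravel(), minlength=tokens)
print(counts)  # how many experts picked each token; some tokens may get none
```

Every expert processes exactly `capacity` tokens, so load is balanced by construction; the trade-off is that a token can be picked by several experts or by none.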

Since you’re only taking a certain number of the tokens, you probably get less coverage, but this seems like a really good approach. I don’t know how this works in practice, and I’m not sure it’s ready for production use from a trade-off standpoint, but maybe in a year or so. Instead of, you know, having 10 experts and only activating two of them, this seems more like you’re activating all of them,

and hoping, I guess, that you really trust your set of experts for a specific problem, maybe.

This is a fun one that’s out there. It’s emerging, with a lot of stuff happening in the open source community right now. How many of you guys are familiar with LoRA, QLoRA, PEFT, all that sort of stuff? So let’s tie that in. The idea here is called mixture of adapters.

LoRA is a low-rank adapter.

So it’s a mixture of those, essentially.

The big thing with LoRAs and QLoRAs is that you can train them on consumer GPUs. You can use the PEFT library, parameter-efficient fine-tuning, and get the model size down so it fits on a 4090, and on a lot of the things that you can rent from RunPod, Lambda, all those sorts of accessible places where normal businesses and even individuals can train models.
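Why LoRA fits on consumer hardware comes down to how few weights it trains: for a frozen weight matrix of shape d_out x d_in, LoRA learns two small matrices, B (d_out x r) and A (r x d_in). The hidden size and rank below are hypothetical; QLoRA additionally stores the frozen base weights in 4-bit, which this count does not model.

```python
def lora_trainable(d_in, d_out, rank):
    """Trainable weights of one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)

d = 4096                 # hypothetical hidden size of one square projection
full = d * d             # weights touched by full fine-tuning of that matrix
lora = lora_trainable(d, d, rank=8)
print(full, lora, f"{lora / full:.2%}")
# 16777216 65536 0.39%
```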

And the idea here is that instead of training everything up front, you take those adapters and swap them in and out. That adds a massive amount of complexity, because this is where you get into the whole thing about training your router: how do you deal with the fact that if your router isn’t trained, it’s just going to do random stuff? So that’s really the big problem here. The upside is that you can update your experts after your big training run by training them on your own personal data.

You could train it on characters in a movie; you could train it to do anything. And think about QLoRA: instead of these giant, massive datasets, you can do it with like 1,000 examples instead of a million, because it’s working in a smaller space. I think that this, if it works, is the future of the consumer-grade space. Now, there might be a whole bunch of abstractions on top of it.

But this is insane.

There are a lot of people doing stuff on this right now. And since it’s accessible on consumer hardware, a lot of the open source people are on it right now. There are three or four major projects out there. I will plug Hydra MoE.

That’s one of the ones out there right now, a bunch of folks getting some traction. So let’s go to the next one here. The basic thought is that you’re swapping these things in; the feed-forward function has a lot of the same pieces here.

But the difference is that it just adds the adapter at the end, instead of it being this giant thing in the middle.

There is an adapter that basically gets merged into your weights; the input passes through it at the very end, and it goes out.

That’s kind of how LoRA works. Very, very simple. And yeah, that’s the general idea around that.
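A minimal NumPy sketch of the LoRA forward pass and the merge described above, following the LoRA paper’s formulation h = W x + (alpha / r) B A x, with A initialized small and random and B initialized to zero so the adapter starts as a no-op. The sizes and the alpha value are hypothetical, and "training" is faked by overwriting B with random values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 2, 4.0   # hypothetical sizes and scaling
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init: adapter starts as a no-op

def lora_forward(x, W, A, B, alpha, r):
    """Base path plus low-rank side path: W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x, W, A, B, alpha, r), W @ x)   # zero-init B: no change yet

B = rng.normal(size=(d_out, r))          # pretend training has updated B
W_merged = W + (alpha / r) * (B @ A)     # fold the adapter into the weights
print(np.allclose(lora_forward(x, W, A, B, alpha, r), W_merged @ x))  # True
```

Merging means a single adapter adds no inference cost; keeping adapters unmerged is what makes swapping them in and out possible.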

We can probably go over the next one.

One thing that I thought was very interesting.

So there’s a whole bunch of papers about people using this sort of thing.

And this paper laid out, basically, training on different tasks. They have a LoRA for each one of their different processes, the different things that they want to do. Okay, for this task’s workflow,

I’m going to load up these experts together, and off it goes. You get a highly specialized model, but you keep the same core base.

And this is the concept of transfer learning, which is basically that certain sorts of training data have broad applicability across all tasks.

That’s things like, you’ll hear people talking about training on programming, training on code, and suddenly the model has the ability to reason more effectively. Why? Because it turns out programs are some of the best structured-logic data, the sort of pathway data, that we have available to us, despite what happens during code review. So it’s stuff like that, where it can help you with things like philosophy.

So this is a concept around benefiting from a big, bulky pretrained model. You see here the little freezie pop, which basically says we’re not training these things.

And the fire says that we are training those things.

And so here we’re training the gate, the task adapters, and the domain adapter, which is trained on domain data. There are a lot of strategies now for taking a giant corpus of data and turning it into a domain dataset.
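The frozen-base, trained-gate, trained-adapters setup on this slide can be sketched as one layer: a frozen weight W plus a softmax-gated blend of several LoRA adapters. This is a hypothetical design for illustration, not the architecture of the specific paper on the slide; the names `mixture_of_lora` and `gate_w` and all sizes are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_lora(x, W, adapters, gate_w, scale=1.0):
    """Frozen base weight plus a gated blend of LoRA adapters.

    adapters: list of (A, B) pairs, A: (r, d_in), B: (d_out, r), all trainable.
    gate_w: (n_adapters, d_in) trainable gate; W stays frozen.
    """
    g = softmax(gate_w @ x)                                         # one weight per adapter
    delta = sum(gi * (B @ (A @ x)) for gi, (A, B) in zip(g, adapters))
    return W @ x + scale * delta, g

rng = np.random.default_rng(0)
d, r, n = 8, 2, 3
W = rng.normal(size=(d, d))                                             # frozen (the freezie pop)
adapters = [(rng.normal(size=(r, d)), rng.normal(size=(d, r))) for _ in range(n)]  # trained (fire)
gate_w = rng.normal(size=(n, d))                                        # trained (fire)
y, g = mixture_of_lora(rng.normal(size=d), W, adapters, gate_w)
print(y.shape, g.round(3))
```

Adding a new task adapter here means appending an (A, B) pair and a gate row, while W never changes; an untrained gate, as the talk warns, just blends adapters arbitrarily.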

That’s one of the big pipelines people are building right now: how do I take a corpus of raw text and turn it into training data? So there are lots of pipelines here.

And yeah. All right, go next so I can stop talking. All right, another big thing that pops out in all of these is expert load balancing. It’s a big, big problem. In a lot of these things, you end up with one expert that

gets super overtrained. So I had DALL-E generate me a little meme for the talk here.

It’s kind of crazy, right?

I think somebody has a weird eye over here.

That’s not overfitting or underfitting. That’s just that you wind up with one expert that’s trained, and the others are just kind of happy with their initialized weights.

And it’s just the luck of the draw.

Not necessarily anything wrong with those models.

It just happens that it’s expert one. All right. It’s like when people say a sparse model diverges for some random reason, stuff like that.

Okay. Yeah.

Similar to how it goes at work: if you’re really good at something, you just get more of that work, because people keep giving it to you. You know how to do it, but you get overworked, and you can’t keep up. And so what we’re talking about here is tunable Gaussian noise.

There’s an additional noise parameter that gets added in.

It’s an additional tunable parameter that has to be trained, and it depends on dataset size, complexity, all that stuff.
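The tunable Gaussian noise matches the noisy top-k gating from Shazeer et al. (2017), "Outrageously Large Neural Networks": during training, each gate logit gets standard normal noise scaled by a learned, input-dependent amount (a softplus of a second projection), and then only the top k logits are kept. Below is a minimal NumPy sketch; the random weights stand in for trained ones, and the paper’s additional load-balancing loss terms are not shown.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0)   # numerically stable log(1 + e^z)

def noisy_top_k_gating(x, w_gate, w_noise, k, rng, train=True):
    """Noisy top-k gating in the style of Shazeer et al. (2017).

    x: (tokens, d); w_gate, w_noise: (d, n_experts), both trainable.
    """
    clean = x @ w_gate
    if train:
        noise_std = softplus(x @ w_noise)               # learned per-token, per-expert scale
        logits = clean + rng.standard_normal(clean.shape) * noise_std
    else:
        logits = clean                                  # no noise at inference
    kth = np.sort(logits, axis=1)[:, -k][:, None]       # k-th largest logit per token
    masked = np.where(logits >= kth, logits, -np.inf)   # keep only the top k
    masked -= masked.max(axis=1, keepdims=True)
    gates = np.exp(masked)
    return gates / gates.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
tokens, d, n_experts = 1000, 8, 4
x = rng.normal(size=(tokens, d))
w_gate = rng.normal(size=(d, n_experts))
w_noise = rng.normal(size=(d, n_experts))   # hypothetical; trained jointly in practice
clean = noisy_top_k_gating(x, w_gate, w_noise, 1, rng, train=False)
noisy = noisy_top_k_gating(x, w_gate, w_noise, 1, rng, train=True)
print("tokens per expert, no noise:", (clean > 0).sum(axis=0))
print("tokens per expert, noise:   ", (noisy > 0).sum(axis=0))
```

Note that nothing is rejected after the fact: the noise is just part of the forward pass, nudging some tokens toward experts they would not otherwise reach.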

All right, I’ve got to ask. Okay. If you’re going to reject an expert’s training,

Okay. And you know what that.

Probability is.

Why even do the work on it.

If you’re going to reject it, when do you decide? When do you do this added noise?

How do you do it?

Yeah, it’s just another random number generator process.

Kind of.

Yeah. So it’s basically, you come up with a zero, so why do all the work to get there?


So it’s not quite like that.

It’s not somebody deciding on the thing. There’s an essence of randomness at the start of a sequence.

You see the values and so it’s kind of based on this.

I have to believe the human brain, when it starts searching for something... There’s no human in this process. I know, but I have to believe the human brain stops searching through memory for things when it knows it’s a dead end.

Cause otherwise I would bring her over here or something.

So this reminds me of something in reinforcement learning, where you can go for the goal, and something that always goes straight on, straight on, may not actually get to all the space that you want it to explore. So you add some curiosity to it. That’s exactly it. Yeah, that’s that.

So this is almost like, yeah, you want to go for the goal, but sometimes you want to force it to step off the path. For instance, the gentleman with the bad eye, why is he there?

No, he’s overtrained. Right.


But yeah, you can’t take them out during training. All that is to say that it’s done as part of those forward passes we talked about, in the feed-forward neural network layer.

It’s just basically done as part of that forward pass.

I will not pretend to know the actual intricacies of the Gaussian noise.

I just know it’s there.

Part of what you just said answers my question. Okay. It’s a feed forward system.

There’s no feedback.


It is always feed-forward, and then it does feedback for backpropagation.

That’s it.

Yeah, because if we knew what the weight was going to be at the end, we would feed it back and just truncate that effort. But that’s not the way this works. Exactly. Okay. All right. So that’s the bulk of it. I will plug a few open source communities that are out there.

So, Hydra MoE and Skunkworks.

They also do some stuff with vision transformers. They have BakLLaVA, which is LLaVA trained on Mistral, which greatly improves that sort of thing. They also have a bunch of things related to ablation studies; ablation is kind of one of their focus areas. Another one is Xue (Fuzhao Xue); I can’t spell his name or say his name. This guy is huge in the MoE community. He has these giant lists on GitHub of all the papers on MoE to go look at, like awesome-MoE or something like that.

So this is definitely someone I suggest checking out. He also has OpenMoE, though he’s kind of moved off to something else right now.

So that has died down a little bit, but he’s still posting really good MoE content. And the one I would really suggest is EleutherAI.

These guys do a massive amount of stuff in the open AI community, the actually open AI community. They have a bunch of evaluation frameworks and things like that. They also have the MoE reading group. They meet every Saturday and usually have some sort of paper, whatever’s coming out; they read through it, and someone gives the talk. There’s lots of good stuff on YouTube that goes over a lot of the things I talked about today; they have a session on almost every one of these. So I would definitely check out EleutherAI. And if you branch out from EleutherAI to everybody around it, you get Nous Research and all those sorts of people, where there are lots of fun things happening right now. And it’s all on Discord. So if you don’t have Discord, get Discord.

That’s where everything’s happening. So, I think that’s most of it.

Okay, if I had to take six papers to a desert island and learn about MoE,

I don’t know why I would do that, but I would do these ones.

So: A Review of Sparse Expert Models in Deep Learning; Switch Transformers, which is the key Switch Transformer paper on scaling large models with mixture of experts; Scaling Vision with Sparse Mixture of Experts; ST-MoE, the stable MoE paper, which is a different one. This is basically where they figure out how to fix the issues with sparse MoE, so that might go into some of your Gaussian noise sort of questions. Then Expert Choice routing; Brainformers, which is a fun one; and then From Sparse to Soft, which is the soft MoE paper.

Yeah, any final questions? Thank you, you guys. Nice to meet you.

The next thing that is likely to come.

I think it’s the [unclear].

I’m pretty confident that that’s gonna be out there. I also think soft MoE is going to keep beefing up vision transformers, like LLaVA. One of those is probably next. I’d say the vision transformers one, just because there’s a lot happening there right now.

One of the things going through my head: [unclear] was the other thing, where it kind of builds the operating system.

You plug stuff in, and it’s like, oh, I like the communications layer here.

I can see something like one of those models potentially running one of those layers.

So there’s a lot of very, very interesting stuff.

This will probably come from some of the big guys, so Microsoft, OpenAI, something like that: an LLM as an OS. Probably not literally; you’d still have an actual OS, but something in that space.

Karpathy has been posting coyly on Twitter, all his "oh, I’m looking at operating systems, memory," that sort of stuff. So you’re like, okay, what are you guys doing? So they’re going to release something, probably with Copilot. I don’t know. That’ll probably come out soon. Another interesting one that’s out there, more at the intuition and conceptual level, is this concept of MemGPT, which got released sometime last week or before. The actual implementation is just somebody being clever and doing some prompt engineering, but it introduces the concept of treating LLM memory the same way as memory registers inside a computer, and that intuition is kind of interesting. As for how they’re doing it, they’re just putting some names around prompt engineering and making a paper out of it, because they’re like some kids out of college.

This is just one thing around that right now. I’m going to go ahead and shut down the recording part of this, and we’ll end the Zoom part of this call. Thanks, everybody, for joining in on Zoom. Again, we’ll be back at it in two weeks.

AI Hackathon Starter Kit

Transcript generated by https://transcribe.hsv.ai

So one of the other things we do as part of Huntsville AI is make sure that when we talk about stuff, especially in our weekly meetups, everything is as public as it can be. We really don’t like people coming and giving us proprietary talks and then hiding the information behind something that we can’t share, because that doesn’t really help build a community. One of the things we talked about a while back, I guess the first... it wasn’t the first meetup, it was the first meetup we did here recently: we tried to do a kind of deep dive into how AI is used in genomics. The AI papers I ran into are publicly available on arXiv and other places. The genomics side is in the journal Nature and places that I can’t get to.

So it’s real easy to see where there’s a barrier there.

So one of the thoughts that we had, and we did this also last year, was looking for starter kits for hackathons.

I came across one that we actually did last year, Hatch 2022, where we went through and created a Streamlit demo: if you wanted to use Streamlit, this is kind of how you would do it. This time around, I tried to make this a little more generic than just HudsonAlpha or genomics or things like that.

So my thought was to start off... and I didn’t get too, too far into this because of life and other stuff, though I did try really hard. So what we’ll talk about tonight is the kinds of things that you would want or expect in some kind of starter kit from a vanilla AI perspective. What I started off thinking about was more on the technical side: if you were going to do a hackathon, in most of the ones we run into, you need a way to use an API to get to a data source. I mean, that’s generally it. If you go do the NASA Space Apps Challenge, they always want you to use NASA data; that’s one of their driving needs.

So you get an API key, and then you make calls to get the data.

The other side that you wind up in most hackathons is we need a way to demonstrate the project.

And generally, having been a mentor at several hackathons, you wind up with teams of people who spend half of the hackathon trying to figure out how to create a web server.

I remember one of the first Space Apps challenges that I helped with; I think Ella was on the team. It was a group of Girl Scouts. They built their entire solution using WordPress and did a better job. Everybody else was sitting there freaking out: how do I run Apache? How do I do this? How do I do that?

And they’re like, hey, here’s a thing.

I can make stuff on it.

And they were up and going and actually built something that somebody could click through and actually see and use and want. And it was fun and kind of cool. So look at the hackathon as: I need to solve a problem, and I need to show it in a way that somebody can consume it.

It doesn’t need to be this one-off programming-language thing that only runs on your laptop. So the key that we’re looking at there would... that’s interesting.


So for those on the recording, the lights just dimmed and went back up. So apparently closing time is soon. [unclear] Yeah, almost over. Are you doing a Streamlit talk coming up as part of this? Okay.

When I get this stuff together, I’m most likely going to link to any information that you’ve got or that you’re going to do. Or if I get this thing in place over the weekend, you might actually use this as a jumping-off point if you wanted to.

Yeah, you want to get it together? Yeah, at least I could hand off, because then that would tie together even better. So Streamlit is an easy-to-use, Python-based web server kind of a thing.

I don’t even know if you’d call it a server.

It’s a package that contains a bunch of stuff that basically does React, or React-y stuff, from Python.

Yeah, it has a front end built in. So you can make a Streamlit app super duper quick.

You don’t have to touch the HTML, you don’t have to touch JavaScript. You work right inline with your Python and just tell it `st.header`, the header is this; `st.slider` gives you a slider.

Do you want a text field or whatever?

Yeah, it’s super sweet.

And you can do some pretty interesting stuff with it if you want. I mean, one of the things: I’ve tried to plan around the hosted one, the one that denies packages. Right. There are certain packages, like TensorFlow.

Yeah, the hosted ones, yes.

But even stuff like, I mean, this is all built straight out of streamlit.

The transcription service that we built for video, to transcribe it into text, things like that. No. Yeah.


So, right.

So that might actually be something we need to look at for an AI hackathon: what packages are available.

The Littlest JupyterHub was kind of my base for everything, and then bolting on just a handful of extensions for The Littlest JupyterHub.


Once you have that, you can run it on a VM, you can run it in a container, you can run it locally, you can run it on your own server, you can deploy it to the cloud, all the good things. And then you can basically have Streamlit running on the JupyterHub, inside the secure SSL and the authentication of the JupyterHub.

Did that explain it?

Yes. I’m going to have to talk on it first. Yeah. So anyway, if you wanted to see what the actual code for a Streamlit app looks like: you’ve got Streamlit, which you typically import as `st`.

You can pull in pandas.

We’re using Altair. I don’t know how to pronounce that. Altar? Altair? I’m Southern, so it’s hard.

How do you say Altair? Altair. You can pull some data and then, you know, do a Streamlit multiselect, things like that.
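A minimal app in the spirit of the demo being described (a header, a multiselect with a "choose at least two countries" check, a slider, and a pandas table) might look like the sketch below. The data, labels, and helper names like `filter_countries` are hypothetical stand-ins, not the code from the talk’s repo; save it as `app.py` and run `streamlit run app.py`. The guarded import is only there so the pure-pandas helper can be imported without Streamlit installed.

```python
import pandas as pd

try:
    import streamlit as st
except ImportError:   # lets filter_countries be imported and tested without Streamlit
    st = None

# Hypothetical stand-in data; the demo in the talk pulled a real dataset from AWS.
DATA = pd.DataFrame(
    {"2019": [10.0, 7.5, 3.2], "2020": [11.2, 7.9, 3.0]},
    index=pd.Index(["Canada", "Mexico", "Peru"], name="country"),
)

def filter_countries(df, countries):
    """Keep only the rows whose index (country name) was selected."""
    return df.loc[list(countries)]

def main():
    st.header("Country explorer")
    countries = st.multiselect("Choose countries", list(DATA.index), ["Canada", "Mexico"])
    if len(countries) < 2:
        st.error("Please choose at least two countries.")
        return
    scale = st.slider("Scale values by", 0.5, 2.0, 1.0)     # one call gives you a slider
    st.dataframe(filter_countries(DATA, countries) * scale)  # rendered as an interactive table

if st is not None and __name__ == "__main__":
    main()   # `streamlit run app.py` executes the file as __main__
```

For charts, Streamlit also provides `st.altair_chart`, which takes an Altair chart object like the ones mentioned here.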

The interesting thing that you can do with this, and this is typically where I was leaning before you start running into issues where you can’t get the right packages:

if this is something straightforward I can run with scikit-learn or whatever, let me sign in and see if I can actually sign in with GitHub, maybe.

I don’t remember which one I did.

We’ll find out.

Yeah, let’s try that one, maybe.

I’m going to see if I can actually do this live.

Okay, new app. PlenApp repo.

I want this one.

Yeah. Let’s see if I can copy and paste into this thing.

Oh, it lets me pick.

Or paste GitHub URL.

What branch are we on?


Your TV still works, though. That’s the weirdest part. Maybe some years sitting on the… Yeah, I’m okay. Why is there someone out there? I wonder if it actually is just a road. I bet people stopped moving. And it turns off by itself.

Honestly, I’ve never had a meeting here past 6:30, so... Actually, I was about to say I’ve been here on the weekend and I’ve passed that time before in this room.

It doesn’t have lights.

And I’ve never been able to figure it out. Maybe it just automatically shut down. That’s pretty interesting. It doesn’t want executives to executive around you. Right. That’s pretty funny.

I’m just wondering if you can put it in the professor’s room. All right. Yep. And you URL. Yep. I won’t say it. Did it have an error or something?

I couldn’t see because it was behind this thing. Please switch works. The top right. Nice.


Even new.

That’s interesting.

Now let’s see if this is actually.

I don’t know if it will expand or not.


That guy.

This will be interesting.

I haven’t touched that repo in a year now,

since we did this talk last time. I think it’s going to be a great time to talk about it. Yeah. Streamlit’s changed, so who knows whether this is going to work or not.

But it was fairly vanilla. It did pull a dataset from AWS and then do some things with it.

So as long as that dataset is still there, and I wasn’t using any weird libraries that I know of.

So the whole point is.

If you can get your code into a repo, you can get Streamlit to host it for you.

So you don’t have to spin up a web server.

You don’t have to do all this kind of stuff. It’s even smart enough that if eight of us are working together in a hackathon,

Jonathan makes a change.

He pushes it up to the same branch.

the web server knows that it got changed and will refresh with your changes.

So it’s zero to website in five minutes, or however long this thing takes to come up.

There it is. So let me go back to another.

And one of the other things we’ve been playing around with, from a Huntsville AI perspective: we’re trying to make sure that AI and software development is available for anybody that wants to come play. You wind up, especially working with high schoolers and various families and stuff,

not everybody has a laptop. You know, not everybody has the PowerBooks and all of the... what’s the current thing now?

I mean, anyway, let’s play around. I can open up GitHub.

This is the thing that got applause at Bob Jones.

I’m looking at the GitHub thing.

I’m logged.

I think I’m logged in the GitHub. Yep.

And I press my dot key.

I’m going to press my dot key. Period key.

The dot.

Yeah, the key, the period key.

Which spins up a development environment in my own browser. So if I go grab the DataFrame demo.

Let me find something really easy to see: the multiselect, choose from the list, region.

And some of this, I don’t want to get too far into.

Maybe I’ll change this to two countries. Save that.

And then I’m going to go over here and say.

Oh, look at that. Nice.


Right. So I pushed that change to move this over to two countries. That’s better. Two countries. And then if I go back over to where Streamlit is hosted,

Somewhere up here.

I’ve got a... what, rerun, settings.

I don’t know what manage app does.

I have to go make sure I’ve got this set up right. Did I actually commit that change? Yes, I did. We’re on main. There’s really nothing to pull and push because you’re live, and it usually picks it up as soon as you push it.

So I probably have something wrong.

I mean, I could go to dark theme. Yeah. And if you’re used to doing data analysis in Python, you know, mostly pandas kind of stuff, Streamlit understands your native pandas stuff. You can create a pandas DataFrame and say "here, show this," and it gives you an actual HTML table that’s nice to look at.

You know, I don’t know if it provides... hey, sorting.

You know what I mean it’s pretty interesting.

So interactive, you know, as it updates.

So it’s super easy to use and easy to get into.

There’s a lot available from their own demo kit and things like that, so a lot of it is: go find something that looks like what you want, copy it, paste it, and put your data in. You can put a plot in. You can throw in, what is it, Dash? And it will know how to render it.


Most of the text you put on there uses Markdown, which you’ll know if you do much GitHub stuff; we use Markdown a lot. Anyway, there’s your plug for Streamlit, which Jonathan’s going to talk a lot more about.

Is it next week or week after?

Okay, week after next.

Yeah, for me as well.

Oh, I changed the.

Yeah, I changed the wrong thing.

So instead of changing the choose-from-list part, what I changed was the error message.

So if I kill everything out of here, I should get something that tells me to choose two countries.

Yep, at least two countries.

Yeah, I changed the wrong thing. Yes, user error, which is generally the case. Come on, nobody? You’ve got to correct my code before I push it.
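A minimal sketch of the change the demo was aiming for, assuming a recent Streamlit release: a multiselect that requires at least two countries, with the selection shown through st.dataframe (an interactive table whose columns can be sorted). The country names and placeholder values are hypothetical, not the data the real Streamlit demo loads. The validation sits in a plain function so it can be checked without Streamlit installed.

```python
import sys
from typing import List, Optional

MIN_COUNTRIES = 2  # the edit discussed above: require at least two selections


def selection_error(selected: List[str], minimum: int = MIN_COUNTRIES) -> Optional[str]:
    """Return an error message when too few items are selected, otherwise None."""
    if len(selected) < minimum:
        return f"Please select at least {minimum} countries."
    return None


def main() -> None:
    # Imported lazily so the helper above can be used (and tested) without Streamlit.
    import pandas as pd
    import streamlit as st

    # Hypothetical placeholder data, not the dataset the real demo uses.
    df = pd.DataFrame(
        {"country": ["Canada", "France", "Japan"], "value": [1.0, 2.0, 3.0]}
    ).set_index("country")

    countries = st.multiselect("Choose countries", list(df.index), ["Canada", "Japan"])
    error = selection_error(countries)
    if error:
        st.error(error)
    else:
        # st.dataframe renders the DataFrame as an interactive, sortable table.
        st.dataframe(df.loc[countries])


# `streamlit run app.py` executes this file as __main__ with Streamlit already
# loaded; a plain `python app.py` has no Streamlit module loaded and skips the UI.
if __name__ == "__main__" and "streamlit" in sys.modules:
    main()
```

Saved as app.py in the repo Streamlit is hosting from, a push to that branch should trigger the same automatic refresh described earlier.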

Zoom. This one.

Oh, okay.


Yeah, we’d never push directly to main either. I’ve had a person tell me that was the way you should work, and I made sure I don’t work on her team again. Anyway, if you had really, really, really good developers that are talking to each other all the time, every day, maybe. The people I work with? Not a chance.


Yeah. So that was the initial thought as far as putting something in there. I was going to grab the piece we did for our NASA Space Apps submission last time, which showed how to grab stuff from their API and store it in a data file, things like that. I started putting that in here, but then I started thinking, okay, that’s good from just a general standpoint: how do you get data? How do you show data?

How do you do some things?

Doesn’t have anything to do with AI.

So I started thinking about the AI stuff, and I started off pulling from some things we’ve already done in the past with Huntsville AI. One of these was just a general intro. This is basically stolen from, I can’t remember which hackathon we supported here, an intro to Python and intro to Jupyter notebooks and stuff. This might be a good thing if you’ve got a beginner-level person trying to start from nothing. We see that a good bit, especially on the high school side.

Unless you’re at, like, the Bob Jones hackathon, where I said, okay, who here knows how to write programs in Python?

And all the hands go up.

Like, I didn’t know computer science was a thing when I was in high school.

Okay, who can do C++?

And most of the hands stay up.

Yes. I was talking to the computer science club, and I just was not expecting that. JavaScript was, like, everybody; C++ was most of them. Anyway, I was way out of my league there. This was not useful for them; they could teach this to me. However, they were very impressed by the period key, which I still remember, because I had only learned it earlier that week from somebody I follow on Twitter. Anyway, this goes through some of your general pandas DataFrame stuff, how to do some things.

Still not really getting into much from an AI standpoint. So then the other thing I started looking at.

I think the first time I started doing things like this was the HudsonAlpha Tech Challenge, about two years ago; we did this as part of the intro.

This might be something useful, mostly from a classification standpoint. You have a particular... and I’m really light on the actual science of some of this.

A molecule with 166 different features that determine whether the molecule is musk or non-musk. Some of these features interact with each other, and some of them don’t mean anything at all.

You know things like that.

So we actually built a classifier based on this, you know, data set.

And again, lots and lots of features, you know, different values that are measured.

So this might be useful in a hackathon starter kit for somebody who hasn’t done classification, especially with a low volume of data.

You know, throw it into scikit-learn and build a classifier.

In this particular notebook we actually walk through it. Let’s try a support vector machine.

That’s great.

Here’s my confusion matrix.

It’s pretty good.

You know, 95% accuracy things like that.

All right, now let’s drop in and do a decision tree classifier.

Yeah, I’m up to 96.9%.

Now let’s try random forest.

These are all different things available in scikit-learn, with an intro into how to do each one.
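A minimal sketch of that notebook’s comparison, assuming scikit-learn is installed. The real notebook used the musk data set (166 features); to stay self-contained, this uses a synthetic stand-in from make_classification with 166 features, so the accuracies it prints will not match the 95% and 96.9% quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the musk data: 166 numeric features, binary label.
# Some features are informative, some are redundant mixes, the rest are noise.
X, y = make_classification(
    n_samples=600, n_features=166, n_informative=20, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize first.
    "svm": make_pipeline(StandardScaler(), SVC()),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = accuracy_score(y_test, predictions)
    print(f"{name}: accuracy {results[name]:.3f}")
    print(confusion_matrix(y_test, predictions))
```

Trying another estimator, such as AdaBoostClassifier or GaussianNB, is just one more entry in the models dict.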

And the thought I had was: depending on the hackathon, you’re probably trying to do things with tabular data, or you may have a sequence of data, or images, or audio. So I started approaching it from the standpoint of what kind of data I’m trying to work with, and then, from the kind of data, which route you would want to take. I’m thinking of this kind of starter kit almost as a choose-your-own-adventure. Okay, now I’m going to NASA, and I’ve got one of the other parts I pulled: initially they had something about all of the meteorites that had impacted the Earth over the last 100 years, and a data set for that. We’re looking for different kinds of things.

Okay, it’s tabular data.

So we can go figure out maybe a time series analysis of, are these things getting bigger or smaller?

Is there anything you can make out of that?

You know, if I’m doing audio, we could probably pull some of the Whisper pieces. But that was a thought I had.

I’m not quite sure that’s the best way to do it. Just for discussion: how would you all approach it if you were trying to build a starter kit?

You know, we’ve already covered the data, we’ve covered the display of it, you know, just the general processing part from an artificial intelligence standpoint, where would you go with that?

What kind of libraries are useful for that?


If it’s image classification, what’s the hottest?

Is it easy to use?

Can we figure out what’s been done?

Right. Yeah. I’m just like, here are some libraries for different types of tasks.

What about in-house?

Have you seen those repos that are just labeled awesome?

Yeah. X, Y, Z?


It could just be a collection of them.

Yeah. I’m just like, hey, are you wanting to predict the weather? Here are some pretty good repos for that. If you want to do image classification, here are some pretty good repos for that. Time series data.


I see people in these hackathons struggle with this, when all you need is for someone to have queued up: here are the three things you need to read.


Yeah. About, like, Hugging Face’s image classifiers. Right.

The opposite approach that I was looking at was actually, here’s the top five things that you’re going to see on the news from AI.

You know, a lot of people start with the technology and then try to figure out what problems they can solve with it, which is a little opposite of a typical hackathon.

But then again, reading through the challenges that Hatch currently has, one of them screams ChatGPT or some other kind of language model.

It screams it loud enough that we’re actually going to do a ChatGPT intro. I think that’s next week. I think I’m going to make Matt do a lot of that work; he may or may not know it yet, so I might queue him up for that. But usually my mindset is more on problem solving than on the technology. Otherwise you wind up trying to make something do what it’s not meant to do. Sometimes that produces really cool stuff, and you come up with new things nobody has done before. Other times you just wind up frustrated, because the person who built the tool didn’t build it to do the thing you’re trying to do. And I can guarantee that if you post something on Stack Overflow, your first reply will be, well, you shouldn’t be doing it that way. Which is the typical reply I get.

Maybe that’s the difference between here’s the hotness and here’s the hotness. So if you can break it out and it’s like, well, what problems?


What’s the challenge?

Which of these does it fit?

Maybe it’s none of them. Right.

And you could even build it as a Streamlit app: pick your domain. Is it images? Is it tabular? Is it sound? Whatever.

Okay, now how much data do you have?

If I’ve got 10,000 lines in a column, I’m going basically straight machine learning.

I’m not throwing a big model at that really unless it’s images.

We can do some pretty interesting image classification with very small data sets, something along those lines.

I’m just trying to think of, like, the ontology or the classification breakdown starting from the problem.

I’m just trying to think of a way to understand how to use it. Anyway, it’s an opinion. What I’m looking for is an opinionated starter kit that says, okay, you’re jumping into a hackathon. I’d probably start with what kind of data or problem you’re trying to solve, and then go, okay, what size are you looking at? That may also lead to what kind of compute you have available, because we’re now moving past the GPT-2 level, which I can train-ish on a Colab instance.
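One way to encode that opinionated, choose-your-own-adventure starting point is a small lookup keyed on the kind of data, with extra hints based on data size and available compute. This is a hypothetical sketch: the categories, the 10,000-row threshold, and the wording of the advice are assumptions drawn from this discussion, not a finished starter kit.

```python
from typing import List

# Hypothetical first suggestion for each kind of data.
BASE_ADVICE = {
    "tabular": "classic machine learning with scikit-learn (SVM, trees, random forest)",
    "image": "fine-tune a pretrained image classifier; works even with small datasets",
    "text": "start from a pretrained Transformers model",
    "audio": "transcribe with Whisper, then treat the transcript as text",
}

SMALL_DATA_ROWS = 10_000  # assumed cut-off for going "straight machine learning"


def suggest(data_type: str, n_rows: int, has_gpu: bool = False) -> List[str]:
    """Return an ordered list of starting-point suggestions for a hackathon team."""
    if data_type not in BASE_ADVICE:
        raise ValueError(f"unknown data type {data_type!r}; expected one of {sorted(BASE_ADVICE)}")
    steps = [BASE_ADVICE[data_type]]
    if data_type == "tabular" and n_rows <= SMALL_DATA_ROWS:
        steps.append("at this size a large neural model is usually overkill")
    if data_type != "tabular" and not has_gpu:
        steps.append("no local GPU: try a free Colab instance or ask organizers about cloud credits")
    return steps
```

In a Streamlit version of the kit, a couple of selectboxes could feed data_type and has_gpu directly.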

I’ve actually been able to do that. If you jump up higher than that, you have to have some actual horsepower to train some of this stuff.

If it’s lower than that, running on your laptop is fine. So that might be another question: do you have AWS tokens available?

That might be a thing; sometimes these hackathons actually come with tokens you can use.

I don’t know if that might actually be something.

I don’t know if HudsonAlpha would know whether that’s available. There’s a local AWS rep somewhere you could go to and say, hey, we’re doing a hackathon, it’s open to the community, and we would like to provide compute resources.

Is there a way we could get a block of credits that we can give out?

You could also reach out to the Alabama Supercomputer Authority.

If you do, you’d have a token, a string key kind of thing, that you give out; each team would have either their own or maybe a shared one.

That provides compute resources for different things. But yeah, it gets tricky trying to find available GPU resources, because those are extremely scarce. I will also ask: there’s a local NVIDIA rep as well who may have some. Yes, not any longer, but they actually have their own clouds and stuff like that that you can use.

They don’t do singles much anymore. But now I also have a test.

Yeah, I’ve seen a server.

Oh yeah, I know you’ve got servers and storage for days. Right. It might be an interesting connection, and it also might get another part of HudsonAlpha involved. I know... is Chris still here? Chris King?

Okay, shoot.

I found out that a couple of years ago he was helping with this mentorship thing, but he was the one who initially set up your HPC instance and your crazy amount of storage. I was still working in kind of big data at the time, and I was complaining about the 200 gigabytes we get per run of a simulation for some military stuff. And he was like, we’ve got three petabytes of working storage, shut up. Okay, I need to...

Okay, good to know. I may see if I can track him down.

For other stuff. Brilliant guy. Anyway, I’ve got about four minutes to close out. Oh, I’ve got an AdaBoost classifier.

I’ve got naive Bayes, which sucks, but whatever.

Oh, okay.


The other one. This one was actually doing, and I can never pronounce the name right, latent Dirichlet allocation (LDA) analysis, if you’ve got a lot of text data.

This one, I actually pulled.

This year.

I think it’s.

Right. And I think the challenge is actually to take the... and this would be great for me, because some of the stuff I’m seeing in these papers is from a domain so different from what I currently work in.

It’s really hard to even talk about some of these words.

It’s about the same as if I took somebody who works in genomics, pulled them into a lab with me, and talked about how debris is affected by... you know. The domain is just so different. So I think one of the challenges is to automatically take a scientific article or something and rework it into something more approachable for the general public.

So anyway, for this one we actually pulled a Supreme Court confirmation hearing for, I can’t remember which justice it was.

There were so many nominations during the last administration. They had posted some, and I grabbed one and actually went through it.

It was the Amy Coney Barrett Senate confirmation hearing. I pulled the transcript for the whole day and then did topic analysis on it to figure out, in general, what the top 10 things were that they covered in a full day’s worth of hearing. I printed this out to see what’s way down here; I’m not even sure how far I got into this. But yeah, looking at things like the sparsity of the topics. And these topics were generated automatically, based just on a probabilistic model of words and how often the words appeared in certain contexts.

These days, I would actually drag a Transformers model into this and do it entirely differently, because we don’t do things the same way anymore.

I’d be pulling in a library called BERTopic, which is much better at this.

Mostly because, and I’m about out of time, this is all based on how often certain words appear.

It doesn’t take into account whether these words were near each other or not. And especially for a scientific paper, a term is usually spelled out once and abbreviated the rest of the time. So it might not even pick that up.

Right. But current models would actually know that a word was used in a given context, know to switch between the long form and the abbreviation, and even find the places where you used a word out of context, or say, hey, here’s a new term you just introduced that isn’t defined anywhere; you might want to add it to your acronym list at the end. There’s so much you can do now.

Anyway, yeah main topics. I have no idea where Heller came from.

Yes that is sorry at the bottom.

The whole topic for that.

Anyway, there are all kinds of different things we can do. So the initial thought is, I think we’re on the right track if I think of it from the standpoint of: what problem? What data do you have?

I can kind of go from there as far as that goes.

I plan on actually finishing out a lot of this before hatch actually launches.

And if I get it far enough, I might publish that out.

You can tell folks, hey, if you’re looking at AI... I know the first one is mostly the AI challenge.

But if you need at least a starting point, where you’ve got an API key or something, you can pull some data, and you’ve already got a Streamlit piece so you can publish stuff, that takes some of the initial sting out. You can actually focus on the problem and not on how to spin up a public web service the judges can get to. You know you’ve got to spend at least half of your time on the video; that’s always the thing that kills me. Anyway, that’s really what we’ve got. Looking at it from the context of data, the other thing I may put out a call for help on in the AI group is the actual documentation.

Before I built this (and I’m out of time, I keep talking, and I don’t know who else is coming in),

I actually did a Google search for hackathon starter kits, and I found a couple of starter kits on how to run a hackathon.

In case you need one. It was very interesting; it even covered the governance and how you select judges. It was really cool. Yes, this was how to run a hackathon.

I will.

Yeah, it was not what I was looking for. I think it was from either Google or one of the larger organizations that facilitate hackathons all over the place. Anyway, I forgot where I was going with that. That’s that.

Fun with AWS Fargate

I was finally able to get the Streamlit app for the transcription approach to work with AWS Fargate. It was a LOT more complicated than I initially expected. It all makes sense now that I know how it works, but putting all the pieces together was tricky. Hopefully I can get the rest of the app connected before the meetup and show a fully functional transcription application.


Ooh, recording. There we go. All right, so this is basically a quick-ish rundown of what it took to get my Streamlit application hosted on AWS Fargate. Let me share my screen and I can walk through some of this fun stuff. Entire screen. Yes, this one. All right, so what I have is a Streamlit application, which is fairly simple. Wow, that’s too big.

It goes through and sets up an S3 bucket and a folder to push stuff to, some logo stuff, and a page config to get a favicon and the title, stuff like that. Simple image, file upload, grabs the details. If it’s audio or video, it takes that. I’ll have to change this afterwards because it’s plain text right now. Anyway, you get the message. So then it grabs the file extension and file name, does some hashing with a timestamp, and shoves it into an S3 bucket. Actually, before we push it to the S3 bucket, we’re setting up a queue with Simple Queue Service (SQS), also from Amazon, which I’m not really talking about tonight, but it might play into it.
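A sketch of the upload-naming step, assuming the app hashes the uploaded bytes and prefixes a UTC timestamp. The talk only says it does "some hashing" with a timestamp, so the exact scheme, the uploads/ prefix, and the make_object_key name are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePath
from typing import Optional


def make_object_key(
    filename: str, data: bytes, prefix: str = "uploads", now: Optional[datetime] = None
) -> str:
    """Build a collision-resistant S3 object key for an uploaded file (hypothetical scheme)."""
    now = now or datetime.now(timezone.utc)  # the trailing "Z" below assumes UTC
    extension = PurePath(filename).suffix.lower()  # keep ".mp4", ".wav", ...
    digest = hashlib.sha256(data).hexdigest()[:12]  # short content hash
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}-{digest}{extension}"


key = make_object_key("Meetup.MP4", b"example bytes")
print(key)
```

With boto3, the resulting key would then go to s3_client.put_object(Bucket=..., Key=key, Body=data), using whatever bucket the app is configured with.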

Then it puts another button up there that says, hey, do you want to transcribe this? If they say yes, the next thing we do is push it to the S3 bucket, which triggers a Lambda we’ve talked about before. That grabs the file and does a transcription, and while it’s transcribing it pushes messages back through this message queue. So we’re just printing out what’s going on to the screen, and when it either completes or fails, we finish up. Just a quick thing; shouldn’t be that hard. I’ve got this in a pretty simple Dockerfile: a Python 3 base image, port 8501 exposed, copying in the Streamlit config (I’ve got some config stuff to automatically go to dark theme and some other things). Then just a pip install --upgrade, install Streamlit, and then the requirements.
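The progress loop can be sketched without AWS by using Python’s queue.Queue as a stand-in for the SQS queue. The (status, detail) message shape and the COMPLETED/FAILED status names are assumptions for illustration; with real SQS you would poll with boto3’s receive_message and delete each message after handling it, and in Streamlit the show callback could be st.write.

```python
import queue
from typing import Callable, Tuple

TERMINAL_STATES = {"COMPLETED", "FAILED"}  # hypothetical status names


def follow_progress(
    messages: "queue.Queue[Tuple[str, str]]",
    show: Callable[[str], None] = print,
    timeout: float = 5.0,
) -> str:
    """Display progress messages until the job finishes, fails, or goes quiet."""
    while True:
        try:
            status, detail = messages.get(timeout=timeout)
        except queue.Empty:
            return "TIMEOUT"  # nothing from the worker within `timeout` seconds
        show(f"{status}: {detail}")
        if status in TERMINAL_STATES:
            return status
```

The timeout is a design choice: without it, a Lambda that dies before reporting COMPLETED or FAILED would leave the page waiting forever.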

I don’t think installing Streamlit and installing the requirements have to be two separate steps, but I was running into issues without it. The entry point is streamlit run app.py. Fairly simple. So you build all of that. I’ve got some notes in here for how to actually... it’s not this particular one... that’s not it... there you go. The containers you run on Fargate have to be pulled from a registry, and here that’s Amazon Elastic Container Registry (ECR). So you initially run this command to log in, piping that to a docker login, all that stuff. Then you tag your image, and then you can push it to the container registry.
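The log-in, tag, and push steps follow ECR’s fixed naming pattern, account-id.dkr.ecr.region.amazonaws.com/repository:tag. This helper only assembles those command strings; the account ID, region, and repository names in the example are made up, and it assumes AWS CLI v2, which provides aws ecr get-login-password.

```python
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """Registry host for a private ECR registry."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(
    account_id: str, region: str, local_image: str, repository: str, tag: str = "latest"
) -> List[str]:
    """Shell commands to log Docker in to ECR, tag a local image, and push it."""
    registry = ecr_registry(account_id, region)
    image_uri = f"{registry}/{repository}:{tag}"
    return [
        # The temporary password from the AWS CLI is piped straight into docker login.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker tag {local_image} {image_uri}",
        f"docker push {image_uri}",
    ]


for command in ecr_push_commands(
    "123456789012", "us-east-1", "streamlit-app:latest", "huntsville-ai", "streamlit"
):
    print(command)
```

The image URI in the final push command is also the value that goes into a task definition’s image field.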

Whoops, didn’t mean to do that. AWS, fun stuff. So let me go back over to this window. Let me go find my... whoops, fun stuff. Let me log in again, which is all two-factor. Let me go bring that up. At first I thought the whole two-factor thing was going to be a pretty big hit as far as being able to do stuff, as in, hey, how long is it going to take to do this every time? And it’s really not that bad.

So let me jump over to Elastic Container Registry. I’ve got a private repo for Huntsville AI that has, what, 20 different images in it over time. I need to go remove some of these. The one tagged latest is actually the one that runs on Lambda for doing the actual transcription. The much smaller streamlit container is the Docker image running the Streamlit app that I built. So then the next thing is, okay, let’s get this into Fargate, run it, and throw a public load balancer in front of it, that kind of thing. Because it sounded easy.

As it turns out... I started tracking what all you need to do for that. So let’s start with the things you have to know. You’ve got to know Docker and containers. You’ve got to know AWS ECR, which is what we just showed. You’ve got to know how to set up a VPC, availability zones, subnets, CIDR (which kind of goes with the subnets but shows up in a couple of other places too), an internet gateway, route tables, security groups, load balancers, access control lists, endpoints, clusters, task definitions, services, tasks, target groups, IAM roles, and IAM permissions. And then after all of that, you too can have a container running with Fargate. Easy enough, right?

So I started actually walking through it, and one of the other interesting things is that the tutorials you follow enter at various points in this kind of setup. I’ll try to jump over to what I started graphing of what I wound up with. To use a load balancer, you’ve got to have two availability zones, which forces you into two subnets, and all that kind of stuff. So there are a couple of different ways to think about it. One is to start from the networking side: set up your VPC, your virtual private... whatever the C is in VPC, and work your way back. The other, which is what I did (and then tried to figure out what I broke), is to start from Fargate itself and go forward.

So starting off (and of course, these are all the different services I’ve had to jump through recently trying to get all this stuff set up), I’m looking for Fargate. It might be under Elastic Container Service. I think it is. Yep. So the first thing you do with Fargate is create this thing called a cluster, and it’ll ask you which VPC you want, or it’ll create one by default. I wound up creating one using whatever default it had, and that didn’t work, so I wound up having to do some other stuff. But a cluster is basically a collection of services or tasks that you want this thing to run. You can create either services or tasks, which is kind of interesting and confusing at the same time.

To start off with, you go to task definitions, and you have to create a task definition. So we can just create one of these for the heck of it and see; we’ll call this new definition. You can name it or whatever, and you’ve got to enter your image URI. For some reason this doesn’t let you do a lookup, whereas a lot of the other places, like Lambda, give you a piece where you can look it up by clicking through. Here I actually have to go find my image tag and copy my image URI. I think that’s right. Yes. Then I go back over here and paste that in. Port mappings: I want 8501 in this case, because it’s a Streamlit container. You can add more or less; I don’t really care about all the environment stuff. Whoops, I have to give a name for my task definition.

The other thing I ran into: I had a Streamlit task definition and it broke, so I removed it and created a new one, and then it complained that this name was already in use or something. Currently, for some of these items, it takes like an hour to clear the cache so you can reuse a name, which was a little tough. For the app environment, we’re going to go with Fargate for serverless. Yeah, that’s fine. I basically rolled these back as low as I could possibly go: I want it to use a quarter of a vCPU, and then I can drop that down to 0.5 GB of memory. Then the task roles. I don’t want this yet; we’ll get into some of this later, because I think this is the part that’s still broken for me. I don’t really want a storage volume, because I get 20 GB for free or something like that. I’m turning on logging; this is something I need to go figure out, whether it’s actually costing me money and where that’s coming from. Anyway, you hit next. So, container 1, blah, blah, blah. I think that’s right.
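The console choices above (Fargate, 0.25 vCPU, 0.5 GB of memory, port 8501) map onto a task definition like the one below. This is a sketch written as the keyword arguments boto3’s ECS register_task_definition accepts; the family name, container name, and image URI are hypothetical, and only the two smallest Fargate CPU sizes are covered by the validity check.

```python
from typing import Any, Dict, Optional

# Fargate accepts only specific CPU/memory pairs; the two smallest CPU sizes are shown.
FARGATE_MEMORY_MIB = {
    "256": {"512", "1024", "2048"},           # 0.25 vCPU
    "512": {"1024", "2048", "3072", "4096"},  # 0.5 vCPU
}


def fargate_task_definition(
    family: str,
    image_uri: str,
    cpu: str = "256",
    memory: str = "512",
    container_port: int = 8501,
    execution_role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build register_task_definition arguments for a single Streamlit container."""
    if memory not in FARGATE_MEMORY_MIB.get(cpu, set()):
        raise ValueError(f"cpu={cpu} with memory={memory} is not a pair this sketch allows")
    definition: Dict[str, Any] = {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # the only network mode Fargate supports
        "cpu": cpu,
        "memory": memory,
        "containerDefinitions": [
            {
                "name": "streamlit",
                "image": image_uri,
                "essential": True,
                "portMappings": [{"containerPort": container_port, "protocol": "tcp"}],
            }
        ],
    }
    if execution_role_arn:
        # The execution role lets Fargate pull a private ECR image and send logs.
        definition["executionRoleArn"] = execution_role_arn
    return definition
```

Usage would look like boto3.client("ecs").register_task_definition(**fargate_task_definition("streamlit-demo", image_uri)), with image_uri being the ECR URI copied from the registry.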
Oh, now it’s being created. All right. Fun times. So while that’s off running: it says it’s active. I don’t know if I trust that or not. If I go here, well, it looks like it’s active. So I’ve got a new task definition. The next thing you wind up doing is going to the cluster you’ve got. I have a task definition; now I need something to actually spin up and run using that task definition. Hold on one second, let me see what this message was. All right. So you have tasks and you have services. The thing that’s weird is that if you run a service, it creates a task for you, which makes you wonder why we weren’t just running a task. The main difference is the service: like this one I’ve got for the Streamlit service (let’s see if I can go there), when you run the service, you can tell it how many tasks you want running at any time, and you can set a minimum level and a maximum level. Depending on resource usage, it will go ahead and create new tasks for you. And apparently I just exposed that thing. So anyway, you can go to Configuration, Tasks. Right now it only shows me one task running. Yesterday it showed me about the last 10 that had failed, and it would try about every eight minutes to spin up a new task, because I didn’t have one actively running and I’d told it I need at least one. So then under Services, if I wanted to deploy a new service in the transcribe cluster: I want a service. Family. I wish they had used the words task definition here, because that’s actually what this is. A service name. Desired tasks, deployment options, load balancing. We’re going to leave that off for now. I think you’re supposed to be able to use an existing load balancer.
I don’t want this one. Target group. That’s the tricky thing; I wasn’t able to get this to work, so I’m not going to try it now. This load balancer already has a target group associated with it, so I don’t know why it’s offering to create a new target group. In this case, I’m just going to say none. Networking: this is very important. When I went through, it automatically created my, oh, virtual private cloud, that’s what VPC stands for. So you pick your VPC. Initially it had created four different subnets, and I added them all in. But then when I actually set up my load balancer and some other security groups, I had only picked two out of the four. The tricky thing is that the service will spin up the task and associate it with one of these subnets. At one point I was all messed up because sometimes it would pick a subnet and it would work, because that particular subnet was routed correctly. Then something would happen, it would fail, and when it came up again it couldn’t create the container because it was putting it in a subnet that didn’t have a route to something. So that’s where you’ve got to know about subnets. Security groups are very important too; we’re going to use a security group, and we’ll get into that in a minute. And of course I’m going to use a load balancer to actually access this, so I do not want a public IP for this. A lot of the tutorials you walk through will just throw a public IP on it, and then you’re good to go. That’s super simple, and I would consider doing it for some kind of quick-and-dirty demo app that’s going to live for maybe a day or two, just to show somebody that something’s working. But long term, you definitely don’t want that. So I can hit deploy, and it says, hey, the deployment might take a few minutes. Some of these screens automatically update; some of them don’t. Now I’ve got my new service name.
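A sketch of the service settings walked through above, written as the keyword arguments boto3’s ECS create_service and Application Auto Scaling register_scalable_target accept. Cluster, service, subnet, and security-group IDs are hypothetical, and a real service behind the load balancer would also pass loadBalancers (target group ARN, container name, container port), which is left out here. The unrouted_subnets helper is not an AWS feature; it is an assumed sanity check for the failure described above, where a task lands in a subnet that was never wired up alongside the load balancer.

```python
from typing import Any, Dict, List, Sequence


def service_request(
    cluster: str,
    service: str,
    task_definition: str,
    subnets: Sequence[str],
    security_groups: Sequence[str],
    desired_count: int = 1,
) -> Dict[str, Any]:
    """Arguments for ecs.create_service running a Fargate task with no public IP."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,  # the console's "Family" field
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                # Behind a load balancer the task needs no public IP; it then needs
                # a NAT gateway or VPC endpoints to reach ECR and S3.
                "assignPublicIp": "DISABLED",
            }
        },
    }


def scaling_target_request(cluster: str, service: str, min_tasks: int, max_tasks: int) -> Dict[str, Any]:
    """Arguments for application-autoscaling register_scalable_target (min/max tasks)."""
    if not 0 <= min_tasks <= max_tasks:
        raise ValueError("need 0 <= min_tasks <= max_tasks")
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def unrouted_subnets(service_subnets: Sequence[str], configured_subnets: Sequence[str]) -> List[str]:
    """Subnets a task could be placed in that were not set up alongside the load balancer."""
    return sorted(set(service_subnets) - set(configured_subnets))
```

The dicts would be passed as boto3.client("ecs").create_service(**req) and boto3.client("application-autoscaling").register_scalable_target(**target).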
I can go in here and see. I really don’t have anything yet. So, Configuration and Tasks. It says, okay, I’ve got this task and it’s provisioning. If I click on it, I can see it’s already got a private IP address, a task definition, a subnet ID, and an ENI ID, which is basically the elastic network interface that connects this container at that particular IP address on the VPC. The one thing that got really hard to troubleshoot: when the task fails and doesn’t create the container, it goes ahead and removes that network interface but still has a link to it in this definition, as if it were still there. So that’s tricky. So we’ll wait on that to come up and run and see if we can add that as a link. Let me jump back over. Any questions so far on the funness of this stuff? Yeah, the load balancing stuff just looks much more complicated than I expected. Yeah. So let’s drop through. Is this new service running? It looks like we’re up and going. Good to go. You can’t get to it yet, because that IP isn’t available; we’ll hit that in a second. So let me drop over to, I don’t remember what windows I have open, but... The VPC is basically your second home. You wind up with two windows always open: one looking at your ECS stuff, which has clusters, task definitions, and all, and the other your virtual private cloud. This is where your subnets live, all of your VPCs. The other good thing is you can filter this whole thing, because I have two VPCs. Every new AWS account automatically gets a default VPC, so that’s kind of fun and distracting and confusing. So we’re going to pick this guy. I was hoping this part over here would refresh resources. There we go. Well, it still shows me two VPCs, even though I’ve filtered by this. So anyway, if I look through here.
Okay, so I filtered. I’ve got my VPC, and if you click on it, it shows you all of the different things associated with it. Another interesting thing: on some of these, AWS will offer to automate it, kind of like what we were seeing with the load balancer. It says, hey, do you want us to create one of these for you? The first time I went through, I said sure. But some of the options they give you don’t work, or they work for one thing, then you change something and now it doesn’t work. So it was important not to do that, and instead start, figure out what I was doing, and then kind of walk backwards. I think this is the third or fourth VPC I’ve created trying to troubleshoot all of this stuff. The ones that were created automatically would wind up with a routing table per subnet, multiple subnets, and a bunch of stuff that didn’t matter. In one case, it created four subnets, two for each availability zone, one named private and another named public, even though they were actually both private. It got really, really weird. So anyway, this routing table is important. There we go. We have a couple of routes. You have to have this IGW, which is the internet gateway, with a route that sends 0.0.0.0/0, basically all IP addresses, through it. I used to know this CIDR stuff, but I forgot it a long time ago. If you don’t have an internet gateway, you don’t have a connection to the internet, so your load balancer is never going to see anything. That was fun. Local is basically 10.0.0.0/16; a smaller number after the slash actually means a wider range of addresses. We used to do subnets with class A, class B, class C; apparently that’s not how you do it anymore.
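Python’s standard ipaddress module makes the CIDR arithmetic concrete. This sketch assumes the layout described here: a 10.0.0.0/16 VPC carved into /20 subnets, plus a two-route table (local and an internet gateway), where igw-example is a placeholder ID. Route tables pick the most specific matching route, which is why the local route wins for addresses inside the VPC even though 0.0.0.0/0 matches everything.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")  # /16 leaves 16 host bits: 65,536 addresses
subnets = list(vpc.subnets(new_prefix=20))[:2]  # the first two /20 blocks

for subnet in subnets:
    # A /20 leaves 12 host bits, i.e. 4,096 addresses per subnet.
    print(subnet, "->", subnet[0], "to", subnet[-1], f"({subnet.num_addresses} addresses)")

ROUTES = {
    vpc: "local",
    ipaddress.ip_network("0.0.0.0/0"): "igw-example",  # placeholder gateway ID
}


def route_for(address: str) -> str:
    """Pick the route a packet to `address` would use (longest prefix wins)."""
    ip = ipaddress.ip_address(address)
    matches = [network for network in ROUTES if ip in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[best]
```

The second /20 runs from 10.0.16.0 to 10.0.31.255. AWS also reserves five addresses in every subnet, so each /20 has 4,091 usable addresses.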
And I just know that the /16 means basically any address in that highlighted section is available on this VPC. So, good thing to know. Subnet associations: we don’t have anything explicit; these just pick up automatically. Then there are the two subnets in my VPC. The first starts at 10.0.0.0, and the /20 means it basically goes up to 10.0.15.255. The second one picks up from there and goes up to a bigger number. I’m not going to do that math in my head; something like 29-something, .255. That’s fun. I don’t have anything set up for edge associations or route propagation. Subnets and internet gateways I think we’ve already covered. The other really important thing was security groups. And ACLs: if we go ahead and look at the ACL, it’s fairly boring. Inbound rules basically allow everything; outbound rules, yeah, allow everything. There’s all kinds of information on how to make this more secure; I haven’t gotten there yet. The other thing is you also have to have the ACL associated with your subnets. So that’s kind of interesting. Now, security groups. If you go through the Streamlit tutorials you can find on how to run with Fargate, they’ll teach you how to open a security group and then open up port 8501 on it, because that rule has to be added or else everything is on port 80. And of course that works until you add a load balancer, and then you’re all messed up. So in this case, I’ve got two security groups: one for the ALB, the application load balancer (I guess it could have had a better name), and the service one, which is for all of the containers we’ve got running. If I look at the security group for the containers, let’s see, inbound rules. There we go. I want all traffic as long as it’s internal.
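A small model of how the two groups described here interact may help. Security groups contain only allow rules, and a rule’s source can be either an address range or another security group. The sketch below is a simplified, hypothetical model in plain Python, not an AWS API. The group names alb-sg and service-sg, the sample addresses, and treating the “internal” rule as a self-referencing rule (source = the service group itself, which is how AWS’s default security group is set up) are all assumptions. It models inbound rules only; real security groups also have outbound rules and are stateful.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class InboundRule:
    port: Optional[int]                 # None means "all ports"
    cidr: Optional[str] = None          # source address range, e.g. "0.0.0.0/0"
    source_group: Optional[str] = None  # source security group name


@dataclass(frozen=True)
class Request:
    src_ip: str
    src_groups: FrozenSet[str]  # groups attached to the sender's network interface
    port: int


def allowed(rules: List[InboundRule], request: Request) -> bool:
    """Security groups hold allow rules only: a request passes if any rule matches."""
    for rule in rules:
        if rule.port is not None and rule.port != request.port:
            continue
        if rule.cidr is not None:
            if ipaddress.ip_address(request.src_ip) in ipaddress.ip_network(rule.cidr):
                return True
        if rule.source_group is not None and rule.source_group in request.src_groups:
            return True
    return False


# Hypothetical groups modeled on the setup described in the talk.
ALB_SG = [InboundRule(port=80, cidr="0.0.0.0/0")]  # port 80 from anywhere
SERVICE_SG = [
    InboundRule(port=None, source_group="service-sg"),  # self-reference: "internal"
    InboundRule(port=None, source_group="alb-sg"),      # whatever the ALB forwards
]

browser = Request("203.0.113.10", frozenset(), 80)
alb_node = Request("10.0.1.20", frozenset({"alb-sg"}), 8501)

print(allowed(ALB_SG, browser))                                          # True
print(allowed(SERVICE_SG, Request("203.0.113.10", frozenset(), 8501)))  # False
print(allowed(SERVICE_SG, alb_node))                                     # True
```

The point of referencing a group instead of an address range is that the containers accept traffic from the load balancer’s nodes no matter which private IPs those nodes end up with.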
So on the containers’ group, if traffic is coming from within the subnet, that’s fine. Otherwise, allow all traffic from this other security group. That was something I’d never run into before: one security group basically routing traffic to another security group, and it was a little tricky to get set up right. This was also the group where initially I had everything public and my containers worked fine. Then, after I moved to a load balancer, I added this security group, and I had actually removed this line. Unfortunately, that also meant the service that spins up the container did not have access back out to S3 or the container registry or anything else, and that was really, really tricky to troubleshoot. So, I’m pulling in from this security group. If I go back to my list and go here, this one shows I’ve only got port 80 available, basically from the internet. Nothing except port 80. As soon as I get a certificate loaded on there, I’ll set up a rule somewhere to reroute 80 over to 443 and then only open the secure port. So that’s security groups. The other thing: I think load balancers are on this list somewhere, let me see. Nope. How do you get to your load balancer? Let me see. This was routing tables. Oh, is that back over here? No. I was thinking it was over here somewhere. Let’s look. All right, EC2. Yes. Okay. So this is fun, and again, it gets really interesting. This is why I started working through this piece to actually draw it out, because as you’ll see as we go through these links, a lot of these things connect to a lot of other things, and they show up on multiple windows. It’s like they were trying to figure out how to organize this.
And then they went and drank a fifth of Jack Daniel’s and decided stuff. So a lot of it makes no sense. Like, I’ve got security groups here that we just walked through, but those were under VPC, where I’ve got subnets, routing tables, and security groups. And here I’ve also got security groups. So that’s fun. So anyway, on to the load balancer. I decided I wanted to create one because I read a cool article on how it works. You create one and get it set up so it’s listening on port 80. You can also decide which availability zones and subnets you want it to listen on. The other interesting thing: as I mentioned, initially I had a public and a private subnet created for each of these availability zones. When you create a load balancer, you pick an availability zone, and it was giving me both subnets, but I could only pick one. So I wound up in a weird spot where my load balancer was listening on subnets one and two, but I had all four in my service definition. If it happened to create the container in subnet one or two, I was fine. If it created the container in subnet three or four... and now I’m lost trying to figure out what happened. So anyway, I wound up throwing away that VPC, creating one from scratch, and adding only the bare minimum I had to have, which I should have done to start with. But, you know, this is why it’s called learning. The load balancer has to be associated with a security group, which is what we just walked through. A lot of fun stuff enabled. And then you get to the point where your load balancer has what are called listeners. You can create a listener on one port at a time and then basically route that to, and this is kind of interesting, let me see if it’ll show up, a target group. And view and edit rules.
Let’s see what this looks like. Yeah, this is pretty interesting. Not really. When you set up a listener, you can basically do things like forward or redirect. So I believe after I get done with the HTTPS side of this, I can create a listener on 80 and use this piece to redirect over to the listener for 443. So that was fun. What else am I missing here? Oh, target groups. I was expecting to hit the load balancer and have the settings for how it balances on the load balancer itself, considering the name. But that work is actually done in a target group, which is kind of interesting. Let’s go back a little bit and start creating a new one from scratch, because this shows you some things that got tricky for Fargate. This selection has to be IP addresses, because otherwise it just doesn’t work. Target group name... well, that’s all messed up. The other thing I messed up initially: it’s connected to the listener on port 80, so I figured I’d leave this at 80. Actually, this needs to be the port that you’re routing to. That threw me for a while. You have to pick your VPC; we’re going to pick the one I set up for transcribe. I haven’t messed with any of the protocol versions. The health check is something it does automatically. Then you click Next. And the other fun thing: at first I was trying to do a bunch of stuff here, because this is where it put me, and it took a while to figure out that it puts you halfway down the page and you have to scroll up to actually do anything, which is a little weird. So here you can add the IP addresses of the containers that you have running, though I’m still not quite sure how this connects if you have containers automatically getting spun up.
How do they know how to attach to the existing load balancer? That’s something I’ve still got to work out. So anyway, ports, adding targets, things like that. We’re going to cancel this and just go look at the other one. Wait, did I have to do something different to cancel? “Are you sure?” So this lets you cancel your cancel. That’s how bad some of this UI is. No, to cancel, I have to leave. So at this point, what I want to do is go back over to my cluster, up here, the transcribe cluster. I’ve got these services running. I want to grab my new service, which has this container running at this IP address, so I’ll copy that. I’m going over here to my target group. I’ve got this one already set up with a set of registered targets; these are the ones I had killed or that failed. So I can add a new target here: that IP address, and it automatically fills out the port. Next, you have to click Include as pending below. Then you have to scroll down and hit Register pending targets, because just because it’s on this list doesn’t mean it’s actually registered. So that was weird. So I get that set up, and at some point this will come back and say healthy, because it’s doing a health check on it. I’m trying to find it somewhere on here. Actions. I don’t want to delete. I may have to wait. Okay, so I’ve got two healthy. There’s a place on here where you can see the load balancing algorithm; right now it’s round robin. This is where you can make some types of changes. I’m looking for the setting where you can say I want 10% of the traffic to go to target one and the rest to go to the other targets. I think I’ve seen somewhere that you can do that, but that’s a little beyond this one.
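The target group above uses round robin, and the talk mentions wanting a 10% / 90% split. As a rough illustration of the difference, here is plain round robin next to a smooth weighted round robin (the interleaving scheme nginx uses for weighted upstreams) in Python. This is a standalone sketch, not how the AWS load balancer is implemented, and the target names are hypothetical. On the AWS side, I believe percentage splits are configured as weights on a listener’s forward action across multiple target groups rather than per target, so treat that as something to confirm in the Elastic Load Balancing documentation.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List


def round_robin(targets: List[str]) -> Iterator[str]:
    """Plain round robin: hand requests to targets in a fixed rotation."""
    return cycle(targets)


def smooth_weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield targets in proportion to their integer weights.

    Each step, every target earns its weight in "credit"; the target with the
    most credit is picked and pays back the total weight. This interleaves
    picks instead of sending long bursts to one target.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)  # ties go to the first name
        current[chosen] -= total
        yield chosen


print(list(islice(round_robin(["container-a", "container-b"]), 4)))
# ['container-a', 'container-b', 'container-a', 'container-b']

picks = list(islice(smooth_weighted_round_robin({"target-1": 1, "target-2": 9}), 10))
print(picks.count("target-1"), picks.count("target-2"))  # 1 9
```

With weights 1 and 9, every block of 10 picks sends exactly one request to target-1, which is the 10% split described in the talk.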
So at this point, if you go back to the load balancer, wherever that was, it should be on the screen. I have a server set up at that DNS name that’s routing traffic to two different containers. That got me a little excited, because it was a large, large pain to get all of this set up when all I want to do is host a container. A couple hours in, I’m wondering why I didn’t use Heroku. A few more hours in, I’m wondering whether I should just pay Streamlit for their service. Another day in, and I’m wondering, shoot, should I have just looked at how Hugging Face does their Spaces, or how they host stuff. So anyway, a major pain, but it does give you absolute control over what’s going on. What I’m looking at next is getting a certificate for HTTPS. There are a couple of different ways to do that; the way I’m probably going to do it is to create a record on the HSV AI DNS, so it can validate pretty quickly. Then I’d basically set up that Streamlit instance at something like a transcribe subdomain on huntsville.ai and reroute that to the Streamlit container. So that is pretty much it. What time is it? 6:50, that’s about right. Any questions so far? Then I’ll kill the recording and cover anything else. Any thoughts? I mean, it was a heck of a lot more than what I thought I was getting into. Yeah, I thought Traefik was complicated to get set up to do a lot of those things, but it’s significantly easier than using AWS’s stack of tools. I could see AWS being useful if you need to scale across all their clouds and scale up thousands of units really simply and easily. But man, that’s a lot of hoops to jump through. Everything you’re looking to do, you can do with Traefik, and you could do it in about 20 minutes.
If you just have some remote machine. The other thing I thought about: Amazon has CloudFormation, which is basically a giant YAML file or whatever, where you just spec what you want. We didn’t even get into permissions, shoot. Anyway, more for next time. With everything I’ve got, what I wish I could do is say, okay, I have this current setup in a VPC with all of this stuff; can you tell me what CloudFormation file would have generated it? You know, go backwards. Because something Matt Brooks had mentioned is the problem with this whole thing: if something gets screwed up, or there’s a hiccup and it gets lost somewhere, now I’ve got to go back through all of these button clicks, and it’s not versioned. I can’t just say, oh, I totally goofed that up, git reset --hard, please. Yeah, you’re going to need YAML configs and something like Terraform to drive recreating it. Yeah, and I’ve seen that; Terraform actually has a way to go from there, and you can build this out with Terraform. Maybe the problem is it doesn’t really have anything to do with ML, so I’m kind of like, well, yeah, that’s great, but that sounds like somebody else’s meetup. Yeah, yeah, I try to rope myself in, and that’s one reason why I don’t deal with a lot of hosting on Amazon services. I just want to get a demo site up and hosted. It’s easier for me to create a VM on AWS, spin up the containers on that machine, and throw the Traefik reverse proxy in front of it. Just get a static IP address, point the domain name at it, and Traefik will handle getting certificates and routing to the services. Yeah, I’ve been able to quickly spin up more services with Docker Compose files. I don’t have to dig through AWS menus just to create something.
Right, we may want to... well, I don’t want to walk through that; maybe we will. But yeah, I probably would have gone that route if I’d known how hard all of this was going to be. The hard part is all the troubleshooting, because it’s all in weird log files. You can’t access this IP and you’re like, well, crap, why not? And then you have to go figure out, between two subnets, two security groups, an access control list, a routing table, and whatnot, why you can’t get there from here. But anyway, that being said, let me show this on the video while we’ve got it. So here’s my whole little thing set up on Amazon. I click Browse files and go find some kind of test WAV file. It’s going to complain at me in a minute, because I don’t have the permissions to actually load this into S3 from here. And here’s my little password that I shared earlier. So it should be able to upload to S3. Actually, let me stop the video and continue in a minute, because we’ll cover some of this next time. All right, if I can figure out how to stop... not stop video, stop recording. Yeah, there it is. Yeah.

MLOps with AWS Lambda

At this meetup we walked through an exercise to try and put the speech to text application into a docker container and attach it to an AWS Lambda function that runs automatically when we add a video or audio to an S3 bucket.
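As a rough idea of the entry point for that kind of function, here is a minimal sketch assuming the standard Python Lambda handler signature and the usual S3 event notification layout (a list of Records, each with s3.bucket.name and s3.object.key). It only collects which uploaded objects should be transcribed. The extension list, the job format, and the bucket and key names in the test event are hypothetical, and the speech-to-text step itself is left as a comment rather than guessed at.

```python
from urllib.parse import unquote_plus

# Hypothetical set of file extensions treated as audio/video input.
MEDIA_EXTENSIONS = (".wav", ".mp3", ".mp4", ".m4a")


def lambda_handler(event, context):
    """Collect the S3 objects that triggered this invocation.

    S3 notifications can carry several records, and object keys arrive
    URL-encoded (spaces become '+'), so each key is decoded before use.
    """
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(MEDIA_EXTENSIONS):
            # A real function would download the object here and run the
            # speech-to-text model; this sketch only reports the job.
            jobs.append({"bucket": bucket, "key": key})
    return {"jobs": jobs}
```

Keeping the handler this thin makes it easy to test locally with a hand-built event dictionary before packaging it into the container image.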

In addition to the speech to text model, I’ve been working with an additional model that reduces background noise and music.

Speech to Text with Hugging Face Wav2Vec

This meetup covered an approach for using Hugging Face transformers to convert audio files to text transcriptions. We also went through an approach using DeepSpeech, but did not see any reasonable improvement in the transcript.
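Wav2Vec2 checkpoints fine-tuned for speech recognition (loaded in transformers as Wav2Vec2ForCTC) score every vocabulary entry for each short audio frame, and the transcript comes from CTC decoding of those per-frame predictions; in practice the processor’s batch_decode handles this step. To show what that step does, here is a minimal greedy CTC decoder in plain Python. The tiny vocabulary and the frame IDs are made up for illustration. Using index 0 as the blank token and "|" as the word separator mirrors common wav2vec2 English vocabularies, but check the vocabulary of the checkpoint you actually use.

```python
from itertools import groupby

# Toy vocabulary for illustration only: index 0 is the CTC blank
# (wav2vec2 CTC heads typically use the <pad> token for this) and
# "|" marks word boundaries.
VOCAB = ["<pad>", "|", "A", "H", "I", "L"]


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank_id=0):
    """Turn per-frame argmax IDs into text with greedy CTC decoding.

    1. Collapse runs of the same ID (one character held across frames).
    2. Drop the blank, which is what separates genuinely repeated letters.
    3. Map the word delimiter "|" to a space.
    """
    collapsed = [token for token, _ in groupby(frame_ids)]
    chars = [vocab[token] for token in collapsed if token != blank_id]
    return "".join(chars).replace("|", " ").strip()


frames = [3, 3, 4, 0, 1, 1, 3, 2, 2, 0, 5, 5, 0, 5]
print(greedy_ctc_decode(frames))  # HI HALL
```

Note the blank between the two runs of 5: without it, the frames 5, 5, 5 would collapse to a single "L".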

Fine-tuning DistilGPT2 and Generating Text

We have some really exciting stuff to cover this week! Even folks that aren’t into the in-depth side of AI should be interested to see how well this can be used to create text from a simple prompt. We will cover the process used to train this model using a collection of SBIR topics, using freely available resources – along with challenges encountered when using this approach.

If you need a refresher on Hugging Face, here’s a video of the session we did back in March – Intro to Hugging Face

We will finish off the night with a demonstration of generating the full text of SBIR topics.


HuggingFace DistilGPT2
DistilBERT Paper

GitHub Runners (Part 1)

This week we will attempt to configure and connect a remote runner using Amazon AWS and connect it to GitHub to perform actions on the Product Recommendation repository. Nothing like a live demo to get the heart pumping!

We will go through the latest updates for reporting on datasets for the repo and discuss the need and options for runners.

Related links:
Product Recommendation repo
GitHub Runners
Security for Runners?

Community Support with Little Orange Fish

This week we will have Daniel Adamek, Executive Director of Little Orange Fish, present their “Here for You” initiative. We will brainstorm AI related ideas that may help them accomplish their goals. This will be an IN-PERSON meetup that we will also attempt to stream and record.

Little Orange Fish Concept:

A strong community is built of healthy individuals, and good health starts with a healthy mind.

Here for You:

To provide awareness and understanding of the public and private resources that support the health and safety of our community.

Premise: We have a system of services, public and private, that are in place to support the health and safety of our community.

Problem: Lack of understanding about the utility, value, availability and access to these services is resulting in their inefficient, uncoordinated and ineffective use in support of personal and community health.

Solution: A resource that provides users easy access to information related to the availability of health care, public health and public safety services, what these services provide, and how to access them.

Benefits: Reducing the barriers to access these services will improve the health and safety outcomes for the people in our community. Systemic issues can be more easily identified and characterized, thereby improving coordination between providers. This will also provide greater clarity for advocates and policy makers to inform the implementation of more effective laws, policies and budgets that affect these service providers.

Related links:
Little Orange Fish – https://littleorangefish.org/

Intro to Hugging Face

This week we will begin a series that covers Hugging Face and transformers provided through their community-driven approach. This will be one of the first times that we’ve covered Hugging Face, so please take some time to read ahead and help drive the discussion.

Also, we’re trying to plan ahead a bit better, so that you know what’s coming up. Check out the events link below to see the meetups on the schedule.

Related links:
Hugging Face – https://huggingface.co/
Huntsville AI March Schedule – https://hsv.ai/events