AI Hackathon Starter Kit

Transcript generated by https://transcribe.hsv.ai

So one of the other things we do as part of Huntsville AI is make sure that what we talk about, especially in our weekly meetups, is as public as it can be. We really don’t like people coming and giving us proprietary talks and then hiding the information behind something that we can’t share, because that doesn’t really help build a community. One of the things we talked about a while back (it wasn’t the first meetup, it was the first meetup we did here recently) was a kind of deep dive into how AI is used in genomics. The AI papers I ran into are publicly available on arXiv and other places. The genomics side is in the journal Nature and other places that I can’t get to.

So it’s real easy to see where there’s a barrier there.

So one of the thoughts that we had, and we did this also last year, I was looking for starter kits for hackathons.

I came across one that we actually did last year, Hatch 2022, where we went through and created a Streamlit demo: if you wanted to use Streamlit, this is kind of how you would do it. This time around, I tried to make this a little more generic than just HudsonAlpha or genomics or things like that.

So my thought was to start off, and I didn’t get too far into this because of life and other stuff. I did try really hard. So what we’ll talk about tonight is the kinds of things that you would want or expect in some kind of a starter kit from a kind of vanilla AI perspective. What I started off thinking about was more the technical side: if you were going to do a hackathon, the thing we run into most is, I need a way to use an API to get to a data source. If you go do the NASA Space Apps Challenge, they always want you to use NASA data. That’s one of their driving needs.

So you get an API key, and then you make calls to get the data.
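As a rough illustration of the "get a key, then make calls" pattern, here is a minimal Python sketch that builds an authenticated request using only the standard library. The endpoint URL, the `X-Api-Key` header name, and the `DATA_API_KEY` environment variable are all made up for the example; a real data provider documents its own URL and where the key goes.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/v1/observations"


def build_request(api_key, **params):
    """Build (but do not send) a GET request that carries the API key."""
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    # Assumption: this service takes the key in a header. Others want it
    # as a query parameter instead, so check the provider's documentation.
    return Request(url, headers={"X-Api-Key": api_key})


# Read the key from the environment so it never gets committed to the repo.
api_key = os.environ.get("DATA_API_KEY", "demo-key")
request = build_request(api_key, start="2023-01-01", limit=10)
print(request.full_url)
# To actually fetch the data: urllib.request.urlopen(request), then json.load().
```

Keeping the key in an environment variable means each team member can use their own key without editing the code.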

The other thing you wind up needing in most hackathons is a way to demonstrate the project.

And generally, having been a mentor on several hackathons, you wind up with teams of people who spend half of the hackathon trying to figure out how to create a web server.

I remember one of the first Space Apps Challenges that I helped with. There was one team, I think Ella was on the team, that was a group of Girl Scouts. They built their entire solution using WordPress and did a better job. Everybody else was sitting here freaking out: how do I run Apache? How do I do this? How do I do that?

And they’re like, hey, here’s a thing.

I can make stuff on it.

And they’re up and going, and they actually built something that somebody could go click through and actually see and use. And it was fun and kind of cool. So look at the hackathon as: I need to solve a problem, and I need to show this in a way that somebody can consume it.

It doesn’t need to be this one-off programming-language thing that only runs on your laptop. So the key that we’re looking at there would, that’s interesting.

Okay.

So for those on the recording, the lights just dimmed and went back up. So apparently closing time is soon. Drink with you. Yeah, almost over. Are you doing a streamlit talk coming up as part of this? Okay.

When I get this stuff together, I’m most likely going to link to any information that you’ve got or that you’re going to do. Or if I get this thing in place over the weekend, you might actually use this as a jumping-off point if you wanted to.

Yeah, you want to get it together? Yeah, at least I could hand it off. Because then that would just fit together even better. So Streamlit is an easy-to-use, Python-based web server kind of a thing.

I don’t even know if you’d call it a server.

It’s a package that contains a bunch of stuff that basically does React, or React-y stuff, from Python.

Yeah, it has a front end built in. So you can make a streamlit app super duper quick.

You don’t have to touch the HTML, you don’t have to touch JavaScript. You work right in line with your Python and just tell it st-dot-something. Right. You know, st.header is this, st.slider gives you a slider.

Do you want a text field or whatever?

Yeah, it’s super sweet.
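To make the "st.header, st.slider, text field" description concrete, here is a minimal sketch of what such an app might look like. The labels and the `greeting` helper are invented for illustration; `st.header`, `st.text_input`, `st.slider`, and `st.write` are the Streamlit calls being shown. The Streamlit import sits inside `main()` so the helper can be checked without Streamlit installed.

```python
def greeting(name, excitement):
    """Pure helper so the display logic can be checked without a browser."""
    return f"Hello, {name or 'hackathon team'}" + "!" * excitement


def main():
    import streamlit as st  # third-party: pip install streamlit

    st.header("Hackathon demo")                    # page heading
    name = st.text_input("Team name")              # text field
    excitement = st.slider("Excitement", 0, 5, 1)  # slider: min, max, default
    st.write(greeting(name, excitement))           # re-rendered on every change


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("Streamlit is not installed; run: pip install streamlit")
```

Saved as `app.py`, this would be started with `streamlit run app.py`. There is no HTML or JavaScript anywhere in the file.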

And you can do some pretty interesting stuff with it if you want. I mean, one of the things. I’ve tried to plan out the hosted, the one that, the date denying packages. Right. There’s certain packages like TensorFlow.

Yeah, they’re hosted ones, yes.

But even stuff like, I mean, this is all built straight out of streamlit.

Transcription service that we built for video to transcribe into text, things like that. No. Yeah.

No.

So, right.

So that might actually be something we need to look at from an AI perspective hackathon is what packages.

The Littlest JupyterHub was kind of my base for everything, and then bolting on just a handful of extensions for The Littlest JupyterHub.

Right.

Once you can run on the VM, you can run a container, you can run locally, you can run on your own server, you can deploy to the cloud, all the good things. And then you can basically run Streamlit on the JupyterHub, inside the secure SSL and the authentication of the JupyterHub.

Did that explain that?

Yes. I’m going to have to talk on it first. Yeah. So anyway, if you were wanting to see what the actual code for a Streamlit app would look like, you’ve got streamlit, which you typically import as st.

You can pull in pandas.

We’re using Altair. I don’t know how to pronounce that. Altar? Altair. I’m Southern, so it’s hard.

How do you say Altair? Altair. You can pull some data and then, you know, do a streamlit multi-select, things like that.
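A sketch of the multiselect pattern described here, under some loud assumptions: the data is a small made-up list of dictionaries rather than a pandas DataFrame pulled from a real source, and the Altair chart is left out to keep the example short. `st.multiselect`, `st.error`, and `st.table` are real Streamlit calls; everything else is hypothetical.

```python
# Hypothetical stand-in data; a real app would load this from a data source.
ROWS = [
    {"region": "North", "year": 2020, "value": 10},
    {"region": "North", "year": 2021, "value": 12},
    {"region": "South", "year": 2020, "value": 7},
    {"region": "South", "year": 2021, "value": 9},
    {"region": "West", "year": 2020, "value": 4},
    {"region": "West", "year": 2021, "value": 6},
]


def rows_for(regions, rows=ROWS):
    """Keep only the rows whose region the user selected."""
    wanted = set(regions)
    return [row for row in rows if row["region"] in wanted]


def main():
    import streamlit as st  # third-party: pip install streamlit

    all_regions = sorted({row["region"] for row in ROWS})
    chosen = st.multiselect("Choose regions", all_regions, default=["North"])
    if not chosen:
        # Shown in place of the table when the selection is empty.
        st.error("Please select at least one region.")
    else:
        st.table(rows_for(chosen))


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("Streamlit is not installed; run: pip install streamlit")
```

Keeping the filtering in a plain function makes it easy to test the data logic separately from the page layout.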

The interesting thing that you can do with this, and this is typically the way I was leaning, before you start running into issues where you can’t get the right packages.

If this is a straight, something I can run with either scikit-learn or whatever, let me sign in and see if I can actually sign in with GitHub, maybe.

I don’t remember which one I did.

We’ll find out.

Yeah, let’s try that one, maybe.

I’m going to see if I can actually do this live.

Okay, new app. PlenApp repo.

I want this one.

Yeah. See if I can copy and paste into this thing.

Oh, it lets me pick.

Or paste GitHub URL.

What branch are we on?

Interesting.

Your TV still works, though. That’s the weirdest part. Maybe some years sitting on the… Yeah, I’m okay. Why is there someone out there? I wonder if it actually is just a road. I bet people stopped moving. And it turns off by itself.

Honestly, I’ve never had a meeting here about 630, so… Actually, I was about to say I’ve been here like on the weekend and I’ve passed that time before in this room.

It doesn’t have lights.

And I’ve never been able to figure it out. Maybe it just automatically shut down. That’s pretty interesting. It doesn’t want executives to executive around you. Right. That’s pretty funny.

I’m just wondering if you can put it in the professor’s room. All right. Yep. And you URL. Yep. I won’t say it. Did it have an error or something?

I couldn’t see because it was behind this thing. Please switch works. The top right. Nice.

Hey.

Even new.

That’s interesting.

Now let’s see if this is actually.

I don’t know if it will expand or not.

Main.

That guy.

This will be interesting.

I haven’t touched that repo in a year now.

Since we did this talk last time. I think it’s going to be a great time to talk about it. Yeah. Streamlit’s changed. So who knows whether this is going to work or not.

But it was fairly vanilla. It did pull a data set from AWS and then do some things with it.

So as long as that data set is still there, and I wasn’t using any weird libraries that I know of.

So the whole point is.

If you can get your code into a repo, you can get Streamlit to host it for you.

So you don’t have to spin up a web server.

You don’t have to do all this kind of stuff. It’s even smart enough to know if eight of us are working together in a hackathon.

Jonathan makes a change.

He pushes it up to the same branch.

The web server knows that it got changed and will refresh with your changes.

So from zero to website in five minutes, or however long this thing takes to come up.

There it is. So let me go back to another.

And one of the other things we’ve been playing around with, from a Huntsville AI perspective and some others: we’re trying to make sure that AI and doing some development is available for anybody that wants to come play. You wind up especially working with high schoolers and various families and stuff.

Everybody doesn’t have a laptop. You know, everybody doesn’t have the PowerBooks and all of the, what’s the current thing now.

I mean, anyway, so let’s play around and I can open up GitHub.

This is the thing that got applause at Bob Jones.

I’m looking at the GitHub thing.

I think I’m logged in to GitHub. Yep.

And I press my dot key.

Period key.

The dot.

Yeah, the key, the period key.

Which spins up a development environment in my own browser. So if I go grab the DataFrame demo.

Let me find something really easy to see: multiselect, choose from the list, region.

And some of this I don’t want to get too far into the.

Maybe I changed this to two countries. Save that.

And then I’m going to go over here and say.

Oh, look out. Nice.

Yeah.

Right. So I pushed that change to move this over to two countries. That’s better. Two countries. And then if I go back over to where Streamlit is hosted.

Somewhere up here.

I’ve got a. What rerun settings.

I don’t know what manage app does.

I have to go make sure I’ve got this set up right before. I did actually commit that change right. That would be yes I did. We’re on main. There’s really nothing to pull and push because you’re live. And it usually picks it up as soon as you push it. I know.

So I probably have something wrong.

I mean, go to dark theme. Yeah. And if you’re used to doing data analysis in Python, you know, mostly pandas kind of stuff, Streamlit understands your native pandas stuff. You can create a pandas DataFrame and say, here, show this, and it gives you an actual HTML table that’s nice to look at.

You know, I don’t know if it provides, hey, sorting. You know what I mean, it’s pretty interesting. So it’s interactive, you know, as it updates.

So super easy to use easy to get into as.

Yeah, don’t.

The thing is, there’s a lot available from their own demo kit and things like that. So a lot of it is: go find something that looks like what you want, copy it, paste it, and then put your data in. You know, you can put in a plot. You can throw. What is a dash.

And it will know how to render it.

Right.

Most of the text that you put on there can use Markdown, which, if you’re familiar with much GitHub stuff, you know, we use Markdown a lot. Anyway, there’s your plug for Streamlit that Jonathan’s going to talk a lot about.

Is it next week or week after?

Okay, week after next.

Yeah, for me as well.

Oh, I changed the.

Yeah, I changed the wrong thing.

So instead of choose from list. What I changed was the error code.

So if I kill everything out of here, I should get something that tells me to choose two countries.

Yep, at least two countries.

Yeah, I changed the wrong thing. Yes, user error, which is in general nobody. Come on, you got to correct my code before I push it.

Zoom. This one.

Oh, okay.

Yeah.

Yeah, we never push directly to main either. I’ve had a person tell me that that was the way you should work, and I made sure I don’t work on her team again. Anyway, if you had really, really, really good developers that are talking to each other all the time, every day, maybe. So I work with not a chance on the what.

Right.

Yeah. So that was the thought initially, as far as putting something in that shows how to do this. I was going to grab the piece that we did for our submission to NASA Space Apps last time that showed how to grab stuff from their API and how to store it in a data file, things like that. And I started thinking about that in here, but then I started thinking, okay, well, that’s good from just a general standpoint. How do you get data? How do you show data?
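For the "store it in a data file" half of that idea, a minimal sketch might look like the following. The JSON body is a made-up stand-in for an API response (it is not real NASA data), and the file name is arbitrary; only the standard library is used.

```python
import csv
import json
import os
import tempfile

# Made-up response body; a real one would come back from the API call.
RESPONSE_BODY = (
    '[{"name": "Alpha", "mass_g": 120.5, "year": 1999},'
    ' {"name": "Beta", "mass_g": 33.0, "year": 2004}]'
)


def save_records(records, path):
    """Write a list of flat dicts to a CSV file and return the row count."""
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)


def load_records(path):
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


records = json.loads(RESPONSE_BODY)
csv_path = os.path.join(tempfile.gettempdir(), "hackathon_records.csv")
saved = save_records(records, csv_path)
print(f"Saved {saved} rows to {csv_path}")
```

Caching the API response in a local file like this also means the team is not blocked if the API is slow or rate limited during the event.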

How do you, how do you do some things?

Doesn’t have anything to do with AI.

You know, so I started thinking about the AI stuff, and I started off pulling from some things that we have already done in the past with Huntsville AI. One of these was just a general one. This is basically stolen from, I can’t remember which hackathon we supported here, an intro to Python, intro to Jupyter notebooks and stuff. This might be a good thing if you’ve got a beginner-level person trying to start from nothing. We see that a good bit, especially on the high school side.

Unless you’re at like Bob Jones hackathon where I say okay who here knows how to write programs in Python.

And all the hands go up.

Like, I didn’t know computer science was a thing when I was in high school.

Okay, who can do C++?

And most of the hands stay up.

Yes. It was while I was talking to the computer science club. I just was not expecting that. JavaScript was like everybody, C++ was most, and anyway, I was way out of my league there. This was not useful for them; they could teach this to me. However, they were very impressed by the period key. I still remember that, which I had only learned earlier that week, because I followed somebody on Twitter that did it. Yes. Anyway, this goes through some of your general, you know, pandas DataFrames, how to do some stuff.

Still not really getting into much from an AI standpoint. So then the other thing I started looking at.

I think the first time I started doing things like this was for the HudsonAlpha Tech Challenge. This was like two years ago; we did this as part of the intro.

This might be something useful, mostly from a classification standpoint. You have a particular, and I’m really light on the actual science of some of this, molecule with 166 different features that determine whether this molecule is musk or not musk. Some of these features can interact with each other, and some of them don’t mean anything at all.

You know, things like that.

So we actually built a classifier based on this, you know, data set.

And again, lots and lots of features, you know, different values that are measured.

So this might be useful for a hackathon starter kit for somebody that hasn’t done classification, especially for a low volume of data.

You know, throw it into scikit-learn, build a classifier.

In this particular notebook we actually walk through it. Let’s try a support vector machine.

That’s great.

Here’s my confusion matrix.

It’s pretty good.

You know, 95% accuracy things like that.

All right, now let’s drop in and do a decision tree classifier.

Yeah, I’m up in 96.9.

Now let’s try random forest.

You know, these are all different things available in scikit-learn, with the intro into how to do this.
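A sketch of that walk-through, with several assumptions stated up front: the data is synthetic (generated with 166 features to mirror the description, not the actual molecule data set), the models use default settings, and the accuracies it prints will not match the numbers quoted in the talk. The small `accuracy_from_confusion` helper shows how an accuracy figure falls out of a confusion matrix.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def compare_classifiers(seed=0):
    """Train three scikit-learn classifiers on synthetic 166-feature data."""
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in data: most of the 166 features carry no signal.
    X, y = make_classification(
        n_samples=600, n_features=166, n_informative=20, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "support vector machine": SVC(),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "random forest": RandomForestClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        matrix = confusion_matrix(y_test, model.predict(X_test)).tolist()
        results[name] = accuracy_from_confusion(matrix)
    return results


if __name__ == "__main__":
    try:
        for name, accuracy in compare_classifiers().items():
            print(f"{name}: {accuracy:.3f}")
    except ImportError:
        print("scikit-learn is not installed; run: pip install scikit-learn")
```

Because every scikit-learn classifier shares the same `fit` and `predict` interface, swapping one model for another is a one-line change, which is what makes this kind of side-by-side comparison cheap in a hackathon.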

And the thought I had was, depending on what hackathon, you’re probably either trying to do things with tabular data, or you may have a sequence of data, or images, or audio. So I started trying to think about it from, what kind of data am I trying to work with, and then going from which kind of data to which route you would want to take. So I’m thinking of this kind of starter kit almost as a choose-your-own-adventure. Okay, now I’m going to NASA, and this is one of the other parts I pulled: initially, they had something about all of the meteorites that had impacted the earth over the last 100 years, and a data set for that. We’re looking for different kinds of things.

Okay, it’s tabular data.

So we can go figure out maybe a time series analysis of, are these things getting bigger or smaller?

Is there anything you can make out of that?
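One very simple way to ask "are these getting bigger or smaller?" is to fit a straight line through size against year and look at the sign of the slope. The sketch below does that with a hand-written least-squares fit; the observations are made-up numbers, not real meteorite data.

```python
def trend_slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Made-up (year, mass in kg) values, for illustration only.
observations = [(1980, 2.0), (1990, 3.0), (2000, 4.0), (2010, 5.0)]
slope = trend_slope(observations)
direction = "bigger" if slope > 0 else "smaller or flat"
print(f"Slope: {slope:.2f} kg per year, so the trend is {direction}")
```

A positive slope suggests growth over time and a negative one suggests shrinkage. With real data you would also want to check how noisy the points are before reading much into it.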

You know, if I’m doing audio, we could probably pull some of the Whisper pieces. But that was a thought I had.

I’m not quite sure if that’s the best way to do it. I mean, just for discussion, how would you all approach it if you were trying to build a kind of a starter kit?

You know, we’ve already covered the data, we’ve covered the display of it, just the general processing part. From an artificial intelligence standpoint, where would you go with that?

What kind of libraries are useful for that task?

Okay.

If it’s image classification, what’s the hotness?

Is it easy to use?

Can we figure out what’s been done?

Right. Yeah. I’m just like kind of like, here’s some libraries for different types of in-house tasks.

What about in-house?

Have you seen these repos that are just labeled awesome?

Yeah. X, Y, Z?

Right.

It could just be a collection of them.

Yeah. I’m just like, hey, are you wanting to predict the weather? Here’s some pretty good repos on that. If you want to do image classification, here’s some pretty good repos for that. Time series data.

Yeah.

I see in these hackathons people curiously do the like, when all you need to do, if someone just queued up, here’s the three things that you need to read.

Right.

Yeah. About like, hugging faces, image class adventures. Right.

The opposite approach that I was looking at was actually, here’s the top five things that you’re going to see on the news from AI.

You know, a lot of people start with the technology and then try to figure out what problems they can solve with it, which is a little opposite of a typical hackathon.

But then again, reading through the challenges that Hatch currently has, one of them screams ChatGPT or some other kind of model.

It screams it loud enough that we’re actually going to do a ChatGPT intro. I think that’s next week. I think I’m going to make Matt do a lot of that work. He may or may not know it yet. So I might queue him up for that. But usually my mindset is more on the problem solving rather than the technology. Otherwise you wind up trying to make something do what it’s not meant to do. Sometimes that comes up with really cool stuff, and you come up with new things that nobody might have done before. Other times you just wind up getting frustrated, because the person that built the tool didn’t build it to do the thing you’re trying to do. And I can guarantee you’ll post something on Stack Overflow and your first reply will be, well, you shouldn’t be doing it that way. Which is the typical reply I get.

Maybe that’s the difference between here’s the hotness and here’s the hotness. So if you can break it out and it’s like, well, what problems?

Right.

What’s the challenge?

Which of these did it to?

Maybe it’s none of them. Right.

And you could even build it in like a Streamlit app, where you pick your domain: is it image, is it tabular, is it sound, is it whatever.

Okay, now how much data do you have?

If I’ve got 10,000 lines in a column, I’m going basically straight machine learning.

I’m not throwing a big model at that really unless it’s images.

We can do some pretty interesting image classification with very small data sets, you know, something along those lines.

I’m just trying to think of, like, the ontology or the classification breakdown from the problem.

I’m just trying to think of a way to sort of understand how to use it. Anyway, it’s an opinion. What I’m looking for is like an opinionated starter kit that says, okay, you’re jumping into a hackathon. I’d probably start with, what kind of data or problem are you trying to solve, and then go, okay, what kind of size are you looking at. That may also lead to what kind of compute you have available, because we are now moving on from the GPT-2 level, which I can train-ish on a Colab instance.

I’ve actually been able to do that. You jump up higher than that and you have to have some actual horsepower to be able to train some of this stuff.

If it’s lower than that, running on your laptop, it’s fine. You know, so that might be some kind of a question: do you have AWS tokens available?
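The "opinionated" breakdown discussed here (data type first, then data size, then compute) could be sketched as a small lookup function. The categories, the 10,000-row cutoff, and the wording of each suggestion are illustrative guesses drawn loosely from the discussion, not a settled recommendation.

```python
def suggest_starting_point(data_type, n_examples, has_gpu=False):
    """Map (data type, data size, compute) to a rough first thing to try."""
    if data_type == "tabular":
        if n_examples <= 10_000:
            return "classical machine learning, for example scikit-learn"
        return "classical machine learning first, then larger models if needed"
    if data_type == "image":
        # Pretrained models can work with fairly small image sets.
        return "fine-tune a pretrained image classifier"
    if data_type == "audio":
        return "start from a pretrained speech model such as Whisper"
    if data_type == "text":
        if has_gpu:
            return "fine-tune a small language model (roughly GPT-2 sized)"
        return "use a pretrained language model as-is"
    return "explore the data in a notebook before picking a tool"


cases = [("tabular", 10_000, False), ("image", 500, False), ("text", 50_000, True)]
for case in cases:
    print(case, "->", suggest_starting_point(*case))
```

The same function could sit behind a few Streamlit select boxes, so that a team answers three questions and gets pointed at a notebook to start from.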

That might be a thing sometimes these hackathons actually come with tokens that you can use.

I don’t know if that might actually be something.

I don’t know if HudsonAlpha would know whether that might be a thing that’s available. There’s a local AWS rep somewhere that you could go to and say, hey, we’re doing a hackathon, it’s open to the community, and we would like to provide compute resources.

Is there a way we could get a block? You may still be able to do it. We may be able to get a block, and can I give these out?

Or reach out to Alabama Supercomputer.

If you do, you’d have like a token, it’s a string, kind of a key thing. Each team would have either their own or maybe a shared one or something.

And that provides compute resources for different things. But yeah, it gets tricky trying to find available GPU resources, because those are extremely scarce. And I will ask; there’s a local Nvidia rep as well that may also have some. Yes, not longer, but they actually have their own clouds and stuff like that that you can use.

They don’t do singles much anymore. But now I also have a test.

Yeah, I’ve seen a server.

Oh yeah, I know you’ve got servers and storage for days. Right. It might be an interesting comment, and it also might get another part of HudsonAlpha involved. Is Chris still here? Chris King.

Okay, shoot.

Find out. He was, a couple of years ago, helping with this far mentorship thing, but he was the one that initially set up your HPC instance and your crazy amount of storage. Yeah, I was still working in kind of big data at the time, and I was complaining about the 200 gigabytes we get per run of a simulation for some military stuff. He was like, we’ve got three petabytes of working storage, shut up. Okay, I mean the corner. Okay, I need to.

Okay, good to know that I may see if I can track it now.

For other stuff. Brilliant guy. Anyway, I’ve got like four minutes to kind of close out. Oh, I’ve got an AdaBoost classifier.

I’ve got naive Bayes, which sucks, but whatever.

Oh, okay.

Yes.

The other one. This one was actually doing, I can never pronounce the name right, latent Dirichlet, Dirichlet analysis, if you’ve got a lot of text data.

This one, I actually pulled.

This year.

I think it’s.

Right. And I think the challenge is actually to take the, and this would be great for me because I’m seeing stuff in some of these papers that the domain is so different from what I currently work in.

It’s really hard to even talk about some of these words.

Oh, you know, about the same way that I wouldn’t take somebody that works in genomics and pull them into a lab with me and talk about how debris is affected by, you know, anyway. I mean, the domain is just so different. So I think one of the challenges is actually to be able to automatically take a scientific article or something and reform that into something that is more approachable by the general public.

So anyway, for this one we actually pulled a Supreme Court confirmation hearing. I can’t remember which justice it was.

There were so many nominations during the last administration. We just went and looked, because they had posted some, and I grabbed one. And so, actually going through it:

I got a copy of the Coney Barrett Senate confirmation. So I pulled the transcript for the whole day, and then did topic analysis on it to figure out, in general, what are the top 10 things they covered in a full day’s worth of hearing. I have printed this out to see what’s way down here. It did something; I’m not even sure how far I got into this. But yeah, looking at things like sparsity of the topics. And these were automatically generated just based on a probabilistic model of words and how often the words appeared in certain contexts.

These days, I would actually go drag a Transformers model into this and do it entirely different because we don’t do things the same way.

I’d be pulling in a model called BERTopic, which is much better at this than this is.

Mostly because this is all based on, and I am about out of time, but this is all based on how often certain words appear.

It doesn’t take into account whether these words were near each other or not near each other. And especially for, like, a scientific paper, usually it says a term once and it’s abbreviated the rest of the time. Right. So it might not even pick it up.
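To see why a purely count-based approach loses context, here is a tiny bag-of-words sketch. It is only the word-counting step that such topic models start from, not a full topic model, and the sentences and stop-word list are invented for the example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, just for this example.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "on"}


def bag_of_words(text):
    """Count the words in a text, ignoring case, punctuation, and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


first = bag_of_words("The dog chased the cat.")
second = bag_of_words("The cat chased the dog.")

print(first.most_common(3))
# The two sentences mean different things, but their word counts are identical.
print(first == second)
```

Once text has been reduced to counts like this, word order and nearby context are gone, which is the limitation that context-aware models are meant to address.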

Right. But current models would actually be able to know that that word was used in that context and be able to switch between them, and even find the places where you used a word out of context. Or, hey, I’ve got a new term that I just introduced that isn’t defined anywhere, so you might want to add that to your acronym list at the end. You know, I mean, there’s so much you can do now.

Anyway, yeah main topics. I have no idea where Heller came from.

Yes that is sorry at the bottom.

The whole topic for that.

Anyway, so there’s all kinds of different things we can do. So the thought initially is, I think we’re on the right track if I can think of it from: what problem, and what data do you have.

I can kind of go from there as far as that goes.

I plan on actually finishing out a lot of this before Hatch actually launches.

And if I get it far enough, I might publish that out.

You can tell folks hey if you want to if you’re looking at an AI I know the first one is mostly the AI challenge.

But if you need at least a starting point, where you’ve got an API key or something so you can pull some data, and you’ve already got a Streamlit piece so you can publish stuff, you know, that takes some of the initial sting out. I mean, you can actually focus on the problem and not on how do I spin up a public web service that the judges can get to. You know, you’ve got to spend at least half of your time on the video. That’s always the thing that kills me. Anyway, with that, that’s really what we’ve got. So, looking at it from a context of, you know, data, the other thing that I may put out a call for some help on in the AI group is the actual documentation.

Before I built this, and then I’m out of time, I keep talking, I don’t know who else is coming in.

I actually did a Google search for hackathon starter kits, and I found a couple of starter kits on how to run a hackathon.

In case you need one. It was very interesting. I mean, it even went into, like, the governance and how you select judges. It was really cool. I think it was either. Yes, this was how to run a hackathon.

I will.

Yeah, it was not what I was looking for. And I think it was either from Google or one of these larger organizations that actually facilitate hackathons all over the place, either somewhere code or other kind of things. But yeah, I forgot where I was going with that. Anyway, that’s that.