AI Challenges and Competitions

Transcription provided by Huntsville AI Transcribe

Alright, so what we’re going to talk about is the theme for tonight. We’ll cover some of the other pieces later, and we’ll probably stop recording before we jump into some of that. When we first started, the main challenge site out there for competitions, mostly machine learning and some other things, was Kaggle.

We still use that for some things. It’s a really good place to find open datasets.

What would happen is a company needed to somehow get a model built or something like that. They’ve got this data and they don’t know what to do with it.

So they go talk to Kaggle and basically say, okay, we’ve got $40,000 and I need this built. Kaggle would actually go do that through a competition. They put the data out and say, we need this done. Whoever scores the highest gets this amount.

Whoever scores second gets this amount.

At the end of the day, whoever actually did the work, actually built it, got some payment for it and got some exposure and all that, and the person that actually had the problem had their problem solved. So everybody got a little something out of it. That kind of spawned off some people who have now moved into things like fast.ai, Jeremy Howard and folks like that.

They found that the approach to solving these problems was generally to have a very big toolbox with a bunch of different models that you could apply: a decision tree, a linear regression, a boosted decision tree, you know, just all of these different kinds of things.

What they would do is basically almost like throwing spaghetti at the wall and seeing what sticks.

They’d take the dataset and iterate as fast as they could through these different approaches.

They’d figure out which one actually matched the data the best.

And then they would throw the other stuff away and select that one and just start working through that.
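To make that "big toolbox" loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the synthetic data, the three toy candidates (a mean predictor, a one-variable linear regression, and a one-split decision stump), and the helper names. It is not what any particular Kaggle team ran; it only shows the shape of "fit everything quickly, score on held-out data, keep the best one."

```python
import random
from statistics import fmean

def make_data(n=200, seed=0):
    """Synthetic (x, y) pairs: roughly linear with Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)) for x in xs]

def fit_mean(train):
    """Baseline: always predict the mean of the training targets."""
    mean_y = fmean(y for _, y in train)
    return lambda x: mean_y

def fit_linear(train):
    """Ordinary least squares for a single feature."""
    mx = fmean(x for x, _ in train)
    my = fmean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_stump(train):
    """One-split decision tree: choose the threshold with the lowest squared error."""
    pts = sorted(train)
    best = None
    for i in range(1, len(pts)):
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        lm, rm = fmean(left), fmean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, pts[i][0], lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x < threshold else rm

def mse(model, data):
    """Mean squared error of a fitted model on a dataset."""
    return fmean((model(x) - y) ** 2 for x, y in data)

def bake_off(candidates, train, valid):
    """Fit every candidate on train, score on valid, return the best name and all scores."""
    scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

data = make_data()
train, valid = data[:150], data[150:]
CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "stump": fit_stump}
best_name, scores = bake_off(CANDIDATES, train, valid)
print(best_name, scores)
```

The part that matters is `bake_off`: adding another approach is one more entry in `CANDIDATES`, which is why this style of iteration was so fast when the models were small.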

That whole approach worked really well for a lot of that. Since then, especially now that we’ve gone to larger and larger models, that’s not possible on just a very cheap GPU. In other words, I can’t run all of these things in a Colab session just to iterate through quickly.

So some of what we’re seeing now is that the teams winning a lot of these competitions are teams that have a significant amount of hardware at their disposal to use for training new models.

So last year we actually looked at one from AIcrowd, the sound separation challenge: here’s a bunch of music files.

And we want you to separate these into the drum track, the vocal track, the, you know, keyboard track, and then other. And so Ben and I actually started working that. I was using his server that’s got a couple of GPUs on it. We were able to turn out some pretty good things, but we weren’t keeping up in the slightest with some of the teams that actually won. The teams that won dropped papers on the models they used and all of that. They probably have a stack of, you know, basically your normal server that’s got like eight GPUs in it.

I mean, they are way ahead of anything we’re able to try.

So I’m seeing a lot of that type of thing going on as well.

Did you see the senators trying?

They’re a little bit under the bus. No, they were in their meeting, you know, trying to justify their existence of calling them. They’re a little bit dangerous technology. Oh, okay.

Yeah, I can see that.

You may touch on the Johnny Cash “Barbie Girl” one.

Right.

But it says there in the very beginning, you know, because they’re trying to sign, you know, is it better days at very nice.

Right.

And he modified the lyrics at the very beginning. It says, hi, I’m not Johnny.

Put a “not” in there. Along the same lines, there was some news that came out earlier this week about Facebook, mostly Facebook and Instagram. There are probably some others that will pick up on it, but they’re actually adding watermarks to anything that’s AI generated as far as imagery. If you’re going to post it on their systems and it’s AI generated, it has to have a watermark, and apparently they have some way to know. I think some of the companies that produce the models, like Midjourney and some of the others, have something in their images that these other systems can look at and say, oh, this is Midjourney. You know, of course, that’s not only good for the knowing. Would they be looking at metadata, like EXIF data?

It probably is EXIF data, but they may actually change that.

It may actually be something embedded in the image itself that’s difficult to notice.

I think what they currently do is only the images with the embedded text. Yeah. Yeah, something like that. So that’s probably coming soon. And of course there was actually a hearing earlier this week with Meta and all these folks, you know, Congress asking questions and stuff. You may actually see something. I’m not quite sure. The other thing is I’m pretty sure that they’re very interested because there are some elections coming up. And these tools for making fake videos and fake all that are just so easy. I mean, I can do it.

You know, it’s not even hard at this point.

You just download the tool, find enough footage for your source video, and put in the script that you want this person to say or whatnot. And it’s easily to the point where it passes what I call the scroll test.

You’re scrolling through something and it’s like, yeah, okay, okay. And it doesn’t jump out as being not right or not real. The test I used to have was whether my grandmother would believe it enough to just reshare it without actually even looking at it. You know, I mean, at face value it looks correct. All the references are right. It sounds okay, sure.

That’s probably good. But we’re past that.

So that might come out.

I had some good results from the image generator for the war in America’s cars.

Right. And I made an Instagram profile for it. And like the second or third image I uploaded went viral in that little group. Right. It’s been out for about three months and it’s got probably 50,000 views. Yeah. A couple different places just from higher traffic. Yeah. And I started watermarking my stuff. They would strip the watermark out and then try to pass it off as real. Like I’m not the one with the problem. Every post I make, I put it up there as AI generated.

Right. It’s some kind of tag that I purposely put in there.

Right.

So people know it’s fake.

Yes.

But I’m the bad guy for not trying to pass it off as real.

Yeah. Now, I mean, it’s not that they don’t want people to share stuff; it’s that they just want it to be truthfully attributed.

That’s right. And then again, you’ve got the sites like Midjourney and others that would love to have a big Midjourney watermark and all that other stuff going out, and have some reason to have it, because that drives other people who like the imagery back to Midjourney, because that’s how you create stuff like that. But back to the challenges I found. Some of these we’ve already known about.

One of them I actually didn’t. I thought I had up in this but I must. I must.

Oh, I already there’s another site I would jump to in a minute.

Actually, I’m going to go there now because this was a little bit different.

Let me go jump to it. I’m actually getting better as well at putting things back out there, you know, so that way we can actually go back and look. Yeah, ThinkOnward was the other. So, challenge.gov was a fun one.

I didn’t even realize this, but we come across some things occasionally where the Department of Energy is running a challenge or, you know, Homeland is running a challenge or there’s a cyber challenge or blah, blah. This site basically grabs all of the various challenges across different domains in government itself and stacks them up so that you can actually look through them. Most of the ones that we’d be interested in are under analytics and algorithms. And so you’ll see there are several that, you know, pop up.

It’s difficult because if you don’t track what’s coming up. You see things that you look at it and it’s like, oh, that looks cool. And you’re like, oh, they’re already in phase three.

You had to pass phase one and phase two to even get invited to phase three. So some of that’s a little rough. They put the stuff out there and say, can you solve this faster than a room full of engineers? Who ends up doing these things but a room full of engineers? Yes. Without having to have the business construct in front of it to tell them what they can work on.

And again, Kaggle is kind of your old go-to, but it’s still active.

There’s still challenges that they’re running.

Kaggle is another one where a lot of their stuff is kind of difficult to solve without some hardware. That’s another place that’s been interesting.

It kind of fell by the wayside because I don’t think they were able to have the hardware.

They’re not on the list anymore.

There was one site we’d go to at one point where you’d join a challenge.

But their piece was that you actually had to do the work on their hardware. They basically had a server set up, which I guess was because they wanted a level playing field, and that part did work out. The part that didn’t work out was that the hardware was terrible. And so it was not even a good experience trying to use the stuff they gave you to work with. This one was extremely odd and interesting for a few reasons. So numer.ai, or Numerai, I don’t know how to pronounce the thing.

Basically, they’ve got that number, $48 million, that has been paid to data scientists, which is somewhat... it’s actually true. The interesting thing is a lot of that money came from data scientists. When they say staked models, this one threw me at first. So what they do, and this might be interesting just to play with the data:

What they have done is taken stock market data and cleaned up some parts of it, making it a lot easier to use. And basically what you have to do is build a model to predict future stock prices. So if you win with your best model, they like it because they can use that model to go predict stock market movements or whatever. The trick here is that when you enter the competition and you have your model that you want to put into the competition, you also have to put money on your model. And then basically whoever wins the competition takes all the stakes. So in other words, it’s correct that they paid out $48 million.

They don’t tell you how much of that has been paid in. So they have to get better neural resources. Right. So and of course from their perspective, they probably take a cut just for running the competition of course.

And then they get the use of the model they want.
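The arithmetic behind "paid out" versus "paid in" is easy to sketch. The following is only a toy model of the winner-take-all description given in this talk, not the site's actual payout rule. The function name, the 5% organizer cut, and the participant names and amounts are all made up for illustration.

```python
def settle_round(stakes, scores, organizer_cut=0.05):
    """Winner-take-all settlement of a staked round (toy model).

    stakes: dict of participant -> amount staked on their model
    scores: dict of participant -> model score (higher is better)
    organizer_cut: hypothetical fraction of the pool kept by the organizer
    Returns (winner, fee, net), where net maps each participant to profit or loss.
    """
    pool = sum(stakes.values())
    fee = pool * organizer_cut
    winner = max(scores, key=scores.get)
    # Everyone loses their stake; the winner then receives the pool minus the fee.
    net = {name: -amount for name, amount in stakes.items()}
    net[winner] += pool - fee
    return winner, fee, net

stakes = {"alice": 100.0, "bob": 250.0, "carol": 50.0}
scores = {"alice": 0.61, "bob": 0.58, "carol": 0.49}
winner, fee, net = settle_round(stakes, scores)
print(winner, fee, net)
```

In this toy round the organizer can truthfully report 380 "paid out to data scientists" even though participants paid in 400, which is the point being made about the headline number.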

So great for these guys. I mean, it’s genius.

But yeah, it could be interesting to try to train something and see, because they have some pretty good material around how to download the data and how to predict, you know, just using something basic. You could actually see how well anything you do would compare against, you know, other models.

So, you know, I just kept it in the list from that perspective.

I would not advise anyone to actually do this.

Unless you have, what do you call money that you can burn without caring? Disposable income. I don’t know any of those people. If I did, I would be asking them to fund, you know, this particular Zoom call. I knew one guy with that amount of money and he was a doctor.

Might have been hard yet that much. DrivenData was a good one. The reason I like this one is, again, because of some of the social impact.

They actually reach out to other organizations. So other organizations that want to do something that feels like, you know, more of a social good kind of a thing don’t have to run it themselves.

They’ll use DrivenData to actually host the competition and things. I mean, I agree, NASA is probably trying to get to social good. Some of the others, you’d have a hard time believing, like that the World Bank is interested in that, you know, which they might be. But just at face value it’s like, why? You know, there might be some conflict-of-interest type things. I got this route that I walk in my neighborhood and I pulled it up on Google Maps to figure out how far I was going. It says, you’re doing 2.4 miles. Okay, cool.

Good exercise. I went back and I double checked and I just went through again just just for giggles, maybe six months ago. And it added a mile to it.

I thought Google Maps was a lot more accurate. It should be. It should be. But this one in particular, I mean, I’ve got a link of like four of them that I thought were pretty interesting.

This one seems like a combination of grabbing stuff out of clinical notes, like doctors’ notes, and actually linking it back to, you know, particular items in a terminology.

If you figured out how to do that, you might actually get some prize money. And if you also figured out that the same kind of thing could be used to apply the right codes that they have to enter when they do billing, which right now they have to pay people to go figure out, that might be an interesting, you know, kind of aspect of that as well. So the DrivenData one is more for your social impact kind of stuff.
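As a rough picture of what "linking notes back to a terminology" means, here is a minimal dictionary-lookup sketch. It is a naive baseline under stated assumptions, not the challenge's method: the tiny terminology table is invented, the "T-00x" identifiers are placeholders rather than real concept codes, and the sample note is made up. A competitive entry would need to handle abbreviations, misspellings, negation, and context.

```python
import re

# Hypothetical mini-terminology. The "T-00x" identifiers are placeholders,
# NOT real clinical concept codes.
TERMINOLOGY = {
    "T-001": ["heart attack", "myocardial infarction"],
    "T-002": ["high blood pressure", "hypertension"],
    "T-003": ["shortness of breath", "dyspnea"],
}

def build_index(terminology):
    """Map every lowercase surface form to its concept identifier."""
    return {name.lower(): cid for cid, names in terminology.items() for name in names}

def link_note(note, index):
    """Return (start, end, matched_text, concept_id) for each known term in the note."""
    # Longest terms first, so multi-word terms are preferred by the alternation.
    terms = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return [
        (m.start(), m.end(), m.group(0), index[m.group(0).lower()])
        for m in pattern.finditer(note)
    ]

note = ("Pt presents with shortness of breath. History of hypertension; "
        "prior myocardial infarction in 2019.")
links = link_note(note, build_index(TERMINOLOGY))
for start, end, text, concept in links:
    print(start, end, text, concept)
```

The output keeps character offsets alongside each concept, since span-level linking is usually scored on where the term was found as well as what it was linked to.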

AIcrowd. I didn’t mean to reuse the same window, but I just did. Some of these are actually in the middle of being run, you know, they’re already through their first phase or whatever. This one was really interesting. I think I’ve got this one also pasted somewhere. Instead of just doing image generation, they actually want to basically take a picture of an empty room. Yep.

First one.

Yeah, I think we got like 15 minutes for again, I can push the button.

So what they’re looking for isn’t just generating whatever, but basically taking a picture of an empty room and then a text prompt telling it what I want in this room, and then generating an image of the room with that particular kind of furniture and that particular kind of layout in it.

So that might be how hard it is. It might be kind of fun. And I don’t remember.

Okay, so that’s got a $15,000 prize on it.

That medical coding challenge looked kind of interesting.

The other. Yeah, I’ve done a little coding. Well, see, what they’re asking for isn’t necessarily medical coding.

But whatever you built to do that.

Would likely be able to be, you know, this is actually training. I’ve got these notes. And SNOMED CT is actually a list of clinical, you know, I mean, like the technical doctor words.

If you would. Okay. Well, ICD-9 codes.

ICD-9 codes, that would be something that you can train models on and actually get good outcomes.

Right.

But when they say somebody signs into the problem, you know, their final diagnosis and go soul often didn’t. It was, it was like a trail to follow. Get to that. Yeah, you know what they came in to be treated for wasn’t, you know, how it ended. Right. Yeah. And this kind of thing linking from one thing to another.

Like I said, you’re tracing through this.

You may have multiple visits. You may have, you know, a test that was run. Well, was that test linked to the reason they came in, or was it linked to the thing the doctor found? So the AIcrowd thing was interesting.

ML Contests is basically a clearinghouse of other challenge sites.

This is kind of a link site.

And this is actually a pretty good jumping-off point.

Because this is where I found out about ThinkOnward; I didn’t even know it was a thing. But it links back into others, like the DrivenData one was one that showed up.

You’ve got a couple from AIcrowd.

Zindi is mostly based on African problems.

There’s a good bit of work there. And it’s extremely interesting because of some of the things that they’re doing. You would think some of these problems are already solved, but they’re different.

It’s, it’s almost like the, it’s so different.

The solutions that we might have aren’t necessarily applicable to the, to the things they run into.

It could be because government policy is so different there, it could be that government overall is so different there, it could be, you know. But some of that has been fairly interesting. I looked through a couple of those, but none of them were anything close to something that we would be able to try to get into. Hope you missed the scum retrosions. Security. Yeah. And I’m not familiar with what independent is.

There was one called an X prize that was put out by, I don’t, I think it was a combination of IBM Watson and some others basically trying to figure out how to automatically patch security vulnerabilities in source code. It’s like a cyber challenge at the bottom too. Yeah. It was 18 million dollars. Yep. Wow. That might be, that might be the one that.

We should all get on this one and go in together. Yeah. And so the other thing you get into with some of these. Yeah. So that will be highly sought after. Yeah. And not just for the monetary reason, but if you place in the top 10 of the DEF CON edition kind of thing, that has a lot of meaning to it. You’ll never get off right on it again. Yes.

But along the same lines, if you’re looking at running after that, you’re going to be competing with people that have pretty heavy hardware available to them.

Yeah.

Probably stuff like these, these like this problem. And again, that’s, but again, figuring out where you fit in some of this and what’s interesting.

Some of it, sure, you should go after just because you’re interested in it.

And it’s a great way to learn because it comes with data. It generally comes with a starter kit, if you will. A lot of these do a “here’s a starting point”.

That’s at least 60% accurate.

That’s where you have push a button.

I don’t know if there’s a light switch that’s not actually a light switch.

That’s a button that doesn’t even look like a button. So you have to press the button. It doesn’t look like a button or light switch. I like that.

Yeah. For those listening to me chatter on in this recording later, the lights have just turned off on us, and you have to know the secret. Secret. Yeah.

The finding mining sites looks kind of interesting. I figure they’re probably using satellite photography. I’m not sure. Let me sponsor. Some of these are.

And again, this isn’t the best site because you got to go figure out how do I even look at the description.

Yes.

So you got it from here.

You actually have to go figure out how do I find it, and then, you know, you’ve got to dig a little deeper.

ThinkOnward was a pretty interesting one. This one seemed to have kind of higher bounties on some of their challenges, but then again, they’re only running a few, you know, at a time. Of course, a lot of them have different terminology for what they call things. Like this one, they’ll say bounty where other people say, well, here’s the prize. So this one, there’s some stuff I’ve seen on the DOS side that might be applicable to this. Let’s say you’ve done some LiDAR scans of something, but you’re missing a place in there. And this is, well, can you generate something that’s good enough to use for training data, or something to fill the void of this area?

In other words, if you had a map with a big hole in the middle of it, could you draw something in there that looks good, that would pass, again, the scroll test or whatever for, you know, okay, this looks right?
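For a sense of the simplest possible baseline for "a map with a big hole in it," here is a sketch that fills missing cells of a grid by repeatedly averaging their neighbours. This is plain smooth interpolation, swapped in for illustration; it is not a generative model and not what the challenge asks for. The grid, the `None` marker for missing cells, and the function name are all assumptions.

```python
def fill_void(grid, iterations=200):
    """Fill None cells by repeatedly averaging their 4-connected neighbours.

    grid: list of rows, where each cell is a number or None (missing).
    Known cells are never changed; only the hole is relaxed toward a smooth surface.
    """
    rows, cols = len(grid), len(grid[0])
    holes = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] is None]
    known = [v for row in grid for v in row if v is not None]
    start = sum(known) / len(known)  # initial guess for every missing cell
    filled = [[start if v is None else float(v) for v in row] for row in grid]
    for _ in range(iterations):
        for r, c in holes:
            neighbours = [
                filled[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            filled[r][c] = sum(neighbours) / len(neighbours)
    return filled

# Illustrative 5x5 "elevation map": a tilted plane (height = r + 2c)
# with a 3x3 hole punched in the middle.
grid = [
    [None if 1 <= r <= 3 and 1 <= c <= 3 else r + 2 * c for c in range(5)]
    for r in range(5)
]
filled = fill_void(grid)
for row in filled:
    print([round(v, 3) for v in row])
```

On a tilted plane this recovers the missing heights, because a plane is already as smooth as a surface can be. On real terrain the filled patch would look suspiciously flat, which is where the "generate something that looks right" part of the challenge comes in.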

And it would probably be a little bit more difficult to get into the area.

So the ones that I had found that I thought were pretty interesting.

This one.

Which it looked fairly easy to get started with.

And it’s something that we’ve done a good bit already, which is they’ve got some noisy data.

And they’re trying to figure out how to take this noisy data and get it cleaned or whatever into something I can use basically out of the box to train an AI model. Then they want you to actually train the AI model and then document what your approach was. So part of it is data cleaning, part of it is building a model, and part of it is writing a paper. Kind of. So that one was pretty interesting.

It’s, I’m not exact.

Plus, it seemed like they know what they want pretty well, which is always something to look for. We ran into the opposite in a different challenge we’ve done before, where the judging was a little too subjective.

Even though you got through this thing and you were scoring well, it was:

Oh, we like the score. We just don’t like the approach you took, or something.

It’s like, well, I don’t remember that being in the judging criteria. You know, so if you see something that’s wide open as far as how they’re going to score this thing, I would avoid it, because that means they could score however they want. And you really don’t have any recourse because it’s not actually written in any of the rules.

So that one was interesting.

The other one. This one I actually like. Plus it’s win part of a million dollars, so that was kind of cool.

And this is actually first forecast attempts. This is really don’t have much time left in this. This kind of thing has been going on for a while. I think there was a previous one, it wasn’t the SBIR, but there was something similar to that before. They’re trying to determine globally, through satellite or whatever, where illegal fishing might be happening. You can imagine boats on the ocean and looking for patterns and all this kind of stuff, because you’ve got issues where illegal fishing is actually depleting the food source for entire countries. You know, especially if you’re anywhere on an island in the Pacific or somewhere, it’s just what you do. So the Japanese were going after whales near Hawaii. Well, there’s an area where they’re protected, but you can hunt for scientific reasons. I like going in there.

Hunting, getting them, but for scientific reasons. There was one other place, somewhere in between, like India, you know, down in this particular region, where X number of miles offshore is still the purview of that country. But dead center in this place was actually international waters. It was like completely surrounded by water that was, whatever. So if you cross over here, now you’re wide open as far as the rules go. And it was just interesting that that happens, but I can see that being part of it, because like your planes have transponders, boats have transponders, but you can turn those off. Yes. You’ve also got patterns of expected travel for ships. If it’s actually shipping from point A to point B, you would expect it to at least be en route.

You know, or maybe moving around some storm that’s coming up, but that should be predictable.
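Those two signals, a transponder that goes quiet and a ship that is not where its route says it should be, can be sketched in a few lines. This is only a rough illustration under stated assumptions: the ping format, the 60-minute and 50 km thresholds, the function names, and the coordinates are all made up, and it is not the challenge's data format or scoring.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def transponder_gaps(pings, max_gap_minutes=60):
    """Return (last_seen, next_seen) time pairs where the silence exceeds the threshold.

    pings: list of (minutes_since_start, lat, lon), sorted by time.
    """
    return [
        (p[0], q[0])
        for p, q in zip(pings, pings[1:])
        if q[0] - p[0] > max_gap_minutes
    ]

def off_route(pings, origin, destination, slack_km=50.0):
    """Return ping times where going via the ping adds more than slack_km to the trip."""
    direct = haversine_km(origin, destination)
    flagged = []
    for t, lat, lon in pings:
        here = (lat, lon)
        detour = haversine_km(origin, here) + haversine_km(here, destination) - direct
        if detour > slack_km:
            flagged.append(t)
    return flagged

# Made-up track: a ship heading east along the equator that goes silent for
# four hours and reappears well north of its expected line.
ORIGIN, DESTINATION = (0.0, 0.0), (0.0, 10.0)
pings = [
    (0, 0.0, 0.0),
    (30, 0.0, 1.0),
    (60, 0.0, 2.0),
    (300, 3.0, 4.0),
    (330, 0.0, 5.0),
]
gaps = transponder_gaps(pings)
flagged = off_route(pings, ORIGIN, DESTINATION)
print(gaps, flagged)
```

Neither signal proves anything on its own, since storms and equipment failures produce the same patterns, which is why the challenge is about finding patterns across many vessels rather than applying a single rule.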

So anyway, that’s kind of what that challenge was about.

See if I can cover the next and then jump to it.

That was the, this one was the only generic design, but those we’ve already clicked. Just to walk through the other ones, this one was the, you know, doctors’ notes, linking back to other types of terminology.

And the last one was the challenge for the interior design. But just thinking through things like that, I will probably drop something in my internal calendar just to, you know, once a quarter go through the “here’s what might be coming up” if we care to put together a team or play around with it.

And I’ll try to get this thrown over to the CM chapter, maybe I’m going to end as well, because if you’re a group of college students and you can put together a team, a lot of this stuff applies right along with some of the classwork that you’re doing already.

And looks great on a resume.

Even if you don’t win, because you actually have something that’s public that has your name attached to it. So that’s cool. But that I’m going to close out at least the video part, or the recording part.