Transcription provided by Huntsville AI Transcribe
So, uh, as usual, welcome, uh, Huntsville AI six o’clock ish. We’ll get cracking.
Uh, we are recording this.
We’ll post it later in case you need to reference it or share it with somebody that could be here.
Oh, and it’s the computer logged into zoom doing that.
So I don’t, you know what I mean?
It’s wired directly.
So I don’t know what could be doing it. There was an Nvidia meetup over at Torch today.
Several of us were able to make it over there. It was pretty interesting meeting some of the folks that actually work directly for Nvidia and seeing what kind of stuff they’ve got. They’ve also got some ties back to federal support, which is kind of what they’re pushing here, so a lot of the stuff was DOD related, that kind of thing. Some of the notes I took: they’re trying to get a schedule set up, and they meet like twice a year, which is a little odd for me since I’d prefer more often, but hey, if that’s what they can do, that’s great. So the next one I’m going to look at is going to be in August; we’re keeping an eye out for when that is. This flickering is really bugging me. Cable, cable. I don’t know.
So those on the call, uh, is the screen flickering for you guys or not?
No, I mean, this is tied directly into the zoom call from here. Okay.
Not at all.
It might be the projector.
I bet it’s the arrangement of the cables.
Something doesn’t match or something like that.
Um, so anyway, they covered some chip architecture stuff. They’ve got some concept of a world foundation model. Apparently you have to put “foundation” in the name of your model now to be important or something.
I’m not quite sure where that started happening.
I think it was Stanford that came out with “foundation models.” Now everybody wants to say their model is a foundation model.
Um, if anybody knows of an actual standard of what makes something a foundation model versus not, uh, let me know.
Cause for me, it used to just mean it’s a big model.
Then I come across some that say they’re foundation models that might be on the 3 billion parameter size; I mean, there are some fairly small ones.
So I don’t know.
Apparently the one they’re looking at is a world foundation model that understands physics.
Cause what they’re doing, they’ve got this thing called Omniverse, and this was pretty interesting.
They’re using that to generate a simulated world with full-up graphics, ray tracing, light behaving the way you would expect, so that you could do a video fly-through of the simulated world versus a video fly-through from a drone, and it’s hard to tell the difference. And the reason they’re doing that is to generate synthetic data to then turn back around and train some kind of model that goes into a physical robot.
In the real world, you can’t just make the weather do weird stuff because you feel like it if you’re trying to train a model in the physical world. But if you do it in a simulated world and then transfer that over, that’s kind of the approach they’re taking. Some of the material from their Omniverse has been around for a minute.
They talked about it first a couple of years ago at the 2023 AI symposium here, and it was interesting. They had this concept of NIM. I didn’t know what NIM was.
NeMo I know is their kind of framework for models and stuff that you plug in.
NIM was a different thing.
It seems like a microservice kind of approach for AI models.
That might be something interesting for us at some point; I just wrote it down. And there’s some other company called VAST that did, I don’t know if that was a product pitch or what that was. I was confused, wondering why I was spending 15 more minutes on something that’s definitely not Nvidia, but they did sponsor lunch, so I got a sandwich, and okay, that’s fine. So there was that. I will say though, to your point, the VAST side looked very useful for DOD applications because of their security posture around it. And they mentioned that Meta, Mistral, and Microsoft for the Phi models are using VAST as kind of their data foundation for training all of those.
So, um, it definitely would, I think it will be interesting to look to see how that evolves over the next few years from like a security, like a DOD perspective, but with it being a kind of fully functional data pipeline in that perspective, uh, it, it, there definitely seems to be some value there in my opinion.
I have to look at it a little more.
It might be similar to the Red Hat situation: if I can attach it to Red Hat and slide it in, oh, it’s already approved.
It’s already whatever, you know, we trust it, even when maybe they shouldn’t trust it.
Um, I could see that being the case where somebody is running that Vast kind of a framework or, or set up. You know, hey, I’ve got a model.
You can plug it into this thing.
It’s already approved. You know, that might be a way in for some things. They did mention they had an ATO.
They’re very proud that they got an ATO, an Authority to Operate, at somewhere in like three days. I’m like, okay, great. That’s faster than anyone I know has ever done it.
I don’t know the constraints or what they ran through for that.
Anything else from the Torch thing that you can think that we covered, talked about?
Cause I think Jeffrey was there as well. So we got connections to their email and stuff. I’m trying to get somebody from Nvidia to actually come here and talk to us.
Especially if I can find somebody that can talk to us about some of the stuff they were talking about at CES, like the DIGITS box or whatever.
What is actually in it, you know, versus the propaganda, and see whether it’s a thing or not a thing. It’s probably a thing, but we’ve got some questions that we were trying to answer, going through the Discord channel. A lot of the stuff from Nvidia that we’ve used before has their software stack in it, and you learn how to train using their software.
You learn how to set up your data using their pipelines. And you let, you know, it’s, it’s a whole thing by itself. So we’re trying to see, is this thing going to be able to run, you know, PyTorch out of the box?
Is it going to be able to, are you going to be able to use it with your own pipelines that you set up or whatnot? And I don’t know if we got good answers on that yet, but anyway, looking for that.
Uh, we did a thing, uh, Bill’s not here.
I think Bill is, I don’t know if Bill is sick or he had something going on this week.
If you haven’t met Bill Carter, he’s one of the guys that comes a lot. He’s involved in a different group in town called Learning Quest, which is a group of mostly retired professional folks. I think they’re connected with the Huntsville library. I’m not sure what kind of grants they get; think of it as continuous learning even after you retire. Kind of cool. I mean, they’re a bunch of interested folks. And so we did a session for them, an intro to AI, history and fundamentals, which was really fun. I’ve done the history part a lot. About 150 people showed up last week, and normally the groups I talk to are about this size.
Uh, so it was kind of interesting.
We did have, there is one thing that I set up specifically because Bill was there. Bill is in the front row, and we’re going through the history and we get to Rosenblatt and the Perceptron, and I actually have it up on the screen. It’s a big machine, refrigerator sized or bigger, made out of vacuum tubes and all kinds of stuff, because it was 1958. And I said, for context, when was the integrated circuit invented? Because Bill, who worked for Texas Instruments, where the integrated circuit was invented, was right there, and he rattled the answer right off. And I’m like, oh, Bill, stop. But that was kind of neat to have folks like that in there. A lot of times when I’m covering history, especially when I’m talking to college kids, their history starts in the 2000s, and we’re backing all the way up through Turing, and it was neat to see people that were around and actually working when this stuff happened. I’ve got the slides linked here. I’m not going to go through all of those.
I did update them though, because the last time I had done this was like 2019, and the whole ChatGPT thing hadn’t even happened yet. Actually, GPT hadn’t even happened yet. There was a lot that was missing, so I had to go fill that in. But for tonight, what we’re going to talk about is AI for code development, and this is basically getting it into some kind of a code editor or some kind of a notebook or whatever.
I’m going to play around with Visual Studio Code with GitHub Copilot, and then, we’ll see if this works.
Charlie, who is online, has Cursor set up as his primary editor, which gives you different kinds of features. We’ll just look and see what kind of things you can do with these tools: how do they help, what kind of limits are you going to run into, things like that. So to start with, we’ve been kind of chasing this for a while. I’m trying to remember how long ago we looked at StarCoder.
That was the BigCode project, Hugging Face and ServiceNow, I think. That was a few years ago, and it was the latest and greatest, and then several other models started to come out after they figured out the instruct kind of mode, and then reinforcement learning from human feedback, that kind of thing, trying to drive that more and more. So it’s come a long way.
I think when we were looking at StarCoder, that was like a seven billion parameter model, or even smaller, because I know I was running it on this laptop, which isn’t a high-powered thing.
So the main basic info is on the Copilot page.
If I can click my link. Of course, they want to be the editor for everybody.
And we’ll step through some of this stuff. Contextual suggestions, that’s true. I haven’t really played around with too many of the different models to see what kind of answers I might get from Claude versus, you know, the OpenAI variants.
Uh, we can play with that.
Multifile edits, real time.
Yeah. Extensions. This one, I was kind of excited about for a little bit, because I spend a lot of time in the terminal in the middle of things. Lots of glue code. Usually it’s me trying to figure out, okay, how do I recursively copy a directory of files while maintaining the timestamps. It’s the little things I can’t keep in my head that I always have to go look up.
Uh, in this case, the terminal piece, you actually have to have a pro account.
It’s not available on the free thing.
So you don’t really find that out until you go try to install it.
It’s like, oh, sorry. Um, so there’s that. Yeah.
Sorry, I’m muted, but I was going to type it in the chat. I did some testing with Copilot and Claude and ChatGPT for a hackathon, and we tested it with, crap, what’s the thing where you can go and run your code?
I feel like it’s, um, uh, for, it’s something to prep for interviews.
It’s going to come to me later, but we actually found some interesting stuff there, where the code was slightly different but it ran at about the same optimization. Is it like Colab or something else?
I’m going to go look for it and I’ll put it in the chat, because it’s bothering me; I know it, it’s on the tip of my tongue, but I can’t remember because I don’t use it that often. Yeah, it’s kind of like me trying to remember how to recursively copy a directory with files and keep the timestamps. But anyway, let me know. I’m thinking it might show me if somebody drops something in the chat. If I don’t come back around to it, let me know. I haven’t tried the other piece, which is more like an app on your phone that’s connected in, so you can actually use it that way.
Um, because where I normally am sitting, I don’t have a phone with me while I’m doing development. It’s outside the door, in a cubby. So there’s that. So here’s one of the interesting parts: the free tier.
And this is the one I’m on, because until I run out of requests or something, I might as well see if it works well enough for me.
Uh, so it’s got 2,000 completions and 50 chat requests per month.
I don’t know if there’s a way to track how many I’ve made or whatnot since I’ve started.
We’ll see if I can get to that.
Probably ask it. That would cost you a request. Yeah, that’s great. Does it count its own current response? Plus one?
It’s painful.
Right. Well, it would be great if it says, I don’t know, check back in a minute.
You know, the whole genie thing: you get three wishes. Oh, got me.
Uh, the pro is $10.
That’s not bad.
From what I can tell so far, it’s good enough that if I did run out of the free completions or whatever, I would probably pay the $10 a month for it.
Just because of one of the things I’ve asked it, where it actually was helpful already.
And this is just my kind of opinion, perspective: if you’re working in a code base that you know, or that you wrote, or that you’re familiar with, and you do this every day in a language you’re comfortable with and you know the syntax all over the place, I don’t know that you’d really need this as much, other than some nifty things like, here’s a class, please build me a unit test suite. That’s something I really don’t like to do anyway; I’d much rather something else do it. And if you don’t have interns, maybe this would do it.

Where it really shines is when you’re trying to pick up a new technology, or a new package, or a new language, and you don’t have that five years’ worth of intuition of, oh hey, this is probably how this works. It is really good at going and finding the stuff where you got close enough that it almost compiles.
Um, I’d love to see this for Rust.
I wish we had Alex here.
Um, oh yeah.
You think it would work with Rust?
Yeah, there’s a lot of Rust code out there, a lot of really well-written Rust code for it to train off of. Well, it has to be well-written.
I think the borrow checker...
Well, if it can get past the borrow checker, I think it would work. That’d be fun. I bet it would work for Rust; it works for x86 assembly. If you know of a GitHub repo out there with Rust... I don’t know if we can do it tonight, but one of the things I asked Charlie to do, and what I’m going to look at, is something we did years and years ago, 2019 time frame, in a Jupyter notebook, and we’ll go and aim at that, because I haven’t touched it in a long time. I’ve actually had Charlie go look at a code base that he’s never seen before and load that into Cursor and see what it does, because I think that’s one of the key advantages: helping you get up to speed quickly on something. I’d trust it with Rust more, too, because of the compiler. If you’re going to build C with this thing and really trust what the output is, at least the Rust compiler is going to throw a fit, with statically typed things, with a real type system, right? Right. Yeah. I wish I could show some of the stuff that we were starting to get pulled into at work in my day job.
It’s similar to what they’ve got in here in Copilot.
It’d be interesting to see. It’s got some things like: let me select a large section of code and say, explain this to me, and actually have it walk through step by step.
I mean, it’s got a reasoning model behind it going through what each thing does and how it works, things like that.
You can highlight a section of code and say, fix this for me.
Oh, it’ll actually go try to fix it in your editor and then let you accept the change or not accept the change. It’s pretty interesting. A lot of the way they kind of couch some of this, and I could see the use of it: in some of the stuff I play around with, I’m basically the only one hacking on it.
So imagine if you had somebody else on a code review or another bot or an agent or something, just kind of watching as you’re, as you’re going, you know, to kind of catch some of the small things that you’re, you’re going to waste time on anyway, you know, and it may not catch giant logic errors in the design of your overall application, but some of the smaller pieces will save you some time.
So hey, this is probably not what you meant to do because 99% of the other times it sees this, it’s not used that way.
So either you’re special, or, in my case, well, I’m special in a different way, thinking this will compile.
It really bothers me when you get to Business and Enterprise and there’s a “contact sales.” Normally that means I can’t afford it. Kind of like the menus without prices on them. Yeah. So what I wanted to do, let’s see: they’ve got Visual Studio Code plugins, JetBrains plugins, Neovim, which I swear I’m going to have to roll out at work just to mess with people if I can. I didn’t know there was a thing, but apparently there’s a thing called Neovim. Seriously? Is it? You use it?
I’ve got three people on my team that swear by it. Oh yeah.
But I use vim. Really? I mean, I’m good with vim. I didn’t know we needed a new vim, but... Oh yeah.
It’s still vim, it just works better with plugins and stuff.
So if you use vim, it’s, like, the exact same thing, just with the plugins and all the other stuff working better. Okay. I might try that.
I forget how long ago they forked it. I don’t think Neovim is technically the same code base now.
Okay.
Pretty much since they moved to, like, the Lua back end or something like that.
Lua, I think.
Yeah, Lua.
Okay. Huh.
I don’t remember the history of it.
I switched a long time ago.
It was faster to start up. It ran language servers better at the time. Okay. Yeah.
There’s always the interview question: you open up vim, get somebody in, and see if they can exit it. You know, for me, usually I’m working code.
I’ve got a Mattermost thing up, and I’ve got a Teams thing up, and at least one other chat or whatever. And half the time, I forget which window I’m in. I start changing directory somewhere, but I’m in a Teams chat or whatnot.
And then of course my muscle memory; I get flamed for that a lot.
Of course, it’s, you know, Ctrl-Q. Let’s see.
Get a settings page.
This ought to take me to my settings, which this is where I was thinking, maybe it has something that tells me how many.
One thing I did was disable the use of my data for AI model training.
Assuming you trust anything that you see on the internet, but there’s a box that said I could turn it off, so I did. And then also, I think they probably have enough data for product improvements, and the way I use it may not be the way that they expect; I’m just hacking around with it at the moment. I turned off access to Bing. I don’t really know why, other than I barely ever use Bing.
Oh, and then the Claude side, it is basically in a preview mode.
I don’t have any organizations.
That’s about it. I was hoping it would show me something about, you know, having requests I had used or whatnot. Also, they have a lot of examples.
So some of the things, this might be interesting for Jacqueline for hackathons.
We had also done something a few years back, basically trying to create a hackathon template. So if we went into Hatch or Space Apps or something, here’s at least a theme that spins up the web server using Streamlit and hosted here, you know, just the basic shell of a theme.
Along with one model that does something, so you can switch the model out for whatever you’re trying to build, and you don’t have to worry about all of the other junk about how do I host it, all that stuff.

Some of the things I actually wind up answering questions on from a mentor perspective are things like: how do I handle a rate limit? They want me to use this NASA data for my NASA Space Apps challenge, but the API I’m hitting is limited to a thousand requests an hour, and I’m on a 48-hour time limit to work a hackathon, and nobody that set up the hackathon thought about this beforehand, that maybe we ought to relax the rate limit for folks in the hackathon, or just provide the data without making them go through all this. But being able to highlight a section of code and say, hey, can you rewrite this to actually work around a rate limit, and have it automatically go refactor that, that’s kind of neat. Lint errors is a pretty easy one, but again, things like that are pretty useful.
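That rate-limit workaround usually comes out as retry-with-exponential-backoff. A minimal sketch, with a stand-in fetch function rather than any real NASA API (a real client would key off HTTP 429 and the Retry-After header):

```python
import time
import random

class RateLimitError(Exception):
    """Stand-in for whatever exception the throttled API raises."""

def fetch_with_backoff(fetch, *args, max_retries=5, base_delay=1.0, sleep=time.sleep):
    # Retry the call, doubling the wait each time, with a little
    # jitter so many clients don't all retry in lockstep.
    for attempt in range(max_retries):
        try:
            return fetch(*args)
        except RateLimitError:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

The `sleep` parameter is injectable only so the behavior is easy to test; in real use you just let it default to `time.sleep`.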
I’m still not quite sure, and Jack, this might have been one of the ones you were looking at, where you can actually refactor code for performance and say, hey, find a better performing way to implement this.
I did that at work the other day.
We had something that was moving a lot of data into a map structure, a key-value pair kind of thing, and the one thing it came up with was: if you’re working with a lot of data, go ahead and size the map first so that you’re not rehashing as it grows. Yeah, we probably all should do that, because we were dealing with like 20,000 things coming in at a time, and you know what I mean, we were probably paying a price for it. So that was kind of interesting. Going back to my main list.
All right, and then let me crank up Ubuntu, and I do think I’m running into something a little odd that I haven’t chased down yet.
Uh, in that I’m fairly certain the Visual Studio, the plugin knows I’m actually on a Windows machine, but it doesn’t know that I always use VS Code from a Linux, you know, WSL perspective, because the other day I was trying to do something, and it put wrong slashes in instead of, you know, and I’m like, whoa, what is that?
That’s, that’s weird. Um, let me go find the actual questions answers.
I think that was the one, so let me actually revert all the changes I made to this one, but that was 2020, and that was January 20. Okay, so I can find it. Let me get this open and then I will, uh, drop in one, two, I’m looking for, there we go.
And I think I need to... not sure why it thinks it’s already changed. Okay, so is that readable, or is that good enough? Okay, let’s see, the Jupyter, I think that’s right. So this is actually a notebook we did for a topic modeling talk years ago. And what we did, and I don’t know why we picked what we picked other than it just happened, was grab a document that was out there. This was a confirmation hearing for a Supreme Court judge. So it was, uh, Amy Coney Barrett, I couldn’t think of the name, but it was one of those.
Um, and so what we did, we just pulled in the Word document as is, pointed a model at it, and said, okay. None of us in the room, actually, I think the reason we did it is because nobody at the session had actually watched any of the proceedings, so looking at it cold: what were the topics? What was talked about, what was discussed, just NLP kind of stuff. At this point, I should be able to execute a cell, and at some point I’m going to run into something that isn’t actually going to work, and we’ll use our nifty Copilot thing to see. Ah, come on, there we go, let me get rid of this. Oh, I’m all the way at the bottom of this, okay, go back up.
All right, so first thing, we’re pulling some dependencies, which is fairly simple. We actually have the document that we tried to load, and it’s giving me a problem: this has no attribute “getiterator.”
So you can either do that, or, what I was doing the other day, I hovered over it and said, fix using Copilot.
I clicked the button, and it was nice enough to tell me that, hey, the getiterator method was deprecated in 2.7 and removed in 3.9, and I should use the iter method. It’s true, it’s right. That’s probably what I would have gotten if I went to Stack Overflow, after I scrolled down past the “why are you doing it that way, you shouldn’t even do it that way” and all of that mess.
So it actually makes the change for me, changes this over to iter. I wish it had done the next one as well.
So I can actually accept the change, let me try to run it again, it’ll probably fail on the next one down.
Yeah, so I don’t know this one; change that over too. All right, so then we’ll just check and see. Actually, this is just printing out some metadata, okay: Amy Coney Barrett Senate confirmation hearing, day two.
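For the record, that fix looks like this with xml.etree.ElementTree (assuming that’s the library the notebook was hitting; Element.getiterator was removed in Python 3.9 in favor of Element.iter):

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the .docx body XML; the real document is much bigger.
xml = "<w:document xmlns:w='ns'><w:p><w:t>hello</w:t></w:p></w:document>"
root = ET.fromstring(xml)

# Old (removed in Python 3.9): root.getiterator('{ns}t')
# New: use Element.iter instead.
texts = [el.text for el in root.iter("{ns}t")]
print(texts)  # ['hello']
```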
We pulled in some Regex stuff just to get things into groups, uh, pulled out keys and values, this one I think also has some kind of an issue with it.
All right, so now I’ve got CountVectorizer has no attribute “get_feature_names,” and this was using scikit-learn, which is fairly common across a lot of different machine learning, data science kind of stuff.
But I wasn’t quite sure, and it doesn’t, what do you call the thing that actually puts squiggly lines under things it already knows are bad?
I don’t know if it’s linting or whatever, but it didn’t even catch it. So let me go down here, and I think at this point, what I wound up doing was something like copying this over to my chat. I don’t want to install that. Oh, so apparently the newer versions of scikit-learn use a different method; let’s update your code. Here’s an example.
It is kind of, I wish there was a little bit better interaction between, it feels weird to me that I have a chat window over here, I’ve got code over here, and I keep having to kind of jump between the two.
I do kind of like it when it’s actually inline in my code; it pops up and I’m already looking at it, I have it in my context. One thing that is nice with Copilot, though: if you’re not in a Jupyter notebook, if you’re just in a regular Python script and you get the error in your terminal, you can do, I think it’s control ID, and it will automatically pull that debug error along with the current file you’re working on, instead of you having to copy-paste it over. It will automatically do that for you.
All right.
Oh, it has no stop words.
The last time I was working through this, I asked it, hey, what do I do about this? And it was smart enough to remind me that I don’t actually use the output of that anywhere, so it’s dead code; I probably ought to get rid of it anyway.
I was like, that sounds more like a Stack Overflow answer.
You complain about something like, why are you doing it that way?
So that gets us through that part. At the time, I could never exactly figure out, I know LDA, but the pronunciation of Dirichlet, what the actual name is, is kind of fun. Yeah. So anyway, it actually ran through, gave me a perplexity and likelihood, and then it came up with 14 different topics that were in the text itself. Some of these, if you remember back to what was going on at the time, actually track with what was in politics and all the other stuff going on. You’ve got woman, pro-choice; you’ve got renewable energy, EPA. I have no idea what “Notre Dame” in the wrong order means, or “Dollar Tree Court in Vivian.” Apparently that was somewhere in this whole document. It’d be curious to go back using this and go, what the heck? You really don’t know what some folks are talking about just looking at the primary words in the topic.
Yeah, I can see a lot of those coming out.
Yes, I’m guessing “daughter” popped up somewhere in some other kind of conversation. Here’s another one, a ModuleNotFound. This one, let’s see if I can fix with some Copilot.
This one, actually, in this case I had to leave Copilot and go somewhere else and look it up, because the sklearn module inside pyLDAvis got renamed, and it doesn’t surprise me too much, because the pyLDAvis visualization tool isn’t super widely used, so you’re not going to find a ton of examples out there for it to have trained on for how this thing works.
It doesn’t really matter that much.
I don’t think it’s, I’m not sure it shows up well in this anyway. When you say it doesn’t know, does it make things up until you figure out it doesn’t know or does it tell you it doesn’t know?
No, this was actually telling me I needed to install the package. Which you can see was wrong; I have the package installed already. The actual problem was that the sklearn module got renamed to a different name within pyLDAvis.
So the actual answer was, instead of sklearn, you replace it with the new module name.
Let me see if I can’t... I don’t know if you guys have had the chat say, this is wrong, what do you think? I feel like an LLM would catch that.
That’s the type of thing you didn’t notice. Should we try calling?
Let’s give it a shot.
Oh, that’s just my call.
It’s trying. Let me switch back over and see if I can find the... Gemini will tell you. Somewhere, I think this was it: pyLDAvis.sklearn, lda_model. So instead of sklearn, you replace it with lda_model. And that was it. So the whole pyLDAvis thing actually gives you an interesting way to hover over.
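For anyone hitting the same thing: pyLDAvis 3.4 renamed its pyLDAvis.sklearn module to pyLDAvis.lda_model, so a version-tolerant import can be sketched like this (returns None if pyLDAvis isn’t installed at all):

```python
def import_lda_viz():
    """Return the pyLDAvis backend module for scikit-learn models,
    handling the 3.4.0 rename of pyLDAvis.sklearn to pyLDAvis.lda_model.
    Returns None if pyLDAvis is not installed."""
    try:
        import pyLDAvis.lda_model as mod  # pyLDAvis >= 3.4
        return mod
    except ImportError:
        pass
    try:
        import pyLDAvis.sklearn as mod  # older pyLDAvis releases
        return mod
    except ImportError:
        return None
```

Either way you end up with a module exposing the same `prepare(lda, dtm, vectorizer)` call the notebook uses.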
Let me actually see if I can make this bigger.
Not really related to this talk or anything, but you can actually hover over a topic and it shows you, within this subtopic, what the primary terms are and at what rate they showed up, what weighting they would have.
So some of the smaller ones: challenged, renewable, oil, not much there; abortion, pain, child. Apparently somebody named Samantha. I have no idea. So the top one basically gets a lot of them. So anyway, that was that. We’ve got 15 minutes left.
Let me throw it over to Charlie. If you’re there... let me check first, I guess, and see.
Okay, I will stall for a second.
Yeah, that’s pretty interesting. All right. So Charlie, let me know when you’re ready. And I can make you the host.
I’ve got another thing I can do if you need me to stall for Lockerd. I’ll let you work on your mic. I will jump right back to you; just interrupt me when your mic works. Another interesting one that I was playing around with was Gemini connected to Colab, which again is something that we use a lot, especially during hackathons, when you’ve got high schoolers and you never know what kind of hardware they’re going to walk in with.
So being able to shove something out on a CoLab and everybody can kind of be in the same place is a really good way to do it.
See if I can actually get, there’s air in here somewhere where it actually, you can actually prompt Gemini, do all kinds of stuff for that.
There was a different one that actually came up that I was going to play with. I didn’t know they had Gemini CoLab now.
There’s something similar that, it wound up being more of a question than something that I know. It was similar to what you were talking about where you just click the air in the console and it pops up and it tells you, this one was something similar to that, except it popped in next to where the Gemini box is, but it’s definitely feels like it’s using some kind of a model behind it to actually go find answers for you. All right, at some point I’ll get there. I don’t remember if it was, I think it was on this line somewhere.
And this is the one early on before you had good big models.
We were actually using the spaCy library to do like semantic similarity between sentences and phrases and stuff like that.
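In spaCy that comparison is a one-liner like `nlp("apple").similarity(nlp("banana"))` with a vectors model such as `en_core_web_md` loaded; under the hood it is essentially cosine similarity over (averaged) word vectors. A toy sketch of that underlying computation, with made-up 3-d vectors purely for illustration (real spaCy vectors are 300-dimensional):

```python
from math import sqrt

# Made-up toy vectors, invented for illustration only.
vecs = {
    "apple":  (0.9, 0.8, 0.1),
    "banana": (0.85, 0.75, 0.2),
    "pasta":  (0.7, 0.2, 0.6),
    "info":   (0.05, 0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity: dot product over the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Two foods should score closer to each other than two unrelated terms do.
food_sim = cosine(vecs["apple"], vecs["banana"])
other_sim = cosine(vecs["pasta"], vecs["info"])
```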
So yeah, pasta. I mean, apple and banana are at least foods; pasta and info are not. Maybe this one. I’d be a little upset if I don’t even get to the actual... somehow this works. Charlie, interrupt me when you get your mic working; that’ll be a good indication that your mic is working. He’s busy eating a hippo burger with pasta. We’re also playing around with word mover’s distance.
Oh, this is what it’s gonna die on, I remember.
Because spaCy changed the way that they do pipelines.
So I’ve got this value error thing, blah, blah, blah.
So it’s got explain error and it actually drops over here to the side.
And it goes through its reasoning of the hypothesis about why this might be in here.
It expects a string.
It’s because in spaCy version three, which came out after this was written, add_pipe changed, blah, blah, blah. And it’s actually got things. Of course, every time it provides code that you could copy, there’s a little line that says use code with caution, along with a link to why you should do that.
In this case, it’s got an actual setting that’s different.
So what it wants is this guy, which looks to be the same.
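The spaCy change behind this error is real: in spaCy v3, `nlp.add_pipe` takes the string name of a registered component factory instead of the callable itself, and raises a ValueError (error code E966) if you pass a function. Below is a tiny mimic of that registry pattern to show the idea; it is not spaCy’s actual code:

```python
# Components are registered under a string name, and the pipeline is
# assembled by name -- the spaCy v3 style -- rather than by passing
# the function object directly, which was the v2 style.
FACTORIES = {}

def register(name):
    """Decorator that registers a component under a string name."""
    def wrap(fn):
        FACTORIES[name] = fn
        return fn
    return wrap

class Pipeline:
    def __init__(self):
        self.steps = []

    def add_pipe(self, name):
        if not isinstance(name, str):
            # spaCy v3 raises a similar ValueError (E966) here
            raise ValueError("add_pipe now takes the string name of a "
                             "registered component, not a callable")
        self.steps.append(FACTORIES[name])

    def __call__(self, text):
        for step in self.steps:
            text = step(text)
        return text

@register("lowercase")
def lowercase(text):
    return text.lower()
```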
Oh, okay.
And I said the name, so somewhere up here. Ah, it’s way different. So it’s wanting that. And I’m gonna collab, why not? Well, it’ll be based on, okay, yeah, yeah, yeah. But it actually gave me some... what we were doing, and this comes from one of the original word mover’s distance papers: “The politician speaks to the media in Illinois” versus “The president greets the press in Chicago.” You know, Illinois and Chicago, not the same word, but close. Politician, president, those are close; speaks, addresses, you know what I mean? You can say the same thing with different words. And so the first two similarities came back at a 0.75 versus up to one, whereas the next one, “I do not like green eggs and ham,” has nothing to do with any of them. So here’s that. So I don’t really know what model is behind it. I think I plopped the same thing into... let me go grab what we put into here.
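Word mover’s distance treats the two sentences as sets of word embeddings and measures the cheapest way to “move” one sentence’s words onto the other’s. Here is a simplified one-to-one matching sketch of that idea with invented 2-d vectors; real WMD solves a more general optimal-transport problem over high-dimensional embeddings, so treat this only as an illustration:

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+

# Invented toy embeddings: related words sit near each other.
vec = {
    "politician": (0.9, 0.1), "president": (0.88, 0.15),
    "speaks": (0.2, 0.9),     "greets": (0.25, 0.85),
    "media": (0.5, 0.5),      "press": (0.52, 0.48),
    "illinois": (0.1, 0.2),   "chicago": (0.12, 0.22),
    "like": (0.6, 0.7),       "green": (0.05, 0.95),
    "eggs": (0.95, 0.9),      "ham": (0.45, 0.05),
}

def matching_distance(words_a, words_b):
    """Cheapest one-to-one pairing of the word lists (brute force)."""
    return min(
        sum(dist(vec[a], vec[b]) for a, b in zip(words_a, perm))
        for perm in permutations(words_b)
    )

# The paraphrase pair should come out much closer than the unrelated one.
paraphrase = matching_distance(
    ["politician", "speaks", "media", "illinois"],
    ["president", "greets", "press", "chicago"],
)
unrelated = matching_distance(
    ["politician", "speaks", "media", "illinois"],
    ["like", "green", "eggs", "ham"],
)
```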
Oh yeah, they don’t do a real good job distinguishing theirs either. Did you mean, like, is it 1.5 or 2.0 Experimental? Well, now they also have Flash Thinking and Flash Thinking Experimental, so there’s a bunch.
And they just label it Gemini all over the place.
You’re like, I’m using Gemini. Which one, right now? That was one of the things we were also running into with Copilot.
You’ve got people brand... I mean, Microsoft’s branding of Copilot is now like Office 365 and on, I don’t know.
I even see commercials about, do this with Copilot, like, that’s Excel.
Or, you know, I don’t think this is the same thing.
So it seems like Gemini gave me the same answer. How to fix it.
Yeah, adding, using the name along with sources. So that’s, yeah.
I’m sorry. Come on, let me check in on Charlie again.
Oh, admit.
Sorry, Charlie, you should be back in.
I think. Can you hear me now?
Just let me know. I thought I saw you try to talk. Yeah, can you hear me? Yes, hi. Hi.
I couldn’t stop sharing and make you the host.
Yeah, for some odd reason, I couldn’t get back in. Yeah, I had to start gatekeeping the Zoom call. Oh, right. For reasons.
I will make you the host.
Okay.
And we’ve got, I don’t know how long, five to 10 minutes, something like that.
Okay.
If you want to go screaming through, go screaming through Cursor. Sure, let’s do that. Okay, so first off, Cursor is completely free to download. It has some payment plans, but it is a fork of, let’s see, is that?
Yeah, you’re sharing.
Okay, is it sharing the right screen? Uh, it looks like it.
Maybe?
I’m seeing a product recommendation, that thing I sent over to you on WSL.
I don’t know what Cursor looks like, so.
Okay, all right, good.
Does it look like VS Code?
Yes, it is.
It is a fork of VS Code.
So all of your VS Code plugins will asterisk work in Cursor as well.
So what we decided to do was jump to a repo that I’ve never seen before, and I don’t have the first clue what to do.
So I’m going to bring up my AI pane. It does look similar to what VS Code offers you, except that it has a lot more control over what you can do with certain models. So let’s see here.
I’m just going to ask, by the way, I’m going to use Claude here for this first one.
I’m just going to tell it: tell me about this repo. And I have the option to either select and add to its context whichever files are in the repo, or I can search the entire code base.
Hopefully everyone can see what’s coming up here.
Let’s see, all right, a product recommendation system project that focuses on building and evaluating recommendation algorithms. Jay, does this look… That is correct.
Fantastic.
They mentioned a library called Implicit.
And then, yeah, we tried Surprise.
It wasn’t. Surprise was a Facebook algorithm that worked worse, I believe, than Implicit. It was. I’m wondering what evil engineer actually said, hey, we’re going to dump this on the public. Let’s call it Surprise. Oh, it doesn’t work well enough to keep it proprietary.
We used Kedro as our data pipeline, and you remember that, Ben?
I know you.
Yeah, it’s just like a data pipelining reproducibility tool.
I haven’t used it in a while, but it worked quite well for that.
Yeah, it was good at having data at certain stages and knowing what changed. Think of it as a declarative way to do pipelines, where you can declare: I’ve got this class that has this input, and the outputs of this thing go to that thing.
Now wire them together.
Okay, now I’ve got a different experiment. I want the outputs of this to go back to here, which is going to circle all the way back around, provide other training data to some other… It was pretty nifty. I don’t know if anybody still does it that way or not, but from a data science pipeline, it was pretty solid.
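That declarative wiring idea can be sketched in a few lines. Kedro’s real API declares `node(func, inputs, outputs)` objects inside a pipeline; the toy runner below is only my own illustration of the concept, not Kedro code:

```python
def run(nodes, catalog):
    """Run (func, input_names, output_name) nodes, resolving order by
    which inputs are already available in the data catalog."""
    catalog = dict(catalog)
    pending = list(nodes)
    while pending:
        ready = next(
            (n for n in pending if all(i in catalog for i in n[1])), None)
        if ready is None:
            raise ValueError("pipeline has unsatisfiable inputs")
        func, inputs, output = ready
        catalog[output] = func(*(catalog[i] for i in inputs))
        pending.remove(ready)
    return catalog

# Declare nodes in any order; the runner wires outputs to inputs.
nodes = [
    (lambda feats: sum(feats) / len(feats), ["features"], "model_score"),
    (lambda raw: [x * 2 for x in raw], ["clean"], "features"),
    (lambda raw: [x for x in raw if x is not None], ["raw"], "clean"),
]
result = run(nodes, {"raw": [1, None, 2, 3]})
```

Swapping which named output feeds which named input is then just editing the declarations, which is the “now wire them together” experiment described above.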
So I’m honestly really impressed with how much this is breaking down the project here. But let me see if it can find something in here. So I’m just picking a file at random.
I’ll just add it to the context and say… I can type.
Oh, you’re asking for errors on my code. Ben’s job is usually to fix my errors when we’re doing live demos. Shouldn’t be any. So we need a Ben agent.
What’s that?
I wasn’t asking for that.
That’s a lot of code. Yeah. Well, at least with this, it’s going to take in as much of the individual file that you add to its context. And it looks like it’s completely rewritten this.
And if I want to, I’m not going to.
You can either apply it directly or you can copy and paste it over. The apply function only works if you have a paid subscription to Cursor. But honestly, if you’ve got at least one API key that you’re paying for, with Anthropic or OpenAI or whatnot, I’d recommend not getting the subscription from Cursor.
But it does pretty well. I actually wrote a program using the Qt library completely using Claude, because I’d never used it and I wanted to see what it would do.
So it does have some of the same features.
Say if I had some output here, I can highlight and just send it to chat.
There’s another thing called Composer, which, along with Bug Finder, is also part of the paid subscription.
But you can have it do more direct writing of your code that way. But what I want to highlight here is some of the stuff that’s under the hood.
I want to show you just how much more you can get out of the Cursor. So let me close that here, close that. So you’ve got your general account. You can import all of your stuff from VS Code if you want to make a switch.
You can give some basic system instructions to whichever AI, sort of like a standing prompt, that you add right here.
Privacy mode, if you don’t want to send Cursor your information or any of your searches or anything like that, I always leave that unenabled. Here’s the fun part. You can have access to as many models as are available from OpenAI, Anthropic or Google.
If you have an environment in Azure, you can add that as well.
So I’m able to use several models from Claude, from GPT, even Gemini.
I can show you what that looks like here.
So I just create a new chat and it also saves your history if you want to go back and, hey, I thought of something that needs to go on that particular problem I was working on before.
So let me scroll down to Gemini. And at some point when you ask it, what changes should be made to move this repo from a Python 3.8 to a Python, what are we at? 3.13 just dropped not too long ago.
Okay, I’m always late so make it 3.12.
Okay.
Because this is probably... this is another project we may want to play around with, because at the time we were using Implicit. We were trying to do a kind of collaborative filtering, except we don’t actually get any feedback from users about products.
Well, the intent was if somebody walks up to a counter with four products in a basket, what is the fifth product you should recommend?
Without knowing who they are, what history they may have, all you know is other baskets that came in with different collections.
So I’d be curious to just load the whole freaking thing into a context somewhere and just ask GPT, hey, what should I recommend?
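The basket scenario above can be sketched as plain item co-occurrence scoring: every historical basket that overlaps the shopper’s current basket votes for the items it contains that the shopper doesn’t have yet. The data here is invented for illustration; a real system, like the Implicit library mentioned earlier, would use matrix factorization over implicit feedback instead:

```python
from collections import Counter

# Toy historical baskets, invented for illustration.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "milk"},
    {"bread", "milk", "eggs"},
    {"butter", "jam", "milk"},
]

def recommend(basket, baskets):
    """Suggest the item that most strongly co-occurs with the basket."""
    scores = Counter()
    for b in baskets:
        overlap = len(basket & b)
        if overlap:
            # each overlapping basket votes for its other items,
            # weighted by how much it overlaps the current basket
            for item in b - basket:
                scores[item] += overlap
    return scores.most_common(1)[0][0] if scores else None
```

So a shopper holding bread and butter gets the item that the overlapping baskets most often also contained, with no user identity or purchase history required, exactly the constraint described above.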
Well, see, I just did this with Gemini 2 with their brand-new free model, Flash Thinking Experimental.
Yeah.
So it’s saying, okay, let’s update the Kedro version.
Let’s see, update Python version and Docker files in the dev container.
Update your dependencies. And it’ll just tell you to test your code.
Okay. Now see, if I want to ask it, say, if I want it to do, let’s use a different model.
Does someone have a favorite model they like?
Oh, nobody here.
All right. Opus.
Opus?
Sure.
Why not?
I have no idea what it’s going to say.
I don’t actually use Opus. So that seems a little more straightforward on what you need to update.
Yeah, I’m interested to know. So they say update Kedro. What I’m wondering is if they actually talk about what changed in Kedro from the previous version to the next, kind of like what we saw before, where packages got renamed or something.
It’d be nice if it would do a heads up. Oh, well, okay. What was it?
What was it?
8.
Oh, 18.
And they want us to move to 18.4.
Just for fun.
I’m going to switch to GPT. So, yeah, you can switch back and forth between the different models at will in the same window.
Let’s see.
Oh, there is one other thing that you brought up, Jay, that made me realize there’s something else that we haven’t touched. What was that one library that you were talking about before that you had to go and find information about? pyLDAvis. If you know where that is, you can embed it inside Cursor. All you have to do is find whatever the web address is of some of the documentation. Oh, okay.
Just add it in. Depending upon how large it is, it may take a little while. And if it turns out that I have... it looks like there are a couple that I need to update.
I can just go through.
It’s already indexing and updating these. Mine would only do that when I’m in a hurry.
Right. Right.
I haven’t yet found a limit to what it can index here.
Okay.
But it does take all of that into its context whenever you’re asking questions.
So let’s take a look back here.
How about this?
Let’s go to, I tend to use Sonnet whenever I’m being conversational.
And I’m asking this because I actually have the standard for docstrings in my embeddings.
Okay.
Yeah, that’s nifty. One thing I will suggest if you do decide to use Cursor: there is a problem. Well, not a problem, but the folder structure is very hard to read natively.
I always recommend this material icon theme.
It just, with the way that plug-in is, it really makes all of your folders stand out in Cursor. I feel like I need it more for Cursor than I would for VS Code.
Let me flip back over and we will kind of wrap this up.
All right. I need to stop sharing. And I have seen, there’s been a couple of code editors that will now allow you to search with, use Ollama in your AI chat windows. So if you really want to take everything offline. If you want truly horrible recommendations.
So that’s, I would mention, I played around with a few others.
There’s a CodeGPT plug-in.
There’s some other plug-ins for VS Code that, for me, seem more like: I added a chat window to your VS Code. And I’m like, well, I already have a chat window. I don’t know that having it in this box versus that box really buys me much.
Because I already have to copy from one to the other, you know, that whole user-interaction kind of thing. It’s a little weird.
So any, any comments, questions?
I’ll flip it over and hit the chat in case folks online have anything on code editing.
We’ve had some conversations with Michael, who works at Intrinsic, which is now an Alphabet slash Google company. And they’ve got their own code editors and their own stack and all.
And what I hear from him, what they can do with some of the code gen is pretty impressive, coming from somebody that doesn’t trust any of it. So it’s kind of like, oh, that’s pretty interesting.
And I know I’m starting to see this more and more at various companies around town.
Getting it at least. I don’t know anybody yet that has a model like this in a classified environment, but I do know several that have it in unclassified type environments on, you know, different levels of controlled unclassified information type stuff.
Let me stop.
One thing I will say for Cursor, like Cursor versus Copilot: Cursor has the Composer feature that Charlie was highlighting earlier.
That is incredibly good.
Like if you’re starting a brand new project and want multi-file creation, Composer is way better than the Copilot version, which is called Edits.
But to me, and I’ve played around with both of them quite a bit, that’s like the only feature that I love a lot about Cursor that would make me go back.
But I ended up using Copilot because it is only $10 a month.
Instead of $20. But the Composer feature is very, very good in Cursor. Let me stop recording, after I remember where that button is.