Transcription provided by Huntsville AI Transcribe
How we got here was that we've been using this library, llama-cpp-python, mostly to introduce a RAG system, which is retrieval-augmented generation. We can go back a bit if we need to and cover what that is and what goes with it. We might, if we get through this quickly enough.
So the llama-cpp-python library lets you run the same types of models that you would need for a ChatGPT, except you can run them locally on a laptop.
I've seen this being used a lot by people who are trying to iterate through something quickly and don't want to sign up and spend a lot of money on an OpenAI connection, with the tokens and all the other things, just to be able to quickly turn something around and build up an application.
The other use for it right now is if you are one of those folks who work in a place that does not have an internet connection, which is a thing here, and you need to be able to run something locally without going through an API.
Also, and I didn't expect it, but there are actual private companies as well that, due to the sensitivity of the stuff they work with (in one case it was an attorney; it could be a medical thing), are not allowed to send that information across the network, depending on what type of server it is going to be on.
So it's not just Department of Defense types of restrictions; occasionally it's all of these other restrictions as well. And we're actually probably behind the Europeans on this. They have way more rules than we do now.
They have rules about data, about how you have to restrict access to things that aren't necessarily defense-related. So with that: this llama-cpp-python module provides an API that happens to be the same API that OpenAI provides.
So if I build something using this model and run it locally, I can then take that local model out and point it back to OpenAI if I want to go to a full production system.
In other words, if I want access to OpenAI, I can pick one of their models. So you can use GPT-3.5 Turbo; I don't think I can hit GPT-4 yet on this account. But you can do all that kind of stuff.
So think of it as a way to build something in a modular fashion, where I can remove a piece and put in another piece later. Because new models are always coming out, and they're neat, but new models are sometimes expensive.
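As a minimal sketch of that modularity, assuming the current openai Python package (the local URL, port, and keys here are placeholders):

    from openai import OpenAI

    # Same client code either way; only the endpoint and key change.
    local = OpenAI(base_url="http://localhost:8000/v1",  # e.g. a llama-cpp-python server
                   api_key="unused-locally")             # required field; a local server ignores it
    # Later, swap in the real thing without touching the rest of the code:
    # remote = OpenAI(api_key="sk-...")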
So what I wanted to work through today... I wish I could have just created an account from scratch to walk you through what that's like. Basically you create a login, you tell it who you are, that kind of thing. You can go into, I believe it's Settings, and create an organization right there. For this I created an organization that's just me for now.
If I wanted to add additional people to the organization, I could have them create an account and add them to this organization. You can set things like billing limits for an organization, and I would advise: always set limits, every time, for things like that. You can say which particular things are accessible to members and which things are organization-only. And of course billing.
Right now I've got mine set up as a pay-as-you-go kind of thing, and the way this works is I've got it set to reload $10 at a time: if it runs out of that $10, it charges another. I've also, and this is a good idea, set rate limits. I have it set up so that the maximum usage for each month right now is $120. And I've got it set so that at $20 it will actually stop working; it'll actually turn me off. And I've also got it set to send me an email when I cross $15. That's extremely low, but this is kind of a toy project I'm playing around with.
The other thing that you can do is create projects; right now I've created one. It comes with a default project, which you can pick; think of that one as a sandbox or the playground, where you try all the things. Then, if you wanted to build up a specific project just for one particular application... the one that I did, for right now, if I can figure out how to get back to it... there it is.
Okay. I called it Space Apps, because this is the main thing we're working on with this retrieval-augmented generation system. So for the actual project, you've got a project ID, and I'll show you where you use that. And you can also set different things, project by project.
So if I was a business, and let's say my business model was that I'm going to create an application that lets marketers create marketing material for their projects, I might actually create one of these projects for each company I work with. That way, if something goes crazy on one of these, it doesn't affect all of my other projects that I might have running at the same time.
That also keeps things separate as far as charging, because sometimes you may actually want to pass that cost back to your client, something like that; it gives you a good bit of granularity. And then the other thing is that OpenAI has a lot of models available. I was thinking this was under rate limits... and there we go.
So DALL-E would be the one that does the image generation from a prompt. The Whisper model is the one that does transcription; it's actually the same model we use in the transcription service for these videos.
Of course, you've got all your GPTs.
Some of these, I’m not sure what they even are.
Some of the embedding models, those are used kind of on the front end of a large language model. So if you think about it, in order for these models to work, you have to turn words or sentences into a list of numbers that they can calculate with. So think of the embedding as the thing that takes a phrase or a document or a paragraph and turns it into, let's say, an array of a few hundred numbers. And so, you know, the last paragraph I said could now be one of those arrays. That's how some of these embedding models work.
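As a rough sketch of what that looks like with the current openai Python package (the model name is just one example of their embedding models, and the dimension count is specific to it):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.embeddings.create(
        model="text-embedding-3-small",  # example embedding model
        input="Turn this sentence into a list of numbers.",
    )
    vector = response.data[0].embedding  # a plain Python list of floats
    print(len(vector))                   # 1536 dimensions for this particular model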
But as you can see, there are a lot of available models. Each one of these models has a slightly different pricing model. Some of these, like Whisper, charge by audio length; I think an hour's worth of audio costs something like $0.36 to transcribe, which is interesting.
But these others kind of shift over time. They come up with updates, and some of the stuff they've done, they've actually lowered the cost of some of their newer models below the previous ones. It's like they're using the price as a way to get people to move off of the older models onto the new models. The other thing, and this is just an assumption, I'm guessing; Josh, you might know: some of the newer models, even though they are larger, are any of these more efficient than the older models? You know, in how much raw power it actually takes to run them? Some of these, you can tune them or prune them; they're doing a lot of interesting things to make them cheaper to run. But I don't know if y'all know that or not; it's just a thought that I had.
Yeah. Actually, the one that just got released yesterday is called GPT-4o. And it is a good bit cheaper than GPT-4 while being smarter. And it seems like they're doing something crazy with sparsity and multimodality that makes it more performant. They were calling it "gpt2," but like a new GPT-2, not the second GPT from back in the day. So it's almost like a new architecture, stuff like that.
Is that this GPT-4o?
Yeah, it’s pretty nuts.
I wasn't aware of it until I was talking to Tyler. Tyler's been using it; he was here earlier, hanging out in the lobby. He's been using it and is quite impressed with how fast that thing is running versus the previous version of GPT. It's almost like it's giving him info before he's finished typing the thing into the prompt.
The real crazy thing with GPT-4o is that it's text in, text out, but it's also image in, image out. So where GPT-4 right now calls out to DALL-E or, like, Stable Diffusion or something like that, this is generating the image itself, not doing a function call, which is insane.
Yeah, it’s quite insane.
Yeah, I don't know where we'll be next year. So, you can, on a project-by-project basis, and this is generally a good idea, think of it as project hygiene, allow only the models that you expect to be used for this particular project. And we'll talk about API keys in a minute. So an API key is like a secret name that applications use; it lets an application act like it's you. Like when you sign into something using your Google account: just because you have the Google account, it lets you in based on that.
It's a similar type of thing, where you can create a key. I can give it over to another developer, and they can use that key to access my project on OpenAI; maybe it's a developer, maybe it's an application we put out there somewhere.
And so there is always a risk that somebody else gets a hold of your key and starts using it. Now all of a sudden you're paying for tokens that are going to them, not to your project.
So we'll cover a little bit about that in a minute. But it's part of that: if you have a project, and you've got this one set up to use GPT-3.5 Turbo only, it's less of a problem if somebody gets a key that goes with this project; if they try to use anything else, it won't work.
And I actually learned that the hard way, because I didn't know I had to add models. So the first time I was trying to use it, it told me I couldn't; like, it didn't know this key or this project. And I finally figured out, no, the model you're trying to use has to be on this list. And of course I was trying to figure out where the list was, because I believe if you deselect everything... I don't know if it'll let me save a blank list; cancel that anyway. Initially the list was blank, and it was kind of hard to see that this was something I needed to add to. But the other thing I didn't realize is you can also create a list of blocked models. So if you had one model that you wanted to block, you know, maybe you want to give somebody access to all of them except for a few. So that's about it as far as how to set up a project. The other thing you're going to need is this:
Actually, I guess I could go look: Settings, then the project... I'm trying to get back to where I had set my actual key for this. So the next thing you need is your API key. Like if I create a new secret key... let's see.
So when you create a key, it doesn't show up underneath your project automatically, but you can pick which project it goes with. I've only got the one project here.
The next thing, which I haven't played around with much, is a service account versus a personal key.
A lot of things run as services; you can think of them as a processing pipeline, where you may have one stage that pulls in the latest data set from a run of something. Let's say I had test documentation that I got from a project. So the test finishes, the project documentation gets done, and another piece of the pipeline picks that up directly. And there's a piece we call the chunker, where you actually look through the documents and pull out each paragraph by itself; you end up having to do a lot of things that way. A lot of times those things will run as a service, and they still need the same kind of key to go back and get access. I haven't played around much with that on this one. This is where you name the key and tell it what project you're using it with.
So I'm going to go ahead and... actually, before I do that, let me remove that one and start over. Okay, real quick while we're waiting... oops, wrong one. So I'm actually going to call this one "delete me," and we're going to create it for this Space Apps project that we have. The reason I'm calling it "delete me" is because we will post this video, and I'm going to click through my key right there on screen. So we'll cover a couple of things with that. So here, real quick, let me...
So now if I go... this is the project we've been working in. Again, this is also in the video we posted; everything is called Space Apps 2022. One of the things that you will typically see, and I'm trying to remember the actual name, I think it's OPENAI_API_KEY, the environment variable.
So one thing to remember with keys, just as a reminder: if you do have a key, and you're using source control, either GitHub or, you know, GitLab or something like that, never, ever, ever check your key into a repository. You can go back and read horror stories; there are things out there actively scraping projects and accounts looking for keys. And when they find them, they'll figure out what they can do with them. There are stories from years back where somebody had checked in their key and it got stolen, and somebody was spinning up Amazon ECS containers doing Bitcoin mining faster than the guy could shut them down. Wow. And then they put a stop to it once they figured out what was going on. Which is also why you create a budget; the first thing you do is create a rate limit or spending limit. So if something does happen, you're only out a little bit; somebody has made that same mistake before. The other thing is, even if you delete it: it's called source control for a reason, it keeps the history of the whole thing. So that key is also archived in the history, if you know enough to get back to it.
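The usual pattern, as a small sketch, is to keep the key in an environment variable instead of in the code; the variable name here follows OpenAI's documented convention, the rest is illustrative:

    import os
    from openai import OpenAI

    # The key lives in the environment (or a git-ignored dot file), never in the repo.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])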
So back to the OpenAI piece.
So, for safety, they will only show this key to you once; otherwise you have to, you know, generate a new one. It is also nice from the management side.
If I am a developer and I'm running a project, I might have a couple of applications that are out there. So if you have a key and it hasn't been used in a month, you might want to ask: why am I still letting this sit here?
You know, there are things like that. So with that, we can actually drop into the next piece.
This is just the playground.
I was trying to find the actual docs for this.
Text generation... The one that we're looking for is called chat completions. I'm not sure if I'm in the right place. Not that chat... One of the things that I found with the project that we're working with...
I think I actually wound up forking it live.
So this is the primary project, and here's a lesson you get from an open-source kind of thing. The llama-cpp-python project: going through all of their examples, I was trying to figure out, okay, you say I can connect with the OpenAI API, here's an example; I try it. It doesn't work. Of course. But that's not out of the ordinary for open-source projects; sometimes the documentation is the last thing that gets updated, if it gets updated at all. So what I finally found out is that parts of the documentation were written against the previous version of the API. There was a 0.28.1 or some such version, and then it was late last year when they came out with the new version. And there was stuff that worked in the old version that they took out of the new version. And so I'm looking through, like, all these examples here.
You know, for the high level API.
And I've got... I'm not sure which one in here I was looking at. That's not it. That's not it directly. But it was a mixed bag of which parts would work with the new stuff and which parts still worked with the old stuff. So I actually found one of the issues that the owner of this project had put out there about moving to the new version, and I dropped a comment saying, hey, if I wanted to update your documentation to use the new version, would you accept a pull request? He was like, yeah, go for it.
You know, so the way to do that is you fork their project into your own space.
You make the change, and you submit a request back to them to pull your piece back in. So I'm actually working through that now. And you'll find different approaches. Some projects we find that are open source are mostly open source just in name, because the group that built it wants to say they open-sourced it, but that doesn't mean they accept contributions or requests for changes or, you know, things like that. So for that, typically in my line of work, you know, at CohesionForce, one of the things I used to work with is the open-source foundations, like Apache or Eclipse or, you know, the Linux Foundation. They actually have some kind of governance, so that you know what you're getting into and you know how to interact. Some of these projects are kind of "one person had an idea and built out the whole thing," and it's really interesting how some of that happens. llama.cpp was one guy who, over a weekend, said, hey, I wonder if I can make this LLaMA model run on my MacBook.
And he did. And now all of us are using that thing. You know, pretty groundbreaking. So that was pretty interesting, trying to figure some of that stuff out. It took a while to figure out, because... well, we'll jump over to the actual code.
And that’s the one that I’ve got.
This is similar to what we had before; we had a script called ask_question. And so, first, a general description of what RAG is: retrieval-augmented generation.
Say, a year ago, maybe a little further back than a year ago, you had these big language models, and they were trained on basically every piece of text; I mean, Wikipedia plus Reddit plus the New York Times plus, you know, basically everything in the world. And the way they train these models, especially for question answering, is you ask the question and it tries to give you an answer that you would accept as correct.
And so it will lie to you. The fun term for it is "hallucination." You can actually find really, really interesting examples if you just Google, you know, GPT hallucinations. So there are actually some interesting challenges out there to try to get models to tell you something that is not true.
Because the model developers are trying to figure out how to put guardrails in for companies. Even the search engines now have their fancy AI answers up top; you search for stuff and it tries to give you an educated answer. Right. Yeah. There are also some things that these model developers have put in for safety reasons. In other words, you should not be able to ask ChatGPT how to make a bomb; you can try it. So anyway, we're running into interesting things with that. Oh, and the image generation stuff: you could enter a person's name and have it make that person's image. Bruce Lee, Donald Trump, whoever. You could, until somebody did it and caused a lot of interesting news coverage, and then they blocked it. It's like they throw things out there, and then people use them in ways they never thought they'd be used, and at that point they're constantly trying to turn off certain things, and it becomes whack-a-mole. One of the other things that they'll put in there, and this is kind of the thing that we're looking at next, is some of the guardrails, as in: I don't want to let anybody create a bomb, or give anybody instructions to do something like that.
Actually, the project we're working with now is to take all of NASA's technical data and provide a really neat way to search through it, kind of a way of exploring the search. You might not have the exact words; you may have a concept, and you're trying to figure out where this has been looked at before and whatnot. And if you think about dangerous things, NASA deals with plenty of them. You know, so if I have a model that's trying not to tell me how to make a controlled explosion that provides propulsion in a certain direction, it's not even going to work for what we're trying to do here. So anyway, there are interesting things there.
But the whole point of the RAG mechanism is that instead of asking the model directly for an answer, you give it a set of context, as in: hey, from these 28 research papers, use this as your source of information to answer this question. So instead of it making things up, you can actually tell it: if you don't know, say you don't know.
So these are the actual instructions we're giving this model:
You are a helpful, respectful and honest assistant.
You can change this up to do all kinds of things. Our first look at this, I think it was GPT-2, was "you are Donald Trump; tell me about your wall in the style of William Shakespeare," and it wrote it. Right. I mean, it was insane. And of course we had to do the silly ones too, like the one about the cows on the moon. That was us. So anyway, back to the prompt: you can tell it all kinds of things. "Always answer as helpfully as possible, while being safe. Don't include any harmful or unethical content." And the interesting thing, if you ever run into an ethicist, the question is: how does the model know what that is? It's trained on data from all over the place. So, anyway, some of this is kind of interesting. Or "illegal content": we didn't tell it what country we're in, and what's safe and what's legal depends on where you are. There's a lot here that is just kind of abstract at the moment. "If a question doesn't make sense, explain why instead of trying to answer," and we can play with that some. And then "answer succinctly."
If you're using one of these services like OpenAI, there's the concept of a token. Basically, all of the stuff, all of the words that you put in there, the ones that you see here, along with your question, gets turned into tokens, and for the number of those that have to go across, it's charging you per token. So there are some tricks to figure out how many tokens a piece of text is. And some of these models are pretty verbose, and if you're not careful it will be charging you a lot of money just to give you long answers in a verbose format. So that's where some of these tips and tricks have come along over the years.
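One of those tricks, as a sketch: OpenAI publishes a tokenizer library called tiktoken, so you can count what a prompt will cost before you send it:

    import tiktoken

    # Get the tokenizer that matches the model you plan to call.
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    prompt = "Answer as succinctly as possible."
    tokens = enc.encode(prompt)
    print(len(tokens))  # number of billable tokens for this text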
So: answer as succinctly as possible. And we're telling it that we're giving it some documents to use: "If the answer comes from different documents, mention all of them. If you don't know, tell me." And then we give it the context; in this case, we're just giving it one blog-post segment that I grabbed off the internet somewhere, talking about a transformer model.
And then we're asking it a question that we know the answer to is in there. And we can play around later and actually ask it something totally different.
I asked it something about the differences between apples and oranges, or something like that, and it gave me something somewhat interesting, because the post does mention Apple and iPhones. So that's kind of fun. So in this case, we were using this local model. Let me see if it actually runs. Nothing like live demos to see what happens. So this is actually running; let's see what model is running up top. So I'm running a Llama 2 model with 13 billion parameters, which is about as much as I can do on this laptop; the context length is 4096. So if I were going to price that out at, like, OpenAI, if you look at their pricing, it's probably less than a penny. I mean, it's fairly cheap.
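For reference, loading a model like that with llama-cpp-python looks roughly like this; the model path is a placeholder for whatever GGUF file you have on disk:

    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,       # context length
        n_gpu_layers=20,  # offload some layers to a GPU if you have one; 0 = CPU only
    )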
Is that a virtual machine, or are you running it on the metal? I am on a virtual machine on this laptop; it's actually running on the laptop using Windows Subsystem for Linux. So I'll talk a little bit while this is running. So, the concept of RAG, retrieval-augmented generation: the first part of it is a semantic search, where if I'm looking for certain things, I don't necessarily have to get the words right.
It's not looking for the exact words I typed; it's looking for the same concepts in the documents.
Some of the interesting things we did during the initial Space Apps challenge: I think we searched for something like "tell me about Zeus." And there's nothing in NASA's documents about Zeus directly. But NASA loves to name things after planets, and planets are named after Roman and Greek gods. So it knows that Zeus is a Greek god, and it knows other Greek gods, and it will actually make the jump and figure out: oh, I can find something related to Zeus; here's another Greek god, Ares, or, you know, Mars. I'm pretty much butchering Greek mythology here; I can't remember which ones go together.
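Under the hood, that jump is just vector math. Here's a toy sketch of ranking by cosine similarity, with made-up three-dimensional "embeddings" standing in for the real ones:

    import numpy as np

    def cosine_similarity(a, b):
        a, b = np.asarray(a), np.asarray(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = [0.9, 0.1, 0.0]  # pretend embedding of "tell me about Zeus"
    docs = {"ares_paper": [0.8, 0.2, 0.1], "soil_report": [0.0, 0.1, 0.9]}

    # Rank documents by closeness in meaning, not by shared words.
    ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
    print(ranked)  # the Ares paper comes out first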
Let's see. Let's see how long this is going to take to run, and how much time I need to fill. So anyway, what you wind up doing is you put in the search, and it goes across all the documents I have in my database, which in our case is over 10,000 NASA technical documents. We do the semantic search to figure out which of these are the closest to the question.
And then we actually take all of those, and we hand this model that whole context area. We say, okay, here's your context. And for some of these it might be "here is page 38 of this document," or "here is the second paragraph of page 38 of this document." And basically we're telling the model: use this to get your answer. And so when it comes up with your answer, it'll actually provide references back to which documents were in the context.
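A rough sketch of how that context block might be assembled, with source labels so the model can cite them; the document names here are made up:

    # Top search hits, as (citation, text) pairs from the semantic search.
    hits = [
        ("NASA-TR-1234, page 38", "Paragraph text from the first matching document..."),
        ("NASA-TR-5678, page 2",  "Paragraph text from the second matching document..."),
    ]

    # Label each chunk so the answer can reference where it came from.
    context = "\n\n".join(f"[{source}]\n{text}" for source, text in hits)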
And, I mean, ChatGPT does this, and most of these chat applications do at this point. They didn't used to, but now they'll actually cite references on things. There's one of these that's quite good, called Perplexity.ai, that actually provides citations and such; you can click on one and it'll take you to where the document is.
I can hear the fan, so I know it's doing stuff. In the meantime, we can actually switch over. Yes, I brought my charging cord; I was running something at a meetup a while back and basically drained my battery mid-presentation. And we're running locally, so you see how long it took to actually just get an answer to a fairly easy question. But this is also not the kind of horsepower you would typically use. The main point is: this is a four-or-five-year-old laptop, and I just ran a model that I would not even have been able to run two years ago. So the cost, the barrier to entry, for a lot of this kind of work is extremely low. Anyway, here's the answer: "Based on the information provided in blog post page one, the rules are passed to a transformer model. Specifically, we use the Hugging Face client to wrap the transformer model and pass the rules."
So it was able to take this question, with this context, and actually give me an answer. That's basically the stuff we covered last week. What I wanted to do this time: there's a library provided by OpenAI that is slightly different.
Okay, so we’re moving.
Okay, here we go.
So you give it a system prompt, which is basically close to what we had in this case. If you look up here, we actually had to put some special characters and tokens and things into this prompt. And this particular model is smart enough to know: okay, this is an instruction, and this piece is my system prompt, and then that's done, and then this part is context. And so you have to format it exactly right for this thing to work.
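For Llama 2 chat models, that format looks roughly like this sketch; the [INST] and <<SYS>> markers are the special tokens the model was trained on, and the strings are placeholders:

    system_prompt = "You are a helpful, respectful and honest assistant."
    question = "What library was used to wrap the transformer model?"

    # Llama 2's expected chat layout; get it wrong and answers degrade.
    prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{question} [/INST]"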
With the OpenAI API, which I have a hard time saying because it's got both "AI" and "API" in it, I build my system prompt here, and then I can basically set up this structure called messages, where I can say role number one is "system," and here's my context; role number two is "user," and here is the question.
And then, if you want to build more of a chat application, what happens is: you generate a question, you get the answer back, and then you ask another question. You actually take the answer to question one and tack it on here as an "assistant" message, and then you add your second question as the next "user" message. You keep adding pieces for each exchange, and that gives it the full history; it's like it's remembering what we talked about in the conversation.
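A minimal sketch of that messages structure and how the history accumulates (the content strings are placeholders):

    messages = [
        {"role": "system", "content": "You are a helpful assistant. Context: ..."},
        {"role": "user", "content": "First question..."},
    ]
    # After each reply comes back, append it so the model keeps the conversation:
    messages.append({"role": "assistant", "content": "Answer to the first question..."})
    messages.append({"role": "user", "content": "Follow-up question..."})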
That's unlike my own chat buffer of about 15 seconds, where I can repeat anything that's been in my head; you go past that and I'm lost. But that's kind of how this piece works. One of the differences, and I'll show you this part, is this OpenAI key right here.
This is actually going to point back to what we had dropped into this file. So I drop it in here, and the reason I'm putting it here is this .gitignore file: if you're using git for source control, you can make a big list of files that never get checked in. You know, we do this a lot for, let's say, data files that I'm going to produce; I don't want to check them in, I'm just going to throw them away and produce them again, or it might be a large file. Or, in this case, I have sensitive information: do not share this. So we keep the key in a dot file that's on that list.
And now I run it the first time and it will fail, because I haven't sourced that particular file. You want to pass in your... wait a minute. I'm in the wrong one; that's my local one. Okay, there we go. So I pass my key in here, and then, remember the projects we were going through earlier, I can also tell it which project this key goes with. And then I'm actually using their API, the chat completions, and saying: use this model, which, again, is the one we had to go select to be able to use. And I'm saying: pass it this set of messages.
And then I'm printing the content. They also have a streaming option, which is nice, because I've had to build that by hand before, and doing it by hand is not necessarily easy. Because what this does is, as the model is building the text, it's passing it back to me, so there's a lot of network-connection stuff it's doing. Anyway, they make it a method; you just call it, and there's that.
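Roughly, the streaming call looks like this with the current openai package; the messages list here is a placeholder:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hello."}],  # placeholder
        stream=True,  # tokens arrive as the model generates them
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")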
Let me run this. It failed at first because it doesn't find the key, so I can actually tell it where my key is and try to run it again. And you see how fast that was; it's already done: "rules are passed to the transformer model using Hugging Face." And then my problem is that both the streaming option and the regular path are too fast to really tell the difference between streaming and not streaming. So for this one, let's do something like... we'll just change this to "what is the difference between apples and oranges?" It shouldn't answer this question. Push the magic button... there you go. See: it doesn't have an answer to that based on the information provided in the documents, which is pretty much what we wanted. The streaming option gave us a slightly different answer, and that's because, with these models, there is some randomness built in. You can actually control it: you can get the same answer every time if you pass in what's called a random seed, so everything is seeded based on that. A lot of times you can see it in ChatGPT by asking the same thing a couple of times; the answer will shift.
So the point of this presentation, actually, is that I'm using the exact same code. I'm importing the same OpenAI library, giving it the same system prompt, same questions. The difference is that the API key no longer matters, and I'm setting my base URL to be this actual machine. So let me flip back over here, and I'm going to actually launch this same model locally. So this is the same model we were working with before, but now it's just running as a service by itself.
And I can actually... this will work; let me show you the docs. So it will actually show me which hooks I can connect to, and you can actually play around and test it out without any code at all. You know, you can actually look at this and do some stuff.
So this is the service I'm going to hit. So at this point, instead of reaching out across the internet, I'm reaching to this machine at this particular port. I'm going to tell it to use this model, which is the one I just loaded.
There we go.
I might be taxing what this laptop can do, and I'm passing the same messages. So this is exactly the same code that we were talking to OpenAI with, with the exception of now pointing my base URL at this machine.
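For the curious, here's a sketch of that whole local swap, assuming llama-cpp-python's bundled server; the model path and name are placeholders:

    # Started separately with something like:
    #   python -m llama_cpp.server --model ./models/llama-2-13b-chat.Q4_K_M.gguf
    # which serves an OpenAI-compatible API on localhost (port 8000 by default).
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="not-needed-locally",  # the client requires one; the server ignores it
    )
    response = client.chat.completions.create(
        model="llama-2-13b-chat",  # whatever name the local server reports
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)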
So we can run it.
It’s going to take a hot minute.
Let me actually bring up Task Manager so you can see this thing... let's see it go. So I don't currently have it using much of the GPU; I've got it using a small bit of the one GPU that's on here. One of the things with llama-cpp: if you don't have a GPU, it runs it all on the CPU. If you do happen to have a GPU, you can tell it how many layers of the model you want to run on the GPU, and the rest runs on the CPU. And it will use as much as you can give it.
Do you have that running in a browser?
It's on localhost. Localhost, right; it's only local... where does it go? The service that we spun up is hosted at localhost, so it actually provides this particular view. And this is normal for some of the apps, or some of the services, you might connect with at that kind of address: a lot of them will provide some kind of documentation in this manner, where you can actually go test some things out, you know, do some fun stuff. I was in the back end so much on my job that it was a long time before I ever actually saw a front end at an address like this.
That was actually quicker than the last one. Okay, this was the 7B-parameter model, and I thought it was the 13B we'd be trying; so that's why it went a little quicker.
So here we try again; this is basically showing that, using the same API, I can be local or I can switch and run against OpenAI itself. And I have seen that you will get into some issues occasionally. Take this first "completions" piece: the previous version of OpenAI's API had a completions call you could use, and that has now been deprecated, so it gets confusing sometimes trying to connect this stuff up. But so far the embeddings part works; I've been able to use that. The chat completions part works. And then there's the models piece, which is pretty easy at this point: it gives you the list of models that are available. In my case, it returns one model, because that's all I spun it up with.
There are some extras in here that I haven't played around with in the API; they're not part of OpenAI's API, but the server has these additional interfaces, similar to what we had done back in the ask-question part. There is a way to actually figure out what my input length is, because most models have a specific context window: they have a limit to how much you can give them. So the general approach is to get as much context in as you can. I'm going to run a query and say, give me the top 100 documents that match, and I'm going to start adding them one by one until I get close to the context length, and then I'm going to stop and ship it.
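A sketch of that packing loop, reusing tiktoken from earlier for the counting; the documents and budget numbers are made up:

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    top_documents = ["best matching chunk...", "next best chunk...", "and so on..."]
    budget = 4096 - 512  # leave room for the question and the answer

    context_parts, used = [], 0
    for doc in top_documents:  # best match first
        cost = len(enc.encode(doc))
        if used + cost > budget:
            break  # stop before we blow past the context window
        context_parts.append(doc)
        used += cost
    context = "\n\n".join(context_parts)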
But that gives you that piece. So that’s what we’ve got for tonight.
Any questions so far?
Then we'll shut it down and open it up for any other discussion.
That was great. Really interesting.
Ben, Josh or Josh?
Nothing here.
I think I've got sessions planned coming up. In two weeks, it will be the Wednesday after next, we're going to go back and do a social. The first time I did that was when we had just opened back up, and I wasn't sure it was going to go over well; a lot of our folks are introverts. I think we had 25 people, so I think it went great. I'm definitely not a party planner.
But we’ll do that again.
And we'll just kind of hang out a little bit, get to know one another better. After that, June the 12th, we've got Andrew coming to talk about law and AI.
This is one I was talking to Tyler about, and maybe we'll share it with some of the other groups as well. They also deal in data sets and things like that: how do you license a data set, what happens when you use it, what information can you use or can't you use; we'll get into that kind of thing. I've also scheduled a session to cover prompts: what the different kinds of prompts are that you would use, and RAG. I've seen things get as fancy as using one model to generate the prompts that get fed to another model, or back to itself.
You can actually ask ChatGPT the best way to use ChatGPT, you know; it'll give you its best judgment. So that's coming up; be on the lookout for that.
And I guess that’s it.
Let me go stop the recording first. That always takes a little bit of time to figure out how to do; it's usually under the other button.