RAG Prompt Engineering

Transcription provided by Huntsville AI Transcribe

Hey, we’re recording.

Here.

All right. So if we do drop, I'll try to notice, and then stop and get back on it. So what we're talking about tonight is prompt engineering for RAG, which is retrieval augmented generation. We've been at this for a minute.

We've been going through a series over the last several months based on a big trove of NASA documentation. We actually did a, what's it called, a hackathon a couple of years ago, the NASA Space Apps challenge. We put this in and we did some cool stuff. We came in second, which I'm not mad about. I think the winning one was pretty cool.

They did a neat little actual physical game that you play to teach kids about how things happen. It was really neat. Anyway, at the time, we also built our own vector database. We didn't know that would be a thing.

It wasn't much, really just a big in-memory thing doing a cosine similarity search. So we were doing semantic search, which was new to us at the time. But that was before ChatGPT was really a thing; that was late 2022, and ChatGPT came out right around then. You know, it was out there, but people hadn't really figured out how big of a thing it was going to be. So now we're kind of looking back at what we submitted there.
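As a rough sketch, an in-memory cosine similarity search like the one described can be only a few lines of Python. This assumes the sentence-transformers library and a stock MiniLM model; the original hackathon code isn't shown here, so treat this as illustrative:

    # Minimal in-memory semantic search via cosine similarity (illustrative).
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "Saturn V first stage propulsion overview",
        "Apollo guidance computer error codes",
        "Thermal protection tiles for reentry vehicles",
    ]
    # Normalized embeddings make the dot product equal cosine similarity.
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    def search(query, k=2):
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q
        top = np.argsort(scores)[::-1][:k]
        return [(docs[i], float(scores[i])) for i in top]

    print(search("rocket engines"))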

And that's in the link on the invite, if you got the email. Since this new RAG approach has come out, we figured that would be a much better way than just doing semantic search, and it gets at the whole, like, 10,000 technical documents going back to the 50s. The concept that we actually put in the challenge was: NASA's great at exploring space, so let's take that same idea and explore documentation, explore the knowledge that they have. So thinking about it in terms of, if I'm looking for something but I'm not sure exactly what, how do I get enough information to take the next step, and the next step, and the next step? Of course, the whole chat approach is really, really helpful there. And then with RAG, you get the ability to have it grounded in the context that you give it instead of hallucinating whatever answer it thinks you want to hear. You can imagine how much space travel documentation all of these LLMs were trained on. You could probably ask it things that astronauts wrote, and, oh, it may not be factual. But you can imagine how interesting that would be.

So we're looking at using that as a basis. We've been going through, I think this is session number eight in the series, kind of using that as a basis for stepping through: okay, what if you wanted to build your own RAG? In my case, I have a need to host something like this on an internal network, not connected to the internet, ever.

And that is, you know, a real constraint. Of course, there are a lot of ways to do this. OpenAI provides APIs and GUIs, and I would actually use those like crazy if I'm working on something that can be commercial and I'm not worried about data privacy and things.

But some of the things that I have to worry about from a classification perspective, other people have to worry about from a data protection perspective. You've got some companies that don't want their data to be anywhere near anybody else, and if you're a small business working in this space, you've probably run into that. Talking with Andrew, who came and talked to us a while back about law and AI and all that, they have similar constraints: if they're working with a client, the client doesn't want their protected data out there, especially if you're involved in a lawsuit and you know that people are actively trying to get to your data. So in the same kind of manner, you may need to find a way to host something like this on an internal system. Anyway, that's what drove us to learn how to build one of these ourselves. Yeah. So, talking about hosting something like this on your own internal system, what kind of requirements are you talking about in order to run, like, a Llama 70B? A Llama 70B? That's a tad overkill.

That's probably overkill for a lot of what you're doing. What I'm going to show towards the end of this, assuming I've still got it working, is actually a Llama 7B, running on something like a laptop.

We're going to run it on this, showing that you can do a lot of things. It's similar to a talk I gave last year about how to take big models and fine-tune them and then use the fine-tuned version, or take smaller models and fine-tune those, and get the same amount of goodness out of that. A lot of times the big models have to know all the things, right? All the time. I think the expression was, it has to know everything from particle physics to the names of Shakespeare's plays, and your client or your questions probably don't range quite that far.

So you could fine-tune one and run it that way, but then the whole RAG approach came out, where now I can actually take a dataset, do a query on the dataset to get a smaller context, and actually base my answers on that context.

So at that point, your model just has to be good enough to do some amount of instruction following; it's got to know enough language to know what it is you're asking it. Because you're actually, and we'll get into this a little here, you're actually asking it to do things that, if you're a programmer, might feel a little familiar, which is interesting as well.

But Josh, do you know what it would take to run 70B?

Yeah, I would do 96 gigabytes, because that would give you 70B quantized to something like Q5_K_S, which would be fast enough to not be painful. You can get that with a Mac. It's a really good way of getting there.

You can do like a Mac Pro, not that.

You can do it with something like this, and they run where you don't have to have all the GPUs going and stuff like that. So the footprint is much smaller than if you were to try to run something at OpenAI's scale.

One of the other things we went through a couple of weeks ago is that some of the libraries we're using, one in particular called llama-cpp-python, provide the same interface that OpenAI provides for their chat completions API.

So if I have something I can deploy publicly, I just use my key and go to ChatGPT or OpenAI. If I have to deploy it internally, I can spin up a local version of the server, pick whether I run a Llama 2 70B or 13B or whatever it is I want to load up, and point my client over to that server instead.

And my code on the query side is the same.

I’m using the same API, getting the same results, the same structures, all that kind of stuff.
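A minimal sketch of that swap, assuming a llama-cpp-python server started with something like python -m llama_cpp.server --model ./model.gguf (the path and port here are placeholders):

    # Same OpenAI client code, pointed at a local llama-cpp-python server.
    from openai import OpenAI

    # For the hosted service you would omit base_url and use a real API key.
    client = OpenAI(
        base_url="http://localhost:8000/v1",  # local server instead of OpenAI
        api_key="not-needed-locally",
    )

    resp = client.chat.completions.create(
        model="local-model",  # many local servers ignore this name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello."},
        ],
    )
    print(resp.choices[0].message.content)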

So it makes it a whole lot easier to just try a bunch of different things. Is the context window problematic for RAG at all? Is it fine if it's small?

There's almost no case where you actually need the really big context windows for RAG, because you should be solving that on the chunking side.

Because the thing about RAG is it's taking these little tiny chunks and trying to find something semantically similar through the embedding.

If your chunk is really long, there's no meaningful embedding for it. It's always better to break it down. Now, summarization tasks, where you need to be able to paste in a lot of stuff, yeah, that's where the really long contexts come in. That's a different problem. So, all right.
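For reference, solving it on the chunking side can be as simple as a fixed-size splitter with overlap; the sizes here are arbitrary examples, and real pipelines often split on sentences or paragraphs instead:

    # Naive fixed-size chunking with overlap (illustrative sizes).
    def chunk_text(text, chunk_size=200, overlap=40):
        words = text.split()
        step = chunk_size - overlap
        chunks = []
        for start in range(0, len(words), step):
            chunk = " ".join(words[start:start + chunk_size])
            if chunk:
                chunks.append(chunk)
        return chunks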

So, part of this is from the intro, from the email I sent out: sometimes getting these models to do what you want to do can get a little interesting.

If you're trying to get a model to do the same thing everybody else is doing, it's usually not that bad, because you can just go to ChatGPT and ask it, how do I prompt you to do this?

That's actually where some of the prompts I have in here today came from.

There's really not a lot of what I would call solid guidance in some cases.

I mean, I actually had to look pretty hard to find things. There are OpenAI cookbooks that I've got linked in.

There's also the Prompt Engineering Guide website, which has a GitHub repo with examples for a bunch of different things, everything from chaining to RAG to a bunch of other stuff.

That one's pretty good; we'll show one of those in a minute. I've been having to look all over the place trying to find, you know, if I'm just trying to do a RAG prompt, what is the base prompt?

And we've got several of them shown below that we'll cover.

The other thing I wanted to mention: remember a couple of AI years ago, when we pulled in an instruct model and used that to do instruction following?

And we had a chat model and all this kind of stuff.

It seems like we've dropped that a pretty good bit, and most folks are just using a prompt going into the big model, because it does all the things they need it to do.

I think the models have gotten capable enough that they can actually follow just basic English instructions well enough.

Because with the instruct models, you had to wrap your prompt in certain tags, you had to do this kind of thing, and that's how it knew.

They were basically fine-tuned, using either human feedback or something else, to follow what you're telling them more closely. We don't necessarily need as much of that anymore. But we'll get there in a second.

One other thing, I think I’ve already got this up somewhere. There it is. This is one I wanted to walk through real quick.

This one actually uses one of the instruct models.

So you'll see there in the context that I've got an [INST] tag for instruct.

I’ve got a close tag for closing it out. You know, some things like this.

We don't normally have to do this anymore.

I've even done this with the tags taken out, still giving it to the instruct models, and it's been able to do it okay.
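For reference, the tags being discussed look roughly like the Llama 2 chat template; this string is a sketch of that documented format, not the exact prompt from the session:

    # Llama 2 style instruct template with [INST] and <<SYS>> tags.
    system = "Answer using only the provided context."
    user = "What powered the Saturn V first stage?"

    prompt = (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )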

This is actually from the prompt engineering guide that I linked in there.

There’s several interesting things in here that are pretty good starting points.

If you're trying to figure something out and have never done it before, it might be a good place to look and see.

That's the thing I need to do next: load this repo up and see how many of these notebooks actually still work. That's the other thing where an AI year gets you. It's like, hey, this is the notebook I worked on a week or two ago, but somebody's gone in and changed the model, or their interface, whatever, and my stuff doesn't work.

How much does that disrupt your RAG workflow?

Like, you have it working, and then a month or a quarter later, the results you get are kind of garbage.

Does that happen? Some of that, yeah, can happen. That's why a lot of times we try to lock things in; and a couple of things, like OpenAI, aren't going to change that often.

The model underneath it.

That’s what I was concerned about.

Like does the model underneath it changing greatly impact your results?

Like, if you spend time working against a model and then they go, here's our latest and greatest, and now I'm not getting the results I wanted anymore.

Is that a concern?

It depends. For RAG, it depends on what your chain is, what your prompt is.

So if you're doing multi-chain RAG, where you're generating some intermediate steps, that would affect it a lot.

But I would highly doubt there would be a case where the RAG would get worse from a new model, because newer models usually clear the instruction-following bar. It's not common.

One of the other things I've done, and it's a little easier to do with the local llama approach with the library: with ChatGPT or OpenAI, I get to use GPT-4o or GPT-4 or GPT-3.5, and that's it.

One of the things I've been playing around with is taking the same prompts and using them against Mistral, or using them against the Llama models, and seeing whether the results I get are fairly consistent between the models.
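A sketch of that consistency check with llama-cpp-python; the GGUF file names are placeholders for whatever models you have downloaded:

    # Run one prompt across several local models and compare the answers.
    from llama_cpp import Llama

    prompt = ("Context: The Eiffel Tower is about 330 meters tall.\n"
              "Question: How tall is the Eiffel Tower?")

    for path in ["./mistral-7b-instruct.Q5_K_S.gguf",
                 "./llama-2-7b-chat.Q5_K_S.gguf"]:
        llm = Llama(model_path=path, n_ctx=2048, verbose=False)
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": prompt}],
            max_tokens=64,
        )
        print(path, "->", out["choices"][0]["message"]["content"].strip())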

If I run into a case where a specific prompt just goes nuts on one model, I may have something weird that I need to look at. I might be on the edge of performance of some model somewhere, which is not where I want to be.

In some cases it's great to be cutting edge. In terms of AI years, cutting edge is about three days, you know.

So it’s not an easy place to stay.

This one is basically loading up... Again, I'm going to blow through a lot of this without explaining much, because the main piece I want to get to is at the end, where they actually talk about the prompt. It goes through just a basic completion with, like, "Hello, my name is," and then it basically spins some words out: "I am your" blank blank, "I'm writing to," "I believe that," something like that. Same kind of thing with a title.

Can you tell me two jokes?

Here, two jokes for you.

You know, this is just some of the basic stuff to play around with, you know, just for fun.

And then here's where it gets into actual structure: here's some data I need in this format. Which, Robert, I think you sent me something a little similar to this from an automation of yours.

I didn't want to share that directly because I didn't know if any of it was proprietary or super secret sauce or anything. But some of these instructions... back in the day, we'd have some kind of form template with substitution tokens. If you've ever gotten an email from MailChimp, that's what it's doing: hello, [first name] [last name], welcome to... The interesting thing is that models actually take that a step further, where the output actually flows a lot better from a language perspective.

So that's one way. It's not necessarily RAG, but it is basically instructing an LLM to do what you want it to do.

So in this case, what they’re trying to do is generate short paper titles based on some types of data that they’ve loaded in.

So this is not necessarily using the semantic search part that we normally do for RAG.

This is basically: I have a dataset, and based on the dataset, I want to generate something else. So in this case, it's pulling in some kind of documents with a title, a description, an abstract, things like that.

And it's asking the model to... actually, this is just spinning out a bunch of stuff. Well, actually, maybe it's doing embeddings. No, it is. Okay. And then doing some kind of a search to get things back. A little confusing, but let's see.

So it’s making a collection.

So it’s putting all that into a vector store somewhere.

It looks like in memory.

It looks like in memory.

Oh, I know what it’s doing. Okay. So it’s taking a bunch of existing papers with their titles and abstracts and put it into the vector store.

And then, as far as the user goes: I have a title for my document, but I want to know some alternatives.

So based on all of the documents that you’ve seen before, suggest some alternatives to these based on the one I gave you.

So, you were talking about embeddings a little further up.

All you’re talking about there is every one of those documents with its abstract is essentially becoming an embedding.

Yes, though.

I’d say probably in this case, yes.

Sometimes people put things all together in one embedding.

Sometimes people drop things out. Or, is that what they're doing here? Oh, it's replacing it when there's no title. And then metadata.

It looks like the document is just the title, and then, since this is using ChromaDB, the abstract is actually metadata.

Right.

To that effect. So that after you get the query results back, you also get a little more data with them.

I've seen some setups where they'll just put everything into the embedding as well; the easy approach is in there somewhere.
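A sketch of that layout in ChromaDB, with the title as the embedded document and the abstract carried along as metadata (the collection name and fields here are made up for illustration):

    # Titles are embedded; abstracts ride along as metadata.
    import chromadb

    client = chromadb.Client()  # in-memory
    papers = client.create_collection("papers")

    papers.add(
        ids=["p1", "p2"],
        documents=[
            "Attention Is All You Need",
            "Language Models are Few-Shot Learners",
        ],
        metadatas=[
            {"abstract": "We propose the Transformer architecture..."},
            {"abstract": "We train a large autoregressive language model..."},
        ],
    )

    hits = papers.query(query_texts=["transformer architectures"], n_results=2)
    for title, meta in zip(hits["documents"][0], hits["metadatas"][0]):
        print(title, "|", meta["abstract"][:40])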

So in this case, here's the title it's trying to work from, and it uses a prompt to say, okay, I want to get some other titles that you might recommend to me.

And so this is one of the interesting parts that you’ll see in a minute.

The instruction is your main task is to generate five suggested titles based for the paper title.

And then you should mimic a similar style and length.

But do not include any of the short titles in the suggested titles; only generate versions of the paper.

I'm guessing that was supposed to say "title." That's going to be fun. The other thing is that a lot of times, the larger the LLM you're working with, the more forgiving it is when you misspell something. It's smart enough to know, hey, that's probably what they meant, and it'll actually do the right thing. Then they take the user query and shove it in here for the paper title, they take the short titles and give it the list of titles that came back from the query, and then it spits out the output. That gives you the actual suggestions, the five things you might name it. And this is basically what the actual template looked like when it was done.

Some of the interesting things I wanted to point out in this particular prompt: the concept of keyword substitution.

So in my actual system prompt, which is the part where you're telling the model what you want it to do, you can actually put specific keywords that then show up in particular places in the question. A lot of times we think of it as just a chat where a person is adding information.

But there are a lot of places where RAG is used as an intermediate step, where the pieces being fed in are coming from some other system.

So if you did need to insert certain markers to specify which part is the question or which part is the context coming from the user, you could actually reference those in your system prompt and do replacements or conditions on them.

So this is actually saying, hey, you’re going to see a list of things in the short titles.

You’re going to see a thing that’s called a paper title.

We’ll use it this way. So the substitution part is pretty interesting.

The other fun thing, which you can kind of deduce from some of this: you can have it do some kind of control logic.

You can say: for each suggestion in your answer, append this thing to the end.

So not only am I telling this thing to do something and then replace words, I’m telling it for each thing in the list, do it this way.

You can say, if this has that, then... I mean, I'm basically taking control structures that used to be in a program we would have to write, and using just basic English to tell the LLM to do the thing. Which kind of blows my mind, to some extent. We're almost to the point where anybody can program.
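Putting the two ideas together, keyword substitution plus per-item control logic written in plain English might look like this; the bracketed names and wording are illustrative, not the exact prompt from the guide:

    # Keyword substitution plus "for each" control logic in the prompt.
    template = (
        "Your main task is to generate 5 suggested titles for the [PAPER_TITLE]. "
        "Mimic a similar style and length. "
        "Do not include any of the [SHORT_TITLES] in your suggestions. "
        "For each suggestion in your answer, append ' (suggested)' to the end."
    )

    paper_title = "Prompt Engineering for Retrieval Augmented Generation"
    short_titles = ["RAG Prompts 101", "Better RAG Prompting"]

    prompt = (template
              .replace("[PAPER_TITLE]", paper_title)
              .replace("[SHORT_TITLES]", "; ".join(short_titles)))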

Until it just doesn't do one of the steps. Yeah. I think it was Copilot that did that to me.

Claude seemed to work better. I was trying to do something, and ChatGPT just gave me the same thing over and over. I'm like, I'm through with you, I'm going over here. And Claude figured it out. I think Claude's a lot better at instruction following in the new version.

I think it's the same kind of thing. It gets weird; you can back yourself into something you find works really, really well with one model. And as long as you're on that one, great.

And then you switch, and it falls apart a little. I think for me, it was more that I had been going back and forth in the same chat.

And I think it just started to get lost. And this was 4o, yeah.

Yeah.

4o is designed... they made it very, very fast because they're doing it with voice, for conversation. So it has less ability to, like, attend and change its mind. You say, hey, that's wrong, and it's like, yeah, you're right, it's wrong, and there's the same wrong answer again.

That’s right, man.

And that's exactly what happened. If I had started a new chat, it probably would have been fine. Yeah, regular 4 doesn't have that problem. But it's almost like, how long has this one been going?

Let me find my other piece that I had.

Okay.

So we talked about some of that control logic.

So I've been playing around with different models, mostly with ChatGPT, you know, OpenAI, trying to take some basic RAG-type prompts, dropping them in, and doing a one-shot: okay, what are you going to do? Then change the instruction around a little bit, run it again: what are you going to do? In general, I'm seeing mostly the same results from that; I really haven't seen a lot of impact.

However, when I did try to run one of these on a Llama 7B, I ran into something where the prompt I've got... actually, let me skip that and come back to it.

No, wait, here we go.

Yeah, we'll get there. I didn't realize this was going to render this way. In trying to come up with my basic RAG prompt, I came across a lot of different descriptions of "do this" and "we did that." We've talked about a couple of these things before. Basically it's: you are a helpful, respectful, and honest assistant. Always answer as helpfully as possible, which probably just doubles up the "helpful"; I probably don't need that statement. Answers should not include any of these things, which, again, I think we talked about at one point: ethics depends on culture a lot of the time. So some of that comes back to alignment, as far as, when I ask a model to be helpful, what does it think I mean?

Anyway, that was copied from somewhere else.

And also, illegal according to which country? You know, and there's the "socially unbiased and positive in nature" bit. Anyway, I'd probably want to drop that, but this was one of the ones I did get, from either ChatGPT or another model, as a good prompt to use. This part you normally see: if a question does not make sense or is not factually coherent, explain why instead of answering something that's not correct. If you don't know the answer, don't share false information. Of course, you should probably say something more along the lines of: if you don't know the answer, reply with "I don't know the answer." For some of these things, being more precise is better. I see Josh cringing some.

Yeah, this was obviously generated by something trying to imitate other prompts it had seen. I've also seen a lot of cases where you actually ask it to be succinct, because if you're going across OpenAI, you're paying per token, and I don't need a multi-paragraph answer for something that could be one paragraph. So this one is telling it: you're provided several documents with titles; if the answer comes from different documents, mention that, hey, I found this in at least two places, rather than just picking one. Let's see... if you can't answer, state that you do not have an answer. One of the interesting things that came up when we were playing around with the NASA data: NASA does things that are basically controlled explosions in a particular direction to get us off the earth. That might read as "harmful." Some of these models have had guardrails that make them refuse those kinds of questions.

So in that case, I might want to change my prompt.

There’s all kinds of things there. Here’s another one I found.

This one, I think, came from Perplexity AI when I was asking it how to do these things.

Which, again, if you haven't played with Perplexity AI, it is now one of my top things on my toolbar. Pro tip: it's free, ask it stuff. It keeps track of what they call a library of all the things I've asked. There's also a phone app; I use the Android app a lot.

This pretty much replaced Google for me in most cases. Me too.

So, even for locations. Anyway, I was asking: how is it free? How do they make their money, a subscription? Oh, good. And there's that thing where you start typing something and you can turn Pro on. And it's good. It is.

Oh, apparently I can try it for free.

Yeah, you get a few free Pro ones a day. Okay. Especially if you're doing research: if you go into focus mode, you can tell it, okay, I just want to search academic papers for this particular topic. Yeah, like filter the sources. And it does all of that. Yeah, it's a very solid product.

So, here's one from my library that I... oh, it can't view this. Why? It's one of the ones I wanted to show here.

No, hold on.

Is it lost?

Let’s see.

Is there a limit to what kind of language you can use?

I know some of the words you're using there; I wouldn't expect a computer to know "succinctly." Yeah. Oh, yeah.

If it’s a word, he knows about it.

Pretty much. Yeah. And it can probably use it in a sentence. And you could probably spell it wrong and it wouldn't care; it's heard it that many times.

I don't even bother to correct typos when I'm typing into ChatGPT. Yeah, close enough.

So, one of the things I like about it is I can ask it to say things in a specific way. I mean, this is where I'll get along.

As the AI guy, I don't just crank a bunch of stuff out by myself; I use some of these tools to help me. But I don't write the newsletter with an AI, because it doesn't sound like me.

You can get it to sound like you. You can upload your stuff and say, make it sound like this guy. Oh, yeah.

If you fed it all your newsletters, that's a RAG technique; we're headed there. Have you ever asked it to write in somebody's style? I have a friend who's widely published, and I asked it to write in his style, and it sounded like him. I was like, I didn't even know you'd be able to find this person to mimic. One of the things, I mean, we talked about a while back... yeah, I'm going there. I can't remember.

I think it was llama.cpp we were playing with initially. I basically gave it a prompt of: okay, you are President Trump. Tell me about your wall, in the style of William Shakespeare. So the question was something like, how long is the preparation of the wall going to take? And it wrote a sonnet.

I couldn't think of the name of the form, the sonnet, at the time, but it did it.

And then we said, okay, now do it in the style of Dr. Seuss, and it went full Fox in Socks with it, you know: the wall, how great it is.

It was hilarious; yeah, we all thought it was awesome. It was pretty fun. As far as words go, imagine these models, especially the big ones, have been trained on just about every piece of written material that they could get to. The way I think about it, and I may not be doing it justice, is it's almost like you meet a stranger.

So you keep asking the stranger questions so that you can understand who he is, what he does, what his personality is. Those are the kinds of queries you're asking this thing all the time, right? In a way.

In this case, imagine you had a friend, or a stranger, that had read the Encyclopedia Britannica and could remember all of it, and had read every publication from the New York Times since the beginning of the New York Times and held that in memory.

And read Wikipedia, all of it, and held that in memory. And read the transcript of every movie produced and had that in memory. Every song, everything. That's the amount of material; I think it's mind-blowing. So it will tell you things. The other fun thing we got into, going back to the NASA documentation: it goes back to the 50s. There were things we did in 1950, 1960 that we do not do anymore. The model doesn't know that. In order to get some of that in, if I add the years or timestamps to some of these things, it may be able to figure that out.

I could probably, in my prompts, tell it: place more emphasis on more recent material in your answers.

So if I ask a question and it finds the answer in a document from 1998 and a document from 1967, I could probably have it prefer to pull from the more recent document. I haven't tried it, but that's the kind of thing you can do. I heard the Explorer 1 team, a team working with a spacecraft from that era, went back to the old manuals. They had language and codes they no longer use, and they had to re-educate themselves on how to talk to it. They figured it out, went through some fault analysis, found what was broken, told it to do something else, and moved things over to another part of the system. If they'd had a system like this, they could have done it much faster. I was going to say, if they had all the source code and all that kind of stuff, you could say: here's the source code for this other language.

Here’s the source code for language number one.

I have this command in language number one; can you translate it to language number two for me?

I don't know if it would compile right off the bat, but it'll get you close.
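One untested way to express that recency preference, as floated above: tag each context chunk with its year and say what the tags mean. This is entirely illustrative:

    # Label chunks with years so the prompt can prefer recent material.
    chunks = [
        (1967, "Stage separation is commanded from the ground station."),
        (1998, "Stage separation is triggered by the onboard flight computer."),
    ]

    context = "\n".join(f"[{year}] {text}" for year, text in chunks)
    prompt = (
        "Each context line starts with its publication year in brackets.\n"
        "When sources conflict, place more emphasis on more recent material.\n\n"
        f"Context:\n{context}\n\n"
        "Question: What triggers stage separation?"
    )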

The other thing I like about this: you can see where I've been going back and forth. Can you give me an introductory paragraph? And it'll actually use this: "In the evolving landscape of natural language processing..." You can pick AI-generated text out a mile away sometimes.

If you're reading this stuff a lot, it almost sounds like a movie trailer. "I hope this is helpful." It's gotten to where it's customary to put instructions in to strip all that stuff out. Right. Yeah. And you just hit the point we'll get to in a minute, so hang on a little bit. But what I get at the end of this is similar to what I was aiming for with the NASA thing: I asked a question, and based on the question I asked and the answer it gave, it gave me, here are some other things you may want to look into. If this were the Google approach, it would just be a big "people also asked" list.

In this case, it’s probably actually looking at some kind of a knowledge graph behind the scenes and saying, if I can follow this in particular directions, these are the kinds of things you might want to ask.

I don't know, either way.

So the Perplexity CEO has actually done a really long Lex Fridman interview. It's three hours long, but he goes into all the components of Perplexity, all the stuff like that.

He lays a lot of stuff out there. He just kind of says, this is how this works. And it is absolutely a knowledge graph.

That’s what they started out with.

But they actually do this thing on the front end too.

So that’s how they get their search results.

So they basically split it out. They ask a whole bunch of questions about the query; that's an intermediate step.

Then they use that to go out and search all the different sources; that's why they get so many. They pull it back and pop that on the end, because they want you to keep asking questions. Oh yeah, I've spent hours in Perplexity AI just going, oh, I didn't know that. And then, I didn't know that either. It's really good.

Well, my normal approach with Google these days is to put in my question or whatever I'm looking for, scroll past all the sponsored things, scroll past all the YouTube videos and links, and then basically go directly to page two, which is where the real information starts. And sometimes what I'm looking for is on page five or six, because a lot of times it's something that's a little further down in the weeds.

It depends on whether it's something that's commonly searched. If I'm looking for something that's just pop culture or common knowledge, yeah, it's on page one.

A lot of times the top results are generated things. Which, as somebody that also has a website based on AI and stuff, bugs the crap out of me, because it's no longer sending people to my site; it's taking stuff from my site and folding it into its own page, so you have no idea it came from me. And then you've got the SEO side, the pages that make money from getting views and things like that; a lot of that is in need of a fix.

That all changed about five AI years ago. We could actually get into another discussion at some point about SEO in the age of these new AI approaches.

Is there even a way to do that?

Do you write it as a question and answer so an AI picks it up?

CSI.

Okay.

To show up in the search section.

You're trying to get your answer, your truth, into the answers that it generates, basically.

You have to figure out what the question would be that your site answers. Right.

So at the college, we're trying to put that out into our program pages: try to explain what this is, what someone's going to be doing. So, what is aerospace engineering? Aerospace engineering is... this is what you'd do. We were trying to do that for Google, so we'd show up when someone searches what the job looks like. Right. Then they can click through.

Okay.

One of the things I found is that the prompt it gave me says: answer the question using only information provided in the context. Cite sources using bracketed numbers at the end of relevant sentences, like this: [1].

If it's not in the context, state that the information is not provided, and provide a concise, factual answer without additional commentary. A much more succinct prompt than what I had above. It's really good.
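Assembled into code, that citation-style prompt might look like the following, with numbered sources and the context placed ahead of the question; the documents here are placeholders:

    # Citation-style RAG prompt with numbered sources, context above question.
    docs = [
        "The Eiffel Tower was completed in 1889.",
        "The tower is about 330 meters tall.",
    ]
    sources = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))

    prompt = (
        "Answer the question using only information provided in the context. "
        "Cite sources using bracketed numbers at the end of relevant "
        "sentences, like [1]. If the answer is not in the context, state that "
        "the information is not provided. Be concise and factual.\n\n"
        f"Context:\n{sources}\n\n"
        "Question: When was the Eiffel Tower completed, and how tall is it?"
    )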

It also says: do not use phrases like "according to the context" or "in the evolving landscape of." If you see phrases like that typically popping out, you can drop that kind of instruction in. I've also seen people say that negative prompts, negative instructions, don't quite have the same effect.

Negatives are tricky things when you're talking about LLMs. I don't really like them, because a lot of the time all it's doing is predicting what's most likely to come next, and the "do not" can get lost. You know what I mean? Things like that.

You could actually take that and flip it. You could, I don't know, tell it what grade level to answer at, depending on what you're trying to do. If you can take the negative, the "don't do this," and find a positive version of the same instruction, it tends to work better. Maybe I'm not making a lot of sense there, but I don't think I've run into it too much. I mean, with the prior models, yeah, but for a lot of it now, we say, hey, don't do this or never do this.

And if you get specific, or you highlight something that's important to you, it seems to follow that pretty well across the whole model.

Okay, so maybe that's just been my luck, but it doesn't always.

I'd say nine times out of ten, it follows the instructions.

But the other thing that just killed me, and took me a minute: I was running this locally using a Llama model. I put this prompt in, I put my context in, I put my question in. A lot of times I would get the answer I was looking for, but sometimes it would just forget to put the references at the ends of the sentences. If you notice, the prompt says "in the context above," and I was putting my context below the system prompt.

And after I switched those two things, it didn't happen anymore.

It was just kind of interesting. To tell you the truth, the "above" isn't even necessary; you can just say "the context" and it'll know to find the context somewhere. It's one of those little things you may run into. It's interesting: make your prompts as precise as you can, but also realize that sometimes that preciseness may get you somewhere you didn't really intend. The other thing I've seen is the needle-in-a-haystack thing. I don't know if that's a paper about context and where it finds things, or if that's... It's a metric.

Okay. Yeah.

Okay.

So one of the other pieces I linked in here, I actually, I’ll show you where I got it from.

I think if I can find it.

Let me search. Oh yeah, this is where it was. There's a crazy good post on the OpenAI community forum asking: hey, I've been trying to figure out how to write a prompt for RAG, and do these things matter?

And going through, you know, hey, I normally put my instructions at the bottom because I found that to work best.

And then some people say, well, no, I thought it went at the top, because then it's like a person who reads the instructions first and then knows what to do with the thing they're about to read.

And then this guy posted about what you put at the bottom of the prompt. So he actually got into this whole part about recency: the attention problem was mostly in the top 50% of the prompt, where the bottom 50% is the most reliable.

He was also looking across how many tokens you've got in the context; green here is good, basically. For small context lengths, where things sit doesn't really matter as much, according to this graph, which I'm assuming comes from a paper somewhere.

It's when you have a long context, and the thing you want it to really pay attention to is at the top of that long context, that it struggles. It gets things right more often when the important part is towards the bottom of the context.

The other tip he has: if it's a big context window, don't use the whole thing, because you can see the problems start fairly high up in the context length here. And this is GPT-4 with a 128K context.

And it's different for every single model; every single model has a different ability there. He says Claude is way better; it can go almost the whole way, as you'd expect.

Yeah. I don't know that I will ever be way out here as far as context length for the things I'm writing.

But if you were trying to do something where you have massive amounts of information in the context, maybe I want to put in an entire paper instead of a paragraph or whatever. This is where I had to hit the light switch.

Here we go.

Anyway, that's one of the interesting things I found.

One place where I ran into that: I was trying to take, for instance, an old PWS and an EPWS, and I wanted to do a comparison to see what the differences were. And I was having a lot of problems. I had to keep going back and saying, okay, they renumbered it, so don't compare paragraph numbers to paragraph numbers; look for semantic differences. It was just hard, so I had to break it into smaller pieces. And the problem was, the paragraph that was at the top of one might have been at the bottom of the other. It was just hard to get it to do that with two big PDFs.

Yeah. Sometimes you’ll see where one model is used to do the first part of that work, to break something down into smaller pieces or whatever.

And then the output of that is fed into another stage with another model. One thing we haven't gotten into yet that I'm really interested in is actually using a model to build the prompts for another model, automatically. I don't even want to write the prompts; I want to give it some basic instructions and have it go. But some of these models are so good at this point that the basic instructions are about all the model needs.

I’m not quite sure where some of that’s going.
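A sketch of that two-step idea, using the same OpenAI-compatible local endpoint shown earlier; everything here (URL, model name, wording) is illustrative:

    # Step 1: one call drafts a prompt; step 2: that draft drives the answer.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="n/a")

    def ask(messages):
        r = client.chat.completions.create(model="local-model", messages=messages)
        return r.choices[0].message.content

    draft = ask([{"role": "user", "content":
                  "Write a short RAG system prompt that forces citations "
                  "and concise answers."}])

    answer = ask([
        {"role": "system", "content": draft},
        {"role": "user", "content":
         "Context: [1] The F-1 engine powered the Saturn V first stage.\n"
         "Question: What powered the Saturn V first stage?"},
    ])
    print(answer)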

Hey, Jim, I don't have a whole lot more on that; we can run some of these and see what they do and play around with it.

So I did have some examples. And again, if you haven't been before, all of this stuff is already up on GitHub. If you look at hsv.ai, you can go back through; this is 2024, and it goes back to 2018, all the stuff we've done in the past. We usually try to do it in the open in some form.

I don't always have as much info in there, because sometimes it's just a placeholder.

But here was one of the prompts we were looking at earlier, with the helpful assistant, blah blah blah, the warning, and the question, and then a bunch of references. I'm actually going to copy this over. And again, this is something I got by asking for an example of a RAG prompt with at least ten examples in the context and a question at the end.

So now I'm taking stuff I got out of one of them as an example to put into another one and see what happens.

It asks: how is climate change expected to impact global agriculture, and what measures can be taken to mitigate these effects? This then gives you a pretty good synopsis, along with the impacts and measures, along with the actual references after that, things like that. Let's see, what haven't I figured out... there it is, edit.

The other thing I want it to do, at the end of this... whoops, yeah, there it is, okay. I want it to tell me where and what those references are, you know, which sentences they back up. Yeah, it's a pointer. All right, so let's see, we go to edit. You know, thinking about what you said earlier, maybe I should have used 4 instead of 4o, because I was trying to do some stuff with having it do inline citations, and I just could not get it to do what I wanted it to do. You forget how slow 4 is. Holy cow.

It’s very slow.

Yeah, go back to 4 occasionally now, and you'll be watching it for a minute.

You're like, okay, the question I asked is... please... I'm going to go to another tab and come back in a minute. You get used to the fast answers. So, Jake, was this a serious thing about climate change? What I mean is, we have man's impact on the climate, but there's also a broad cyclical thing that goes on with the earth, right?

On long time scales, right? That's right. Bigger than we are. Right, but can you get it to... I mean, does it look at the natural cycles?

It's looking at what it's given. That's where the RAG part comes in: it's only generating answers based on the context that I give it.

If I give it limited information, it's only going to use that information. Which means you could come up with a couple of different grounding approaches that lean extremely one way or the other, just based on the information that you allow it to use as its context. So you could ask that last question and have it say, oh no, man's not causing this.

This is just a part of the natural cycle. How would you begin to investigate?

Right.

So the other prompt I had that I was playing around with, I tried to get it to... let's see. It got closer to putting these in order by date. Didn't quite.

And yeah, that's pretty much what I was trying to cover tonight. I know we're at time. Oh, the other one I had was just a basic Eiffel Tower example.

I can’t remember where I got this one from. Let’s see.

You can check.

I can’t remember. Oh, there we go.

It said, look at the site in your presentation.

So this one is a little more concise as far as what I’m giving it.

And I'm not asking it to do too much. Again, this is just some general thing I found.

So, any other comments or discussion on prompts? It was kind of interesting trying to learn more about prompt engineering specifically for RAG and not coming across a whole lot of good material, or at least good opinionated material.

Everything ends up being "it depends."

But even given that tendency, I didn't get a lot. I would hope there would be some kind of basic, pure-vanilla prompt.

One that just does the thing, and then I would expect to find a bunch of extensions of it, but I haven't really come across that yet.

A lot of that stuff, unfortunately, you can't expect to get from official guidance. It's like Discord, you know, talking to other people that are developing these things online: hey, what worked for you? And for RAG, the reason why it depends so much is because so much depends on how you're chunking your data. That's what most of it is.

And the prompt side is really quite simple, because you really just need something like: "Based on this context: {context}" (throwing all your documents in there) "answer this question: {question}". And then the thing that I've found really helps, which you've done right here already, is citations. That's like 90% of the Perplexity secret prompting sauce, using citations, because those little citation markers, the [1], [2], are so unique, and they always appear in text that's academic and citing things and factual and properly attributing things, that you're more likely to get correct answers. It puts it in the right space, basically.

So the citation thing works really well, especially if you say: only cite sources that are relevant, and then tell me what sources you cited and why you cited them. It's forcing it to reason, spending some cycles reasoning on why this thing is important. So there's stuff like that you can do. Your prompt really is like four sentences, something like that, with some brackets to throw stuff into.

So you don't have to write these big wall-of-text prompts. You just do some little tiny things.
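Something like this, roughly as described; the exact wording here is a guess:

    # A minimal four-sentence RAG prompt template with citation reasoning.
    prompt_template = (
        "Based on this context:\n{context}\n\n"
        "Answer this question: {question}\n"
        "Cite only the sources that are relevant, using markers like [1].\n"
        "Then list which sources you cited and why."
    )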

I think what has happened is the LLMs have gotten smarter and we don't have to do as much as we used to. I'm probably trying too hard to craft the exact right prompt to do the right thing when I don't need to. There's probably an aspect of that with LLMs.

So, would you say they're getting smarter? We're talking about, you know, GPT-3.5 or 4 here.

Yes.

Two years ago, we weren't really even talking about chat anything. I mean, GPT-2 we could get; I think GPT-3 was coming, and ChatGPT dropped, I think, September 2022. November? November 30, 2022. Before that, we were dealing with models where 7B was a big model. And GPT-2 was small, maybe 1 or 2 billion, maybe?

Something like that.

Yeah. So two years ago, before ChatGPT, and I keep using "chat," it's in my vernacular now, we were using GPT-2 to take a sentence and generate an SBIR topic.

And we were happy that it came out with, you know, phase one, phase two, commercialization, and all of that. It did.

That was a 2 billion model. I mean that was GPT 2 and that was only two years ago.

You know, so that's the concept I'm trying to throw out there with "AI years."

I mean, it feels like we're a decade ahead of where we were two years ago. Easy. If you go back to 2018, around the time frame when we first started this group, one of the things I was trying to do was build a bug analysis system that could tie in with some of the open source projects I work with, like Eclipse. They've got like 500,000 bugs that they've dealt with in Bugzilla and things like that.

At the time, they had like 70 million lines of source code and about 1,500 people committing to their repositories. That's well over 45,000 lines of code per person to be responsible for.

So I was trying to come up with ways that, if you were going to work on this bug, I could read the context of it and find other bugs that might be similar, and maybe fix both of them at the same time, get a two-for-one.

And at the time, we were just doing a topic model. That was the best we had. Now I could probably put the entirety of that into a context window and say, find the things that are similar. And that was what, eight years ago?

Actually six years ago.

So six years ago, we were using more of a statistical Bayesian approach. Now I've got a giant LLM that could probably do exactly what I needed without me writing much code at all.

It’s just, it’s crazy.

Let me kill the recording and then we’ll check with, I think Ben you’re still there. Let me hit stop first.