AI Breakthroughs in Video

Transcription provided by Huntsville AI Transcribe

All right. So what we’re talking about today, I’m actually going to walk through this backward.

Because initially, I think it was Matt. If you’re not on our Discord channel, the bottom of the email that you probably got has a link where you can come play on Discord. Matt had actually dropped a link to Sora probably the same day it popped. And I clicked the link and I was like, oh my gosh, this is crazy. For one, is it real?

It was my first question. And then initially, I thought it was something else because I had seen Google drop something and I was looking through that before, and it turns out there’s been like four or five different things all coming out about the same time. And you never really know whether one thing by one company forced the hand of another company.

In other words, if we don’t publish this now, we’re going to get scooped or everybody’s going to go to the other thing. You never really know what’s true behind the scenes on some of this. Also, I found out that some of the dates I have on here, I think, are incorrect. Like initially I had Lumiere, or, I don’t know how to pronounce the Google thing. Lumiere.

Let me hear.

We’re going out and we’ll come up with a mirror.

Or lamar.

We can do that. So, I’ve got that as January 23rd, but as we’ll get to the paper, I think it was actually like February the eighth, which I might have had these two things separate.

So I think basically within a week, we had Google Lumiere on February 8th, UC Berkeley dropped on the 13th, and OpenAI dropped on the 15th. So it was just kind of back to back to back. It was kind of interesting. But I actually did a quick survey of what have been the latest groundbreaking things published related to video. And I kind of tracked back to this one paper called SEINE, or however you want to pronounce it. This was from, and again, I wasn’t quite sure how to attribute this considering there’s like six different universities on the list that helped write this paper. This one was pretty interesting, mostly from the resolution of some of the images.

This is pretty much taking an image and generating a video based on the image and a text prompt. So from the input image, cars on the snowy doomsday highway. Let me see if this one’s better. And I guess I need to click this button on this thing to get this out of the way. There we go. So it’s like the foster criminality qualities like multi model time diffused.

It depends on which one you’re talking about.

And the one we’ll get to it.

Well, yeah, hold on a little.

Did you me.

I did.

So for like this.

And again, these are all fairly short.

Yeah, so basically taking an image and doing some kind of manipulation of the image. But one of the things that was definitely different with this group was the resolution of it. And I didn’t see a whole lot of artifacts jumping around and a lot of weirdness on the edges, where you normally find fun things. But that was back in October. We’re not going to spend too much time on that one. After that.

I’m still trying to figure out if Keyframer was next or Google Lumiere was next. Keyframer, after I actually jumped into it a bit.

That’s not it.

There we go.

Keyframer initially looked like it was something more than what it was based on the article that I found.

What it is doing, while I finish scrolling a little bit, is taking an image like an SVG, if you’re doing web design, and a text prompt, to animate that image in some way, you know, to provide some kind of an animation to put into a design. Considering Apple does a lot of design tools, a lot of things like that. I would have expected it actually more out of Adobe than I would out of Apple. And then I started digging into the paper, and really what they did, they’re using ChatGPT with GPT-4, putting in an SVG and a prompt to have it automatically generate the CSS needed to, you know, animate it. So that pretty much discounts this as being video, really. You know, it’s more of a commercial application of GPT, pretty much. And then out comes Lumiere, which is what I thought we were talking about.

I’m actually going to play this or back this up to the beginning and play this for a while.

I don’t even know if it has sound to it, because I was watching it at work and I didn’t want to annoy my coworkers. But it does really good. I mean, some of these videos, other than the ones that are stylized and stuff, you can put up next to just about anything else you can find, and it’d be difficult to tell whether this was generated video or actual video footage of something. And they did do an interesting thing with some of the stylized stuff. This is a diffusion model under the covers. So some of this was pretty interesting, taking an image of one thing and then saying I need a video of something similar to that.

But in this style.

So I think it’s muted. I don’t know if I’ve got this actually on the corner. So in traditional diffusion, LoRAs still provide.

Can we use them in this application, or is it completely different? I can pitch in there.

So, LoRAs are used a lot for motion. So for directing different types of actions, you can basically chain LoRAs together to control the camera motion, which you won’t see in these models, where they kind of take that away from you. So that’s kind of how those are utilized. More like stylized, right, giving a dancing bear in the style of Van Gogh, you know, that kind of thing requires additional fine tuning. So you could probably do that with stuff like AnimateDiff, but not Sora.
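
A rough illustration of the LoRA idea mentioned above: each adapter is a low-rank update added on top of a frozen weight matrix, and chaining adapters amounts to summing their scaled updates. The matrices, the scales, the "pan" and "zoom" labels, and the helper names `matmul` and `apply_loras` are all made up for illustration; none of this is taken from AnimateDiff or any other model discussed here.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(weight, loras):
    """Return weight + sum(scale * B @ A) over the (B, A, scale) adapters."""
    merged = [row[:] for row in weight]  # leave the frozen base weight untouched
    for b, a, scale in loras:
        delta = matmul(b, a)  # low-rank update with the same shape as weight
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Frozen 2x2 base weight (identity, to keep the numbers readable).
base = [[1.0, 0.0], [0.0, 1.0]]

# Two hypothetical rank-1 adapters: B is 2x1, A is 1x2.
pan = ([[1.0], [0.0]], [[0.0, 1.0]], 0.5)
zoom = ([[0.0], [1.0]], [[1.0, 0.0]], 0.25)

merged = apply_loras(base, [pan, zoom])
```

The point of the sketch is only that adapters are small, separate from the base weights, and can be stacked, which is why they are handy for swapping motion or style behaviors in and out.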

Yeah, so even trying to figure out how to get your hands on it at this point is a little difficult, even the Lumiere model I was just showing here.

Other than the paper, I can’t find it. It’s not on Hugging Face.

It’s not like you can get access to this easily to actually kick the tires on it. Jumping into the paper real quick. According to them, it’s a space time diffusion model.

And keep the space time in mind because that’ll come up again in the in the Berkeley one that we’ll look at for the, I think it’s the dark one.

But it’s definitely a diffusion model. One of the things to mention on this is that Lumiere does high resolution video, and it will do up to five seconds’ worth.

So that was the thing that Sora basically just blew out of the water, to go from five seconds to a full minute worth of video. That was fairly interesting and pretty groundbreaking. So even Lumiere, there’s where it talks about 80 frames at 16 frames a second, or, you know, five seconds’ worth.
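
The arithmetic behind those numbers, as a quick sketch. The 80 frames and 16 frames per second come from the discussion above; applying the same 16 frames per second to a one-minute clip is only an assumption for comparison, since no frame rate for Sora is given here.

```python
def clip_seconds(num_frames, fps):
    """Length of a clip in seconds given its frame count and frame rate."""
    return num_frames / fps


lumiere_seconds = clip_seconds(80, 16)

# Assumption: the same 16 fps, just to compare how many frames a minute would need.
frames_for_one_minute = 60 * 16
growth_factor = frames_for_one_minute / 80
```

Under that assumption a one-minute clip is 12 times as many frames to keep consistent as a five-second one.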

So they do build the whole video at once.

There’s one place in here that talks about how they generate what they call a token for this.

So you got X number of frames and then they basically take that set of frames and they take tokens from like a language model and it’s like they slam them together and that’s what they use.

That’s been something else I’m not quite sure I understand completely. I haven’t had time to dig in too much yet. So the thing is, when you talk about some of these multimodal models that deal with text and deal with images and deal with video, what are they doing to actually tokenize, you know, or get embeddings for video, text, or whatever, so that they all wind up working together?

Well, the rumor is it’s a NeRF. Yeah. And so maybe they’re putting several images together. On the Sora one.

The Sora one actually gets into a little more detail.

Okay.

But yes, that’s, I think that’s right in the space time diffusion the the other one that will hit next.

Now this is actually it.

The STUNet.

Great term.

I was thinking this was a large world model from Berkeley, but no, this is actually a different one.

This reminds me of the one that we were using for the audio piece, where for the audio, I can’t remember the name of the model. It would basically take the sound and convert it to frequency and use that on one side as input, and then it would do the actual sound on another side with the amplitude of the wave. And it would take the frequency version and the amplitude version and wind up putting them into the same kind of model. What this is doing is actually taking.

If you think of looking at a video or an image, you’ve got the spatial representation of things. In other words, the door is to the left of the window in space.

And then you’ve got the time aspect of it if you’re talking about video.

It’s either the frame before or the frame after. So it’s similar to from the spatial version looking at a frequency range of is this frequency left or right of let’s say middle C on the piano, you know, things like that versus time.

Did this happen in the first beat or the second beat or you know, it seemed to be fairly similar to something like that.

But they did some pretty interesting things. But still, they got kind of overcome by the Sora drop. It’ll be interesting to see where this Google Lumiere goes or if anybody picks it up.

The other thing that was after that was Berkeley dropped this piece, they called it a world model.

Looking at it, Ring Attention was something else. I’m not sure I’ve heard of Ring Attention. I don’t know if that’s a common thing, or if it’s something they made up just for this. But I have seen Pieter, I’m not sure how to pronounce his last name, Abbeel. He’s been around for a bit on some pretty interesting stuff. The coolest thing about this particular model is that it’s published on Hugging Face, you can go download it, you can go mess with it.

Their code is on GitHub.

I wasn’t able to get it to run, but I tried. 7 billion parameters is pretty small too for something like this. Yeah.

They’re actually, yeah, it’s a 7 billion parameter model.

You’re looking at up to a million tokens of context on your input.

The key thing here, let me see if I can find the, no, that’s opening.

I think I’ll go back to my presentation because I think I skipped a link somewhere.

Oh, I know what it was. This actually didn’t work. So, yeah, you know, this worked yesterday. What if this was actually showing up what you would see.

Let me actually see if I can just Google the thing. I bet the same thing shows up on there. Maybe on the main site. There you go. So it talks about what they’ve trained, what they used for inputs and basically what they did is kind of stepped up the input context over time and dropped a bunch of models out of it.

One of the cool things out of this as well is they made the data set available. So not only is the model available, all of the data sets they used are available as well. So if you want to play with it, if you want to do something yourself.

That’s out there. None of the other things we’re talking about tonight will even tell you what data they used to train on. There’s already, I don’t know if there’s a lawsuit yet against OpenAI over Sora.

There’s been some interesting things coming out as far as, you know, was any of my video content used to train your model? And will I actually see something I filmed show up in something generated one day? You know, similar to the way the writers were looking at it for some of the ChatGPT issues they had.

Real fast, Ring Attention.

I was just looking at it. It’s a different way than just the traditional transformer.

It allows for more memory optimization, basically, to get the larger context windows.

Right. Which is, I think the goal here was to open up the context of it.

One of the, if the other page would have worked. Sorry. Right.

We’ll hit that in a second. Josh, anything else you find, just drop it in the chat. The other thing about this one that was really interesting, that I think might be something that actually sticks: they would take a bunch of GitHub, not GitHub, I’m sorry, YouTube video clips, back to back, basically slam them all together. They could feed this thing into the model and just start asking it questions about what had happened in the video. Like, what color shirt was the guy wearing who was riding the motorcycle through the city? And it would answer, a red shirt. You know, you go back to that video clip.

And the dude’s wearing a red shirt on a motorcycle riding through, and it’s just having that level of understanding of what is in a video. It makes it seem like you could use this model similar to what we did initially with some of the text models, back when we were just trying to do things like, hey, does this segment of text have the same meaning as this other piece of text?

Not the same words, it doesn’t have the same structure, but does it have the same meaning? Imagine being able to do that with video. What’s scary is now layer in orchestration and facial recognition. Right. Yes.
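
The "same meaning, different words" comparison is usually done by embedding each item as a vector and comparing the vectors. Here is a minimal sketch of that comparison, assuming some model has already produced the embeddings; the three-number vectors below are made up, and nothing here is the Berkeley model’s actual API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings standing in for three video clips (or three sentences).
motorcycle_clip = [0.9, 0.1, 0.0]
scooter_clip = [0.8, 0.2, 0.1]
cooking_clip = [0.0, 0.1, 0.9]

similar = cosine_similarity(motorcycle_clip, scooter_clip)
different = cosine_similarity(motorcycle_clip, cooking_clip)
```

With these toy vectors the two riding clips score close to 1 and the cooking clip scores close to 0, which is the kind of signal a "does this mean the same thing" check relies on.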

So that is also key if you’re looking at being able to generate video, which, one of the things they’ll actually cover in the Sora paper, or the Sora piece, is actually filling in a segment of something.

I think it was actually, we didn’t get to the last part of the Lumiere video, but they actually did some things like that, so let me jump back over to that and see if I can jump towards the end of it.

Which several of these have similar pieces and let me see. There was one piece towards the end.

It’s past this where it was actually looking at. Come on. Nope.

Flowers. Again, there’s been so much of this stuff dropped that I keep losing track of which one of these I was talking about at the time. One of them actually had, it was a lady in a blue dress walking through a field of sunflowers or something. Okay, change the dress to be a ’60s style with straps, with polka dots or something, and all of a sudden it is. Was that, okay. Yeah, yeah, the same thing probably. Yeah. But in order to give that a text prompt, the model you’re talking about also has to understand what is actually in the video to start. It’s not just blindly, you know, like the old diffusion stuff, turn this noise thing into this image. Now it’s turn this into this, but also have some understanding of what’s in there. But with that, again, the thing I’d like to give credit to Berkeley for was that they made the data set available, the models are available, and the code to train the model.

I might have run into a problem because they use JAX, if you know what that is. Okay.

Apparently, JAX is something that is like NumPy but compiled directly to machine code.

So it’s not a. Wow, works.

It’s compiled rather than interpreted, so much faster on hardware, but.

Apparently I don’t have all of the assorted libraries on my laptop to be able to do it. Also, I think what I’ve run into in the past, especially with these large context window models, is. The library that they use to build that memory efficiency is only available for.

The 30 and 40 series and the new Grace Hopper stuff coming up. Anything that’s like a Tesla architecture, right, the library is not mature enough to include those graphics cards for all the memory optimization stuff, right. Apparently this stuff was kind of built using TPUs. So I tried to do some of that. I just could not get there in the time I had to play. I want to play with this a little more just because it’s available. And the others, it might be some time before they actually come out. You know, the other thing you don’t know is, we’ll talk about the, well, hang on, let’s just jump right into Sora. Which I think I had over here. So there’s the Sora link that they actually dropped, but the best thing I found is their actual write-up of it. They didn’t publish an actual paper that I found. It might be out there, and if somebody actually finds a paper, let me know.

But instead they wound up publishing something on their research blog. Yep, just a blog post. Let me actually jump over and check the chat. We’re over here. I thought it was here somewhere. Okay, there we go.

Oh, Shutterstock.

That’s interesting.

Okay.

Yeah, just a blog post.

Yeah, OpenAI has been fairly. Well, if you can imagine, if you’re going to drop something like Sora or, you know, let’s say even ChatGPT or something like that.

Some of the legal issues you may run into, you can guess what they’re going to be before you drop it.

It’s like, hey, I bet somebody’s going to think their stuff is in my training data and I need to protect myself or something. Let me jump back over to this piece. So I mean there’s several things that were groundbreaking, not just in, hey, we got a really cool-looking video. The different resolutions and aspect ratios. This wasn’t just 256 by 256, you know. I can’t remember, you mentioned the NeRF piece. Yeah, I mean, there’s a couple of rumors for Sora. One is they’re doing it at very, very low resolution and they’re using upscaling. Yes. And then there’s also, like, you do next-frame prediction, starting with the image and then doing the next image. Yes. Like that’s the other thing.

And someone was also saying that the background is loaded in a NeRF and the other thing is transposed on top of it. Okay. So like the background would be like the building.

Like the building, they generated a NeRF from a set of images that generated some kind of background, and then that’s how they do the camera. Right. It’s kind of like, I haven’t seen that yet.

They do talk about how they downscale and upscale and then predict forward the weird thing we’ll get into a minute.

They also predict backward. They actually predict what frames would have led up to the current frame. They can do that as well.

So one of the things they were showing further down, we’ll keep scrolling, was being able to take an existing video and add to the end of it and add to the front of it in a way that actually provides a continuous loop. Which is not easy.

Apparently I’ve never done it but it sounds really hard. Yeah. We’ll get down in a minute.

Josh, I think they actually talk a little bit about how they do it. It’s a rumor. Yeah. I think the Google one is probably based on NeRF. I don’t know if this one is. This one, they talk about how they take the things and they go to something smaller instead of a token.

They’re actually taking a larger image and going down to something else that they they actually use almost as tokens. You know they talked about how the different models actually work or things that have been tried.

This one they actually mentioned it. I think they call this a diffusion transformer which I’m not even sure what that I know those two things are words.

But as far as the architecture I’m not quite sure.

What it actually winds up with.

This is what they were calling patches, which is what they used instead of a NeRF or, you know, a splat. Again, if you’re OpenAI you can make up your own words and make them mean what you say they mean, I guess. So what they wound up doing with the thing that they call a patch is similar, in the same kind of way as the large language models: basically you need tons and tons and tons of data, and you wind up just hoping this thing winds up turning into something that’s generally usable.

So I’d love to hear about how this was made, maybe in a year or two when whoever it is that built it can actually come out and talk about it.

I’ve got a theory, and if you’ve been around long enough you’ve heard me say this, that a lot of things that happen in AI are somebody tried something and then it worked, and then they spend the next year trying to figure out why it worked and put it into a paper.

You know, kind of like, hey, let’s throw away anything less than zero and make it zero. Oh, it worked. Now I need a proof.

Someone did take the first image from the video, the girl, and then put it in a NeRF, and it kind of worked. Yeah, yeah, so it might be a NeRF. Okay.

So what you’re showing here is actually in a lot of papers, this concept of turning the visual data into patches.

They basically tile the image and they treat it all as code or as text, so they take that string for the image and use that to tokenize it.

And then basically they interpolate a next frame prediction, so they just add a little bit of noise into the image and they try to predict what it’s going to do next.

And so what they’re showing here is actually how, like, Stable Video Diffusion works, it’s how AnimateDiff works, all of those other sort of LCM things. It just probably has OpenAI juice behind it. So there’s probably some ways to get some of this information already.
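
The tiling step described above can be sketched in a few lines: cut the image into fixed-size tiles and flatten each tile into one "token". This is a generic illustration of patch tokenization, not the code of any model named here; the function name `patchify`, the 4 by 4 image, and the patch size are made up, and a real model would then project each tile through a learned layer.

```python
def patchify(image, patch):
    """Split an H x W image (a list of rows) into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    # The sketch assumes the image tiles exactly; real pipelines pad or resize.
    assert height % patch == 0 and width % patch == 0
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)
    return tokens


# A toy 4x4 "image" whose pixel values are just their position, 0..15.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
tokens = patchify(image, 2)  # four tokens, each holding four pixel values
```

The resulting list of tiles plays the same role for an image that a list of word pieces plays for a sentence.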

Okay, so it’s probably similar to how some of the, you know, early GPT models happened: we took the same thing but we made it a whole lot bigger, because we’ve got the hardware to run it, or the hardware to train it and the data to put into it, maybe. The only thing I want to confirm is, is this running at very low resolution? So the quality is not how it’s being generated.

Yeah, that’s actually how, if you look at LLaVA and some of the other ones, it’s actually running at like a resolution of 312 by 312.

But it can do resolution of images up to a giant amount because basically they’ll ensemble all the patches together.

That’s how they get those large resolutions.
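
To get a feel for what "ensemble all the patches together" costs, here is a small sketch that counts how many fixed-size tiles cover a larger frame. The 312 figure is the one quoted above; the non-overlapping layout and the 1920 by 1080 example are assumptions for illustration, and real systems may overlap tiles or resize first.

```python
import math


def tiles_needed(width, height, tile=312):
    """Number of non-overlapping tile x tile squares needed to cover the frame."""
    return math.ceil(width / tile) * math.ceil(height / tile)


single = tiles_needed(312, 312)
full_hd = tiles_needed(1920, 1080)  # 7 columns by 4 rows
```

So a model that only ever looks at small squares can still cover a large frame; it just has to process a few dozen of them and stitch the results.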

So compress it into a lower dimensional latent space.

And I love the the phrase space time patches.

That’s great.

So they’re saying there’s an element of 4D.

Yes.

So they take raw video and output a latent, at least the compression piece.

Compressed both temporally, meaning in time, and spatially. So I don’t know if something they’re doing with the way they compress it might be different or not.
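
A space-time patch is the same tiling idea extended along the time axis: each token covers a small block of pixels across a few consecutive frames. The sketch below cuts raw values into blocks just to show the shape of the idea. It is an assumption-laden toy: the write-up describes a learned compression network applied before patching, which is not reproduced here, and the name `spacetime_patches` and the block sizes are invented.

```python
def spacetime_patches(video, pt, ph, pw):
    """Cut video[t][y][x] into flattened blocks spanning pt frames, ph rows, pw columns."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    patches = []
    for t0 in range(0, frames - pt + 1, pt):
        for y0 in range(0, height - ph + 1, ph):
            for x0 in range(0, width - pw + 1, pw):
                block = [video[t0 + dt][y0 + dy][x0 + dx]
                         for dt in range(pt)
                         for dy in range(ph)
                         for dx in range(pw)]
                patches.append(block)
    return patches


# Toy video: 4 frames of 4x4 pixels, each value encoding its (t, y, x) position.
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)]
         for t in range(4)]
patches = spacetime_patches(video, 2, 2, 2)
```

Each patch here mixes pixels from two frames, so a token already carries a little bit of motion information, not just a still image fragment.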

Decoder goes back to pixel space.

Extract sequence like yeah.

Okay.

Transformer tokens that sounds similar.

And they talk about how it’s a diffusion model, taking the noisy patches and then predicting clean patches.
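
The "noisy patches in, clean patches out" setup can be sketched as a training pair: mix a clean patch with random noise, then score a prediction against the clean original. This is a generic diffusion-style illustration with a stand-in for the model. The mixing formula, the `alpha` value, and the patch values are assumptions, not details from the Sora write-up.

```python
import math
import random


def add_noise(clean, alpha, rng):
    """Blend clean values with Gaussian noise; alpha=1.0 leaves the patch clean."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [math.sqrt(alpha) * c + math.sqrt(1.0 - alpha) * n
             for c, n in zip(clean, noise)]
    return noisy, noise


def mse(prediction, target):
    """Mean squared error between a predicted patch and the clean patch."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


rng = random.Random(0)
clean_patch = [0.2, 0.4, 0.6, 0.8]
noisy_patch, _ = add_noise(clean_patch, alpha=0.5, rng=rng)

# Stand-in "models": one that returns its noisy input unchanged, one that is perfect.
loss_untrained = mse(noisy_patch, clean_patch)
loss_perfect = mse(clean_patch, clean_patch)
```

Training pushes the real network from the first kind of loss toward the second, for patches at many different noise levels.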

But here’s what they say is important.

It’s apparently important that it’s a diffusion transformer.

So that’s okay.

Got it. And OK. Yeah. That looks like that’s how that works. Oh.

Of course.

Here’s where you have to choose.

These are three different pieces.

One is some basic amount of compute thrown at training it. The second one is four times as much, and then 32 times as much. You can definitely see the amount of horsepower that you have available to do this would definitely affect it. And this is one thing that didn’t come from the Lumiere paper. The Google paper was specific to five-second videos at a specific, you know, whatever resolution. So this is basically, you know, a little different than that, talking about sampling different kinds of aspect ratios and then generating new things but with the same model.

Which is fairly interesting to be able to do that. Yeah. They’re talking about framing where some. Trying to see what I’m looking at here.

I think it’s the one on the left.

That’s some weird.

Maybe some weirdness but not. I don’t know. Trained on square crops, left, generates videos such. Oh, okay.

Got it.

This is apparently when it’s generating a video if you’re forced into a square.

Sometimes it doesn’t quite work so well as far as getting the thing you’re trying to generate in the middle.

Which I’m guessing is what they call framing.

And so this one this is kind of close to the question I had before about how exactly are they tying video and image kind of stuff with language itself. There’s all the yeah. Okay.

So DALL-E.

Yeah.

Basically kind of taking your prompt and they’ve got anything in the background that kind of converts to like a contour or something like that.

I said that to myself. Yeah. So one thing that may be helpful there is CLIP. It has a shared backbone between text and images. My guess is that there’s probably a CLIP 2 that is under DALL-E that fixes how crappy CLIP is, because it is crappy. And that’s probably what’s underneath this thing too. Who built CLIP? Who built CLIP? I’m trying to remember, was that OpenAI?

It’s OpenAI. Yeah. Okay.

Because that was the first part that actually, let me make sure I remember this right. That’s the piece that took a bunch of captioned images and wound up finding a way to basically do the correlation, I’m saying correlation, I’m not sure how it actually works, between text and what you would see in an image, in a way that could be used for training. Is it something like that, or a way to embed that so that it’s useful?

Right.

Yeah. And for CLIP, humans actually did all the captioning. And the guess is that for DALL-E,

GPT did, and the level of the captions and semantic understanding, it’s not even close. Right. This is kind of where you start with a human at first and then you find a way to bootstrap it and generate additional stuff out of it.
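
The image-to-caption matching discussed above comes down to putting image vectors and text vectors in one shared space and pairing each image with its most similar caption. A minimal sketch of that matching step, assuming the embeddings already exist; the two-number vectors and the helper names are made up, and this is not CLIP’s real interface or training code.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_matrix(image_vecs, text_vecs):
    """Entry [i][j] is the cosine similarity between image i and caption j."""
    images = [normalize(v) for v in image_vecs]
    texts = [normalize(v) for v in text_vecs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts]
            for img in images]


def best_caption(sim_row):
    """Index of the caption with the highest similarity for one image."""
    return max(range(len(sim_row)), key=lambda j: sim_row[j])


# Made-up embeddings: image 0 should pair with caption 0, image 1 with caption 1.
image_vecs = [[1.0, 0.1], [0.1, 1.0]]
text_vecs = [[0.9, 0.2], [0.2, 0.9]]

sims = similarity_matrix(image_vecs, text_vecs)
matches = [best_caption(row) for row in sims]
```

Contrastive training is what makes the matching pairs land on the diagonal of that matrix; better captions, whoever writes them, give that training a cleaner signal.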

And then that’s kind of cool. Now you got to go push the magic button. Sorry for the online. Yeah. Once you’re on my return off at six thirty nine apparently.

Kind of weird time.

It’s not a timer. Whatever you push it. When it plays you’re supposed to push the game.

Oh yeah.

Okay.

So you mean maybe another day.

But it can do some interesting things.

I wonder how long it takes to generate.

I don’t know if any of that is lower in this or not. I mean there’s a lot.

I haven’t found yet.

How long does it take to generate how much horsepower.

How long does it take to generate, how much horsepower do they have to use? You know, kind of like all the studies that have been done on ChatGPT. One hour. One hour? Got it. So one hour for one minute? Is that for a minute?

I almost feel like we’re back in the.

If you remember back like when Pixar first came out and they were doing all of the generation of all the stuff. I mean it was. I mean just to get frames of the other there. That was kind of interesting. This thing keeps jumping all over the place as far as the website. Don’t get a break. That’s right. Text video. You can prompt it with.

Wow.

This thing.

This page keeps jumping around.

You can prompt it with pretty much anything.

You can prompt it with pre-existing videos and images. That’s kind of cool. So it wouldn’t be. I’m wondering if they start off with, like, DALL-E to get an image and then go from there to the video.

That might be interesting.

Here’s some monsters in a flat design.

It’s really good with text.

I wonder if you could ask it to generate a hand with only five fingers.

Well, I mean, yeah, the hands look fine too.

So it’s like. Like they fixed it.

This is the one that turned out better than the others. Right.

This is what I was talking about where you can actually walk backwards in time. Forward or backward. Four videos extended backwards starting from a segment of a generated video. That was kind of NeRF-y. They wanted to say. See if it’s a live play. I thought that was a trolley. So they should wind up. This would be interesting. Oh.

So it’s like you took the frame somewhere around here.

Oh, come on.

Yes, I can show it.

Well, maybe I’ll go back far enough.

It’s like you took this last segment of video and then took three different runs of it and said now back this up.

To three different positions and then play them all at the same time. It’s kind of interesting you always you run back into the same thing.

This was their seamless loop piece.

Just a guy on a bike riding around, I guess. Video-to-video editing. This talks about SDEdit. But being able to take one video and then prompt it to actually modify it, you know, changing the setting or doing other kinds of things. I’m not sure if they have anything below.

Transitions.

Transitions.

This is something that actually had come out in the first thing we looked at from October last year, that SEINE or whatever.

Some of the things they were working on were transitions from one image into another image. I don’t think they made it to transition from one video into another video overall. But I think this turns a drone into a butterfly. So.

Something like that.

I’m not sure what we’re doing. Okay.

Oh, yeah.

We got it.

So the one in the middle is basically interpolation of the one left and the one right.

Okay.

Not sure what to say about this.

I don’t really get it.

So they’re taking the concepts from the two videos and blending them together. So if you think about the fact of, like, you could have a TikTok video and then a video of a stormtrooper. Right now you have to do a whole bunch of fancy ControlNet stuff to get that to play together and basically have a stormtrooper doing a TikTok dance. But what they’re saying here is that you don’t need to do that. You just feed it the two videos and it can interpolate those concepts semantically together. Okay. What I missed was the middle was just showing me the same thing that was on the left.
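
One simple way to picture "interpolating" two videos is blending their latent vectors with a weight that slides from one to the other. The sketch below shows only that linear blend. How Sora actually combines two videos is not described in enough detail to reproduce, so treat this as an assumed stand-in; the latents are made-up three-number vectors.

```python
def lerp(a, b, weight):
    """Linear blend of two equal-length vectors: weight 0.0 gives a, 1.0 gives b."""
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


# Made-up latents for the left and right videos.
latent_left = [0.0, 2.0, 4.0]
latent_right = [4.0, 2.0, 0.0]

# Five steps from fully "left" to fully "right".
blend_steps = [lerp(latent_left, latent_right, step / 4) for step in range(5)]
```

The middle step is an even mix of both, which is roughly what the center video in that demo looks like: neither source, but recognizably made of both.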

That’s where I was.

Okay.

Yeah, that’s kind of neat. You can see more of the peacock and I think the lizard. Right. I think you can tell more there because it’s two different animals.

Here’s a peacock lizard. Oh, that’s so weird.

That needs to happen. Oh, there’s a nightmare waiting to happen somewhere. Peacock lizard.

Yeah.

Oh wait, here’s a cheetah or something like that. An electric. Yeah, that’s running down the road instead of the Jeep driving down the road. Or Land Rover, if you’re a Land Rover fan. We can do our own Sharknado. I’m guessing eventually the people in the thing are going to turn into sharks.

There we go.

And of course they talk about, hey, images, I mean videos are made up of images. So of course we can do images too. That’s pretty interesting.

So some of the things they talk about as well, and this is stuff I think they’re still, according to what they’re writing here, they’re still studying to see how this actually works.

Not only 3D, well 3D consistency was one of them. I lost my mouse. There we go.

As far as, in a generated video, being able to move the camera around and keep the consistency of the things that are far off.

Or still close, and the sizing is appropriate.

The other thing they talked about was object permanence. You see how the dog is in the background and people are walking in front of the dog, but it’s still a dog in the background there. Apparently that’s a hard thing to do. So what makes the dog one really nuts is, like we’re saying, it’s all based off of next frame prediction.

In the past, so if something goes out of the frame, then it’s not going to have the attention on it.

And so the fact that they’re able to to maintain that here is pretty crazy. And there’s some other videos where they’re like, they have an inverse where like it’s a subway car and it’s blurry, blurry, blurry, and they cross a building. They come into full focus and then they go back blurry. So what they’re doing with their attention mechanism is that’s magic. That’s sorcery. Yeah, I’m wondering if it’s the combination of space and time at this.

It sounds like an attention mechanism that doesn’t just look at the frame but also looks at the patch according to how it’s been generated in the past. So it’s looking at the whole picture itself plus how that patch has changed over the sequence of images, which was being why the backers in time.
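To make the space-time patch idea a little more concrete, here is a minimal sketch. It is not Sora's actual method; the function names, patch sizes, and the toy video are all assumptions made up for illustration. It cuts a video into patches that span both space and time, then pulls out the "track" of one spatial location across time, which is the kind of history an attention mechanism could look back over.

```python
def spacetime_patches(video, pt, ph, pw):
    """Cut a video (nested lists indexed [t][y][x]) into space-time patches.

    Returns a list of (coords, values) pairs. coords is the patch's
    (time, row, column) index in the patch grid; values is the flattened
    list of pixels inside that patch. Hypothetical helper, for illustration.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                values = [video[t][y][x]
                          for t in range(t0, t0 + pt)
                          for y in range(y0, y0 + ph)
                          for x in range(x0, x0 + pw)]
                patches.append(((t0 // pt, y0 // ph, x0 // pw), values))
    return patches


def temporal_track(patches, row, col):
    """All patches at one spatial location, in time order."""
    return [p for p in patches if p[0][1:] == (row, col)]


# Toy grayscale video: 4 frames of 4x4 pixels, every pixel equal to its frame index.
video = [[[t for _ in range(4)] for _ in range(4)] for t in range(4)]
patches = spacetime_patches(video, pt=2, ph=2, pw=2)
track = temporal_track(patches, row=0, col=1)
print(len(patches), [coords for coords, _ in track])  # 8 [(0, 0, 1), (1, 0, 1)]
```

In a real model each patch would be embedded as a token, and attention would run over all of them at once, so a token can draw on both its neighbors in the same frames and the same location at earlier times.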

You know, there’s a bunch of stuff that’s been going on with having registers for vision transformers, where they’re basically storing a cache on the side that has class information and context information that’s not specific to any one patch. Maybe they’re doing basically a giant space-time patch register somewhere off to the side. That would make sense here.

Yeah, I mean, what they’re talking about here is things like knowing that when a guy bites into the burger, there should be a piece missing from the hamburger after he takes a bite. You know, digital worlds.

I forgot who it was. It was Tyler, I think, we were talking about this, because it won’t be long before you wind up with, what is the thing that Apple just came out with, their VR piece? The Vision Pro. Plug that into something like this and it just generates a game for you on the fly. And you know, it’s almost like a, what was it, Choose Your Own Adventure.

I can imagine something like that.

You know, instead of okay now turn to page 38. Yeah, Indiana Jones enters the temple.

Yet discussion. I didn’t get quite down below.

Before when I was looking through this.

This is kind of interesting.

It’s just getting the time out of order, where the spill happens before the glass turns over. There are some interesting things.

Yeah, and the glass doesn’t shatter. So as long as you’re living in the Matrix.

I guess it’s okay.

How spontaneous in their landing page that be interesting for objects that just, you know, spontaneously appear.

That’s sure this is the one that It could be the Yeah, it makes me wonder if he’s supposed to be wearing a cap that was knit by his.

It’s probably a problem. Okay. The other promises.

Cap generated or Nitty states on it or whatever. I’m wondering if it’s something that just mentioned that things just appear. And I think that’s all on this particular paper slash blog post.

Let me check the comments at the end.

One hour. Generate. That’s it.

All right, let me get back over to the actual page. I’m not sure there was much more that was on the Sora page itself, because a lot of the stuff is the same types of things they had shown. But some of the things make you wonder how they’re able to do some of this. How many videos do you think there were of California during the gold rush? Or was that before or after video? You know, just interesting kind of stuff. That’s cool. I think that’s interesting.

You can basically build your own Super Bowl commercial. Oh, about as good as some of the ones I saw. In the year 2056. Okay, it’s kind of for the future for us.

Yeah, that’s interesting.

What I’m wondering is if you can describe a video of something that has never existed in images. If you can have a video of the Hanging Gardens of Babylon.

You know, there’s been a lot written about it, but there are no pictures of it.

There may be a drawing of it, some sort of an assumed drawing of it.

That’d be kind of interesting.

Wasn’t that the gold rush, basically? That’s what I was wondering. I think that’s when the first film cameras were roughly available in the United States. Okay. So another thing to think about too is that the interface to this is probably going to be something like ChatGPT, where they generate your prompt for you.

So you ask it for something and it’ll generate it. And they’re really good, for copyright reasons, about describing something, all the attributes about it.

And then not actually using the reference itself.

So it’s probably going to be something like that. Actually see that. We’ve got like three minutes left. Turner comps, characters.

Let me jump all the way down to where they get into safety.

This is where they tell you why you can’t use it yet. But some of it I could understand why. I think they’re looking at doing things like generating pieces into the video that make it easier to find later if somebody’s trying to ask, is this a real thing or not a real thing.

At least they’re, I think they’re thinking about it upfront.

I’m not quite sure how long it’s going to take somebody to jailbreak the thing or whatnot and actually get it to do some things. Yeah, all this goes like, you know, I’m not forced to make this happen. But we needed to PR. We need to do various tools to your heart.

Like what happened when they released recently that hasn’t been there like pretty soon? Yeah, I’m not sure. I mean, if it pulled it, they pulled off it pulled off, but I don’t know. I don’t think they go this far unless they have something that’s at least usable.

Maybe not to a wide audience, but with this kind of thing, it sounds like they could narrow the audience just by charging a certain amount of money to use it. Because, I mean, it’s probably got a lot of value, you know, just based on what it does.

So what about, like, these videos going to YouTube and stuff like that too? I mean, there are AI-generated videos, mostly still images, but yeah, people are making a lot of money with it.

It reminds me of one of the things that’s going to be difficult. I don’t remember which shooting it was, but there was something where YouTube wound up just having to crank up the, whatever knob they use as far as “is this likely the same video or not the same video,” because they were trying to take it down as soon as people would publish it. But people were doing things like changing the aspect ratio or applying a color filter to it. I mean, just the easy one-button-click things you can do with your phone to change what a video looks like. And their automated detection models weren’t good enough.

So they wound up turning it up to the point where ABC News and CNN and other stuff wound up getting taken off of their feeds because it looked close enough to what they, you know, they just went. But again, they have to know that we have the ability to use the same technology that artists are using to poison Stable Diffusion, run it on video, and get it past all those filters. It destroys the digital fingerprint, but people can’t perceive anything wrong with it or different about it.
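The threshold problem described above can be shown with a toy perceptual hash. This is only a sketch under stated assumptions: it uses a simple "average hash" on a made-up 4x4 grayscale frame, and it is not how YouTube's detection actually works. The helper names and the sample images are hypothetical.

```python
def average_hash(image):
    """One bit per pixel: 1 where the pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def is_match(img_a, img_b, threshold):
    """Call two frames 'the same video' if their hashes differ by at most threshold bits."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold


original = [[0, 10, 20, 30],
            [40, 50, 60, 70],
            [80, 90, 100, 110],
            [120, 130, 140, 150]]
# A brightness filter: every pixel shifted by the same amount.
brightened = [[p + 10 for p in row] for row in original]
# A geometric edit (a transpose here, standing in for aspect-ratio style changes).
transposed = [list(col) for col in zip(*original)]
# Completely different content: a checkerboard.
unrelated = [[150 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]

print(is_match(original, brightened, threshold=2))  # True: the filter alone is caught
print(is_match(original, transposed, threshold=2))  # False: the geometric edit slips through
print(is_match(original, transposed, threshold=8))  # True: a looser threshold catches it
print(is_match(original, unrelated, threshold=8))   # True: but now unrelated content is flagged too
```

With the tight threshold the edited copy gets past the filter; loosening it far enough to catch that copy also flags a frame that has nothing to do with the original, which is the same kind of false positive as the news feeds getting taken down.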