Meeting notes provided by Gemini
Mar 25, 2026
AlphaFold 3 – Transcript
auditorium@hudsonalpha.org: This is a little bit more on each one. Primary: this is my metaphor, but I’ve always thought of this as kind of the noodle or the bundle of cables. I know that it’s going to eventually form a shape. It’s just that when I’m looking at it in that native state, there aren’t really good clues as to where that’s going to be unless I’ve been taught and brought in on that.
There are certain bonds that are double, which are fixed, and we’ll get into that part, but double bonds actually ensure that there’s a very specific orientation, and then if it’s a single bond there’s a little bit of rotation that can occur. But that’s really what’s in there.
Secondary: this is kind of another look at it. This is where we start to see those structures form, or what will be structures. Two of the classic motifs, or common ones, are the alpha helices and then the beta pleated sheets.
auditorium@hudsonalpha.org: I can’t remember why they were called alpha or beta. They’re just that now. But there’s a fun way to always remember it, and if you’re looking at a DNA model or even a protein model, this is a good way to catch yourself. All alpha helices are going to be right-handed. If you stick your thumb up straight in front of you, the curve will be around that axis. So if you actually see a model of DNA, there’s a game called splice, and it’s this really cool thing, kind of like Jenga towers, where you actually stack all these little pieces and you’re trying to build stuff, whatever. It’s a left-handed helix, so it’s not correct. So they tried, but it didn’t work. I still kept it and glued it all together because it was funny. But yeah, and then beta pleated sheets. It quite literally almost looks like a pleated sheet when you’re staring at it.
auditorium@hudsonalpha.org: And these bonds aren’t very strong. This is ideal conditions, a model of it. This is how it would ideally come together in biological conditions, whether it’s in tissue, whether it’s in fluid suspension. This is kind of what that natural chemistry wants it to tend towards. God, I’m flying through these. Tertiary: I promise it’s not censored. There was just a really cool quaternary part and I didn’t want to distract them. But this is when it gets to that next bit, where there’s even more chemical interactions going on. Let’s say we had this big long strand. We know there’s a helix here. There’s a beta pleated sheet here. But if everything’s sitting there, how do they come together and start interacting in 3D space? And that’s disulfide bonds. Just a fun side note, disulfide bonds are actually part of the reason why there’s a difference between Irish curls or regular curls. Anybody who straightens their hair, whenever you’re getting your hair straightened, they’re actually breaking the disulfide bonds temporarily so that the hair straightens, but it produces a curl in the protein when there’s more disulfide bonds.
auditorium@hudsonalpha.org: And then quaternary: this goes back to that big behemoth that was at the front. This is, how do all of these pieces, individual proteins, come together? And in this case, they’ve got them pretty well annotated in here. There’s two different types of serine proteins that are involved, or at least based by color. We’ve got the CheY proteins, the stator. There’s a separate set for membrane penetration and anchoring. All of these are different types of proteins that are coming together into this space. So when I am trying to model protein structure, what I care about is: am I trying to model what the secondary is, am I looking at the tertiary, or is this something that can push all the way to the quaternary and start getting these pieces together no matter how large they are? Can I predict that based on what we know? This is where we kind of get into the methods of it. I only chose two. There are way more than this.
auditorium@hudsonalpha.org: Um, but they’re all expensive and they’re all cumbersome, but they all have proven their worth. One of the first ones in humans was actually heme, which is part of our red blood cells, letting us capture oxygen. That was one of the first structures that we ever figured out with X-ray crystallography. And I’m trying to figure out how to exactly explain this. X-ray crystallography is, at its core: if I know what a quartz structure looks like on the outside, and I bounce an X-ray off of that, it’s going to have a certain refraction, it’s going to have a certain bounce off of it. You can do the same thing with proteins. If I can purify a protein down and form a crystal with it, and it is as difficult as it sounds, but there is an art to it and people have gotten whole PhDs in it, you can actually crystallize these proteins, do X-ray crystallography, and be able to infer what that structure looks like based on how the X-rays are coming back and diffracting off of that crystal.
auditorium@hudsonalpha.org: Um, this is a good example from Rosalind Franklin, or like Franklin and Crick, or Rosalind. I’m getting there. I’m tired. This is the classic example of the double helix. So, when she did this, that was the actual X-ray diffraction pattern for the double helix. Now, how at the time they were able to back this out, there’s a lot of math I never bothered to learn, but it is there from the central point, and from there you can start doing electron density mapping. So I know what the different electron fields within that diffraction pattern were, and then I can start backing out to atomic models. This was kind of a reference for how many of these are available from the Protein Data Bank. I just went to their basic search. I didn’t put in anything other than human samples, so it’s just looking for those. But the angstroms, or the refinement resolution, is how close in terms of angstroms do we believe we are with that model. And you’re getting down to, oh gosh, I should know these numbers.
auditorium@hudsonalpha.org: It’s on the angstrom scale when you’re dealing with DNA, with these molecules together. So it’s down to, can I really say that? Is my confidence that high? If you get something closer to 4.5 or four, or even higher, I’ve seen one that was five, you might as well have not uploaded it. You’re kind of guessing. Sometimes you need to because that’s the requirement of the publication; they’re like, “Hey, you have to post this.” Still kind of embarrassing though, because it’s not really a great model. You did your best, but protein purification is still really hard. Likewise there’s another one that is quite a mouthful. Cryogenic electronic, electron, it’s not electronic, I guess PowerPoint decided to correct me. Electron microscopy. Cryo-EM, blissfully, is the shorthand for that one. But this is actually kind of cool because this becomes not just the diffraction pattern. I’m starting to do averages and gradients across what I’m actually looking at.
auditorium@hudsonalpha.org: So, at a very high level, and this actually came from a Wikipedia article on this, it’s really good, you’re still going to purify the protein down to what you need. You get that loaded into a grid. You’ll freeze it. And this is beyond. So, this is like liquid helium, liquid nitrogen levels of freezing, to be able to get it down that low. Then you’re going to actually use electron microscopy and look at the 2D projections. And then you’ll go through several different phases, all based on all the different ways that these proteins landed in here, to get to: I think this is what this is in a 3D orientation. So you can kind of see that going from step five to step six to step seven, they’re going, “Okay, based on all of these different pieces that I’m seeing, I believe that’s what I’m actually looking at.” And this is where you get some of the same thing. This was more human samples. Yet again, this is what the distribution generally looks like, and you can say 2.5 to three angstroms is about where you want to sit with this one.
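The idea of combining many 2D views of the same protein can be illustrated with a toy averaging sketch. This is not the cryo-EM reconstruction pipeline: real processing also has to pick particles, align them, and work out their orientations. The signal shape, the noise level, and the number of copies below are all made-up assumptions, chosen only to show that averaging many noisy copies of the same thing recovers it better than any single copy.

```python
import math
import random


def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to every sample."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]


def average(copies):
    """Average a list of equal-length signals, sample by sample."""
    n = len(copies)
    return [sum(column) / n for column in zip(*copies)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    total = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return math.sqrt(total / len(truth))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical 1D stand-in for one "view" of a particle.
true_profile = [math.sin(i / 5.0) for i in range(100)]

# 400 noisy observations of the same profile (already aligned, by assumption).
copies = [noisy_copy(true_profile, noise_sd=2.0, rng=rng) for _ in range(400)]

single_error = rms_error(copies[0], true_profile)
averaged_error = rms_error(average(copies), true_profile)
print(f"one copy: {single_error:.2f}   average of 400: {averaged_error:.2f}")
```

With independent noise, the error of the average shrinks roughly with the square root of the number of copies, which is one reason a grid holding many particles of the same protein is useful.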
auditorium@hudsonalpha.org: But that’s also a matter of how close that technique can actually get as well. But both of these, regardless, are expensive. They’re time consuming, especially the protein isolation, protein crystallization. My former boss, she actually still works here at one of the other companies. She did protein crystallization and she said it was as much an art as it is a science. She didn’t like doing it, but she did. But regardless, that’s what we’re trying to do with AlphaFold 3. You see how incredibly expensive these are, and this does more or less confirm what you’re looking at. But if we can limit the amount of confirmation needed and predict it, and then have ways to go back and validate what we do, then that’s much easier and potentially, I don’t know about computing cost, potentially cheaper than trying to go this way. So for the 2D, this is getting a two-dimensional view of it and then doing that a bunch of times. Is it doing it with different proteins each time, or do you shake it a little bit and try to get it to roll over?
auditorium@hudsonalpha.org: It’s all the same protein, but I imagine the closest metaphor, this is kind of silly, I just thought about this, is if I took a bowl of Lucky Charms and I just grabbed the rainbows out of it and I was trying to say, what is the structure inside that rainbow, at a scale I couldn’t see, and I just dumped a whole slew of them into the grid, however they landed. Oh, so there’s more than one of them. Yeah. Okay. And so you’re getting the same 2D image, but it is of the same thing over and over across that grid at potentially different angles. Okay. Yeah. And it’s frozen in place, so I don’t have to worry about changes in temperature. I don’t have to worry about ambient heat or potentially additional energy actually changing what’s going on. This is subzero, close to subzero temperatures. Do all the proteins need a very narrow polydispersity for this to work?
auditorium@hudsonalpha.org: Very narrow what? Polydispersity, for the chain length. Sorry, I’m thinking about this in terms of plastics. Uh, not necessarily. I don’t know enough about the technical upper limits of this, but some of these that I have seen structures for are very large. But in saying that, very large at this scale is something that’s thousands of amino acids long. It’s still possible, but it might be that you choose one method versus the other. With this one you may be limited by grid size, but when you get down to X-ray crystallography, it’s down to how pure can I actually get that given its size. But maybe, I’m not sure. So, Alexa, how much of this is governed by physics versus chemistry? Like in this case, you know, I’m rolling a bunch of dice or I’m rolling things and seeing which side they’re on. This particle selection, is it like a chemical bond? Or is it more because it landed at a certain angle and it’s giving off a wavelength, right?
auditorium@hudsonalpha.org: So, to me, this is almost like a radar, right? So, you know, he may come at it from a mechanical side, and I’m coming at it from a physics thought process. So how much is one versus the other? At this point, when you’re freezing like that, you’re hoping to actually mitigate all of the changes due to chemistry. So it would be pure physics at that point, because you’re assuming there hasn’t been enough damage. And I say assuming, but it’s been pretty well tested that if I freeze it at this state fast enough, it is going to hold the shape that it was in at whatever temperature I pulled it from. So it is down to a physics-scale problem when you’re getting down to the EM part of this. And that angstrom, is that wavelength radiation? The angstrom is, I’m trying to figure out how to put this. If I were to measure something, let’s say I’m trying to measure that sign on the far wall, I know that an inch for me right now looks like this.
auditorium@hudsonalpha.org: It’s a matter of, if I look at that, where does an inch sit? So, if I’m looking at that model and my inch ends up being three or four angstroms, or my tolerance is three or four angstroms, I may get the measurements on that sign pretty far off. I mean, it looks probably like three feet tall. I don’t know. I’m kind of short so it’s hard to tell. But I could mismeasure that and I could say, oh, it could be 3 feet tall or it could be 18 feet tall. I just don’t know. You get into this: I’ve accepted a certain amount of error for my measurements, and that’s where the angstrom is coming in. There is another one that isn’t on here for these different methods, and it’s NMR, nuclear magnetic resonance. I didn’t want to touch on that one. You see that a lot in mouse samples and in other species, but for humans, there was data there, but it seems to be one of the methods that people use less for humans.
auditorium@hudsonalpha.org: And the Ramachandran plots, I think these came around in the ’60s. I want to say it was early ’60s, if not early ’70s, when these came about. But the idea is, if I can generate these plots, I can sort of deduce or fact-check myself about the structure I think I have. And this is pulled from a blog that I’ve got included in the notes, but the beta pleated sheets, if there’s a ton of them in my structure, they’re going to show up in the upper left and I’m going to see them represented up there. And that tends to be where those rotation angles are going to fall. And it’s positive or negative around that axis. And then what I’m looking for, you don’t really see them here, they’re only kind of represented, but in the bottom left is where you’re actually going to see those alpha helices. Very, very early on, and I found it just before this, it was actually pulled from the same paper back in ’97, they tried to do a prediction model based on structures, and it was using purely Ramachandran plots to say this is how much is here.
auditorium@hudsonalpha.org: This is how much is in this quadrant for alpha helices. But then, kind of like you’re seeing here, they’re denoting hybrid structures that live in other clusters throughout, and that was a very, very early rudimentary thing in ’97. But that was something that I didn’t want to fall down a rabbit hole on and get lost in. But this is a way to fact-check yourself. So say you’ve modeled something, you’ve done the crystallography, and it should have a ton of beta pleated sheets. If this structure shows up not largely flat but ends up having a spike or a whirl or a helical pattern to it, and your Ramachandran plot shows that it has that, you’re like, oh, the structure I predicted, or the structure I thought I had, is incorrect. I’ve just validated that something is wrong in what I’ve done. And we used to have to study these and figure out where things are wrong. And it took forever for me to figure out, oh, it’s the angles around the center axis.
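The fact-checking idea behind a Ramachandran plot can be sketched as a small classifier over backbone angle pairs. A minimal sketch, assuming the two angles are the usual phi and psi torsions in degrees; the region boundaries below are rough illustrative guesses, not published region definitions, and the function names are hypothetical.

```python
from collections import Counter


def classify_backbone(phi, psi):
    """Give a rough label to one residue from its backbone angles in degrees.

    The boundaries are illustrative assumptions only.
    """
    if not (-180 <= phi <= 180 and -180 <= psi <= 180):
        raise ValueError("angles must lie in [-180, 180] degrees")
    if phi < 0 and psi >= 90:
        return "beta sheet region (upper left)"
    if -160 <= phi <= -20 and -120 <= psi <= 30:
        return "right-handed alpha helix region (left, below center)"
    if phi > 0 and 0 <= psi <= 90:
        return "left-handed helix region (right side, rare)"
    return "other / unusual"


def summarize(angle_pairs):
    """Count how many residues fall in each rough region."""
    return Counter(classify_backbone(phi, psi) for phi, psi in angle_pairs)


# Made-up angles for a short stretch of residues.
example = [(-60, -45), (-65, -40), (-120, 130), (-135, 150), (60, 45)]
for label, count in summarize(example).items():
    print(f"{count} residue(s): {label}")
```

If a structure that should be mostly beta sheet puts most of its residues in the helix region, that mismatch is the signal that something in the model is off.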
auditorium@hudsonalpha.org: Is there a mutant protein in it? Then I can start sort of unfolding all of this and go, okay, that’s something I need to focus on. I now know how that works mechanistically. So it applies to, I mean, effectively anything that you could do. There’s, dare I say, drug targeting. That’s probably the most lucrative version of anything that people do. But there is the whole thing of, I just want to know. It’s like when you’re looking at the flagellar motor, or the motor for the tail of a Salmonella bacterium, you kind of ask why. But then it also translates to, I could use that for nanotechnology as a tool, or if I can generate that structure I could tack it onto something else and make it more effective in delivery. So you can start playing these kinds of games with how you want to do it. Genetic diseases are another one that actually kind of touches on that research, and maybe we’ll get there, but mutations can make the difference, and if you have one mutation it may affect how that protein structure ultimately works.
auditorium@hudsonalpha.org: So you may have one mutation. It’s pretty rare that it’s just one, but you may have one mutation and it’ll exacerbate out and actually cause protein structural-level problems or functional problems. Oh yeah, this is the why. I’ve got the quote and the article tied to it: “Such efforts using experimental methods have identified the structures of about 170,000 proteins over the last 60 years, while there are over 200 million known proteins across all life forms.” I had to do the math on that because I’m fun. It’s about 0.085% of everything that we think we know. So yeah, you can tell I kind of fell into a rabbit hole late at night with it. So I won’t touch on AlphaFold 1 and 2. I keep wanting to call it AlphaFold 1 because it annoys me that it’s just AlphaFold, but I get it. Basically all you need to know is that there is the Critical Assessment of Structure Prediction (CASP) contest. It happens tentatively every two years.
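The arithmetic behind the quoted figures is short enough to write out. Using the round numbers from the quote (about 170,000 solved structures against roughly 200 million known proteins), the ratio is 0.00085 as a fraction, which is about 0.085 when expressed as a percentage.

```python
solved_structures = 170_000       # structures determined experimentally (from the quote)
known_proteins = 200_000_000      # known proteins across all life forms (from the quote)

fraction = solved_structures / known_proteins
percent = fraction * 100

print(f"fraction = {fraction:.5f}, percent = {percent:.3f}%")
```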
auditorium@hudsonalpha.org: For 2018 they were the highest scoring ones in terms of distance. So their model was the closest in terms of overall distance calculations to what they were asked to model. It’s a known structure. Everybody ran their prediction models, and then they took your prediction and ran it against that known structure to say how close you were in terms of distance, in 3D, with your prediction, and they scored the highest. And I say the highest: out of a hundred, I think it was like 60-something. It’s not great, but it was the best one that was there. Likewise in 2020, two years later, for CASP14, they also placed first, again on the same kind of metrics. It’s the known protein structure, and then you’re trying to run your stuff, pure prediction, against it to be able to get there. And one caveat that they mentioned with that is AlphaFold 2 only predicts proteins that have one polypeptide chain, and that actually came from their own stuff. So they’re only focusing on that secondary or tertiary structure.
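The distance-based, out-of-a-hundred scoring described above corresponds to CASP’s global distance test (GDT). A minimal sketch of the commonly cited GDT_TS form, which averages the fraction of residues within 1, 2, 4, and 8 angstroms of their position in the known structure. It assumes the prediction has already been superimposed on the known structure and that per-residue distances are given; the real procedure also searches for the best superposition, which this sketch skips. The example distances are made up.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Score a prediction from per-residue distances (in angstroms) to the known structure.

    Returns a value from 0 to 100: the mean, over the cutoffs, of the
    percentage of residues whose distance is within that cutoff.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for a five-residue toy prediction.
toy_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
print(f"GDT_TS = {gdt_ts(toy_distances):.1f}")
```

A perfect prediction, with every residue within one angstrom, scores 100; a prediction with every residue more than eight angstroms away scores 0.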
auditorium@hudsonalpha.org: They’re not getting into how these pieces fit together in quaternary. They’re just there. Um, excuse me. Yeah, for the global distance test the number was, what, 90 or greater. So, everybody’s doing pretty lukewarm at best in CASP13. By CASP14, we’re getting close to like 60s and 70s, which is where AlphaFold 2 sits. And all you need to know is they got cocky enough that for CASP15, AlphaFold 3 did not present, but more than half of the people who were in the top used AlphaFold 2. So they kind of just took over and were like, you guys got this now. Let’s see. And this is, oh gosh, this next one. So now AlphaFold 3. When I said the architecture was gross and it could be a whole series, I wasn’t really kidding. So at least in terms of what they are trying to solve, let me see if I can get towards the front so I can actually see, this is when you’re inputting what you’re looking for.
auditorium@hudsonalpha.org: They’re going to do a genetic search as the first bit, and that’s down to sequence. So if I put in a DNA strand, how does that DNA strand align? If I put in a protein, where does that protein fall into alignment? Is it close to something that’s already known? Yeah. Oh, yeah. Sorry, you looked like you had something for them. So, there’s the genetic search, and they kind of use that as a protein search at the same time. They say genetic, but it’s proteomic and genomic at the same time. There’s the conformer generation, and that’s going to generate a slew of shapes to actually ask, what could this possibly form? And then there’s the template search: from parts of our training data, we know there are certain templates that this can already fit. Is it already going to be a helix? Is it going to be closer to a flat sheet? Is there going to be something that’s a globular structure kind of in between?
auditorium@hudsonalpha.org: And all of these pieces feed back into embeddings, into the template module, into the multiple sequence alignment module. And then this was the fun part: the Pairformer, or the Evoformer, towards the center, that is effectively AlphaFold 2. They might have made some minor changes, but they effectively just made their whole previous model one little chunk of what the whole thing actually is. And then you get into diffusion models. And then, and we’ll talk about that in a second, they were having multiple issues where it would predict the structure, but then it would have two atoms or two molecules crossing each other in the same 3D space, which is not possible. So in their version of that, that’s a hallucination. So they’d automatically throw those out. They were looking for collisions. There’s also an effect down at this level called stereochemistry, and it’s whether or not a molecule is left-handed or right-handed. And a good example of this is L-DOPA: left-handed dopa is in our systems.
auditorium@hudsonalpha.org: If you give us right-handed dopa, nothing or horrible things happen because our body can’t use it. So there’s very specific chemistry, but it would randomly hallucinate right-handed, or dextro, chirality with some of these molecules. It didn’t really make sense. So they have this confidence module where they’re starting to knock down and filter all of those pieces out, but some still get through before they generate their confidence. So it’s a whole ordeal to be able to go through this. But that’s basically what it’s trying to do: it’s not just looking at the protein sequence itself. It’s asking the question, have I seen this before at another genome level or proteome level? Have I seen this, or portions of this, in other templates that are already available? Out of all of the generated shapes or attempted structures, have I already seen this before? And then all of that passes through all of these blocks and modules as it goes. The fact that I can have a job complete in a few minutes using their free version is kind of terrifying, because I don’t know what’s on the back end of this, or what’s running the back end of this.
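The collision problem described above, two atoms predicted in the same 3D space, can be illustrated with a simple distance check. This is only a sketch of the idea, not AlphaFold 3’s actual clash handling: the one-angstrom threshold, the atom names, and the coordinates are all assumptions, and a real check would use element-specific radii and skip atoms that are bonded to each other.

```python
import math
from itertools import combinations


def find_clashes(atoms, min_distance=1.0):
    """Return name pairs for atoms closer together than min_distance (angstroms).

    atoms is a list of (name, x, y, z) tuples. The default threshold is an
    illustrative assumption, not a published cutoff.
    """
    clashes = []
    for atom_a, atom_b in combinations(atoms, 2):
        name_a, *position_a = atom_a
        name_b, *position_b = atom_b
        if math.dist(position_a, position_b) < min_distance:
            clashes.append((name_a, name_b))
    return clashes


# Hypothetical coordinates: the first two atoms sit almost on top of each other.
toy_atoms = [
    ("CA1", 0.0, 0.0, 0.0),
    ("CA2", 0.5, 0.0, 0.0),
    ("CA3", 5.0, 0.0, 0.0),
]
print(find_clashes(toy_atoms))
```

Checking every pair like this scales with the square of the number of atoms, which is fine for a toy example but would need a spatial index for a full structure.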
auditorium@hudsonalpha.org: So yeah, so it says there the confidence module has four blocks, and those are DIT blocks. That’s not a deterministic sort of checker. It’s a part of the attention heads or something like that. It’s in the model, so yeah. Okay, that’s interesting. So for the left side here, this genetic search thing: are you familiar with the concept of RAG, retrieval-augmented generation? It sounds like that’s basically what it’s doing on a database. Very much. Interesting. Okay. Because they’re embedding that and then doing some sort of pre-simulation and then feeding that into AlphaFold 2, essentially. More or less. Yeah. Okay. There are other components that came in. It’s almost like they took a hero’s rest after CASP14, because they had a lot of other projects that had public papers. It was like Alpha Fold, Emboss, AlphaFold, something else, and they had a bunch of little products, and then they just sort of funneled everything back into three and were like, this is the thing.
auditorium@hudsonalpha.org: Um, I’ve got a timeline saved somewhere of all of their little products that they put out in between. They had different research groups and different teams trying to tackle other aspects, and then when they finally got everything resolved it all got wrapped up under three. This is really fun. So we just did JEPA, and they did something very similar. They did robotics, but they basically have JEPA 1, or V-JEPA 1, which was just one part of the model, and they had like five other experiments. They just jammed them all together, and that was JEPA 2. So, it’s interesting. I think that’s what made me ping Alexa. We were talking about these world models and I was like, but what about AlphaFold? And that came up, and this might be similar. You might have covered it and my language might be off, but for the conformer generation, is there enough confidence in what AlphaFold is generating that when it produces a new protein, that gets fed back into the conformer for future use?
auditorium@hudsonalpha.org: Yes and no. And this kind of gets into some of my own tinkering with it. And I’m saying this like I have the money and the time and the computing ability. I think this is okay where it’s at, because there are some signs, in just me poking around in it ignorantly, that there’s still a lot of work to be done. But that’s not knocking it. This is incredible, and they’ve done a lot; quite literally I could spend the rest of my life reading the research that most of these people have done. But same as anything else, it’ll get you 85% of the way there. I almost fell into the weeds of some of these papers and I was like, nope, I have a bad day job. I think what you’re saying, I mean, what’s been really interesting here is that there’s lots of verifiable stuff here, especially the physics stuff, and I’m not seeing anything to do with RL, reinforcement learning, sort of things here. And so if they’re able to get that loop going, I’m sure AlphaFold 4 exists somewhere. I mean, it’s very easy to see what that would look like. Yeah, that’s when the robots are stirring and freezing. Let’s see.
auditorium@hudsonalpha.org: I mentioned these sort of briefly. These are some of the caveats. There’s a paper that was in the GitHub, saying it’s their AlphaFold 3 paper. You can get through the whole thing, and then right towards the end they mention these beautiful little caveats, like you do in a paper: right towards the end you go, hey, it’s really, really great, but, and no one will read that far. For chirality they assigned a penalty, or they added a penalty to it, so that may be part of the only reinforcement learning portion of this: hey, you kind of sucked at that one, don’t do it again. So there was a penalty for chirality issues, there was a penalty for collisions, for them existing in the same 3D space, but then they mentioned they can only do static structures. And that’s the other part of studying protein structures and trying to figure out how they work and how all of these pieces move.
auditorium@hudsonalpha.org: This kind of goes back to that flagellar motor in that first bit. This is still another great example: there’s a conformational, or shape, change that occurs for that rotation, for that tail to twirl, or for that flagellar protein to actually move or cause movement. Likewise for heme molecules, I can’t remember exactly how it goes, but there’s open and closed. And when, I believe, it’s open, there’s a chance for it to bind oxygen. And when oxygen actually lands, and it’s pure chemistry and physics, the molecule will ever so slightly close. And that’s actually when it’s holding oxygen, or it’s oxygenated hemoglobin. And that becomes the difference in, you know, you hear that classic thing, it’s not quite blue blood to red blood, but there is a shift in the color spectrum of blood in relation to whether it’s heavily oxygenated or not. It’s just because there’s so many proteins that are just going, and their refraction with light actually changes. It’s cool. But they admit they can only do static.
auditorium@hudsonalpha.org: So if it is something that moves, at least for AlphaFold 3 right now, they cannot do that great. And this goes to the open versus closed: they’re going to model it however they can, as close as they can. So in their training data, which they admit to using PDB, or Protein Data Bank, for, if there are only structures that are studied in the closed position, then everything that it’s going to predict is going to be in the closed position. Or likewise, if there’s a ton in the closed position and then there’s one in the open position, it’s going to have a bias toward predicting models in the closed position rather than the open one, or vice versa. It just depends. So they admit that these things happen and they’re not quite there yet. And these are dynamic and moving. Proteins are really, really cool and also kind of terrifying when you think about the scale. So you do have to do a wonderful little, this is the only time I’ve ever had to do this.
auditorium@hudsonalpha.org: It was kind of creepy. I signed on to alphafoldserver.com and you can actually pull in your Google credentials, but then it’ll walk you through: you can only do these things, you can only have access to these things, how to edit it. You have to skim a whole bunch of stuff and disclaimers. And then you click okay and you’re signed in. You’re fine. And mine is a free account. They allow up to 5,000 tokens, and then they have a little link so you can figure out what a token is for them. And then you can do up to 30 jobs a day, and it resets in a 24-hour period. So it’s pretty darn good for something that’s free. I’ll show you guys this in a second for the live demo version. This is one of the freebies that they have, and this is a known structure that’s on the Protein Data Bank. It’s a double helix of DNA, and then it’s the interaction of the protein that’s with it, and then there’s a couple of ions that are actually in there.
auditorium@hudsonalpha.org: And that’s where this gets interesting: it’s unlike AlphaFold 1, which just focused on the structure itself. AlphaFold 2, I don’t believe they actually accounted for having multiple interactions. And now by the time we’re getting into three, we’re starting to get into this territory of, if there’s an ion that needs to bind there, or if there’s another protein that’s involved with this, can I actually model that in there too? And to some degree, yes. So they’re adding more and more. This was just kind of showing you what that input looks like. They’ve got it gated so you can select a protein, RNA, or DNA, or you select the ion, and they only have a short list of things. Then you can preview the job. You click go and then you sign away one of your 30 jobs for the day. This is a good example, just tentatively. I’ll do the demo after this to just kind of show what this looks like. I just wanted the screenshots.
auditorium@hudsonalpha.org: The one on the left is the Protein Data Bank version of the same thing that they were able to generate in AlphaFold. So this is their prediction for what that is. Actually, let me grab the slide and I can emphasize that, because you can’t really see it right here. There we go. Make this guy super big. So part of their confidence is included in the rankings across the top, right? So the color of the model that they render it in is the relative confidence that they have for those different sections of the protein that are in there. So everything that’s in here is very, very high confidence, leaning towards that blue. But then at the very top, and they’ve got an explanation for this and we can walk through it, they’ve got a couple of different confidence metrics, the ipTM and the pTM. Basically, anything that’s above 0.8, they’re almost completely guaranteed to be correct. So the fact that this is sitting at 0.94 and 0.93 means I am more than absolutely certain I am correct.
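The rule of thumb above, that ipTM or pTM values over 0.8 indicate a confident prediction, can be written as a tiny helper. The 0.8 cutoff comes from the talk; the function name and the label wording are hypothetical, and a score on a 0-to-1 scale is an indication of confidence rather than a literal guarantee of correctness.

```python
def confidence_label(score, threshold=0.8):
    """Label an ipTM or pTM score (0 to 1) using the 0.8 rule of thumb from the talk."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("ipTM and pTM scores are expected to lie between 0 and 1")
    return "high confidence" if score >= threshold else "treat with caution"


# The two scores read off the slide for the example structure.
for name, value in (("first score", 0.94), ("second score", 0.93)):
    print(f"{name} = {value}: {confidence_label(value)}")
```

A lower score is a prompt to look harder at that prediction, or at the interface between chains in the case of ipTM, before drawing conclusions from it.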
auditorium@hudsonalpha.org: So, that’s a fun little metric there. Uh, let’s see. Well, all right. Instead of moving, I’m just going to delete it. Um, but this also goes back to the research paper that I did. This is actually talking about, I say I did. It was a whole group of us because we figured it out and we bribed our professor to let us do it pretty big. There was no money. Um, so this patient had an immunodeficiency and they had no answers. They were like, I’m just going to throw my data out there. We’ll see what happens. There’s a specific gene, RAG2, and it’s part of what generates different receptors or different immune functionalities. One of the things about enzymes, and not all proteins are enzymes but all enzymes are proteins, but most enzymes need a cofactor or something to kind of help them along to make that energy of the reaction easier. In this case, it was zinc. And we went into their sequence and we’re like, “Okay, let’s look at this one gene where we know it has to have zinc fit into a very specific little molecular pocket in order for this whole reaction to work.” And in the original,
auditorium@hudsonalpha.org: in what would be a healthy, I hate using that term because you don’t know that, but healthy version of that gene, the connection is tight. The zinc in the middle, which is that wonderful little gray orb, is nicely held by all of these different bonds. So it’s a very secure hold. But then when we took the same sequence and it’s the cysteine 478, that’s just a position, to tyrosine versus zinc. I just recreated what we did, and it took us months, uh, like back in 2020 to do this. But you can tell that the pocket or the little cage where the zinc is supposed to sit is inherently less stable. So the theory would be maybe possibly, and there’s no way to prove it, that that would be part of why they may be immunodeficient. That may not be the whole reason, and you can’t diagnose using something like this. It’s just, hey, maybe this might be what is actually the problem. But you’ll notice that previous one, the scores on the top were 0.93, 0.94.
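Editor’s sketch of the kind of edit described above: swapping one residue at a numbered position (here a cysteine to a tyrosine) before resubmitting the sequence for prediction. The function name and the toy sequence are hypothetical; the real sequence from the talk is not reproduced, so the example uses position 4 of a made-up peptide rather than position 478.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a single substitution written like 'C478Y' (1-based position).

    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the expected one.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    index = pos - 1  # convert 1-based residue numbering to a 0-based index
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[index] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[index]}"
        )
    return sequence[:index] + alt + sequence[index + 1:]


toy_sequence = "MKTCLLAC"  # made-up peptide with a cysteine at position 4
mutant = apply_substitution(toy_sequence, "C4Y")
print(mutant)
```

Checking the reference residue first guards against off-by-one numbering mistakes, which are easy to make when a position comes from a paper or a variant report.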
auditorium@hudsonalpha.org: When I run these: 0.6, 0.33, 0.42, 0.43, I’m guessing. So when I’m trying to throw something random at AlphaFold that it probably hasn’t seen before, they get shaky really quick. And that’s kind of where I’m at with they’re good, but they’re really good with their training data, which is kind of showing pretty well. Do you think they’re overfitting? I don’t want to accuse people that work for Google. Keep in mind, but maybe there might be, no way to back that out, but even just for this one test, and that may be like confirmation bias on my part, it’s like I was really surprised that the scores were that low. That 0.6 means you’re pretty darn confident that it’s there. So at least with the original, and that goes back to it though, the original does have a model in Protein Data Bank. So there may be a bit of overfitting, but it’s very, very hard to tell if that’s the exact case.
auditorium@hudsonalpha.org: But I I don’t know what this would look like if you were to produce, let’s say, we’ve never crystallized this particular protein, but we know it’s sequenced. I don’t know how well this would do at all if no one’s ever touched it. It seems like because I mean diffusion models have this problem anyways uh if if they don’t have some sort of RL objective to kind of push it past what it data is. But it seems like especially too if they have that confidence head that they’re almost double bounding themselves to structures that it is able to be confident about even if if it’s not because it’s not that deterministic outside checker. It’s it’s it’s bound to what it was able to learn to be confident about within its data distribution. This double thing almost. Yeah. And and I’m I’m not like there’s a lot of potential here, but that’s even just playing with it. And that that’s the good thing about like coming into something that you’re only lukewarm about is by just going but what if um I can already start to see that there might be some shakiness and some overfitting and or like you said the double bond double bindings for that.
auditorium@hudsonalpha.org: is just if you know this structure, if you’ve seen this structure before and you’ve seen it in your training data, then you’re kind of adding that layer of if it’s known, we’ve already done this. Well, I think it’s almost like their confidence level isn’t necessarily that they’re confident that it’s correct. They’re confident that it has some match to a thing they’ve seen before. Yeah. So, it’s almost like, yeah, they know it’s a valid thing because it’s part of their training. They’ll tell you they’re confident it’s valid. because I got enough of a curve over too. This almost makes me think of, I mean, going into like Genie and like Veo and all that sort of stuff, because you’re talking about the ability for the thing to move, especially too, you know, if you kind of tie in some of the world model robotics stuff, is that lets you do some more interesting post-checking things that might let it get out of distribution. I think that’s, if it’s not able to generate new structures, it’s a part of the trick, right? For the most part, unless I’m, you know. Yeah, no, I mean, I agree, but there’s also this giant asterisk of I don’t want to be mean to the people that have lots of money and coals repeating
auditorium@hudsonalpha.org: infrastructure but yeah there’s there’s problems that was part of my before before she reached out and pinged me for this I did want to like I am a natural naturally skeptical person. I can be a bit of a jerk for new technologies because I’m I’m constantly sitting there going but what if but why? But who? You and I’m I’m very negative initially for most things. Uh it took me forever even as someone who is in AI to start using a chatbot regularly. But that’s mostly just like hey I’ve designed this code. How would you as a faux software engineer design this code? And then if it backs me up I’m like okay cool I got it. So, like there there are parts in here that I’m very negative about. But with this, I’m like I was lukewarm about Alpha Fold 3 where it was at because I’m like, “Okay, they’ve got only so much protein structure data that’s out there.” And this is kind of backed up the the whole thing of cool, you’ve you’ve gotten really good at making a tool that can render what we already have.
auditorium@hudsonalpha.org: part of that, um, since it’s training on pretty much all of the protein structure data in existence, kind of like you mentioned, it’s going to be biased towards things that are physically able to be resolved with the other tools that we have. Yeah. Things that crystallize well and things that are biologically interesting for some reason for someone to go through all the trouble of actually studying. And then that goes back to that insane little factoid I’m having issues with of we’ve done 0.85% of all known structures. So, we’ve got this little slice hammered out and then there’s the rest of the room to figure out. That could be one of the reasons why, you know, you got the block of AlphaFold 2 and then they’ve kind of expanded things for AlphaFold 3 and stuff. They may have gotten to the point where doing more on the modeling side before you go back and start beefing out the rest of that data set. Well, they’ve got to find some, they’ve got, it’s an environment problem.
auditorium@hudsonalpha.org: They’ve got to find some way to take these random structures that they aren’t confident about and go put it in an environment to see if it’s interesting, right? I don’t know, a lab or something like that. I mean, but that’s what makes the video stuff work, because they can validate that stuff in some way. Uh, maybe part buried in the terms of service of me agreeing to like throw my stuff. Oh, sure. Exactly. Right. Congratulations. Yeah. No, a little bit. If it’s a free tool, you’re the product. But yeah, so there’s scary though like a wet lab of something where they’re just trying these things. So when you go in and you can see this reset for the day, but you can set up all these different jobs. These were actually some of the ones that I rendered for this. Um, if you wanted to add things on the fly, you can add a protein, you can do DNA, you can do a ligand.
auditorium@hudsonalpha.org: I don’t really want to get into the weeds of that right now, but we’ll just use, let’s do one of, uh, basically whatever would bind to a receptor. So like it could be another protein, it could be, uh, not a protein at all. Like nicotine as a molecule is a ligand for nicotinic receptors, or could be a ligand for a nicotinic receptor. There’s sort of, it gets into that chemistry of how things fit together in 3D space, but there are some ligands that bind really, really well. There’s some that bind really, really poorly. Um, carbon monoxide poisoning is one of those cases because carbon monoxide binds really, really, really well and preferentially to hemoglobin over oxygen. So, if there’s carbon monoxide, it will actually bind to that heme first before the oxygen does. Um, but this is just one of theirs. RNA, couple of protein strands. I can continue and preview. I didn’t set the seed because they don’t let me do that. This job’s already been done because they’ve already run it somewhere.
auditorium@hudsonalpha.org: But then the cool thing about this is you can see it’s kind of shaky in terms of confidence. I mean, it’s still over 90. But then when we get out to very, very low, it’s these tail ends over here where it’s not sure. Go ahead and bring this guy up. But then if I wanted to do all kinds of superficial adjustments like the adult I am, please spin it because I can start playing all these games and kind of see it. Uh the spin doesn’t really help when you’re trying to take screenshots. Um but it’s all mouse oriented. This is based on previous molecular tools that are already out. Yeah, you might have mentioned it, but are any any labs or schools saying please send us whatever you run through this and making a collection of all that people have done? Are there any crowdsourced like here’s the 1 million proteins we’d really like people to run through here for us? I do not know if there is, but I also wonder I do wonder and this kind of goes back to that not being mean to Google, but I do wonder if they have buried in this thing and you can’t do that.
auditorium@hudsonalpha.org: Yeah. Well, you have to approach Google to be the lab to get the stuff. And I’d assume like we mentioned anything that’s free is, you know, here’s here’s helping decide what we’re going to run and giving us more data, but just it just seemed like an obvious like, okay, if there’s 200 million of these we need to find, somebody probably has a list of, hey, these 50,000 would be really nice to have. And didn’t know if people were working together to run through it and a wanted poster. Yeah. Yeah. I thought you were looking for like a midjourney community version the latest and greatest and who came up with what and I’m feeling like this protein today. So many of the earlier solutions before alpha fold were gamified and you know people could play games and get points and contribute computing power towards helping protein fold. So there’s plenty of that in astronomy. Thanks specifically she was talking. Yeah, I believe they finally have enough infrastructure, but I know very early on it was one of those where they did ask people to contribute their computers when they weren’t using them for the very early version.
auditorium@hudsonalpha.org: You can kind of see where the spike portion comes from, or the name, because it does have a spike-like appearance when you’re looking at it at this point. Um, but that’s actually the point that would fit into, in some cases, the ACE, I believe it’s ACE2 receptor on certain red blood cells or certain cells, which is why you end up seeing a lot of cardiovascular and pulmonary problems in some of those, especially early on, is because we found that this was, oh, it’s hitting ACE2 receptors. We see all of these cardiovascular problems. And it’s just because for coronaviruses, not just the COVID that we all know, it’s a whole family of viruses or whole genera, I don’t know, whole group of viruses, they all have this spike protein. It just so happens that through enough iterations, through enough exposure, you get a really good version of it and it’s really good at what it does, unfortunately. But we were able to get that with pretty high confidence that that’s what the overall structure looks like. What I’m noticing on both those proteins you’ve shown is that the furthest point with the longest torque has the lowest confidence.
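Editor’s sketch of the observation above, that the ends of a chain tend to carry the lowest confidence. It assumes per-residue confidence values on a 0 to 100 scale (pLDDT) and the commonly published color bands (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low); whether the viewer in the demo uses exactly these bands is an assumption. The function names and the score list are made up for illustration.

```python
def plddt_band(score: float) -> str:
    """Bucket a per-residue confidence score (0-100) into a named band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_tails(scores, cutoff=50.0):
    """Count consecutive residues below `cutoff` at each end of the chain.

    Returns (n_terminal_count, c_terminal_count). If every residue is below
    the cutoff, all of them are attributed to the N-terminal count.
    """
    n_tail = 0
    while n_tail < len(scores) and scores[n_tail] < cutoff:
        n_tail += 1
    c_tail = 0
    while c_tail < len(scores) - n_tail and scores[len(scores) - 1 - c_tail] < cutoff:
        c_tail += 1
    return n_tail, c_tail


# Made-up scores: a well-predicted core with floppy ends.
example_scores = [35, 42, 88, 93, 95, 91, 60, 44, 30]
print([plddt_band(s) for s in example_scores])
print(low_confidence_tails(example_scores))
```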
auditorium@hudsonalpha.org: It’s almost Yeah. And that gets into and that’s something that can be covered if we do this again. There’s a lot here. Um when you’re looking at actually I’ll back up here. Let’s just zoom way the heck in on this guy. Those tails sometimes will go into a membrane or sometimes be an anchor point somewhere or it may have a very loose sort of shape that allows it to act like a hinge when it’s sitting against something else. So it may not necessarily have a defined structure if there’s not the molecule or the other protein that it’s supposed to be attached to. But and this this gets into the like the nitty-gritty of some of this. We do know that helical structures are very very stable when they’re trying to sit into a cell membrane. And that’s something just because of the nature of the chemistry outside, the nature of the chemistry inside that that can actually puncture a cell wall and sit in that membrane very very well. Um, actually just for fun nerdy bits there, actually got a cell wall anchor sort of tattooed on the on the wrist there.
auditorium@hudsonalpha.org: I didn’t want a nautical anchor. It seemed too tacky, but this one’s just nerdy enough. But that also goes back to one of the most represented receptors is a GPCR. Um, there are these big, there’s several helical pieces. Um, and you can actually find a model on Protein Data Bank. Let me see if they still have their, they have a really cool paper version of it actually. Let me see. It should be under there. Learn, actually, curricula, educational resources. Yeah, there we go. Um, you can actually print out this beautiful little paper model, but when I tried doing this, you really need good Scotch tape and good things to like flatten edges. But these tails that are here, you may get low confidence when you’re trying to predict what that tail is exactly doing in 3D space. But if this structure is known and you know that these guys will sit against one another, then you’re good. But if there’s another molecule that needs to dovetail into where that tail sits or if there’s another piece that’s sitting there, that’s just a giant question for the model because AlphaFold can do some of the ligands and some of that stuff together.
auditorium@hudsonalpha.org: But when I’m trying to do big complexes like that flagellar motor, no dice, not yet anyways. So you can, it’s kind of wobbly on the edges. It’s like losing a context window on an LLM. It’s like, I guess it’s a different kind of question. Um, I know like on, um, so like LLMs will do, um, you got a concept of alignment where we can teach this thing do this but don’t do that, you know, don’t answer these kinds of questions. These are bad, you know, because people have used them to do things. You know, you’re not supposed to be able to get ChatGPT to tell you how to make a bomb, blah blah blah. If you were to role play like you’re getting out of jail. Yes. Are there potential misuses of AlphaFold 3? Um, especially for designing or, you know, I mean you’d have to be way down in the weeds of, you know, yeah, you can design, but being effective and being able to build the thing, you know. Unfortunately, yes. Um, but likewise there were good hopeful stories and intentionality for people that actually know. There was a gentleman, and he did not use AlphaFold 3, but it’s kind of the similar parallel. The big story was guy uses ChatGPT to fix his dog’s cancer.
auditorium@hudsonalpha.org: That really isn’t what happened. He asked ChatGPT to walk him through the process of designing a custom drug for the cancer and it walked him through, but he paid the bill for the genetic sequencing of the tumor and for the healthy tissue and he effectively designed a drug to target the receptor. But that, if like everything that’s tied to that, but that designing to fit a receptor, you could potentially fact check part of it, not necessarily receptor and ligand, not very well, you could do that. But there may also be another version, like this is the free happy fun version for people that are cheap. There may be a version, I haven’t checked, that AlphaFold 3 will work with like a pharmaceutical company to do this process. I don’t even know what that price tag is. That was my concern. That’s why whenever I started saying, you know, you’d have to unbound it and connect it to the environment. If I have a bad RL run and I fail a math test and it creates something bad, it’s like, oh well.
auditorium@hudsonalpha.org: But I mean, there’s a lot of big oopses that can happen if you just have a wet lab somewhere with this thing making stuff and one of those things maybe we don’t want that made, you know. Yes. And that’s that’s a very quick for those that are familiar that’s a very quick run from a BSL2 lab to a BSL4 lab. Yeah. Some of those BSL4 labs are classified and no one knows where those are. Um that’s where big scary stuff. But yeah, you you could whether or not you would be all that great or whether or not it’s something that you could take and run to production or have the lab facilities, the ability to do it. Right now, they wouldn’t have confidence level high enough to know whether it was bad or good or, you know, Yeah. unless they unless they modeled the things that they didn’t want people to be able to replicate. Yeah. And I I don’t know what kind of upper bounds they have for that. I I get very nervous for trying it.
auditorium@hudsonalpha.org: Um. Yeah. Where’s the biological red team, from a cyber perspective, on, you know, just going and trying to? I didn’t know I was trying. But I don’t think they’re fun right now. Glad I wasn’t the only one thinking it. There’s an element that it could happen, but then at the same time, this really gets into the weeds of I have to know what I’m targeting. I have to have the right infrastructure. I have to have all of these things. And even then, if I get something that’s pretty close and could sort of fit into that receptor, sort of hit that, that might just be good enough. Yeah. Unfortunately, right? But it’s the kind of thing that your normal, not nation states, wouldn’t have the capacity to go, okay, I found this, let me go build this out or whatever, versus, you know, stuff you would find on, you know, like an LLM or whatever. It’s very hard to really, uh, align diffusion models. Really, the way you align diffusion models is that you don’t include stuff in its training data. Like for real, that’s the thing, because it’s a very hard thing to align.
auditorium@hudsonalpha.org: Okay. Yeah. And it also comes down to the ethics of the person, and you get into like a different sort of philosophical moral debate of if I’m designing something and I’m researching, and this kind of goes, it’s the same logic with an LLM. If I’m looking at viral structures because I, at the other end of the machine, am trying to understand why a virus or a bacterium is so effective or why this is a problem, I’m doing it for this reason. If you block me because I’m looking at a viral protein, where is the boundary? And that becomes a whole other thing. And that’s kind of like, I mean, you know, asking an LLM explicitly to roleplay about making a bomb is a very different thing than like, hey, I’m trying to solve world hunger, because, I mean, that gets very specific because you’re defining the language that you’re using, right? But yeah, like with this, it’s very much an open-ended question of I could do that, but do I have the infrastructure, the ability to test?
auditorium@hudsonalpha.org: Do I have animal models that I could inflict this on? Would the animal models do the same thing as a human model? Right. And that’s that’s heck of a lot of money, right? So the gap is wide enough that the risk is low probably for an AI related timeline. So a lot of these technologies and even with certain things that I’m seeing uh in terms of autonomous whatevers, you can like add a noun on the end. Um, I always have this feeling of the metaphor of Icarus, like at some point we’re going to fly too high and figure out the hard way. Whereas younger people would tend to say, “Fo.” Yeah. Yeah, that’s that’s the general high level we’ve already run over, but I’m really glad you all had the questions. No, this was great. Uh, seriously appreciate the, you know, the talk. um we don’t cover a lot of genomic stuff even though we’re sitting in Hudson Alpha every other week. Um so this was pretty cool to to tie the two together.
auditorium@hudsonalpha.org: And this is I will give them this out of all this. There’s a lot of negative that was coming out that when we did our sort of like grad school level work, it was I used this random tool that had a free license. So, I used this web app that sort of worked. This platform had something. This one had something. And I had to find all of these different pieces and put them together. With this, if I just put it put in a protein sequence, I’ve got the model. I’ve got that. I know it’s already tried to do genomic matching to a degree. It’s kind of nebulous, but it’s already done a lot of that in one platform, in one space that I would have had to even five years ago. Oh god, it’s been six years now. Six years now. Have to run to like all of these different platforms, hope, pray, and then realize that people were still writing things in code that makes it look out of 1994. So there’s there’s really good benefits there.
auditorium@hudsonalpha.org: Yeah, biology does a lot of cool stuff, but a lot of our tools are need. Any other questions before we close out? This was great. Thank you. Appreciate it. All right. Thanks, Jay. We’re gonna close out the meeting. Jay, you did give me an idea. If you want a more hallucinated version, mourn can’t take the molecule of the month into a video for you. Can you mix that up in her? Oh, yeah. Already finished. Cool. Oh, that’s pretty confident. Yeah, I’m pretty confident. Hold on. Let me grab the sequence. Did I already lose it? No, I don’t. There we go. Structure summary. So, okay. Where did you go? There you are. If I could find the orientation that you’re supposed to be in. Oh, that’s a good question. After you bottle this, how can you tell it’s bad or good or what?
auditorium@hudsonalpha.org: It’s, yeah, I mean that’s what you’re kind of hoping, is just like to put in, it was like spike. How did you know that was a bad spike? Because I know what I was studying. The matrix pop up all the letters and I’m like, you’re like, well, it doesn’t have a U. So this actually, so it’s that obvious to you. Yes. There we go. I don’t know why they felt the need to do this, but the Wellcome Institute printed out, as of I think it was like the 2003 version, that is the entire human genome per chromosome at font size four. Yeah, they’ve got that bookshelf on the fourth floor, too. Nice. Yeah. I’ve always thought about how cool that would be, but then the reality of size four font is disgusting. I’m sure it’s funny shades that it was correct. So fun thing about that, the human genome, the first one that we came out with, and this is another bias and another problem and another resource thing.
auditorium@hudsonalpha.org: The first human the first model of the human genome was basically what we agreed upon from the sequencing. And this was a very gnarly gnarly brutal technique of like short little like 175 length sequences just laid over top and sort of assembled together into what we think is the same for right. I think it was like a dozen people. He slapped all their data together and goes that’s the and we were first like a dozen people but it’s mostly one person. It wasn’t supposed to be but that’s how it ended up being like yeah I think 80% from one donor. The fun little factoid and I don’t know if he has this listed anywhere. It’s not in his bios. Hilariously enough it’s an unspoken thing. Mark Davis at at I think he’s at Stanford. He’s at some university out in California. He routinely just is like here’s a sample, draw some blood. They do that. Like the guy is the most sequenced human on the planet. It’s just because he’s old.
auditorium@hudsonalpha.org: He has access to a lab. But it it’s so insane is because you’ll see that stuff out there and his we have no idea what’s his or what his isn’t his. But it there’s a pretty high chance for any research that comes out of his lab that if there’s a genetic sample randomly, it probably came from him. Like they they unless they had a patient to focus on it probably came from actually. I think he’s at Stanford. Yeah, Stanford immunology. It’s just the most random thing ever. And I only learned that because he was he was friends with my former boss and they were talking about that. It was like some type of abstract conversation over dinner. It’s like I just get sequenced all the time. Whatever. Casually brag, but like it’s probably 20 years when we’re generating stuff and it just happens to look like, you know, the lamb the lamb double. That’s interesting. This is actually where we got one. Let’s see if they still got it.
auditorium@hudsonalpha.org: If you ever want a fun one, just poke around with free data that’s out there. These people have never learned how to redact personal information. So, you are going to find names, you’re going to find birthdays, you’re going to find all kinds of things, which is kind of horrifying. Um, but there’s their genetic data. if you ever want something. Some of them have filled out stuff about family history. Some of them are just like, I don’t care. Another fun resource. I was in here, but he he was in here somewhere and we actually pulled out his actually. I don’t know why they have this other than just here we go. There’s a good premise for a murder mystery is using alpha to develop a She personalized drug attack on this list. If you can do that, you can make so much money. You just solved all the diseases. So, she uploaded her ancestry. We’ve got somebody who did 23. Uh, okay. We’re just going to put in your glasses and contacts records for some reason. Apparently, he’s just that might just be what it is. There’s all kinds of fun like I need genome data. Somebody’s probably flung it out sometime. That’s pretty. Yeah, that that’s just kind of fun stuff. It’s a whole new level of impersonation. Oh, that’s crucial. There’s certain taco time. Thanks for letting me in. Also, you’re welcome. in my phone number. But I didn’t one night and I got home and I’m like, “Oh, shoot. My
Transcription ended after 01:13:49
This editable transcript was computer generated and might contain errors. People can also change the text after it was created.

