AI in Healthcare

Transcription provided by Huntsville AI Transcribe

So welcome to the left side of the room because we all decided to go this direction. This week we’ve got Jacqueline talking to us about AI and healthcare.

There’s a lot about that and it’s been helpful over the last about a year and a half. I bounced a lot of things off of her going to this doctoral intensive thing for the College of Nursing earlier this year. It was all, hey, is this a thing? And it’s like, yes, that’s a thing. And then I think you put together actually a document and shipped it my way from, yeah, here’s some things going on right now to talk about. So it was super, super helpful. Always fun when, you know, AI just don’t know the domain.

It’s always a scary place. It’s a very, very broad domain. Yeah, they knew a little bit. They knew some things. Yeah. So Jacqueline, Daydoy.

Hi, everyone.

So I titled this AI and Healthcare: Challenges and Opportunities, but it’s going to be a small subset, mostly based on either what I have experience with or what I see in my general network, people I know, stuff I’m interested in. There’s a lot more than just this going on. I’m putting my about me up here even though I know some of you all know this because it’s really instrumental in understanding the rest of the talk and why I chose what I chose to include. My background is in statistics and though we love unbiased estimators, I’m a person and I’m not unbiased. So you’re going to get longitudinal analytics for healthcare marketing data, which is what I work on every day; patient education analytics, because that’s what I used to do; and longitudinal predictive modeling for Alzheimer’s disease. You’re going to hear more about aging probably than any other area of medical research because I spent like eight years of my life doing that. And then before that, my bachelor’s degree is in agricultural engineering. And yeah, long story. I’m allergic to pollen. I needed an inside job. I like statistics. I went that route.

But it informs a lot of how I approach problem solving, and in general access and what I care about. So while I’m not going to reference agricultural or environmental health directly in this, know that it plays a massively important part in healthcare research. Okay, so overall, I’m going to start off by talking about unique challenges in healthcare data and a little bit of background. And then I’m going to look at these different application areas: biomedical research and AI, AI for clinicians, patient education, which is much more about AI for the patient and how it can help, and then AI and healthcare marketing, because this is what I spend all of my time thinking about now at work. Okay. Starting off.

What is a hospital?

And for extra credit: who gets to define what a hospital is? Anybody want to take a guess? I guess the government defines what a hospital is. Correct.

It’s one very specific branch of the government. And I looked for a specific definition of a hospital and it’s really, really long and I didn’t want to put it up. So the answer is the Centers for Medicare & Medicaid Services, CMS. They control about a quarter of the entire U.S. budget, and most of it is money they are given that goes right back out to pay for our healthcare. The real definition of a hospital in practical use is anybody that takes Medicare. So any health system that takes Medicare meets the requirements for a hospital. I’ll also preface this by saying it’s not my area of expertise. A lot of this is based on a talk I attended in 2023 about the federal government. So, quick question. I helped run a transportation company before and we got to where we could take Medicare for medical transports. Can I call the owner of that company and be like, hey, technically you’re a hospital? Was it direct from Medicare or were you reimbursed through someone else? No, we were direct. Interesting.

My guess is it comes from a secondary budget, but yeah, not an expert. Yeah, so I’ll still call him and say, hey, you were a hospital for a while.

I feel like somebody, does anybody have an open mic?

If you’re online, could you all mute, please? I don’t know if there’s a way to tell.

I don’t know.

I’ll say this use of this meme here I stole directly from a talk from another statistician about healthcare data because it cracked me up both when I saw it live and when I reviewed all this information again. I was trying to call up a better quality picture, but I can’t do it from here. Let me try here. Bill, if you can hear me, if you wouldn’t mind muting, please. It’s probably a browser with two phones going on.

Yeah, I can find the right.

Yes, technical difficulties.

There we go.

Okay, now let’s… We have figured it out. Okay, we’re back. So, trying to untangle the web of healthcare data is a little bit of a nightmare.

It is all connected but incredibly siloed. At the federal government level, in individual health systems, within research institutes, and just about anywhere that’s using this data, it is partitioned in a way that makes it really hard to access, for a lot of reasons. Some of it is that they don’t have the resources to devote to building a centralized system, and some of it is for privacy purposes.

The number one most important thing to remember in healthcare is that your data collection cannot come at the expense of the patient experience or add to clinical burden. What this meant when I was working on Alzheimer’s disease research is that we were doing a matched control study. It’s a long story, but there are generally three different levels of developing Alzheimer’s disease.

They would find someone who was in the intermediary stage, and then they would find a matched control, on every demographic that they could, who was deemed to have normal cognition. As part of that study, they were doing MRIs. They were doing an insane number of questionnaires and cognitive tests in, like, a doctor’s office. And then they were also spinal tapping people ages 60 to 90. About half the people opted in at the start of the study. This is a study of about 330 people. After that first one, most of them said, I don’t ever want to do that again.
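The matching step described above can be sketched in a few lines. This is only an illustration, not the study's actual procedure: the `Person` fields, the greedy nearest-age rule, and the three-year age window are all assumptions made for the example, and the people are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    sex: str
    education_years: int


def match_controls(cases, controls, max_age_gap=3):
    """Greedy 1:1 matching: exact on sex, nearest on age (then education)."""
    available = list(controls)
    pairs = {}
    for case in cases:
        candidates = [
            c for c in available
            if c.sex == case.sex and abs(c.age - case.age) <= max_age_gap
        ]
        if not candidates:
            continue  # no acceptable control; this case stays unmatched
        best = min(
            candidates,
            key=lambda c: (abs(c.age - case.age),
                           abs(c.education_years - case.education_years)),
        )
        pairs[case.pid] = best.pid
        available.remove(best)  # each control is used at most once
    return pairs


# Hypothetical participants, for illustration only.
cases = [Person("case1", 72, "F", 16), Person("case2", 65, "M", 12)]
controls = [
    Person("ctl1", 70, "F", 16),
    Person("ctl2", 73, "F", 12),
    Person("ctl3", 80, "M", 12),
]
pairs = match_controls(cases, controls)
print(pairs)
```

Here `case2` ends up without a match because the only male control is outside the age window, which is the kind of gap that makes matched designs hard to fill in practice.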

But that is where you find the best measure of the biomarkers that they think are associated with decreasing cognition due to Alzheimer’s disease and dementia, which is why the spinal tap was in there. However, if you want to opt out, they can’t stop you.

They cannot force you to go through that, which makes doing longitudinal data analysis really, really difficult because you don’t know at what point you’re going to stop getting data from people. Privacy is paramount. Healthcare innovation, unless you’re working for a startup, moves incredibly slowly. And when a startup goes to implement their product, it takes a long time.

It takes us 18 months to sell a contract to a new hospital system.

It takes three years start to finish, from the initial sales approach to full implementation, on average, and that is for patient education being directly implemented into a healthcare system so that it can be delivered directly to patient records. It takes a long time. Hospitals take forever to sign contracts and then to implement into their old-school systems, or rather what a lot of people would call old-school systems. For them, it’s usually really new. It takes a long time. I’m going to talk about this later, but 75% of hospital market share for electronic health records is owned by Epic. Huntsville Hospital uses Cerner, so I don’t know much about them other than they’re owned by Oracle. That was a relatively new thing that happened in the first couple of years after I started working in industry. So there’s a lot about the market also that informs how we access our data and who gets to build within it. Privacy, like I said, is important. HIPAA is a thing. There’s something like, I think, from three basic data metrics, 95% of people are reidentifiable.

We can’t let that happen.

We’re actually not allowed to reidentify claims data for that reason.
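One way to see why a handful of ordinary fields can re-identify people is to count how many records are unique on that combination. The sketch below assumes the three fields are ZIP code, birth date, and sex (the talk does not say which three are meant), and the records are invented.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Synthetic records, for illustration only.
records = [
    {"zip": "35801", "birth_date": "1950-03-02", "sex": "F"},
    {"zip": "35801", "birth_date": "1950-03-02", "sex": "F"},
    {"zip": "35801", "birth_date": "1961-11-17", "sex": "M"},
    {"zip": "35758", "birth_date": "1950-03-02", "sex": "F"},
]

all_three = unique_fraction(records, ["zip", "birth_date", "sex"])
sex_only = unique_fraction(records, ["sex"])
print(all_three, sex_only)
```

The more fields that go into the key, the larger the share of people who are alone in their group, which is why de-identified data can still point back to individuals.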

And within claims data, private insurance and self-pay are separate from Medicare and Medicaid. There’s also a lot of standardization things with Medicaid that I have on another slide. And context matters: different hospital systems and different government agencies design for their own needs and use cases, and they’re not necessarily interoperable with other agencies. So, my picture of challenges. HIPAA: we have to protect patient privacy, both for the good of the patient and for the good of the entity you’re working for, because you want to stay employed. HIPAA breaches are incredibly expensive in fines. Lack of data standardization: that’s a picture of standardization to the normal curve. Medicare is at the national level, Medicaid is at the state level. Medicaid does have to report its data to the federal government. However, different states can collect different things. That’s one of the untapped opportunities.

No one has been able to solve it yet, but according to different consultants and different people working in the field, if you can figure out how to make sense of Medicaid data at the national level, you will be rich for the rest of your life, because it is such a rich data source but no one knows what to do with it. And like I said before, Epic dominates the EHR market, but it’s a product, a software product.

Anyone can implement it however they want to. People develop things for Epic, additional apps that go in. They customize their workflows to suit the needs of their processes, which means who knows whether your data will be entered a certain way.

And also patient notes or notes on the record are considered to be where like 95% of the good information on patients is living.

But there is no standard to note taking.

And so in some places it’s physicians, in some places it’s scribes, in some places it’s nurses. Scribes are people who go and sit in with the doctor and take notes for the doctor, and usually it’s people shadowing because they want to go to medical school. I have friends who’ve done this for a couple of years and then gone on to apply to medical school. So a lot of people are taking notes in shorthand, and a lot of people are just taking random notes on the things that they think are important or what they were taught was important. And that varies not only within hospital systems, but within departments. So rheumatology might take notes differently than emergency medicine, than OB-GYN, and it varies even down to the person.

So there’s a lot of like, if you’re someone who does any kind of statistical modeling or data science hierarchical modeling, there’s a lot of different levels you can go to. Okay. EHR is electronic health records.

Okay.

Yeah.

So basically, if anyone has a patient portal that they’ve gone to, that’s it. The other thing about data standardization, and I can’t remember where I put it: you have a PCP, a primary care physician. A lot of times those are small practices.

They cannot afford Epic or Cerner. Those are very expensive. They’ve got a product that’s geared for smaller offices, which is not necessarily going to send information to Epic. There are people trying to solve that with standards. There is no agreed-upon standard currently. Epic uses something called FHIR for their data standardization for exports so that people can take data out and do things with it.

But you don’t have to opt into using it.

Not everybody can build to connect with it. There’s a lot of different things.
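For a sense of what FHIR data looks like, here is a minimal sketch that reads a single Patient resource from JSON. The field names (`resourceType`, `name`, `given`, `family`, `gender`, `birthDate`) follow the published FHIR Patient resource; the patient values are made up, the `summarize_patient` helper is hypothetical, and nothing here is specific to how Epic or any other vendor exports data.

```python
import json


def summarize_patient(resource_json):
    """Pull a few demographic fields out of a FHIR Patient resource."""
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")
    names = resource.get("name") or [{}]
    first = names[0]  # a Patient may carry several names; take the first
    parts = first.get("given", []) + [first.get("family", "")]
    return {
        "id": resource.get("id"),
        "name": " ".join(p for p in parts if p),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


# Synthetic example resource, for illustration only.
example = json.dumps({
    "resourceType": "Patient",
    "id": "example-1",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
    "gender": "female",
    "birthDate": "1950-03-02",
})

summary = summarize_patient(example)
print(summary)
```

A system that does not emit or accept resources shaped like this cannot take part in the exchange, which is the interoperability gap for small practices.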

And FHIR, the acronym is F-H-I-R. I don’t remember off the top of my head what it stands for. Yeah. I wouldn’t have found that. Yeah, I saw you Googling. I was like, if you’re Googling it, it’s F-H-I-R, because it took me about a year to figure it out after I started working. And I’m assuming this isn’t the Epic that does the Unreal Engine, the Epic marketplace, the company Epic. I’m assuming it’s a different company, Epic.

I mean, there probably is another company out there named Epic, but… the one that I’m talking about is headquartered near Madison, Wisconsin and has a Disney-fied campus that I have not yet gotten to go to, and I just want to see what it looks like. But they are a very interesting company. The CEO basically decided she wanted to do this and build this company, and it was always Epic’s way or the highway. Now, because of AI, they’re kind of opening up to trying to be a little bit more user-centric. I saw something come in on the chat.

Oh, this is really interesting.

Jack asked a question of if there are financial incentives to share your data. Actually, so I went to grad school at Vanderbilt. I’ve been a patient at Vanderbilt before, so I lived there for a very long time.

And you can opt into their synthetic derivative, which takes your patient data. It basically de-identifies it, and you go into a giant pool.

There is no financial incentive for me to do it.

It’s just for the good of the research. However, people are starting to, and I have a link in my Google Doc about this that I didn’t put in here, but there is a new initiative I saw through a friend where they’re trying to figure out how to give power back to patients over their own data, so they can get money for opting in or sharing their data with certain studies. But it’s kind of a newer idea. And there’s a lot of reasons why it may not work. But it’s a newer idea as far as something I’ve seen. We talked a little bit about collecting data.

The real thing that I found with product analytics built for clinicians is you cannot interrupt their workflow at all. If you add even a second to their workflow, they will not do it.

They will not use it. So you’ve got to find a way to collect data, but keep the product working at the same pace that it was. And depending on your design and depending on if you’re directly implemented into Epic, additional data collection can really, really slow down the actual tool that you’re trying to build.

And so that balance is really, that’s what makes collecting this stuff really hard.

And a lot of times you get either anecdotal stuff or you get a research study done within a hospital system versus a company trying to do research on this.

And then scale and replicability.

In general, in science, replicability, like when you’re doing peer-reviewed research, can be an issue. Study designs are so different, data access is different, you read a paper and you think that’s how they collected the data, and then it turns out there’s something that you didn’t interpret the same way or there’s a step that was forgotten.

There’s a lot of reasons why a study can’t be replicated. But there’s also the development of statistical methods at scale for EHR data, which is observational, with lots of missing data and a lack of standardization. There’s a lot of things that go into it, but also it’s really, really big. It’s billions and billions of rows within a single system.

It’s kind of a newer area for us as statisticians.

I’d say there’s not really gold standard methods, but it’s a huge area of research right now.

So we’re kind of all just doing the best we can right now with it and hoping that we can figure out what to do with it. Most causal studies are done on smaller specific populations, and then you have to back your way into generalizability.

So for Alzheimer’s disease, you tend to get older, rich, educated white people, and then you have to back your way into the rest of the population. You hope that there’s someone that’s not in that demographic to help with your weights to back into generalizability. There’s a lot of reasons for that.
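A common way to "back into" the broader population is post-stratification weighting: each group is weighted by its population share divided by its sample share. The sketch below is a simplified illustration with made-up group names, counts, shares, and outcome values; real studies typically weight on several variables at once.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    weights = {}
    for group, share in population_shares.items():
        count = sample_counts.get(group, 0)
        if count == 0:
            # With nobody sampled from a group, there is nothing to up-weight.
            raise ValueError(f"no sampled participants in group {group!r}")
        weights[group] = share / (count / n)
    return weights


def weighted_mean(group_means, sample_counts, weights):
    """Mean outcome after re-weighting each group's participants."""
    total = sum(weights[g] * sample_counts[g] * group_means[g] for g in group_means)
    norm = sum(weights[g] * sample_counts[g] for g in group_means)
    return total / norm


# Hypothetical numbers: group_a is over-represented in the sample.
sample_counts = {"group_a": 80, "group_b": 20}
population_shares = {"group_a": 0.5, "group_b": 0.5}
group_means = {"group_a": 10.0, "group_b": 20.0}

weights = poststratification_weights(sample_counts, population_shares)
adjusted = weighted_mean(group_means, sample_counts, weights)
print(weights, adjusted)
```

The error raised for an empty group is the formal version of hoping there is at least someone outside the dominant demographic: with zero participants in a group, no weight can recover it.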

I’ve got a slide in a minute about why we sometimes have trouble recruiting. But in general, study designs and replicability are an issue, but they are being actively addressed.

So the only data you can get are from the people with access to health care?

A lot of the time, yes. There are studies that have been done when you have real community-designed studies, and I still haven’t looked up the citation. So I did my Summer Institute in Biostatistics program, which is what convinced me to go to grad school for statistics, in St. Louis. And they were running a study when I was there on birth control in a community that was largely people of color who weren’t graduating from high school. They ran community drives with nurses who were from the same demographics as the people who lived there, and they recruited a massive number of people to come and get birth control. They could get any type they wanted: they could get an IUD, they could get the implant in the arm, they could get prescriptions for pills, and it was all free. It was for girls from like age 12, I think, up through menopause, so women of all ages and children. They saw high school graduation rates rise. They saw the number of people going to college rise. They saw crime go down. They saw all these crazy things. So it’s a great example of how, when you are able to get into those areas where there isn’t access and you really think about what the community needs, you can see an impact. But not everyone has the funds or the ability to do that, so a lot of it is, I’m not going to say you’re stuck, but you kind of get what you get sometimes when you recruit. And that’s what makes clinical trials really difficult: recruiting participants. Okay. Another question.

Who collects healthcare data on Americans?

Insurance companies would be one.

I didn’t even put them in here. I forgot about them because they’re so problematic. You should probably throw that one. Yeah, thank you. That’s a very interesting route to go down. And one I know less about.

The answer is it depends, again.

Because all of these people collect data on us, and also insurance companies, which was a big miss, leaving them off.

Federal agencies: these are just some of the federal agencies that are collecting healthcare data. The CDC; CMS, which is Medicare; AHRQ, which is healthcare quality; FDA; EPA; Department of Agriculture. There’s also some smaller ones that tend to link up with a lot of these. NIH also.

HHS.

Yeah, HHS is like the umbrella over CMS and some of these others.

Your hospitals and healthcare systems are collecting data. To the point of the question we got earlier, some teaching and research hospitals are the ones that I know of that allow you to opt your medical records into their research. Epic Cosmos: Epic is now trying to build their own AI foundation models on all the stuff that comes in, and they’re trying to partner with universities and other institutes to do things with it.

How that works for informed consent on a patient, I do not know the answer to that. It’s very new. Pharmaceutical companies are running clinical trials, and then yeah, I have the teaching and research hospitals, but also research and academic centers like UAB, Vanderbilt, Emory; those are really big ones here in the Southeast. Private foundations. So this one, I will try not to rant about this, but there are a lot of conditions that are not being fully tracked by the federal government. Abortion is a big one. States can report it to the CDC at a population level, but it’s not a requirement and not all states do. So when you want to get a picture of what happens to policy post Roe v. Wade, who answers that?

A private foundation that has to fundraise does.

And more broadly, maternal mortality. When you think about maternal mortality, the US has a really high rate of that, especially compared to other developed nations. Especially when you break it down across racial and class lines, there can be really big differences depending on what group you’re in. But there are broader public health implications that we’re not necessarily collecting data on.

So, like, domestic violence is a public health issue, amongst other things. A lot of women die due to domestic violence during pregnancy or early on after having a child, and it is directly related to them becoming a mother, but that’s not counted in maternal mortality deaths. So there’s a lot of this type of stuff that’s not medical necessarily, but it’s kind of what you might call a gray area or just broader impact. When you try to go and quantify it, it’s really, really hard because we’re not necessarily collecting this data in a place where researchers can access it.

Would this be the equivalent of the social history when it comes to medical systems, meaning like social determinants of health type stuff?

Yeah, there’s a lot of that on another slide somewhere. Yeah, there’s broader things, like there’s a big debate right now about whether we should be collecting race as a data point, because it often serves as a proxy for things like income, education access, socioeconomic status. There’s a lot of things that are correlated with that variable and there’s not necessarily a genetic reason to collect it. They’re finding more and more, now that the human genome has been categorized, that things that we thought might show up in genetics don’t, in terms of defining race or ethnicity. They tend to be more socially determined, and so there is a lot of that as well. And so in general, things like rare diseases, or conditions where societal and public health factors play into them, are important concerns, but they’re not necessarily well understood, like some of the ones we just discussed. Rare diseases: I didn’t put it on here very explicitly, but there’s a lot of diseases that they call orphan diseases, where you can’t really get research funding for them because either it’s not well understood, it’s hard to quantify, or the cases are so rare that they’re not exactly sure what it is. And so there are initiatives out there to try and provide funding for that, but it’s going to be harder for that to come directly from something like the NIH, where the broader your impact, the more you can quantify, and the more people you can help, the better you tend to do on grant applications.

Yeah, I think that’s one of the big things about the McMillan Center here.

If you guys are familiar with that.

They do genetic testing for diseases that are unable to be diagnosed elsewhere. Also, I was wondering, when you talk about the health data, do you also include, like, cardiorespiratory data, blood oxygen level? Let’s say you have a smartwatch. They also collect data.

Yeah, so I didn’t put the device stuff on here. So it’s weird because, like, I think they call it medical device data. Statistics calls that lifetime data analysis, which is not how I would define lifetime data analysis if I were to hear the term at all.

But there are a lot of things going on with that where people are trying to figure out how to use that data, helpfully.

I’ve definitely seen grants up for it. There’s a couple of conferences on it. I don’t know a ton about it other than it’s really hard because it tends to be cyclical depending on what’s going on in the day and by certain hours of the day. I’ve done 24-hour blood pressure studies before, and there’s a lot of reasons why that’s difficult. And so taking that and then going to biomarker data that you get from a wearable, it’s collecting data so often, and all of that data is so instantaneously correlated, that a lot of traditional methods won’t work.

To this point, you know, this also includes other wearables such as, say, the glucose sensors that go back to Abbott Labs. You know, they have that data too. In fact, when you set it all up on the app, it’s going to ask you whether they may use your data to do that. So, in my opinion, it’s out of control.

No, I’m being serious now. Like, for instance, I looked at the list of federal agencies, right? How about the DOD?

They do.

I didn’t put them on here.

No, I’m just saying, that’s not a big piece, you know. NASA does studies on people, too. Lots of different agencies have some kind of stake in healthcare data or fund it. DoD 100% funds healthcare data for healthcare research.

Yeah, a question too, because I do some social media work as well. I’ve heard before that a lot of times they scrape data off of social media as well, and then use it for healthcare, like, you know, for predicting how many people might get sick, like with cancer or kidney ailments or whatever, kind of projecting outward. And then based on that, you know, raising insurance premiums. I’ve heard that, but…

This is why I didn’t put insurance on here. I don’t know enough, but I’ve heard enough from people who work there that it’s like, oh no, oh no. But I don’t have enough confirmed information to feel comfortable putting it on here. I’m going to talk about it later, but I’m a research mentor for this program called EMA-Ed that helps train people who are either in clinical work and want to pivot to AI, or are in AI and want to pivot into the healthcare industry. And one of my mentees was a dentist who worked for an insurance company. And she was like, insurance is the only place where you can have a model perform this poorly and they don’t care. They can get away with that. The onus is on you, the end person, as a patient, to go and fight the system. And they know that a lot of people are not going to do that. And there’s not really a good solution for it, unfortunately.

We did have a question come in about racial and ethnic bias if we don’t collect the data. When I read all of this stuff, I think it’s a really well-intended idea to collect all of the things that make up what we would want to collect instead of race, but we’re just not equipped for that in most places. And historically, we’ve collected it. We’ve collected it for so long that when you try to compare going backwards, it’s just so ingrained in the medical system that I don’t know how we go about it. And the question was specifically, how do we ensure it’s used for its intended purpose? And how do we gain trust? And I don’t know how you can gain trust in those communities if you can’t report on things that directly affect them.

And so that’s why it’s a debate.

And I don’t think there’s been any official policy set down by, like, the American Medical Association or by statisticians or anything like that. I think it’s great to discuss, especially with people in those communities, and I think we’re a while away from figuring out what to do about this issue. So thank you for the question. Okay, final thing on this slide. Data aggregators.

I use these a lot at work to get data on people.

Definitive Healthcare collects a lot of information about hospital systems, things like beds, specialties, people served, typical reimbursements. They also get all claims data and they kind of separate it out by hospitals. Epsilon and Throtle are all about identifying things about individuals, not necessarily healthcare related, but in general, how do you respond to marketing, or things about your buying personas? And we use them at work to help figure out how to market to people at an aggregate level. We’re not going to be able to go in and look up, Jay’s Throtle ID is this, therefore, we are going to send him this exact ad for the disease we think he might want to be on the lookout for. We can’t do that. What we can do is build predictive models and say people with these characteristics are probably more likely to develop certain diseases or be more interested in this. Therefore, we’re going to start placing content on websites they might visit, or placing ads for this hospital system so that it’s kind of there, so they start to pick up brand awareness. And then later, we’ll hit you with an email or a direct mail or some other really direct marketing channel.

I’ve only been in this for about a year and a half, and it’s been a learning experience. So, okay.

And then before we move on, medical codes. We’re still in the data, I’m sorry. If you have observational data, which EHR records are, and you want to do research on it, and that’s where a lot of our research is coming from, how do you figure out who has what?

Codes.

You have to use the codes.

There’s not going to be some nice little thing that’s like diagnosis. There’s not a field that just says like diagnosis, like lupus.

That was a study I worked on.

We had to pay a doctor, a practicing MD, for at least 20 hours of work to comb through all of the ICD-10 and ICD-9 codes, because we have both, to give us what codes are most likely to present in what order for lupus nephritis, which is a special way that lupus presents, so that we could do our blood pressure variability study and request the data from the synthetic derivative for those codes, for the time periods, and for all the other inclusion criteria we had. It’s a lot. There’s thousands and thousands of medical codes for diagnosis, and then there’s thousands and thousands of procedure codes, and not every hospital uses the same code systems. That would be too easy. There’s a lot of different ones, and just as a sample, someone from work pulled these for me. One of these is about getting ulcers on your feet due to diabetes. That’s the top one, and all the different codes related to that.

So it’s not just diabetes.

It’s very specifically related to foot injuries due to diabetes, which are common if you’ve had diabetes for a long time because blood circulation tends to decrease over time down to the extremities. The one underneath that is for burns, and you see initial encounter, subsequent encounter. You will have other data that you can use to figure out visit one, visit two, time in between, things like that, but there’s still separate codes for all of this.

And then the other one is for fractures, another very specific type: they’re stress fractures. So this is just for those very, very specific conditions. So if you’re thinking, I’m going to do a study on diabetes, you’re not just looking at a group of five or six codes, you’re looking at a ton of codes, and then you have to weed out, well, what am I looking for?

What codes do I need to exclude?

What do I need to include?

To pull the records I need to. Okay, so all of this, and then some more not so fun facts, and then we will talk about applications.
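The include and exclude step can be sketched as a prefix filter over diagnosis codes. This is a toy illustration, not the lupus study's actual criteria: the patient IDs are made up, the `select_cohort` helper is hypothetical, and the ICD-10-CM prefixes used (E11.621 for type 2 diabetes with foot ulcer, E10 for type 1 diabetes) are only examples of how a clinician-curated list might be applied.

```python
def normalize(code):
    """Strip the dot and upper-case so 'e11.621' and 'E11621' compare equal."""
    return code.replace(".", "").upper()


def select_cohort(diagnoses, include_prefixes, exclude_prefixes=()):
    """Return patients with any included code and no excluded code.

    diagnoses: iterable of (patient_id, icd10_code) pairs.
    """
    include = tuple(normalize(p) for p in include_prefixes)
    exclude = tuple(normalize(p) for p in exclude_prefixes)
    included, excluded = set(), set()
    for patient_id, code in diagnoses:
        normalized = normalize(code)
        if normalized.startswith(include):
            included.add(patient_id)
        if normalized.startswith(exclude):  # empty tuple never matches
            excluded.add(patient_id)
    return sorted(included - excluded)


# Synthetic diagnosis rows, for illustration only.
diagnoses = [
    ("p1", "E11.621"),
    ("p2", "E11.9"),
    ("p3", "E11.621"),
    ("p3", "E10.9"),
    ("p4", "I10"),
]

cohort = select_cohort(diagnoses, include_prefixes=["E11.621"], exclude_prefixes=["E10"])
print(cohort)
```

Even in this tiny case the exclusion list changes who is in the study, which is why choosing the codes takes hours of a clinician's time.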

However, I felt that in order to build AI applications, you do have to have a strong understanding of the data issues you face, which is why I spent so much time on this. Not so fun facts.

Women in clinical trials, women in medical research in general. Women couldn’t really participate until the early 1990s. There’s a ton of conditions where we assume that men and women present the same and we don’t actually have evidence to back that up. Heart attack symptoms are a good one, because we know now that women and men experience them very, very differently. And when you see symptoms posted in hospitals or advertised, it is men’s symptoms you see. Women die of heart attacks; I think it’s their number one or number two killer, because they don’t recognize the symptoms, since they are so different. I can tell you right now I don’t remember the women’s symptoms off the top of my head, and I know that this exists, but I know that with pain in the left arm and a couple of other things combined, a man might be having a heart attack and we should probably do something about that.

Okay, historical ethical issues. When you’re trying to build trust with your patients in the community that you want to work with, there are a lot of historical harms that have been done to a lot of communities, especially if you’re not a white man. One that’s really relevant to Alabama is the Tuskegee syphilis study, where they knew men had syphilis. They knew there was a treatment, and they continued to allow those men to go untreated because they wanted to see what would happen in the later stages of the disease, and they documented and followed it. And that only ended in 1972, when they finally decided to stop it and that those men should be treated. This is not ancient history. This is really recent. Other things: how people in the HIV and AIDS epidemic were treated was replicated with the opioid crisis. There’s actually a ton of similarities, not necessarily in the disease itself, but in the policy and the reaction there, and in how the communities that were affected by it responded. And so how do you recruit people to come in and share this vulnerable information with you when they know someone that was directly affected by all of these harms? And that’s one of the many difficulties we have. Okay, so this is where I put social determinants of health.

Racism in medicine exists both for doctors and for patients. I have this book.

I have not read it yet, but I follow her.

I’ve heard her speak. This is Uche Blackstock. She is a black doctor. Her sister is also a practicing MD. And they talk a lot about their experience going through medical school and how they’re treated by patients. They also see directly how patients of different backgrounds are treated. Women are more likely to come in and be considered drug seeking, and have that put into their patient notes, when they’re in pain. There’s a lot of things like that that play into all of this. We know that patient outcomes differ by both the gender and race of the patient. Also, depending on your physician: black women doctors have the best outcomes for most things that we can track it for. It’s not really that well studied. I would say a lot of it has to do with, we think that they recognize issues and they listen to their patients, and they are more likely to disregard the stereotypes because they have been stereotyped themselves.

But that’s not proven. And then again, I put the maternal mortality rate here again because it directly plays into this.

And then federal data collection has a ton of limitations as well.

So every agency of the federal government is tasked with specific reporting requirements that determine the type of data they collect and its structure. I learned this in 2023.

I did not know about this, but the biggest example that they gave us when I was at this health policy conference was, why was the CDC so unprepared to do contact tracing during COVID?

Because they are not tasked with collecting individual-level data. They are tasked with collecting population-level data. Yes, they do work with epidemic data, and they are able to contact trace for something like Ebola. But that’s usually because that’s one or two people re-entering the country that have been exposed, not the entire population, trying to contact trace them both coming into the country and just living their lives. And so they were not tasked with that. They were the ones that ended up with the job. And that was why we had such trouble collecting data during COVID and tracking down infection rates. They’re just not good for that, and that’s not their fault.

Okay, so after all of that, because I know that was a lot of not so fun information and a lot of challenges, let’s talk about some of the applications. Again, this is just a subset.

A lot of these are ones that I directly either see, know someone working on, or have experienced within the past.

There’s a ton more, but.

Biomedical research. I put AIM-AHEAD on here in case anyone is interested in learning more about healthcare or getting access to healthcare data. I will say the computational resources that they give you are not great, but that’s because it’s run by like nine institutions and they all have to agree on the system and what they’re building. It’s also built by academics, so they’re constrained to certain budgets. There’s a lot there, but they do training, they do webinars, they do different things that are just really interesting. They’re trying to collect interesting data for AI. And so one of my mentees in the AI clinical care program was looking at AI for imaging. I think it was heart imaging, to predict inpatient sepsis survival when they come into the ER. Sepsis is a really, really big killer, and if you don’t catch it quick, a lot of people die from it. So it was of interest to them. They were collecting a ton more data than that, but they had the sepsis data available in the first year for people to start using.

They also have the All of Us genetic data sets. That’s run by Vanderbilt and a couple other institutes, and if you work for a research institute and you have interest and you can get an IRB, you can get access to this data. There’s a certain amount of the workbench that’s free; otherwise you have to set up an agreement. I think most of their data sets are like that. There’s like four or five programs that have some kind of institutional setup where, if you work for a research institute, you can connect with them and figure out how to use those resources. And they have training programs both for AI specialists to learn about healthcare and for healthcare professionals and clinicians to learn more about AI, and hopefully build collaborative working relationships. The one that I was a mentor for was in its first year.
So there’s a lot of growing pains there, but they did their first-year AI conference at the Space and Rocket Center. They had someone from AIMA come and speak. And so it is something they’re trying to grow. I will come back to some of these questions that came in the chat.

Aging research. Again, I did Alzheimer’s disease research. I know a lot about it. There’s at least three big national data sets that you can use to look at AI for Alzheimer’s disease.

One of them is really primed for AI because it’s all about imaging.

It’s called ADNI, which I think is the Alzheimer’s Disease something Initiative. It’s going to come to me later. I used to know it, from my dissertation. You’ll be driving home, it’ll come to you.

National Alzheimer’s Coordinating Center, that is all of the Alzheimer’s disease research centers. There’s like 37 in the country. Vanderbilt just became one, actually, right after I graduated. They all contribute data in a standardized way to this national center, and then researchers can go and get access. They can pull it down, they can do research. A lot of that is more questionnaire stuff. I believe there’s some imaging stuff. I didn’t do any imaging for my research, so I’m less familiar. But that one’s nice because you shouldn’t be seeing site effects. That’s site with a T, not side, like drug side effect. That’s one less thing you have to condition on, because when you have to condition on multiple hierarchies, you lose degrees of freedom, which means you lose predictors that you could put into the model, because now which hospital system you came from is a predictor you have to account for. And so this is supposed to take that away with the standardization.
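The degrees-of-freedom point can be made concrete with a little arithmetic. This is only a sketch under an assumption the talk does not state: an ordinary regression where each hospital system or research site enters as its own indicator column (a mixed model would account for sites differently). The function name `residual_df` and the sample sizes are made up for illustration.

```python
def residual_df(n_obs: int, n_predictors: int, n_sites: int = 1) -> int:
    """Residual degrees of freedom for a regression with an intercept,
    n_predictors covariates, and one indicator column per extra site."""
    site_columns = max(n_sites - 1, 0)  # one site acts as the reference level
    return n_obs - 1 - n_predictors - site_columns


# Hypothetical study: 200 patients, 10 clinical predictors.
pooled = residual_df(n_obs=200, n_predictors=10)               # no site adjustment
by_site = residual_df(n_obs=200, n_predictors=10, n_sites=37)  # adjust for 37 sites
print(pooled, by_site, pooled - by_site)  # 189 153 36
```

Under these assumptions, adjusting for 37 sites costs 36 degrees of freedom that could otherwise have gone to clinical predictors, which is the cost that standardized collection is meant to avoid.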

Some additional stuff that’s happening there: MRI imaging is always a big one, and there’s a ton of it for Alzheimer’s disease. There’s other things too, and we’re seeing it also in cancer, especially breast cancer, where there’s a lot of imaging work done. Voice and speech analysis: this one I found really interesting because it’s both taking transcripts and looking at them, and also taking the sound itself and running it through models.

So people are approaching it from different ways.

And then there’s also the genomics stuff, which I know the least about. When I was in graduate school, which was not that long ago, the one thing we knew genetically about Alzheimer’s disease was this one gene called ApoE4. You can be positive for it on either of the two alleles, so basically you can have zero, one, or two. Having one increases your risk of developing Alzheimer’s disease. With two, it’s not quite 100% guaranteed, but it is highly, highly likely that if you live to a certain age, you’ll develop dementia. And that was kind of what we knew. And so now with more data, and all these AI methods that allow for bigger data, we’re starting to dive more into what we should be looking at in the genetics, and the same for breast cancer and other cancers and a lot of other diseases where we have a lot of people who get them and a lot of people working on the research for it.

Okay, clinicians. So for hospital administration, reducing clinician burden is a massive concern of theirs. COVID made really evident what people within the system already knew, and it’s that clinicians have over the years been tasked with more and more administrative tasks. So they have less time with patients. And then when we went into COVID and things were happening, everyone in that system burned out.

And so figuring out how to, with our, so I’ll say, last year, Okay, don’t know how many of you are familiar with physician matching and what happens when you do residency?

Many of you are familiar with sorority rush.

It is the sorority rush algorithm for people trying to go from their medical school to their residency. If you’re a doctor, you have to do a residency to be licensed. You still have your MD degree if you don’t do a residency, but you can’t practice as a doctor in the US. There are other things you can do with an MD degree, but you can’t directly work with patients. So what happens is, for residency, you set up the specialty you think you want to go into and the schools you think you want to apply to, you go off and interview with them, and then you rank them. And they rank all the people that applied and interviewed. And then somehow through algorithmic magic, and it literally is the same algorithm as sorority match at SEC schools, you land on something where hopefully a lot of people match.
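The "algorithmic magic" is not named in the talk. Two-sided ranking systems like this are commonly described as variants of deferred acceptance (the Gale-Shapley idea), so the sketch below assumes that family: applicants propose down their lists, and each program tentatively holds its best applicants up to capacity. The real residency match has extra rules (couples, for example) that are not modeled here, and all names and rankings are invented.

```python
from collections import deque


def deferred_acceptance(applicant_prefs, program_prefs, capacity):
    """Applicant-proposing deferred acceptance with program capacities.

    applicant_prefs: {applicant: [programs, most preferred first]}
    program_prefs:   {program: [applicants, most preferred first]}
    capacity:        {program: number of open spots}
    Returns (matches, unmatched).
    """
    # rank[p][a] is applicant a's position on program p's list (lower is better)
    rank = {p: {a: i for i, a in enumerate(prefs)}
            for p, prefs in program_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}  # next list index to try
    held = {p: [] for p in program_prefs}          # tentative acceptances
    free = deque(applicant_prefs)

    while free:
        a = free.popleft()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue  # list exhausted: this applicant stays unmatched
        p = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank.get(p, {}):
            free.append(a)  # program did not rank this applicant
            continue
        held[p].append(a)
        held[p].sort(key=rank[p].__getitem__)
        if len(held[p]) > capacity[p]:
            free.append(held[p].pop())  # bump the least preferred applicant

    matches = {a: p for p, applicants in held.items() for a in applicants}
    unmatched = [a for a in applicant_prefs if a not in matches]
    return matches, unmatched


# Hypothetical rank lists: everyone wants dermatology, which has one spot.
applicants = {
    "ana": ["derm", "internal"],
    "ben": ["derm", "internal"],
    "cy": ["derm", "obgyn"],
    "dee": ["derm"],
}
programs = {
    "derm": ["ben", "ana", "cy", "dee"],
    "internal": ["ana", "ben"],
    "obgyn": ["cy"],
}
spots = {"derm": 1, "internal": 1, "obgyn": 1}

matches, unmatched = deferred_acceptance(applicants, programs, spots)
print(matches)    # ben gets derm, ana gets internal, cy gets obgyn
print(unmatched)  # ['dee'], who ranked only one program
```

In this toy run the applicant who ranked a single competitive program is the one left unmatched, which is the situation the later scramble for open spots is meant to handle.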

There’s still people that don’t. Then there’s the equivalent of snap bidding, where if a program has got an open spot, they’ll find someone that also didn’t match with a residency, and they’re like, actually, we really liked you.

It’s just that this algorithm didn’t work.

They’ll grab them, or you have to wait a year and reapply. It works out better than the previous system, as it turns out, which was everyone applied individually and then had to negotiate everything individually. And a lot of times people would say, well, we need your answer by this date, and they were still waiting to hear back from their programs.

And so this did do a lot for most people in making it easier, and it’s probably a better experience, but it still sounds absolutely wild to someone who thankfully didn’t have to go through it, but has watched several friends go through the process. You find out if you’re going to match on like Tuesday. On Friday, you open an envelope that tells you where you’ve matched, and that is that. It’s like the batch and they’re all closed.

It is very much like when I say the sorority rush, it is almost like it is a lot like that.

It is very Yeah, it’s an interesting system.

So with all of that, the past couple of years, we are seeing people in specialties that normally have waitlists not match.

So like OB-GYN, matches dropped this year by a lot. Lots of unmatched spots.

I think internal medicine also did. Internal medicine spots are kind of general, and many people kind of pick a direction later. Those tend to fill up. Dermatology is always the most popular. They usually don’t have an issue with matching because they have the best work-life balance and usually the best reimbursements. Not kidding. I know people that took a gap year in medical school to do extra research, not even related to dermatology, to make the application look better. It is a very in-demand one for the actual people working. All that is to say, there’s a lot going on in the healthcare system, and we need to keep clinicians if we can. And so we are seeing a lot of interest from vendors specifically. I don’t get to talk directly to doctors as much about the scribing part and the actual use of it, but vendors like Microsoft and others, and a lot of startups, are banking a lot on AI scribes for doing medical notes. You’ve seen a couple go really, really wrong.

AI generated summaries for patient visits to help the doctor when they come back next time see like at a really quick level what did we decide last time and in general tools to help clinicians be more efficient with their time or be able to spend more time with the patient are of interest. We’re also seeing interest in diagnosis aids so things that help clinicians in their workflows they go through diagnostic processes a lot of diagnoses, you have to kind of rule things out as you go. So that seems anything with a workflow like that seems to be a good opportunity.

I mentioned this before.

Epic is building their own product for research and AI tools. It’s specifically called Cosmos. They are getting a lot of buy-in from major hospital systems. Not sure what they’re going to do. They’ve got a whole laundry list of things they think they want to build on their website. In the PDF at the end I have some links to some things, and there’s information on the EHR records. I already talked about the market share and things like that. I think it’s super influential because they own the majority of the market share. They’re probably a little bit worried about monopoly issues in the future. We’ll see how that goes, but they do have the most pull to be able to convince people that they should opt into this patient record, for lack of a better word, synthetic derivative, which is de-identified: let’s do research on this, let’s use it.

Then hospital processes, this is the one I know the least about.

I’ve just seen some stuff that’s related to billing and a couple other things, trying to reduce administrative burden on clinicians, but also on the staff that work in the hospital. One that I think is really interesting is Stanford introducing ChatEHR. They’ve pitched it for clinicians, to be able to chat directly with a patient’s record and pull all that data in, probably in a RAG-style formulation, and help pull a lot of things together, because doctors tend to be specialists. And so just because you’re going to see someone for this thing, it might impact something else, and your doctor may not have access to that. So it does kind of rely on everything being in the singular patient record.
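For anyone who has not seen the pattern, "RAG style" roughly means: retrieve the most relevant pieces of the record first, then hand only those to the language model along with the question. The sketch below is a toy version of the retrieval half only, using plain word overlap instead of embeddings. It says nothing about how ChatEHR actually works, and the note contents, field names, and functions are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, notes, k=2):
    """Return up to k notes that share the most words with the question."""
    q = Counter(tokenize(question))
    scored = []
    for note in notes:
        n = Counter(tokenize(note["text"]))
        overlap = sum(min(q[t], n[t]) for t in q)
        scored.append((overlap, note))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [note for score, note in scored[:k] if score > 0]


def build_prompt(question, notes, k=2):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(
        f"[{n['source']}] {n['text']}" for n in retrieve(question, notes, k)
    )
    return (
        "Answer using only these record excerpts:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical notes from three different specialists.
record = [
    {"source": "cardiology", "text": "Patient started on metoprolol for atrial fibrillation."},
    {"source": "dermatology", "text": "Biopsy of left forearm lesion, benign."},
    {"source": "primary care", "text": "Reports dizziness since starting metoprolol last month."},
]

question = "Is the dizziness related to metoprolol?"
print([n["source"] for n in retrieve(question, record)])
print(build_prompt(question, record))
```

The retrieval step is where the "everything in a singular patient record" dependency shows up: a note that never made it into the record can never be pulled into the context.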

But I listed it under here as potential for this to help patients understand their own medical records as well, because that’s where I think it’s actually the most useful, that is personal opinion. But there’s a lot that we think about in terms of health literacy and literacy being, they’re separate. Health literacy is a separate facet from like just reading level in literacy. And so that’s where I think this has a real chance to be impactful.

And in patient education, so leads us directly to this.

Oh, chatbots.

Chatbots are everywhere.

They’re kind of considered table stakes at this point for big healthcare systems to have one on their website. I’m currently user testing one at work. Thanks to all of the cybersecurity stuff here. I’ve been asking off the wall questions and everyone looks at me like I’m nuts. I’m like, do you know?

Like, just because this is designed for knee injury, to help provide knee injury information, doesn’t necessarily mean that’s what people are going to type in when they get here.

So like we need to see what happens.

It’s holding up pretty well, probably because it’s not just generative AI answering it.

We have to balance providing accurate medical information, routing patients to their next best step, and liability of information. So if incorrect information goes out, it’s liable. OpenAI is shifting that onto the end user, both with legal and with medical.

Don’t really want to wait for that to come down from a higher court in a couple years. So how do we preventively design for that? That’s the question we’re currently facing.

Another one that I’m really interested in seeing what happens is content translation. So health systems in the US have to be able to support education in English and Spanish. But there’s localizations of language. It’s like I grew up in Florida.

There’s certain types of Spanish that are spoken where I grew up in Florida, as opposed to maybe where you grew up in Texas. Also, I put on here Haitian Creole. In South Florida, where I went to school, everything came home in English, Spanish, and Creole, because the populations were so large. And so it’s interesting to see how you support languages that aren’t necessarily spoken in the largest numbers at the national level, but to specific health systems are very, very important. Another one of these is the Hmong in Minnesota, specifically Minneapolis and St. Paul. They were begging us to translate our patient instructions, which is what we give out. Let’s say you have a surgery, and you need some follow-up care, you have to do some stuff when you go home. Your doctor will usually give you some kind of handout. I worked for the company that wrote those. So they were begging for that in Hmong, because a lot of the parents, especially grandparents, did not speak English and they desperately needed it. But they are really, really far down on the list. If we’re only going to do the top 25 languages, I’m not even sure if they crack it. So then you’re prioritizing one client over another.

How do you balance that? It’s really, really expensive to have a person translate things.

So is there a way to prioritize what goes to a person to translate versus what can we get away with machine translation?

If it’s like 95% of the way there, there’s a lot of questions, but having that in place could help us get better education to people faster. And so that’s something we do have to explore. And then content personalization. So can we adjust our content in real time based upon the demographic information of the patient?

So older women versus older men for heart attack symptom content, instead of putting both on the same page, is there a way that we could grab different pieces and insert them because we know this is a woman getting mass?

Also though, Is she the wife of someone going through a heart attack?

She’s got a caregiver at home.

So there are different roles that we design for as well.

So there’s the patient, there’s what’s called a caregiver.

There used to be patient child.

We’ve broadened that because of things like Alzheimer’s disease and dementia, and people having children who have to care for their parents or be there as a support. That term is much more broad. And so it’s one of those, how do we figure out who they’re caregiving for at any given time?

Communication preferences, language, which we talked about, and then literacy levels. So if you don’t read very well, can we give you a picture format of things? We do have this, it’s called visual patient instructions. It’s diagrams and there’s very little text, and it’s written at like the fourth grade level, to try and help people who cannot necessarily interpret the sixth grade level that we normally write to with no pictures. So how do we get that to come out in real time to people?
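As a rough illustration of what "written at the fourth grade level" can mean in practice, one widely used measure is the Flesch-Kincaid grade level, which combines words per sentence and syllables per word. The talk does not say which readability measure is used at work, so that choice is an assumption here, and the syllable counter below is a crude vowel-group heuristic, not a dictionary lookup. The two sample instructions are invented.

```python
import re


def count_syllables(word):
    """Very rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" ("take") as not adding a syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


plain = "Take one pill each day. Call us if you feel sick."
clinical = ("Administer the prescribed anticoagulant medication daily and "
            "immediately notify your physician regarding any hemorrhagic "
            "complications.")

print(round(fk_grade(plain), 1), round(fk_grade(clinical), 1))
```

A score like this could only be one input to choosing between the standard handout and the picture-based one. It measures sentence and word length, not whether the reader actually understands the instructions.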

That’s a problem that has to be solved, and AI might be able to help.

Healthcare marketing. So standard medical instructions are sixth grade.

Where I work, they are. Not all companies.

Not all companies will write for patients, even though it’s patient education. Some take clinician-focused educational content and rework it a little bit, but not necessarily with literacy in mind. And so you’ll see a lot of really high-level medical terminology that you have to Google in some of those, and that’s what they give to patients. So they do technically provide patient education. Whether it is understandable to people without a medical background, that’s hit or miss, depending.

And then this is what I spend all my time now thinking about.

So healthcare marketing. There’s a lot in marketing on its own that AI does. Google has a lot of stuff that they’ve developed. And a lot of marketing for big retail companies has big enough data to support a lot of really cool AI work. Healthcare is different because even if you have a national brand like Baptist Health Care or HCA or Ascension, and there’s several that are national, they will never market nationally. They will always market regionally to their direct people, because you can’t offer people in Nashville, Tennessee the same services necessarily that you would in South Florida, where you’re looking at an older demographic and a higher rate of skin cancer than just about anywhere else in the country. Maybe Arizona. And so you’ve got to personalize, but that immediately shrinks your sample size. So how do you do cool personalization with predictive models when you don’t have the billions of people that Nike can market shoes to with their ads? And that is where all the interesting stuff is, and that’s the statistician in me. Everything works in the infinite.

It’s what happens in small sample sizes that’s interesting.

So for our predictive targeting models, this is trying to figure out who is likely to get, who needs to know about certain diseases or certain services at hospitals.

There’s a lot of rules about what information cannot be used and can be used to target individuals in health care marketing. So if you ever get a diagnosis and like where I work, we do get like patient record information to help figure out like how do we do on the return on investment for marketing. We can never connect that you got a diagnosis for something and then market to you directly about that.

It’s all done through prediction and lookalike audiences.

So if we’re seeing like a bunch of men of a certain age come in and they typically tend to have these three conditions, we can build audiences around that.

But we can’t say these 50 men came in for these three things.

We’re going to send them directly this. There’s a lot of other things too. And there’s a lot of questions about what’s education versus what’s marketing and what’s the line. And that’s where legal steps in and tells us.

We do have the ability to supplement our models with tangential information.

One that I thought was hilarious when I looked at it was, do you golf?

Yes or no? That’s a predictor for if you need orthopedic surgery.

So there’s things like that about lifestyle that are correlated with disease conditions that I would not have necessarily made that connection immediately.

That we might need to market to them about shoulder strain.

So we do have things like that that we can buy or we can learn from our own data over time.

One that a lot of people are really interested in that I am really hoping is going to fall flat on its face because I don’t think it’s going to work very well personally. We’ll see AI generated marketing creative. So video, display, which is like a banner, just like a static banner out of maybe some of the text, the actual script that might show up in a video, the sound. music, speech, all of that. I haven’t watched it yet.

I heard Coca-Cola just did a fully AI generated ad and they released how many prompts it took, how many people, how many hours. It was like, you might as well just use humans to make the ad, because apparently it’s still a little wonky. Have I watched it yet? No, this is all hearsay based on someone else’s word that watched it. But that leads into the fully AI automated marketing pipelines. I’m not going to say it started with Meta, but Meta said in 2026 they’re doing it. And so now everybody’s trying to do it, which is being able to create marketing campaigns start to finish using AI. And I put some stuff on there that I’m going to come back to. But basically automatically creating and updating your campaign tactics, which, basically, there’s a lot of different ways marketing can come to you. A lot more than I realized. But automating.

We know that someone watches this YouTube channel at this time of day.

So we need to get onto YouTube for these videos or this channel, these ads, because these people are likely to see it. They want to fully automate that decision process. They also want to fully automate the creative that you see. So we know that this person’s going to watch this ad or this video or their live way too. It’s all prediction. Watch this video at this time of day. We know that they respond to this type of marketing, so we’re going to show them a five second video with the QR code. We’re already seeing it, or at least I am.

I watch enough.

I don’t have cable, so I watch mostly streaming.

And a lot of streaming with ads, because I don’t want to pay more. So I get a lot of stuff marketed to me, especially by Amazon. That’s like, add to cart. And I’m like, joke’s on you. This is my sister’s Amazon account. But there’s a lot of them.

They’re going after it, which means everybody else has to start looking into it. It still remains to be seen how customers are going to react to it. But the other things that are on here are specifically media mix modeling, which is figuring out how and where you spend your marketing budget in terms of how much you spend on creative and on what channels. It’s just kind of interesting to see it from this perspective. And it’s kind of like, do as I say, not as I do, because I know Meta and YouTube in particular are like, if we find out that any part of your content is AI generated, we’re getting rid of it. They’re also walled gardens with data, so you can’t access any of the data that they collect about things.

It’s like, it’s a massive like double-edged sword.

Yeah, they’ve all, they’ve all had these, I mean, they’re used to, like a coworker, it used to be a marketing group. Was it on target marketing? Or you would have had some, I think. And it was, so a lot of these businesses like Meta, even Facebook, they They all, it’s like they created a whole side domain of business that would do this for their, you know, their system and stuff. It’s like you’re just going to cut all of it out. Um, seems like.

So for everything you’re studying, from metrics for marketing and returns on that, how much are patient outcomes also being studied, for positive effects for the patients?

So that is like the gold question that no one knows how to answer because so many things affect patient outcome.

Length of stay in hospital is hugely predictive of how you’ll do in the next month or so. Whether or not we have access to that information or can pull it is another question. The patient education outcomes research I see tends to be very, very specific studies in partnership with a university or some kind of academic research partner. And so trying to do this stuff at scale is really, really hard.

I was just thinking, if I got served an ad saying, hey, you know, you might have this outcome in the future.

And if you do this, three out of five people have a better quality of life.

So like that is the ideal thing that they have is like how much money came into the system.

And then it’s, oh, we can prove that we made a difference in their health is like the next piece.

So company that I work at, by the way, I meant to say this earlier.

These are my views and my content, and not necessarily those of the company I work for. It’s called WebMD.

WebMD Ignite specifically, which is patient education and marketing. The website is technically a separate company. However, I do take complaints and then I send them on to other people. I will also say, I know that it can be frustrating using that site for the symptom checker. However, there are reasons that those symptoms are listed. I hadn’t thought about this for a while, but one of my best friends found out he had cancer because he had a persistent cough. And that’s something where you go on WebMD and you’re like, oh, cancer or death. But really, this was an example of where it actually was one of those. And so they do have to list them. It’s just that a cough is so prevalent for so many conditions that it’s really hard. And for liability reasons they have to include all of the symptoms, because what if you went and looked it up and it didn’t list cancer as a potential cause for a cough, and then you were like, oh, you didn’t do your due diligence? And so that’s one of the many lines that we have to walk. It can be frustrating, which is why personalization is of such interest there.

And then back to what you were saying, I’m actually working on something called multi-touch attribution, which is: if you’re served a whole bunch of ads, which ones, and in which order, influenced your decision to go to this hospital system? But we’re also looking to incorporate patient education. So when did you get education from your doctor? And how does that interplay with marketing?
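To give a feel for what multi-touch attribution computes, here is a minimal sketch of one common rule-based approach, position-based (sometimes called U-shaped) attribution: the first and last touches get fixed shares of the credit and the middle touches split the rest. The talk does not describe the actual model being built, so this heuristic, the 40/40/20 split, and the channel names (including treating patient education as a touchpoint) are all assumptions for illustration.

```python
from collections import defaultdict


def position_based_credit(touches, first=0.4, last=0.4):
    """Split one conversion's credit across an ordered list of touchpoints."""
    credit = defaultdict(float)
    n = len(touches)
    if n == 0:
        return {}
    if n == 1:
        credit[touches[0]] += 1.0
    elif n == 2:
        # No middle touches: rescale the first/last shares to sum to 1.
        credit[touches[0]] += first / (first + last)
        credit[touches[1]] += last / (first + last)
    else:
        middle_share = (1.0 - first - last) / (n - 2)
        credit[touches[0]] += first
        credit[touches[-1]] += last
        for touch in touches[1:-1]:
            credit[touch] += middle_share
    return dict(credit)


def attribute(journeys):
    """Sum per-channel credit over many converting patient journeys."""
    totals = defaultdict(float)
    for touches in journeys:
        for channel, share in position_based_credit(touches).items():
            totals[channel] += share
    return dict(totals)


# Hypothetical journeys, each ending in a visit to the health system.
journeys = [
    ["search_ad", "patient_education", "display_ad", "email"],
    ["display_ad", "patient_education"],
    ["email"],
]

for channel, share in sorted(attribute(journeys).items()):
    print(f"{channel}: {share:.2f}")
```

A fixed rule like this only redistributes credit by position. Answering whether education actually changed the decision, or the patient's outcome, would need a model fit to data, which is the hard part described here.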

That’s a long-term question we want to answer, because I think that’s one of the best ways we’re going to get at the question you asked, of what actually made a difference to the patient. But building that is really hard.

And for us, clients have to be buying both our marketing services and our education services for us to get that data. And there’s about three of our many, many clients that do. So it’s very much it’s early.

We’re still trying to figure out how we would go about doing it or who we would partner with.

I’m just happy to hear that’s a consideration.

It was a really big consideration at Healthwise, which was the company I worked at before we got bought, which only did patient education, so it makes sense that that would have been of interest. But we got bought, and we went from being a nonprofit to being a for-profit company, and there’s a lot of shifts that happen for that reason in the things that companies value. And so people within the company definitely want to know, but it’s going to depend a lot on what we actually impede, or what makes a difference in which contracts we get. I think that was my last one for right now. So thank you for listening. We can go through. I know there’s some questions in the chat, or people with questions or discussion points.

Feel free, and I’ll just say I did add some sources. It’s not all of them. I dropped my healthcare Google Doc in the Discord chat after the last time I was here, which was a while ago. I can add that again. It is truly not the most organized thing.

It was a lot of stream of consciousness, things, and open tabs that I dumped in there. And I have many more. The rest of it’s in my Notion reading list, untagged and unhelpful.

It’s just links and maybe titles if I’m having a good day. But there’s a lot of information out there.

There’s a lot of areas for opportunity, which is why I think it’s great time to be in healthcare research. It’s just barrier to entry for access and other things can be difficult in terms of getting access to data or trying to build something. Open enrollment during that period, I got exposed to AI asking to see what premium should I use to buy.

it was pretty good meaning because let’s say I was paying $700 per month last year so it’s been like 11,000 this year because there’s an increase so it went back to my out-of-pocket last year was like 2000 so it’s telling me instead of premium premium, what is our healthcare, I should just get the core that I would save money. How the whole, because of circumstances, right? How many visits you do and the discounts that comes in between. That’s what the AI side was missing so that I could make it better target.

So I still kept my premium healthcare this year.

That’s why it’s interesting because I’ve had to make the decision of do you go with the high deductible cheaper plan? Exactly, yeah. HSA or do you go with the plan that I do? How old are the things that happen to you to lower out of pocket?

HSA did come in play because, you know, it’s like a debit card, but then you don’t know if it’s going to qualify for certain things. You’d be out of pocket first, that kind of thing. So, you know what? It’s too much paperwork, but it’s too much time to think over. So I’m just going to keep status quo. Yeah, but I don’t understand why it’s open enrollments only two weeks or three weeks long. Is it two weeks long usually?

I honestly should pay more attention. I haven’t. Work is opening ours in December, so that’s what I paid attention to: when my own opens.

I mean it was like boom.

Yeah, I know it was usually pretty quick. Maybe 30 minutes and then like two days before that I was like even more like three hours.

Yeah, and then status quo. But the AI, Alex, was pretty good. It was talking to me.

And this is also, like, I do want to say I can come across as very cautionary, probably because I’ve worked in healthcare, but that’s not to say that there’s not a lot of good that can come from this. It’s just, I’ve seen a lot of bad done with data in healthcare, historically, but also you see people make up data, you see retractions, you see all kinds of things that happen in peer review. It’s especially egregious to me when it’s affecting people and the decisions they make. And so I tend to operate from a place of caution, which is why I started with all of the challenges and not all the cool stuff. I will say at work we have an AI avatar that people can chat with about different conditions to learn and figure out what their options are. The health systems themselves are interested in it.

I don’t know if patients are. And I really want to ask about the demo here, but they just showed it to us today.

And they just showed us the stills. I don’t know how ready it is to show, but I would be really curious how people outside of, you know, your own swirl, people with different perspectives, react to that and if they would ever use it, because… Yeah, it’s cool, and the health system wants to pay for it. But if people don’t use it, they’re not going to renew. So then it’s like, why build it? Why spend all the money to build it?

Well, you’d think, with the decreased time with providers a lot of people get, they would. Well, a lot of people are going to ChatGPT anyway on their own. Even if OpenAI says, like, don’t use us for medical stuff, people are going and doing that because they don’t have access to anything better. And so there’s a debate. I want to say Stanford or one of the other really big AI and healthcare labs is holding one where it’s like, is it better to have some kind of access than no access at all to health information? But then you think about liability. Again, I keep circling around, like, I live here where everyone talks about cybersecurity and how easy it is to poison things, and it’s like, who knows what people are out here doing, either for the fun of it or on purpose. We don’t really know. We don’t have a way to quantify that right now. And so, when you’re talking to people, I’ve been talking to a lot of older people about this recently.

Like, how do you know what’s coming out as truth? We don’t.

And like, those of us in here know that.

Like, we’re paying attention, we have training, we’re interacting with it enough, we know that, but… Again, I was just talking to people about this on my call that made me almost late for this.

A lot of people see this as magic and not math and data and science. And so they don’t necessarily understand why things are happening the way that they are and why things are popping out of OpenAI’s ChatGPT or other models the way that they are.

And so people are just like, wow, that looks mostly right.

I believe that. And then it turns out we don’t know what they’ve scraped. Have they scraped a bunch of anti-vax content in there? We have no control over what went into it, we don’t really know. So how do we know that what’s coming out about the vaccine is actually true unless you go and you follow up and you continue to do the research? And the problem sometimes with doing your own research is, how do you know what’s a real source and what’s not anymore? There’s just a lot on the internet now. We really don’t, do we? We’ve been talking about trust a lot at work, and that’s why I think machine-generated content is not the way to go in terms of that. But that, again, is my opinion and not necessarily my employer’s opinion.

No, no, I think you’re right. In regards to digital health care records, right, or EHRs, what would you do if you could wave a magic wand? Would it be more that you would want the data to be aggregated and actually have more access overall?

Or do you think that there’s even more of a challenge with the standardization and, like, ICD codes?

I would rather have data standardized within an individual system, because I’d rather have regional information that was helpful than try to pull things together at the national level or the international level. ICD-9 codes are actually internationally developed. The WHO runs them, the World Health Organization.

ICD-11 just came out in 2022.

We do not yet have a timeline for implementation here in the US. That means codes will change, which complicates working with historical longitudinal data. That tends to happen about every 20 years or so.

That makes it really hard to track things over time.
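
To make the version problem concrete, here is a minimal sketch of harmonizing codes across a coding-system change before doing longitudinal analysis. It assumes US claims-style data, where the switch from ICD-9-CM to ICD-10-CM took effect on October 1, 2015. The two-entry crosswalk, the `harmonize` helper, and the sample records are hypothetical and only for illustration; a real project would need a full, maintained mapping, and many codes do not map one-to-one.

```python
from datetime import date
from typing import Optional

# US claims switched from ICD-9-CM to ICD-10-CM on October 1, 2015.
ICD10_CM_START = date(2015, 10, 1)

# Toy crosswalk (assumption): two hand-picked pairs, not an official mapping.
ICD9_TO_ICD10 = {
    "331.0": "G30.9",   # Alzheimer's disease
    "250.00": "E11.9",  # type 2 diabetes without complications
}


def coding_system(service_date: date) -> str:
    """Return which coding system a record dated `service_date` uses."""
    return "ICD-10-CM" if service_date >= ICD10_CM_START else "ICD-9-CM"


def harmonize(code: str, service_date: date) -> Optional[str]:
    """Express a code in ICD-10-CM terms, or None if the toy table lacks it."""
    if coding_system(service_date) == "ICD-10-CM":
        return code
    return ICD9_TO_ICD10.get(code)


# Hypothetical records: (patient id, date of service, code as recorded).
records = [
    ("p1", date(2012, 3, 14), "331.0"),
    ("p1", date(2017, 6, 2), "G30.9"),
    ("p2", date(2014, 1, 9), "780.93"),  # not in the toy crosswalk
]

harmonized = [(pid, d.isoformat(), harmonize(code, d)) for pid, d, code in records]
for row in harmonized:
    print(row)
```

The unmapped record is the interesting case: every code that falls out of the crosswalk is a gap in the patient's timeline, which is part of why tracking conditions across a version change is hard.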

So there’s things like the Framingham Heart Study, which enrolled people early and followed them pretty much the rest of their lives as much as they could.

Mayo Clinic also runs a longitudinal data set for their entire county. They have all kinds of data, and it’s very, very specific to their population, but they have it, in some cases, from birth to death for people.

Similar to how you mentioned stuff that’s not ancient history, I was a med tech in the early 2000s. The hospital I was in had just transitioned to computers for records. We’re only 20 years into computers being used in hospitals for records rather than just paper. Yes. Digitization of paper records.

I don’t know how high that is on anyone’s priority if it still exists because a lot of times that stuff gets lost.

Reading what doctors wrote. I don’t know how that works.

I was just gonna say there’s jokes about, like, buildings burning down with records in them, but that does happen. And also, this is an aside, this is not healthcare, but I’m gonna talk about it anyway because it’s cool. If any of you have heard of the Human Rights Data Analysis Group, they do really, really cool stuff with math. Their lead quantitative person came and talked at Vanderbilt while I was there. So cool. He does a lot of human rights violations work and he testifies for, like, war crimes trials at the Hague and other things. And they had discovered this building in Guatemala filled with records of all the wrongs the police had done, just hidden in this building all these years, all these cases just thrown in there. Why they didn’t burn them, I don’t know, but they were all in there. And so they had to devise a method to prepare for the trial, because it’s going to take historians lifetimes to go through all of this and catalog it and understand it. They had to devise a way with math, treating it as a geographical distribution, like spatially, to decide what records to sample and pull to prove that this person was the head of the conspiracy. And they did. And it’s just like, they do such cool stuff. So that’s another really cool thing, if you’re ever interested in math and data being applied to really interesting problems that are maybe really hard to solve, that group. What’s the group?

Human Rights Data Analysis Group.

I want to say they’re based in California. I’ve met a couple of people that have worked there. They do really interesting work. And some of it is on issues with police data and building models on where to send police in places where you don’t have data. So basically his thing, when you look at building models for policing, is that all the data they have locally in California is for drug busts in Oakland, California. But he was like, there is no way that Oakland is the only part of that area of California where people are doing drugs. They’re just the ones we collect data on. And so if you build your model on that, you’re gonna send more police officers there, therefore more people are gonna get caught, and you perpetuate this. Of course you’re gonna find more there, instead of looking for drug deals in other places. And then there’s things like human trafficking, where what do you do when you don’t have data, or you only have data on the cases that get caught? That means that you have to account for that uncertainty somehow, and how do we do estimates for things where we have a very limited pool of data? All of those problems are just really interesting, and in general, health care has a lot of those issues too.
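
The feedback loop described above can be sketched with a tiny deterministic simulation. This is an illustration under stated assumptions, not the group's actual model: two hypothetical districts have the same true rate of offenses, but district A starts with more recorded incidents. Patrols are allocated in proportion to past records, and recorded incidents grow in proportion to patrols. The `simulate` function and all numbers are made up for the example.

```python
def simulate(recorded, true_rate, patrols=100, rounds=10):
    """Allocate patrols by past records; new records scale with patrol presence."""
    recorded = list(recorded)
    for _ in range(rounds):
        total = sum(recorded)
        # Patrols go where the historical data says incidents are.
        allocation = [patrols * r / total for r in recorded]
        # Expected new records: patrol presence times the true underlying rate.
        found = [a * rate for a, rate in zip(allocation, true_rate)]
        recorded = [r + f for r, f in zip(recorded, found)]
    return recorded


start = [80.0, 20.0]                           # district A over-represented at the start
final = simulate(start, true_rate=[0.3, 0.3])  # identical true rates
share_a = final[0] / sum(final)

print(f"final records: {final}")
print(f"district A share of records: {share_a:.2f}")
```

Even though both districts behave identically, district A keeps its 80 percent share of the records and the absolute gap keeps widening, because the data only ever measures where the patrols already are.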

It’s also like any adverse event reporting or anything like that.

It’s entirely voluntary. It varies by hospital system and physician. I mean, the portal is still there, but it depends on whether somebody dedicates the time, or whether their state or their institution requires them to say anything at all.

Like if it’s something that the CDC monitors, they require it, but if it’s anything else, it’s not going to be reported, or it’s iffy if it is.

Yes, and then how do you know when to pull a drug when you’re only getting voluntary adverse reaction data coming in?

And so it may only be a small portion.

It might be everything. It might be a small portion. How do you figure that out? There’s like people who work on statistical methods to figure that out.
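
One classic family of methods for the "how much are we missing" question is capture-recapture: if two independent reporting channels each catch some of the same events, the size of the overlap hints at how many events neither channel saw. Below is a minimal sketch using the Chapman form of the Lincoln-Petersen estimator. The counts are invented, and the assumptions (two independent lists, a fixed set of events, reliable matching between lists) are strong ones that real reporting data often violates.

```python
def chapman_estimate(n1: int, n2: int, both: int) -> float:
    """Chapman's estimator of total events from two overlapping lists.

    n1, n2: events seen by each reporting channel
    both:   events that appear in both channels
    """
    return (n1 + 1) * (n2 + 1) / (both + 1) - 1


# Hypothetical counts: 60 events in a voluntary reporting portal,
# 40 in a hospital's own chart review, 15 found in both.
n1, n2, both = 60, 40, 15

observed = n1 + n2 - both               # distinct events actually seen
estimated_total = chapman_estimate(n1, n2, both)

print(f"observed distinct events: {observed}")
print(f"estimated total events:   {estimated_total:.1f}")
```

A small overlap between the lists pushes the estimate well above the observed count, which is the formal version of "it might be a small portion."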

I think we talked about it here before, but it’s sort of like the old story where they would take war planes when they’d come back from missions. Yeah, survivor bias. Yeah, and they would patch them up and add extra armor where they were shot. It’s like, no, you need to study the ones that didn’t make it back, not the ones that survived.

I’m going to check the chat and see if there are any questions. Yeah, we do need to start wrapping it up. Jack did make an interesting comment about the golf example being tied to higher income, and that it’s really hard to untangle the data, and yes.

All this stuff is so correlated.

It gets really hard to differentiate what is the true thing driving the effect.

Okay, what do you think AI has truly made possible in the last few years?

And for AI/ML training, are there any companies leveraging aggregated patient data?

That second question, from Dean, about the law stuff, the litigation stuff, I don’t know as much about. I know a lot of people in legal AI tech. I don’t know if they’re looking at, like, monitoring the medical stuff, but there is a lot of interest from the legal community in using AI to do all kinds of things that were not available to them before. So it wouldn’t surprise me if someone’s out there doing it. I just don’t know. And what do I think it has truly made possible in the last few years that wasn’t realistic? Anything with big data. Anything population-level with big data, because until about 10 years ago we did not necessarily know how to process big data or have the computing resources for it.

And that was a really big topic and it still is.

And because of that, a lot of genomics, or anything where you hear “omics” at the end, can be done where it would not have been feasible before. So, like, predicting protein folding. Proteomics, doable now. A lot of genetic stuff, anything that requires spatial data, lots of things like that. That one Alzheimer’s study that had the multimodal piece, where it had the images with the video, the talking, the text, you know, the three modalities in one model, that wouldn’t have been possible years ago. Yeah, I didn’t talk a lot about it, but that’s it. Alzheimer’s disease is really difficult to diagnose because it’s a consensus diagnosis. It takes at least two or three clinicians to agree that you have it, and there’s six different areas where you have to meet certain requirements to be officially diagnosed with either MCI or AD. MCI is that middle stage where you have some kind of neural problem.

It’s probably going to lead, or could lead, to dementia, but it might not necessarily be Alzheimer’s disease related dementia. There’s other types. Also Parkinson’s can present as dementia depending on your symptoms. So there’s a lot there in terms of modalities that could help with the diagnosis issues or, as we figure out more interventions, whether they are physical exercise related or drug, when to give them because of the pieces you’re seeing. That’s of interest to a lot of people too.

So it would be interesting to see an AI model that’s trained on whatever the ICD standards are through any given time, right?

So if you have a standard that’s, and I don’t know the years, but if you have a standard that’s from 2005 to 2010 and then from 2010 to 2015, right? If it’s trained on those and knows what the years are, what the codes were, what the general diagnosis is for this combination of codes.

How that would be helpful in research.

And that’s something too. I think there’s just so much potential with the codes, because we just did a hackathon in October, and what one of the teams decided to do, because I’m silly that week, was… So there’s really strict taxonomy metadata.

Really rigid, I would say scientifically driven, metadata. And then there’s stuff like certain hospital systems don’t allow certain content to go out because they’re a Catholic hospital, they’re this hospital, or they don’t want to provide this, they want to use their own stuff. How do we create that metadata so that it’s associated with the content, that’s not being hand done, and that can be changed? So it’s impermanent, whereas the other kind, it’s not permanent-permanent, but it’s pretty rigid in terms of, like, this ICD-9 code goes with these diseases or these concepts and these all go together in certain ways. It’s very medically driven, and then some of this stuff is very just socially driven, which means it’s a lot more flexible. Like, how can we use AI to do some of that? And we actually built a working prototype of that one and it turned out interesting. It worked pretty well, but I can’t demo it.

When do you get to the point where you have something to demo?

I really want to see everyone’s reaction, again, to the video avatar. I don’t know if they would ever let me demo the internal stuff, because the internal metadata stuff is one where, at least as far as I know, we’re not looking to build our own LLM sort of training on all the stuff that we have, but there are medical ones being developed. At this given time, I would say Stanford is probably our best bet. They’re linked here, the ARISE group. They have a lot of their AI stuff.

Coalition for Health AI is really focused on standards for healthcare data in AI. Some of the rest of this stuff is just links to different stuff I talked about. But if you’re interested, Coalition for Health AI, or CHAI as it’s shortened, is a good follow, and so is the ARISE group. And ARISE does, like, weekly webinars.

They have newsletters. They have just, like, random things they do. They sometimes do hackathons, I think. There’s a lot there for people that might be interested in just dipping a toe into healthcare data, but not necessarily being like, I’m gonna pivot my whole career into this. It’s also just interesting to know, I think, when we’re looking at policies and unintended consequences and lots of other things. It’s just like, how did this play out? And it’s like, oh, I look back. Oh, I didn’t put it in here.

The definition of rural for hospital systems is hugely impactful on who gets money. They tried to make a policy that would give more money to rural hospitals to be able to pay for more nurses or clinicians. The problem was, when they hired them, they no longer qualified for the funding. But they couldn’t afford to keep the nurses full time without the funding. And so it was this circular problem that ended up developing. Well, what ended up happening was they hired a bunch of nurses part-time, and that was good for nobody in the long run. And so there’s a lot of things, especially when you start looking at policies, where it’s like, oh, we can kind of backtrack to where it went wrong, but trying to figure that out up front is really difficult.

If you want a really fun side study: population rates in rural America.

The overall population has been almost static for about a hundred years.

Everybody knows the figure of, oh, the percentage is decreasing, so people must be moving away and that’s why everything’s closed.

No, the population’s the same.

Things are closing because there’s no economy there anymore and more people have moved to cities, but it’s still about the same number of people living in rural areas since the 1920s. They’re just losing all the services because there’s no economy there. Now my personal experience with that is that my extended family is from Section, Alabama, and we drove by to go to a funeral, of course, and we saw a population sign and my mom goes, that’s the same population as when I would visit as a kid. It hasn’t changed. So that’s one of our close-to-home ones.

You know, thinking of the question she had, one of the challenges that we have in AI in general, not just in medical, across the board, as we all know, is that it’s all based on LLMs, right? But just to make a comparison, when you go buy a car, they will tell you if it’s got a 4-cylinder or an 8-cylinder, what kind of engine’s in there, right?

When it comes to LLMs, there’s no disclaimer as to what it hasn’t been trained on, specifically, back to a very specific question. If you would know that, yeah, it’s been trained on the following, like a disclaimer, that’d be great. But right now, I don’t care which one you log on to, whether it’s Gemma or ChatGPT, they don’t tell you.

And they won’t, because they’re totally going to get sued. Again. Yeah, I mean, they’re already being sued. Someone is tracking the lawsuits. They’re not going to tell you for a lot of reasons, but also, when you get down into it, I saw someone that was like, I want to know specifically the piece of content that drove the estimate, that is why you returned this particular phrase to me. It’s like, that’s not really how the math works.

Because it’s broken down to some tokenized level, you can’t necessarily trace it all the way back to the initial piece of content where the thing got pulled.

But what I’m saying is, was it even trained on the specific topic you have, medical or otherwise?

Yeah. What does that mean? My worst nightmare is that it’s trained on a bunch of 4chan data.

Because it’s a bit like… A lot of my friends are gamers. I’ve seen some stuff.

One of the training on Reddit data kind of killed it. Oh gosh, that was the… That was my first time here, so you don’t know if it’s here or not, right? Like, you’re training on whatever’s on the internet. There’s a lot of things that aren’t on the internet, going back to digitization of records. There’s a lot of things that aren’t on the internet or available for that, things that are not digitized at all, because a lot of people in here work with stuff that will never be on the internet. But on top of that, it’s like, we don’t know if it’s been digitized, but also, for what it was trained on, we don’t know if it’s quality information or not.

So it sounds like we need a different AI model that’s trained on how to find what the other AI models were trained on.

I’m pretty sure that the internal groups are probably doing that.

Because a lot of the architectures are just built on, like, verified information.

Well, the best part of it is if there’s another one, we’ll make it up.

Yeah.

Like this .edu’s.

Yeah.

Right. Yeah. I find it interesting because the executives that I talk to think that it takes out human involvement, but really, building all these systems requires a massive amount of human effort. And that’s a piece that’s kind of missing, I think, in a lot of people’s brains: to do this well, how much human effort does it actually require to build the thing that we really, really need? So LLMs are cool, but for me, it’s always going to be like, we don’t know the quality of information that went into this.

So I would never, ever want to put something built on that directly into production.

Personally, I’m really risk-averse.

So.

You work well at the DOD.

No, there’s no crowds.

Yeah, so.

We do need to wrap up. Let’s thank Jacqueline for all the work she put into this. Appreciate it. We will be taking next week off.