Although the Centers for Disease Control and Prevention (CDC) recommends universal screening for HIV in the U.S., an estimated 166,000 undiagnosed HIV-positive people continue to fall through the cracks. The growing use of electronic medical records (EMR) offers a potential technological solution to help physicians more easily flag high-risk patients, increasing the likelihood that an HIV test will be recommended for them.
New research presented at IDWeek 2017 in San Diego, California, explores how we can use computer algorithms to help ensure that undiagnosed people with HIV are tested, thus increasing the likelihood of them moving forward on the HIV care cascade. Our correspondent Sony Salzman spoke with study author Jason Zucker, M.D., a postdoctoral clinical fellow at Columbia University Medical Center, about the findings.
Sony Salzman: I was hoping you could walk me through the purpose of your study and its main findings.
Jason Zucker, M.D.: Sure. As you know, the CDC recommends universal screening. The problem is, the CDC really recommends one-time universal screening and then often using targeted screening for repeat screenings. So, [we are] trying to figure out who needs to be screened more than just one time.
We previously presented an abstract looking at what we called missed opportunities for earlier HIV diagnosis. We found that about 40% of our patients who were newly diagnosed with HIV actually had visits within the 12 months prior to their diagnosis where they were not screened for HIV. And so, one of our big suppositions was: (a) how can we identify these people earlier to get them screened and linked to care, and then (b) now that we're sort of expanding our PrEP [pre-exposure prophylaxis] program, how can we identify people who are at risk of HIV, both to screen them and then, if they're negative, enroll them in PrEP programs?
One of the biggest barriers we face to doing that is that a lot of the information you need for assessing HIV risk is not in any structured field. Structured fields [are] things [in the EMR] like: Is someone a man? What's their age? Those are really easy to take out of the electronic medical record. What's much more complicated to get are things like: Do people use intravenous drugs? Are they a man who has sex with other men [MSM]? Do they have multiple partners?
Things like that are often written in narrative form in the social history section. They're not in any checkbox that you can easily obtain.
So, our study was designed to look and to see: (a) could we model risk for HIV; and (b) would that model improve if we tried to use different techniques for natural language processing that actually extract those risk factors from the notes?
In our results, what we found is that we were able to identify some risk using just structured fields, including diagnosis codes, gender, age, etc., but that there was a definite increase in what we were able to do using topic modeling, which is one form of natural language processing -- and that we actually got an even better model when we used clinical keywords.
What we found is that we were able to increase our precision -- meaning, our positive predictive value -- although our recall -- that is, our sensitivity -- still remained low. So, we were able to identify more people at risk of HIV using natural language processing.
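[Editor's illustration: the distinction between precision and recall drawn above can be made concrete with a small sketch. The counts below are hypothetical and are not taken from the study; the function name is ours. Precision asks what share of the patients the model flagged were truly positive, while recall asks what share of the truly positive patients the model managed to flag.]

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP)  -> positive predictive value
    recall    = TP / (TP + FN)  -> sensitivity
    """
    flagged = true_positives + false_positives       # everyone the model flagged
    actual = true_positives + false_negatives        # everyone truly positive
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (NOT study results):
# the model flags 40 patients, 30 of them correctly, and misses 70 others.
precision, recall = precision_recall(
    true_positives=30, false_positives=10, false_negatives=70
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

[With counts like these, most flagged patients are correct flags, yet most at-risk patients are still missed, which is the general pattern of good precision and low recall described in the interview.]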
SS: [Can you expand on the differences between] (1) natural language processing and then (2) keyword search?
JZ: It's not keyword search; it's still using a different form of natural language processing. It's using the unigram model. So, our baseline model was just structured fields. We then used a unigram model, and then a latent Dirichlet allocation model, which is a type of topic modeling.
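[Editor's illustration: a minimal sketch of what adding unigram features to structured fields might look like. This is not the study's code; the field names, the feature naming scheme and the sample note are all hypothetical. A unigram model simply treats each single word in the note as its own feature.]

```python
import re
from collections import Counter


def unigram_counts(note_text):
    """Lowercase a free-text note, split it into word tokens, count each word."""
    tokens = re.findall(r"[a-z0-9]+", note_text.lower())
    return Counter(tokens)


def build_features(structured, note_text):
    """Combine structured EMR fields with unigram counts from the note.

    `structured` is a hypothetical dict of checkbox-style fields; the
    "note_word=" prefix is an assumed naming convention that keeps the
    two kinds of features from colliding.
    """
    features = dict(structured)
    for word, count in unigram_counts(note_text).items():
        features[f"note_word={word}"] = count
    return features


# Hypothetical social-history note, for illustration only.
note = "Patient reports multiple partners. Denies intravenous drug use."
features = build_features({"age": 24, "sex=male": 1}, note)
print(features["note_word=partners"], features["note_word=denies"])
```

[One known limitation of single-word features is visible in the sample note: the words "intravenous" and "drug" are counted even though the patient denies use, because a unigram model ignores word order and negation.]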
SS: Is it correct to say that overall language processing did improve your ability to identify those high-risk folks? And that, within this abstract we're talking about, two different types of language processing each maybe had pros and cons?
JZ: Yes … There are different methods of doing it. To be honest, one was just better than the other.
Baseline plus keyword model was better. Topic modeling was not as effective as unigram models.
SS: [Can you] dive into some of the more practical implications of this work that you've done?
JZ: One of the biggest challenges is that we've been talking to providers about how we can improve HIV screening. And one of the biggest things we're finding is that providers are just busy. They all think that HIV screening is important. We agree that HIV screening is important. And they'd like to do it, but collecting some of these risk factors is time consuming. You know, if a patient comes to the emergency room complaining of problem X, there may not be time during that encounter to collect a detailed sexual history and determine whether that patient needs HIV testing at that visit.
In order to help people with this, we can use the notes -- particularly for things that you can find in the longitudinal record that may not change. Young MSM with multiple partners may need to be screened more frequently every time they come into the hospital for any particular reason. And this can help us flag them (a) so that they know that they need to be screened, and (b) to serve as a reminder that, in addition to screening, we actually have primary prevention now.
So, if you can identify these people, you can then offer them services like primary prevention, not just wait until they become HIV positive over time.
SS: Is this something that could be tacked on to [providers’] existing electronic health records? [Would this be] like an app?
JZ: We've actually discussed a lot of different ways, and we haven't come up with what the ideal way is yet. We're still talking with providers, and talking with patients, about what's the best way for this information to be provided. … Right now, we have pretty good precision, but not as good recall. And so, it has some role, but it may not be perfect yet. I think we need to continue trying to optimize the model and figuring out what the best way to implement that is, whether it's just a reminder in the chart, whether it's reaching out directly to primary care providers or whether it's even reaching directly out to patients, eventually.
SS: What’s the big takeaway that you'd like folks to walk away with if they come across your study?
JZ: I think the big takeaway is that moving forward we have the ability to build longitudinal models of HIV risk using data that's already in the electronic medical record [and] to try to assist providers -- not necessarily HIV providers who are maybe thinking about this, but primary care providers, emergency room providers, subspecialty providers -- to help get people linked, both for HIV screening and to preventive care.
SS: You know, it's interesting because, when you say it like that, it makes it sound as if it wouldn't necessarily be an extra step for very busy clinicians but [just] using data that already exists in the electronic medical record. Is it the case that, in theory, when these models improve, it could be tacked on without creating any additional work for health care providers?
JZ: That's exactly what we're going for. The idea would be that this could serve as either a reminder or a popup that they should be referring the patient for primary preventive care and HIV screening. The goal would be that the provider receiving this message wouldn't have to do particularly much. We could provide them that information and then allow them to obviously make the decision on their own. But we could provide them the information that this patient is at high risk for HIV acquisition [and] could benefit from primary prevention services.
SS: Given your experience with the care cascade, I'm wondering whether you think that this step, this testing step, is the biggest gap in the cascade?
JZ: I think there are a lot of gaps in the care cascade overall. I think, in terms of the end-the-epidemic efforts, this is the most important step because if people are not diagnosed then you can't get them into HIV treatment, get their viral load down, make it so they don't spread it to others. At the same time, now that we've expanded PrEP so significantly, if you are not screening people, then you're not thinking about it, and you're not referring these people for primary prevention.
So, I think it's the most important step, because it's what gets people into the cascade, whether it's the prevention cascade or the treatment cascade.
SS: [It sounds like the prospect for pop-up reminders for HIV testing based on the EMR is coming soon. What should clinicians keep in mind until then?]
JZ: I think the most important thing is still taking a history. Because what we're doing here is taking histories that are already in the charts for patients and [using] them to build models -- population health models -- to identify people at risk.
At the individual level, if you take a thorough history still, you can then come up with a patient's assessment for risk and figure out whether that patient needs HIV screening and other services.
So, I think that it all starts with taking a good, solid history and then thinking about it for your patients. In the future, if this gets larger and can be expanded to other places, you'll be able to have alerts and other things based on this. But I don't think we're there yet. ... I think when it comes to looking at HIV risk assessment, obviously the more data the better. And the more information we have on a large variety of HIV-positive patients, the better.
The current methods of HIV risk assessment are taking a history and then really thinking about which risk factors the patient has and doesn't have, which takes clinician training and time. And so, anything we can do to lower the activation energy to think about this, I think, is really important.
In terms of this work, I think our next step from our standpoint is to try to make the model better. And we're trying some more novel natural language processing techniques that we hope will allow us to improve the model over time, and then work on ways to try to put it into practice to help providers start screening. We're also surveying providers and speaking to them directly about how they would want this information provided to them: What's the way that is going to be easiest for them to receive it, and then act on it?
SS: You said you're doing those surveys now?
JZ: We are surveying providers right now, yes.
This transcript has been lightly edited for clarity.