Will They Say “Yes” to the Vaccine You’re About to Offer?

A first look at a new prediction model

By Dale Dauten, Syndicated Columnist

Can you look at someone and predict whether or not they’ll get a vaccine you know they need?

I suspect that many vaccine veterans reading this would answer, “Oh, yeah – I can always tell.” But, then again, others would say, “You can’t stereotype. People will surprise you.”

So, what if, instead of looking at the person, you could look at a person’s behavior and demographics – could you predict whether they’ll get a particular shot?

That was the question that the big brains at STChealth’s Analytics team set out to answer. For their testing, they started where they had the most data: the Covid vaccine. Dr. Kyle Freese, Chief Science Officer, explained: “We took a clean dataset [more on that shortly], then blinded the model as to each person’s Covid vaccine status. Then we included all the variables we knew to be associated with vaccine uptake – whether population-based or individual-based – and asked the model to predict whether that person got a Covid vaccine.”

And? Did it work?

Dr. Freese: “We compared the model’s prediction to reality. Could it predict if the person got the vaccine? It was accurate over 96% of the time.”

Then, they stretched the model. Dr. Freese explained: “Okay, the model can predict pretty well whether they got the shot, but can it predict when they got it – whether they got it early on or waited? Even with that, it was above 90%.”
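The evaluation Dr. Freese describes (hide the outcome from the model, predict it, then compare against reality) can be sketched in a few lines. Everything here is illustrative: the features, the stand-in "model," and the synthetic records are assumptions for the sake of the sketch, not STChealth's actual pipeline or data.

```python
import random

random.seed(0)

# Illustrative records: behavioral/demographic features only. The true
# vaccination status is kept separate, "blinded" from the model.
people = [
    {"age": random.randint(18, 90),
     "rural": random.random() < 0.4,
     "flu_shot_last_year": random.random() < 0.5}
    for _ in range(1000)
]
# Hypothetical ground truth, loosely tied to prior flu-shot behavior.
truth = [p["flu_shot_last_year"] or random.random() < 0.3 for p in people]

def predict(person):
    """Stand-in for the trained model: guess Covid uptake from features."""
    return person["flu_shot_last_year"]

predictions = [predict(p) for p in people]

# The headline metric: the share of people whose status the model got right.
accuracy = sum(pred == actual for pred, actual in zip(predictions, truth)) / len(truth)
print(f"accuracy: {accuracy:.1%}")
```

A real model would be trained on many more variables, but the scoring step is exactly this: line up predictions against withheld reality and count the matches.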

The goal is to take the methodology and use it with other vaccines, such as flu and shingles, and even with vaccines that don’t yet exist, to better understand how a population will react to the rollout of a new vaccine. “Say we have a new, future pathogen,” Freese suggested, “and there’s a vaccine being developed against it. We’d want to know the underlying population sentiment about new vaccines and how that translates into uptake, both overall and in terms of timing. That can help inform public health interventions and where to send resources.”

Summing up, Dr. Freese concluded,

“We are hoping that this provides another level of information for vaccine providers, pharmacists included, in their daily work. It allows more community- and culturally-appropriate conversations and then, longer-term, helps to rebuild trust in public health where it’s been diminished, especially in those communities that have historical mistrust. How do we move the needle there? This is one tool that we could incorporate.”


For those of you interested in details of the work that comes before the modeling can begin…

Dirty data.

We’ve got to talk about it.

I wish it were “Dirty Dancing” or even “Dirty Deeds Done Dirt Cheap,” but hey, the details of data quality matter, and the Analytics experts are seeking ways to improve the immunization data most everyone reading this is involved with – either putting it into an IIS or analyzing what comes out.  They have been asking themselves how to make the data better, and thus more useful, and how AI might help.

Sawyer Koops, STChealth’s Director of Data Science, added, “You hear ‘model’ and that’s the fun part, the part everyone wants to jump to, but 80 or 85 percent of your time is going to be on the data. Everything starts with the data. The model is only as good as the data.”

So, say you have missing data in a vaccine report to an IIS: Is it possible to use the data around it to figure out the missing number?

The data team in West Virginia volunteered to work with the Analytics team at STChealth to test ways to improve data quality. West Virginia’s health department is dealing with the realities of its population demographics and health histories:

  • 40% of the population lives in a rural county 
  • 25% of the population is 65+
  • One-third live in a high SVI county (that’s “social vulnerability index,” a measure of how vulnerable a community is when faced with a “stress” to the system, such as a pandemic)
  • Estimated 15% received the 2021-22 flu season vaccine
  • Highest in the nation for obesity, heart attacks, COPD, & diabetes
    (2/3 have more than one condition)

Said another way, West Virginia wants to maximize its health resources and one way to do that would be to improve the accuracy of the data around vaccinations. For instance, 19% of its records were missing race and 39% were missing ethnicity.

The experiment was to compare three methods of improving the data…

  1. Current Method – manual chart review of historical records
  2. Multiple Imputation – estimating missing values from the non-missing data across a series of datasets
  3. Machine Learning – using publicly available models, informed by real-world data, to estimate missing values (similar to multiple imputation, but drawing on other populations’ data)

The goal was to determine which of the three did the best job of balancing personnel time, computational time, and extensibility/scalability.

Dr. Freese summarized the results:  

“Long story short, we found that of the three, the Multiple Imputation method worked the best. It was highly accurate, not computationally intensive, and in terms of personnel time, though it’s a big lift up front, it doesn’t require a lot of personnel time moving forward. That said, it’s a highly advanced technique, so it’s not something just anyone off the street can do.”

So, no suspense … we know the winner. But, hold on, what is “multiple imputation”?

Dr. Freese explained: “Say you want to impute – estimate – a missing value for, say, race, for a specific person in the dataset. If we have a dataset of three million people, and everything else we know about the person, we can run an analysis and estimate the missing data point. And if we run the analysis over and over, say 25 times, we get a more and more precise estimation of what that missing value likely is.”

Then, Dr. Freese explained, as you put the new imputed values into the dataset, “we can reduce bias caused by missing data – it’s biased because some groups have more missing data than others – and that allows us to do more detailed, nitty-gritty analyses.” 
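The repeat-the-analysis idea Dr. Freese describes can be sketched in miniature. This is a deliberately simplified, illustrative version (synthetic records, a hypothetical `county` variable as the only predictor, and the modal value as the pooled answer); real multiple imputation would model the missing value from many variables and carry all the imputed datasets through the downstream analysis rather than collapsing to one value.

```python
import random
from collections import Counter

random.seed(1)

# Illustrative dataset: most records have race recorded, about 20% are missing.
RACES = ["white", "black", "asian", "other"]
records = [
    {"county": random.choice(["A", "B"]),
     "race": random.choice(RACES) if random.random() > 0.2 else None}
    for _ in range(3000)
]

def impute_once(record, data):
    """One imputation draw: sample race from complete records that
    resemble this one (here, crudely, records in the same county)."""
    pool = [r["race"] for r in data
            if r["race"] is not None and r["county"] == record["county"]]
    return random.choice(pool)

def multiple_imputation(record, data, m=25):
    """Repeat the draw m times; pool the draws (mode, for a category)."""
    draws = [impute_once(record, data) for _ in range(m)]
    return Counter(draws).most_common(1)[0][0]

missing = next(r for r in records if r["race"] is None)
print("imputed race:", multiple_imputation(missing, records))
```

The repeated draws are what make the estimate stable: any single draw is noisy, but pooling 25 of them reflects how likely each value is given everything else known about the person.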

Better, cleaner data is what made the new prediction model possible: the discovery behind the discovery.



 Is Seeing Believing?

By Bill Davenhall, Geomedicine Analyst

Over the last several decades we have witnessed, across the entire United States, a great response by both government and the private sector to local, regional, and national natural disasters: wildfires, severe storms, unexpected temperature extremes, and most recently a global pandemic. Volunteers and resources flowed across community, county, state, and even national boundaries as if no barriers existed, and many of the partisan issues that divide our citizens melted away quickly, with seemingly good outcomes for most of the people in the geographies that needed immediate assistance. Then, much later, after the “emergency” and the urgent assistance subsided, most communities fell back into “normal order,” where “boundaries” – and in some cases even barriers – re-appeared. I like to call this the “emergency cooperation syndrome,” or ECS.

The map below grabbed my attention: how do we have 1,104 counties where childhood poverty rates changed so significantly from one five-year period to another? Here is what the US Census Bureau reported: “According to the new estimates, the national child poverty rate declined from 21% in the 2012-2016 period to 17% in the 2017-2021 period.” Only 117 counties (3.7% of all the nation’s counties with a poverty universe population of less than 10,000) had poverty rates of 40% or more, and the majority of those (81.2%) were in the South. Conversely, 302 counties (9.6% of all counties) had a child poverty rate below 8%, and nearly half of those (146) were in the Midwest.

While this reveals some great progress on reducing childhood poverty (rates declined in 1,017 counties), why were 43% of the counties with declining childhood poverty rates in the South?

Was this drop in childhood poverty rates the result of the early, but brief, reprieve during the first days and months of the pandemic? Was the drop in the most recent period (2017-2021) magnified by the increase in emergency preventive health resources, including helping many more children and their families become eligible for “insured” healthcare access? I wonder what the rates will look like in the next measurement period, 2021-2025.

I suspect that children who live in poverty today will likely remain in poverty over the next five years unless we have a reason for a more sustained response than the more obvious ones that “natural disasters” often receive. Perhaps localized childhood poverty conditions need to be viewed as possible impending “disasters” – something that better preparedness demands. It will be interesting to see what childhood poverty rates look like in the next assessment period report (due around 2027 from the Census Bureau). In the meantime, maybe begin to re-think childhood poverty as a potential local or regional disaster that can happen anywhere, but always with national implications: the future of all the children.

I always appreciate a 2nd opinion.