How and why to diversify healthcare AI training data

AI trained mostly on chest X-rays from men will perform poorly when a clinician applies it to female patients. An algorithm for diagnosing skin cancer from dermatologic photos will botch the job if the patient is dark-skinned and most of the training images came from fair-skinned patients. And so on.

Examples of this brand of bias are accumulating into an unignorable chink in healthcare AI’s armor.

Granted, gathering AI-suitable training data from widely diverse patient populations is difficult even when it’s doable. But if AI in healthcare is unable to help some of the most vulnerable patient demographics, how can it help the cause of improving America’s health at the population level?

Three Stanford MD/PhDs lay out the best thinking on the subject in an opinion piece published Nov. 17 in Scientific American.

“Bias in AI is a complex issue; simply providing diverse training data does not guarantee elimination of bias,” write Kaushal, Altman and Langlotz. “Several other concerns have been raised—for example, lack of diversity among developers and funders of AI tools; framing of problems from the perspective of majority groups; implicitly biased assumptions about data; and use of outputs of AI tools to perpetuate biases, either inadvertently or explicitly.”

They note that researchers are trying to get around the problem of insufficiently diverse data by building algorithms that can extrapolate more broadly from limited training inputs.

“From these innovations may emerge new ways to decrease AI’s need for huge data sets,” the authors write. “But for now, ensuring diversity of data used to train algorithms is central to our ability to understand and mitigate biases of AI.”

With that, they call for building the technical, regulatory, economic and privacy infrastructure needed to gather data that’s not only big but also diverse enough to train medical AI for the benefit of all patients everywhere.


Dave Pearson

Dave P. has worked in journalism, marketing and public relations for more than 30 years, frequently concentrating on hospitals, healthcare technology and Catholic communications. He has also specialized in fundraising communications, ghostwriting for CEOs of local, national and global charities, nonprofits and foundations.
