AI biopsy dilemma: Wolf or husky, equity or bias?
Early in the development of digital image recognition, the technology showed a penchant for taking logical but potentially problematic shortcuts: It would lean on image artifacts and incidental “asides” such as background features to distinguish between two visually similar subjects.
In one now-classic case, an algorithm used the presence or absence of snow to tell wild wolves from domestic huskies.
The training dataset pictured wolves in snow far more often than huskies, so the algorithm learned to label any canine photographed against the frozen white stuff as a wolf.
The AI therefore stood to give incorrect outputs whenever the background, rather than the animal, drove the call.
Now researchers at the University of Chicago have found a similar phenomenon at work in deep-learning interpretations of cancer-biopsy slides bearing clues on tissue features and genetic predispositions.
Reviewing the performance of algorithms trained on data from the massive Cancer Genome Atlas, senior study author Alexander Pearson, MD, PhD, and colleagues found that digitized biopsy slides inadvertently give away the identity of the provider organization that prepared them.
And they do so not with image data per se but, rather, with image artifacts. These might include the color or amount of stain on the slide, the tissue-processing technique behind the preparation, and/or telltale markers of the equipment used to capture and digitize the image.
This would set up the AI to give biased outputs based on the patient population served by that organization, the authors point out. Algorithms trained on slides prepared at hospitals serving affluent populations, for example, would tend to predict overoptimistic outcomes when applied to patients from poorer communities.
And the outputs would include clinical predictions of such consequential cancer outcomes as five-year survival, genomic mutations and tumor stage.
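To make that “site signature” concrete, here is a minimal, hypothetical sketch, not drawn from the study itself, of how even crude per-slide color statistics, a stand-in for staining and scanner differences, could let an ordinary classifier guess which institution submitted a slide. The thumbnails, site labels and scikit-learn pipeline are all illustrative assumptions.

```python
# Hypothetical illustration (not the study's code): if a simple classifier can
# predict the submitting site from per-slide color statistics alone, the "site
# signature" is present before any tumor biology is ever modeled.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def color_stats(thumbnail: np.ndarray) -> np.ndarray:
    """Mean and standard deviation of each RGB channel for one slide thumbnail."""
    pixels = thumbnail.reshape(-1, 3).astype(float)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def site_leakage_score(thumbnails, site_labels) -> float:
    """Cross-validated accuracy of guessing the submitting site from color alone.

    Accuracy well above chance suggests staining/scanner artifacts leak site
    identity. `thumbnails` is a list of HxWx3 RGB arrays and `site_labels` names
    the submitting institution for each slide (both assumed loaded elsewhere).
    """
    X = np.stack([color_stats(t) for t in thumbnails])
    y = np.asarray(site_labels)
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
```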
“Care should be taken to describe the distribution of outcomes of interest across sites, and if significant, a submitting site should be isolated to either the cohort used for training or for testing a model,” Pearson and colleagues warn.
Further, they note, ethnicity can be inferred from the revealed site-specific signatures. This aspect “must be accounted for to ensure equitable application of deep learning.”
Nature Communications published the study July 20. The lead author is Frederick Howard, MD.
In internal news coverage by UChicago Medicine, Pearson puts a finer point on the problem as revealed by the new research.
Cancer diagnosticians tap AI to help “find a signal to differentiate between images, and it does so lazily by identifying the site,” he says. “We actually want to understand what biology within a tumor is more likely to predispose resistance to treatment or early metastatic disease, so we have to disentangle that site-specific digital histology signature from the true biological signal.”
UChicago Medicine says it’s crucial to carefully consider training data to head off algorithms prone to taking site-bound shortcuts of the husky-vs.-wolf kind.
“Developers can make sure that different disease outcomes are distributed evenly across all sites used in the training data,” the institution offers, “or by isolating a certain site while training or testing the model when the distribution of outcomes is unequal.”
To this Pearson adds:
“The promise of artificial intelligence is the ability to bring accurate and rapid precision health to more people. In order to meet the needs of the disenfranchised members of our society, however, we have to be able to develop algorithms which are competent and make relevant predictions for everyone.”
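For teams building such models, the site-isolation fix the study and UChicago Medicine describe can be sketched in a few lines. The example below is a minimal illustration under assumed inputs (a count of slides and a parallel list of submitting sites); it uses scikit-learn’s GroupShuffleSplit to keep every slide from a given site on one side of the split, so a model cannot pass its test simply by recognizing the site signature.

```python
# Minimal sketch of "isolate a submitting site to either training or testing."
# GroupShuffleSplit treats each site as a group, so no site's slides end up in
# both the training and the test cohorts.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def site_preserving_split(n_slides: int, sites, test_size: float = 0.2, seed: int = 0):
    """Return (train_idx, test_idx) such that no submitting site appears in both."""
    sites = np.asarray(sites)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(np.zeros((n_slides, 1)), groups=sites))
    # Sanity check: the two cohorts draw on disjoint sets of sites.
    assert set(sites[train_idx]).isdisjoint(sites[test_idx])
    return train_idx, test_idx
```

Splitting by site rather than by individual slide is what forces a model to earn its accuracy from tissue biology instead of preparation artifacts.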
The UChicago coverage is posted here, and the study is available in full for free.