Advanced AI models improve data extraction from free-text pathology reports

Researchers have developed two AI-powered tools for automatically extracting key information from free-text pathology reports. The team, from the government-funded Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee, shared its findings in the Journal of the American Medical Informatics Association.

The ORNL is one of the U.S. Department of Energy’s most important research laboratories, and its scientists use advanced technologies such as AI and natural language processing to look for new ways to improve patient outcomes.

“Population-level cancer surveillance is critical for monitoring the effectiveness of public health initiatives aimed at preventing, detecting, and treating cancer,” corresponding author Gina Tourassi, director of the Health Data Sciences Institute and the National Center for Computational Sciences at the ORNL, said in a prepared statement. “Collaborating with the National Cancer Institute, my team is developing advanced AI solutions to modernize the national cancer surveillance program by automating the time-consuming data capture effort and providing near real-time cancer reporting.”

For this study, lead author Mohammed Alawad and colleagues trained multitask convolutional neural networks (MTCNNs) to extract cancer-related data from free-text pathology reports. The two MTCNNs, one “hard parameter sharing” model and one “cross-stitch” model, each performed five separate extraction tasks. Their performance was compared with that of single-task CNNs and a selection of other machine learning techniques.
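For readers curious what “hard parameter sharing” means in practice, the idea is that all tasks read the same learned features and differ only in a small final classifier per task. The Python sketch below is a toy illustration of that idea, not the ORNL team’s implementation: the layer sizes, the random weights, and the task names in `TASKS` are all hypothetical placeholders, since the article does not list the five tasks or the model’s dimensions. The cross-stitch variant, which instead keeps a separate network per task and learns how to mix their activations, is not shown.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4    # size of each word vector (illustrative)
NUM_FILTERS = 3  # number of shared convolution filters (illustrative)
KERNEL = 2       # convolution window, in tokens (illustrative)

# Hypothetical task names and class counts; the article does not list them.
TASKS = {"task_a": 3, "task_b": 2, "task_c": 4, "task_d": 5, "task_e": 2}


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Shared parameters: one filter bank used by every task.
shared_filters = rand_matrix(NUM_FILTERS, KERNEL * EMBED_DIM)
# Task-specific parameters: one small linear classifier per task.
heads = {name: rand_matrix(n_classes, NUM_FILTERS) for name, n_classes in TASKS.items()}


def shared_encoder(tokens):
    """1-D convolution, ReLU, then max-over-time pooling over token vectors."""
    pooled = [0.0] * NUM_FILTERS  # ReLU output is >= 0, so 0 is a safe floor
    for start in range(len(tokens) - KERNEL + 1):
        window = [x for tok in tokens[start:start + KERNEL] for x in tok]
        for f, weights in enumerate(shared_filters):
            activation = max(0.0, sum(w * x for w, x in zip(weights, window)))
            pooled[f] = max(pooled[f], activation)
    return pooled


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens):
    """Return one probability distribution per task from a single shared pass."""
    features = shared_encoder(tokens)  # computed once, reused by all heads
    return {
        name: softmax([sum(w * f for w, f in zip(row, features)) for row in head])
        for name, head in heads.items()
    }


# A fake "report" of six tokens, each already embedded as a vector.
report = rand_matrix(6, EMBED_DIM)
predictions = predict(report)
for name, probs in predictions.items():
    print(name, [round(p, 3) for p in probs])
```

Because the encoder runs once per report and every head reuses its output, adding a task costs only one extra small classifier. In a trained model, the errors from all five tasks would update the shared filters together.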

Overall, the MTCNNs outperformed all other AI models. In a retrospective analysis, the hard parameter sharing model (59.04%) and the cross-stitch model (57.93%) correctly classified a higher percentage of pathology reports than the other models, whose results ranged from 36.75% to 53.68%. In a prospective analysis, the two MTCNNs again outperformed the other models, at 60.11% for the hard parameter sharing model and 58.13% for the cross-stitch model.

So what’s next for these researchers?

“The next step is to launch a large-scale user study where the technology will be deployed across cancer registries to identify the most effective ways of integration in the registries’ workflows,” Tourassi said in the same ORNL statement. “The goal is not to replace the human but rather augment the human.”

Michael Walter, Managing Editor

Michael has more than 18 years of experience as a professional writer and editor. He has written at length about cardiology, radiology, artificial intelligence and other key healthcare topics.
