JAMIA: Making EHR data extraction more exact is possible

The medical information contained within EHRs is undoubtedly valuable to many of healthcare’s stakeholders, but its presentation as unstructured text often makes locating necessary information difficult, according to a study published July 6 in the Journal of the American Medical Informatics Association.

Using a novel method to detect named entities within EHRs, however, researchers demonstrated that more effective information extraction systems can be built at low cost.

Most named entity recognition methods for collecting information from datasets use conditional random field (CRF) recognizers, which rely heavily on local contexts surrounding named entities and assume that similar local contexts lead to the same judgements, according to the study’s author, Eric I-Chao Chang, MD, of Microsoft Research Asia. The problem with data extraction methods relying solely on CRF recognizers is that they often deliver contradictory information.

Chang tested multiple extraction systems on a dataset of 20,000 radiology reports, gauging each system's ability to parse records and deliver relevant patient follow-up information through an EHR system. The results showed that pairing a labeled sequential pattern (LSP) classifier with a CRF recognizer was an effective method for filtering out irrelevant information.

The radiology reports contained a total of 121,748 sentences, of which only 3,997 contained follow-up information. A data extraction method using both an LSP classifier and a CRF recognizer reduced inexact matching by 6 percent compared with a method using a CRF recognizer alone.
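To put those sentence counts in perspective, the short calculation below works out how rare follow-up sentences are in the dataset. It is only illustrative arithmetic on the two figures reported above; the variable names are made up for this sketch and do not come from the study.

```python
# Sentence counts as reported in the article.
total_sentences = 121_748
follow_up_sentences = 3_997

# Everything else is a "negative" example with no follow-up information.
negative_sentences = total_sentences - follow_up_sentences

# Share of sentences that carry follow-up information.
positive_share = follow_up_sentences / total_sentences

# How many negative sentences exist for each positive one.
negatives_per_positive = negative_sentences / follow_up_sentences

print(f"Negative sentences: {negative_sentences:,}")                  # 117,751
print(f"Follow-up share: {positive_share:.1%}")                       # 3.3%
print(f"Negatives per follow-up sentence: {negatives_per_positive:.1f}")  # 29.5
```

Roughly 29 of every 30 sentences are irrelevant to follow-up, which helps explain why removing negatives before recognition could matter.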

“In our method, LSP captures global patterns to choose candidate sentences before CRF identifies NEs [named entities] or relevant phrases,” Chang wrote. “The experiment shows that filtering out a large number of negative examples from the training set by an LSP classifier can significantly improve the performance of a CRF recognizer.”
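The filter-then-recognize arrangement Chang describes can be sketched roughly as follows. This is a minimal illustration of the two-stage shape only, not the study's implementation: hand-written regular expressions stand in for the learned LSP classifier, a simple time-expression matcher stands in for the CRF recognizer, and every name, pattern, and sample sentence here is hypothetical.

```python
import re
from typing import List, Tuple

# Stage 1 stand-in: hypothetical cue patterns in place of a learned
# LSP classifier. They decide which sentences are candidates at all.
FOLLOW_UP_CUES = [
    re.compile(r"\bfollow[- ]up\b", re.IGNORECASE),
    re.compile(r"\brecommend(?:ed|s)?\b", re.IGNORECASE),
]

# Stage 2 stand-in: a time-expression matcher in place of a CRF
# recognizer. It relies only on local context within the sentence.
TIME_EXPR = re.compile(
    r"\b(?:in\s+)?\d+\s+(?:day|week|month|year)s?\b", re.IGNORECASE
)


def is_candidate(sentence: str) -> bool:
    """Return True if any cue pattern appears in the sentence."""
    return any(p.search(sentence) for p in FOLLOW_UP_CUES)


def tag_phrases(sentence: str) -> List[str]:
    """Return the time expressions found in the sentence."""
    return [m.group(0) for m in TIME_EXPR.finditer(sentence)]


def extract(sentences: List[str]) -> List[Tuple[str, List[str]]]:
    """Filter sentences first, then tag only the surviving candidates."""
    results = []
    for sentence in sentences:
        if not is_candidate(sentence):
            continue  # negatives never reach the recognizer
        results.append((sentence, tag_phrases(sentence)))
    return results


report = [
    "No acute abnormality.",
    "Recommend follow-up CT in 6 months.",
    "Comparison made with study from 3 months ago.",
]
for sentence, phrases in extract(report):
    print(sentence, phrases)
```

In this toy report, the last sentence contains a time expression that the second stage alone would tag, but the first stage drops it because it carries no follow-up cue. That is the kind of locally plausible but irrelevant match the filtering step is meant to remove.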
