JAMIA: Making EHR data extraction more exact is possible
Using a novel method to detect named entities within EHRs, researchers demonstrated that more effective information extraction systems can be built at low cost.
Most named entity recognition methods for collecting information from datasets use conditional random field (CRF) recognizers, which rely heavily on local contexts surrounding named entities and assume that similar local contexts lead to the same judgements, according to the study’s author, Eric I-Chao Chang, MD, of Microsoft Research Asia. The problem with data extraction methods relying solely on CRF recognizers is that they often deliver contradictory information.
Using 20,000 radiology reports as a dataset, Chang tested multiple extraction systems’ ability to parse records and deliver relevant patient follow-up information through an EHR system. Pairing a labeled sequential pattern (LSP) classifier with a CRF recognizer, Chang determined, was an effective method for filtering out irrelevant information.
The radiology reports contained a total of 121,748 sentences, of which only 3,997 contained follow-up information. A data extraction method using both an LSP classifier and a CRF recognizer reduced inexact matching by 6 percent compared to a method using a CRF recognizer alone.
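To put the sentence counts in perspective, the short calculation below works out how rare follow-up sentences are in the dataset. Only the two counts come from the study as reported here; the variable names are illustrative.

```python
# Sentence counts as reported for the 20,000 radiology reports.
total_sentences = 121_748
follow_up_sentences = 3_997

# Sentences with no follow-up information (the "negative examples").
negative_sentences = total_sentences - follow_up_sentences

# Share of sentences that actually carry follow-up information.
follow_up_share = follow_up_sentences / total_sentences

# How many negative sentences exist for each positive one.
negatives_per_positive = negative_sentences / follow_up_sentences

print(f"negative sentences: {negative_sentences:,}")
print(f"follow-up share: {follow_up_share:.1%}")
print(f"negatives per positive: {negatives_per_positive:.1f}")
```

Roughly one sentence in thirty is relevant, which suggests why discarding obvious negatives before recognition could matter.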
“In our method, LSP captures global patterns to choose candidate sentences before CRF identifies NEs [named entities] or relevant phrases,” Chang wrote. “The experiment shows that filtering out a large number of negative examples from the training set by an LSP classifier can significantly improve the performance of a CRF recognizer.”
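The two-stage idea Chang describes, filtering sentences first and recognizing entities second, can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: the patterns, the `toy_tagger` function, and the `TIME` label are all hypothetical, and a pattern is simplified here to an ordered token sequence that may match with gaps. In a real system the patterns would be learned from labeled data and the tagger would be a trained CRF recognizer.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical hand-written patterns standing in for learned LSPs.
# Each pattern is an ordered token sequence; gaps between tokens are allowed.
FOLLOW_UP_PATTERNS: List[Tuple[str, ...]] = [
    ("recommend", "follow-up"),
    ("follow-up", "in", "months"),
    ("repeat", "ct"),
]

PUNCTUATION = ".,;:()"


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (tok.strip(PUNCTUATION).lower() for tok in sentence.split())
    return [tok for tok in stripped if tok]


def matches_in_order(tokens: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if every pattern item appears in tokens, in the same order."""
    remaining = iter(tokens)
    # Each `in` test consumes the iterator up to and including the match,
    # so later items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def is_candidate(sentence: str) -> bool:
    """Stage 1: keep a sentence only if some pattern matches it."""
    tokens = tokenize(sentence)
    return any(matches_in_order(tokens, p) for p in FOLLOW_UP_PATTERNS)


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    """Placeholder for a trained CRF recognizer (assumption, not a real CRF)."""
    time_words = {"months", "weeks", "days"}
    return ["TIME" if t in time_words or t.isdigit() else "O" for t in tokens]


def extract(
    sentences: Iterable[str],
    tagger: Callable[[Sequence[str]], List[str]] = toy_tagger,
) -> List[Tuple[str, List[str]]]:
    """Stage 2: run the tagger only on sentences that passed the filter."""
    return [(s, tagger(tokenize(s))) for s in sentences if is_candidate(s)]


report = [
    "The lungs are clear.",
    "Recommend follow-up CT in 6 months.",
]
results = extract(report)
for sentence, tags in results:
    print(sentence, tags)
```

In this sketch the first sentence never reaches the tagger, which mirrors the article's point that the filter removes negative examples before the recognizer sees them.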