JAMIA: SAS-based natural language processing tools show potential for cancer research

Natural language processing (NLP) applications could become a powerful aid to clinical researchers, according to a study published July 21 in the Journal of the American Medical Informatics Association. At Kaiser Permanente Southern California (KPSC), researchers built a SAS-based NLP tool that successfully identified primary and recurrent cancer diagnoses in data from the healthcare system’s EHRs.

Because there is a substantial delay between when cancer is diagnosed and when the diagnosis is captured by cancer registries, cancer researchers are currently forced to rely on chart review and medical claims data to identify primary and recurrent cancers. The proliferation of EHRs, however, creates the potential for timely and complete identification of cancers for clinical research.

To put that potential to the test, KPSC researchers built a SAS-based coding, extraction and nomenclature tool (SCENT) to identify cancer diagnoses in the electronic pathology reports of 400 breast cancer and 400 prostate cancer patients treated by the integrated health network between 2000 and 2007. The study included 915 pathology reports, all of which were also reviewed manually by trained abstractors. SCENT recognized 51 of 54 new primary cancers and 60 of 61 recurrent cancers, and produced only three false positives among 792 benign cases.
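Taking the counts reported above at face value, the implied detection rates can be worked out directly. This is a back-of-the-envelope check, not the paper's own statistics; in particular, using the 792 benign cases as the denominator for specificity is an assumption about how that figure was framed.

```python
# Detection rates implied by the counts reported in the article.
# Assumption: specificity uses the 792 benign cases as its denominator.

def proportion(hits: int, total: int) -> float:
    """Return hits / total as a fraction between 0 and 1."""
    return hits / total

primary_sensitivity = proportion(51, 54)      # new primary cancers detected
recurrent_sensitivity = proportion(60, 61)    # recurrent cancers detected
specificity = proportion(792 - 3, 792)        # benign cases not flagged (3 false positives)

for label, value in [
    ("primary sensitivity", primary_sensitivity),
    ("recurrent sensitivity", recurrent_sensitivity),
    ("specificity", specificity),
]:
    print(f"{label}: {value:.1%}")
```

Run as written, this prints roughly 94.4%, 98.4% and 99.6%, respectively.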

SCENT applied a set of hierarchical classification rules to the processed electronic text, using a dictionary of approximately 1,000 terms to identify cancer-related clinical concepts and report them in SNOMED format.
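The article describes SCENT only at a high level. As a rough illustration of the general idea (dictionary-based concept matching followed by ordered, hierarchical rules), here is a minimal Python sketch. The terms, concept labels, negation cues and rule order are all hypothetical; it does not emit SNOMED codes and is not SCENT's SAS implementation.

```python
import re

# Hypothetical miniature dictionary (term -> concept label).
# SCENT's real dictionary of ~1,000 terms and its SNOMED output are not reproduced here.
TERM_DICTIONARY = {
    "invasive ductal carcinoma": "MALIGNANT",
    "adenocarcinoma": "MALIGNANT",
    "carcinoma": "MALIGNANT",
    "recurrent": "RECURRENCE",
    "recurrence": "RECURRENCE",
    "benign": "BENIGN",
}

# Hypothetical negation cues: a term preceded by one of these is ignored.
NEGATION_CUES = ("negative for ", "no evidence of ", "without ")


def find_concepts(text: str) -> set[str]:
    """Return the concept labels of all non-negated dictionary terms in the text."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    found = set()
    for term, concept in TERM_DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", normalized):
            # Look at a short window of text just before the match for a negation cue.
            window = normalized[max(0, match.start() - 25):match.start()]
            if any(cue in window for cue in NEGATION_CUES):
                continue
            found.add(concept)
    return found


def classify(text: str) -> str:
    """Apply hypothetical rules in priority order; the first rule that fires wins."""
    concepts = find_concepts(text)
    if "MALIGNANT" in concepts and "RECURRENCE" in concepts:
        return "RECURRENT_CANCER"
    if "MALIGNANT" in concepts:
        return "PRIMARY_CANCER"
    return "BENIGN"


for report in [
    "Invasive ductal carcinoma, grade 2.",
    "Recurrent adenocarcinoma of the prostate.",
    "Benign prostatic tissue, negative for carcinoma.",
]:
    print(report, "->", classify(report))
```

Ordering the rules so that recurrence is checked before a plain malignancy finding is one simple way to express a hierarchy: a report that mentions both is assigned the more specific class rather than being counted twice.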

Based on their findings, researchers led by Justin A. Strauss, MA, research associate III at KPSC, believe that SAS-based NLP tools could be easily implemented in most clinical settings for the purpose of analyzing electronic text.

“The widespread adoption of SAS in clinical analysis and research settings ensures that SCENT is highly accessible,” Strauss et al wrote. “Integration of SAS with relevant data systems has already been established in these settings, allowing electronic text to be readily extracted for analysis.”

“This functionality has the potential to provide significant value to clinical and epidemiological researchers, particularly when statistical NLP is infeasible due to resource or other constraints,” they concluded. “SCENT is proof of concept for SAS-based NLP applications that can be easily shared between institutions to support clinical and epidemiologic research.”

Strauss conceptualized and developed the NLP system described in this paper, led the validation study and drafted the manuscript, while his colleague, Virginia P. Quinn, PhD, a KPSC research scientist, provided breast cancer research expertise, assisted with data acquisition, contributed to validation study design, interpreted results and had input into the manuscript.

