The Dawn of Data Mining

As EHR adoption gains ground nationwide, healthcare professionals are just beginning to explore the newly available electronic health data at their disposal.

Data mining is the art and science of intelligent data analysis, which aims to discover meaningful insights and knowledge from vast pools of data. It can also entail building models that predict readmissions, disease risk or medication efficacy for better clinical decision support, as well as yield insights into quality performance.

While other industries have capitalized on data mining for years, healthcare is just beginning to catch on.

“It’s definitely growing in the medical field,” says John Elder of Elder Research, a Charlottesville, Va.-based consulting company that has specialized in data mining and analytics for 17 years. For him, the interest makes sense. “It brings cutting-edge technology into the front lines of a practice to achieve a high return on investment.”

Resource Challenges

For the most part, it’s the larger integrated systems that have the staff and resources to engage in data mining, says Paul Kleeberg, MD, CMIO for Stratis Health and clinical director for the Regional Extension Assistance Center for Health IT (REACH), which covers Minnesota and North Dakota. These integrated systems build data warehouses to mine for data on quality performance, high utilizers and disease indicators.

Smaller hospitals and physician practices for the most part are still grappling with EHR adoption and the beginning stages of Meaningful Use. They are mostly interested in data analytics for performance measures but few have progressed far, he says.

A small hospital or clinic stores only a limited amount of electronic data, but when organizations link their data together, greater insight becomes possible.

Kleeberg cites a group of federally qualified health centers in northwestern Minnesota that formed a safety-net ACO that utilized simplified data mining techniques to track details of their patient populations. They invested resources into customized tools to extract EHR data from multiple clinical sites and make them publicly available. Through it, providers are able to run reports on the number of patients with previous cardiovascular incidents who are reaching goals for hemoglobin, low-density lipoprotein and blood pressure—and conduct outreach for those who are not adequately managing their conditions.

“Each provider could see how they were doing in comparison to each other and it winds up improving the quality of care and ensuring patients don’t fall through the cracks,” says Kleeberg.
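The reporting itself can be fairly simple once the EHR extracts sit in one place. Below is a minimal sketch in Python with pandas of the kind of quality report and outreach list described above; the table layout, column names and patients are hypothetical, not the ACO's actual tooling.

```python
# A minimal sketch (not the ACO's actual tooling) of a per-clinic quality
# report and outreach list, assuming EHR extracts have already been pulled
# into a flat table with hypothetical column names.
import pandas as pd

# Hypothetical extract: one row per patient across the participating clinics.
patients = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "clinic": ["Clinic A", "Clinic B", "Clinic A"],
    "prior_cv_event": [True, True, False],
    "hemoglobin_at_goal": [True, False, True],
    "ldl_at_goal": [True, False, True],
    "bp_at_goal": [False, True, True],
})

# Restrict to patients with a previous cardiovascular incident.
cv = patients[patients["prior_cv_event"]].copy()
cv["all_goals_met"] = cv[["hemoglobin_at_goal", "ldl_at_goal", "bp_at_goal"]].all(axis=1)

# Per-clinic performance, comparable across providers.
report = cv.groupby("clinic")["all_goals_met"].mean().rename("share_at_goal")
print(report)

# Outreach list: patients not adequately managing their conditions.
outreach = cv.loc[~cv["all_goals_met"], ["patient_id", "clinic"]]
print(outreach)
```

The per-clinic grouping is what lets each provider see how it is doing relative to the others, which is the comparison Kleeberg describes.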

Open source data mining tools like Rattle, a graphical front end to R, are free and “extremely powerful,” but the workforce is inadequate to keep up with the steep learning curve, says Ryan Sandefeld, REACH consultant and chair and assistant professor of health informatics and information management at The College of St. Scholastica in Duluth, Minn. “There is a big gap of doing sophisticated analysis between smaller hospitals and larger systems.”
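Rattle's point-and-click workflow, loading a dataset, partitioning it, fitting a model and evaluating it, has free equivalents outside R as well. The following is a purely illustrative sketch in Python with scikit-learn on synthetic data, not a recommendation of any particular tool.

```python
# Illustrative only: a Rattle-style workflow (fit a model, evaluate it)
# reproduced with free Python tooling on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a de-identified clinical extract.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Fit a random forest and estimate performance with 5-fold cross-validation.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Mean AUC:", scores.mean().round(3))
```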

But new models to centralize IT operations for multiple hospitals may mean these tools will get more play. For example, he cited SISU, an organization that handles all IT projects for 17 hospitals in Duluth and “allows the costs and the expertise to be spread out.”

This unique model is just beginning to open the door to shared analytics, Sandefeld says.

A Learning Healthcare System

“In reality, data mining is going to make clinical trials obsolete as we move forward to a learning healthcare system,” says Kleeberg.

The Institute of Medicine popularized the term “learning healthcare system” and issued a call for healthcare organizations to be part of a system in which patient care is integrated with medical research, so that the healthcare practices offered in the system are continuously studied and improved.

Indeed, a few regional extension centers are joined with academic institutions with an eye toward data mining to advance research.

One such case is the Chicago Health Information Technology Regional Extension Center (CHITREC), which is operated out of Northwestern University and seeks to use EHR data in clinical investigations.

Through a grant, the center is building a network of EHR-enabled providers throughout the region. “With their consent, we are finding ways to pull data together in a responsible way to be able to conduct research studies, aggregate data for public health purposes and quality improvement work,” says Abel N. Kho, MD, co-executive director of CHITREC and assistant professor at Northwestern.

This effort is running parallel to a push to connect small practices into the Illinois health information exchange, which is under development and in its current form is limited to secure messaging, he says.

Also, Northwestern and CHITREC are collaborating with seven other Chicago-based institutions to build a mechanism to pool data across multiple institutions in an encrypted, secure manner. With funding from a $7 million Patient-Centered Outcomes Research Institute grant that began this year, the eMERGE project pulls data from EHRs that help researchers identify cohorts for genetic studies.

The current dataset has linked more than 5 million de-identified records, including individuals’ clinical data, diagnoses, medications, laboratory tests and vital signs. Over the next year, bioinformatics specialists will work with the software to expand coverage to include patient records from all participating institutions. They’ll also look to streamline the task of inputting patient-reported outcomes and merging these with EHR-derived data, he says.
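The merging step he describes is conceptually straightforward, even though the real pipeline is encrypted and operates at far larger scale. A minimal sketch follows, with hypothetical field names and a made-up de-identified linkage key.

```python
# A minimal sketch of merging patient-reported outcomes onto EHR-derived
# data by a shared de-identified key; field names and values are invented,
# and the real eMERGE/PCORI pipeline is more involved and encrypted.
import pandas as pd

ehr = pd.DataFrame({
    "linked_id": ["a1", "a2", "a3"],          # de-identified linkage key
    "diagnosis": ["E11.9", "I10", "E66.9"],
    "systolic_bp": [142, 128, 135],
})

patient_reported = pd.DataFrame({
    "linked_id": ["a1", "a3"],
    "self_reported_pain_score": [4, 7],
})

# Left-join patient-reported outcomes onto the EHR-derived records.
combined = ehr.merge(patient_reported, on="linked_id", how="left")
print(combined)
```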

The aggregation of so many EHRs also provides a database for mapping of diseases throughout Chicago.

Kho says this work is modeled after the New York City Department of Health and Mental Hygiene’s Macroscope project, an up-and-coming population health surveillance system that will use EHRs to track conditions managed by primary care practices. The goal is real-time monitoring of chronic conditions, as well as smoking rates and vaccine uptake.

These insights are meaningful not only for research, but also at the point of care.

Kho says the system can be designed so that when a person sees his or her provider, it sends a request to a separate dataset for the patient’s home address and zip code, then mines these data to show the block-by-block distribution of diseases like diabetes, heart disease and obesity. The system also can produce a list of community resources to help the patient self-manage based on his or her risks, he says.
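A minimal sketch of that point-of-care lookup appears below; the zip codes, prevalence figures, resource lists and function name are all invented to illustrate the shape of the idea, not Kho's actual system.

```python
# Hypothetical point-of-care lookup: resolve a patient's area, show local
# disease burden for the patient's risks, and list nearby self-management
# resources. All data and names here are invented for illustration.
block_prevalence = {
    "60611": {"diabetes": 0.11, "heart_disease": 0.08, "obesity": 0.31},
    "60614": {"diabetes": 0.07, "heart_disease": 0.05, "obesity": 0.22},
}
community_resources = {
    "60611": ["Diabetes education class, Northside Clinic", "Walking group, Lake Park"],
    "60614": ["Nutrition counseling, Lincoln Health Center"],
}

def point_of_care_summary(zip_code: str, patient_risks: list[str]) -> dict:
    """Return local prevalence for the patient's risks plus nearby resources."""
    prevalence = block_prevalence.get(zip_code, {})
    return {
        "local_prevalence": {risk: prevalence.get(risk) for risk in patient_risks},
        "resources": community_resources.get(zip_code, []),
    }

print(point_of_care_summary("60611", ["diabetes", "obesity"]))
```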

The project, thus, brings big data to the individual patient level, he says. “People are excited to see where we’ll go.”

More Robust CDS

Applications of data mining are most exciting in building robust clinical decision support that alerts a physician at the point of care to possible diagnoses he or she may otherwise overlook, says Elder. The success of IBM’s Watson program, in particular, is helping usher in a new era in which physicians have the latest research available to inform their practice.

Having a computer that deals with diagnostics allows for the surfacing of low-probability events that physicians may not have experience with, he says.
“It happens too often—the false dismissal of an actual factor is a big mistake. The computer will put options in front of a doctor and help keep the alternative information alive long enough if it comes from evidence-based research,” he says. “The time is coming.”
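A toy illustration of why keeping low-probability alternatives alive matters: even a condition with a 1-in-100 prior can land near a 10 percent posterior once a suggestive finding is observed. The numbers below are invented and have nothing to do with Watson or any vendor's method.

```python
# Toy Bayes' rule example (invented numbers, not any vendor's method):
# a rare condition can still warrant follow-up after a suggestive finding.
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive finding) via Bayes' rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# A 1-in-100 condition with a fairly specific finding ends up near 10%,
# high enough that dismissing it outright would be the "false dismissal"
# Elder warns about.
print(round(posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.08), 3))  # ~0.102
```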

Also, Elder predicts that the mining of unstructured data will produce some of the most important insights in healthcare. In his experience, the most successful data mining projects analyze unstructured data. For example, his firm used unstructured data to successfully predict fraud in the area of Social Security. For hospitals willing to mine physician notes and other unstructured data, it’s harder work “but you really get rewards,” he says.
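Mining unstructured notes can be sketched in a few lines: turn free text into features and fit a simple classifier. The notes, labels and flagging task below are invented, and Elder's fraud work used different data and methods.

```python
# A minimal text-mining sketch on invented physician notes: TF-IDF features
# plus a simple classifier. Labels (1 = flagged for cardiology follow-up)
# are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "patient reports chest pain radiating to left arm",
    "routine follow-up, no complaints, vitals stable",
    "shortness of breath on exertion, history of smoking",
    "annual physical, labs within normal limits",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, labels)
print(model.predict(["new onset chest pain and shortness of breath"]))
```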

Privacy and Security

One major roadblock to widespread data mining is that legal policies have not kept pace with the latest tools.

Data are more fluid and we can control the flow, says Trisha Harkness, a practice consultant for the Kansas Foundation for Medical Care, a health IT regional extension center.

But, researchers and state health organizations are grappling with how to navigate patient privacy and ethical considerations surrounding secondary data access.

Many hospitals and physicians across Kansas have been sending data into a state HIE for more than 18 months, but these data cannot be mined until policies are in place to ensure the privacy of patient information and determine who will have access. Harkness is looking to work with local health departments to develop population health reports utilizing these data—but everything is on hold until these legal issues get worked out.

The situation in Kansas is playing out across the country. The federal government essentially has left it up to states to develop their own data use policies when it comes to HIEs.

Kansas has developed a new advisory council for the HIE to start to work on policies so the treasure trove of data is available for research, population health and other purposes.

Proceed With Caution

The ability to unearth unknown relationships and insights from large quantities of data is exciting, but hurdles remain, says Kleeberg.

For data mining to advance, payment systems must move away from fee-for-service and toward models that align with and support quality improvement and efficiency of care.
The largest danger may be jumping to conclusions.

“The biggest caution that we need to understand is that correlation does not prove causation,” says Kleeberg. While researchers “go on a fishing expedition, looking for significance in all data,” statistics dictates that a certain percentage of seemingly significant relationships are in reality just random.
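A quick simulation makes the point concrete: test enough purely random relationships against an outcome and a predictable fraction will look significant by chance. The dataset here is synthetic.

```python
# Simulation of Kleeberg's caution: with 1,000 unrelated variables tested
# at p < 0.05, roughly 50 will look "significant" purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_patients, n_variables = 200, 1000
outcome = rng.normal(size=n_patients)

false_positives = 0
for _ in range(n_variables):
    noise_variable = rng.normal(size=n_patients)   # unrelated to the outcome
    _, p_value = stats.pearsonr(noise_variable, outcome)
    false_positives += p_value < 0.05

print(false_positives)  # expected to be near 50
```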

While mining may yield many nuggets, some can be fools’ gold. The true insights that will be gleaned are just beginning.

“There is all of this hype about Meaningful Use Stages 1 and 2, and we’re moving closer and closer to universal adoption. Once we have all the systems implemented, it’s really the beginning, it’s not the end,” says Sandefeld. “In the next decade we’ll have a 50 percent increase in data. Hopefully, the workforce for analytics can keep pace.”

Data Mining Cancer

Data mining applications are poised to tackle one of the most complex and personalized diseases out there: cancer.

Earlier this year, the Department of Defense sought researchers from universities, government and private industry for a $45 million data mining project to research cancer through Big Mechanism, a “causal, explanatory model of [a] complicated system in which interactions have important causal effects,” according to a solicitation from the Defense Advanced Research Projects Agency (DARPA).

The DARPA solicitation seeks prospective partners to build the mechanism and weave data mining into current research, with the ultimate goal of identifying targets for cancer treatment therapy based on their findings in the data.

These efforts follow the release of National Cancer Institute (NCI) tools that allow researchers to compare data from collections of genomic information against thousands of drugs to find the best cancer treatments. NCI’s tool, called CellMiner, was built for use with NCI-60, the institute's massive collection of cancer cell samples used to test potential anti-cancer drugs. It provides access to the 22,379 genes catalogued in the NCI-60 and to 20,503 previously analyzed chemical compounds, including 102 FDA-approved drugs.
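CellMiner itself is a web-based tool, but the underlying idea, correlating a gene's expression with a compound's activity across cell lines, can be sketched in a few lines. The cell lines and numbers below are made up.

```python
# Illustrative only: correlate one gene's expression with one compound's
# activity across a handful of hypothetical cell lines (invented numbers).
import numpy as np
from scipy import stats

gene_expression = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.2])
drug_activity   = np.array([0.30, 0.55, 0.22, 0.71, 0.48, 0.60, 0.18, 0.75])

r, p_value = stats.pearsonr(gene_expression, drug_activity)
print(f"correlation={r:.2f}, p={p_value:.4f}")
# In practice this is repeated across thousands of genes and compounds,
# with correction for multiple testing before calling anything a lead.
```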

Cancer is a “molecularly complicated beast,” says Genevera Allen, an assistant professor at Rice University in the departments of statistics and electrical and computer engineering.

Along with researchers at the Baylor College of Medicine and the University of Texas at Austin, Allen received a $1.3 million joint federal grant from the National Science Foundation and the National Institutes of Health to analyze large amounts of molecular cancer data.

The researchers are developing techniques for sorting, analyzing and drawing connections between data gathered through high-throughput -omics technologies like genome sequencing, RNA sequencing, microarrays and others. Specifically, they are creating a mathematical framework to enable researchers to pinpoint conditional dependence relationships between any two variables. Such a tool should make it possible to analyze and integrate multiple sets of high-dimensional data that were measured from one group of subjects.
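The grant work develops new methodology, but the basic notion of conditional dependence can be illustrated with partial correlations computed from the inverse covariance (precision) matrix. The data below are synthetic, and this baseline approach is not the framework Allen's team is building.

```python
# Baseline illustration of conditional dependence on synthetic data:
# partial correlations from the inverse covariance (precision) matrix.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_vars = 500, 5
data = rng.multivariate_normal(np.zeros(n_vars), np.eye(n_vars), size=n_samples)
# Make variable 2 depend on variables 0 and 1 so some edges exist.
data[:, 2] = 0.7 * data[:, 0] + 0.7 * data[:, 1] + 0.3 * rng.normal(size=n_samples)

precision = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# Entries near zero suggest conditional independence given the other variables.
print(np.round(partial_corr, 2))
```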

While personalized medicine is a buzzword these days, Allen is cautious about predicting any major breakthrough in the next five years.

“It’s not just one or two things that can cause cancer—it’s 20 things plus environmental factors that can cause someone to get cancer. Every cancer is slightly different and there are all different subtypes,” she says. “Two patients can both present clinically with ovarian cancer, but the basic genetic data can look completely different.”

Given these challenges, “personalized medicine is farther out than we would hope,” she says. Translating basic discoveries into actionable clinical information is another holdup.

The work has just begun and “it will take a long time.”

 
