HIMSS14: Data mining yields insights on quality indicators
ORLANDO - Each day in 2013 generated an estimated 2.5 zettabytes of data, the equivalent of about 225 newspapers every day for every person on earth. “It boggles the mind how much digital footprint you have every day,” said Brett Trusko, PhD, MBA, president and executive director of the International Association of Innovation Professionals, during a session at the Healthcare Information and Management Systems Society's annual conference.
Using much smaller datasets can yield tremendous insights.
Trusko and his team mined nearly 6 gigabytes of uncompressed data from the Agency for Healthcare Research and Quality’s Healthcare Cost and Utilization Project (HCUP) dataset on discharges, allowing for the review of regional and demographic relationships between disease and other components. Specifically, they mined the data to provide new insight on three quality indicators: acute myocardial infarction mortality rate, acute stroke mortality rate and pneumonia mortality rate.
Their work seeks to establish a baseline of relationships, with the goal of eventually developing a commercial platform to share these data with the public.
The research had some limitations: the dataset represented only 20 percent of U.S. discharges, and the data are from 2011, just as the Affordable Care Act kicked into gear, Trusko said. In all, they parsed data on 8.2 million discharges and loaded them into Hadoop, an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. Then the team used a commercial platform for the analytics.
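For readers unfamiliar with distributed processing, the idea can be sketched in a few lines of plain Python. This is only an illustration of the map-and-reduce pattern that Hadoop popularized, not Hadoop's API and not the team's actual pipeline; the records, the field names `region` and `died`, and the two-way partitioning are all hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical discharge records; the field names are illustrative, not HCUP's.
records = [
    {"region": "Northeast", "died": True},
    {"region": "West", "died": False},
    {"region": "Northeast", "died": False},
    {"region": "West", "died": False},
]

def map_partition(partition):
    """Map step: tally discharges and deaths per region within one partition."""
    counts = Counter()
    for rec in partition:
        counts[(rec["region"], "discharges")] += 1
        counts[(rec["region"], "deaths")] += int(rec["died"])
    return counts

def merge(left, right):
    """Reduce step: combine two partial tallies (update() keeps zero counts)."""
    combined = Counter(left)
    combined.update(right)
    return combined

# Each partition stands in for the slice of data held by one server.
partitions = [records[:2], records[2:]]
totals = reduce(merge, map(map_partition, partitions), Counter())

for (region, measure), value in sorted(totals.items()):
    print(region, measure, value)
```

In a real cluster each partition would be tallied on a different machine, and only the small per-region counts would travel over the network to be merged.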
Trusko shared some insight gleaned from data mining the HCUP dataset on the quality indicators:
- Acute Myocardial Infarction (AMI) Mortality Rate (for patients 18 years and older, excluding obstetric discharges and transfers to another hospital): From their data, they identified 8 million discharges, with 113,972 total AMI cases and 6,742 deaths. They noticed that the mortality rate is highest in the northeastern part of the U.S. and is higher during winter. The data also showed that Asian and Native American women are dying the most from the disease. As women often come in with different symptoms, “it confirms that it is more dangerous for women coming into the ER than men,” Trusko said.
- Acute Stroke Mortality Rate (for patients 18 years and older), with individual focuses on subarachnoid stroke, hemorrhagic stroke and ischemic stroke: The team identified 123,000 cases, with 9,879 deaths. There were no noticeable differences by gender or region, but the large differences among the types of stroke suggest that they may all be distinct conditions. Ischemic stroke is the most common and carries the lowest risk of death; the risk of death for patients with subarachnoid stroke or hemorrhagic stroke is about four times higher. This poses questions on how this information is packaged for statistics, Trusko said.
- Pneumonia Mortality Rate (for patients 18 years and older, excluding obstetric discharges and transfers to another hospital): The team identified 199,275 cases, with 7,564 deaths. Among the insights gleaned: pneumonia rates were lower in the West, and the risk of death is significantly higher in Asian men, while Native American women have very low rates. Income differences did not translate into different outcomes with stroke.
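The case and death counts reported above can be turned into crude mortality percentages with simple arithmetic. The sketch below assumes a plain deaths-divided-by-cases calculation; it is not the risk-adjusted method behind the official quality indicators, and the function name `mortality_rate` is chosen here for illustration.

```python
# (cases, deaths) for each indicator, as reported in the session.
REPORTED = {
    "AMI": (113_972, 6_742),
    "Acute stroke": (123_000, 9_879),
    "Pneumonia": (199_275, 7_564),
}

def mortality_rate(cases: int, deaths: int) -> float:
    """Return deaths as a percentage of cases (a crude, unadjusted rate)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases

rates = {name: mortality_rate(cases, deaths)
         for name, (cases, deaths) in REPORTED.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

On these figures the crude rates work out to roughly 5.9 percent for AMI, 8.0 percent for acute stroke and 3.8 percent for pneumonia.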
While these insights open the door to more researchers, Trusko said big data will be most effective with real-time discharge data that would allow researchers to see the trends immediately.