Mattison on big data: Great potential but risks exist

BOSTON—Kaiser Permanente CMIO John Mattison, MD, introduced several new concepts during his keynote address at the Big Data Healthcare Analytics Forum on Nov. 20.

The “inverted big bang” is his way of describing how big data is different. “Rather than trying to identify relationships between terms in a database, we bring together all the metadata associated with data points. We can use a very minimalistic data model but you bring all the metadata with the data into the database and operate on what’s of interest to you for a particular set of questions. That’s a radical departure from the data warehousing models of the past.” This funneling down of vast amounts of data from disparate sources is almost the exact inverse of the big bang that created the universe, he noted.

For effective use of big data, Mattison said there will be a need for “meta-topical brainforests,” collections of subject matter experts with a broad range of expertise who work with data scientists to make sense of the data. Tropical rainforests are unique in that they have few resources yet generate the greatest diversity and the most efficient capture of solar energy. Big data is similar in that it will require great diversity and depth of knowledge, as well as collaboration across minds, he said.

Other developments highlight the growing impact of big data. For example, he cited a book that examines the social impact of health and wellness. The book says that an individual’s average friend on Facebook is as genetically related to him or her as a fourth cousin. “That’s mindblowing,” he said.

Studies also have shown that meditation and yoga can radically decrease the prevalence of disease, and big data can help explain the mechanisms behind that effect, he said.

Mattison discussed two distinct modes of discovery: hypothesis-driven and hypothesis-generating. Hypothesis-driven discovery considers, for example, the correlation between data types, such as what people are tweeting about, and the relationship between what they are saying and a flu outbreak. “We can identify early outbreaks two weeks before the Centers for Disease Control & Prevention by watching Facebook.” That allows public health workers to identify small pockets of illness and blanket the social network with notifications about vaccines. The approach can be expanded to numerous other conditions.

Hypothesis-generating discovery occurs when researchers find interesting, dramatic associations without knowing the underlying relationships. This relies heavily on vendors being able to visualize data and show outliers.

Big data tools can help clinicians tap into patient conditions earlier and with greater detail, Mattison said. By using natural language processing, for example, “it’s extraordinary how much more real information you can get than is captured in medical records.”  

Big data presents risks as well, he said. He cited a published study from a large institution in which the researchers concluded that it did not matter how soon before surgery patients received prophylactic antibiotics. That contradicted just about every other study on the topic. The problem, Mattison said, is that the researchers didn’t pay enough attention to the data definitions of when the drug was administered and when surgery was performed, and so reached false conclusions. “There is a timing that really matters.”

Big data also can lead to misinterpretations. For example, he said that when the Department of Homeland Security was first formed, the CIA and FBI were ordered to pool their data. However, the two agencies used different definitions for the same data types. “When you pool data, it’s really easy to misinterpret data you’re not familiar with. Having people who understand the differences becomes more critical than ever.”

The goal of advancing technology such as wearable sensors, Mattison said, is not to have digital nannies but to drive behavior so we’re more mindful. By 2020, wearables and other technologies will produce more than ten times as much new knowledge as is currently generated by randomized controlled trials, he predicted.

Wearables also could greatly influence how we track our health. “In a few years, it will be considered grossly negligent not to have embedded sensors in football helmets” to better track head injuries, he said.

“We all have digital representation,” he said. The public is accustomed to having company websites know about them and their past purchases. This same idea will help with motivation, and the “ability to motivate people will become more and more prominent in how we interact with the digital world.”

Genomics will identify individual risk and help drive prevention and wellness planning, and pharmacogenomics will drive personalized dosing, he said, predicting a shift from an organ-based treatment system to one that’s more genetically based.

Beth Walsh, Editor

Beth Walsh earned a bachelor’s degree in journalism and a master’s in health communication. She has worked in hospital, academic and publishing settings over the past 20 years. She joined TriMed in 2005 as editor of CMIO and Clinical Innovation + Technology. When not covering all things related to health IT, she spends time with her husband and three children.
