Clinical Data Repositories: What Do You Want Your Data to Do?
The answer to this question will vary across facilities and by scope and function of the CDR. However, there are common denominators—including high-quality, accessible data and, increasingly, scalability.
It might help to clarify what constitutes a clinical data repository versus a data warehouse. A CDR can aggregate data from warehouses and package those data in different ways for various purposes, whereas data warehouses typically have one purpose: data storage.
Mayo Clinic, in Rochester, Minn., faced a proliferation of data and data warehouses. In 2005, the organization began development of a data repository built on industry-standard data warehousing principles, called the Enterprise Data Trust. "We call it a data trust, but functionally or from any sort of operational characteristic, I would not say they're any different," says Christopher Chute, MD, DrPH, an internal medicine specialist who established the Division of Biomedical Informatics at Mayo Clinic and has seen its CDRs evolve and proliferate over the past decade.
In contrast, the terms are not used interchangeably at Carilion Clinic, a Roanoke, Va.-based organization that includes seven hospitals, about 200 ambulatory sites and 550 providers that serve an area some 200 miles in diameter in Southwest Virginia. "When we talk about a data warehouse, it's separate from the CDR," says Steve Morgan, MD, CMIO at Carilion.
"I am not sure how that is strictly defined and feel like the understanding of what that should be varies," says Steven M. Adkins, MD, FAAFP, chief medical informatics officer at Holston Medical Group, a physician-led, multi-specialty group of nearly 150 primary care physicians, specialists and mid-level providers based in Kingsport, Tenn.
A CDR might not be necessary for managing a population of patients with a given clinical diagnosis, such as diabetes, within a hospital system because this can be done readily using existing analytics: "all I need is data about my patients residing in the EHR," Adkins says. However, benchmarking that care against external clinical standards might necessitate a repository because it requires comparative data from a large, preferably national or regional, aggregation of similar patients.
"Most organizations will not have the scope or reach to collect enough data on their own," he says.
Holston Medical Group has a large primary clinical database in its EHR and has created several data warehouses outside that primary database used for reporting and analytics, according to Adkins, but hasn't aggregated data from outside the system into a warehouse or repository separate from the EHR.
At the other end of the spectrum, Mayo has about 800 clinical registries of various stripes. "That was part of the problem that precipitated our goal of engineering a shared common repository, our Enterprise Data Trust," says Chute. "It's inefficient to maintain that many purpose-specific registries." Much of the contents of these registries should be common and sharable, with the understanding that "you don't need to share data 800 independent times, particularly for things like demographics or lab data or medication data—what I would call the bread and butter of most registries. They can and should be curated well and once, and then serve [their appropriate] use case," he says.
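The "curated well and once" idea can be illustrated with a minimal Python sketch. It is not Mayo's implementation: the `Patient` and `SharedRepository` names, the fields and the sample records are all hypothetical. The point is only that when registries are views over one shared store, a correction made once is visible to every registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical, minimal record; a real repository holds far more.
    patient_id: str
    birth_year: int
    diagnoses: frozenset


class SharedRepository:
    """Holds a single curated copy of each patient record."""

    def __init__(self):
        self._patients = {}

    def upsert(self, patient):
        # Insert or replace the one curated record for this patient.
        self._patients[patient.patient_id] = patient

    def registry(self, diagnosis):
        # A registry is a filtered view, not a separate copy of the data.
        return [p for p in self._patients.values() if diagnosis in p.diagnoses]


repo = SharedRepository()
repo.upsert(Patient("p1", 1950, frozenset({"diabetes", "hypertension"})))
repo.upsert(Patient("p2", 1972, frozenset({"asthma"})))

# A demographic correction is applied once, in the shared store.
repo.upsert(Patient("p1", 1951, frozenset({"diabetes", "hypertension"})))

diabetes_registry = repo.registry("diabetes")
hypertension_registry = repo.registry("hypertension")
```

Both registry views pick up the corrected birth year without any per-registry update, which is the maintenance saving Chute describes.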
Prior to Mayo's Enterprise Data Trust efforts, the organization had "probably half a dozen" traditional data warehouses for specific purposes that were engineered over time, says Chute. The Enterprise Data Trust consolidation strategy focused on the warehouses as opposed to the registries, enabling Mayo to build and operate a single comprehensive source for data that is curated and managed appropriately, with robust access safeguards protecting security and integrity, says Chute.
"One must always be concerned about inappropriate access and use. Privacy is obviously the foremost attention of the organization," Chute says. "We don't want to risk that with a needless proliferation of data points."
The Carilion Clinic CDR is retrieving data from the EMR (Epic), as well as gathering data from its radiology and lab information systems, according to Morgan. "As an organization, we see our next step as picking up external labs, looking at potential data exchanges. In particular, we're very interested in working with payors to pull in payor data, claims [and billing] data from their end to be able to complete the loop of patients who may not [have] been within the system," he says.
Navigating 'fog patches'
As cloud computing conquers other data- and storage-intensive applications, it might be a good fit for CDR architecture. But not yet.
"I think [cloud] is certainly the direction we are headed," says Adkins. "Storage is essentially free as a percentage of the cost associated with these systems, and a cloud-based architecture certainly makes sense." He cites a cloud-based effort that's under way at Humedica, a healthcare informatics company, which is partnering with healthcare organizations to create longitudinal clinical data.
"We use some cloud technology within our system, [but we] are not using it as much for clinical information at this point," Morgan says. "It's something we monitor and explore: It's certainly a direction we need to move in because there are a lot of potential efficiencies and cost savings."
Mayo uses traditional database systems: IBM DB2 environments with scalable feeder systems and organizational systems that feed data into them. "For us, that's a much more maintainable and reasonable system," Chute says. "We are establishing several small [internal] clouds, fog patches, that we're using for SHARP [Strategic Health IT Advanced Research] and Beacon grants, maintaining patient information within the firewall."
A place for everything
If money and time were no concern, what would you like to see in your CDR?
"More, better, faster is the usual mantra," says Chute. "We are frustrated by limitations of the amount of information that we actually have in our repository. We put most of the information that we have into our EMR, all the news that's fit to print, in a sense: the items that are sensible to include there. [However,] you don't have to futz with it very long before you recognize that you're dealing with questions that really want information from a departmental system, which is the next layer down, or from some kind of source feeder system.
"It is a question of how deep into the original data do you go?" Chute adds.
At Carilion, "our current goal is looking at payors and we've had more active talks recently," Morgan says. "In the last two months, we have developed a relationship with Aetna, and we're having more discussions. Certainly, the more information, the better. [We're trying to] figure out ways we can share data back and forth."
Carilion generates many reports for leadership regarding physician quality and pulls financial data out of that repository, says Morgan. The network also has shared data with Humedica for the Anceta project, enabling it to perform additional analytics beyond what it can do onsite, he adds. Anceta, a subsidiary of the American Medical Group Association (AMGA), is partnering with Humedica to create data-sharing care collaboratives that are open only to AMGA members.
"With the Anceta project, using [its] diabetes module, we've created some leadership reports trying to refine the data more. And in our patient-centered medical home project, all of our quality data and the quality data that we pull from [the National Committee for Quality Assurance] is pulled from the CDR."
Mayo maintains its warehoused data in standardized formats that conform to national information exchange standards, including meaningful use: "We're adopting meaningful use to our warehouse standard specifications," says Chute.
"Our warehouses, to the extent that they're not yet fully consolidated, could speak to each other, but we've not seen much reason to do that. We've focused our energies on consolidating data out to warehouses, rather than trying to get virtually distributed warehouses."
A federated warehouse structure, with virtual integration, would be a viable option for many sites, says Chute. "If we were to start today, we might seriously pursue that as an overarching architecture. But our warehouse activity started in earnest about a decade ago, before federated, distributed computing was reliable."
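For readers unfamiliar with the term, a federated structure leaves each warehouse where it is and combines answers at query time instead of copying everything into one store. The Python sketch below is only an illustration under stated assumptions: `make_warehouse` and `federated_query` are hypothetical helpers, and the rows are made-up sample values, not data or interfaces from any of the organizations quoted here.

```python
def make_warehouse(rows):
    """Return a query function over rows that stay in their own store."""

    def query(patient_id):
        return [row for row in rows if row["patient_id"] == patient_id]

    return query


# Two hypothetical departmental warehouses with made-up sample rows.
lab_warehouse = make_warehouse(
    [{"patient_id": "p1", "source": "lab", "value": "HbA1c 7.2"}]
)
radiology_warehouse = make_warehouse(
    [
        {"patient_id": "p1", "source": "radiology", "value": "chest x-ray"},
        {"patient_id": "p2", "source": "radiology", "value": "knee MRI"},
    ]
)


def federated_query(warehouses, patient_id):
    """Ask every warehouse in place and merge the answers at query time."""
    results = []
    for query in warehouses:
        results.extend(query(patient_id))
    return results


p1_view = federated_query([lab_warehouse, radiology_warehouse], "p1")
```

The consolidated approach Mayo chose would instead load both sets of rows into a single store ahead of time. The federated version avoids that copy but depends on every source being reachable and consistent when the question is asked.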
Integration and exchange
On the integration and data standards side, the consolidated clinical data is indirectly saving money and improving privacy and workflow, according to Chute. It also is improving care because with more information available, "you get a more complete picture of what's going on with the patient than you'd get in a thumbnail view in any particular registry. It's much more holistic to look at all information associated with patients regardless of which department systems it came from or which computer they were seen on," he says.
A flow of CDR information into HIEs is also over the horizon for many organizations. "I seriously doubt we will see public CDRs in RHIOs [regional health information organizations] or state-based HIEs any time soon," Adkins predicts, although the major issues may be political rather than technical, and the hospital systems might want to own the HIEs as an outreach tool to drive referrals, limiting what can be done in the public sector.
Carilion also is not exchanging CDR data with an HIE, according to Morgan. "In Virginia, we are a little behind with data exchange other than what's been necessary for meaningful use," he says. The topic of HIE connection has come up in the context of Carilion's five-year plan, but currently, "there is nothing active going on as far as HIE data exchange."
Regardless of what entities exchange data with a CDR, systems that can keep data as intact as possible have some advantages. "Unfragmented data are going to be more reliable as the basis for making inferences and understanding practice patterns," Chute says.