Strata Rx 2013: Model tests robustness of data de-identification
BOSTON--With about 27 percent of healthcare providers experiencing a breach each year, privacy and security remain as critical as ever. De-identification is one component of an effective enterprise privacy program, according to speakers at Strata Rx 2013.
With an unprecedented amount of data coming to healthcare providers, proactive de-identification strategies enable professionals to harness these data while maintaining patient privacy, said Nathalie Holmes, director of business development at Privacy Analytics in Ottawa, Canada.
Anonymizing data allows organizations to use it without having to resolve consent and authority issues, said Khaled El Emam, MD, Canada Research Chair in Electronic Health Information at the CHEO Research Institute and the University of Ottawa.
Privacy regulations do not require a zero chance of re-identification; however, evidence suggests that robust methods can reduce the risk to practically nothing, said El Emam. “When you look at the evidence carefully, if you de-identify data properly, the hit rate will be small,” he said, referring to breaches. “If you don’t, the hit rate can be high.”
El Emam described a formal framework, based on more than 60 real-world scenarios, that evaluates the maturity of an organization's de-identification services and whether they satisfy the Safe Harbor method outlined in HIPAA regulations. The Safe Harbor standard requires the removal of 18 types of identifiers, including both direct identifiers and quasi-identifiers.
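To make the Safe Harbor approach more concrete, here is a minimal Python sketch applying a few of its rules to a single record: direct identifiers are dropped, ZIP codes are cut to their first three digits (or set to "000" where that three-digit area has 20,000 or fewer people), dates are reduced to the year, and ages of 90 and over are grouped together. The field names, the `safe_harbor_record` function and the `restricted_zip3` parameter are hypothetical illustrations, not part of any tool the speakers described. The sketch covers only some of the 18 categories and omits, among other things, photographs, biometric identifiers, free text and the rule that birth years implying an age over 89 must also be aggregated.

```python
from datetime import date

# Hypothetical column names standing in for several Safe Harbor categories
# (names, contact details, SSNs, record and account numbers, and so on).
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "mrn", "health_plan_id",
    "account_number", "license_number", "vehicle_id", "device_id",
    "url", "ip_address",
}


def safe_harbor_record(record: dict, restricted_zip3: set) -> dict:
    """Apply a subset of Safe Harbor-style transformations to one record.

    Assumes dates are datetime.date values, "zip" is a 5-digit string and
    "age" is an int. `restricted_zip3` holds the 3-digit ZIP prefixes whose
    combined population is 20,000 or fewer; the caller must supply it.
    """
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop direct identifiers entirely
        if field == "zip":
            prefix = value[:3]
            out["zip3"] = "000" if prefix in restricted_zip3 else prefix
        elif isinstance(value, date):
            out[field + "_year"] = value.year  # keep only the year
        elif field == "age":
            out["age"] = "90+" if value >= 90 else value  # group ages over 89
        else:
            out[field] = value
    return out


record = {
    "name": "Jane Doe",
    "zip": "02115",
    "admission_date": date(2012, 3, 14),
    "age": 93,
    "diagnosis": "E11.9",
}
print(safe_harbor_record(record, restricted_zip3=set()))
# → {'zip3': '021', 'admission_date_year': 2012, 'age': '90+', 'diagnosis': 'E11.9'}
```

Keeping the list of sparsely populated ZIP prefixes as a parameter, rather than hard-coding it, reflects that it depends on current census figures.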
Essentially, the framework assesses three things: an entity's de-identification practices, how well it implements them and how well it automates them.
In one case study, a disease registry maintained many connected databases, and much of that data was released to internal and external analysts. Although the registry relied primarily on the Safe Harbor standard to anonymize data, it did not meet the automation criteria because its homegrown scripting language for implementing Safe Harbor had not been externally validated. “Sometimes tools don’t do what regulations require them to do,” said El Emam.
The registry also failed to de-identify some identifiers covered by the Safe Harbor standard, such as clinical trial participant numbers.
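The registry's two problems, an unvalidated homegrown tool and an identifier that slipped through, are the kind of gap an automated check can help catch. Below is a minimal Python sketch of one such check, assuming the de-identified output is a list of dictionaries: it flags any field whose name matches a blocklist of identifier patterns. The `audit_deidentified` function and the patterns are hypothetical; a real blocklist would come from the organization's own data dictionary, and matching on column names alone will not find identifiers hidden in free text or in oddly named columns.

```python
import re

# Hypothetical patterns; a real list would be built from the data dictionary
# and cover all 18 Safe Harbor categories, including trial participant IDs.
IDENTIFIER_FIELD_PATTERNS = [
    r"name", r"ssn", r"mrn", r"phone", r"email", r"zip$",
    r"account", r"participant_(id|number)",
]


def audit_deidentified(rows: list) -> set:
    """Return field names in de-identified output that look like identifiers.

    Matching is on column names only, case-insensitive; it will over-flag
    some harmless fields and cannot see identifiers buried in free text.
    """
    flagged = set()
    for row in rows:
        for field in row:
            if any(re.search(p, field, re.IGNORECASE) for p in IDENTIFIER_FIELD_PATTERNS):
                flagged.add(field)
    return flagged


released = [{"zip3": "021", "trial_participant_id": "TP-0042", "diagnosis": "E11.9"}]
print(audit_deidentified(released))
# → {'trial_participant_id'}
```

Run after every de-identification job, with the release blocked unless the returned set is empty, a check like this gives a simple, repeatable test of a scripting tool's output, though it is no substitute for independent validation of the tool itself.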
In another case study, El Emam cited a claims processor that needed realistic data for software testing. Although it met the implementation and automation thresholds, it anonymized data only through masking, so it did not meet the Safe Harbor standard: data masking deals only with direct identifiers, and the processor left untouched certain quasi-identifiers that the standard requires to be addressed.
“Despite these efforts, they have missed some key quasi-identifiers, as some dates and ZIP codes were not addressed. There is no evidence that the risk of [re-identification] was very small,” El Emam said.
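Why leftover dates and ZIP codes matter can be shown by counting how many records remain unique on those quasi-identifiers after masking. The Python sketch below uses one common approach: records sharing the same quasi-identifier values form an equivalence class, and each record's re-identification risk is taken as 1 divided by its class size. This is an illustrative measure, not necessarily the one used in the case study, and the `reidentification_risk` function and sample fields are hypothetical.

```python
from collections import Counter


def reidentification_risk(rows: list, quasi_identifiers: list) -> dict:
    """Estimate re-identification risk from quasi-identifier equivalence classes.

    Records with identical quasi-identifier values form one class. Someone who
    knows a person's values can narrow them to that class, so each record's
    risk is taken as 1 / class size.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    class_sizes = Counter(keys)
    risks = [1 / class_sizes[k] for k in keys]
    return {
        "max_risk": max(risks),
        "avg_risk": sum(risks) / len(risks),
        "unique_records": sum(1 for k in keys if class_sizes[k] == 1),
    }


# Masked test data: member IDs are replaced, but dates and ZIP codes remain.
masked_claims = [
    {"member": "MASKED-1", "birth_date": "1961-04-02", "zip": "02115"},
    {"member": "MASKED-2", "birth_date": "1961-04-02", "zip": "02115"},
    {"member": "MASKED-3", "birth_date": "1975-11-30", "zip": "02139"},
    {"member": "MASKED-4", "birth_date": "1983-07-19", "zip": "02139"},
]
print(reidentification_risk(masked_claims, ["birth_date", "zip"]))
# → {'max_risk': 1.0, 'avg_risk': 0.75, 'unique_records': 2}
```

Two of the four masked records are still unique on birth date and ZIP code, so masking the member IDs did nothing to protect them. Generalizing or suppressing those quasi-identifiers would enlarge the classes and bring the risk figures down.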