AI predicts gathering disease with a deep dive into evolutionary genetics

Researchers have used unsupervised machine learning to predict disease-causing properties in more than 36 million genetic variants across more than 3,200 disease-related genes.

In the process, they’ve advanced the classification of more than 256,000 genetic variants whose effects (helpful, harmful or neither) had been unknown.

The work was conducted at Harvard Medical School and Oxford University. The resulting study is posted online in Nature.

“Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences,” write co-lead authors Jonathan Frazer, Mafalda Dias and colleagues in framing the problem.

“In principle, computational methods could support the large-scale interpretation of genetic variants,” they add. “However, state-of-the-art methods have relied on training machine learning models on known disease labels.”

For the current project, the team sought to overcome this limitation by modeling the distribution of sequence variation across organisms—and over vast swaths of time.

In so doing, they hypothesized, they would isolate fitness-maintaining features in protein sequences.
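To make the idea concrete, here is a minimal sketch of scoring a variant from evolutionary sequence data alone, with no disease labels. It is not the authors' EVE model. It assumes a much simpler stand-in in which every protein position is treated independently, and the alignment, function names and pseudocount are hypothetical choices made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy alignment: the same short protein region in five species.
TOY_ALIGNMENT = ["MKVLA", "MKVLG", "MKILA", "MRVLA", "MKVLA"]


def column_frequencies(alignment, pseudocount=1.0):
    """Estimate amino-acid frequencies at each aligned position.

    Assumption: positions are independent. A pseudocount keeps unseen
    amino acids from getting a probability of exactly zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # Only the 20 standard amino acids are counted; gaps are ignored.
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, wild_type, position, mutant):
    """Log-likelihood ratio of the mutant versus the wild-type residue.

    More negative means evolution has tolerated the change less often.
    `position` is zero-based.
    """
    original = wild_type[position]
    if original not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError("residues must be standard amino acids")
    return math.log(freqs[position][mutant]) - math.log(freqs[position][original])


freqs = column_frequencies(TOY_ALIGNMENT)
# Position 3 (L) never varies in the toy alignment; position 2 (V) sometimes appears as I.
print(f"L->P at conserved site: {variant_score(freqs, 'MKVLA', 3, 'P'):.2f}")  # -1.79
print(f"V->I at variable site:  {variant_score(freqs, 'MKVLA', 2, 'I'):.2f}")  # -0.92
```

In this toy setup, a change at a position that never varies across species scores lower than a change that evolution has already tried elsewhere, which is the intuition behind looking for fitness-maintaining features. A per-position count like this cannot capture interactions between positions, so it should be read only as an illustration of the general principle.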

The authors call their model EVE, short for evolutionary model of variant effect, and report that it proved more accurate than AI approaches trained on labeled data.

What’s more, it can equal or improve upon predictions from more commonly used approaches.

The team states their work with EVE suggests models of evolutionary information can “provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.”

In coverage of the project by Harvard’s news division, Harvard science writer Ekaterina Pesheva reports that EVE looked for patterns that evolution preserved over time. To do so, it analyzed data from 140,000 species—including endangered and extinct organisms.

Co-senior author Debora Marks of Harvard warns that EVE is not a diagnostic test.

Instead, its “computational prowess” can combine with existing clinical options to help geneticists and physicians make diagnoses, predict disease progression and “even choose treatment based on the presence of certain disease-causing genetic mutations.”

To this, co-senior author Yarin Gal of Oxford adds:

“We’re not providing clinicians merely with a number but also giving them the degree of uncertainty that comes with it. This is something that the expert can take and use in the decision-making process. … Building trust between the tool and the expert is an important aspect of this work.”

More from Marks:

“Our results turned out to be far better than we expected. It seems that by simply training a model to fit the distribution of sequences across evolution we extract information that enables us to make unexpectedly precise predictions about disease risk arising from a given genetic variant.”

Harvard coverage here, study here (behind paywall).

Dave Pearson

