Calculating genetic links between diseases, without the genetic data - Featuring Andrey Rzhetsky

Dec 10, 2019
By Matt Wood

Disease embeddings group different conditions by type and plot them in two-dimensional space to show how closely they are related to one another.

Physicians use standard disease classifications based on symptoms or location in the body to help make diagnoses. These classifications, called nosologies, can help doctors understand which diseases are closely related, and thus may be caused by the same underlying issues or respond to the same treatments.

An important part of understanding a disease is estimating its heritability: the share of variation in disease risk across a population that is due to inherited genetic variants rather than environmental causes such as exposure to pollution, infections or trauma. Traditionally, calculating the heritability of a given disease required expensive data sets that combined medical and genetic data with detailed knowledge of family relationships. In a new study, data scientists from the University of Chicago estimated heritability and mapped out relationships among thousands of diseases using data from electronic health records.
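
In quantitative genetics, heritability is usually written as a ratio of variances. The expression below is a simplified textbook sketch. It assumes genetic and environmental effects are independent and do not interact, which is an assumption made here for illustration and not a detail reported from the study. The symbols are the conventional ones: the total (phenotypic) variance, its genetic part and its environmental part.

```latex
\[
\sigma_P^2 = \sigma_G^2 + \sigma_E^2,
\qquad
H^2 = \frac{\sigma_G^2}{\sigma_P^2}
\]
```

Under that assumption, a heritability of 0.4 would mean that 40 percent of the variation in disease risk in the population is attributable to genetic differences, with the remaining 60 percent attributable to environment.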

The study, published December 3, 2019 in Nature Communications, calculated statistical curves of each disease’s prevalence over an average lifetime, showing which diseases tend to strike earlier or later in life. The researchers also created “disease embeddings,” or groupings of diseases that show how closely they are related to each other based on diagnostic codes and notes in the health record. Using similarities in these curves and patterns revealed by the disease embeddings, researchers could then estimate heritability and genetic correlations between diseases.
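
To make the idea of "similarity" concrete, here is a minimal Python sketch of two common ways to compare diseases: cosine similarity between embedding vectors and Pearson correlation between age-of-onset curves. It is an illustration only, not the method used in the paper. The disease names, the two-dimensional vectors, the age bands and the curve values are all made up for the example, and the helper functions `cosine_similarity` and `pearson` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# ASSUMPTION: invented 2-D embedding coordinates, not values from the study.
embeddings = {
    "asthma": (0.9, 0.2),
    "allergic rhinitis": (0.8, 0.3),
    "osteoarthritis": (-0.1, 0.95),
}

# ASSUMPTION: invented share of diagnoses per age band (0-19, 20-39, 40-59, 60+).
onset_curves = {
    "asthma": [0.50, 0.25, 0.15, 0.10],
    "allergic rhinitis": [0.45, 0.30, 0.15, 0.10],
    "osteoarthritis": [0.02, 0.08, 0.35, 0.55],
}

# Compare every other disease against a reference disease on both measures.
reference = "asthma"
for name in embeddings:
    if name == reference:
        continue
    emb_sim = cosine_similarity(embeddings[reference], embeddings[name])
    curve_sim = pearson(onset_curves[reference], onset_curves[name])
    print(f"{reference} vs {name}: embedding={emb_sim:.2f}, onset curve={curve_sim:.2f}")
```

With these invented numbers, the two respiratory conditions score as close on both measures, while asthma and osteoarthritis score as far apart. Turning similarities like these into heritability or genetic correlation estimates requires the statistical modeling described in the paper, which this sketch does not attempt.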

“It used to be that every new estimate of heritability or genetic and environmental correlations between diseases was a big deal,” said Andrey Rzhetsky, PhD, a data scientist at UChicago who is the paper’s senior author. “Here we were able to estimate thousands of heritability values and hundreds of thousands of correlations, doing what used to be very expensive and slow at a very large scale.”
