Using cutting-edge statistical models to analyze data from nearly 2,000 families with an autistic child, a multi-institute research team discovered tens of thousands of rare mutations in noncoding DNA sequences and assessed whether these mutations contribute to autism spectrum disorder.
Published in the journal Science, the study is the largest whole-genome sequencing study of autism to date. It included 1,902 families, each consisting of both biological parents, a child affected by autism and an unaffected sibling.
Scientists representing Carnegie Mellon University, UC San Francisco, University of Pittsburgh School of Medicine, Massachusetts General Hospital, Harvard Medical School and the Broad Institute led the research team.
The study is one of 13 being released as part of the first round of results to emerge from the National Institute of Mental Health’s PsychENCODE consortium — a nationwide research effort that seeks to decipher how noncoding DNA, often referred to as the ‘dark matter’ of the human genome, contributes to psychiatric diseases such as autism, bipolar disorder and schizophrenia.
Over the past decade, scientists have identified dozens of genes associated with autism by studying so-called “de novo” mutations — newly arising changes to the genome found in children but not their parents. To date, most de novo mutations linked to autism have been found in protein-coding genes. It has proven far more difficult for scientists to identify autism-associated mutations in noncoding regions of the genome.
“Protein-coding genes clearly play an important role in human disorders like autism, yet their expression is regulated by the ‘noncoding’ genome, which covers the remaining 98.5 percent of the genome and remains somewhat mysterious,” said Carnegie Mellon’s Kathryn Roeder, PhD, corresponding author and UPMC Professor of Statistics and Life Sciences. “Because the genome comprises 3 billion nucleotides, identifying which portions of the noncoding genome, when mutated, enhance the risk of autism is as challenging as looking for a needle in a haystack.”
Using a novel bioinformatics framework, the researchers were able to compress the search from billions of nucleotides to tens of thousands of functional categories that potentially contribute to autism. Working with these categories, they used machine learning tools to build statistical models of autism risk from a subset of the families in the study. They then applied these models to an independent set of families and successfully predicted patterns of risk in the noncoding genome.
Though rare de novo mutations were found in many noncoding regions of the genome, the strongest signals arose from promoters — noncoding DNA sequences that control gene transcription. These risk-conferring promoters were most often located far from the genes under their control. They were also found to be largely conserved across species, suggesting that any rare mutations that might arise in these promoters are more likely to disrupt normal biology.
at the University of Pittsburgh School of Medicine.
“We are just scratching the surface of what there is to learn about noncoding regulatory variation in human disease, and the new methods this team has developed will catalyze an important step forward into larger and more comprehensive studies,” said Michael Talkowski, PhD, of Massachusetts General Hospital, Harvard Medical School and the Broad Institute, who also served as corresponding author on the study.