Scientists at A*STAR’s Genome Institute of Singapore (GIS) have developed a revolutionary method that quickly cuts through noise to generate a unified and simplified analysis of high-throughput biological data from, for example, patient samples. The technique, known as a pre-whitening matched filter, is well known in electrical engineering and widely used in cell phones and radar. This is the first time, however, that computational scientists, led by Dr Shyam Prabhakar, Associate Director, Integrated Genomics, GIS, have adapted it to the analysis of high-throughput DNA sequencing data, with surprisingly accurate results. The work was recently published in the prestigious journal Nature Biotechnology.
High-throughput DNA sequencing has revolutionized the study of molecular biology and human disease. The technology has yielded major insights into cancer, infectious diseases, Parkinson’s disease and many developmental disorders.
The main difficulty with this technology is the massive amount of data it generates. In addition, it was generally believed that each type of sequence data required a different method of analysis. Each new data type was therefore treated as a completely new analysis problem, resulting in a tremendous number of different analytical methods.
Dr Prabhakar and his team at the GIS, however, discovered that the pre-whitening matched filter technique produced uniformly better results than existing algorithms across a whole range of analysis tasks. In essence, the technique was applied to accurately detect segments of the genome that stood out from the rest of the sequence data. This was possible because, as lead author Dr Vibhor Kumar quickly realized, the mathematics underlying the solution of all these analysis problems was the same.
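The general idea of a pre-whitening matched filter can be sketched in a few lines of Python with NumPy. This is a textbook illustration only, not the GIS team's published implementation. The function names, the bump-shaped template and the AR(1) model of correlated background noise are all assumptions made for the example. The filter first "whitens" the data so that the background noise becomes uncorrelated, then correlates each window of the track with the whitened template and reports a normalized score.

```python
import numpy as np

def ar1_covariance(m, rho):
    """Hypothetical noise model: AR(1) covariance with unit variance, R[i, j] = rho**|i - j|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def prewhitened_matched_filter(track, template, noise_cov):
    """Slide a pre-whitened matched filter along a 1-D track.

    Returns one score per window: template^T R^-1 window / sqrt(template^T R^-1 template).
    """
    track = np.asarray(track, dtype=float)
    template = np.asarray(template, dtype=float)
    m = len(template)
    # Cholesky factor L with noise_cov = L @ L.T; solving with L whitens the noise.
    L = np.linalg.cholesky(noise_cov)
    white_template = np.linalg.solve(L, template)
    norm = np.linalg.norm(white_template)
    # Every length-m window of the track, one per row.
    windows = np.lib.stride_tricks.sliding_window_view(track, m)
    white_windows = np.linalg.solve(L, windows.T)  # shape (m, n_windows)
    # Correlate the whitened template with each whitened window.
    return white_template @ white_windows / norm

# Toy demonstration on simulated data (all values are made up for illustration).
rng = np.random.default_rng(0)
m, n, rho = 9, 200, 0.6
template = np.exp(-0.5 * ((np.arange(m) - m // 2) / 1.5) ** 2)  # assumed peak shape

# Simulate correlated AR(1) background noise with unit variance.
noise = np.empty(n)
noise[0] = rng.normal()
for t in range(1, n):
    noise[t] = rho * noise[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()

track = noise.copy()
track[120:120 + m] += 3.0 * template  # plant one enriched segment at position 120

scores = prewhitened_matched_filter(track, template, ar1_covariance(m, rho))
print("highest-scoring window starts at", int(np.argmax(scores)))
```

When the noise covariance is the identity matrix, the whitening step does nothing and the scores reduce to an ordinary normalized correlation with the template. The whitening step matters when neighbouring positions in the data are correlated.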
The team was also able to use a variant of the technique to accurately predict gene expression from epigenomic data. In other words, they could predict the activity levels of genes from data on chemical changes in the genetic material. This is especially significant in clinical settings, since gene expression is difficult to measure by conventional methods in old and degraded tissue samples.