New computational techniques developed at Lawrence Berkeley National Laboratory (Berkeley Lab) may help save scientists from drowning in their own data. Computational scientists at the Lab have figured out how to streamline the analysis of enormous scientific datasets, using the same techniques that make complex subway systems understandable at a glance.
They describe their work in a paper published in PPoPP’13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
What’s the problem here?
Sophisticated sensors and supercomputers are generating bigger and more complex scientific datasets than ever before. In disciplines like genomics, combustion and climate science, these datasets can range anywhere from tens of terabytes to several petabytes in size. A petabyte of data is equivalent to the storage consumed by 13.3 years of high-definition television.
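As a rough sanity check on that comparison, the short sketch below works out the video bitrate the figure implies. It assumes a decimal petabyte (10^15 bytes) and a 365.25-day year; neither assumption comes from the article, and the function name is purely illustrative. Under those assumptions the result is about 19 megabits per second, a plausible rate for a compressed high-definition video stream.

```python
# Back-of-the-envelope check: what bitrate does "1 PB = 13.3 years of HDTV" imply?
# Assumptions (not stated in the article): decimal petabyte, 365.25-day year.

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # average year length in seconds
PETABYTE_BYTES = 10**15                 # decimal petabyte
YEARS_OF_HDTV = 13.3                    # figure quoted in the article


def implied_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Return the constant bitrate (in Mbit/s) that fills total_bytes in the given years."""
    seconds = years * SECONDS_PER_YEAR
    bits = total_bytes * 8
    return bits / seconds / 1e6


bitrate = implied_bitrate_mbps(PETABYTE_BYTES, YEARS_OF_HDTV)
print(f"Implied bitrate: {bitrate:.1f} Mbit/s")
```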
To tease out the significant features for analysis, many scientists turn to a branch of mathematics called topology, which characterizes the shapes of objects without considering aspects like length or angles, simplifying them the same way a subway map turns a complex maze of tunnels, trains and stations into colored lines and dots.
But scientific data have grown so massive and complex that even simplified topological representations are difficult to analyze efficiently. As a result, more and more researchers are turning to massively parallel supercomputers to study their data.