Statisticians have combined state-of-the-art analytical techniques from the academic and business worlds to tackle the Big Data challenges confronting astrophysicists and astronomers as they explore the mysteries of our universe, Lars K.S. Daldorff and Siavoush Mohammadi today told an audience at the 2015 Joint Statistical Meetings (JSM 2015) in Seattle.
These technical advances, called automatic explorative analysis of data, have the potential to greatly aid these scientists as they seek to understand our universe, as well as researchers who work with Big Data in other fields, said Daldorff and Mohammadi while presenting a topic-contributed session titled “Novel Application of Statistical Tools for Big Data Analyses of Solar Physics” at JSM 2015.
Daldorff is an atmospheric, oceanic and space sciences research fellow in the College of Engineering at the University of Michigan and a consultant for NASA’s Goddard Space Flight Center. Mohammadi is a consultant with Infotrek, a Swedish business intelligence and data warehousing company.
The new analytical techniques Daldorff and Mohammadi described are being used in a study of giant magnetic loops generated by the sun. When physicists use large supercomputers to simulate the sun, their research produces massive amounts of data, but the phenomenon of interest is usually located at a specific point in time and space, creating a proverbial needle-in-a-haystack situation for the researcher.
The sheer quantity of data has forced physicists to reduce the volume they work with, which they do by looking at small portions of the data at a time. This makes the search long and slow before true insight is found.
But what if you could scan the entire haystack at once to find the proverbial needle? That’s the question Daldorff and Mohammadi sought to answer when they looked to the commercial data warehouse industry for solutions to search, categorize and filter the large amount of solar research data from plasma simulations Daldorff had conducted for NASA.
There are still many open questions surrounding the solar magnetic loops associated with sunspots, which contribute to a considerable increase in X-ray and ultraviolet radiation from the outer solar atmosphere into the upper atmosphere of the Earth. The loops can be seen in a video clip released by NASA’s Heliophysics Science Division as part of the Solar Dynamics Observatory project.
The astrophysics community speculates that a phenomenon called “magnetic reconnection” occurs when these powerful arches are created. It’s this moment in the data that researchers like Daldorff and Mohammadi want to pinpoint in both space and time: where it happens and when.
The duo use statistical methods that are common in data warehouses, where company analysts typically apply them to study human behavior, such as that of customers. Here the same methods are applied to scientific data, in this case coronal loops. These analytical methods combine computational power and statistics to turn information into insight. Standardized and widely used in the business world, they are now finding use on a completely different type of data.
As for analytical tools, they have been using SAS Visual Analytics, a Big Data reporting and explorative tool that works in-memory. Many of the statistical methods employed by SAS Visual Analytics are also standard methods used in numerous data warehouses.
These analytical tools and methods don’t care what your data are. The methods for identifying points of interest, performing analysis and visualizations and creating reports are the same, regardless of whether they are applied to business or scientific data, Daldorff and Mohammadi told session attendees.
This automatic exploration of large data sets using statistics and modern analytical methods can greatly reduce the time it takes to extract insight from Big Data, not just in heliophysics research but in all data-intensive research fields. It automates a major repetitive manual step, allowing the subject expert to focus on the research topic instead of processing data by hand.
“Our hope is these results can help with solar magnetic loops research at NASA and at the same time our work will show the effectiveness of explorative analysis of data in other data-intensive fields. There are numerous possibilities for this new application that could potentially help various types of researchers–in academia, business and science–obtain quicker insights and results from their research’s Big Data,” said Daldorff.
JSM 2015 is being held August 8-13 at the Washington State Convention Center in Seattle. More than 6,000 statisticians–representing academia, business and industry, as well as national, state and local governments–from numerous countries are attending North America’s largest statistical science gathering.