U.S. National Science Foundation (NSF) officials today are spotlighting the development and implementation of novel, multi-stakeholder partnerships that promise progress in big data discovery, education and innovation at a White House-sponsored event, Data to Knowledge to Action, in Washington, D.C.
The event follows last year’s launch of the Administration’s National Big Data R&D Initiative, which has recorded significant progress and accomplishments since it was introduced in March 2012.
A leader in the initiative, NSF has made significant investments to advance big data–the collection and analysis of extremely large datasets–by developing foundational techniques and technologies, building infrastructure, and nurturing research and education communities.
In addition, NSF has supported a broad range of research activities, including projects to extract value from large, complex data sets such as cancer genome data; to approximate how the human brain processes language, in order to improve computer search engines and our understanding of our own brain’s activity; and to build data storage systems that retrieve information 200 times faster than current systems. One NSF-funded research group at the University of California, Berkeley has already attracted the likes of Yahoo! and Twitter and has fueled an array of new start-ups.
“We are seeing tremendous progress–data are accelerating the pace of discovery in almost every science and engineering discipline,” said Farnam Jahanian, NSF’s head of Computer and Information Science and Engineering, who also serves as co-chair of the Networking and Information Technology Research and Development subcommittee that is co-sponsoring Data to Knowledge to Action. “These discoveries lay the groundwork for a national innovation ecosystem that will strengthen the foundations of U.S. competitiveness for decades to come.”
Last March, NSF led the charge in carrying out federal efforts to advance the Obama Administration’s National Big Data Research and Development Initiative, a major step toward addressing the challenges and opportunities of big data. At its launch, six federal departments and agencies promised more than $200 million in new commitments aimed at managing the explosion of big data and developing new tools and techniques to extract knowledge and accelerate discovery and innovation.
What follows is an update on NSF’s activities:
NSF invests $30 million in foundational research for new techniques and technologies for big data. NSF, with support from the National Institutes of Health (NIH), issued a joint call for proposals in March of 2012 to advance foundational research in managing, analyzing, visualizing and extracting useful information from large, diverse, distributed and heterogeneous data sets. After a robust merit review, NSF invested nearly $30 million in 46 new projects involving dozens of universities across the country for a wide portfolio of research in areas including data management techniques, analytic approaches and e-science collaboration environments.
NSF invests $32 million in eight Data Infrastructure Building Blocks (DIBBs) projects. DIBBs efforts aim to advance and build extensive data infrastructures that enable scientific discovery. One DIBBs project, for instance, will operate the Sloan Digital SkyServer for the astronomy community and update and modify its components so that it can be easily reused by other areas of science, such as turbulence research, environmental science, neuroscience, genomics and radiation oncology. Other projects work to create more sophisticated tools for data analytics and to design search engines that go beyond text-based queries.
NSF funds Berkeley’s AMPLab. The University of California at Berkeley’s Algorithms, Machines, and People Laboratory (AMPLab), which NSF is supporting with a five-year, $10 million Expeditions in Computing award, has developed a deeply integrated, high-performance open source platform for data analysis and machine learning on big data. This work, which has already fueled start-ups, has also been adopted by companies such as Yahoo! and Twitter.
NSF sponsors Ideas Lab. From Oct. 7-11 in Atlanta, Ga., an NSF-funded Ideas Lab brought together leading experts from a range of disciplines and perspectives to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments. In 2014, NSF plans to make up to $5 million available for compelling proposals, including up to $3 million for those that emerged from the Ideas Lab.
NSF funds new center to better understand human intelligence, build smarter machines. NSF recently awarded $25 million to establish a Center for Brains, Minds and Machines at the Massachusetts Institute of Technology. This center is seizing opportunities in areas ranging from artificial intelligence to neurotechnology, all involving vast amounts of data, for an integrated effort to produce major breakthroughs in fundamental knowledge. This is a key component of the administration’s BRAIN Initiative.
NSF funds large project to improve the usability of privacy policies in an era of big data. A $3.75 million award made through NSF’s Secure and Trustworthy Cyberspace program will enable researchers to systematically collect and analyze website privacy policies, with the goal of communicating those policies to website users more effectively.
EarthCube invests in big data technology in the geosciences. EarthCube, which supports the development of cyberinfrastructure to expedite the delivery of geoscience data for knowledge, awarded $13.25 million to 16 projects in 2013. Funded projects include networks to help geoscientists develop standards and policies, demonstrations of promising technologies for integrating across vast geosciences data, and activities to plan innovative architectures across the whole enterprise.
NSF supports U.S. participation in the Research Data Alliance (RDA). NSF awarded $2.5 million to support the U.S. arm of the international RDA, which has participation from 50 countries, to accelerate research data sharing among scientists around the globe. In the past year, RDA held two plenary sessions in order to devise use cases and best practices for improving interoperability and accessibility of science research data.
NSF invests more than $17 million in Computational and Data-Enabled Science and Engineering (CDS&E). CDS&E projects develop, adapt or use capabilities offered by advances in both research and infrastructure in computation and data. In 2013, NSF invested more than $17 million in 60 multidisciplinary projects, many involving analysis of large data sets, that will employ new computational and data analysis approaches in pursuit of major scientific and engineering breakthroughs. In fiscal year 2014, a substantial expansion of the program is anticipated.
NSF builds Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21). NSF’s CIF21 portfolio consists of programs throughout the foundation that advance cyberinfrastructure for science across all disciplines. This portfolio, which includes programs such as BigData and CDS&E, supports innovations in software, networking and data sciences, as well as in training of the next generation of cyberinfrastructure-savvy scientists and engineers. In fiscal year 2013 this strategic framework included programs with a combined investment of over $75 million.
NSF funds CIF21 track for graduate education and research. NSF funded three CIF21 Integrative Graduate Education and Research Traineeship (IGERT) projects. While all research involves advanced computation of data in a variety of disciplines, one award in particular, Big Data U: A Program for Integrated Multidisciplinary Education and Research for Big Data Science, aims to create a new breed of scientists able to navigate the intricacies of specific research disciplines and to develop tools for big data science.
The CDS&E track in mathematical and statistical sciences invests an additional $12.3 million. NSF funded another three dozen projects in 2012 and 2013 for researchers from disciplines such as computer science, theoretical and applied mathematics and statistics to engage in projects that cover a wide spectrum of technology. For example, one project is working to make linear algebra, which is widely used in scientific computing, faster and easier to use.