As with many fields, computing is changing how geologists conduct their research. One example: the emergence of digital rock physics, where tiny fragments of rock are scanned at high resolution, their 3-D structures are reconstructed, and this data is used as the basis for virtual simulations and experiments.
Digital rock physics complements the laboratory and field work that geologists, petroleum engineers, hydrologists, environmental scientists, and others traditionally rely on. In some cases, it provides important insights into how porous rocks interact with the fluids flowing through them that would be impossible to glean in the lab.
In 2015, the National Science Foundation (NSF) awarded a team of researchers from The University of Texas at Austin and the Texas Advanced Computing Center (TACC) a two-year, $600,000 grant to build the Digital Rocks Portal where researchers can store, share, organize and analyze the structures of porous media, using the latest technologies in data management and computation.
“The project lets researchers organize and preserve images and related experimental measurements of different porous materials,” said Maša Prodanović, associate professor of petroleum and geosystems engineering at The University of Texas at Austin (UT Austin). “It improves access to them for a wider geosciences and engineering community and thus enables scientific inquiry and engineering decisions founded on a data-driven basis.”
The grant is a part of EarthCube, a large NSF-supported initiative that aims to create an infrastructure for all available Earth system data to make the data easily accessible and usable.
Small pores, big impacts
The small-scale material properties of rocks play a major role in their large-scale behavior – whether it is how the Earth retains water after a storm or where oil might be discovered and how best to get it out of the ground.
As an example, Prodanović points to the limestone rock above the Edwards Aquifer, which underlies central Texas and provides water for the region. Fractures occupy about five percent of the aquifer rock volume, but these fractures tend to dominate the flow of water through the rock.
“All of the rain goes through the fractures without accessing the rest of the rock. Consequently, there’s a lot of flooding and the water doesn’t get stored,” she explained. “That’s a problem in water management.”
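Why do fractures at five percent of the volume dominate the flow? A rough intuition comes from the classic "cubic law" for flow between parallel plates, in which a fracture's equivalent permeability scales with the square of its aperture. The sketch below uses illustrative numbers, not measured Edwards Aquifer properties:

```python
# Back-of-the-envelope comparison of fracture vs. matrix flow.
# Values are illustrative assumptions, not Edwards Aquifer measurements.

b = 1e-3                  # fracture aperture: 1 mm, in meters
k_fracture = b**2 / 12    # parallel-plate ("cubic law") permeability, ~8.3e-8 m^2
k_matrix = 1e-13          # a typical tight-limestone matrix permeability, m^2

# Simple volume-weighted split: fracture at 5% of bulk, matrix at 95%.
fracture_part = 0.05 * k_fracture
matrix_part = 0.95 * k_matrix
fraction_through_fractures = fracture_part / (fracture_part + matrix_part)

print(f"{fraction_through_fractures:.6f}")  # effectively 1.0
```

Even with the fracture weighted at only five percent of the rock, virtually all of the flow capacity comes from it, which is why rain moves through the fractures rather than soaking into the surrounding limestone.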
Digital rock physicists typically perform computed tomography (CT) scans of rock samples and then reconstruct the material’s internal structure using computer software. Alternatively, a branch of the field creates synthetic, virtual rocks to test theories of how porous rock structures might impact fluid flow.
In both cases, the resulting three-dimensional datasets are quite large, frequently several gigabytes each. This creates significant challenges when researchers seek to store, share, and analyze their data. Even when datasets are made available, they typically remain online for only a matter of months before being erased for lack of storage space, which impedes scientific cross-validation.
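To see why these volumes grow so quickly, note that a segmented micro-CT volume of 1000³ voxels at one byte per voxel is already a gigabyte, before any raw grayscale data or simulation output is added. A basic property such as porosity falls directly out of such a volume. A minimal sketch, using a small synthetic stand-in rather than real CT data:

```python
import numpy as np

# Synthetic stand-in for a segmented micro-CT volume:
# 0 = pore space, 1 = solid grain. Real volumes are often ~1000^3 voxels.
rng = np.random.default_rng(0)
volume = (rng.random((200, 200, 200)) > 0.3).astype(np.uint8)

# Porosity = fraction of voxels that are pore space.
porosity = 1.0 - volume.mean()

print(f"voxels: {volume.size:,}")                       # 8,000,000
print(f"size in memory: {volume.nbytes / 1e6:.0f} MB")  # 8 MB at 1 byte/voxel
print(f"porosity: {porosity:.3f}")                      # ~0.30 by construction
```

Scaling this 200³ toy volume up to 1000³ multiplies the memory footprint by 125, which is why even "simple" analyses quickly outgrow a lab workstation.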
Furthermore, scientists often want to conduct studies that span multiple length scales — connecting what occurs at the micrometer scale (a millionth of a meter: the size of individual pores and grains making up a rock) to the kilometer scale (the level of a petroleum reservoir, geological basin or aquifer), but cannot do so without available data.
The Digital Rocks Portal helps solve many of these problems.
James McClure, a computational scientist at Virginia Tech, uses the Digital Rocks Portal to access the data he needs to perform large-scale fluid flow simulations and to share data directly with collaborators.
“The Digital Rocks Portal is essential to share and curate experimentally generated data, both of which are necessary to allow for re-analysis and reproducibility,” said McClure. “It also provides a mechanism to enable analyses that span multiple data sets, which researchers cannot perform individually.”
The Portal is still young, but its creators hope that, over time, material studies at all scales can be linked together and results can be confirmed by multiple studies.
“When you have a lot of research revolving around a five-millimeter cube, how do I really say what the properties of this are on a kilometer scale?” Prodanović said. “There’s a big gap in scales and bridging that gap is where we want to go.”
A framework for knowledge sharing
When the research team was preparing the Portal, they visited the labs of numerous research teams to better understand the types of data researchers collected and how they naturally organized their work.
Though there was no domain-wide standard, there were enough commonalities to enable them to develop a framework that researchers could use to input their data and make it accessible to others.
“We developed a data model that ended up being quite intuitive for the end-user,” said Maria Esteva, a digital archivist at TACC. “It captures features that illustrate the individual projects but also provides an organizational schema for the data.”
Raw images, microscopy data, and visualizations from various types of simulations are organized in a branching tree, and all facets can be shared, viewed, and downloaded. This lets scientists work from one another’s basic research and avoid duplication.
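The portal’s actual schema isn’t spelled out in this article, but the branching-tree idea can be sketched as a simple nested structure. The class and field names below are hypothetical, for illustration only:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a branching project tree: a project holds
# samples, and each sample holds datasets (images, measurements,
# simulation results). Names are illustrative, not the portal's schema.

@dataclass
class Dataset:
    name: str
    kind: str  # e.g. "raw image", "microscopy", "simulation"

@dataclass
class Sample:
    name: str
    datasets: list = field(default_factory=list)

@dataclass
class Project:
    title: str
    samples: list = field(default_factory=list)

proj = Project("Limestone core study")
core = Sample("core plug A")
core.datasets.append(Dataset("micro-CT scan", "raw image"))
core.datasets.append(Dataset("flow simulation", "simulation"))
proj.samples.append(core)

# Any node in the tree can be listed, shared, or downloaded on its own.
for s in proj.samples:
    for d in s.datasets:
        print(f"{proj.title} / {s.name} / {d.name} ({d.kind})")
```

The design choice the article describes, a tree whose every facet is independently shareable, is what lets one researcher reuse just a raw image while another pulls the full simulation branch.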
“If I’m spending money creating these images for research, why not spend it on something that hasn’t been imaged?” Prodanović said. “If it’s supported by a federal grant, we all should share the data.”
The team recently launched a set of tools that facilitate common data analysis practices, such as visualizing raw data or running simulations with different parameters. These types of analyses often can’t be accomplished on a single computer in a lab, or even on a small university cluster. For that reason, the portal is connected to the Maverick supercomputer at TACC, which contains NVIDIA GPUs for remote visualization and data analytics. This allows researchers to run complex, high-resolution analyses, even if they are not computational experts.
They have also developed tools to encourage researchers to self-publish their datasets. As they go through the publication process, the system makes sure that data renders correctly, that datasets are complete, and that the documentation is correct and spell-checked. Importantly, it also assigns Digital Object Identifiers (DOIs) to each dataset, helping to legitimize the idea that data is a product in and of itself that should be shared freely.
“It’s hard to publish accurate, clear datasets,” Esteva said. “Based on this conclusion, we developed a publication pipeline to help researchers publish with confidence and be more productive.”
Another much-appreciated feature of the portal: automatically generated thumbnail visualizations of each project, which appear on the Portal’s homepage. These let researchers quickly see the types of work their peers are doing, simplifying the task of finding relevant research.
Said Prodanović: “While we have plenty of work to do to build a lasting digital rock physics infrastructure that will enable us to integrate rock information over a range of scales, it is an exciting time — and the right time — to launch the Portal, as all of the essential data and simulation resources and components are in place.”