A Washington State University research team has received a National Science Foundation grant to develop a computing framework that helps scientists better manage the large-scale data sets used in everything from climate change and cosmology research to disease treatments.
Led by Dingwen Tao, assistant professor in the School of Electrical Engineering and Computer Science, the researchers will develop a layer of software that helps scientists easily configure the compression of their data while making sure they can still use and analyze the compressed data.
The 3-year, $600,000 grant expands on a project Tao worked on with Argonne National Laboratory to compress massive data sets for use in exascale computing. This next generation of supercomputing aims to perform 10^18, or a billion billion, computations in one second.
Much as compression lets video or photos be shared easily, the compression system that the researchers developed has to selectively discard some of the massive amounts of data produced at petascale (10^15) or exascale levels without compromising the data's accuracy or its usefulness to scientists. But the technology that compresses videos or photographs can’t be directly applied to the management of scientific data.
“The scientists need to be able to identify scientific ideas, do data analysis, and identify the useful information there,” said Tao. “We’re trying to develop efficient data compression software to reduce data size, but we want to keep the scientific discovery in the data.”
The framework the researchers developed reduces data sizes while carefully controlling how much information is lost.
“We don’t need to have full resolution of that data, but we have to be more sophisticated than just cutting the data bits,” Tao said.
The researchers used artificial intelligence and data mining techniques to predict how to best reduce data sizes.
“Originally, the data is very coarse and irregular, but if we make a good prediction based on historical data or spatial information, we can make the data more regular and smoother, and then that data is easier to manage and compress,” he said. “We can guarantee that the loss of the information in the data will not affect their user scenario or applications or post analysis.”
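The idea Tao describes, predicting each value and then storing only a bounded correction, can be illustrated with a short Python sketch. This is a simplified illustration and not the team's software: the function names, the choice to predict each value from the previously reconstructed one, and the use of zlib for the final lossless step are all assumptions made for the example.

```python
import zlib
import numpy as np

def compress(data, error_bound):
    """Toy error-bounded lossy compressor (illustrative assumption, not the real framework)."""
    step = 2.0 * error_bound              # quantization bin width
    codes = np.empty(len(data), dtype=np.int32)
    level = 0                             # previous reconstructed value, in units of step
    for i, x in enumerate(data):
        # Predict the value as the previous reconstruction and quantize the residual.
        code = round((float(x) - level * step) / step)
        codes[i] = code
        level += code                     # track exactly what the decompressor will see
    # Smooth data yields small, repetitive codes that a lossless coder shrinks well.
    return zlib.compress(codes.tobytes())

def decompress(blob, error_bound):
    """Rebuild the values from the quantized residual codes."""
    step = 2.0 * error_bound
    codes = np.frombuffer(zlib.decompress(blob), dtype=np.int32)
    return np.cumsum(codes, dtype=np.int64) * step
```

Because each residual is rounded to the nearest multiple of twice the error bound, every reconstructed value stays within that bound of the original, up to floating-point rounding. The smoother the data, the smaller the codes and the better they compress.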
The Argonne researchers, along with Tao, recently received an R&D 100 Award for their work to develop the compression software. The awards program, sponsored by R&D World magazine, is often called the “Oscars” of innovation in science and technology.
The new grant continues work on the framework, with the goal of extending it to a broader population of scientists and making it easier for them to use. Tao’s team will build the software layer so that scientists can configure their data compression and still rely on the results of their post-analysis or visualization.
Scientists don’t want to worry about the details of how their data is compressed, he said. They just want to know that their data is accurate and safe.
“They just want to download the software, click, and use it to manage their data set very well,” he said.