From the Ecological Society of America

Computational reproducibility, the ability to accurately reproduce outcomes from data sets using the same code and software, will be an increasingly important factor in future scientific studies, according to a new paper released in the Ecological Society of America’s journal Ecological Applications.

Authors Stephen M. Powers and Stephanie E. Hampton, researchers with Washington State University’s School of the Environment, highlight the importance of providing, using, and adapting to data sets that are open to and usable by both the public and investigators in ecology and other field research.

“Increasingly, peers and the public want more transparency,” Powers explains.

Ecologists, finding themselves in an inherently field-oriented science, have long faced the challenge that it is impossible to perfectly repeat observational studies of the natural world — weather conditions vary, populations change over time, and many other conditions in field work are not reproducible. The paper argues that ecologists should focus more on data sharing and transparency in the future in order to increase scientific reproducibility.

An investigator may spend considerable time, effort and money attempting to reproduce the results of someone else’s study from scratch. When both the data and the code used to obtain statistics and results are published, the investigator is spared that effort and can even improve or modify the original author’s computer code. Essentially, sharing this information means less wasted time for reviewers, editors, and authors alike.

Mobile radar antenna pointed at a severe storm.
Data gathering and sharing offer a key to increasing the weather knowledge base and reproducibility. (Photo courtesy of NOAA.)

It’s not only scientists who benefit from reproducibility and transparency. “In natural resource management and similar policy issues, high transparency is essential to maintain public trust,” says Hampton, who is also director of the Division of Environmental Biology (DEB) at the National Science Foundation (NSF). Being open about data and code from the beginning of a project can help scientists minimize the post-publication work of sharing or clarifying their products and of answering questions about contentious results from outside audiences.

The authors also emphasize that it is imperative to prepare young researchers for the computational expectations of the future by engaging them in the process now: “It takes time to develop new practices and skills, so it’s important to prepare for transparency at the beginning of a project. It’s no fun to scramble and address transparency requirements at the last possible moment,” says Powers.

To facilitate these efforts, code is frequently shared through web-based services and repositories that host thousands of data sets. Such tools are now widely accessible, and attitudes and norms increasingly favor data reuse.

Three years ago, Ecological Applications mandated that all data associated with manuscripts be made available in a permanent, publicly accessible archive or repository. “When science is used to support decisions, transparency is paramount, and the more consequential the decision, the more important it is that all of the stakeholders be able to examine the basis of a recommendation,” the journal’s chief editor, David Schimel, says. “Our open science policy ensures that work published in our journal meets the highest standards for actionable information.”

Similar requirements appear in the NASA Earth Science Data and Information Policy and in the Long-term Ecological Research (LTER) program. The NSF has also mandated that submitted grant proposals include data management plans as well as details of data publication from prior NSF support.

Powers, Hampton, and others argue that these developments in data requirements allow ecologists to examine studies and ideas with unprecedented power and to foster critical inquiry and new knowledge for the benefit of society amid global change.