$1.5 million grant to advance ‘big data’ for genomic research

By Sylvia Kantor, College of Agricultural, Human & Natural Resource Sciences

PULLMAN, Wash. – Scientists at Washington State University have received a grant from the National Science Foundation to help meet the growing needs of the data-driven genomic science community. The Tripal Gateway project will build on existing cyberinfrastructure to enhance the capacity of genomic databases to manage, exchange and process “big data.”

“In a single day, some modern DNA sequencers can output as much data as the human genome,” said WSU’s Stephen Ficklin. “We expect the deluge of data to continue to grow exponentially.”

Ficklin, the lead investigator and a research scientist in horticulture, said that just as computers have seen dramatic improvements that have lowered costs and allowed for mass production, DNA sequencing technologies are undergoing a similar transition. The challenge is no longer affordability of DNA sequencing, he said.

The WSU project is one of 17 grants, totaling $31 million, awarded by the NSF Data Infrastructure Building Blocks (DIBBs) program (http://www.nsf.gov/news/news_summ.jsp?cntn_id=132880&org=NSF&from=news).

Sharing information

Genomic research relies on community databases – websites that house genomic, genetic and breeding data – for use by scientists working in the same research area, such as cotton, cacao (chocolate) or plants in the Rosaceae family like apple, cherry and pear.

By creating ways to easily share data between community databases on demand, the project will spare researchers from having to navigate between multiple websites to obtain the information they need.

“Genomics scientists who can access large data sets but have limited resources for storing, sharing and analyzing them will benefit from this work,” Ficklin said.

The three-year project will use software-defined networking technology to quickly transfer large data sets between computational resources and the database to support data sharing and analysis. Ultimately, it will link existing community databases for fruit and hardwood trees as well as legumes into a larger network of online research databases.

Tripal software

The project is based on open-source software known as Tripal (http://tripal.info), originally developed by Ficklin and Meg Staton at Clemson University and significantly enhanced by Dorrie Main at WSU and Kirsten Bett at the University of Saskatchewan. Tripal is used by at least 24 different plant and animal databases, including the Genome Database for Rosaceae (http://www.rosaceae.org/) and community databases for 24 crops developed by the Main lab. Main is a co-investigator of the new project.

The project team also includes Sook Jung, WSU; Alex Feltus and Kuang-Ching Wang, Clemson University; Meg Staton, University of Tennessee; and Jill Wegrzyn, University of Connecticut.

Contacts:
Stephen Ficklin, WSU Department of Horticulture, stephen.ficklin@wsu.edu
Dorrie Main, WSU Department of Horticulture, 509-335-2774, dorrie@wsu.edu
Sylvia Kantor, WSU CAHNRS Communications, 206-770-6063, kantors@wsu.edu