Seth Truscott, College of Agricultural, Human & Natural Resource Sciences
These gene expression networks could promote development of biomarkers that breeders use to grow new crops through traditional methods. The work could aid scientists in finding new genes that influence plant and animal health.
Stephen Ficklin, a computational biologist in the WSU Department of Horticulture, and colleagues will build the networks with $895,000 from the National Science Foundation. Their project will help test the Scientific Data Analysis at Scale project, or SciDAS, a $2.9 million NSF-funded effort to improve the United States’ cyber infrastructure and help scientists make better use of it.
“Improving our cyber infrastructure helps make the U.S. more competitive in research,” Ficklin said. “It keeps us in the forefront of data science.”
The WSU team will work with Clemson University and RENCI, the Renaissance Computing Institute, a collaboration among the University of North Carolina-Chapel Hill, North Carolina State University and Duke University.
Helping breeders screen seedlings quickly
The researchers will take data from the National Center for Biotechnology Information, a public repository of genomic information, for as many species as possible and create networks that show how each gene interacts with every other one.
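As a rough illustration of what a gene coexpression network is, the sketch below links two genes when their expression levels rise and fall together across samples. It is a minimal example, not the researchers' pipeline: the use of Pearson correlation, the 0.9 cutoff, the function names and the toy expression values are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a gene with constant expression carries no signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Compare every gene with every other gene and keep strongly correlated pairs.

    `expression` maps a gene name to its expression level in each sample.
    `threshold` is a hypothetical cutoff on the absolute correlation.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Toy data (invented): three genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expression)
print(edges)  # [('geneA', 'geneB', 1.0)]
```

Only geneA and geneB are connected here, because geneC does not track either of them closely. Real networks are built from far larger data sets and may use different similarity measures.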
“In the end, we will create the most complete repository of gene coexpression networks that exists anywhere,” Ficklin said.
By following network connections, he said, scientists could discover genes that benefit agriculture, medicine, animal science and other fields.
“As a WSU researcher, I hope to help plant breeders,” he said. “I want to build networks like these as tools for breeders to find traits they’re interested in. They could use biomarkers to screen their seedlings in weeks instead of months or years.”
“Plant breeders can look for genes known to be associated with good or bad traits and use them to make traditional crosses,” he added.
Expanding cyber infrastructure
With tens of thousands of genes in every organism, the computing power required to create these gene expression networks is vast, well beyond the capability of a single supercomputer. A single test case by Ficklin and collaborators was divided among 70,000 computation jobs and required 1,200 processors and four weeks to complete.
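The scale of the problem can be sketched with some simple arithmetic. Comparing each gene with every other one means the number of comparisons grows with the square of the gene count. The example below assumes a hypothetical organism with 30,000 genes and assumes all 1,200 processors ran around the clock for the full four weeks, so the processor-hour figure is an upper bound rather than a measured value. The 70,000 jobs, 1,200 processors and four weeks come from the test case above.

```python
def pair_count(n_genes):
    """Number of distinct gene pairs when every gene is compared with every other."""
    return n_genes * (n_genes - 1) // 2


def pairs_per_job(n_genes, n_jobs):
    """Largest number of pairs any one job handles if pairs are split evenly."""
    total = pair_count(n_genes)
    return -(-total // n_jobs)  # integer ceiling division


n_genes = 30_000       # assumption: "tens of thousands of genes"
n_jobs = 70_000        # from the test case
processors = 1_200     # from the test case
weeks = 4              # from the test case

total_pairs = pair_count(n_genes)
per_job = pairs_per_job(n_genes, n_jobs)
processor_hours = processors * weeks * 7 * 24  # upper bound: full utilization assumed

print(total_pairs)      # 449985000
print(per_job)          # 6429
print(processor_hours)  # 806400
```

Under these assumptions, a single organism yields roughly 450 million gene pairs, which is why the work is split into many independent jobs that can run on whatever resources are available.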
SciDAS gives researchers an easier way to spread their computational needs across existing large-scale resources, such as the Open Science Grid or Cloud Lab, growing or shrinking their demands as needed.
The final network will be kept, as part of SciDAS, on three petabytes (3 million gigabytes) of storage at WSU, RENCI and Clemson University.
News media contact:
Stephen Ficklin, WSU Department of Horticulture, 509-335-4295, email@example.com