Engineer solves biological problems with programming

PULLMAN, Wash. – Ananth Kalyanaraman, assistant professor in the School of Electrical Engineering and Computer Science, has received a prestigious U.S. Department of Energy Early Career Award for work using sophisticated computer programming to solve challenging, big-picture problems for biologists.
The $750,000, five-year award supports the research goals of promising faculty members starting their careers. Nationwide, 65 researchers received the award out of approximately 1,150 applicants from universities and national laboratories.
 
The field of computational biology is still in its nascent stage, said Kalyanaraman. With ever faster and better computers, biologists are able to gather vast amounts of data in fields such as genomics.
 
In environmental microbiology, for instance, identifying protein families has important implications in fields ranging from alternative energy to medicine. Scoop up one small handful of dirt, however, and you can get millions of microbes; trying to find the one that is doing something that you’re interested in can be impossible.
 
Knowledge about the microbial world is limited to about one percent of microbes, Kalyanaraman said. Researchers face challenges when trying to find relationships that interconnect vast numbers of uncharacterized protein sequences collected from environmental samples.
He will be using computers that are 10,000 to 1 million times faster than the average desktop computer to help solve some of these grand-scale problems for biologists.

“A better use (than increasing speed) is to increase our capacity and our capability,” he said. “I want to help solve much bigger problems and improve the quality of the results.”

 
With the grant, Kalyanaraman will work to identify clusters within very large graphs that are built from massive amounts of biological data. In the graph-based data, there can be billions of nodes. Researchers want to know where the nodes concentrate as groups, which could provide valuable information about something like the function of proteins in a community of microbes.
“The idea is to organize the data that they have and get information out of it,” Kalyanaraman said.
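
As a rough illustration of what finding groups of nodes in a graph can look like, the sketch below links proteins whose pairwise similarity passes a cutoff and reports each connected group as a cluster. It is a toy stand-in, not Kalyanaraman's algorithm: the protein names, similarity scores, the 0.5 threshold and the function names are all hypothetical, and a serial routine like this would not scale to graphs with billions of nodes.

```python
from collections import defaultdict


def find(parent, node):
    """Return the representative of node's group, shortening paths as it goes."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def cluster_by_similarity(edges, threshold):
    """Group nodes connected by edges whose score is at least `threshold`.

    `edges` is a list of (node_a, node_b, score) tuples (hypothetical format).
    Returns a sorted list of clusters, each a sorted list of node names.
    """
    parent = {}
    for a, b, score in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if score >= threshold:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up similarity scores between five hypothetical protein sequences.
edges = [
    ("P1", "P2", 0.90),
    ("P2", "P3", 0.80),
    ("P3", "P4", 0.20),  # below the cutoff, so it does not join the groups
    ("P4", "P5", 0.95),
]
clusters = cluster_by_similarity(edges, threshold=0.5)
print(clusters)
```

With these made-up scores, P1, P2 and P3 fall into one group and P4 and P5 into another, because the weak P3-P4 link is ignored.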

Currently, researchers use serial computers or, at best, commodity clusters – where one computer communicates with others – to try to tease out answers. Unfortunately, the scale of problems that can be solved in such environments is not adequate to provide the big picture or get a holistic understanding of the data.

 
Kalyanaraman also will develop algorithms to support new queries that can transcend database boundaries, so that scientists can ask more general questions about protein functions and relationships. Researchers currently can look at one particular protein in a particular database, but they have little ability to look across the vast amounts of data held in numerous databases.
 
Using the supercomputer and newly developed algorithms, Kalyanaraman would like to build a unified graph that expresses these relationships and returns results across the databases, he said.
However, scaling up algorithms to some of the fastest available computers isn’t straightforward, he said.
 
“The old ideas won’t work,” he said. “You need new ideas and you have to re-invent the algorithms.”