Computing the weight of a cell

Creating algorithms to understand the relationships between cells in the human body is one computational challenge. Another research endeavour, underway at the Ohio State University Medical Center, is to weigh the proteins that cause cancer.

Recent cancer research has found that changes in a protein's weight can be an indicator of cancer. The field concerned is proteomics, the study of how proteins work, in the same way that genomics is the study of how DNA works. Cancerous cells cause a change in a protein's amino acid sequence, which in turn alters its weight. Weight changes in proteins are also caused by 'post-translational' modifications, naturally occurring changes that signal a cellular event and that cancerous cells fail to handle properly.
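To make the idea concrete, here is a minimal Python sketch of how a single amino acid substitution or a modification shifts a peptide's weight. The peptide sequence, the function name and the choice of phosphorylation as the modification are illustrative assumptions, not details from the Ohio State research. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons (only a subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056            # one water accounts for the free ends of the chain
PHOSPHORYLATION = 79.96633  # mass added by one phosphate group (HPO3)


def peptide_mass(sequence, modification_mass=0.0):
    """Return the monoisotopic mass of a peptide plus any modification mass."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + modification_mass


# Hypothetical peptide, chosen only for illustration.
normal = peptide_mass("GASPVK")
mutant = peptide_mass("GASPEK")                    # assumed V -> E substitution
phospho = peptide_mass("GASPVK", PHOSPHORYLATION)  # assumed phosphorylation

print(f"normal peptide: {normal:.4f} Da")
print(f"mutation shift: {mutant - normal:+.4f} Da")
print(f"phospho shift:  {phospho - normal:+.4f} Da")
```

Each kind of change leaves a characteristic mass difference, which is what makes weighing proteins useful as a way of spotting altered forms.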

Dr Michael A Freitas, a research assistant professor at the Ohio State University Medical Center, says that each proteomic study can produce hundreds of thousands of mass measurements. The process involves creating a theoretical space of all database forms in the samples, resulting in a theoretical spectrum larger than 10 to the eighth power (100 million). Scientists then compare this theoretical space with an experimental spectrum.
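The comparison step can be sketched in a few lines of Python. This is a simplified illustration and not the method used at Ohio State: the function name, the sample masses and the 10 parts-per-million tolerance are all assumptions. Sorting the theoretical masses once and using a binary search keeps each lookup fast even when the theoretical list is very large.

```python
from bisect import bisect_left, bisect_right


def match_masses(theoretical, experimental, tolerance_ppm=10.0):
    """Map each experimental mass to the theoretical masses within tolerance."""
    theoretical = sorted(theoretical)
    matches = {}
    for mass in experimental:
        # Convert the relative tolerance (parts per million) to an absolute window.
        window = mass * tolerance_ppm / 1e6
        lo = bisect_left(theoretical, mass - window)
        hi = bisect_right(theoretical, mass + window)
        matches[mass] = theoretical[lo:hi]
    return matches


# Made-up masses in daltons, for illustration only.
theoretical = [557.3173, 587.2915, 637.2836, 1045.5642]
experimental = [557.3190, 637.2838, 900.0000]

result = match_masses(theoretical, experimental)
for mass, candidates in result.items():
    print(mass, "->", candidates if candidates else "no known form")
```

Measurements that come back with no candidate are the ones a lookup like this cannot explain, since it only recognises forms already present in the theoretical list.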

The computational power already exists to analyse the spectra, but there are currently no algorithms that can identify every possible protein form. "We require algorithms with a better understanding of biology," says Dr Freitas. "For example, if we want to consider somatic gene mutations that give rise to changes in a protein sequence, we have to tell the algorithm to include all known mutations. However, we still don't know all the somatic mutations that may exist. We need algorithms that are capable of finding patterns in the data. These unknown protein forms could lead to breakthrough discoveries in biology and lead to a better understanding of cancer."

Besides the need for better algorithms, there's also an infrastructure challenge. The data from samples accumulates at 1GB per hour. Handling it requires high-capacity file servers, high-speed networks and dedicated terminals to run the algorithms, plus access to clusters of computers for processing. "Informatics and high-performance computing are playing a greater role in our efforts to improve cancer patient outcome," says Dr Freitas. "As we improve our ability to detect and identify proteins, we improve our understanding of the cellular machinery. We rely on the use of parallel processing to speed up analysis and distribute the memory requirement across machines."

Dr Freitas says that two advances in computer technology in particular are helping to further his research. First, multicore processors, 64-bit operating systems and high-speed RAM mean that massive data informatics analysis can be handled on the desktop. Second, graphics card and field-programmable gate array technology has advanced to the point where such hardware could be used in a cluster for cancer research.

Currently, researchers use the Ohio Supercomputer Center. Its systems include the IBM 1350 AMD Opteron cluster (which can handle 22 trillion floating point operations per second) and the HP Itanium 2 cluster. The HP cluster has 516 Intel Itanium 2 processors, more than a terabyte of RAM and 11 terabytes of aggregate disk space. It also includes an SGI Altix 3700 with 32 processors and 64 gigabytes of memory, and three SGI Altix 250 systems, each with 16 processors and 32 to 64 gigabytes of memory.