At the request of Richard Crandall, here is a summary of some of the work I have been doing in Adrienne Fairhall's computational neuroscience lab at the University of Washington. I hope to give you a brief taste of what we do in the lab, to explain how many of the computational problems we face map naturally onto massively parallel architectures such as Xgrid, and to show how parallelizing these computations lets researchers in the group spend less time writing code.
Of course, we would be happy to answer questions or offer suggestions if anyone is considering similar work.
First, a bit of background. The goal of the Fairhall Lab is to understand the computations performed by individual neurons and neural circuits. For our purposes, a "computation" is a functional description of a neuron's mapping from stimuli to spike times: because mammalian action potentials are stereotyped, the times of individual action potentials are the only information transmitted to downstream neurons. As Hodgkin and Huxley first showed in 1952, the computation performed by a single neuron on relatively short time scales can be described as a system of nonlinear differential equations. These equations describe the kinetics of ionic currents across the neuron's membrane, which are both driven by the stimulus and responsible for generating and propagating an action-potential response, and their parameters can be determined by experiment. Unfortunately, the equations describing even simple neurons are not analytically solvable, and the form of the equations describing ionic current kinetics doesn't provide direct insight into the 'computation' performed by the neuron: is a neuron containing a particular set of ion channels an integrator, a differentiator, a high-pass filter, or something else? So, how can we map dynamical-system descriptions to functional descriptions? This question is a current focus of the lab, and we take two approaches to answering it: developing techniques to accurately extract a functional description of a neuron from experimental data, and developing analytical techniques to relate the dynamical-system description to the functional description and vice versa.
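To make the "dynamical system" description concrete, here is a minimal sketch (not the lab's actual code) of the classic Hodgkin-Huxley squid-axon equations integrated with forward Euler. The parameter values are the standard 1952 fits; the constant input current, step size, and zero-crossing spike detector are illustrative choices of mine.

```python
from math import exp

def hh_step(V, m, h, n, I_ext, dt=0.01):
    """Advance the Hodgkin-Huxley state one Euler step (V in mV, t in ms)."""
    # Gating-variable rate constants (standard squid-axon kinetics)
    a_m = 0.1 * (V + 40.0) / (1.0 - exp(-(V + 40.0) / 10.0))
    b_m = 4.0 * exp(-(V + 65.0) / 18.0)
    a_h = 0.07 * exp(-(V + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + exp(-(V + 35.0) / 10.0))
    a_n = 0.01 * (V + 55.0) / (1.0 - exp(-(V + 55.0) / 10.0))
    b_n = 0.125 * exp(-(V + 65.0) / 80.0)
    # Ionic currents (conductances in mS/cm^2, reversal potentials in mV)
    I_Na = 120.0 * m**3 * h * (V - 50.0)     # sodium
    I_K = 36.0 * n**4 * (V + 77.0)           # potassium
    I_L = 0.3 * (V + 54.387)                 # leak
    V += dt * (I_ext - I_Na - I_K - I_L)     # membrane capacitance 1 uF/cm^2
    m += dt * (a_m * (1.0 - m) - b_m * m)
    h += dt * (a_h * (1.0 - h) - b_h * h)
    n += dt * (a_n * (1.0 - n) - b_n * n)
    return V, m, h, n

# Drive the model with a constant current and record spike times --
# the "stimulus -> spike times" mapping described above.
V, m, h, n = -65.0, 0.05, 0.6, 0.32          # near resting steady state
spike_times = []
prev_V = V
for i in range(20000):                       # 200 ms at dt = 0.01 ms
    V, m, h, n = hh_step(V, m, h, n, I_ext=10.0)
    if prev_V < 0.0 <= V:                    # upward crossing of 0 mV = spike
        spike_times.append(i * 0.01)
    prev_V = V
```

Even in this simple form, nothing in the equations directly announces whether the cell behaves as an integrator or a differentiator; that is exactly the gap between the dynamical-system and functional descriptions.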
As you might guess from this description, many of the questions we address involve analyzing many thousands or millions of stimulus-response pairs, either to identify the features of the stimulus that trigger an action-potential response or to simulate the dynamical system with particular parameters and inputs. Because we can identify a given neuron's history dependence from experimental data (or analytically from neuron models), we can divide these analyses into independent stimulus-response units, suitable for distribution across a parallel architecture.
Because numerical research of this kind is often exploratory, coding all of our analyses in efficient compiled languages such as C or C++ often amounts to unacceptable overhead. Such code is difficult to use from an interactive prompt, and incremental changes to the UI or algorithms require substantial coding effort unless great care is taken in code design from the outset. Most of the researchers in the lab have no computer science background, so ease of development and interaction with analysis code is a priority for us. Interactive languages such as Matlab or Python are therefore the preferred environments for implementing analysis algorithms. The downside of these languages, of course, is that they are much less efficient than compiled languages, even though they use standard numerical libraries for computations (e.g., Numerical Python).
Xgrid has been a very welcome addition to our toolkit because it mitigates the loss of computational efficiency that comes with rapid-development languages. Instead of spending time re-coding analysis routines in C/C++, researchers in our lab can now "simply" distribute their analyses across our Xgrid (remember that our analyses often consist of many independent events).
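In practice, the distribution step can be as lightweight as a shell loop over data chunks using the Tiger-era `xgrid` command-line client. This is only a sketch of that idea: the controller hostname, password variable, chunk files, and analysis script below are all hypothetical, and your controller may use Kerberos rather than password authentication.

```shell
# Submit one job per independent analysis unit (names are hypothetical).
for chunk in unit_*.dat; do
    xgrid -h controller.example.edu -p "$XGRID_PASSWORD" \
        -job submit /usr/bin/python analyze_unit.py "$chunk"
done

# Later, fetch a finished job's output by the job ID the submit step printed.
xgrid -h controller.example.edu -p "$XGRID_PASSWORD" -job results -id 42
```

Because each unit is independent, no inter-job communication is needed; collecting and merging the per-job results happens back on the submitting machine.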
Briefly, our Xgrid is a 'desktop recovery'-type cluster. We do not have the resources to install or maintain a dedicated cluster (yet), but we can recover the unused cycles of the many workstations in our lab. Currently we have a heterogeneous grid of 10 processors (G4 and G5) in 6 agents, totaling approximately 22 GHz. With only one job running on the cluster, my analysis runs approximately six times faster than on a single 2.5 GHz G5, about 68% of the grid's theoretical speedup over that machine. Several projects have become feasible because analysis time has been reduced from weeks to days.
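For anyone checking the arithmetic, here is the efficiency estimate spelled out, under the rough assumption that summed clock speed is a fair proxy for processing power across G4 and G5 agents:

```python
single_machine = 2.5     # GHz of the lone 2.5 GHz G5
grid_total = 22.0        # approximate summed GHz across all agents
measured_speedup = 6.0   # observed wall-clock gain over the single G5

ideal_speedup = grid_total / single_machine    # 8.8x if perfectly efficient
efficiency = measured_speedup / ideal_speedup  # fraction of ideal achieved
print(round(efficiency, 2))
```

An efficiency well above half is quite good for a heterogeneous, opportunistic grid, where scheduling overhead and slower G4 agents eat into the ideal figure.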
Now, here's the really subversive part of this technology, the part that has gotten us particularly excited: we offer other computational biology labs access to the cluster in exchange for adding their workstations to the grid. Without adding any ownership cost (all of the workstations were already in use by researchers), we provide significant computational power to a growing number of researchers. Because each group (at least so far) uses less computation time than it adds to the grid in spare cycles, everyone wins big.
It seems that this type of application might become a very valuable niche for Xgrid. Many computational biology groups in our department do not have the resources, or sufficiently urgent needs, to justify the acquisition and maintenance costs of a dedicated computational cluster. Nonetheless, many of these labs are finding that as the complexity of their analyses increases, they wish they had more computational power at their disposal, and they want to maintain the flexible, exploratory nature of their work. Xgrid lets us address both of these needs: as the grid grows, researchers have increasing computational power at their fingertips with no additional overhead, and they can continue to use interactive, interpreted languages to code their analyses without waiting too long for results.
If you've made it this far, thanks for reading. Please direct any questions/comments/rants my way :)