2006-11-17

mid-term updates

The dearth of posts over the last month should be taken as an indication of progress and stress here at the home base. It is difficult to save the world one simulated cloud at a time, but it will be worth it. The blog will likely continue to suffer in the coming months, but I will try to put up interesting tidbits on a weekly-ish basis.

Today's tidbit is about supercomputers. What do I know about supercomputers? Not too much, but sometimes I use them. Okay, sometimes I use little chunks of them (anywhere from 8 to 128 processors right now, maybe more in the near future). However, people who do know about high-performance computing are abuzz about the new rankings of the top 500 supercomputers [LINK]. The IBM machine at Lawrence Livermore National Laboratory is destroying its competition, running at an impressive 280.6 teraflops. That is 280.6 trillion floating-point operations per second, where an operation is basically adding or multiplying a couple of numbers. A nice desktop computer can usually crank out about one billion operations per second (1 gigaflops), which makes the BlueGene/L at LLNL roughly 280,600 times faster. The next-closest machine is at Sandia National Laboratories, which runs a Cray (called "Red Storm") that reaches 101.4 teraflops. That seems like nothing in comparison, but it is only the second system ever to break the 100 TFLOPS barrier.
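Just for fun, here is a quick back-of-envelope script that compares the numbers above (Python; the 1-gigaflops desktop figure is the rough estimate from the text, not an actual benchmark):

    # Back-of-envelope comparison of the machines mentioned above.
    # Peak rates are the figures quoted in the post; the desktop number
    # is a rough estimate, not a measurement.
    GIGA = 1e9
    TERA = 1e12

    machines = {
        "BlueGene/L (LLNL)": 280.6 * TERA,
        "Red Storm (Sandia)": 101.4 * TERA,
        "Earth Simulator": 35.0 * TERA,
        "blueice (NCAR)": 10.5 * TERA,
        "typical desktop": 1.0 * GIGA,
    }

    desktop = machines["typical desktop"]
    for name, flops in machines.items():
        # How many desktops would it take to match each machine's rate?
        print(f"{name:20s} {flops / TERA:8.1f} TFLOPS  (~{flops / desktop:,.0f}x a desktop)")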

For comparison, the Earth Simulator in Japan (built from NEC parts, 5,120 processors) is now ranked 14th at about 35 TFLOPS. That facility is still considered an amazing feat, and the atmospheric simulations coming from it are still astounding people in the atmospheric sciences [EXAMPLE]. NCAR's newest machine (one I definitely do not have access to) is "blueice," an IBM machine running 1,696 processors at 10.5 TFLOPS... I think this machine is getting expanded very soon, too. They also have a BlueGene to play with, ranked 144th with 2,048 processors, and IBM machines (1,600 and 608 processors) at numbers 193 and 213.

Why does any of this matter? Well, for one thing, we are inching closer and closer to the ultimate goal. Also, we are on the brink of "peta-scale" computing, which will probably change the way computational science gets done. We'll be able to run simulations much faster and at much finer resolutions, which will produce incredible amounts of data. The challenge over the next few years will be developing ways to deal with all that data, and it will require different software approaches as well as new hardware. With today's standard desktop technology, the file I/O (that is, just reading the data off the hard disk) is far too slow for the amount of information we're going to be generating. Crunching the numbers and then visualizing the model output is going to be tremendously difficult without an incredible amount of support from computer-savvy folks who can help the scientists. The technology is coming, money is already being spent, and projects are being planned, so now is the time to start thinking about how to handle the output.
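To put some rough numbers on the I/O problem, here is a quick sketch; the 10 TB dataset size and the 60 MB/s sustained disk throughput are just illustrative guesses on my part, not measurements from any particular system:

    # How long does it take just to read one big model-output dataset
    # from a single desktop disk? Both numbers below are assumptions
    # for illustration only.
    TB = 1e12  # bytes in a (decimal) terabyte

    dataset_size = 10 * TB   # assumed size of one model-output dataset
    disk_rate = 60e6         # assumed sustained read rate of one disk, bytes/s

    seconds = dataset_size / disk_rate
    print(f"Reading {dataset_size / TB:.0f} TB at {disk_rate / 1e6:.0f} MB/s takes "
          f"about {seconds / 3600:.0f} hours ({seconds / 86400:.1f} days)")

With those assumptions it works out to roughly two days of doing nothing but reading data, which is why the software and hardware around the simulations matter as much as the simulations themselves.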