2009-02-20

Code development in the cloud

Well, perhaps this diversion into the world of cloud computing, "Web 2.0," and emerging technology is more than just a diversion. Maybe it is a full-fledged series. Today I learned about a Mozilla Labs project called Bespin, which fits so neatly into my overall picture of how computational science should be done that I felt compelled to throw it up here to make a record of it.

First of all, Bespin is a Star Wars reference. It's the gas giant where Cloud City floats, which you can see in Star Wars: The Empire Strikes Back. Remember? Lando is in charge of Cloud City. Anyway....

The software Bespin is a code-editing environment implemented in JavaScript that runs within a modern web browser. The idea is that you open your browser, presumably on whatever computer you are sitting in front of, and navigate to your code. I'm not sure of the details of where the code resides: whether you get a Bespin-related repository or have to provide your own web-accessible storage (I'll check). In any case, you have some code somewhere in the cloud. Bespin provides a unified front end to access and edit that code, including a file browser they are calling a dashboard. Select your file, and it opens within the Bespin environment in an actual text editor, where you can modify or write code and save it. They're also working on collaboration tools, where you could be editing a file while your friend edits the same file, and each of you could see what the other is doing. Combined with Skype or another chat client, this could be a really useful way for people to work on code together without constant e-mails and attachments, and without the risk of divergent forks in a small project.

The editor is supposed to be very flexible. What this means in reality remains to be seen. I'm not sure whether you can just click a button to use Emacs key bindings versus vi ones, say, or whether you are stuck learning a new editor and figuring out how to customize it. This could be the downfall of Bespin. People are so attached to their editors (yes, I use Emacs and don't want to learn another set of keyboard commands) that they will forgo a useful new tool in favor of their comfortable, customized, "efficient" (really, probably not) system. But we shall see.

How does this fit into my vision of science? First, let's establish that in several branches of science there are very large computational projects that are used and modified by a fairly large number of people. Let's also put out there that, at least in atmospheric science, these projects are climate models or their components, and that the people modifying the code are usually NOT developers. That is, they don't have backgrounds in computer science, they don't keep up with trends in software development, and they don't necessarily know much about proper coding. They are, instead, scientists who want the model to do something different from what some other scientist designed it to do. Some equation is changed, or a new process is introduced, or some grad student just wants to run a sensitivity study by changing a parameter. So each of these users is going to be working with their own copy of the code, with their own modifications. Maybe a group at some university wants to take an established model and tinker with it collectively, so they'd want their own centralized repository that they can all access but that isn't accessible to the general community working with that model. There are lots of permutations and combinations that something like Bespin needs to handle transparently. More importantly, these people (and I'm definitely throwing myself into this group) need to be able to get into Bespin and forget that they are using it. They need to log in, find their code, and start working, with minimal investment in modifying, customizing, tweaking, and generally fiddling. It needs to "just work."

Now, back to the point. It would be terrific to have the code accessible from anywhere for all users (or a subset of users, or just your own code for you alone). This could be especially useful for code that is going to run on powerful remote computers (maybe just downstairs, but maybe in another state or across the globe), because it would reduce the overhead of simply getting to the code. Many of us have a multistep process to log in to these computers, and then we are stuck with whatever environment is set up on that computer (which varies from machine to machine even within the same institution). Having a secure log-in to a website, where all our files would be sitting immediately, always in the same format with the same permissions, color scheme, and shortcuts/aliases/keyboard macros/etc., would increase productivity overnight. The possibility that one could use a computer in San Diego or Illinois or Virginia, and the only difference would be the URL you navigate to, would blow people's minds. Sure, there are some technical details I'm glossing over, like how you compile and run the code from Bespin (that's not what it does, as far as I know), but just being able to work on the code from a uniform environment would be life-altering.

There are several impediments to this vision. The first is inertia. We've had thirty years of using nearly the same simple environment, and for half that time almost exactly the same methods (from an end-user perspective) for accessing computers, setting up our user accounts, customizing our editors and other software tools, and learning how to move things around and get code to run. Change comes slowly to people set in their ways. Second, I think this needs to be adopted by large institutions, such as the supercomputing centers, and promoted strongly as the "right way" to edit files on those machines. Similarly, the big science codes, like the Community Climate System Model, should adopt it as a preferred method of modifying code within the development process, and that will spread to the research community. Third, these centers and projects have to address security concerns in a reasonable way, with users in mind. This might be a separate thread from promoting a more efficient way of editing and sharing code, but there need to be sensible security procedures for getting into the big computers without a bevy of passwords and a pocketful of devices for accessing different machines.

Note that in discussing the impediments I never use the word Bespin, because it does not have to be Bespin. The scientific community needs new tools that are powerful, flexible, and easy to use. It does not matter what they are, as long as they can be used across platforms and computing environments. Walls need to be broken down, code needs to be cleaned up, and resources need to be used better. Something like Bespin might provide one more piece of the puzzle for getting some of these things done.

For more about Bespin, check out their web site, and/or watch this screencast:

Introducing Bespin from Dion Almaer on Vimeo.

1 comment:

.brian said...

An article on Ars about Bespin
