
2019-01-10

In-place upgrade from python 3.6 to 3.7

Based on reports from Python Bytes and from [here] and [there], it seems like 3.7 is generally faster than 3.6. So, I decided to try it. On one machine, I set up a fresh conda environment with 3.7 and installed all the packages I typically use. The first time I did that, which was months ago, not everything was working, and I put this upgrade plan on hold. Later, I re-tested, and all my packages seemed to be playing nicely with 3.7. I worked in that environment for a while with no problems.
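
For reference, setting up that kind of test environment only takes a command or two; the environment name and package list here are just illustrative, not a record of exactly what I installed:
$ conda create -n py37test python=3.7 numpy pandas xarray matplotlib jupyter
$ conda activate py37test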

During some downtime today, I thought it might be good to move another machine to 3.7. This time I decided to take the leap and move my base environment from python 3.6.6 to 3.7. Why not?

There is one step:
$ conda install python=3.7

It takes some time for conda to "solve" the environment. I'm not sure exactly what that involves, but since it checks the dependencies of every installed package, it's no wonder it takes a while: essentially everything will need to be removed and reinstalled.
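
(If you want to see the plan without committing to anything, I believe conda will do a dry run:)
$ conda install python=3.7 --dry-run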

One potential gotcha with this approach is that anything that was pip installed will need to be reinstalled. I think there are a couple of these, but I don't know how to tell which are which. Oh well, I guess I'll find out when something breaks. 
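
(For what it's worth, I think conda list can tell you: pip-installed packages show up with pypi in the channel column, or with <pip> in the build column on older conda versions. So something like this should list the likely suspects:)
$ conda list | grep pypi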

Eventually the environment does get solved, and a plan is constructed. Answer 'y' and conda dutifully downloads and extracts many packages.

Conda does all the work:
The "Preparing transaction" step runs with the spinning slash and finishes.
The "Verifying transaction" step runs with the spinning slash and finishes.
It removes some deprecated stuff (jupyter js widgets), then enables and validates the notebook extensions, which report OK.
It prints "done" and returns the prompt.

Did it work?

$ which python
/Users/brianpm/anaconda3/bin/python

$ python --version
Python 3.7.1

$ python
Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 09:50:42)
[Clang 9.0.0 (clang-900.0.37)] :: Anaconda custom (64-bit) on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print(2.**8)
256.0
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib as mpl
>>> import xarray as xr
>>> xr.DataArray(np.random.randn(2, 3))
<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[-0.355778,  0.836539,  0.210377],
       [ 0.480935,  0.469618, -0.101545]])
Dimensions without coordinates: dim_0, dim_1
>>> data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))
>>> xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
<xarray.DataArray 'foo' (dim_0: 3)>
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object 'a' 'b' 'c'

Okay, this seems to be working. I repeated a similar interactive test with ipython. So far, so good.

Lesson: conda is kind of amazing.

2010-06-14

Someday BibTeX's stranglehold will be broken

But probably not by CrossTeX. I'd like to believe. Maybe if Donald Knuth starts using it...

UPDATE:
Also see biblatex and biber, if you're shopping for replacements.

2009-07-26

Should we prepare for the singularity?

The singularity is the hypothesized moment when artificial intelligence becomes as intelligent as humans. At that point, machines might have the ability to decide to make smarter machines, which will make smarter machines, ad infinitum, relegating humans to a subservient role in society. Another view of the singularity is that it will free humanity from the shackles of the material world, allowing unimaginable lifespan and freedom to think, create, and explore. A NYTimes.com article covers a meeting of computer scientists who are starting to wonder whether limits on artificial intelligence research should be imposed [LINK].

The article makes it seem as though these scientists are concerned with current, or near-future, technologies that could disrupt society. It cites a few recent advances, leaning especially hard on an empathy-simulating robot. From my reading, none of these technologies seems very threatening, and most have much more potential for good than harm.

Thinking farther into the future, to a time when the singularity really is at hand, these concerns become very relevant. I suspect the scientists are more interested in working through ethical issues now so that better decisions can be made then. The fact of the matter is that the singularity, in one form or another, is coming, and so some thought about what it means is important. Regulating research seems like a wrong-headed direction to me, though, because it means the singularity will sneak up on us. Everyone will push their science to bump around the edges of the rules, and suddenly the surface beyond which lies advanced artificial intelligence will be gone, disintegrated, and humanity won't be properly prepared because everyone promised they weren't going to cross that boundary.

Don't get me wrong, even at the moment of the singularity, I don't think machines will start taking over. Simply having the capacity to be more intelligent than humans doesn't mean those initial machines will be successful at autonomous thought and decision-making... i.e., they won't really be conscious. Rather, those intelligent machines will be an increment in the machine-human interaction that will, I hope, push the boundaries of the human experience. There are possibilities to extend lifespan, expand thought capacity, stimulate creativity, and boost productivity. These are the promises of intelligent machines, but they were also the promises of digital computers and nano-bots, so we can't count on them being fulfilled. We still don't have flying cars and jet-packs, and we still don't have nanotechnology that repairs roads and buildings or constructs moon bases for us. Nor do we know whether a simulation of the human brain will push artificial intelligence to a new level, or whether very advanced computing technology will be able to interact with biological systems in any interesting ways [cf. LINK]. Despite my hope for the coming singularity, it is far from certain that we'll know when it happens or what it means, and it is unlikely, no matter how much planning we do, that we'll know what to do when that day comes to make the most of the technology.

2009-07-15

You too can calculate a linear trend, for just $1,800

I just saw the announcement that SPSS Inc. is releasing PASW Statistics 17 [LINK]. This is the new name for what used to be called SPSS Statistics, which was (as far as I know) ubiquitously known as SPSS. I'm not sure what prompted the name change, nor do I know what either SPSS or PASW stands for. SPSS Inc. has updated the interface, which now looks a lot like Matlab, except not as useful. Their promotional video really pushes syntax highlighting and point-and-click insertion of bookmarks/breakpoints and commands, features that other languages' tools have had for many years. There are useful statistical features, like the nearest-neighbor analysis tools, though it was hard to tell from the demo what the point of them is.

The price is $1,800. Yeah. This is an amazing price for software that is quite limited.

Especially when you compare it against the language R [news, official], which is free and more powerful, and still focuses on statistical analysis. The R language is based on an older language called S, and I think I remember learning that SPSS is also based on S. So these are two evolutionary lines from S, one free and open source and powerful and one very expensive, somewhat slick, and potentially powerful.

Both R and SPSS are limited to statistics (more or less). If you want to do something a little beyond statistical models or analysis, you'll still have to go to some other scripting language like Matlab, IDL, NCL, etc. And, in my opinion, if you have to know one of those anyway, there's really no reason to deal with R or SPSS, since all the same statistics can be calculated pretty easily in those more general languages. But, of course, there are large numbers of people whose work requires a bit of statistics but not much other computation, and I guess those are the people who use SPSS. So maybe there is a spectrum of users: some who use just SPSS, some who dabble in R, some who only use R, some who use R along with other software, and others who just use more general high-level languages. Still, at that price point, I can't believe anyone uses SPSS.
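
Just to underline the point (and the title of this post): here is roughly what a linear trend costs in plain Python with numpy, which is free. The numbers are made up purely for illustration.
>>> import numpy as np
>>> t = np.arange(10.)                        # made-up "time" axis
>>> y = 2.5 * t + 1.0                         # made-up data with a known trend
>>> slope, intercept = np.polyfit(t, y, 1)    # least-squares line fit
>>> print(round(slope, 6), round(intercept, 6))
2.5 1.0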

2009-06-03

CoolIris

Holy moly... http://www.cooliris.com

Having just installed this "simple browser plugin" a mere 10 minutes ago, I am wholeheartedly endorsing it and pleading with you to go and install it.


I can't even try to describe it, except to say that it is paradigm changing for me.

2009-01-23

Cloud computing and computing clouds

More and more I'm frustrated with the cyber-infrastructure of climate science. It seems to be on the verge of crisis, in a Kuhnian sense. Everyone has individual solutions for how to do large computations, manage very large data sets, and collaborate between institutions. For example, due to limited resources, I just had to move some simulation output from a remote server to a local, external hard drive. One simulation (not a big one) generated some 50GB of output that I don't really want to throw away. Retrieving this data took hours, and then several more hours to send it from the remote site to my desk. It's crazy, inefficient, and isolating.
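
Back-of-envelope, just to show why that eats most of a day (the 100 Mbit/s sustained rate is my optimistic assumption; real transfers off a busy archive rarely come close):
>>> size_gb = 50                     # output from one modest simulation
>>> rate_mbps = 100                  # assumed sustained transfer rate
>>> hours = size_gb * 8e3 / rate_mbps / 3600
>>> round(hours, 1)
1.1
And that is per hop at full speed, for a simulation that isn't even a big one.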

There needs to be a better way. We need to harness the power inherent in "cloud computing" and the latest technology for using simple, intuitive web interfaces for accessing remote data (e.g., MobileMe, Google applications, etc.) and apply them to scientific computation, data storage, and data analysis.

We have seen small steps in these directions from projects like SETI@home and climateprediction.net, among others. I have also just read an article from Nature [LINK] saying that Amazon (see update below) and Google have both started down these roads, as has the NSF with something called DataNet. However, as the article notes, there are serious challenges, not just in terms of technology but also dealing with access, cost, and fairness. These can be touchy issues, especially in fields where the rate of work can vary greatly among different research groups.

I'll also just complain that even besides dealing with sharing and storing data, the ever-growing size of data sets in Earth Sciences, and particularly in the climate sciences, demands new tools for analyzing and visualizing the data. I've seen some projects that seek to deal with the emerging issues, but the progress of these new tools seems to be lagging significantly behind the growing data sets. As a concrete example, take the analysis of output from the NICAM, a global cloud-system resolving model. This is a model that has points every 7 km over the entire surface of the earth. Many of the variables are on vertical levels, say about 50 of them. It is conceivable that you'd be interested in examining global fields every hour for several years. On a typical desktop, loading a single 3-dimensional field for ONE hour would require all (or more) of the available memory (see the quick estimate below), making operating on the field pretty slow and serious number crunching basically impossible. This isn't going to be a special case for long, either. A new generation of cutting-edge models will have similar resolution, and as they start producing actual simulations (i.e., ones from which scientific results are desired), analysis tools need to be available to do the job. Right now, I don't have any such tools. Those that do exist need to be made available and usable, and soon.
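
For a rough sense of scale, here is a back-of-envelope estimate using my numbers above (~7 km spacing, ~50 levels, and an assumed 4 bytes per value; the cell count is only approximate):
>>> earth_area_km2 = 510e6                    # surface area of the Earth
>>> columns = earth_area_km2 / 7.0**2         # roughly one grid column per 7 km x 7 km cell
>>> bytes_per_field = columns * 50 * 4        # 50 levels, 4-byte (single precision) values
>>> round(bytes_per_field / 1e9, 1)           # gigabytes for ONE 3-D field at ONE time
2.1

That's a couple of gigabytes for a single field at a single time, which is in the neighborhood of all the memory a typical desktop has, before any actual computation happens.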

UPDATE:
I have been looking into these vague notions a bit more. Amazon has a side company called Amazon Web Services that sells cloud computing (computation, storage, database query, etc.). The service seems to leverage the fact that Amazon has a ton of computational power and storage just sitting around, so they try to sell their downtime to companies that need more cyberinfrastructure than they can afford. It's a pay-as-you-go system, and you only pay for the compute power/time that you actually use. It seems very interesting. Of course, the challenge is turning this kind of system into something that serves the science community. It would be nice, for example, if the same kind of system were available from an NSF computing center, and you could access data interactively using a web browser, or submit large simulations from a web browser that then run in the cloud with results going to the online storage facility. The problem is that "science" doesn't have a giant existing distributed computing environment with plenty of downtime, and there's not a lot of incentive to set one up (i.e., the NSF isn't that altruistic). These are just thoughts to chew on.

2007-04-27

US Army getting into supercomputers

Here's a quick story that seems like it is important. I will refrain from any interpretation or speculation here.

Army funds supercomputing center [LINK]

2006-11-17

mid-term updates

The dearth of posts here for the last month should be taken as an indication of progress and stress here at the home base. It is difficult to save the world one simulated cloud at a time, but it will be worth it. The blog will likely continue to suffer in coming months, but I will try to put up interesting tidbits on a weekly-ish basis.

Today's tidbit is about supercomputers. What do I know about supercomputers? Not too much, but sometimes I use them. Okay, sometimes I use little chunks of them (anywhere from 8 to 128 processors right now, maybe more in the near future). However, people who do know about high-performance computing are abuzz about the new rankings of the top 500 supercomputers [LINK]. The IBM machine at the Lawrence Livermore National Lab is destroying its competition, running at an impressive 280.6 teraflops. That is 280.6 trillion operations per second, where an operation is basically adding or multiplying some numbers. A nice desktop computer can usually crank out about one billion operations per second (1 gigaflops), roughly 1/280,600th of what the BlueGene/L at LLNL delivers. The next fastest machine is at Sandia National Lab, a Cray (called "Red Storm") that gets 101.4 teraflops. That seems like nothing in comparison, but it is only the second system to break the 100 TFLOPS barrier.
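
(The ratio, for anyone checking the arithmetic:)
>>> 280.6e12 / 1e9        # BlueGene/L flops divided by a 1 gigaflops desktop
280600.0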

For comparison, the Earth Simulator in Japan (made from NEC parts, 5120 processors) is now ranked 14th at about 35 TFLOPS. That facility is still considered an amazing feat, and the atmospheric simulations coming from it are still astounding people in the atmospheric sciences [EXAMPLE]. NCAR's newest machine (one I definitely do not have access to) is "blueice," an IBM machine running 1696 processors at 10.5 TFLOPS; I think this machine is getting expanded very soon, too. They also have a BlueGene to play with that is ranked 144, using 2048 processors, and IBM machines (1600 and 608 processors) at numbers 193 and 213.

Why does any of this matter? Well, for one thing, we are inching closer and closer to the ultimate goal. Also, we are on the brink of "peta-scale" computing, which is probably going to change the way computational science gets done. We'll be able to do simulations much faster with much finer resolutions, which will produce incredible amounts of data. It will be a challenge over the next few years to develop ways to deal with all that data, and it will require different software approaches as well as new hardware. With standard desktop technology of today, the file I/O (that is, just reading the data from the hard disk) is far too slow for the amount of information we're going to be handling. Crunching the numbers and then visualizing the model output is going to be tremendously difficult without an incredible amount of support from computer-savvy folks who can help the scientists. The technology is coming, money is already being spent, and projects are being planned, so now is the time to start thinking about how to deal with the output.