facing the fire: February 2019

2019-02-16

Trimming time series with xarray

There are many times when a data set starts and/or ends at an inconvenient time. My most common experience with this is obtaining an observational data set that begins and ends at months in the middle of the year, but I want to either look at annual means or derive anomalies. These dangling months can be cumbersome, and most of the time I find that the easiest way to deal with them is to trim them off.

Just to be explicit, say I have data in which the time coordinate is something like:

time = ["1920-03-15", "1920-04-15", "1920-05-15", ... , "1987-09-15", "1987-10-15"]

It's monthly data, and I know that there are no gaps, but it starts in March and ends in October.
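If you are not sure a monthly series is gap-free, one way to check is sketched below. It rebuilds the time coordinate described above from synthetic mid-month stamps (the `stamps`, `month_count`, and `has_gaps` names are made up for illustration) and confirms that consecutive entries are exactly one month apart.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the time coordinate described above.
# "MS" gives month starts; shifting by 14 days lands on the 15th of each month.
stamps = pd.date_range("1920-03-01", "1987-10-01", freq="MS") + pd.Timedelta(days=14)
time = xr.DataArray(stamps, dims="time", name="time")

# Count months on a continuous scale so consecutive months always differ by 1,
# including the December-to-January step.
month_count = time.dt.year.values * 12 + time.dt.month.values
has_gaps = bool(np.any(np.diff(month_count) != 1))

print(f"{time.size} time steps, gaps present: {has_gaps}")
```

If `has_gaps` comes back True, the index-based trimming in this post still finds a January and a December, but the trimmed series would no longer be a whole number of complete years.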

To deal with this easily, I use the argmax function in xarray. This function returns the index of the maximum of its argument and, when there are multiple equal maxima, the index of the first occurrence. It is worth noting that xarray's argmax relies on numpy's argmax function. So, I just construct a boolean test for what I want at the beginning and the end and look for the first True value (= 1). Here's an example in which I have an xarray Dataset called ds_smpl that has a time coordinate; note that I've already made sure that the time coordinate can be decoded so we can use the 'dt' accessor.

time = ds_smpl['time']
months = time.dt.month
# argmax returns the index of the first True (= 1) value
first_january = (months == 1).argmax().item()
# Reverse the test so the last December becomes the first True, then convert
# back to a non-negative index; this keeps the slice valid even when the
# series already ends in December
last_december = time.size - 1 - (months == 12)[::-1].argmax().item()
# +1 b/c slice is exclusive on end
ds_trim = ds_smpl.isel(time=slice(first_january, last_december + 1))
print(f"Index of first January is {first_january} and last December is {last_december}")
print(ds_trim['time'][0].values)
print(ds_trim['time'][-1].values)
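To see the argmax trick in isolation, here is a small numpy-only sketch on a made-up array of month numbers (not real data). It also shows one caveat worth keeping in mind: argmax returns 0 when the test is never True, so it is worth confirming that the month you are looking for actually occurs.

```python
import numpy as np

# Made-up month numbers: March..December, a full January..December, then January..October
months = (np.arange(32) + 2) % 12 + 1

# The first True in the test marks the first January
first_january = int((months == 1).argmax())

# Reversing the test turns "last December" into "first December counted from the end"
from_end = int((months == 12)[::-1].argmax())
last_december = months.size - 1 - from_end

trimmed = months[first_january:last_december + 1]

# Caveat: argmax also returns 0 when nothing matches (there is no month 13)
no_match = int((months == 13).argmax())
found_any = bool((months == 13).any())
```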

The same strategy can be used for other kinds of trimming of time series. If you know the times that you want a priori, this is overkill because you could directly slice that data out of your data set, but when you just want to get a whole number of years or similar, this is the easiest method I've found.
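As one possible generalization, the sketch below wraps the same idea in a helper that trims to any start and end month, for example December through November to keep whole DJF-to-SON seasonal years. The function name `trim_to_months`, its parameters, and the synthetic `ds_demo` data set are all hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


def trim_to_months(ds, start_month=1, end_month=12, dim="time"):
    """Hypothetical helper: keep data from the first start_month to the last end_month."""
    months = ds[dim].dt.month.values
    is_start = months == start_month
    is_end = months == end_month
    # argmax returns 0 when nothing matches, so check for matches explicitly
    if not (is_start.any() and is_end.any()):
        raise ValueError("start_month or end_month not found in the time coordinate")
    first = int(is_start.argmax())
    # Reverse to find the last match, then convert to a non-negative index
    last = months.size - 1 - int(is_end[::-1].argmax())
    if last < first:
        raise ValueError("no complete span between start_month and end_month")
    # +1 because slice is exclusive on the end
    return ds.isel({dim: slice(first, last + 1)})


# Synthetic monthly data set running from March 1920 to October 1987
stamps = pd.date_range("1920-03-01", "1987-10-01", freq="MS") + pd.Timedelta(days=14)
ds_demo = xr.Dataset(
    {"x": ("time", np.arange(stamps.size, dtype=float))},
    coords={"time": stamps},
)

ds_calendar = trim_to_months(ds_demo)  # January..December
ds_dec_nov = trim_to_months(ds_demo, start_month=12, end_month=11)
print(ds_dec_nov["time"][0].values, ds_dec_nov["time"][-1].values)
```

The explicit checks are a design choice rather than a requirement: they turn the silent "argmax returned 0" case into an error instead of a quietly wrong slice.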