Just to be explicit, say I have data in which the time coordinate is something like:
time = ["1920-03-15", "1920-04-15", "1920-05-15", ... , "1987-09-15", "1987-10-15"]
It's monthly data, and I know that there are no gaps, but it starts in March and ends in October.
To deal with this easily, I use the argmax function in xarray. It returns the index of the maximum of its argument and, when several values tie for the maximum, the index of the first occurrence. It is worth noting that xarray's argmax relies directly on numpy's argmax function. So, I just construct a boolean test for the beginning and ending I want and look for the first True value (True counts as 1). Here's an example in which I have an xarray Dataset called ds_smpl that has a time coordinate; note that I've already made sure that the time coordinate can be decoded so we can use the 'dt' accessor.
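As a quick illustration of the "first True" behavior, here is a minimal sketch using plain NumPy. The month numbers are made up for the example. It also shows a caveat worth keeping in mind: when the test is never True, argmax still returns 0, so it is safer to check the mask with any() first.

```python
import numpy as np

# Hypothetical month numbers for a monthly series that starts in March.
months = np.array([3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3])

# Boolean mask: True only where the month is January.
is_january = months == 1

# True counts as 1 and False as 0, so argmax gives the index of the first True.
first_january = int(is_january.argmax())

# Caveat: if nothing matches, every value is False (0) and argmax returns 0,
# which looks like a valid index. Check the mask with any() before using it.
no_match = months == 13
no_match_index = int(no_match.argmax())
has_match = bool(no_match.any())

print(first_january, no_match_index, has_match)
```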
    time = ds_smpl['time']
    months = time.dt.month
    # argmax will return the index of the first True (= 1) value
    first_january = (months == 1).argmax().item()
    # search the reversed mask, then convert back to a non-negative index;
    # a negative index of -1 would make the slice stop 0 and return nothing
    last_december = time.size - 1 - (months == 12)[::-1].argmax().item()
    ds_trim = ds_smpl.isel(time=slice(first_january, last_december + 1))  # +1 b/c slice is exclusive on end
    print(f"Index of first January is {first_january} and last December is {last_december}")
    print(ds_trim['time'][0].values)
    print(ds_trim['time'][-1].values)
The same strategy can be used for other kinds of trimming of time series. If you know the times that you want a priori, this is overkill because you could directly slice that data out of your data set, but when you just want to get a whole number of years or similar, this is the easiest method I've found.
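To make the "other kinds of trimming" concrete, here is a rough sketch of the same idea wrapped in a small helper. The function name trim_to_months, the variable name x, and the synthetic dataset are all made up for illustration; the sketch assumes a gap-free, decoded time coordinate, and it converts the masks to NumPy arrays with .values before calling argmax.

```python
import numpy as np
import pandas as pd
import xarray as xr


def trim_to_months(ds, start_month, end_month, dim="time"):
    """Trim ds to run from the first start_month to the last end_month.

    Hypothetical helper for illustration; assumes ds[dim] is a decoded
    datetime coordinate so that the .dt accessor works.
    """
    months = ds[dim].dt.month.values
    is_start = months == start_month
    is_end = months == end_month
    # argmax returns 0 when nothing matches, so check the masks explicitly.
    if not (is_start.any() and is_end.any()):
        raise ValueError("start_month or end_month does not occur in the data")
    first = int(is_start.argmax())
    # Search the reversed mask, then convert back to a non-negative index.
    last = months.size - 1 - int(is_end[::-1].argmax())
    return ds.isel({dim: slice(first, last + 1)})  # +1: slice end is exclusive


# Synthetic monthly data from March 1920 through October 1987 (assumed values).
time = pd.date_range("1920-03-01", "1987-10-01", freq="MS")
ds = xr.Dataset(
    {"x": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

calendar_years = trim_to_months(ds, start_month=1, end_month=12)
water_years = trim_to_months(ds, start_month=10, end_month=9)  # October to September

print(calendar_years["time"].values[0], calendar_years["time"].values[-1])
print(water_years["time"].values[0], water_years["time"].values[-1])
```

The helper does not check that the last end month falls after the first start month; for a very short record that slice could come back empty.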