When I use .shift() from Pandas on a column in a DataFrame with a date index, I can use it, for example, with .corr(), but I cannot update my old DataFrame or create a new one.
Dataset
My df looks like this:
Date        X    Y   Z
2000-01-01  x1   y1  z1
2000-01-02  x2   y1  z1
2000-01-03  x3   y1  z1
...
2000-01-31  x31  y1  z1
2000-02-01  x32  y2  z2
2000-02-02  x33  y2  z2
...
2000-02-29  x60  y2  z2
2000-03-01  x61  y3  z3
I want to shift some of the features by a monthly lag, so that the data afterwards look like this:
Date        X    Y    Z
2000-01-01  x1   NaN  NaN
2000-01-02  x2   NaN  NaN
2000-01-03  x3   NaN  NaN
...
2000-01-31  x31  NaN  NaN
2000-02-01  x32  y1   NaN
2000-02-02  x33  y1   NaN
...
2000-02-29  x60  y1   NaN
2000-03-01  x61  y2   z1
What I want to do is shift some columns in the dataframe df using time indexing or a similar approach. My first column consists of data that vary daily, while the other columns contain data that change on a monthly or quarterly basis. This is why I cannot simply apply a daily lag. After shifting the data, they should also align with the "new" month.
As I have shown in the data, y1 after a one-month lag should shift from January to February, and then be adjusted to match the correct number of days in this "new" month.
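To make the desired result concrete, here is a minimal sketch of one possible way to get it, assuming every lagged column is constant within each calendar month. The sample values and the lags in max_lag (1 month for Y, 2 months for Z) are made up for illustration. It lags by calendar-month period rather than by shifting the dates themselves:

```python
import pandas as pd

# Hypothetical sample: daily X, monthly-constant Y and Z.
idx = pd.date_range("2000-01-01", "2000-03-31", freq="D")
df = pd.DataFrame({"X": range(len(idx))}, index=idx)
df["Y"] = df.index.month * 10.0    # 10 in Jan, 20 in Feb, 30 in Mar
df["Z"] = df.index.month * 100.0   # 100 in Jan, 200 in Feb, 300 in Mar

max_lag = {"Y": 1, "Z": 2}  # assumed lags, in months

# Calendar month of every daily row, as a PeriodIndex.
months = df.index.to_period("M")

shifted = df.copy()
for col, lag in max_lag.items():
    # One value per month (assumes the column is constant within a month).
    monthly = df[col].groupby(months).first()
    # For each daily row, look up the value from `lag` months earlier.
    # Months with no earlier data become NaN.
    shifted[col] = monthly.reindex(months.shift(-lag)).to_numpy()
```

Because the lookup is done per row of the original daily index, every day of the "new" month gets the lagged value, whatever the number of days in that month.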
First try
First, I tried to update my old df:
# max_lag is a dictionary with lags for each feature / column in my df
for key in max_lag.keys():
    df[key].shift(periods=max_lag[key], freq='ME')
This code doesn't raise any error, but afterwards df does not contain the shifted values.
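For context, .shift() returns a new object and leaves the original untouched, so a result that is not assigned anywhere is simply discarded. A small illustration with a made-up Series:

```python
import pandas as pd

# Hypothetical three-day Series, for illustration only.
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2000-01-01", periods=3, freq="D"))
before = s.copy()

s.shift(1)            # result is thrown away; s is not modified
kept = s.shift(1)     # result is kept in a new variable

unchanged = s.equals(before)   # True: the original Series is intact
```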
Second try
Then I tried to assign df:
for key in max_lag.keys():
    df[key] = df[key].shift(periods=max_lag[key], freq='ME')
And I receive this error:
ValueError: cannot reindex on an axis with duplicate labels
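A likely source of the duplicate labels, sketched on made-up data: when freq is given, .shift() moves the index instead of the values, and a month-end offset sends many different days to the same month-end date. The sketch passes pd.offsets.MonthEnd() directly instead of the 'ME' alias so that it does not depend on the pandas version:

```python
import pandas as pd

# Hypothetical daily Series covering January and February 2000.
idx = pd.date_range("2000-01-01", "2000-02-29", freq="D")
y = pd.Series(1.0, index=idx)

# With freq, the dates are moved forward to a month end; the values stay put.
shifted = y.shift(periods=1, freq=pd.offsets.MonthEnd())

has_duplicates = not shifted.index.is_unique   # True
distinct_dates = shifted.index.unique()        # only a few month-end dates
```

Assigning such a Series back into df forces pandas to align it with the daily index of df, and that alignment cannot be done when the shifted index contains repeated dates.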
Third try
This time I created a new df to store the shifted values:
for key in max_lag.keys():
    df_shift[key] = df[key].shift(periods=max_lag[key], freq='ME')
Error:
TypeError: 'type' object does not support item assignment
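The creation of df_shift is not shown above, so the following is only a guess: this message is what Python reports when item assignment is attempted on a class rather than an instance, for example if df_shift was bound to pd.DataFrame without the parentheses. A minimal reproduction under that assumption:

```python
import pandas as pd

# Assumed cause: the class itself was stored, not an instance.
df_shift = pd.DataFrame
try:
    df_shift["Y"] = [1, 2]
except TypeError as exc:
    message = str(exc)   # "'type' object does not support item assignment"

# Creating an instance (note the parentheses) allows column assignment.
df_shift = pd.DataFrame(index=pd.date_range("2000-01-01", periods=2, freq="D"))
df_shift["Y"] = [1, 2]
```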
I have tried many other small changes, but nothing works. I would really appreciate any help, an explanation of what I am doing wrong, and suggestions for another approach to shifting and adjusting the data.
df[key].shift, which has a datetime index that is obviously shifted in relation to df. I think that is why I am experiencing an indexing problem.

If we have pd.DataFrame({'X': [1,2]}, index=pd.DatetimeIndex(["2000-01-30", "2000-01-31"])) and we shift this 1 month, then you should get 2x Feb 29, 2000. So, that creates duplicate index values. What is the desired output in that case? (You can see that this is a particular issue with month offsets, because a month is not a stable unit.)

X = 1, 1 for X = 2, or just 1 row with X = [1, 2], or something else (e.g. a mean?). Worry about the method after establishing the desired result. Please edit your question to adjust the sample so that it includes such problematic dates and add the desired result for the sample.
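The two-row example from the comments can be checked directly. The comment does not say which offset is meant by "1 month"; this sketch assumes a calendar-month offset, pd.DateOffset(months=1), which clips dates to the last valid day of the target month:

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2]},
                  index=pd.DatetimeIndex(["2000-01-30", "2000-01-31"]))

# Assumption: "shift 1 month" means adding one calendar month to each date.
shifted = df.shift(1, freq=pd.DateOffset(months=1))

# Both rows now carry the label 2000-02-29, so the index is no longer unique.
labels = list(shifted.index)
```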