
When I use .shift() from Pandas on a column in a DataFrame with a date index, I can use it, for example, with .corr(), but I cannot update my old DataFrame or create a new one.

Dataset

My df looks like this:

      Date    X     Y     Z
2000-01-01    x1    y1    z1
2000-01-02    x2    y1    z1
2000-01-03    x3    y1    z1
            ...
2000-01-31    x31   y1    z1
2000-02-01    x32   y2    z2
2000-02-02    x33   y2    z2
            ...
2000-02-29    x60   y2    z2
2000-03-01    x61   y3    z3

I want to shift some of the features by a monthly lag so that the data afterwards look like this:

      Date    X     Y     Z
2000-01-01    x1    NaN   NaN
2000-01-02    x2    NaN   NaN
2000-01-03    x3    NaN   NaN
            ...
2000-01-31    x31   NaN   NaN
2000-02-01    x32   y1    NaN
2000-02-02    x33   y1    NaN
            ...
2000-02-29    x60   y1    NaN
2000-03-01    x61   y2    z1

What I want to do is shift some columns in the dataframe df using time indexing or a similar approach. My first column consists of data that vary daily, while the other columns contain data that change on a monthly or quarterly basis. This is why I cannot simply apply a daily lag. After shifting the data, they should also align with the "new" month.

As I have shown in the data, y1 after a one-month lag should shift from January to February, and then be adjusted to match the correct number of days in this "new" month.

First try

First, I tried to update my old df:

# max_lag is a dictionary with lags for each feature / column in my df

for key in max_lag.keys():
    df[key].shift(periods=max_lag[key], freq='ME')

This code doesn't raise any error, but df is not updated with the shifted values afterwards.

Second try

Then I tried to assign df:

for key in max_lag.keys():
    df[key] = df[key].shift(periods=max_lag[key], freq='ME')

And I receive this error:

ValueError: cannot reindex on an axis with duplicate labels

Third try

This time I created a new df to store the shifted values:

for key in max_lag.keys():
    df_shift[key] = df[key].shift(periods=max_lag[key], freq='ME')

Error:

TypeError: 'type' object does not support item assignment

I have tried many other small changes, but nothing works. I would really appreciate any help and an explanation of what I am doing wrong or what another approach I can try to achieve shifting and adjusting.

  • Hello, and thanks for commenting. I have checked for duplicates, but there are none. Fortunately, thanks to your comment, I have found the source of the problem, although I still don't know how to solve it. In the line df[key] = df[key].shift(periods=max_lag[key], freq='MS'), the call to df[key].shift creates a new Series whose datetime index is shifted relative to that of df. I think that is why I am experiencing an indexing problem. Commented Nov 17, 2024 at 19:42
  • As I wrote, I am not sure, but I think that by shifting the datetime index and trying to assign a variable to another DataFrame with datetime index, I receive this problem. Commented Nov 17, 2024 at 19:51
  • Ok, the fact that you have daily data changes things. E.g., suppose you have this: pd.DataFrame({'X': [1,2]}, index=pd.DatetimeIndex(["2000-01-30", "2000-01-31"])) and we shift this 1 month, then you should get 2x Feb 29, 2000. So, that creates duplicate index values. What is the desired output in that case? (You can see that this is a particular issue with month offsets, because it's not a stable unit.) Commented Nov 17, 2024 at 21:25
  • Yes, I use daily data. Some of my features change value every day, and some every month, so after shifting I want them to be aligned correctly. And as you wrote, that creates duplicates. Do you think it is possible to use shift() here, or do I need a completely different solution? Commented Nov 18, 2024 at 5:25
  • First focus should be the exact desired result, not the method to get there (cf. XY problem). In the example I provided, a 1-month offset would create 2 entries for Feb 29, 2000. What do you want to happen? 2 rows with that date, 1 for X = 1, 1 for X = 2, or just 1 row with X = [1, 2], or something else (e.g. a mean?). Worry about the method after establishing the desired result. Please edit your question to adjust the sample so that it includes such problematic dates and add the desired result for the sample. Commented Nov 18, 2024 at 5:39

1 Answer


You need to define the lag for each column and shift by position (periods only, without freq):

import pandas as pd

data = {
    "Date": ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01", "2000-06-01"],
    "X": ["x1", "x2", "x3", "x4", "x5", "x6"],
    "Y": ["y1", "y2", "y3", "y4", "y5", "y6"]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

max_lag = {'X': 2, 'Y': 1}

df_shifted = pd.DataFrame(index=df.index)
for key, lag in max_lag.items():
    df_shifted[key] = df[key].shift(periods=lag)

print(df_shifted)

Which gives you your expected result.

               X     Y
Date                  
2000-01-01  None  None
2000-02-01  None    y1
2000-03-01    x1    y2
2000-04-01    x2    y3
2000-05-01    x3    y4
2000-06-01    x4    y5

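The first attempt failed because shift() does not modify the DataFrame in place; it returns a new Series, so the result has to be assigned back explicitly. The second attempt failed because passing freq shifts the index labels rather than the values. With a month-based offset, several daily dates land on the same new date, so the shifted Series has duplicate index labels and cannot be aligned with df on assignment, which raises the ValueError. The third attempt failed because df_shift was not initialized as a DataFrame instance: the message "'type' object does not support item assignment" indicates that df_shift referred to a class (for example pd.DataFrame without parentheses) rather than to an object.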
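
The duplicate-label problem from the second attempt can be reproduced on a small daily index. This is only a minimal sketch on made-up data, not the asker's DataFrame. It passes pd.offsets.MonthEnd() instead of the 'ME' string alias, an assumption made so that the snippet does not depend on a pandas version that knows that alias.

```python
import pandas as pd

# Hypothetical daily data covering 2000-01-01 .. 2000-03-01
idx = pd.date_range("2000-01-01", "2000-03-01", freq="D")
df = pd.DataFrame({"Y": range(len(idx))}, index=idx)

# With freq, shift() leaves the values alone and moves the index labels.
shifted = df["Y"].shift(periods=1, freq=pd.offsets.MonthEnd())

print(shifted.index.is_unique)   # False
print(shifted.index.unique())    # only a few month-end dates remain

# Assigning back requires aligning on the index, which fails with duplicates.
try:
    df["Y"] = shifted
except ValueError as exc:
    print("ValueError:", exc)
```

Every day of a month is rolled forward to a month-end date, so many rows end up sharing one label, and pandas cannot decide which of them belongs to which row of df.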


1 Comment

Thanks for commenting. For the sake of clarity, I have shown simplified data. In reality, my data consists of day-to-day datetime entries, but I want to shift my data with a monthly lag. That's why I need to use shift() with freq. Can you tell me if there is any way to use shift() with datetime indexing and avoid reindexing problems?
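
One possible way to get the layout described in the question without shift(freq=...) is to work per calendar month: take one value per month, lag that monthly series, and broadcast it back onto the daily rows. The sketch below is an assumption-laden illustration, not a confirmed solution. The sample data and the max_lag values are invented, each lagged column is assumed to be constant within a month, and the data are assumed to contain every month with no gaps (the lag is positional on the monthly series).

```python
import pandas as pd

# Hypothetical daily data: X changes daily, Y and Z change monthly.
idx = pd.date_range("2000-01-01", "2000-03-01", freq="D")
df = pd.DataFrame(
    {
        "X": [f"x{i}" for i in range(1, len(idx) + 1)],
        "Y": [f"y{m}" for m in idx.month],
        "Z": [f"z{m}" for m in idx.month],
    },
    index=idx,
)

max_lag = {"Y": 1, "Z": 2}  # lag in months per column (assumed values)

# Month of every daily row, e.g. 2000-01 for 2000-01-15
months = df.index.to_period("M")

df_shifted = df.copy()
for key, lag in max_lag.items():
    # One value per month (the column is assumed constant within a month)
    monthly = df[key].groupby(months).first()
    # Lag by whole months; assumes no month is missing from the data
    lagged = monthly.shift(lag)
    # Broadcast the lagged monthly value to every day of its new month
    df_shifted[key] = lagged.reindex(months).to_numpy()

print(df_shifted.loc["2000-01-30":"2000-02-02"])
```

Because the lagged values are looked up by each row's own month, February receives 29 copies of y1 even though January has 31 days, which is the "adjust to the new month" behavior asked for in the question.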
