Operation on Pandas Dataframe columns using its Index

Question

This should be relatively easy. I have a pandas dataframe (Dates):

            A           B           C
1/8/2017    1/11/2017   1/20/2017   1/25/2017
1/9/2017    1/11/2017   1/20/2017   1/25/2017
1/10/2017   1/11/2017   1/20/2017   1/25/2017
1/11/2017   1/20/2017   1/25/2017   1/31/2017
1/12/2017   1/20/2017   1/25/2017   1/31/2017
1/13/2017   1/20/2017   1/25/2017   1/31/2017

I would like to take the difference between Dates.index and Dates. The output would be like so:

            A    B    C
1/8/2017    3   12   17
1/9/2017    2   11   16
1/10/2017   1   10   15
1/11/2017   9   14   20
1/12/2017   8   13   19
1/13/2017   7   12   18

Naturally, I tried this:

Dates - Dates.index

But I receive this lovely TypeError:

TypeError: Could not operate DatetimeIndex...with block values ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('O')

Instead, I've written a loop to go column by column, but that just seems silly. Can anyone suggest a pythonic way to do this?

EDIT

In [1]: import pandas as pd
import numpy as np
import datetime
dates = pd.date_range('20170108',periods=6)
df = pd.DataFrame(np.empty([len(dates),3]),index=dates,columns=list('ABC'))
df['A'].loc[0:3] = datetime.date(2017, 1, 11)
df['B'].loc[0:3] = datetime.date(2017, 1, 20)
df['C'].loc[0:3] = datetime.date(2017, 1, 25)
df['A'].loc[3:6] = datetime.date(2017, 1, 20)
df['B'].loc[3:6] = datetime.date(2017, 1, 25)
df['C'].loc[3:6] = datetime.date(2017, 1, 31)

In [2]: print(df)
                     A           B           C
2017-01-08  2017-01-11  2017-01-20  2017-01-25
2017-01-09  2017-01-11  2017-01-20  2017-01-25
2017-01-10  2017-01-11  2017-01-20  2017-01-25
2017-01-11  2017-01-20  2017-01-25  2017-01-31
2017-01-12  2017-01-20  2017-01-25  2017-01-31
2017-01-13  2017-01-20  2017-01-25  2017-01-31

In [3]: df = df.sub(df.index.to_series(),axis=0)

ValueError: operands could not be broadcast together with shapes (18,) (6,)

jezrael · Accepted Answer · 2017-03-29 16:32:27Z

You first need to convert all columns with to_datetime, and then use sub:

# skip this step if all columns already have a datetime dtype
date_cols = list('ABC')
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

df = df.sub(df.index.to_series(),axis=0)
print (df)
                A       B       C
2017-01-08 3 days 12 days 17 days
2017-01-09 2 days 11 days 16 days
2017-01-10 1 days 10 days 15 days
2017-01-11 9 days 14 days 20 days
2017-01-12 8 days 13 days 19 days
2017-01-13 7 days 12 days 18 days
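
The question asks for plain day counts (3, 12, 17, ...) rather than Timedelta values. A minimal sketch of one way to get there, assuming the columns are already datetime64; the frame is rebuilt here from the question's values, and the names delta and days are only illustrative:

```python
import pandas as pd

# Rebuild the example frame with datetime64 columns (values taken from the question).
dates = pd.date_range("2017-01-08", periods=6)
df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2017-01-11"] * 3 + ["2017-01-20"] * 3),
        "B": pd.to_datetime(["2017-01-20"] * 3 + ["2017-01-25"] * 3),
        "C": pd.to_datetime(["2017-01-25"] * 3 + ["2017-01-31"] * 3),
    },
    index=dates,
)

# Subtract the index from every column; the result holds timedelta values.
delta = df.sub(df.index.to_series(), axis=0)

# Turn each timedelta column into whole days via the .dt accessor.
days = delta.apply(lambda col: col.dt.days)
print(days)
```

The .dt.days accessor drops any sub-day remainder, which does not matter here because every value is a whole number of days.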

The columns need to have the datetime64 dtype:

dates = pd.date_range('20170108',periods=6)
df = pd.DataFrame(index=dates)
df.loc[0:3, 'A'] = pd.Timestamp(2017, 1, 11)
df.loc[0:3, 'B'] = pd.Timestamp(2017, 1, 20)
df.loc[0:3, 'C'] = pd.Timestamp(2017, 1, 25)
df.loc[3:6, 'A'] = pd.Timestamp(2017, 1, 20)
df.loc[3:6, 'B'] = pd.Timestamp(2017, 1, 25)
df.loc[3:6, 'C'] = pd.Timestamp(2017, 1, 31)
print (df)
                    A          B          C
2017-01-08 2017-01-11 2017-01-20 2017-01-25
2017-01-09 2017-01-11 2017-01-20 2017-01-25
2017-01-10 2017-01-11 2017-01-20 2017-01-25
2017-01-11 2017-01-20 2017-01-25 2017-01-31
2017-01-12 2017-01-20 2017-01-25 2017-01-31
2017-01-13 2017-01-20 2017-01-25 2017-01-31

print (df.dtypes)
A    datetime64[ns]
B    datetime64[ns]
C    datetime64[ns]
dtype: object

df = df.sub(df.index.to_series(),axis=0)
print (df)
                A       B       C
2017-01-08 3 days 12 days 17 days
2017-01-09 2 days 11 days 16 days
2017-01-10 1 days 10 days 15 days
2017-01-11 9 days 14 days 20 days
2017-01-12 8 days 13 days 19 days
2017-01-13 7 days 12 days 18 days
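
To see which columns would block the subtraction, it can help to check the dtypes first. A small sketch, assuming the values were stored as datetime.date objects as in the question's edit; obj_df, non_datetime and fixed are hypothetical names used only for this illustration:

```python
import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

dates = pd.date_range("2017-01-08", periods=3)

# datetime.date objects are stored as generic Python objects (dtype object),
# not as datetime64 values.
obj_df = pd.DataFrame({"A": [datetime.date(2017, 1, 11)] * 3}, index=dates)

# List the columns that are not datetime64 yet.
non_datetime = [
    col for col in obj_df.columns if not is_datetime64_any_dtype(obj_df[col])
]
print(non_datetime)

# Convert every column, then subtract the index row by row.
fixed = obj_df.apply(pd.to_datetime)
result = fixed.sub(fixed.index.to_series(), axis=0)
print(result)
```

If non_datetime comes back empty, the conversion step can be skipped.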

While your solution is helpful, it still loops by column, which I am specifically trying to avoid. Is there not another way?
The loop through columns is only to convert to datetime. If they already are datetime, then you can skip that part
The loop is only for converting to datetime; it is better to use the parse_dates parameter in read_csv.
Right, so my data is already in datetime format, but this solution produces the operands error... "operands could not be broadcast together with shapes"
Both, really. I've provided an edit to the original question with some mock code.
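
The parse_dates suggestion from the comments could look roughly like this. It is only a sketch: the CSV text is mock data built from the question's values, and the index column name "date" is an assumption, since the question does not show the file:

```python
import io

import pandas as pd

# Mock CSV content; "date" as the index column name is assumed for illustration.
csv_text = """date,A,B,C
2017-01-08,2017-01-11,2017-01-20,2017-01-25
2017-01-09,2017-01-11,2017-01-20,2017-01-25
2017-01-10,2017-01-11,2017-01-20,2017-01-25
"""

# parse_dates converts the listed columns while reading, so no conversion
# loop is needed afterwards.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="date",
    parse_dates=["date", "A", "B", "C"],
)
print(df.dtypes)

diff = df.sub(df.index.to_series(), axis=0)
print(diff)
```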

ehudk · Answer · 2017-03-29 16:03 (edited by Jean-François Fabre, 2019-05-02 21:18:44Z)

I think a more explicit and elegant way to do this is to simply use apply:

df = df.apply(pd.to_datetime, axis="columns")  # just to make sure values are datetime
df.apply(lambda x: x - df.index.to_series(), axis="rows")
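
For what it's worth, the apply approach and the sub approach should produce the same frame. A quick check, assuming datetime64 columns built from the question's values; via_apply, via_sub and index_as_series are illustrative names:

```python
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2017-01-11"] * 3 + ["2017-01-20"] * 3),
        "B": pd.to_datetime(["2017-01-20"] * 3 + ["2017-01-25"] * 3),
        "C": pd.to_datetime(["2017-01-25"] * 3 + ["2017-01-31"] * 3),
    },
    index=dates,
)

index_as_series = df.index.to_series()

# Column-wise apply: each column is a Series aligned with the index Series.
via_apply = df.apply(lambda col: col - index_as_series, axis="index")

# Same operation using DataFrame.sub along the row axis.
via_sub = df.sub(index_as_series, axis=0)

same = via_apply.equals(via_sub)
print(same)
```

Both versions rely on the Series returned by to_series() sharing the frame's index, so the subtraction lines up row by row.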
