This should be relatively easy. I have a pandas dataframe (Dates):

            A           B           C
1/8/2017    1/11/2017   1/20/2017   1/25/2017
1/9/2017    1/11/2017   1/20/2017   1/25/2017
1/10/2017   1/11/2017   1/20/2017   1/25/2017
1/11/2017   1/20/2017   1/25/2017   1/31/2017
1/12/2017   1/20/2017   1/25/2017   1/31/2017
1/13/2017   1/20/2017   1/25/2017   1/31/2017

I would like to take the difference between Dates.index and Dates. The output would be like so:

             A    B    C
1/8/2017     3   12   17
1/9/2017     2   11   16
1/10/2017    1   10   15
1/11/2017    9   14   20
1/12/2017    8   13   19
1/13/2017    7   12   18

Naturally, I tried this:

Dates - Dates.index

But I receive this lovely TypeError:

TypeError: Could not operate DatetimeIndex...with block values ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('O')

Instead, I've written a loop to go column by column, but that just seems silly. Can anyone suggest a pythonic way to do this?

EDIT

In [1]: import pandas as pd
import numpy as np
import datetime
dates = pd.date_range('20170108',periods=6)
df = pd.DataFrame(np.empty([len(dates),3]),index=dates,columns=list('ABC'))
df['A'].loc[0:3] = datetime.date(2017, 1, 11)
df['B'].loc[0:3] = datetime.date(2017, 1, 20)
df['C'].loc[0:3] = datetime.date(2017, 1, 25)
df['A'].loc[3:6] = datetime.date(2017, 1, 20)
df['B'].loc[3:6] = datetime.date(2017, 1, 25)
df['C'].loc[3:6] = datetime.date(2017, 1, 31)

In [2]: print(df)
                     A           B           C
2017-01-08  2017-01-11  2017-01-20  2017-01-25
2017-01-09  2017-01-11  2017-01-20  2017-01-25
2017-01-10  2017-01-11  2017-01-20  2017-01-25
2017-01-11  2017-01-20  2017-01-25  2017-01-31
2017-01-12  2017-01-20  2017-01-25  2017-01-31
2017-01-13  2017-01-20  2017-01-25  2017-01-31

In [3]: df = df.sub(df.index.to_series(),axis=0)

ValueError: operands could not be broadcast together with shapes (18,) (6,) 

2 Answers

You first need to convert all columns with to_datetime and then use sub:

# skip this conversion if all columns already have a datetime dtype
date_cols = list('ABC')
for col in date_cols:
    df[col] = pd.to_datetime(df[col])

df = df.sub(df.index.to_series(),axis=0)
print (df)
                A       B       C
2017-01-08 3 days 12 days 17 days
2017-01-09 2 days 11 days 16 days
2017-01-10 1 days 10 days 15 days
2017-01-11 9 days 14 days 20 days
2017-01-12 8 days 13 days 19 days
2017-01-13 7 days 12 days 18 days

The columns need dtype datetime64 for this to work:

dates = pd.date_range('20170108', periods=6)
# build the columns directly as datetime64 values; positional slices such as
# df.loc[0:3, 'A'] on a DatetimeIndex are not supported by recent pandas versions
df = pd.DataFrame({
    'A': pd.to_datetime(['2017-01-11'] * 3 + ['2017-01-20'] * 3),
    'B': pd.to_datetime(['2017-01-20'] * 3 + ['2017-01-25'] * 3),
    'C': pd.to_datetime(['2017-01-25'] * 3 + ['2017-01-31'] * 3),
}, index=dates)
print (df)
                    A          B          C
2017-01-08 2017-01-11 2017-01-20 2017-01-25
2017-01-09 2017-01-11 2017-01-20 2017-01-25
2017-01-10 2017-01-11 2017-01-20 2017-01-25
2017-01-11 2017-01-20 2017-01-25 2017-01-31
2017-01-12 2017-01-20 2017-01-25 2017-01-31
2017-01-13 2017-01-20 2017-01-25 2017-01-31

print (df.dtypes)
A    datetime64[ns]
B    datetime64[ns]
C    datetime64[ns]
dtype: object

df = df.sub(df.index.to_series(),axis=0)
print (df)
                A       B       C
2017-01-08 3 days 12 days 17 days
2017-01-09 2 days 11 days 16 days
2017-01-10 1 days 10 days 15 days
2017-01-11 9 days 14 days 20 days
2017-01-12 8 days 13 days 19 days
2017-01-13 7 days 12 days 18 days
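
If the explicit loop is the sticking point, the conversion can also be written with apply. The following is only a sketch under the assumption that the columns hold datetime.date objects (object dtype), as in the question's mock data; the names converted, deltas and day_counts are made up for illustration. It also turns the timedeltas into plain day counts, which is the shape of output the question asks for.

```python
import datetime

import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)

# Mock data with object-dtype columns holding datetime.date values.
df = pd.DataFrame(
    {
        "A": [datetime.date(2017, 1, 11)] * 3 + [datetime.date(2017, 1, 20)] * 3,
        "B": [datetime.date(2017, 1, 20)] * 3 + [datetime.date(2017, 1, 25)] * 3,
        "C": [datetime.date(2017, 1, 25)] * 3 + [datetime.date(2017, 1, 31)] * 3,
    },
    index=dates,
)

# Convert every column to datetime64 without writing the loop by hand.
converted = df.apply(pd.to_datetime)

# Subtract the index from each column, aligning on the row labels.
deltas = converted.sub(converted.index.to_series(), axis=0)

# Reduce the timedeltas to whole days.
day_counts = deltas.apply(lambda col: col.dt.days)
print(day_counts)
```

Note that apply still visits the columns one at a time internally, so this is shorter to write rather than fundamentally faster.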

Comments

While your solution is helpful, it still loops by column, which I am specifically trying to avoid. Is there not another way?
The loop through columns is only to convert to datetime. If they already are datetime, then you can skip that part.
The loop is only for converting to datetime. It is better to use the parse_dates parameter of read_csv.
Right, so my data is already in datetime format, but this solution produces the operands error... "operands could not be broadcast together with shapes"
Both, really. I've provided an edit to the original question with some mock code.
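
For readers loading the data from a file, the parse_dates suggestion above might look roughly like this. It is a sketch, not the asker's actual setup: the CSV text, the column name date and the variable names are all assumptions, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV contents; a real file path could be passed to read_csv instead.
csv_text = """date,A,B,C
2017-01-08,2017-01-11,2017-01-20,2017-01-25
2017-01-09,2017-01-11,2017-01-20,2017-01-25
2017-01-10,2017-01-11,2017-01-20,2017-01-25
2017-01-11,2017-01-20,2017-01-25,2017-01-31
2017-01-12,2017-01-20,2017-01-25,2017-01-31
2017-01-13,2017-01-20,2017-01-25,2017-01-31
"""

# Parse the index column and all data columns as dates while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="date",
    parse_dates=["date", "A", "B", "C"],
)

# With datetime columns and a DatetimeIndex, the row-wise subtraction works directly.
diff = df.sub(df.index.to_series(), axis=0)
print(diff)
```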

I think a more explicit and elegant way to do this is to simply use apply.

df = df.apply(pd.to_datetime, axis="columns") # just to make sure values are datetime
df.apply(lambda x: x - df.index.to_series(), axis="rows")
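
As a quick check that the apply version gives the same result as sub, here is a small self-contained sketch. The mock data uses date strings and the names via_apply and via_sub are invented for the comparison; they are not part of either answer.

```python
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)

# Mock data as date strings; converted to datetime64 below.
df = pd.DataFrame(
    {
        "A": ["2017-01-11"] * 3 + ["2017-01-20"] * 3,
        "B": ["2017-01-20"] * 3 + ["2017-01-25"] * 3,
        "C": ["2017-01-25"] * 3 + ["2017-01-31"] * 3,
    },
    index=dates,
)
df = df.apply(pd.to_datetime)

# apply passes each column as a Series; subtraction aligns on the shared index.
via_apply = df.apply(lambda col: col - df.index.to_series())

# The same computation expressed with DataFrame.sub.
via_sub = df.sub(df.index.to_series(), axis=0)

# Element-wise comparison of the two results.
print((via_apply == via_sub).all().all())
```

Both forms rely on index alignment, so they should agree as long as the column values and the index are datetime-like.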
