This is what I have:

ID  PRICE   VOLUME  PRODUC      FROM_DATE   TO_DATE         NUMDAYS

1   20.5    15.0    prod_1      2018-08-06      2018-08-13      7
2   15.6    10.0    prod_2      2018-08-06      2018-08-08      2

This is what I want to achieve:

ID  PRICE   VOLUME  PRODUC      FROM_DATE   TO_DATE         NUMDAYS

1   20.5    15.0    prod_1      2018-08-06      2018-08-07      1
1   20.5    15.0    prod_1      2018-08-07      2018-08-08      1
1   20.5    15.0    prod_1      2018-08-08      2018-08-09      1
1   20.5    15.0    prod_1      2018-08-09      2018-08-10      1
1   20.5    15.0    prod_1      2018-08-10      2018-08-11      1
1   20.5    15.0    prod_1      2018-08-11      2018-08-12      1
1   20.5    15.0    prod_1      2018-08-12      2018-08-13      1
2   15.6    10.0    prod_2      2018-08-06      2018-08-07      1
2   15.6    10.0    prod_2      2018-08-07      2018-08-08      1

So I have a DataFrame with information about products, each of which affects a range of dates.

  • A product may affect anywhere from 1 to n days.
  • The volume applies to every date in that range.
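The rule can be illustrated for a single row: the interval from FROM_DATE to TO_DATE is cut into NUMDAYS consecutive one-day intervals, and each of those keeps the row's PRICE and VOLUME unchanged. A minimal sketch using `pd.date_range` for row ID 2, assuming FROM_DATE is a datetime and that TO_DATE equals FROM_DATE plus NUMDAYS days (true for both sample rows):

```python
import pandas as pd

# Row ID 2 from the table above: 2018-08-06 to 2018-08-08, NUMDAYS = 2
from_date = pd.Timestamp("2018-08-06")
num_days = 2

# Each output row starts on one of these consecutive days ...
starts = pd.date_range(from_date, periods=num_days, freq="D")
# ... and ends exactly one day later
ends = starts + pd.Timedelta(days=1)

for start, end in zip(starts, ends):
    print(start.date(), end.date())
# 2018-08-06 2018-08-07
# 2018-08-07 2018-08-08
```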

How could I do it?

I have tried looping over each row of the DataFrame:

df_results = pd.DataFrame(columns=df.columns)
for index, row in df.iterrows():
    day = row.to_dict()
    for i in range(0, int(row['numdays'])):
        day['NUMDAYS'] = 1
        day['FROM_DATE'] = row['FROM_DATE'] + datetime.timedelta(days=i)
        day['TO_DATE'] = day['FROM_DATE'] + datetime.timedelta(days=1)
        df_aux = pd.DataFrame.from_dict(day)
        df_results.append(df_aux)

However, I can't make it work.
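For reference, the loop itself can be made to work. As written it has a few problems: the column is looked up as `row['numdays']` although it is named `NUMDAYS` (a `KeyError`); `pd.DataFrame.from_dict(day)` raises a `ValueError` when every value in the dict is a scalar; and `DataFrame.append` returns a new frame instead of modifying `df_results` in place, so every result is discarded (the method was also removed in pandas 2.0). A minimal sketch of a corrected loop, assuming FROM_DATE and TO_DATE are already datetimes, collects plain dicts in a list (`rows` is a name introduced here) and builds the DataFrame once at the end:

```python
import datetime

import pandas as pd

# Sample input from the question, with FROM_DATE/TO_DATE already parsed as datetimes
df = pd.DataFrame({
    "ID": [1, 2],
    "PRICE": [20.5, 15.6],
    "VOLUME": [15.0, 10.0],
    "PRODUC": ["prod_1", "prod_2"],
    "FROM_DATE": pd.to_datetime(["2018-08-06", "2018-08-06"]),
    "TO_DATE": pd.to_datetime(["2018-08-13", "2018-08-08"]),
    "NUMDAYS": [7, 2],
})

rows = []  # one plain dict per output row
for _, row in df.iterrows():
    for i in range(int(row["NUMDAYS"])):  # the column is upper-case NUMDAYS
        day = row.to_dict()               # fresh dict each time, so earlier rows are not overwritten
        day["FROM_DATE"] = row["FROM_DATE"] + datetime.timedelta(days=i)
        day["TO_DATE"] = day["FROM_DATE"] + datetime.timedelta(days=1)
        day["NUMDAYS"] = 1
        rows.append(day)

# Build the result once instead of appending frame by frame
df_results = pd.DataFrame(rows, columns=df.columns)
print(df_results)
```

This still iterates row by row in Python, so on large frames it will be slower than a vectorized approach.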

1 Answer

In pandas it is best to avoid loops because they are slow. A vectorized approach:

import numpy as np
import pandas as pd

#convert columns to datetimes if necessary
df['FROM_DATE'] = pd.to_datetime(df['FROM_DATE'])
df['TO_DATE'] = pd.to_datetime(df['TO_DATE'])

#repeat each row NUMDAYS times
df = df.loc[np.repeat(df.index, df['NUMDAYS'])]

#shift FROM_DATE by a running day counter (0, 1, 2, ...) per ID
df['FROM_DATE'] += pd.to_timedelta(df.groupby('ID').cumcount(), unit='d')
#add one day
df['TO_DATE'] = df['FROM_DATE'] + pd.Timedelta(1, unit='d')
#assign scalar
df['NUMDAYS'] = 1
#create default unique index
df = df.reset_index(drop=True)
print(df)
   ID  PRICE  VOLUME  PRODUC  FROM_DATE    TO_DATE  NUMDAYS
0   1   20.5    15.0  prod_1 2018-08-06 2018-08-07        1
1   1   20.5    15.0  prod_1 2018-08-07 2018-08-08        1
2   1   20.5    15.0  prod_1 2018-08-08 2018-08-09        1
3   1   20.5    15.0  prod_1 2018-08-09 2018-08-10        1
4   1   20.5    15.0  prod_1 2018-08-10 2018-08-11        1
5   1   20.5    15.0  prod_1 2018-08-11 2018-08-12        1
6   1   20.5    15.0  prod_1 2018-08-12 2018-08-13        1
7   2   15.6    10.0  prod_2 2018-08-06 2018-08-07        1
8   2   15.6    10.0  prod_2 2018-08-07 2018-08-08        1
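One caveat: `df.groupby('ID').cumcount()` numbers the repeated rows per ID, so it assumes each ID occurs on only one input row. If the same ID can appear on several rows (for example, one product with two separate date ranges), the day offsets of the later rows keep counting from the earlier ones. A minimal sketch of a variant that counts per original row instead, by grouping on the repeated index before resetting it (`split_into_days` is a hypothetical helper name, not part of pandas):

```python
import pandas as pd


def split_into_days(df: pd.DataFrame) -> pd.DataFrame:
    """Expand every row into NUMDAYS one-day rows (hypothetical helper)."""
    # Repeat each row NUMDAYS times; the copies keep their original index label
    out = df.loc[df.index.repeat(df["NUMDAYS"])]
    # Day offsets 0, 1, ..., NUMDAYS - 1, counted per original row rather than per ID
    offset = out.groupby(level=0).cumcount().to_numpy()
    out = out.reset_index(drop=True)
    out["FROM_DATE"] = out["FROM_DATE"] + pd.to_timedelta(offset, unit="D")
    out["TO_DATE"] = out["FROM_DATE"] + pd.Timedelta(days=1)
    out["NUMDAYS"] = 1
    return out


# Sample input from the question, with the date columns parsed as datetimes
df = pd.DataFrame({
    "ID": [1, 2],
    "PRICE": [20.5, 15.6],
    "VOLUME": [15.0, 10.0],
    "PRODUC": ["prod_1", "prod_2"],
    "FROM_DATE": pd.to_datetime(["2018-08-06", "2018-08-06"]),
    "TO_DATE": pd.to_datetime(["2018-08-13", "2018-08-08"]),
    "NUMDAYS": [7, 2],
})
print(split_into_days(df))
```

Converting the counter with `.to_numpy()` before `reset_index` means the offsets are added by position, so no index alignment is involved.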