0

I am trying to select the rows between two dates and set a flag column on the row with the maximum price (and another on the row with the minimum price) within that range. It would also be helpful if someone could point out whether this is the fastest way to do it.

I have tried this code, but it creates a new row instead of changing the existing row.

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    new_df = dfIn.loc[mask]
    if len(new_df) == 0:
        return dfIn

    dfIn.loc[dfIn.loc[mask].price.max(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask].price.min(), 'lowest'] = 1
    return dfIn

date                   price  highest  lowest
2000-05-01 04:00:00    4.439730             0            0
2000-05-02 04:00:00    4.209830             0            0
2000-05-03 04:00:00    4.109380             0            0
2000-05-04 04:00:00    3.953130             0            0
2000-05-05 04:00:00    4.040180             0            0
2000-05-08 04:00:00    3.933040             0            0
2000-05-09 04:00:00    3.765630             0            0
2000-05-10 04:00:00    3.546880             0            0
2000-05-11 04:00:00    3.671880             0            0
2000-05-12 04:00:00    3.843750             0            0
2000-05-15 04:00:00    3.607150             0            0
2000-05-16 04:00:00    3.774560             0            0
2000-05-17 04:00:00    3.620540             0            0
2000-05-18 04:00:00    3.598220             0            0
2000-05-19 04:00:00    3.357150             0            0
2000-05-22 04:00:00    3.212060             0            0
2000-05-23 04:00:00    3.064740             0            0
2000-05-24 04:00:00    3.131700             0            0
2000-05-25 04:00:00    3.116630             0            0
2000-05-26 04:00:00    3.084830             0            0
2000-05-30 04:00:00    3.127230             0            0
2000-05-31 04:00:00    3.000000             0            0
2000-06-01 04:00:00    3.183040             0            0
2000-06-02 04:00:00    3.305810             0            0
.....
2000-06-30 04:00:00    3.261160             0            0

The desired outcome is that the existing rows are updated as below:

df = updateRecord(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')

df output should be:

2000-05-01 04:00:00    4.439730             1            0
2000-05-31 04:00:00    3.000000             0            1

My current code creates a new row instead of updating the existing row.

2
  • Your outcome is not clear. Could you include a dataframe with desired outcome? Commented May 14, 2019 at 18:24
  • @run-out I have updated the results Commented May 14, 2019 at 18:30

3 Answers

1

I am sure this is not the best way.

def updateRecord(dfIn, startDate, endDate):
    # .copy() so the flags are written to an independent frame, not a slice of dfIn
    df_o = dfIn.loc[(dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)].copy()
    if len(df_o) == 0:
        return dfIn
    # What is supposed to happen if len(df_o) > 0?
    # idxmax/idxmin return the index label, which is what .at expects
    idx = df_o['price'].idxmax()
    df_o.at[idx, 'highest'] = 1

    idx_l = df_o['price'].idxmin()
    df_o.at[idx_l, 'lowest'] = 1

    return df_o

Hope it works.


6 Comments

If the length is greater than zero, it means I can update, i.e. there is a min and/or a max.
Doesn't work, because it updates the min and max for all of df, not for the selected df.
Is the selected df the one defined in the first line? i.e. df_o in the code above?
Not sure what you mean. The resulting update is happening to the whole dfIn, meaning it is not restricted to the rows between the dates.
Do you only want the highest and lowest rows to be in the df?
0

This works, but it returns only the selected rows. If you want the same result with the entire DataFrame returned, I can do that too.

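import numpy as np

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    # .copy() avoids SettingWithCopyWarning when assigning columns on the selection
    new_df = dfIn.loc[mask].copy()
    if len(new_df) == 0:
        return dfIn
    # Flag every row whose price equals the max (or min) within the selection
    new_df['highest'] = np.where(new_df.price == new_df.price.max(), 1, 0)
    new_df['lowest'] = np.where(new_df.price == new_df.price.min(), 1, 0)
    return new_df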
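
For the "entire DataFrame" variant mentioned above, one possible sketch follows. It assumes `date` is a datetime64 column and that rows outside the range should keep whatever flags they already have. The name `update_record_full` and the four sample rows (taken from the question's data) are only for illustration.

```python
import numpy as np
import pandas as pd


def update_record_full(df_in, start_date, end_date):
    """Flag the max/min price inside [start_date, end_date] and keep all rows."""
    out = df_in.copy()  # leave the caller's frame untouched
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    mask = (out["date"] >= start) & (out["date"] <= end)
    if not mask.any():
        return out
    window = out.loc[mask, "price"]
    # Only rows inside the window can be flagged; ties are all flagged.
    out["highest"] = np.where(mask & (out["price"] == window.max()), 1, out["highest"])
    out["lowest"] = np.where(mask & (out["price"] == window.min()), 1, out["lowest"])
    return out


df = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-05-01 04:00:00",
        "2000-05-02 04:00:00",
        "2000-05-31 04:00:00",
        "2000-06-01 04:00:00",
    ]),
    "price": [4.43973, 4.20983, 3.0, 3.18304],
    "highest": 0,
    "lowest": 0,
})
result = update_record_full(df, "2000-05-01 04:00:00", "2000-05-31 04:00:00")
print(result)
```

Because this uses a boolean comparison rather than a single label, several rows get flagged if they share the same extreme price. Whether that is wanted depends on the data.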


0

I think you're looking for this.

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    if not mask.any():
        return dfIn

    # You want the idxmax/idxmin for the given mask, not the entire DF, as you stated.
    # idxmax/idxmin return the row label that .loc needs; in recent pandas,
    # argmax/argmin return an integer position instead.
    dfIn.loc[dfIn.loc[mask, 'price'].idxmax(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask, 'price'].idxmin(), 'lowest'] = 1

    return dfIn
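
In case it helps to see why the code in the question adds a row: `.loc[value, column] = 1` treats the value as a row label, and since no row is labelled with the price, pandas enlarges the frame instead of updating an existing row. Here is a small sketch using three rows from the question's data with a default integer index (that index is an assumption, as the question does not show it).

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-05-01 04:00:00",
        "2000-05-02 04:00:00",
        "2000-05-31 04:00:00",
    ]),
    "price": [4.43973, 4.20983, 3.0],
    "highest": 0,
    "lowest": 0,
})

# The price value is used as a row label; no such label exists,
# so .loc appends a new row rather than updating an existing one.
broken = df.copy()
broken.loc[broken["price"].max(), "highest"] = 1
print(len(df), len(broken))

# idxmax/idxmin return the index label of the extreme row,
# so the existing row is updated in place.
fixed = df.copy()
fixed.loc[fixed["price"].idxmax(), "highest"] = 1
fixed.loc[fixed["price"].idxmin(), "lowest"] = 1
print(fixed)
```

Note that `idxmax`/`idxmin` pick only the first row when several rows tie for the extreme price.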

