2

struggling to get to grips with update/merge/concat etc to achieve what I thought was a simple merge or update procedure. I have df (recorded values) and df2 (times series with 0s), and would like to update/replace the values in df2 with the corresponding recorded values of df, matching on date_time.

Example: df=

Date_Time   Perc
03/03/2010 00:05    1.0
03/03/2010 00:15    2.0

df2 =

  Date_Time Perc
03/03/2010 00:00    0.0
03/03/2010 00:05    0.0
03/03/2010 00:10    0.0
03/03/2010 00:15    0.0
03/03/2010 00:20    0.0

Where the result would return:

Date_Time   Perc
03/03/2010 00:00    0.0
03/03/2010 00:05    1.0
03/03/2010 00:10    0.0
03/03/2010 00:15    2.0
03/03/2010 00:20    0.0

I'm finding this frustrating as there is good info and examples on http://pandas.pydata.org/pandas-docs/stable/merging.html and a great SO question with multiple solutions on How to update values in a specific row in a Python Pandas DataFrame? but having tried several approaches, none work.

Tried so far: Re-indexing to Date_time and df2.update(df) with variations, multiple merging/join/concat variations, an adapted definition (below) with apply...and now wondering if I need to use iterrows (see below?). Any suggestions on pointers in the right direction would be appreciated...maybe I'm missing something fundamental in my approach...

def update_vals(row, data=data):
    if row.Date_Time == df.Date_Time:
        row.Perc = df.Perc
    return row

for index, row in df2['Date_Time'].iterrows():
    x = df2['Date_Time']
    for index, row in df['Date_Time'].iterrows():
        x2 = df['Date_Time']
        if x2 ==x:
            df2['Perc'] = df['Perc']

Thought this would work as well, but it results in (ValueError: cannot reindex from a duplicate axis)

df.set_index('Date_Time', inplace=True)
df2.set_index('Date_Time', inplace=True)
df2.update(df[['Perc']])
2
  • df2.update(df) works for me. Commented Nov 10, 2015 at 7:38
  • @hellpanderrr that's strange, when I run it my output looses the full df2 (see below 00:00 time stamp is gone) I'm tying to preserve the complete df2, and only change the matching times values of 'perc' : Date_Time Perc 0 03/03/2010 00:05 1.0 1 03/03/2010 00:15 2.0 2 03/03/2010 00:10 0.0 3 03/03/2010 00:15 0.0 Commented Nov 10, 2015 at 8:14

1 Answer 1

2

df2

                  Perc
Date_Time   
03/03/2010 00:00    0
03/03/2010 00:05    0
03/03/2010 00:10    0
03/03/2010 00:15    0
03/03/2010 00:20    0

df

                 Perc
Date_Time   
03/03/2010 00:05    1
03/03/2010 00:15    2

df2.update(df)

                 Perc
Date_Time   
03/03/2010 00:00    0
03/03/2010 00:05    1
03/03/2010 00:10    0
03/03/2010 00:15    2
03/03/2010 00:20    0
Sign up to request clarification or add additional context in comments.

1 Comment

Many thanks for the help...have to admit, found you were right, and that in my actual data set that is over 500k lines, there is a duplicate Date_time, that threw out the update process...but, learnt more, and maybe this will help someone else...many thanks..

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.