Suppose I have the following 2 DataFrames:

  1. df1, whose index is ['NameID', 'Date']. For example, df1 can be a panel dataset of historical salaries of employees in a company.

  2. df2, whose index is ['NameID']. For example, df2 can be a dataset of employees' birthday and SSN.

What is the most efficient way to join df1 and df2 on the 'NameID' index level on a one-to-many (1:m) basis? As far as I can tell, DataFrame.join() doesn't allow a 1:m join. I know I can first call reset_index() on both df1 and df2 and then use DataFrame.merge() to join them on columns, but I suspect that is not efficient.

Code:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'NameID': ['A', 'B', 'C'] * 3,
                    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
                    'Salary': np.random.rand(9)
                    })
df1 = df1.set_index(['NameID', 'Date'])
df1

NameID  Date    Salary
A   20180801    0.831064
B   20180801    0.419464
C   20180801    0.239779
A   20180802    0.500048
B   20180802    0.317452
C   20180802    0.188051
A   20180803    0.076196
B   20180803    0.060435
C   20180803    0.297118

df2 = pd.DataFrame({'NameID':['A','B','C'],                   
                    'SSN':[999,888,777]
                   })
df2 = df2.set_index(['NameID'])
df2

NameID  SSN
A       999
B       888
C       777

The result I want to get is:

NameID  Date        Salary      SSN
A       20180801    0.831064    999
A       20180802    0.500048    999
A       20180803    0.076196    999
B       20180801    0.419464    888
B       20180802    0.317452    888
B       20180803    0.060435    888
C       20180801    0.239779    777
C       20180802    0.188051    777
C       20180803    0.297118    777
  • It would be more helpful if you created a minimal reproducible example. Commented Aug 10, 2018 at 15:22
  • Did you try merging on the index? For example: df3 = pd.merge(df1, df2, left_index=True, right_index=True) Commented Aug 10, 2018 at 15:38
  • Thanks warwick. I think I am all set now with your answer... I can't believe it is that simple. I was totally misled by the top answer in this post and thought merge could not be used to merge on indices... stackoverflow.com/questions/36538780/…. Please post your answer and I will select it. Commented Aug 10, 2018 at 15:42
  • No worries. Glad that it helped ! Commented Aug 10, 2018 at 15:45

3 Answers

You may want to use pd.merge with a left join:

df = pd.merge(df1, df2, on='NameID', how='left')
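
One caveat worth checking on your pandas version: the pandas merging guide notes that when `on` names an index level, index levels that are not used as keys (here 'Date') are dropped from the result. A version-independent sketch is the column-based route mentioned in the question: reset both indexes, merge on the 'NameID' column, then restore the MultiIndex. The fixed salary values below are an assumption that replaces np.random.rand(9) so the output is reproducible, and the name `merged` is only illustrative.

```python
import pandas as pd

# Fixed salaries stand in for np.random.rand(9) (assumption, for reproducibility).
df1 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'] * 3,
    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
    'Salary': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}).set_index(['NameID', 'Date'])

df2 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'],
    'SSN': [999, 888, 777],
}).set_index('NameID')

# Turn every index level into a column, merge on the shared 'NameID' column,
# then rebuild the (NameID, Date) index and sort it.
merged = (
    pd.merge(df1.reset_index(), df2.reset_index(), on='NameID', how='left')
      .set_index(['NameID', 'Date'])
      .sort_index()
)
print(merged)
```

This costs two index resets and one set_index, so it is likely slower than an index-based merge on large frames, but it keeps 'Date' without relying on how `on` treats index levels.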

See Michael B's answer, but in addition, you might also want to sort to get your requested output:

pd.merge(df1, df2, on='NameID', how='left').sort_values('SSN', ascending=False)

Answering on behalf of warwick12

df3 = pd.merge(df1, df2, left_index=True, right_index=True)
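
For reference, a self-contained sketch of this index-on-index merge. It assumes a pandas version that can align a single Index with the same-named level of a MultiIndex (the comments in this thread report that it works), and it uses fixed salary values in place of np.random.rand(9) so the result is reproducible. The final sort_index() is an addition, not part of the original answer, to get the NameID-then-Date ordering shown in the question.

```python
import pandas as pd

# Fixed salaries stand in for np.random.rand(9) (assumption, for reproducibility).
df1 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'] * 3,
    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
    'Salary': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}).set_index(['NameID', 'Date'])

df2 = pd.DataFrame({
    'NameID': ['A', 'B', 'C'],
    'SSN': [999, 888, 777],
}).set_index('NameID')

# Merge on the indexes; the shared level name 'NameID' acts as the join key,
# so each df2 row is repeated for every matching (NameID, Date) row of df1.
df3 = pd.merge(df1, df2, left_index=True, right_index=True)

# Order rows by NameID, then Date, as in the desired output.
df3 = df3.sort_index()
print(df3)
```

Because neither frame is reset to columns, this avoids the reset_index()/set_index() round trip the question was worried about.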
