
I downloaded data from different sources into DataFrames and would like to merge them into one final DataFrame. Let's illustrate it with the following example:

dataframe 1 (already multi indexed columns)

index    stockA        stockB      ...
        O  L  H  C    O  L  H  C
1/1/19  10 15 20 17  35 30 39 37
2/1/19  ...          ...
...

dataframe 2 (non multi indexed columns)

index    stockA  stockB     
1/1/19    1.5     3.2 
2/1/19    ...     ...
...

I would like to merge both DataFrames and give a column name to the data from dataframe 2. The date index might not be the same in both DataFrames, so I might need to do an inner merge.

Expected output (multi indexed columns)

index    stockA                 stockB             ...
        O  L  H  C new_col    O  L  H  C  new_col
1/1/19  10 15 20 17 1.5       35 30 39 37  3.2
2/1/19       ...                     ...
...
  • Can you add the expected output for 1) and 2)? Why is an inner merge necessary with the sample data? It may be necessary to reduce the data to a minimal, complete, and verifiable example, especially for 2). Commented Mar 16, 2019 at 10:08
  • Hi, I removed 2) as I figured it out. I'd need inner merge because dates might not match between dataframes. But I can overcome this by reslicing the new dataframe with intersected dates between dataframes. Commented Mar 16, 2019 at 10:44

1 Answer

Sample data:

print (df1)
       stockA             stockB            
            O   L   H   C      O   L   H   C
1/1/19     10  15  20  17     35  30  39  37
2/1/19     12  13  26  27     31  50  29  17

print (df2)
        stockA  stockB
2/1/19     1.5     3.2
3/1/19     1.2     6.2
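
To make the walkthrough reproducible, the two frames printed above can be rebuilt roughly as follows. This is only a sketch: it assumes the dates are stored as plain strings in the index (as the printout suggests), and the variable name `cols` is introduced here for illustration.

```python
import pandas as pd

# Two-level column index for df1: stock name on top, O/L/H/C fields below.
cols = pd.MultiIndex.from_product([["stockA", "stockB"], ["O", "L", "H", "C"]])

# Values copied from the printed df1; dates are assumed to be strings.
df1 = pd.DataFrame(
    [[10, 15, 20, 17, 35, 30, 39, 37],
     [12, 13, 26, 27, 31, 50, 29, 17]],
    index=["1/1/19", "2/1/19"],
    columns=cols,
)

# df2 has flat (single-level) columns and only partly overlapping dates.
df2 = pd.DataFrame(
    {"stockA": [1.5, 1.2], "stockB": [3.2, 6.2]},
    index=["2/1/19", "3/1/19"],
)

print(df1)
print(df2)
```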

Convert the index of both DataFrames to datetimes if necessary:

df1.index = pd.to_datetime(df1.index, format='%d/%m/%y')
df2.index = pd.to_datetime(df2.index, format='%d/%m/%y')

Get the values common to both indices with Index.intersection:

idx = df1.index.intersection(df2.index)
print (idx)
DatetimeIndex(['2019-01-02'], dtype='datetime64[ns]', freq=None)

Create a MultiIndex for the columns of df2 with MultiIndex.from_product:

df2.columns = pd.MultiIndex.from_product([df2.columns, ['new']])
print (df2)
           stockA stockB
              new    new
2019-01-02    1.5    3.2
2019-01-03    1.2    6.2

Filter both DataFrames with DataFrame.loc, join them with DataFrame.join, and finally sort the column MultiIndex with DataFrame.sort_index:

df = df1.loc[idx].join(df2.loc[idx]).sort_index(level=0, axis=1)
print (df)
           stockA                  stockB                 
                C   H   L   O  new      C   H   L   O  new
2019-01-02     27  26  13  12  1.5     17  29  50  31  3.2
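
Note that sort_index orders the second column level alphabetically (C, H, L, O, new), while the question asks for O, L, H, C followed by the new column. A possible variant, sketched below, lets DataFrame.join do the date filtering with how="inner" and then restores the requested column order with DataFrame.reindex. The names `stocks`, `target` and `merged` are assumptions made for this illustration, and the sample frames are rebuilt so the block runs on its own.

```python
import pandas as pd

# Rebuild the sample frames (values taken from the printouts above).
cols = pd.MultiIndex.from_product([["stockA", "stockB"], ["O", "L", "H", "C"]])
df1 = pd.DataFrame(
    [[10, 15, 20, 17, 35, 30, 39, 37],
     [12, 13, 26, 27, 31, 50, 29, 17]],
    index=["1/1/19", "2/1/19"],
    columns=cols,
)
df2 = pd.DataFrame(
    {"stockA": [1.5, 1.2], "stockB": [3.2, 6.2]},
    index=["2/1/19", "3/1/19"],
)

# Parse the day/month/year strings so both indices are comparable.
df1.index = pd.to_datetime(df1.index, format="%d/%m/%y")
df2.index = pd.to_datetime(df2.index, format="%d/%m/%y")

# Give df2 a second column level so it lines up with df1's columns.
stocks = list(df2.columns)
df2.columns = pd.MultiIndex.from_product([stocks, ["new"]])

# An inner join keeps only the dates present in both frames.
merged = df1.join(df2, how="inner")

# Reorder columns explicitly: O, L, H, C, then the new column, per stock.
target = pd.MultiIndex.from_product([stocks, ["O", "L", "H", "C", "new"]])
merged = merged.reindex(columns=target)

print(merged)
```

With the sample data only 2019-01-02 appears in both frames, so the result has a single row.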