0

I am trying to do an index match in 2 data set but having trouble. Here is an example of what I am trying to do. I want to fill in column "a", "b", "c" that are empty in df with the df2 data where "Machine", "Year", and "Order Type".

The first dataframe lets call this one "df"

    Machine Year    Cost    a   b   c   
0   abc     2014    5500    nan nan nan                                     
1   abc     2015    89      nan nan nan                                 
2   abc     2016    600     nan nan nan                                 
3   abc     2017    250     nan nan nan                                 
4   abc     2018    2100    nan nan nan                                         
5   abc     2019    590     nan nan nan                                         
6   dcb     2020    3000    nan nan nan                                         
7   dcb     2021    100     nan nan nan                                     

The second data set is called "df2"

    Order Type  Machine Year    Total Count
0   a           abc      2014   1
1   b           abc      2014   1
2   c           abc      2014   2
4   c           dcb      2015   4
3   a           abc      2016   3

Final Output is:

    Machine Year    Cost    a   b   c   
0   abc     2014    5500    1   1   2                                       
1   abc     2015    89      nan nan nan                                 
2   abc     2016    600     3 nan nan                                   
3   abc     2017    250     nan nan nan                                 
4   abc     2018    2100    nan nan nan                                         
5   abc     2019    590     1   nan nan                                         
6   dcb     2014    3000    nan nan 4                                           
7   dcb     2015    100     nan nan nan     

Thanks for help in advance

1 Answer 1

1

Consider DataFrame.pivot to reshape df2 to merge with df1.

final_df = (
    df1.reindex(["Machine", "Type", "Cost"], axis=True)
       .merge(
            df.pivot(
                index=["Machine", "Year"], 
                columns="Order Type", 
                values="Total Count"
            ).reset_index(),
            on = ["Machine", "Year"]
       )
)
Sign up to request clarification or add additional context in comments.

3 Comments

join with on set might be better since it avoids a copy of df2 and join on index is faster than merge: df1[["Machine", "Type", "Cost"]].join(df2.pivot(index=['Machine', 'Year'], columns='Order Type', values='Total Count'), on=['Machine', 'Year'])
@HenryEcker, indeed that is another alternative. And merge also supports joining on index and columns. Newcomers to pandas get confused with indexes.
Thank you! @Parfait

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.