python / pandas - Find common columns between two dataframes, and create another one with same columns showing their difference

Question

My version of pandas is:

pd.__version__
'0.25.3'

I have two dataframes, below is a sample, with the majority of the columns being the same across the two dataframes. I am trying to find the common columns, and create a new dataframe with all the common columns that shows their difference in values.

A sample from c_r dataframe:

Comp_name        EOL - CL Per $      Access - CL Per $      Total Impact - CL Per $
Nike             -0.02               -0.39                    -0.01
Nike             -0.02               -0.39                    -0.02
Adidas           -0.02               -0.39                    -0.01
Adidas           -0.02               -0.39                    -0.02

A sample from x dataframe:

Comp_name        EOL - CL Per $      Access - CL Per $      Total Impact - CL Per $
Nike             -0.02               -0.39                    0.05
Nike             -0.02               -0.39                    0.03
Adidas           -0.02               -0.39                    0.08
Adidas           -0.02               -0.39                    0.08

new_df: (to have the same column names, and show the difference, i.e:)

EOL - CL Per $ - Diff      Access - CL Per $ - Diff      Total Impact - CL Per $ - Diff
-0.00                      -0.00                         -0.06
-0.00                      -0.00                         -0.05
-0.00                      -0.00                         -0.09
-0.00                      -0.00                         -0.10

I have tried - please see where the error is in the code:

new_df = pd.DataFrame()

for i in c_r:
    for j in x:
        if c_r[i].dtype != object and x[j].dtype != object:
            if i == j:
                ## THE ISSUE IS IN THE LINE BELOW ##
                new_df[i+'-Diff'] = (c_r[i]) - (x[j])
        
        else:
            pass

but for some reason I get back only 1 row of values.

Any ideas why my code does not work? How can I get the resulting dataframe, including the initial Comp_name column?

Thanks all.

sophocles · Accepted Answer · 2020-12-22 19:51:01Z

1

Have you tried using intersection (or symmetric_difference, for the columns that are not shared)? For example:

a = dataframe2.columns.intersection(dataframe1.columns)
print(a)
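
Building on the intersection idea, the following is a minimal sketch of how the shared numeric columns could be subtracted in one step, without a nested loop. The sample values are copied from the question; restricting to numeric columns with select_dtypes and the " - Diff" suffix are assumptions about what the asker wants, and the rows of both frames are assumed to be in the same order with the same index.

```python
import pandas as pd

# Toy frames mirroring the samples shown in the question.
c_r = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [-0.01, -0.02, -0.01, -0.02],
})
x = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [0.05, 0.03, 0.08, 0.08],
})

# Numeric columns of each frame, then the names present in both.
numeric_c_r = c_r.select_dtypes(include="number").columns
numeric_x = x.select_dtypes(include="number").columns
common = numeric_c_r.intersection(numeric_x)

# Subtraction aligns on column names and index labels.
diff = (c_r[common] - x[common]).add_suffix(" - Diff")

# Put the company name back as the first column (assumes identical row order).
diff.insert(0, "Comp_name", c_r["Comp_name"])

print(diff)
```

Because the subtraction is done on whole dataframes, the same lines work whether there are 3 shared columns or 50.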

edited Dec 22, 2020 at 19:51

sophocles

answered Dec 2, 2020 at 14:23

Satarupa

5 Comments

sophocles Over a year ago

Yeah I did try, but how does that help me? The problem is appending the new info in the new dataframe, not looping over the columns.

Satarupa Over a year ago

Then in that case you can use pandas.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])

Satarupa Over a year ago

Doesn't stackoverflow.com/questions/21231834/… answer the question?

sophocles Over a year ago

I don't think that answers my question. I am not just referring to merging the 2 files. I am referring to subtracting the common columns between the two files and storing the results in a new dataframe that contains all the common columns. Thank you though.

Satarupa Over a year ago

Sorry, at first I thought concatenation was the problem. I have added a small code example.

Satarupa · Accepted Answer · 2020-12-02 15:47:24Z

I think I understand the problem now. Here is a small example:

    import pandas as pd

    # Sample data standing in for the two dataframes
    d = {'col1': [-0.02, -0.02, -0.02, -0.02], 'col2': [-0.39, -0.39, -0.39, -0.39], 'col3': [-0.01, -0.02, -0.01, -0.02]}
    d1 = {'col1': [-0.02, -0.02, -0.02, -0.02], 'col2': [-0.39, -0.39, -0.39, -0.39], 'col3': [0.05, 0.03, 0.06, 0.04]}

    df = pd.DataFrame(data=d)
    df2 = pd.DataFrame(data=d1)

    # Coerce every column to numeric; values that cannot be parsed become NaN
    df = df.apply(pd.to_numeric, errors='coerce')
    df2 = df2.apply(pd.to_numeric, errors='coerce')

    print(df)
    print(df2)

    # Subtract each pair of matching columns
    col1 = df.col1 - df2.col1
    col2 = df.col2 - df2.col2
    col3 = df.col3 - df2.col3

    # Combine the per-column differences into one dataframe
    dfnew = pd.concat([col1, col2, col3], axis=1)

    print(type(col1))
    print(dfnew)

Thanks for your answer. However, I have many columns in the 2 files (around 50), so typing them out manually won't work for me. That is why I would like to do everything in a for loop.
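
For the loop-based version requested here, a single pass over the columns of one dataframe is enough; the nested loop in the question is not needed. The sketch below is one possible way to write it, not a confirmed fix: the question does not show the indexes of c_r and x, so the cause of the single-row result cannot be verified. One common cause is that pandas aligns on index labels when subtracting two Series, so frames with different indexes produce mostly NaN. The helper name diff_common_columns and its parameters are hypothetical, and it assumes both frames have the same number of rows in the same order.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def diff_common_columns(left, right, key="Comp_name", suffix=" - Diff"):
    """Hypothetical helper: subtract the numeric columns shared by two frames.

    Assumes rows are in the same order in both frames; the original indexes
    are dropped so that row 0 is compared with row 0, and so on.
    """
    left = left.reset_index(drop=True)
    right = right.reset_index(drop=True)

    # Start with the full row index so each assigned column keeps every row.
    out = pd.DataFrame(index=left.index)
    if key in left.columns:
        out[key] = left[key]

    for col in left.columns:
        # Skip the key column and anything missing from the other frame.
        if col == key or col not in right.columns:
            continue
        # Only subtract when both sides are numeric.
        if is_numeric_dtype(left[col]) and is_numeric_dtype(right[col]):
            out[col + suffix] = left[col] - right[col]
    return out


# Small demonstration with deliberately mismatched index labels.
a = pd.DataFrame({"Comp_name": ["Nike", "Adidas"], "Total": [-0.01, -0.02]})
b = pd.DataFrame({"Comp_name": ["Nike", "Adidas"], "Total": [0.05, 0.08]},
                 index=[10, 11])

print(a["Total"] - b["Total"])       # labels do not match, so every value is NaN
print(diff_common_columns(a, b))     # positional comparison, two rows of values
```

If the rows of the two dataframes should instead be matched by a key rather than by position, a merge on that key before subtracting would be the more appropriate design.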
