
My version of pandas is:

pd.__version__
'0.25.3'

I have two dataframes (samples below) in which most of the columns are the same. I am trying to find the common columns and create a new dataframe that contains, for each common column, the difference between the two dataframes' values.

A sample from the c_r dataframe:

Comp_name        EOL - CL Per $      Access - CL Per $      Total Impact - CL Per $
Nike             -0.02               -0.39                    -0.01
Nike             -0.02               -0.39                    -0.02
Adidas           -0.02               -0.39                    -0.01
Adidas           -0.02               -0.39                    -0.02

A sample from the x dataframe:

Comp_name        EOL - CL Per $      Access - CL Per $      Total Impact - CL Per $
Nike             -0.02               -0.39                    0.05
Nike             -0.02               -0.39                    0.03
Adidas           -0.02               -0.39                    0.08
Adidas           -0.02               -0.39                    0.08

Desired new_df (the same column names with " - Diff" appended, holding c_r minus x), i.e.:

EOL - CL Per $ - Diff      Access - CL Per $ - Diff      Total Impact - CL Per $ - Diff
-0.00                      -0.00                         -0.06
-0.00                      -0.00                         -0.05
-0.00                      -0.00                         -0.09
-0.00                      -0.00                         -0.10
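A minimal sketch of one way to get this table, assuming the rows of c_r and x line up one-to-one in the same order (as they do in the samples) and using only the sample rows shown above: instead of a nested loop, take the intersection of the column names, keep the numeric ones, and subtract them in a single vectorized step. The " - Diff" suffix is taken from the desired headers.

```python
import pandas as pd

# Sample data copied from the question (only the rows shown there).
c_r = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [-0.01, -0.02, -0.01, -0.02],
})
x = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [0.05, 0.03, 0.08, 0.08],
})

# Columns present in both frames, keeping only the numeric ones.
common = c_r.columns.intersection(x.columns)
numeric = [
    c for c in common
    if pd.api.types.is_numeric_dtype(c_r[c]) and pd.api.types.is_numeric_dtype(x[c])
]

# Subtract all common numeric columns at once (rows are matched by index label),
# then append " - Diff" to every column name.
new_df = (c_r[numeric] - x[numeric]).add_suffix(" - Diff")
print(new_df.round(2))
```

Because the subtraction works on whole sub-frames, it scales to 50 common columns without listing any of them by hand.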

Here is what I have tried; the comment marks the line where I think the error is:

new_df = pd.DataFrame()

for i in c_r:
    for j in x:
        if c_r[i].dtype != object and x[j].dtype != object:
            if i == j:
                ## THE ISSUE IS IN THE LINE BELOW ##
                new_df[i+'-Diff'] = (c_r[i]) - (x[j])

        else:
            pass

but for some reason I only get back one row of values.
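Without the full data it is hard to say exactly why only one row comes back, but one thing worth checking is index alignment: c_r[i] - x[j] matches rows by index label, not by position. If the two frames carry different row labels (for example after filtering or sorting one of them), only the overlapping labels get real values and the rest become NaN. The Series below are hypothetical and only illustrate that behaviour.

```python
import pandas as pd

# Two hypothetical columns whose row labels only partly overlap.
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([0.5, 0.5, 0.5], index=[2, 3, 4])

# Arithmetic between Series aligns on index labels, not on position:
# only label 2 exists in both, so every other label comes back as NaN.
aligned = a - b
print(aligned)

# If rows are meant to be matched by position, drop the labels first.
positional = a.reset_index(drop=True) - b.reset_index(drop=True)
print(positional)
```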

Any idea why my code does not work? And how can I build the resulting dataframe so that it also includes the initial Comp_name column?

Thanks all.
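To keep Comp_name in the result, one option is to compute the differences on the numeric columns only and then insert the name column back at the front. This sketch assumes that row i of c_r describes the same record as row i of x and that Comp_name is identical in both (true for the sample rows); it uses select_dtypes to pick the numeric columns and to_numpy() so the rows are paired by position.

```python
import pandas as pd

# Sample data copied from the question (only the rows shown there).
c_r = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [-0.01, -0.02, -0.01, -0.02],
})
x = pd.DataFrame({
    "Comp_name": ["Nike", "Nike", "Adidas", "Adidas"],
    "EOL - CL Per $": [-0.02, -0.02, -0.02, -0.02],
    "Access - CL Per $": [-0.39, -0.39, -0.39, -0.39],
    "Total Impact - CL Per $": [0.05, 0.03, 0.08, 0.08],
})

# Numeric columns shared by both frames (Comp_name is text, so it drops out).
num_cols = (c_r.select_dtypes(include="number").columns
            .intersection(x.select_dtypes(include="number").columns))

# .to_numpy() drops x's row labels, so rows are paired by position.
new_df = (c_r[num_cols] - x[num_cols].to_numpy()).add_suffix(" - Diff")

# Put the company name back in as the first column.
new_df.insert(0, "Comp_name", c_r["Comp_name"])
print(new_df.round(2))
```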

2 Answers


Have you tried using intersection (or symmetric_difference, for the columns that differ)? For example:

a = dataframe2.columns.intersection(dataframe1.columns)
print(a)
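A small illustration of what the two Index methods return, using hypothetical frames that share two columns and each have one column of their own:

```python
import pandas as pd

# Hypothetical frames: two shared columns, one extra column in each.
dataframe1 = pd.DataFrame({"Comp_name": ["Nike"], "EOL - CL Per $": [-0.02], "Only in 1": [1]})
dataframe2 = pd.DataFrame({"Comp_name": ["Nike"], "EOL - CL Per $": [-0.02], "Only in 2": [2]})

# Column labels present in both frames.
shared = dataframe2.columns.intersection(dataframe1.columns)

# Column labels present in exactly one of the two frames.
different = dataframe2.columns.symmetric_difference(dataframe1.columns)

print(list(shared))
print(list(different))
```

The shared labels can then be used to select the same columns from both frames before subtracting.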

5 Comments

Yes, I did try, but how does that help me? My problem is adding the new values to the new dataframe, not looping over the columns.
In that case you can use pandas.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
Doesn't this answer the question: stackoverflow.com/questions/21231834/…
I don't think that answers my question. I am not just talking about merging the two files. I am talking about subtracting the common columns of the two files and storing the results in a new dataframe that contains all the common columns. Thank you though.
Sorry, at first I thought concatenation was the problem. I have now posted a small code example.
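The concat suggestion in the comments can be extended from a single column to whole frames: with keys=, each frame sits under its own top-level column label, and selecting a key gives that frame back, so the two blocks can be subtracted column by column. This is a sketch with hypothetical frames df1 and df2 that share the columns c and d.

```python
import pandas as pd

# Hypothetical frames with the same column names.
df1 = pd.DataFrame({"c": [-0.01, -0.02], "d": [1.0, 2.0]})
df2 = pd.DataFrame({"c": [0.05, 0.03], "d": [0.5, 0.5]})

# keys= builds MultiIndex columns such as ('df1', 'c') and ('df2', 'c').
combined = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Selecting a top-level key returns that frame's columns; subtraction
# then pairs columns by name.
diff = (combined["df1"] - combined["df2"]).add_suffix(" - Diff")
print(diff)
```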
I think I understand the problem now. Here is a small example:

    import pandas as pd

    d = {'col1': [-0.02, -0.02, -0.02, -0.02], 'col2': [-0.39, -0.39, -0.39, -0.39], 'col3': [-0.01, -0.02, -0.01, -0.02]}
    d1 = {'col1': [-0.02, -0.02, -0.02, -0.02], 'col2': [-0.39, -0.39, -0.39, -0.39], 'col3': [0.05, 0.03, 0.06, 0.04]}

    df = pd.DataFrame(data=d)
    df2 = pd.DataFrame(data=d1)

    # Make sure every column is numeric (unparseable values become NaN).
    df = df.apply(pd.to_numeric, errors='coerce')
    df2 = df2.apply(pd.to_numeric, errors='coerce')

    print(df)
    print(df2)

    # Subtract each pair of matching columns; each result is a Series.
    col1 = df.col1 - df2.col1
    col2 = df.col2 - df2.col2
    col3 = df.col3 - df2.col3

    # Put the three difference Series side by side in one DataFrame.
    dfnew = pd.concat([col1, col2, col3], axis=1)

    print(type(col1))
    print(dfnew)

1 Comment

Thanks for your answer. However, I have many columns in the two files (around 50), so typing them out manually won't work for me. That is why I would like to do everything in a for loop.
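If a loop is preferred, the answer's example can be generalized without naming any column: a single loop over one frame's columns, with a membership test in place of the inner loop, collects each difference in a dict, and the DataFrame is built once at the end. This is a sketch reusing the answer's sample data; the " - Diff" suffix follows the question's desired headers.

```python
import pandas as pd

df = pd.DataFrame({"col1": [-0.02] * 4, "col2": [-0.39] * 4,
                   "col3": [-0.01, -0.02, -0.01, -0.02]})
df2 = pd.DataFrame({"col1": [-0.02] * 4, "col2": [-0.39] * 4,
                    "col3": [0.05, 0.03, 0.06, 0.04]})

diffs = {}
for col in df.columns:
    # Only handle columns that exist in both frames and are numeric in both.
    if (col in df2.columns
            and pd.api.types.is_numeric_dtype(df[col])
            and pd.api.types.is_numeric_dtype(df2[col])):
        diffs[col + " - Diff"] = df[col] - df2[col]

# Build the result once from the collected Series instead of growing it in the loop.
dfnew = pd.DataFrame(diffs)
print(dfnew)
```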
