How to compare columns in two different .csv files in Python?

Question

import pandas as pd
A=pd.read_csv("C:/Users/amulya/Desktop/graves lab/main_now.csv", index_col=False, header=None)
DATA1=pd.DataFrame(A)
DATA1[0]
B=pd.read_csv("C:/Users/amulya/Desktop/graves lab/words.csv", index_col=False, header=None) 
DATA2=pd.DataFrame(B)
DATA2[0]
for xrow in range(1, len(DATA1)):
    for yrow in range(1, len(DATA2)):
        if DATA2 == DATA1:
            print(DATA1[3])

"Column 1 of the DATA1 file contains the numbers 1-3000, and column 1 of DATA2 contains 465 random numbers. I want to look up these numbers in the DATA1 file and print the rest of the columns for the matching rows."

ALollz · Accepted Answer · 2018-04-06 15:53:11Z

You can use isin to check whether each value in col1 of DATA1 also appears in col1 of DATA2, and then slice DATA1 by the resulting boolean Series.

import pandas as pd
df1 = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9],
                    'col2': [1,3,5,7,9,11,13,15,17]})
df2 = pd.DataFrame({'col1': [1, 101, 6, 9, 4]})

We now have two DataFrames, df1 and df2. You can select the first column of df1 by its name with either df1['col1'] or, equivalently, df1.col1.

df1.col1
#0    1
#1    2
#2    3
#3    4
#4    5
#5    6
#6    7
#7    8
#8    9
#Name: col1, dtype: int64

The condition you want is whether each value in df1.col1 appears in the first column of df2. This is what the isin method does. The syntax reads as you would expect: it checks 'whether df1.col1 is in df2.col1' and returns a boolean (True/False) Series.

df1.col1.isin(df2.col1)
#0     True
#1    False
#2    False
#3     True
#4    False
#5     True
#6    False
#7    False
#8     True
#Name: col1, dtype: bool

When you then slice df1 by this boolean Series, it returns only the rows where the value is True, in this case the rows at indices 0, 3, 5 and 8. All columns are returned, because you are only selecting rows.

df1[df1.col1.isin(df2.col1)]
#   col1  col2
#0     1     1
#3     4     7
#5     6    11
#8     9    17
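
As a rough sketch of the same idea, the mask can be kept in a variable so it can be reused, for example to count the matches or, applied in the other direction and inverted with ~, to list the values of df2 that never appear in df1. The names mask, matched and missing are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'col2': [1, 3, 5, 7, 9, 11, 13, 15, 17]})
df2 = pd.DataFrame({'col1': [1, 101, 6, 9, 4]})

# Boolean Series: True where a value of df1.col1 also occurs in df2.col1
mask = df1['col1'].isin(df2['col1'])

# Rows of df1 whose col1 value was found in df2 (all columns are kept)
matched = df1[mask]

# Values of df2.col1 that do not occur anywhere in df1.col1
missing = df2.loc[~df2['col1'].isin(df1['col1']), 'col1'].tolist()

print(int(mask.sum()))  # number of matching rows
print(missing)          # values from df2 with no match in df1
```

Checking the unmatched values can be a quick way to spot lookups that silently found nothing.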

edited Apr 6, 2018 at 15:53

answered Apr 6, 2018 at 4:03

ALollz


6 Comments

amulya b n Over a year ago

That's great! But both my files are .csv files; DATA1 has 103 columns and 2999 rows, and DATA2 has 1 column and 465 rows. With the above solution, do we have to mention all the column names?

amulya b n Over a year ago

I am a beginner in Python, so I am not clear how to go about it. Could you please elaborate on the solution?

ALollz Over a year ago

No, not at all! df1.col1.isin(df2.col1) returns a single boolean Series that is just True or False, indicating whether each value was found anywhere in the first column of DATA2. When you then slice the first DataFrame with df1[...], it returns all columns, but only the rows where the condition is True.

ALollz Over a year ago

Replace df1 with DATA1 and df2 with DATA2. The only other thing you need to specify is the name of the first column in each DataFrame, so df1.col1 becomes DATA1.whatever_your_column_is_named, and likewise for df2.
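
A minimal sketch of that substitution, assuming the files are read with header=None as in the question. In that case pandas labels the columns with the integers 0, 1, 2, ..., so the first column is DATA1[0] rather than a named attribute. The io.StringIO objects and their contents are made-up stand-ins for the real files; actual code would pass the file paths to read_csv instead.

```python
import io
import pandas as pd

# Made-up stand-ins for main_now.csv and words.csv
main_csv = io.StringIO("1,a,x\n2,b,y\n3,c,z\n4,d,w\n")
words_csv = io.StringIO("3\n1\n7\n")

DATA1 = pd.read_csv(main_csv, header=None)
DATA2 = pd.read_csv(words_csv, header=None)

# With header=None the columns are labelled 0, 1, 2, ...
# Keep the rows of DATA1 whose first column occurs in DATA2's first column
result = DATA1[DATA1[0].isin(DATA2[0])]
print(result)
```

No other column needs to be named: the mask only selects rows, so every column of DATA1 comes along.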

amulya b n Over a year ago

Worked! Thanks for the help. I also wanted to know how to write the output back to a .csv file?
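
The thread does not answer this follow-up. One possible approach, sketched here as an assumption rather than as the answerer's recommendation, is DataFrame.to_csv. The io.StringIO buffer stands in for a file path, and any output file name you pass instead would be your own choice.

```python
import io
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'col2': [1, 3, 5, 7, 9, 11, 13, 15, 17]})
df2 = pd.DataFrame({'col1': [1, 101, 6, 9, 4]})

result = df1[df1['col1'].isin(df2['col1'])]

# In-memory stand-in for an output file; pass a path string to write to disk
buffer = io.StringIO()

# index=False leaves the row labels (0, 3, 5, 8) out of the output
result.to_csv(buffer, index=False)

csv_text = buffer.getvalue()
print(csv_text)
```

If the data was read with header=None, passing header=False to to_csv as well would keep the integer column labels out of the file.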
