Python: Comparing specific columns in two csv files

Question

Say that I have two CSV files (file1 and file2) with contents as shown below:

file1:

fred,43,Male,"23,45",blue,"1, bedrock avenue"

file2:

fred,39,Male,"23,45",blue,"1, bedrock avenue"

I would like to compare these two CSV records to see if columns 0, 2, 3, 4, and 5 are the same. I don't care about column 1.

What's the most pythonic way of doing this?

EDIT:

Some example code would be appreciated.

EDIT2:

Please note the embedded commas need to be handled correctly.

@ulidtko Yes, appreciate that. Didn't want to be prescriptive though in case there was another solution I didn't know about.
– coder999, Commented Jan 15, 2011 at 16:12

Elalfer · Accepted Answer · 2011-01-15 21:26:05Z

11

I suppose the best way is to use Python's csv library: http://docs.python.org/library/csv.html.

UPDATE (example added):

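import csv

# Python 3: open in text mode with newline='' so the csv module handles line endings
with open('data1.csv', newline='') as f1, open('data2.csv', newline='') as f2:
    # Read the first record from each file; quoted fields keep their embedded commas
    row1 = next(csv.reader(f1, delimiter=',', quotechar='"'))
    row2 = next(csv.reader(f2, delimiter=',', quotechar='"'))

# Compare column 0 and columns 2 onward; column 1 is ignored
if row1[0] == row2[0] and row1[2:] == row2[2:]:
    print("eq")
else:
    print("different")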
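To illustrate why the csv module matters for the embedded commas mentioned in the question, here is a minimal sketch that parses the two sample records from in-memory strings instead of files. The variable names (`record1`, `naive_fields`, `same_except_age`) are only illustrative and are not part of the original answer.

```python
import csv
import io

# Sample records copied from the question.
record1 = 'fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'
record2 = 'fred,39,Male,"23,45",blue,"1, bedrock avenue"\n'

# Naive splitting on commas breaks the quoted fields apart.
naive_fields = record1.strip().split(',')

# csv.reader respects the quotes, so each record yields six fields.
row1 = next(csv.reader(io.StringIO(record1)))
row2 = next(csv.reader(io.StringIO(record2)))

# Compare column 0 and columns 2 onward; column 1 is ignored.
same_except_age = row1[0] == row2[0] and row1[2:] == row2[2:]
print(len(naive_fields), len(row1), same_except_age)
```

Splitting on commas produces eight pieces for this record, while csv.reader returns the six intended fields, so the index-based comparison only lines up in the second case.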

edited Jan 15, 2011 at 21:26

answered Jan 15, 2011 at 15:46

Elalfer


4 Comments

coder999 Over a year ago

Could you give an example please?

coder999 Over a year ago

@Elalfer I like this, but it doesn't compare col 0 does it?

Elalfer Over a year ago

@coder999 oh true, you asked to compare all fields but 1st. Updated example

John Machin Over a year ago

@Elalfer, @coder999: BUG: if (row1[0] == row2[0]) and (row[2:] == row[2:]): should be if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):

Community · Accepted Answer · 2020-06-20 09:12:55Z

>>> import csv
>>> csv1 = csv.reader(open("file1.csv", "r"))
>>> csv2 = csv.reader(open("file2.csv", "r"))
>>> while True:
...   try:
...     line1 = next(csv1)
...     line2 = next(csv2)
...     equal = (line1[0]==line2[0] and line1[2]==line2[2] and line1[3]==line2[3] and line1[4]==line2[4] and line1[5]==line2[5])
...     print(equal)
...   except StopIteration:
...     break
True

Update

3 years later, I think I'd rather write it this way.

import csv

interesting_cols = [0, 2, 3, 4, 5]

with open("file1.csv", 'r') as file1,\
     open("file2.csv", 'r') as file2:

    reader1, reader2 = csv.reader(file1), csv.reader(file2)

    for line1, line2 in zip(reader1, reader2):
        equal = all(x == y
            for n, (x, y) in enumerate(zip(line1, line2))
            if n in interesting_cols
        )
        print(equal)

I am new to this and it's really helpful for me. I achieved what I wanted, but instead of indexes, can we use column names for the comparison? If yes, then how? Could you please guide me?

@RaviK Yes, it's possible to use column names. Please see the official documentation and examples for Python's built-in csv module.
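For comparing by column name, one option in the csv module is csv.DictReader, which maps each row to a dictionary keyed by the header row. The sketch below assumes the files start with a header line; the header names (`name`, `age`, `sex`, `code`, `colour`, `address`) and the helper `rows_match` are invented for illustration, since the files in the question have no header.

```python
import csv
import io

# Hypothetical header row; the question's files do not include one.
HEADER = "name,age,sex,code,colour,address\n"
file1_text = HEADER + 'fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'
file2_text = HEADER + 'fred,39,Male,"23,45",blue,"1, bedrock avenue"\n'

# Columns to compare, by name; "age" is deliberately left out.
interesting_cols = ["name", "sex", "code", "colour", "address"]

def rows_match(row1, row2, cols):
    """Return True when both rows agree on every named column."""
    return all(row1[c] == row2[c] for c in cols)

# DictReader takes field names from the first line of each input.
reader1 = csv.DictReader(io.StringIO(file1_text))
reader2 = csv.DictReader(io.StringIO(file2_text))

results = [rows_match(r1, r2, interesting_cols) for r1, r2 in zip(reader1, reader2)]
print(results)
```

With real files, the io.StringIO objects would be replaced by files opened with open(path, newline='').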

Santiago Alessandri · Accepted Answer · 2011-01-15 15:57:02Z

1

I would read both records, eliminate column 1, and then compare what's left. (This works in Python 3.)

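import csv

# Read the first record from each file; csv.reader handles the quoted, embedded commas
file1 = csv.reader(open("file1.csv", "r"))
file2 = csv.reader(open("file2.csv", "r"))
r1 = next(file1)
r1.pop(1)  # drop column 1, which should be ignored
r2 = next(file2)
r2.pop(1)
# A bare return is only valid inside a function, so print the result instead
print(r1 == r2)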

edited Jan 15, 2011 at 15:57

answered Jan 15, 2011 at 15:48

Santiago Alessandri


2 Comments

coder999 Over a year ago

This wouldn't work because of the embedded commas in the values

ulidtko Over a year ago

You should have submitted another answer rather than completely rewriting this one.

Brenners Daniel · Accepted Answer · 2019-07-29 16:33:23Z

# Import the required module (only pandas is used below)

import pandas as pd

# Load the two CSV files

df_TrainSet = pd.read_csv('../data/ldp_TrainSet.csv')
df_DataSet = pd.read_csv('../data/ldp_DataSet.csv')

# First test: columns in df_TrainSet that are missing from df_DataSet
[c for c in df_TrainSet if c not in df_DataSet.columns]

# Second test: columns in df_DataSet that are missing from df_TrainSet
[c for c in df_DataSet if c not in df_TrainSet.columns]

This example checks whether each file's columns are also present in the other file.
