9

Say that I have two CSV files (file1 and file2) with contents as shown below:

file1:

fred,43,Male,"23,45",blue,"1, bedrock avenue"

file2:

fred,39,Male,"23,45",blue,"1, bedrock avenue"

I would like to compare these two CSV records to see if columns 0, 2, 3, 4, and 5 are the same. I don't care about column 1.

What's the most pythonic way of doing this?

EDIT:

Some example code would be appreciated.

EDIT2:

Please note the embedded commas need to be handled correctly.

2
  • About EDIT2: just use import csv and you'll be fine. Commented Jan 15, 2011 at 16:02
  • @ulidtko Yes, appreciate that. Didn't want to be prescriptive though in case there was another solution I didn't know about. Commented Jan 15, 2011 at 16:12

4 Answers 4

11

I suppose the best way is to use Python's csv library: http://docs.python.org/library/csv.html.

UPDATE (example added):

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('data1.csv', newline='') as f1, open('data2.csv', newline='') as f2:
    row1 = next(csv.reader(f1, delimiter=',', quotechar='"'))
    row2 = next(csv.reader(f2, delimiter=',', quotechar='"'))

# Compare column 0 and columns 2 onward; column 1 is ignored
if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):
    print("eq")
else:
    print("different")
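To see how the csv module copes with the embedded commas, here is a minimal sketch that parses the two sample records from the question. It uses io.StringIO in place of real files, and the helper name same_ignoring_col1 is purely illustrative.

```python
import csv
import io

# Sample records from the question; StringIO stands in for the real files.
file1 = io.StringIO('fred,43,Male,"23,45",blue,"1, bedrock avenue"\n')
file2 = io.StringIO('fred,39,Male,"23,45",blue,"1, bedrock avenue"\n')


def same_ignoring_col1(row_a, row_b):
    """Hypothetical helper: compare column 0 and columns 2 onward."""
    return row_a[0] == row_b[0] and row_a[2:] == row_b[2:]


row1 = next(csv.reader(file1))
row2 = next(csv.reader(file2))

# Quoted fields stay whole, so each record has six columns.
print(len(row1), row1[3], row1[5])
print(same_ignoring_col1(row1, row2))
```

A naive line.split(',') would break "23,45" and "1, bedrock avenue" into extra fields, which is why the reader is worth using even for a single record.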

4 Comments

Could you give an example please?
@Elalfer I like this, but it doesn't compare col 0 does it?
@coder999 oh true, you asked to compare all fields but 1st. Updated example
@Elafer, @coder999: BUG if (row1[0] == row2[0]) and (row[2:] == row[2:]): should be if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):
7
>>> import csv
>>> csv1 = csv.reader(open("file1.csv", "r"))
>>> csv2 = csv.reader(open("file2.csv", "r"))
>>> while True:
...   try:
...     line1 = next(csv1)
...     line2 = next(csv2)
...     equal = (line1[0]==line2[0] and line1[2]==line2[2] and line1[3]==line2[3] and line1[4]==line2[4] and line1[5]==line2[5])
...     print(equal)
...   except StopIteration:
...     break
True

Update

3 years later, I think I'd rather write it this way.

import csv

interesting_cols = [0, 2, 3, 4, 5]

with open("file1.csv", 'r') as file1,\
     open("file2.csv", 'r') as file2:

    reader1, reader2 = csv.reader(file1), csv.reader(file2)

    for line1, line2 in zip(reader1, reader2):
        equal = all(x == y
            for n, (x, y) in enumerate(zip(line1, line2))
            if n in interesting_cols
        )
        print(equal)
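One thing to keep in mind: zip stops at the shorter input, so extra rows in one file are silently skipped. If that matters, a possible variation (a sketch, not the only way) uses itertools.zip_longest and treats a missing row as a mismatch. The function name compare_rows and the second sample row are made up for illustration, and the sketch assumes every row has at least six fields.

```python
import csv
import io
from itertools import zip_longest


def compare_rows(file_a, file_b, cols=(0, 2, 3, 4, 5)):
    """Yield True/False per row pair; a missing row counts as a mismatch."""
    reader_a, reader_b = csv.reader(file_a), csv.reader(file_b)
    for row_a, row_b in zip_longest(reader_a, reader_b):
        if row_a is None or row_b is None:
            yield False  # one file ran out of rows first
        else:
            # Assumes each row is long enough for every index in cols.
            yield [row_a[i] for i in cols] == [row_b[i] for i in cols]


# Illustrative data: the first file has one row more than the second.
a = io.StringIO(
    'fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'
    'wilma,40,Female,"1,2",red,"x"\n'
)
b = io.StringIO('fred,39,Male,"23,45",blue,"1, bedrock avenue"\n')

results = list(compare_rows(a, b))
print(results)
```

Here the first pair matches on the interesting columns and the unmatched second row is reported as a difference.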

2 Comments

I'm new to this and it's really helpful. I achieved what I wanted, but instead of indexes, can we use column names for the comparison? If yes, how? Could you please guide me?
@RaviK yes it's possible to use column names. Please see the official documentation & examples for Python's built-in csv module.
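For reference, a rough sketch of the column-name approach with csv.DictReader. The column names below are assumptions for illustration only: the sample files in the question have no header row, so the names are supplied through the fieldnames argument.

```python
import csv
import io

# Hypothetical column names; the question's files have no header row,
# so they are passed explicitly via fieldnames.
FIELDNAMES = ["name", "age", "sex", "codes", "colour", "address"]
COMPARE = ["name", "sex", "codes", "colour", "address"]  # everything but "age"

# StringIO stands in for the real files.
file1 = io.StringIO('fred,43,Male,"23,45",blue,"1, bedrock avenue"\n')
file2 = io.StringIO('fred,39,Male,"23,45",blue,"1, bedrock avenue"\n')

reader1 = csv.DictReader(file1, fieldnames=FIELDNAMES)
reader2 = csv.DictReader(file2, fieldnames=FIELDNAMES)

matches = []
for rec1, rec2 in zip(reader1, reader2):
    # Each record is a dict keyed by column name.
    matches.append(all(rec1[col] == rec2[col] for col in COMPARE))

print(matches)
```

If the files do start with a header row, fieldnames can be omitted and DictReader takes the names from the first row.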
1

I would read both records, eliminate column 1, and then compare what's left. (Works in Python 3.)

import csv
file1 = csv.reader(open("file1.csv", "r"))
file2 = csv.reader(open("file2.csv", "r"))
r1 = next(file1)
r1.pop(1)
r2 = next(file2)
r2.pop(1)
print(r1 == r2)

2 Comments

This wouldn't work because of the embedded commas in the values
You should have submitted another answer rather than completely rewriting this one.
1
# Include required modules

import pandas as pd

# Include required csv files

df_TrainSet = pd.read_csv('../data/ldp_TrainSet.csv')
df_DataSet = pd.read_csv('../data/ldp_DataSet.csv')


# First test
[c for c in df_TrainSet if c not in df_DataSet.columns]

# Second test
[c for c in df_DataSet if c not in df_TrainSet.columns]

With this example I check whether the columns of each CSV file are also present in the other file.
