I'm trying to compare two csv files and then print the product_id field on one line from each csv file. This is the code in question. Something to note is that in the two csv files, the fields are not in the same order.

import csv
import sys

f1 = sys.argv[1]
f2 = sys.argv[2]
num_matches = 0

with open(f1, 'rb') as f:
    csv_readerf = csv.reader(f)
    csv_readerf.next()
    with open(f2, 'rb') as n:
        csv_readern = csv.reader(n)
        csv_readern.next()
        for row in csv_readerf:
            a_name  = row[0].replace(" ", "").lower()   #not used, can be omitted
            a_id    = row[1]
            a_post  = row[2]
            a_rev   = row[3]
            a_loc   = row[4]                            #not used, can be omitted
            a_desc  = row[5].replace(" ", "").lower()   #remove all whitespaces for uniformity
            a_ovr   = row[6]
            a_cmf   = row[7]
            a_sty   = row[8]
            a_siz   = row[9]
            a_arc   = row[10]
            a_wid   = row[11]
            a_url   = row[12]
            for rowP in csv_readern:
                p_name  = rowP[10].replace(" ", "").lower()
                p_id    = rowP[6]

                temp    = rowP[11].split(" ")[0:3]      #disregard time stamp
                p_post  = (" ").join(temp)

                p_rev   = rowP[7]
                if p_rev == "":
                    p_rev = "Anonymous"
                p_desc  = rowP[1].replace(" ", "").replace("\r", "").replace("\n", "").lower()
                p_ovr   = rowP[4]
                p_cmf   = rowP[3]
                p_sty   = rowP[0]
                p_siz   = rowP[8]
                p_arc   = rowP[9]
                if p_arc:
                    p_arc = p_arc.split(" ")[0]             #for arch we only want the first word
                p_wid   = rowP[5]
                p_url   = rowP[2]

                print a_id, p_id

The problem is that the output, which I dumped into a .txt file, does not contain all of the product_ids from f1. I know this for sure because f1 is a test file I created, and I deliberately put several products with different ids in it.

Another thing to note is that I tried looping through each csv in a separate script, and each worked correctly, printing every product_id as expected. Why is it that when I nest the for loops, iterating through the first file seems to be cut short? What could be the problem? The test files I made are small, so they should fit in memory perfectly fine.

  • Why are you putting the second for loop inside the first? This could be the source of your problem, as after this loop runs once, it will read to the end of the file, so on subsequent runs the inner loop will not run. Commented Sep 7, 2012 at 4:45
  • Read the smaller file into a dictionary, then loop over that while reading the bigger file. Commented Sep 7, 2012 at 4:54
  • Okay, I'll try that. Thanks. But I don't really understand @BrenBarn's comment. Doesn't it only read the inner file's end of file, but not the outside file's? So shouldn't it continue? Commented Sep 7, 2012 at 8:12

1 Answer

The error, as BrenBarn mentioned, is in your loop construction:

for row in csv_readerf:
    ...
    for rowP in csv_readern:
        # will only work in the first iteration of the outer loop
        # since the csv reader hits eof
        ...

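So you end up comparing only the first row of csv_readerf against all rows of csv_readern. For every later outer row the inner reader is already exhausted, so the inner loop body never runs and nothing is printed.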
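To see the exhaustion in isolation, here is a minimal sketch. It uses Python 3 syntax and a made-up in-memory CSV (`io.StringIO`) in place of a real file, so the data and variable names are assumptions for illustration only:

```python
import csv
import io

# Hypothetical in-memory data standing in for the second CSV file.
data = io.StringIO("id\n1\n2\n")
reader = csv.reader(data)
next(reader)  # skip the header row

first_pass = [row for row in reader]   # consumes the reader up to end of file
second_pass = [row for row in reader]  # the reader is exhausted, nothing is left

print(first_pass)   # [['1'], ['2']]
print(second_pass)  # []
```

A csv reader is a one-shot iterator: once it has been looped over, looping over it again yields no rows unless the underlying file is reopened or rewound.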

You can prevent that by opening the csv file of the inner loop inside the outer loop, so that a fresh reader is created for every outer row:

for row in csv_readerf:
    ...
    with open(f2, 'rb') as n:
        csv_readern = csv.reader(n)
        csv_readern.next()  # skip the header row
        for rowP in csv_readern:
            # runs on every iteration of the outer loop,
            # because f2 is reopened each time
            ...

Alternatively, read the inner file into a list first and loop over that. A list can be iterated as many times as you like, and the file is read only once.
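A rough sketch of the list approach, written in Python 3 syntax (the question's code is Python 2). The helper names `load_rows` and `id_pairs` are hypothetical, and the in-memory sample data is invented. With the real files you would pass open file objects and the column indexes from the question (1 and 6):

```python
import csv
import io

def load_rows(handle):
    """Read all data rows from an open CSV file object, skipping the header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header; the default avoids an error on an empty file
    return list(reader)

def id_pairs(outer_rows, inner_rows, outer_col, inner_col):
    """Pair the id of every outer row with the id of every inner row."""
    pairs = []
    for row in outer_rows:
        for row_p in inner_rows:  # a list can be iterated again for each outer row
            pairs.append((row[outer_col], row_p[inner_col]))
    return pairs

# Invented sample data; the id column sits at a different index in each file.
f1_data = io.StringIO("name,product_id\nx,A1\ny,B2\n")
f2_data = io.StringIO("product_id\nP1\nP2\n")

pairs = id_pairs(load_rows(f1_data), load_rows(f2_data), outer_col=1, inner_col=0)
for a_id, p_id in pairs:
    print(a_id, p_id)
```

Because both files are loaded up front, every id from the first file is paired with every id from the second, which is what the nested loops in the question were meant to do.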

This is a very common beginner's error, which often occurs with deep nesting. Splitting the work into functions may help.
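As one possible way to split the work into functions, the sketch below also follows the suggestion from the comments to read one file into a dictionary. It is Python 3, all function names are hypothetical, the sample data is invented, and it assumes the rows should be matched on the product id, which the question does not state explicitly:

```python
import csv
import io

def data_rows(handle):
    """Return every row after the header from an open CSV file object."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, if there is one
    return list(reader)

def index_by_column(rows, key_col):
    """Group rows by the value found in column key_col."""
    index = {}
    for row in rows:
        index.setdefault(row[key_col], []).append(row)
    return index

def matching_pairs(outer_rows, inner_index, outer_key_col):
    """Yield (outer_row, inner_row) for rows that share the same key value."""
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key_col], []):
            yield outer, inner

# Invented two-column files; the id sits in a different column in each.
file_a = io.StringIO("name,product_id\nshoe,A1\nboot,B2\n")
file_b = io.StringIO("product_id,style\nB2,9\nC3,7\n")

rows_a = data_rows(file_a)
index_b = index_by_column(data_rows(file_b), key_col=0)
pairs = list(matching_pairs(rows_a, index_b, outer_key_col=1))
print(pairs)  # [(['boot', 'B2'], ['B2', '9'])]
```

Each function does one thing, so there is no shared reader that can be exhausted halfway through, and the dictionary lookup replaces the inner loop over the whole second file.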
