Parsing a file in Python

Question

I'm trying to parse two files, one pipe-separated and one comma-separated, and, if a particular field matches between them, write a new entry to a third file.

Code as follows:

#! /usr/bin/python

fo = open("c-1.txt" , "r" )
for line in fo:
    #print line
    fields = line.split('|')
    src  = fields[0]

    f1 = open("Airport.txt", 'r')
    f2 = open("b.txt", "a")
    #with open('c.csv', 'r') as f1:
    #    line1 = f1.read()
    for line1 in f1:
        reader = line1.split(',')
        hi = False
        target = reader[0]
        if target == src and fields[1] == 'ZHT':
            print target
            hi = True
            f2.write(fields[0])
            f2.write("|")
            f2.write(fields[1])
            f2.write("|")
            f2.write(fields[2])
            f2.write("|")
            f2.write(fields[3])
            f2.write("|")
            f2.write(fields[4])
            f2.write("|")
            f2.write(fields[5])
            f2.write("|")
            f2.write(reader[2])
    if hi == False:
         f2.write(line)
    f2.close()
    f1.close()
fo.close()

The matching entry gets written twice to the new file. What could be the reason?

Btw it's better to use with open("c-1.txt", "r") as fo:, then you don't need to close the file explicitly.
– Pavel Šimerda, Commented Jun 1, 2015 at 10:35
Make sure you mark an answer as correct so people looking at this in the future know what helped!
– NDevox, Commented Jun 1, 2015 at 16:31

tobias_k · Accepted Answer · 2015-06-01 10:09:11Z

The problem seems to be that you reset hi to False in each iteration of the inner loop. Let's say the second line of Airport.txt matches, but the third does not. You set hi to True on the second line and write the merged entry, but then set hi back to False on the third. After the loop, hi is False, so the original line is written as well.
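To see the effect in isolation, here is a minimal sketch that replays the question's inner loop on in-memory lists instead of files. The function name buggy_merge and the sample rows are made up for illustration; only the control flow mirrors the original code. It is written to run on Python 3.

```python
def buggy_merge(line, airport_lines):
    """Replay the question's inner loop; return what would be written to b.txt."""
    written = []
    fields = line.split('|')
    src = fields[0]
    for line1 in airport_lines:
        reader = line1.split(',')
        hi = False  # bug: the flag is reset on every Airport.txt row
        if reader[0] == src and fields[1] == 'ZHT':
            hi = True
            written.append('|'.join(fields[0:6] + [reader[2]]))
    if hi == False:  # only reflects the LAST row that was scanned
        written.append(line)
    return written


# Hypothetical sample data: the match is on the second of three rows.
airports = ["AAA,x,Alpha", "BBB,y,Bravo", "CCC,z,Charlie"]
result = buggy_merge("BBB|ZHT|1|2|3|4", airports)
# result holds two entries: the merged record and the unchanged line
```

Because the last row scanned does not match, the flag ends up False and the unchanged line is appended after the merged one.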

Try like this:

hi = False  # set once per line of c-1.txt, before scanning Airport.txt
for line1 in f1:
    reader = line1.split(',')
    target = reader[0]
    if target == src and fields[1] == 'ZHT':
        hi = True
        f2.write(stuff)  # "stuff" stands for the merged entry
if not hi:
    f2.write(line)

Or, assuming that only one line will ever match, you could use for/else:

for line1 in f1:
    reader = line1.split(',')
    target = reader[0]
    if target == src and fields[1] == 'ZHT':
        f2.write(stuff)
        break
else:
    f2.write(line)
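For readers who have not met for/else before: the else branch runs only when the loop finishes without hitting break. A small sketch of that behaviour, using a hypothetical helper find_first and made-up rows:

```python
def find_first(rows, key):
    """Return the first row whose first column equals key, or None."""
    for row in rows:
        if row[0] == key:
            found = row
            break          # leaving via break skips the else branch
    else:
        found = None       # reached only if the loop never hit break
    return found


rows = [["AAA", "Alpha"], ["BBB", "Bravo"]]
hit = find_first(rows, "BBB")
miss = find_first(rows, "ZZZ")
```

Here hit is the matching row and miss is None, which corresponds to "write the merged entry" versus "write the original line" in the snippet above.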

Also note that you could probably replace that series of f2.write statements with this single one, joining the parts with |:

f2.write('|'.join(fields[0:6] + [reader[2]]))
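One detail to watch when joining: lines read from a file still carry their trailing newline, so if the sixth field is the last one on the line, the newline ends up in the middle of the merged entry. The sketch below strips it first and adds a single newline at the end. The helper name build_record and the sample strings are assumptions for illustration, not part of the original code.

```python
def build_record(line, airport_line):
    """Join the first six pipe-separated fields with the third comma-separated column."""
    fields = line.rstrip('\n').split('|')
    reader = airport_line.rstrip('\n').split(',')
    # fields[0:6] is already a list; reader[2] is wrapped so the lists can be concatenated.
    return '|'.join(fields[0:6] + [reader[2]]) + '\n'


record = build_record("BBB|ZHT|1|2|3|4\n", "BBB,y,Bravo\n")
```

The result can be passed straight to f2.write(record).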

NDevox · Accepted Answer · 2015-06-01 10:43:06Z

As mentioned already, you reset the flag within the loop, so you are liable to write multiple lines.

If there is definitely only one row that will match, it might be worth breaking out of the loop once that row has been found.

Finally, check your data to make sure there aren't identical matching rows.

Other than that, I have a few other suggestions to clean up your code and make it easier to debug:

1) Use the csv library.

2) If the files can be held in memory, it would be better to hold them in memory instead of constantly opening and closing them.

3) Use with to handle the files (I note you have already tried this in your commented-out code).

Something like the following should work.

#! /usr/bin/python

import csv

data_0 = {}

data_1 = {}

with open("c-1.txt" , "r" ) as fo, open("Airport.txt", "r") as f1:

    fo_reader = csv.reader(fo, delimiter="|")  

    f1_reader = csv.reader(f1) # default delimiter is ','

    for line in fo_reader:

        if line[1] == 'ZHT':
            try:  # Add to a list here in case keys are duplicated.
                data_0[line[0]].append(line)
            except KeyError:
                data_0[line[0]] = [line]

    for line in f1_reader:
        data_1[line[0]] = line[2]  # We only need the third column of this row to append to the data.

with open("b.txt", "a") as f2:

    writer = csv.writer(f2, delimiter="|")  # I would be tempted to not make this a pipe, but probably too late already if you've got a pre-made file.

    for key in data_0:
        if key in data_1:
            for row in data_0[key]:
                writer.writerow(row[:6] + [data_1[key]])  # first six columns, plus the value from the other file (wrapped in a list so the two can be concatenated).
        else:
            for row in data_0[key]:
                writer.writerow(row)

That should also avoid the extra rows, since there is no True/False flag to rely on.
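If the output should keep the order of c-1.txt and still copy non-matching lines unchanged, as the question's code intends, a single pass with a dictionary lookup is one option. The sketch below is a variant under stated assumptions rather than the method above: merge_rows is a hypothetical name, the sample data is made up, io.StringIO stands in for the real files so it can be run as-is on Python 3, and every Airport.txt row is assumed to have at least three columns.

```python
import csv
import io


def merge_rows(c_rows, airport_rows):
    """Yield output rows in input order, merging where the first fields match."""
    # Map the first column of each airport row to its third column.
    # If a key appears more than once, the last occurrence wins.
    lookup = {row[0]: row[2] for row in airport_rows}
    for row in c_rows:
        if row[1] == 'ZHT' and row[0] in lookup:
            yield row[:6] + [lookup[row[0]]]
        else:
            yield row  # no match: copy the row unchanged


# Made-up sample input standing in for c-1.txt and Airport.txt.
c_file = io.StringIO("AAA|XXX|1|2|3|4\nBBB|ZHT|1|2|3|4\n")
airport_file = io.StringIO("AAA,x,Alpha\nBBB,y,Bravo\n")

out = io.StringIO()
writer = csv.writer(out, delimiter="|", lineterminator="\n")
writer.writerows(merge_rows(csv.reader(c_file, delimiter="|"),
                            list(csv.reader(airport_file))))
merged_text = out.getvalue()
```

Each input row produces exactly one output row, so a match can never be written twice, and Airport.txt is read only once instead of once per line of c-1.txt.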
