I have multiple CSVs; however, I'm having difficulty merging them as they all have the same headers. Here's an example.

CSV 1:

ID,COUNT
1,3037
2,394
3,141
5,352
7,31

CSV 2:

ID, COUNT
1,375
2,1178
3,1238
5,2907
6,231
7,2469

CSV 3:

ID, COUNT
1,675
2,7178
3,8238
6,431
7,6469

I need to combine all the CSV files on the ID and create a new CSV with an additional column for each file's COUNT column.

I've been testing it with 2 CSVs but I'm still not getting the right output.

with open('csv1.csv', 'r') as checkfile: #CSV Data is pulled from
    checkfile_result = {record['ID']: record for record in csv.DictReader(checkfile)}


with   open('csv2.csv', 'r') as infile:
#infile_result = {addCount['COUNT']: addCount for addCount in csv.Dictreader(infile)}
with open('Result.csv', 'w') as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, reader.fieldnames + ['COUNT'])
    writer.writeheader()
    for item in reader:
        record = checkfile_result.get(item['ID'], None)
        if record:
            item['ID'] = record['COUNT']  # ???
            item['COUNT'] = record['COUNT']
        else:
            item['COUNT'] = None
            item['COUNT'] = None
        writer.writerow(item)

However, with the above code, I get three columns, but the data from the first CSV is populated in both COUNT columns. For example:

Result.csv (note that ID 6, which doesn't exist in the first CSV, is skipped):

ID, COUNT, COUNT
1, 3037, 3037
2, 394, 394
3,141, 141
5,352, 352
7,31, 31

The result should be:

ID, COUNT, COUNT
1,3037, 375
2,394, 1178
3,141, 1238
5,352, 2907
6, ,231
7,31, 2469

Etc etc

Any help will be greatly appreciated.

  • Can you hold all the csv files in memory? Commented May 30, 2013 at 12:03
  • Technically, yes. When I define the alias I can save the file into a separate dictionary instead of nesting everything that needs to be done. reader = csv.reader(open('test.csv')) result = {} for row in reader: key = row[0] if key in result: # implement your duplicate row handling here pass result[key] = row[1:] print result Commented May 30, 2013 at 15:46

1 Answer

This works:

import csv

def read_csv(fobj):
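    # skipinitialspace=True lets headers written as 'ID, COUNT' still yield the key 'COUNT'
    reader = csv.DictReader(fobj, delimiter=',', skipinitialspace=True)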
    return {line['ID']: line['COUNT'] for line in reader}


with open('csv1.csv') as csv1, open('csv2.csv') as csv2, \
     open('csv3.csv') as csv3, open('out.csv', 'w') as out:
    data = [read_csv(fobj) for fobj in [csv1, csv2, csv3]]
    all_keys = sorted(set(data[0]).union(data[1]).union(data[2]))
    out.write('ID, COUNT, COUNT, COUNT\n')
    for key in all_keys:
        counts = (entry.get(key, '') for entry in data)
        out.write('{}, {}, {}, {}\n'.format(key, *tuple(counts)))

The content of the output file:

ID, COUNT, COUNT, COUNT
1, 3037, 375, 675
2, 394, 1178, 7178
3, 141, 1238, 8238
5, 352, 2907, 
6, , 231, 431
7, 31, 2469, 6469

The Details

The function read_csv returns a dictionary with the IDs as keys and the counts as values. We will use this function to read all three inputs. For example, for csv1.csv:

with open('csv1.csv') as csv1:
    print(read_csv(csv1))

we get this result:

{'1': '3037', '3': '141', '2': '394', '5': '352', '7': '31'}

We need every ID that appears in any of the files. One way is to convert the dictionary keys to sets and use union to find the unique ones. We also sort them:

all_keys = sorted(set(data[0]).union(data[1]).union(data[2]))

['1', '2', '3', '5', '6', '7']
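The chained .union() calls are tied to exactly three inputs, and sorted() compares the keys as strings. As a minimal sketch, assuming the IDs are always integers, the same step can be written for any number of dictionaries and sorted numerically. The sample dictionaries here are hypothetical stand-ins for the output of read_csv:

```python
# Hypothetical stand-ins for the dictionaries returned by read_csv.
data = [
    {'1': '3037', '2': '394', '10': '5'},
    {'1': '375', '6': '231'},
    {'2': '7178', '6': '431'},
]

# set().union(*data) takes any number of iterables; iterating a dict yields its keys.
# key=int sorts numerically, so '10' lands after '6' instead of before '2'.
all_keys = sorted(set().union(*data), key=int)
print(all_keys)
```

With single-digit IDs like the ones in the question, the string sort and the numeric sort give the same order, so this only matters once IDs reach 10 or more.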

In the loop over all keys, we retrieve the count using entry.get(key, ''). If a file does not contain the key, we get an empty string. That is why the output file shows just commas and no values in places where no value was found in the input. We use a generator expression so we don't have to re-type everything three times:

counts = (entry.get(key, '') for entry in data)

This is what the generator yields for the key '1':

tuple(counts)
('3037', '375', '675')
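One thing to keep in mind when inspecting a generator this way: it can be consumed only once. A small sketch, with hypothetical one-entry dictionaries standing in for data:

```python
# Hypothetical stand-ins for the three dictionaries in data.
data = [{'1': '3037'}, {'1': '375'}, {'1': '675'}]
key = '1'

counts = (entry.get(key, '') for entry in data)
first = tuple(counts)   # consumes the generator
second = tuple(counts)  # the generator is now exhausted

print(first)
print(second)
```

The second call produces an empty tuple, so in the loop the generator has to be created fresh for every key, as the code above does.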

Finally, we write to our output file. The * converts a tuple like this ('3037', '375', '675') into three arguments, i.e. .format() is called like this .format(key, '3037', '375', '675'):

out.write('{}, {}, {}, {}\n'.format(key, *tuple(counts)))
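As a possible variation rather than a replacement for the code above, the output can be written with csv.writer so the number of input files is not fixed by the format string. This is a sketch under a few assumptions: the names read_counts and merge_counts are made up for illustration, the IDs are integers, and io.StringIO objects stand in for shortened versions of the question's files so the example runs without touching disk:

```python
import csv
import io


def read_counts(fobj):
    """Map each ID to its COUNT for one open CSV file."""
    # skipinitialspace=True tolerates headers written as 'ID, COUNT'.
    reader = csv.DictReader(fobj, skipinitialspace=True)
    return {row['ID']: row['COUNT'] for row in reader}


def merge_counts(inputs, out):
    """Write one row per ID with one COUNT column per input file."""
    data = [read_counts(fobj) for fobj in inputs]
    all_keys = sorted(set().union(*data), key=int)  # assumes integer IDs
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['ID'] + ['COUNT'] * len(data))
    for key in all_keys:
        # Missing IDs become empty fields, as in the original answer.
        writer.writerow([key] + [entry.get(key, '') for entry in data])


# In-memory stand-ins for two of the question's files (shortened).
csv1 = io.StringIO('ID,COUNT\n1,3037\n5,352\n')
csv2 = io.StringIO('ID, COUNT\n1,375\n6,231\n')
out = io.StringIO()
merge_counts([csv1, csv2], out)
print(out.getvalue())
```

With real files, the same function could be called with objects returned by open(), passing newline='' as the csv module documentation recommends. Note that csv.writer emits no space after each comma, unlike the format string used above.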