Python: appending/merging multiple csv files respecting headers and write to csv

Question

[Using Python 3] I'm very new to Python programming, but I'm writing a script that scans a folder for certain CSV files, reads them all, appends them together, and writes the result to another CSV file.

Along the way, only rows whose values in a certain column match a set of criteria should be written to the output.

All CSV files have the same columns and look something like this:

header1 header2 header3 header4 ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
...     ...     ...     ...     ...

The code I'm working with right now is below; however, it keeps overwriting the data from the previous file. That makes sense to me, but I cannot figure out how to get it working.

Code:

import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files have the format like '2013-06-04'. To be able to use this script during the whole of 2013, the glob is set to search for the pattern '2013-*.csv'
files = [f for f in glob.glob('2013-*.csv')]

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# List of 'Campaign' values to keep when writing output
campaign_names = ['string1', 'string2', 'string3', 'string4']

for f in files:
    with open(f, 'r') as f_in:
        dict_reader = csv.DictReader(f_in)

        with open(outfile, 'w') as f_out:
            dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
            dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)

I also tried something like readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files))) and then iterating over the readers, but then I cannot figure out how to work with the headers. (I get an error saying that itertools.chain does not have a fieldnames attribute.)

Any help is very much appreciated!

Dan Fuller · Accepted Answer · 2013-06-19 13:27:30Z

3

You keep re-opening the file and overwriting it.

Open outfile once, before your loops start. For the first file you read, write the header and the rows. For the rest of the files, just write the rows.

Something like:

with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            if not dict_writer:
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
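
For reference, the same idea can be wrapped in a small self-contained function. This is only a sketch under a few assumptions: the name merge_filtered and its parameters are made up for illustration, every input file is assumed to be non-empty and to share the same header, and the files are opened with newline='' as the csv module documentation recommends.

```python
import csv


def merge_filtered(files, outfile, column, allowed):
    """Write one header, then the matching rows from every input file.

    Hypothetical helper: the name and parameters are illustrative only.
    Assumes every file in `files` is non-empty and has the same header.
    """
    allowed = set(allowed)  # set membership tests are fast
    # Open the output exactly once, outside the loop, so nothing is overwritten.
    with open(outfile, 'w', newline='') as f_out:
        dict_writer = None  # created once, when the first header is known
        for name in files:
            with open(name, 'r', newline='') as f_in:
                dict_reader = csv.DictReader(f_in)
                if dict_writer is None:
                    dict_writer = csv.DictWriter(
                        f_out,
                        lineterminator='\n',
                        fieldnames=dict_reader.fieldnames,
                    )
                    dict_writer.writeheader()
                for row in dict_reader:
                    if row[column] in allowed:
                        dict_writer.writerow(row)


# Possible usage with the names from the question:
# merge_filtered(sorted(glob.glob('2013-*.csv')), outfile,
#                'Campaign', campaign_names)
```

glob.glob() returns files in arbitrary order, so sorting the list keeps the output in date order. If a later file has a column that the first file lacks, writerow() raises ValueError by default, which is probably what you want here.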

edited Jun 19, 2013 at 13:27

answered Jun 19, 2013 at 13:20

Dan Fuller


4 Comments

Matthijs Over a year ago

Hey Dan, thanks for this answer! It worked like a charm :) Still trying to figure out exactly what you're doing and why this actually worked. For instance, why do you set dict_writer = None after opening the outfile? Also, why is the flow statement if not dict_writer: necessary? Thanks again!

Dan Fuller Over a year ago

In Python, None is "falsy": it counts as False in a condition. A DictWriter object is always truthy, so in this code if not dict_writer has the same effect as if dict_writer is None. Basically, this ensures that you create the dict_writer only once.
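
A quick way to see this for yourself is the sketch below. The helper name describe is made up for illustration; it just reports the result of both tests for a given value.

```python
import csv
import io


def describe(value):
    """Return (result of `not value`, result of `value is None`)."""
    return (not value, value is None)


# None is falsy and is, of course, None.
print(describe(None))

# A DictWriter instance is truthy, so neither test fires.
writer = csv.DictWriter(io.StringIO(), fieldnames=['header1'])
print(describe(writer))

# The two tests are not interchangeable in general:
# an empty list is falsy but is not None.
print(describe([]))
```

So the two spellings only coincide when the variable can hold nothing but None or a truthy object, which is the case for dict_writer here. Writing is None states the intent more directly.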

Dan Fuller Over a year ago

Oh, and dict_writer = None doesn't need to be done after opening the file - it could be before it as well. What is important is that it happens outside the loops. To get a better feel for Python, I suggest you work through diveinto.org/python3

Matthijs Over a year ago

Hey Dan, thanks for getting back to me, and for that tip. I'm actually going through Dive Into Python 3 already. I hope it will be fruitful.
