[Using Python 3] I'm very new to (Python) programming, but I'm writing a script that scans a folder for certain CSV files. I then want to read them all, append them together, and write the result to another CSV file.

Along the way, only rows whose value in a certain column matches a set of criteria should be kept.

All CSV files have the same columns and look something like this:

header1 header2 header3 header4 ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
string  float   string  float   ...
...     ...     ...     ...     ...

The code I'm working with right now is below; however, it keeps overwriting the data from the previous file. That makes sense to me, but I cannot figure out how to get it working.

Code:

import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files have the format like '2013-06-04'. To be able to use this script during the whole of 2013, the glob is set to search for the pattern '2013-*.csv'
files = [f for f in glob.glob('2013-*.csv')]

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# List of 'Campaign' (header4) values to keep when writing the output
campaign_names = ['string1', 'string2', 'string3', 'string4']

for f in files:
    with open(f, 'r') as f_in:
        dict_reader = csv.DictReader(f_in)

        with open(outfile, 'w') as f_out:
            dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
            dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)

I also tried something like readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files))) and iterating over the readers, but then I cannot figure out how to work with the headers. (I get an error saying that itertools.chain() does not have the fieldnames attribute.)

Any help is very much appreciated!

1 Answer

You re-open the output file in 'w' mode for every input file, which truncates it each time, so only the rows from the last file survive.

Open outfile once, before your loop starts. For the first file you read, write the header and the rows. For the rest of the files, just write the rows.

Something like:

with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            if not dict_writer:
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
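
For reference, the same idea can be wrapped in a self-contained function. This is only a sketch under some assumptions: the function name merge_filtered and its parameters are hypothetical, the files are opened with newline='' as the csv module documentation recommends, the input files are processed in sorted order, and empty input files are skipped.

```python
import csv
import glob


def merge_filtered(pattern, outfile, column, allowed):
    """Write rows whose `column` value is in `allowed` to `outfile`.

    Hypothetical helper: reads every file matching `pattern`, writes the
    header once, and returns the number of data rows written.
    """
    allowed = set(allowed)
    written = 0
    # Open the output once, outside the loop, so it is never truncated again.
    with open(outfile, 'w', newline='') as f_out:
        writer = None
        for path in sorted(glob.glob(pattern)):
            with open(path, 'r', newline='') as f_in:
                reader = csv.DictReader(f_in)
                if reader.fieldnames is None:
                    continue  # empty file: no header, nothing to copy
                if writer is None:
                    # First non-empty file: create the writer and emit the header.
                    writer = csv.DictWriter(
                        f_out,
                        fieldnames=reader.fieldnames,
                        lineterminator='\n',
                    )
                    writer.writeheader()
                for row in reader:
                    if row[column] in allowed:
                        writer.writerow(row)
                        written += 1
    return written
```

With the names from the question, the call might look like merge_filtered('2013-*.csv', outfile, 'Campaign', campaign_names). Make sure the output file name does not itself match the glob pattern, or it would be picked up as an input.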

4 Comments

Hey Dan, thanks for this answer! It worked like a charm :) Still trying to figure out exactly what you're doing and why this actually worked. For instance, why do you set dict_writer = None after opening the outfile? Also, why is the flow statement if not dict_writer: necessary? Thanks again!
In Python, None is falsy: it evaluates to False in a boolean context. So here, if not dict_writer behaves the same as if dict_writer is None. Basically, this ensures that you create the dict_writer only once.
Oh, and dict_writer = None doesn't need to be done after opening the file - it could be before it as well. What is important is that it happens outside the loops. To get a better feel for Python, I suggest you work through diveinto.org/python3
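To make the point about None concrete, here is a small sketch (the helper name describe is made up for illustration). It shows that not value and value is None agree for None and for an ordinary object such as a csv.DictWriter, but differ for other falsy values like an empty list, which is why the two tests are interchangeable in this particular loop but not in general.

```python
import csv
import io


def describe(value):
    """Return (is_falsy, is_none) for a value. Hypothetical helper."""
    return (not value, value is None)


# None is falsy and is None.
print(describe(None))

# An empty list is falsy too, but it is not None.
print(describe([]))

# A DictWriter instance defines neither __bool__ nor __len__, so it is truthy.
writer = csv.DictWriter(io.StringIO(), fieldnames=['a'])
print(describe(writer))
```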
Hey Dan, thanks for getting back to me, and for that tip. I'm actually going through Dive Into Python 3 now. I hope it will be fruitful.
