Converting multiple Excel files in multiple folders into CSV

Question

The whole project is quite large, involving parsing the CSV files, validating them, exporting them to a database and such, but my code in question involves a subset of the project: converting multiple Excel files. The converter method can be seen below:

import glob
from openpyxl import load_workbook
import csv
import codecs
from zipfile import BadZipFile


def convert_excel(data, meta, meta_additional, sheet_number):
    convert_xlsx_to_csv(data, sheet_number)
    convert_xlsx_to_csv(meta, sheet_number)
    convert_xlsx_to_csv(meta_additional, sheet_number)


def convert_xlsx_to_csv(directory, sheet_number):
    excel_files = glob.glob(directory + '*.xlsx')
    for excel_file in excel_files:
        try:
            workbook = load_workbook(excel_file)
            sheets = workbook.get_sheet_names()
            sheet = sheets[sheet_number]
            out = excel_file.replace('.xlsx', '.csv')
            write_as_csv(out, workbook, sheet)
        # this exception will only occur if the excel file is corrupted.
        except BadZipFile:
            print("error in file: " + excel_file + " -- ignoring file")


def write_as_csv(output, workbook, sheet):
    # utf-8 is used, because the parsing parts rely on that and the database
    # itself is encoded as utf-8
    with codecs.open(output, 'w', 'utf-8') as converted_file:
        wr = csv.writer(converted_file, delimiter='\t')
        for row in workbook[sheet].rows:
            wr.writerow([cell.value for cell in row])

if __name__ == '__main__':
    convert_excel('testdir1/', 'testdir2/', 'testdir3/', 0)

And here is how it's called from another module (the 'main'-module):

import convert_excel
convert_excel.convert_excel(**params)

where sheet number and directories are read from a config file that looks like this:

[parseparams]
data=path/to/dir/
meta=path/to/meta/
meta_additional=path/to/meta_additional/
sheet_number=0

and the config is parsed like this:

from configparser import ConfigParser


def config(filename='config/cfg.ini', section):
    """
    parses the directory information
    :param filename: path to the config file
    :param section: which portion of the config will be parsed;
    'parseparams', for parsing, 'dbase' for database config. 
    :return: returns config information as a dictionary
    """
    parser = ConfigParser()
    parser.read(filename)
    params = {}

    if parser.has_section(section):
        parameters = parser.items(section)
        for param in parameters:
            params[param[0]] = param[1]
    else:
        raise Exception("no config was found!")
    return params

I decided to make calling the module a one-liner (and thus add one method, convert_excel() to the converter module), to keep the main module as short and readable as possible.

toolic · Accepted Answer · 2024-12-29 12:35:33Z

Overview

The functions and variables are well-named, and the comments are helpful.

DRY

In the convert_excel function, these 3 lines are repeated except for the directory argument (data, etc.):

convert_xlsx_to_csv(data, sheet_number)
convert_xlsx_to_csv(meta, sheet_number)
convert_xlsx_to_csv(meta_additional, sheet_number)

If you pass an array of directories into convert_excel, instead of 3 separate directories, then you can use a loop and call convert_xlsx_to_csv once in the loop body. This will make the function scalable to any number of directories.

f-string

Using f-strings can simplify code. For example, change:

        print("error in file: " + excel_file + " -- ignoring file")

to:

        print(f"error in file: {excel_file} -- ignoring file")

Documentation

Add a docstring to the top of each Python file to summarize its purpose.

Add docstrings to each function as well. For example:

def convert_xlsx_to_csv(directory, sheet_number):
    """ Convert all Excel files in directory to CSV """

Stack Exchange Network

Converting multiple Excel files in multiple folders into CSV

1 Answer 1

Overview

DRY

f-string

Documentation

You must log in to answer this question.

Hot Network Questions

1 Answer 1

Overview

f-string

Documentation

You must log in to answer this question.

Related