6
\$\begingroup\$

The whole project is quite large, involving parsing the CSV files, validating them, exporting them to a database and such, but my code in question involves a subset of the project: converting multiple Excel files. The converter method can be seen below:

import glob
from openpyxl import load_workbook
import csv
import codecs
from zipfile import BadZipFile


def convert_excel(data, meta, meta_additional, sheet_number):
    convert_xlsx_to_csv(data, sheet_number)
    convert_xlsx_to_csv(meta, sheet_number)
    convert_xlsx_to_csv(meta_additional, sheet_number)


def convert_xlsx_to_csv(directory, sheet_number):
    excel_files = glob.glob(directory + '*.xlsx')
    for excel_file in excel_files:
        try:
            workbook = load_workbook(excel_file)
            sheets = workbook.get_sheet_names()
            sheet = sheets[sheet_number]
            out = excel_file.replace('.xlsx', '.csv')
            write_as_csv(out, workbook, sheet)
        # this exception will only occur if the excel file is corrupted.
        except BadZipFile:
            print("error in file: " + excel_file + " -- ignoring file")


def write_as_csv(output, workbook, sheet):
    # utf-8 is used, because the parsing parts rely on that and the database
    # itself is encoded as utf-8
    with codecs.open(output, 'w', 'utf-8') as converted_file:
        wr = csv.writer(converted_file, delimiter='\t')
        for row in workbook[sheet].rows:
            wr.writerow([cell.value for cell in row])

if __name__ == '__main__':
    convert_excel('testdir1/', 'testdir2/', 'testdir3/', 0)

And here is how it's called from another module (the 'main'-module):

import convert_excel
convert_excel.convert_excel(**params)

where sheet number and directories are read from a config file that looks like this:

[parseparams]
data=path/to/dir/
meta=path/to/meta/
meta_additional=path/to/meta_additional/
sheet_number=0

and the config is parsed like this:

from configparser import ConfigParser


def config(filename='config/cfg.ini', section):
    """
    parses the directory information
    :param filename: path to the config file
    :param section: which portion of the config will be parsed;
    'parseparams', for parsing, 'dbase' for database config. 
    :return: returns config information as a dictionary
    """
    parser = ConfigParser()
    parser.read(filename)
    params = {}

    if parser.has_section(section):
        parameters = parser.items(section)
        for param in parameters:
            params[param[0]] = param[1]
    else:
        raise Exception("no config was found!")
    return params

I decided to make calling the module a one-liner (and thus add one method, convert_excel() to the converter module), to keep the main module as short and readable as possible.

\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

Overview

The functions and variables are well-named, and the comments are helpful.

DRY

In the convert_excel function, these 3 lines are repeated except for the directory argument (data, etc.):

convert_xlsx_to_csv(data, sheet_number)
convert_xlsx_to_csv(meta, sheet_number)
convert_xlsx_to_csv(meta_additional, sheet_number)

If you pass an array of directories into convert_excel, instead of 3 separate directories, then you can use a loop and call convert_xlsx_to_csv once in the loop body. This will make the function scalable to any number of directories.

f-string

Using f-strings can simplify code. For example, change:

        print("error in file: " + excel_file + " -- ignoring file")

to:

        print(f"error in file: {excel_file} -- ignoring file")

Documentation

Add a docstring to the top of each Python file to summarize its purpose.

Add docstrings to each function as well. For example:

def convert_xlsx_to_csv(directory, sheet_number):
    """ Convert all Excel files in directory to CSV """
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.