33

I want to write a Python script that reads in an Excel spreadsheet and saves some of its worksheets as CSV files.

How can I do this?

I have found third-party modules for reading and writing Excel files from Python, but as far as I can tell, they can only save files in Excel (i.e. *.xls) format. If I'm wrong here, some example code showing how to do what I'm trying to do with these modules would be appreciated.

I also came across one solution that I can't quite understand, but seems to be Windows-specific, and therefore would not help me anyway, since I want to do this in Unix. At any rate, it's not clear to me that this solution can be extended to do what I want to do, even under Windows.

5 Answers 5

67

The most basic examples using the two libraries described line by line:

  1. Open the xls workbook
  2. Reference the first spreadsheet
  3. Open in binary write the target csv file
  4. Create the default csv writer object
  5. Loop over all the rows of the first spreadsheet
  6. Dump the rows into the csv

import xlrd
import csv

with xlrd.open_workbook('a_file.xls') as wb:
    sh = wb.sheet_by_index(0)  # or wb.sheet_by_name('name_of_the_sheet_here')
    with open('a_file.csv', 'wb') as f:   # open('a_file.csv', 'w', newline="") for python 3
        c = csv.writer(f)
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active
with open('test.csv', 'wb') as f:  # open('test.csv', 'w', newline="") for python 3
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])
Sign up to request clarification or add additional context in comments.

2 Comments

To evaluate the Excel formulas with openpyxl : wb = openpyxl.load_workbook('test.xlsx', data_only=True)
@Zeugma how can I write this csv back to a folder? (In my case aws s3) I keep getting AttributeError: '_io.TextIOWrapper' object has no attribute 'save'
18

Using pandas is a bit shorter:

import pandas as pd

df = pd.read_excel('my_file', sheet_name='my_sheet_name')  # sheet_name is optional
df.to_csv('output_file_name', index=False)  # index=False prevents pandas from writing a row index to the CSV.

# oneliner
pd.read_excel('my_file', sheet_name='my_sheet_name').to_csv('output_file_name', index=False)

4 Comments

I don't trust pandas to do this. It's been converting all my leading zeros.
Can you give some more details?
This implementation works perfectly fine for my scenario, just change sheetname to sheet_name as it's a typo.
@rrs why not use pd.read_excel('my_file', dtype=str) ? more info here
17

As of December 2021 and Python 3:

The openpyxl API has changed sufficiently (see https://openpyxl.readthedocs.io/en/stable/usage.html) that I have updated this part of the answer by @Boud (now @Zeugma?), as follows:

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as file_handle:
    csv_writer = csv.writer(file_handle)
    for row in sh.iter_rows(): # generator; was sh.rows
        csv_writer.writerow([cell.value for cell in row])

@Leonid made some helpful comments - in particular:

csv.writer provides some additional options e.g. custom delimiter:

csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)

HTH

4 Comments

A couple of typos here. The "with" needs "as f" on the end, and "sh.iter_rows" should be "sh.iter_rows()" Otherwise, works well, thanks!
@eakst7 Huge thanks - can you believe I typed it out - now fixed - glad it helped.
Thanks, that was useful. Two comments from me: 1. pylama does not like single-letter variable names and the call to csv.writer provides additional options (such as custom delimiter) which would be cool to highlight. For example: csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)
@Leonid Thanks so much - updated as per your helpful comments!
5

Use the xlrd or openpyxlmodule to read xls or xlsx documents respectively, and the csv module to write.

Alternately, if using Jython, you can use the Apache POI library to read either .xls or .xlsx, and the native CSV module will still be available.

2 Comments

And if you need to read .xlsx files use openpyxl.
I prefer xlsxrd to read .xlsx files. At some point, it will be merged into xlrd.
0

First read your Excel spreadsheet into Pandas. The code below will import your Excel spreadsheet into Pandas as an OrderedDict which contains all of your worksheets as DataFrames. Then, simply use the worksheet_name as a key to access specific worksheet as a DataFrame and save only the required worksheet as a csv file by using df.to_csv(). Hope this will work in your case.

import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('output.csv')  

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.