AssertionError with pandas when reading excel

Question

I'm trying to read an xlsx file into Python using pandas.
I've done this thousands of times before, but for some reason it is not working with one particular file.

The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:

df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")

The path variable is defined and the path exists (os.path.exists(path) returns True).

When I copy the contents of the file and paste the values in a new excel doc, this new one will open with the read_excel() method.

When I copy the contents of the file and paste the formatting in a new excel, this new one will open with the read_excel() method.

It doesn't seem to be the values or the formatting.

I am guessing this could be an encoding issue?
Thank you for any help.

    df1 = pandas.read_excel(snap1)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
    x12sheet.process_stream(zflo, heading)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
    self_do_row(elem)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
    assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError

Based on the assertion check being 0 <= self.rowx < X12_MAX_ROWS, I'm guessing that rowx is either negative (I don't know why that would happen) or more than whatever X12_MAX_ROWS is. Is your spreadsheet remarkably large?
– Tadhg McDonald-Jensen, Commented May 27, 2016 at 14:57
Then is it possible that there is a stray value way out of bounds in the file? What if you back up the content, select all and delete, then paste the content back in?
– Tadhg McDonald-Jensen, Commented May 27, 2016 at 15:01
I selected all when I copied to the other Excel file. When I try the read_table method instead, it says something about a 0x89 found. Is this possibly a source of the error?
– Eoin, Commented May 27, 2016 at 15:04
I had a similar error, and the xlrd people seem to be aware of it. For me it had to do with row= being set to 0 in one of the underlying XML files. I could solve it by unpacking the xlsx, correcting the value to 1 and rezipping the files. Now I'm looking for a more automatic solution.
– Maarten Fabré, Commented Nov 30, 2016 at 11:30

jmarcio · Accepted Answer · 2020-10-02 13:48:31Z

3

Look on your system for the file xlsx.py.

On your computer it's apparently at C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py

Search for the line:

X12_MAX_ROWS = 2 ** 20

and change it to something like

X12_MAX_ROWS = 2 ** 22

This raises the row limit from about 1 million rows (2 ** 20 = 1,048,576) to about 4 million rows (2 ** 22 = 4,194,304).
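If you would rather not edit files in site-packages, the same constant can probably be overridden at runtime before calling read_excel. This is only a sketch: it assumes an xlrd release that still ships xlrd/xlsx.py with a module-level X12_MAX_ROWS, as the traceback in the question suggests, and the helper name raise_xlsx_row_limit is made up for illustration.

```python
import importlib


def raise_xlsx_row_limit(new_limit=2 ** 22):
    """Try to raise xlrd's xlsx row limit in memory; return True on success."""
    try:
        xlsx = importlib.import_module("xlrd.xlsx")
    except ImportError:
        # xlrd is not installed, or this release ships no xlsx reader.
        return False
    if not hasattr(xlsx, "X12_MAX_ROWS"):
        # Assumption not met: the constant is missing in this release.
        return False
    xlsx.X12_MAX_ROWS = new_limit
    return True


patched = raise_xlsx_row_limit()
print("row limit patched:", patched)
```

The override lasts only for the current Python process, and it does not help when the failing row number is 0 rather than too large.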

answered Oct 2, 2020 at 13:48


Prutha Modak · Accepted Answer · 2018-08-17 07:13:37Z

2

In my case, I was using the xlrd package to read Excel files and got the same AssertionError. Open the xlrd package in site-packages and, from there, open sheet.py (https://github.com/python-excel/xlrd/blob/master/xlrd/sheet.py).

Find this code in sheet.py:

    if self.biff_version >= 80:
        self.utter_max_rows = 65536
    else:
        self.utter_max_rows = 16384

Change it to:

    # if self.biff_version >= 80:
    self.utter_max_rows = 65536
    # else:
    #     self.utter_max_rows = 16384

Now run your program again; the problem should be solved. :)

edited Aug 17, 2018 at 7:13

answered Aug 17, 2018 at 7:05


1 Comment

Yaakov Bressler Over a year ago

Any insight into how / what is happening here?

Community · Accepted Answer · 2017-05-23 12:24:26Z

Just for completeness: I had a similar problem where the row number of the first row was incorrect. I fixed it by rewriting the xlsx file with code adapted from this answer (https://stackoverflow.com/a/4653863/1562285):

import os
import shutil
import tempfile
import zipfile


def repair_broken_excelfile(zipfname, *filenames, new_name=None):
    """Rewrite the xlsx archive, changing row r="0" to r="1" in the given members."""
    # Adapted from https://stackoverflow.com/a/4653863/1562285
    tempdir = tempfile.mkdtemp()
    try:
        tempname = os.path.join(tempdir, 'new.zip')
        with zipfile.ZipFile(zipfname, 'r') as zipread:
            with zipfile.ZipFile(tempname, 'w') as zipwrite:
                for item in zipread.infolist():
                    data = zipread.read(item.filename)
                    if item.filename in filenames:
                        # Row numbers in the sheet XML are 1-based, so r="0" is invalid.
                        data = data.replace(b'<row r="0" spans="">', b'<row r="1" spans="">')
                    zipwrite.writestr(item, data)
        # Overwrite the original file unless a new name is given.
        if not new_name:
            new_name = zipfname
        shutil.move(tempname, new_name)
    finally:
        shutil.rmtree(tempdir)

Apparently there is a fix underway in xlrd

ibn · Accepted Answer · 2019-07-11 16:53:21Z

0

I encountered the same problem. I saved the file in XML format ("Save as type: XML Spreadsheet 2003") on Windows, then opened that file and saved it as xlsx. The new file no longer gives the error message.

answered Jul 11, 2019 at 16:53


Eoin · Accepted Answer · 2020-03-19 08:23:17Z

0

The file contained Korean characters in the text. These needed an alternative encoding. Using the "encoding" parameter of the read_excel() method resolved the issue.

df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1", encoding="utf-16")

answered Mar 19, 2020 at 8:23


3 Comments

Abdulsalam Almahdi Over a year ago

There is no encoding parameter for the read_excel method. Why was this even selected as the answer?

Eoin Over a year ago

You may be looking at the current pandas version which doesn't support it.

Eoin Over a year ago

Go back in time to the version I likely used (this should have been included in the original question) and the method accepts keyword arguments and passes them to the nested method. github.com/pandas-dev/pandas/blob/0.19.x/pandas/io/excel.py

HannibalTheBarbarian · Accepted Answer · 2020-08-06 12:30:41Z

0

Sometimes this can be resolved just by deleting the (blank) lines below your table in Excel.

answered Aug 6, 2020 at 12:30

