I'm trying to read an xlsx file into python using pandas.
I've done this thousands of times before, but for some reason it is not working with one particular file.
The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:
df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")
The variable is defined for the path. The path exists (os.path.exists(path) returns True).
When I copy the contents of the file and paste the values into a new Excel document, the new document opens with the read_excel() method.
When I copy the contents of the file and paste the formatting into a new Excel document, the new document also opens with the read_excel() method.
It doesn't seem to be the values or the formatting.
I am guessing this could be an encoding issue?
Thank you for any help.
df1 = pandas.read_excel(snap1)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
x12sheet.process_stream(zflo, heading)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
self_do_row(elem)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError
0 <= self.rowx < X12_MAX_ROWS: I'm guessing that rowx is either negative (I don't know why that would happen) or greater than whatever X12_MAX_ROWS is. Is your spreadsheet remarkably large?
In my case, row= was set to 0 in one of the underlying XML files. I could solve it by unpacking the xlsx, correcting the value to 1, and rezipping the files. Now I'm looking for a more automatic solution.
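The unpack, correct, and rezip steps described above could be automated roughly as follows. This is only a sketch under some assumptions: that the offending value is written literally as r="0" on <row> elements in the files under xl/worksheets/, and that changing it to 1 does not collide with an existing row 1. The helper name repair_xlsx and the BAD_ROW pattern are hypothetical, not part of pandas or xlrd, and cell references inside the row are left untouched.

```python
import re
import zipfile

# Assumption: the bad row number appears literally as r="0" on a <row> element.
# Group 1 captures everything up to the opening quote, group 2 the closing quote.
BAD_ROW = re.compile(rb'(<row\b[^>]*?\br=")0(")')


def repair_xlsx(src_path, dst_path):
    """Copy src_path to dst_path, rewriting r="0" to r="1" in worksheet XML.

    Returns the number of <row> elements that were changed.
    src_path and dst_path must be different files.
    """
    fixed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Only worksheet parts are patched; every other member is copied as-is.
            if (item.filename.startswith("xl/worksheets/")
                    and item.filename.endswith(".xml")):
                data, count = BAD_ROW.subn(rb"\g<1>1\g<2>", data)
                fixed += count
            dst.writestr(item, data)
    return fixed
```

Usage would be something like repair_xlsx(snap1, fixed_path) followed by pandas.read_excel(fixed_path), where fixed_path is a new file name of your choosing. If the function returns 0, the file probably has a different problem than the one described here.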