I'm looking to parse CSV files containing multiple tables using Python 3's csv module.
These complex CSVs resemble the toy example below. My goal is an idiom for picking out any one table by its known header row.
Complex CSV file toy.csv:
lists, fruits, books, forks, rope, gum
4, 2, 3, 0, 2, 2

Manhattan Produce Market
id, fruit, color
1, orange, orange
2, apple, red

Books
id, book, pages
1, Webster’s Dictionary, 1000
2, Tony the Towtruck, 20
3, The Twelfth Night, 144

Rope
id, rope, length, diameter, color
1, hemp, 12-feet, .5, green
2, sisal, 50-feet, .125, brown

Kings County Candy
id, flavor, color, big-league
1, grape, purple, yes
2, mega mango, yellow-orange, no
Each table is preceded by a title (except for a garbage table at the start). I save the previous row, and when I match the correct table header, I add the title as a new column.
import csv, re

header = []  # doesn't need to be list, but I'm thinking ahead
table = []
with open('toy.csv', 'r') as blob:
    reader = csv.reader(blob)
    curr = reader.__next__()
    while True:
        # remember the previous row: it holds the title when curr is a header
        prev = curr
        try:
            curr = reader.__next__()
        except StopIteration:
            break
        if not ['id', ' book', ' pages'] == curr:
            continue
        else:
            # header matched: store the title and start the output table
            header.append(prev)
            table.append(['title'] + curr)
            while True:
                try:
                    curr = reader.__next__()
                    if curr == []:
                        # a blank line marks the end of the table
                        break
                    else:
                        table.append(header[0] + curr)
                except StopIteration:
                    break
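One possible way to package this as a repeatable idiom is sketched below. It assumes, as the loop above does, that a blank row (or the end of the file) ends a table and that the row just before the header is the title. The name `extract_table` and its parameters are hypothetical. It also departs from the loop above by passing `skipinitialspace=True` to `csv.reader`, which drops the space after each comma so the header can be written without leading spaces.

```python
import csv
import io


def extract_table(lines, header):
    """Return the table whose header row equals `header`, with a title column.

    `lines` is any iterable of text lines (for example an open file).
    Returns an empty list if the header is never found.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    prev = []
    for row in reader:
        if row == header:
            # The row before the header is taken to be the table title.
            title = prev[0] if prev else ''
            table = [['title'] + row]
            for data in reader:
                if not data:  # a blank row ends the table
                    break
                table.append([title] + data)
            return table
        prev = row
    return []


# Small in-memory sample standing in for toy.csv.
sample = io.StringIO(
    "Books\n"
    "id, book, pages\n"
    "1, Tony the Towtruck, 20\n"
    "\n"
    "Rope\n"
    "id, rope, length\n"
    "1, hemp, 12-feet\n"
)
books = extract_table(sample, ['id', 'book', 'pages'])
```

With a real file, the same call could be made as `extract_table(blob, [...])` inside a `with open('toy.csv', newline='') as blob:` block, once per header of interest.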
This first step is meant to be an idiom that I can simply repeat for each table I want to extract. Later, I will combine the tables into one super-table, filling in NaNs where the table headers don't match. For the Books header, my code produces:
[['title', 'id', ' book', ' pages'],
['Books', '1', ' Webster’s Dictionary', ' 1000'],
['Books', '2', ' Tony the Towtruck', ' 20'],
['Books', '3', ' The Twelfth Night', ' 144']]
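The later combining step could look roughly like the sketch below. It assumes each extracted table is a list of rows whose first row is the header, as in the output above. The name `combine_tables` is hypothetical, and the column order (first appearance across tables) is just one reasonable choice. The sample headers are written without leading spaces for brevity.

```python
import math


def combine_tables(tables, fill=math.nan):
    """Stack tables with differing headers, filling missing cells with `fill`."""
    tables = [t for t in tables if t]  # skip tables that were not found

    # Collect the union of column names, in order of first appearance.
    columns = []
    for table in tables:
        for name in table[0]:
            if name not in columns:
                columns.append(name)

    combined = [columns]
    for table in tables:
        head, *rows = table
        for row in rows:
            record = dict(zip(head, row))
            combined.append([record.get(name, fill) for name in columns])
    return combined


# Two small tables in the same shape as the output above.
books = [['title', 'id', 'book', 'pages'],
         ['Books', '1', 'Tony the Towtruck', '20']]
rope = [['title', 'id', 'rope', 'length'],
        ['Rope', '1', 'hemp', '12-feet']]
combined = combine_tables([books, rope])
```

Here `combined[0]` holds the merged header, and each following row has NaN in the columns its source table lacked.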
The code is based on this Stack Overflow post.
Happy to hear your suggestions to make the code more compact, idiomatic, and fit for my goals.