4

I'm wondering how to get tables parsed with pandas into a single CSV. I have managed to write each table to its own CSV file, but I would like them all in one CSV. This is my current code, which produces multiple CSVs:

import pandas as pd
import csv

url = "https://fasttrack.grv.org.au/RaceField/ViewRaces/228697009?raceId=318809897"

data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})

for i, datas in enumerate(data):
    datas.to_csv("new{}.csv".format(i), header=False, index=False)
2
  • Is the schema the same for all tables? Commented May 9, 2018 at 3:48
  • Yes, the schema is the same. Commented May 9, 2018 at 4:33

3 Answers

4

I think you only need concat, because data is a list of DataFrames:

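# Stack all tables into one DataFrame with a fresh 0..n-1 index
df = pd.concat(data, ignore_index=True)
# file is a placeholder for the output path or an open file object
df.to_csv(file, header=False, index=False)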

1 Comment

You can use axis=1 in concat to put the dataframes side-by-side instead of one after the other (not sure which one you want).
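
To make the difference between the two orientations concrete, here is a minimal sketch. The two toy tables and their column names (dog, time) are made up for illustration and only stand in for the list that pd.read_html returns.

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html.
tables = [
    pd.DataFrame({"dog": ["A", "B"], "time": [30.1, 30.4]}),
    pd.DataFrame({"dog": ["C", "D"], "time": [29.8, 31.0]}),
]

# axis=0 (the default) stacks the tables one after the other.
stacked = pd.concat(tables, ignore_index=True)

# axis=1 aligns the tables on their index and puts them side by side.
side_by_side = pd.concat(tables, axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Since the tables share one schema, stacking with axis=0 is probably what a single CSV needs; axis=1 repeats the column names once per table.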
3

You have 2 options:

  1. You can tell pandas to append data while writing to the CSV file.

    data = pd.read_html(url, attrs = {'class': 'ReportRaceDogFormDetails'} )
    for datas in data:
        datas.to_csv("new.csv", header=False, index=False, mode='a')
    
  2. Merge all the tables into one DataFrame and then write that into the CSV file.

    data = pd.read_html(url, attrs = {'class': 'ReportRaceDogFormDetails'} )
    df = pd.concat(data, ignore_index=True)
    df.to_csv("new.csv", header=False, index=False)
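
As a rough check that the two options above give the same file, the sketch below writes a pair of toy tables both ways into a temporary directory and compares the results. The tables, column names and file names are assumptions for illustration, not data from the page in the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html.
tables = [
    pd.DataFrame({"dog": ["A", "B"], "time": [30.1, 30.4]}),
    pd.DataFrame({"dog": ["C", "D"], "time": [29.8, 31.0]}),
]

with tempfile.TemporaryDirectory() as tmp:
    appended_path = os.path.join(tmp, "appended.csv")
    merged_path = os.path.join(tmp, "merged.csv")

    # Option 1: append each table to the same file.
    for table in tables:
        table.to_csv(appended_path, header=False, index=False, mode="a")

    # Option 2: concatenate first, then write once.
    merged = pd.concat(tables, ignore_index=True)
    merged.to_csv(merged_path, header=False, index=False)

    with open(appended_path) as f1, open(merged_path) as f2:
        appended_text = f1.read()
        merged_text = f2.read()

same_output = appended_text == merged_text
print(same_output)
```

One practical difference: with mode='a' the file keeps growing if the script is run again, so option 1 needs the old file removed first, while option 2 overwrites it.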
    

Edit

To keep the DataFrames visually separated in the CSV file, stick with option #1 and write a blank line after each table:

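data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})
# newline='' is what the pandas docs recommend when passing an open file object
with open('new.csv', 'a', newline='') as csv_stream:
    for datas in data:
        datas.to_csv(csv_stream, header=False, index=False)
        # Blank line between consecutive tables
        csv_stream.write('\n')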
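
To see what the loop above produces without touching the disk, here is a small sketch that writes to an in-memory buffer instead of new.csv. The toy tables and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html.
tables = [
    pd.DataFrame({"dog": ["A", "B"], "time": [30.1, 30.4]}),
    pd.DataFrame({"dog": ["C", "D"], "time": [29.8, 31.0]}),
]

buffer = io.StringIO()
for table in tables:
    table.to_csv(buffer, header=False, index=False)
    buffer.write("\n")  # blank line after each table

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back: read_csv skips blank lines by default,
# so the separators do not turn into empty rows.
round_trip = pd.read_csv(io.StringIO(csv_text), header=None)
print(round_trip.shape)
```

The blank lines are therefore only a visual aid; anything that needs to tell the tables apart after reading the file back would need an extra column or separate files.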

1 Comment

Thank you! Would you know how to keep the tables separated in the combined file, so they aren't directly after one another? For example, with one blank row between them.
0
all_dfs = []

for i, datas in enumerate(data):
    all_dfs.append(datas.to_csv("new{}.csv".format(i), header = False, index = False))

result = pd.concat(all_dfs)

2 Comments

This can be a one-liner with list comprehension, but I chose the form above for clarity.
Thanks for your reply. I'm getting an error with that code: ValueError: All objects passed were None
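
The error most likely comes from to_csv itself: when it is given a path it writes the file and returns None, so the list ends up holding only None values and pd.concat has nothing to combine. A minimal sketch of the problem and one possible fix, using made-up toy tables in place of the read_html result:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html.
tables = [
    pd.DataFrame({"dog": ["A", "B"], "time": [30.1, 30.4]}),
    pd.DataFrame({"dog": ["C", "D"], "time": [29.8, 31.0]}),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "new0.csv")
    # to_csv writes the file and returns None when a path is given.
    return_value = tables[0].to_csv(path, header=False, index=False)

print(return_value)  # None

# Collect the DataFrames themselves, then concatenate.
all_dfs = [table for table in tables]
result = pd.concat(all_dfs, ignore_index=True)
print(result.shape)  # (4, 2)
```

From there, a single result.to_csv call with header=False and index=False would write everything to one file.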
