7

I'm parsing data into lists and using pandas to build a DataFrame and write it to a CSV file. My data first goes into a dict in which inv, name, and date are lists with many entries. On each iteration over the datasets I parse, I use concat to combine the frames and then append to a CSV file, like so:

counter = True
data = {'Invention': inv, 'Inventor': name, 'Date': date}

if counter is True:
  df = pd.DataFrame(data)
  df = df[['Invention', 'Inventor', 'Date']]

else:
  df = pd.concat([df, pd.DataFrame(data)])
  df = df[['Invention', 'Inventor', 'Date']]

  with open('./new.csv', 'a', encoding='utf-8') as f:
    if counter is True:
      df.to_csv(f, index = False, header = True)
    else:
      df.to_csv(f, index = False, header = False)

counter = False

The counter = True statement sits outside the loop that iterates over the data I'm parsing, so it is not reset on every pass.

So the first pass builds the initial df and every later pass concatenates onto it. The problem is that although counter is True only on the first pass, and the first if-statement for df behaves accordingly, the same check does not work when writing to the file.

What happens is that the header is written over and over again, even though counter is True only once. When I swap in header = False for the case where counter is True, the header is never written.

I think this is because the concatenated df is holding onto the header somehow, but beyond that I cannot find the logic error.

Is there another way to write a header once, and only once, to the same CSV file?

2
  • The first line of your code is counter = True. You have to make sure that is outside of the loop, otherwise counter will be set to True every time through the loop. Commented Jan 1, 2018 at 21:21
  • Yeah it's definitely outside of my loop which I did not mention. I will update that fact. Commented Jan 1, 2018 at 21:27

3 Answers

8

It's hard to tell what might be going wrong without seeing the rest of the code. I've developed some test data and logic that works; you can adapt it to fit your needs.

Please try this:

import pandas as pd

early_inventions = ['wheel', 'fire', 'bronze']
later_inventions = ['automobile', 'computer', 'rocket']

early_names = ['a', 'b', 'c']
later_names = ['z', 'y', 'x']

early_dates = ['2000-01-01', '2001-10-01', '2002-03-10']
later_dates = ['2010-01-28', '2011-10-10', '2012-12-31']

early_data = {'Invention': early_inventions,
    'Inventor': early_names,
    'Date': early_dates}

later_data = {'Invention': later_inventions,
    'Inventor': later_names,
    'Date': later_dates}

datasets = [early_data, later_data]

columns = ['Invention', 'Inventor', 'Date']
header = True
for dataset in datasets:
    df = pd.DataFrame(dataset)
    df = df[columns]
    mode = 'w' if header else 'a'
    df.to_csv('./new.csv', encoding='utf-8', mode=mode, header=header, index=False)
    header = False

Alternatively, you can collect the data in the loop, concatenate it once, and write out the dataframe at the end:

frames = []
for dataset in datasets:
    # Collect each frame; concatenating once at the end avoids repeated copying
    frames.append(pd.DataFrame(dataset))
df = pd.concat(frames, ignore_index=True)[columns]
df.to_csv('./new.csv', encoding='utf-8', index=False)

If your code cannot be restructured like this, you can forgo writing the header in to_csv altogether. Instead, check whether the output file exists and, if it does not, write the header to it first:

import os

fn = './new.csv'
if not os.path.exists(fn):
    with open(fn, mode='w', encoding='utf-8') as f:
        f.write(','.join(columns) + '\n')
# Now append the dataframe without a header
df.to_csv(fn, encoding='utf-8', mode='a', header=False, index=False)

3 Comments

Yes it's difficult using a snippet because this script is a bit on the large side. The problem with your code is that it assumes both early_inventions and later_inventions exist at the same time so you can DataFrame them according to the culmination datasets. However, my script parses one of these lists at a time - where my data (named data above) changes with every iteration in a large for-loop. Isn't there some way I can simply create a long list of strings (such as your "columns") and just write it once before the other data? My code works fine outside of this.
Yes, you could just write the header line separately and then append each dataframe to the file without a header.
Would you be able to provide a pseudo code for me? This was the initial struggle I was having actually.
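A minimal sketch of the suggestion above, assuming each pass of the parsing loop produces one dict holding the three lists. The names parse_batches and write_batches and the sample values are hypothetical stand-ins, not part of the original script. The header line is written once before the loop, and each batch is then appended on its own with header=False, so no concat is needed:

```python
import pandas as pd

COLUMNS = ['Invention', 'Inventor', 'Date']


def parse_batches():
    """Hypothetical stand-in for the parsing loop: yields one dict per pass."""
    yield {'Invention': ['wheel'], 'Inventor': ['a'], 'Date': ['2000-01-01']}
    yield {'Invention': ['rocket'], 'Inventor': ['x'], 'Date': ['2012-12-31']}


def write_batches(path, batches):
    # Write the header line exactly once, replacing any previous file
    with open(path, mode='w', encoding='utf-8', newline='') as f:
        f.write(','.join(COLUMNS) + '\n')
    # Append only the rows of each batch; frames are never accumulated
    for data in batches:
        df = pd.DataFrame(data)[COLUMNS]
        df.to_csv(path, mode='a', encoding='utf-8', header=False, index=False)


if __name__ == '__main__':
    write_batches('./new.csv', parse_batches())
```

Because each batch is written as soon as it is parsed, the cumulative df from the question is not needed. Appending a cumulative frame on every pass would also duplicate earlier rows.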
0

I ran into the same problem. Writing a pandas dataframe to CSV works fine when the dataframe is complete; nothing beyond what any tutorial shows is needed.

However, if the program produces results incrementally and appends them, the header gets written repeatedly.

To solve this, consider the following function:

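import os

import pandas as pd


def write_data_frame_to_csv_2(results, path, header_list):
    # One row per top-level key of the results dict
    df = pd.DataFrame.from_dict(data=results, orient='index')
    filename = os.path.join(path, 'results_with_header.csv')
    if os.path.isfile(filename):
        # File already exists: append rows without a header
        mode = 'a'
        header = False
    else:
        # New file: create it and write the header once
        mode = 'w'
        header = header_list

    with open(filename, mode=mode, newline='') as f:
        df.to_csv(f, header=header, index_label='model')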

If the file does not exist, we use write mode and pass the header list as header. Otherwise the file already exists, so we use append mode and turn the header off.
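The same file-exists check can be folded into the to_csv call itself. The following is a compact sketch, not the function above: append_results is a hypothetical name, and it assumes the DataFrame's own column names are acceptable as the header, so no separate header list is passed. Opening in append mode creates the file when it is missing, so only the header flag has to depend on whether the file exists:

```python
import os

import pandas as pd


def append_results(results, filename, index_label='model'):
    """Append a dict of dicts to filename; write the header only for a new file."""
    df = pd.DataFrame.from_dict(results, orient='index')
    # A header is needed only when the file does not exist yet
    write_header = not os.path.isfile(filename)
    df.to_csv(filename, mode='a', header=write_header, index_label=index_label)
```

One caveat with either version: a file that exists but is empty is treated as already having a header.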

The function receives a simple dictionary as its first parameter. In my case I used:

model = {'model_name': {'acc': 0.9,
                        'loss': 0.3,
                        'tp': 840,
                        'tn': 450}}

Calling the function from the IPython console several times produces the expected result:

write_data_frame_to_csv_2(model, './', header_list)

CSV generated:

model,acc,loss,tp,tn
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450

Let me know if it helps. Happy coding!


0

If you use an index to iterate over API calls and add their data to a CSV file, just add this check before setting the header property:

if i > 0:
    dataset.to_csv('file_name.csv', index=False, mode='a', header=False)
else:
    dataset.to_csv('file_name.csv', index=False, mode='a', header=True)
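For context, here is a sketch of how that check might sit inside a complete loop. The helper write_chunks is a hypothetical name, and it assumes the data arrives as an iterable of DataFrames. Unlike the snippet above, it opens the file in write mode for the first chunk, so a file left over from an earlier run is not appended to:

```python
import pandas as pd


def write_chunks(chunks, path):
    """Write an iterable of DataFrames to one CSV with a single header row."""
    for i, dataset in enumerate(chunks):
        first = (i == 0)
        dataset.to_csv(
            path,
            index=False,
            mode='w' if first else 'a',  # start a fresh file on the first chunk
            header=first,                # header only on the first chunk
        )
```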
