Append pandas DataFrame column to CSV

Question

I'm trying to append a pandas DataFrame (single column) to an existing CSV, much like this post, but it's not working! Instead my column is added at the bottom of the csv, and repeated over and over (rows in csv >> size of column). Here's my code:

with open(outputPath, "a") as resultsFile:
    print len(scores)
    scores.to_csv(resultsFile, header=False)
    print resultsFile

Terminal output:4032 <open file '/Users/alavin/nta/NAB/results/numenta/artificialWithAnomaly/numenta_art_load_balancer_spikes.csv', mode 'a' at 0x1088686f0>

Thank you in advance!

two additional bits of info would be helpful: 1) what do the existing contents of resultsFile look like? (confirm by inspecting the file manually) and 2) what does the scores DataFrame look like? (scores.head(10) should suffice) — Paul H
– Paul H, Commented Jan 8, 2015 at 18:44
It's appending the scores data frame at the end of the file because that's how the pandas to_csv functionality works. If you want to append scores as a new column on the original csv data frame then you would need to read the csv into a data frame, append the scores column and then write it back to the csv. — alacy
– alacy, Commented Jan 8, 2015 at 18:45
resultsFile is a csv of 5 columns: 'timestamp', 'value', 'aaa', 'bbb', 'label'. I would like the 6th to be the scores DataFrame. I've verified all columns are the same length. scores has a column header 's'. — BoltzmannBrain
– BoltzmannBrain, Commented Jan 8, 2015 at 19:08
@aus_lacy I should've clarified, I'm trying to do this without reading in the csv... — BoltzmannBrain
– BoltzmannBrain, Commented Jan 8, 2015 at 19:19
@alavin89 I don't think it's possible to append the column to the original data frame within the csv without opening the file and parsing the data since python has no way of knowing that there is a data frame in the csv to append to. — alacy
– alacy, Commented Jan 8, 2015 at 19:23

Anzel · Accepted Answer · 2015-01-08 19:09:39Z

13

Like what @aus_lacy has already suggested, you just need to read the csv file into a data frame first, concatenate two data frames and write it back to the csv file:

supposed your existing data frame called df:

df_csv = pd.read_csv(outputPath, 'your settings here')

# provided that their lengths match
df_csv['to new column'] = df['from single column']

df_csv.to_csv(outputPath, 'again your settings here')

That's it.

answered Jan 8, 2015 at 19:09

Anzel

20.6k5 gold badges54 silver badges53 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

BoltzmannBrain Over a year ago

I'm trying to avoid opening and reading in all that data, but this does work :)

Anzel Over a year ago

@alavin89, do you have to use python?

Anzel Over a year ago

@alavin89, then it will be difficult, provided that you still need to open+read each line of your csv to find the linefeed and append your new column. I don't like DiskIO-wise you can have an easy solution

Veronica Over a year ago

For some reason, to_csv() is adding data in new rows, I want to add the dataframe in new columns. Can you please help? pythonfiddle.com/copy-csv-and-dataframe

Anzel Over a year ago

@Veronica are you certain the length of two dataframes actually matched?

|

DisabledWhale · Accepted Answer · 2021-08-11 08:02:38Z

I find the solution problematic, if many columns are to be added to a large csv file iteratively.

A solution would be to accept the csv file to store a transposed dataframe. i.e. headers works as indices and vice versa.

The upside is that you don't waste computation power on insidious operations.

Here is operation times for regular appending mode, mode='a', and appending column approach for series with length of 5000 appended 100 times:

The downside is that you have to transpose the dataframe to get the "intended" dataframe when reading the csv for other purposes.

Code for plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt

col = []
row = []
N = 100

# Append row approach
for i in range(N):
    t1 = dt.datetime.now()
    data = pd.DataFrame({f'col_{i}':np.random.rand(5000)}).T
    data.to_csv('test_csv_data1.txt',mode='a',header=False,sep="\t")
    t2 = dt.datetime.now()
    row.append((t2-t1).total_seconds())

# Append col approach
pd.DataFrame({}).to_csv('test_csv_data2.txt',header=True,sep="\t")
for i in range(N):
    t1 = dt.datetime.now()
    data = pd.read_csv('test_csv_data2.txt',sep='\t',header=0)
    data[f'col_{i}'] = np.random.rand(5000)
    data.to_csv('test_csv_data2.txt',header=True,sep="\t")
    t2 = dt.datetime.now()
    col.append((t2-t1).total_seconds())
    
t = pd.DataFrame({'N appendices':[i for i in range(N)],'append row':row,'append col':col})
t = t.set_index('N appendices')

Collectives™ on Stack Overflow

Append pandas DataFrame column to CSV

2 Answers 2

6 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

Comments

Linked

Related