58

I'm writing a script to reduce a large .xlsx file with headers into a CSV, and then write a new CSV file with only the required columns based on the header names.

import pandas
import csv

df = pandas.read_csv('C:\\Python27\\Work\\spoofing.csv')

time = df["InviteTime (Oracle)"]
orignum = df["Orig Number"]
origip = df["Orig IP Address"]
destnum = df["Dest Number"]

df.to_csv('output.csv', header=[time,orignum,origip,destnum])

The error I'm getting is with that last bit of code, and it says

ValueError: Writing 102 cols but got 4 aliases

I'm sure I'm overlooking something stupid, but I've read over the to_csv documentation on the pandas website and I'm still at a loss. I know I'm misusing the to_csv parameters but I can't seem to get my head around the documentation.

Any help is appreciated, thanks!

2 Answers 2

124

The way to select specific columns is this -

header = ["InviteTime (Oracle)", "Orig Number", "Orig IP Address", "Dest Number"]
df.to_csv('output.csv', columns = header)
Sign up to request clarification or add additional context in comments.

5 Comments

Here is information from the documentation on the parameters.
Seems to be a mismatch in column names. You can check your columns with df.columns
Only if one unreasonably repeats it :)
Is append feature in df.to_csv?
3
column_list=["column_name1", "column_name2", "column_name3", "column_name4"]

#filter the dataframe beforehand
ds[column_list].to_csv('output.csv',index=False)

#or use columns arg
ds.to_csv('output.csv', columns = column_list,index=False)

I provide index=False arg in order to write only column values

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.