How to add header row to a pandas DataFrame

Question

I am reading a csv file into pandas. This csv file consists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame = pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

But when I apply the code, I get the following Error:

ValueError: Shape of passed values is (1, 1), indices imply (4, 1)

What exactly does the error mean? And what would be a clean way in Python to add a header row to my CSV file/pandas DataFrame?

Here is a different interpretation of your question: Add another header to an existing Dataframe to create a MultiIndex. – cs95, commented May 24, 2019 at 6:44

cs95 · Accepted Answer · 2019-05-24 06:42:48Z

458

You can use the names parameter directly in read_csv:

names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly pass header=None.

Cov = pd.read_csv("path/to/file.txt", 
                  sep='\t', 
                  names=["Sequence", "Start", "End", "Coverage"])
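Because passing names= makes read_csv treat the file as headerless, no row is lost. A minimal sketch, assuming a small made-up tab-separated sample read from an in-memory io.StringIO buffer in place of path/to/file.txt:

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample standing in for path/to/file.txt
raw = "chr1\t0\t100\t5\nchr1\t100\t200\t7\n"

col_names = ["Sequence", "Start", "End", "Coverage"]
# Passing names= implies header=None, so the first line is kept as data
cov = pd.read_csv(io.StringIO(raw), sep="\t", names=col_names)

print(cov.shape)  # (2, 4)
print(cov.columns.tolist())
```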

edited May 24, 2019 at 6:42

cs95

answered Dec 4, 2015 at 15:43

Leb

Anton Protopopov · Accepted Answer · 2016-03-25 21:55:48Z

204

Alternatively, you could read your CSV with header=None and then assign the column names via df.columns:

Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)
Cov.columns = ["Sequence", "Start", "End", "Coverage"]
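A minimal sketch of this two-step approach, again assuming made-up tab-separated data in an io.StringIO buffer instead of the real file:

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated sample standing in for path/to/file.txt
raw = "chr1\t0\t100\t5\nchr2\t50\t80\t3\n"

# header=None tells pandas the first line is data; columns default to 0..3
cov = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
print(cov.columns.tolist())  # [0, 1, 2, 3]

# Replace the default integer labels; the list length must match the column count
cov.columns = ["Sequence", "Start", "End", "Coverage"]
print(cov.head())
```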

edited Mar 25, 2016 at 21:55

answered Dec 4, 2015 at 17:27

Anton Protopopov

1 Comment

Alex Jean Over a year ago

Code is good, but I found out that it will not work on an empty DataFrame. In that case, pandas raises "ValueError: Length mismatch: Expected axis has 0 elements, new values have ... elements". You might need something like stackoverflow.com/questions/44513738/…
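The empty-DataFrame case from this comment can be sidestepped with a small guard. The helper set_column_names below is hypothetical (it is not a pandas API and is not taken from the linked question); it is just one possible sketch:

```python
from typing import Sequence

import pandas as pd


def set_column_names(df: pd.DataFrame, names: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: label the columns, tolerating a column-less DataFrame."""
    if len(df.columns) == 0:
        # df.columns = names would raise "Length mismatch" here, so build a
        # frame with the requested (all-NaN) columns on the same row index
        return pd.DataFrame(index=df.index, columns=list(names))
    out = df.copy()
    out.columns = list(names)  # still raises ValueError if the lengths differ
    return out


col_names = ["Sequence", "Start", "End", "Coverage"]
empty = set_column_names(pd.DataFrame(), col_names)
print(empty.columns.tolist(), len(empty))
```

Note that a genuine length mismatch on a non-empty frame still raises, which is usually what you want.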

vvvvv · Accepted Answer · 2022-03-17 14:34:57Z

29

col_Names = ["Sequence", "Start", "End", "Coverage"]
my_CSV_File = pd.read_csv("yourCSVFile.csv", names=col_Names)

Having done this, check the result with:

my_CSV_File.head()

edited Mar 17, 2022 at 14:34

vvvvv

answered Jan 24, 2018 at 3:08

Bhardwaj Joshi

Mugen · Accepted Answer · 2021-08-19 08:42:52Z

23

Simple and easy solution:

import pandas as pd

df = pd.read_csv("path/to/file.txt", sep='\t')
headers =  ["Sequence", "Start", "End", "Coverage"]
df.columns = headers

NOTE: Make sure the length of your header list matches the number of columns in the CSV file.

edited Aug 19, 2021 at 8:42

Mugen

answered Jul 28, 2021 at 5:47

Shoaib Arif

2 Comments

KansaiRobot Over a year ago

I applaud you since you are the only(?) one who actually answered the question instead of suggesting to avoid the problem in the first place

Reincoder Over a year ago

I believe there is an issue with this solution. When you read the CSV file, the first row is used as the column labels. So when you re-declare the columns with the headers list, you are not adding a new row; rather, you are replacing the first row (which was read as the header) with the header list. So you end up losing the actual first row of data.
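The row loss described in this comment is easy to reproduce. A minimal sketch with two lines of made-up headerless data (df_default and df_fixed are illustrative names, not from the thread):

```python
import io

import pandas as pd

raw = "chr1\t0\t100\t5\nchr2\t50\t80\t3\n"  # hypothetical headerless data, two rows
headers = ["Sequence", "Start", "End", "Coverage"]

# Default header="infer": the first line becomes the column labels
df_default = pd.read_csv(io.StringIO(raw), sep="\t")
df_default.columns = headers  # overwrites those labels, so that row is gone
print(len(df_default))  # 1

# header=None keeps both lines as data
df_fixed = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
df_fixed.columns = headers
print(len(df_fixed))  # 2
```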

romulomadu · Accepted Answer · 2018-02-05 06:35:45Z

14

To fix your code, you can simply change [Cov] to Cov.values; the first argument to pd.DataFrame then becomes a two-dimensional NumPy array:

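Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)  # header=None: the file has no header row
Frame = pd.DataFrame(Cov.values, columns=["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t', index=False)  # index=False: don't write the row index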

But the smartest solution is still to use pd.read_csv with header=None and names=columns_list.
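A minimal sketch of that approach end to end, which also writes the header back to the file as the question asked. The file contents are made up and written to a temporary directory purely for illustration:

```python
import os
import tempfile

import pandas as pd

col_names = ["Sequence", "Start", "End", "Coverage"]

# Create a hypothetical headerless, tab-separated file in a temporary directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "file.txt")
with open(path, "w") as fh:
    fh.write("chr1\t0\t100\t5\nchr2\t50\t80\t3\n")

# Read with explicit names (header=None is spelled out for clarity)
cov = pd.read_csv(path, sep="\t", header=None, names=col_names)

# Write it back; index=False avoids adding an unnamed index column
cov.to_csv(path, sep="\t", index=False)

with open(path) as fh:
    print(fh.readline().rstrip("\n"))  # the new header line
```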

answered Feb 5, 2018 at 6:35

romulomadu

1 Comment

YoungSheldon Over a year ago

When we give columns_list, can we add default values for selected columns?

cottontail · Accepted Answer · 2023-09-20 21:45:17Z

When reading a file without headers, existing answers correctly say that the header= parameter should be set to None, but none explain why. It's because by default header='infer', which behaves like header=0 when names= is not passed, meaning the first row of the file is used as the header. For example, the following code overwrites the first row with col_names, because that row was read as the header and then replaced by col_names.

Note that the columns are assumed here to be separated by a single space (' ').

col_names = ["Sequence", "Start", "End", "Coverage"]
df = pd.read_csv("path/to/file.txt", sep=' ')                   # <--- wrong
df.columns = col_names

To get the correct output, you can do one of the following two things:

set header=None:

df = pd.read_csv("path/to/file.txt", sep=' ', header=None)      # <--- OK
df.columns = col_names

or use the names= parameter to assign column names in a single function call:

df = pd.read_csv("path/to/file.txt", sep=' ', names=col_names)  # <--- OK

The header=None approach is often preferred if the number of columns is not known in advance (it is important that len(col_names) equals the number of columns inferred from the file; if fewer names than columns are passed, the leading surplus columns are read as index levels instead) or if the specific column names are not important. For example, calling add_prefix() after read_csv adds a prefix to the default integer column names:

df = pd.read_csv("path/to/file.txt", sep=' ', header=None).add_prefix('col')
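To see what happens when the name count and the field count disagree, here is a minimal sketch with made-up comma-separated data that passes two names for three fields:

```python
import io

import pandas as pd

raw = "1,2,3\n4,5,6\n"  # hypothetical data: three fields per line, no header

# Two names for three fields: the surplus leading field is used as the index
df = pd.read_csv(io.StringIO(raw), names=["a", "b"])
print(df.index.tolist())
print(df.columns.tolist())
```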

On the other hand, if the file has a header, i.e. the first row in the file is meant to be read as column labels, then passing names= alone will keep that row as the first data row of the DataFrame. In that case, if you want to set the column labels during the pd.read_csv call, pass header=0 as well.

import io
data = """
ab,bc
10,2.
"""

df = pd.read_csv(io.StringIO(data), names=['a', 'b'])           # <--- wrong
df = pd.read_csv(io.StringIO(data), names=['a', 'b'], header=0) # <--- OK

user3636989 · Accepted Answer · 2022-03-13 12:55:09Z

0

Since the question says we are reading from a CSV, the delimiter should be ',' (the default, so it need not be passed), and since the given file has no header, pass header=None.

Sample code:

import pandas as pd
data = pd.read_csv('path/to/file.txt',header=None)
data.columns = ["Sequence", "Start", "End", "Coverage"]
print(data.head())  # Print the first five rows

answered Mar 13, 2022 at 12:55

user3636989
