Python extract columns with repetitive headers with pandas

Question

I have a csv file with 900000 rows and 30 columns. The header is in the first row: "Probe Set ID","dbSNP RS ID","Chromosome","Physical Position", etc...

I want to extract only certain columns using pandas.

Now my problem is that the header repeats itself every 50 rows or so, so when I extract the columns I get only the first 50 rows. How can I get the complete columns while skipping all the headers but the first one?

This is the code I have so far, but it works nicely only until the second header:

import sys
import pandas

data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID', 'Physical Position'])

sys.stdout = open("data2.csv", "w")
print(data)

This is an example representing some rows of the extracted columns:

      dbSNP RS ID       Physical Position
0        rs4147951          66943738
1        rs2022235          14326088
2        rs6425720          31709555
3       rs12997193         106584554
4        rs9933410          82323721
...
48       rs5771794          49157118
49       rs1061497           1415331
50      rs12647065         136012580

dbSNP RS ID             Physical Position
...
dbSNP RS ID             Physical Position
...
and so on...

Thanks very much in advance !

Stefan · Accepted Answer · 2015-12-04 18:35:57Z

You could read the file with header=None, drop the duplicate rows (drop_duplicates keeps the first occurrence by default), and then set the remaining first row as the header, like so:

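import pandas as pd

# header=None makes every repeated header line arrive as an ordinary row
df = pd.read_csv(path, header=None).drop_duplicates()
df.columns = df.iloc[0]  # promote the one surviving header row to column names
df = df.iloc[1:]         # remove that row from the data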
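
One thing to watch with drop_duplicates is that it also removes data rows that happen to be exact copies of each other, not just the repeated headers. If that matters, a variant is to drop only the rows that equal the header. The following is a minimal sketch of that variant, not the code from the answer: the function name read_without_repeated_headers and the small inline sample are made up for illustration (the rs IDs and positions come from the question, the other cell values are placeholders), and dtype=str is an assumption so that every cell can be compared to the header as text.

```python
import io

import pandas as pd

# Illustrative stand-in for data1.csv: the header line repeats mid-file,
# and the last two data rows are deliberate exact duplicates.
SAMPLE = '''"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position"
"P1","rs4147951","1","66943738"
"P2","rs2022235","2","14326088"
"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position"
"P3","rs6425720","3","31709555"
"P3","rs6425720","3","31709555"
'''


def read_without_repeated_headers(source, usecols):
    """Read a CSV whose header line recurs in the body; return only data rows."""
    # Read everything as text with no header, so header copies are plain rows.
    raw = pd.read_csv(source, header=None, dtype=str)
    header = raw.iloc[0]
    # Mark every row that matches the header in all columns (row 0 included).
    is_header = (raw == header).all(axis=1)
    data = raw.loc[~is_header].reset_index(drop=True)
    data.columns = header.tolist()
    return data[usecols]


result = read_without_repeated_headers(
    io.StringIO(SAMPLE), ["dbSNP RS ID", "Physical Position"]
)
# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = result.to_csv(index=False)
```

For the real file you would pass 'data1.csv' instead of the StringIO object, and result.to_csv('data2.csv', index=False) writes the extracted columns without redirecting sys.stdout. Because everything is read as text, numeric columns such as Physical Position would still need pd.to_numeric afterwards if you want numbers.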

answered Dec 4, 2015 at 18:35

2 Comments

Lucas Over a year ago

Thanks very much for your help, @Stefan Jansen !

Stefan Over a year ago

Glad to hear my answer was useful.
