I have a CSV file with 900,000 rows and 30 columns. The header is in the first row: "Probe Set ID","dbSNP RS ID","Chromosome","Physical Position", etc.
I want to extract only certain columns using pandas.
My problem is that the header repeats itself every 50 rows or so, so when I extract the columns I get only the first 50 rows. How can I get the complete columns while skipping all the headers but the first one?
This is the code I have so far, but it works correctly only up to the second header:
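import sys
import pandas
# Read only the two columns of interest from the input file
data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID', 'Physical Position'])
# Redirect standard output to a file, then print the extracted columns to it
sys.stdout = open("data2.csv", "w")
print(data)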
This is an example representing some rows of the extracted columns:
      dbSNP RS ID       Physical Position
0        rs4147951          66943738
1        rs2022235          14326088
2        rs6425720          31709555
3       rs12997193         106584554
4        rs9933410          82323721
...
48       rs5771794          49157118
49       rs1061497           1415331
50      rs12647065         136012580
dbSNP RS ID             Physical Position
...
dbSNP RS ID             Physical Position
...
and so on...
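To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes that every repeated header line is an exact copy of the first one, and it relies on reading all values as strings and then dropping any row whose cell equals its own column name. The function name `read_without_repeated_headers` and the toy data (including the made-up "Probe Set ID" values) are hypothetical and only there for illustration; this is not necessarily the best or fastest way to do it.

```python
import io

import pandas as pd


def read_without_repeated_headers(source, columns):
    """Read selected columns and drop rows that are copies of the header.

    Hypothetical helper: assumes repeated header lines are identical
    to the first header line of the file.
    """
    # Read everything as strings so the stray header rows do not
    # interfere with type inference.
    df = pd.read_csv(source, usecols=columns, dtype=str)
    # A repeated header row has the column name itself as its value.
    is_header = df[columns[0]] == columns[0]
    return df[~is_header].reset_index(drop=True)


# Toy data standing in for data1.csv; the header repeats after two rows.
sample = (
    '"Probe Set ID","dbSNP RS ID","Physical Position"\n'
    '"P1","rs4147951","66943738"\n'
    '"P2","rs2022235","14326088"\n'
    '"Probe Set ID","dbSNP RS ID","Physical Position"\n'
    '"P3","rs6425720","31709555"\n'
)

clean = read_without_repeated_headers(
    io.StringIO(sample), ["dbSNP RS ID", "Physical Position"]
)
# Convert the position column back to numbers once the headers are gone.
clean["Physical Position"] = pd.to_numeric(clean["Physical Position"])
```

With the real file, the same function could presumably be called with 'data1.csv' in place of the `io.StringIO` object, and the result written out with `clean.to_csv("data2.csv", index=False)` instead of redirecting `sys.stdout`.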
Thanks very much in advance!