I have a text file like the following small example:

0,1,2,3,4,5,6
chr1,144566,144597,30,chr1,120000,210000
chr1,154214,154245,34,chr1,120000,210000
chr1,228904,228935,11,chr1,210000,240000
chr1,233265,233297,13,chr1,210000,240000
chr1,233266,233297,58,chr1,210000,240000
chr1,235438,235469,36,chr1,210000,240000
chr1,262362,262393,16,chr1,240000,610000
chr1,347253,347284,12,chr1,240000,610000
chr1,387022,387053,38,chr1,240000,610000

I want to remove the first line and convert the file from comma-separated to tab-separated, like this expected output:

chr1    144566  144597  30  chr1    120000  210000
chr1    154214  154245  34  chr1    120000  210000
chr1    228904  228935  11  chr1    210000  240000
chr1    233265  233297  13  chr1    210000  240000
chr1    233266  233297  58  chr1    210000  240000
chr1    235438  235469  36  chr1    210000  240000
chr1    262362  262393  16  chr1    240000  610000
chr1    347253  347284  12  chr1    240000  610000
chr1    387022  387053  38  chr1    240000  610000

I am trying to do this in Python using pandas. I wrote the code below, but it does not produce what I want. Do you know how to fix it?

import pandas
file = open('myfile.txt', 'rb')
new =[]
for line in file:
    new.append(line.split(','))
    df = pd.DataFrame(new)
    df.to_csv('outfile.txt', index=False)

2 Answers

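import pandas as pd
# header=0 treats the first line (0,1,2,...) as column names rather than data
df = pd.read_csv('myfile.txt', header=0)
# sep='\t' writes tabs; header=False and index=False omit the column names and row index
df.to_csv('outfile.txt', sep='\t', index=False, header=False)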

2 Comments

Looks like you'll want to throw a header=False in there as well.
^ Thanks. Didn't test it. Updated and tested and now works as expected.

Depending on how big your file is, avoiding pandas and using base Python I/O can be much more efficient. That way you don't have to read the whole file into memory. Instead, you read it line by line and write each line to a new file with tabs in place of commas:

with open("myfile.txt", "r") as r:
    with open("myfile2.txt", "w") as w:
        for line in r:
            w.write("\t".join(line.split(',')))

myfile2.txt is now the tab-separated version of myfile.txt.
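
Note that the loop above keeps the first line of the input, while the question asks for it to be removed. A minimal sketch of a variant that also drops that line is shown here; the function name `csv_to_tsv`, its `skip_header` parameter, and the temporary test file are assumptions made for illustration, and it assumes no field contains a quoted comma.

```python
import os
import tempfile


def csv_to_tsv(src_path, dst_path, skip_header=True):
    """Stream src_path to dst_path, replacing commas with tabs.

    Hypothetical helper for illustration. Assumes plain comma-separated
    fields with no quoted commas inside them.
    """
    with open(src_path, "r") as r, open(dst_path, "w") as w:
        if skip_header:
            # Discard the first line; the default avoids an error on an empty file.
            next(r, None)
        for line in r:
            # Each line keeps its trailing newline, so only commas change.
            w.write(line.replace(",", "\t"))


# Try it on a small temporary file shaped like the example in the question.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "myfile.txt")
dst = os.path.join(tmp_dir, "myfile2.txt")
with open(src, "w") as f:
    f.write("0,1,2,3,4,5,6\n")
    f.write("chr1,144566,144597,30,chr1,120000,210000\n")

csv_to_tsv(src, dst)

with open(dst) as f:
    result = f.read()
```

Because the file is still read one line at a time, memory use stays small regardless of file size. If fields may contain quoted commas, the standard-library `csv` module would be a safer choice than a plain string replace.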
