Extract sections of a file into separate files

Question

I have a file of the following form:

>SDF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG...
>SBF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG....

And I want to extract the various sections of this file into individual files (like here

I wrote the following code, but it runs much more slowly than it did without the close command. I had to add close because, without it, awk failed with a "too many open files" error.

Here is the code:

cat C1_animal.fasta | awk -F ' ' '{
        if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fa")}
        print $0 >> filename; close (filename)
}'

How can I make this code more time efficient? I am new to awk.

Arnaud Valmary · Accepted Answer · 2021-09-04 17:21:30Z

2

Close each output file only when necessary, that is, when a new header line starts the next file:

File actg.awk

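BEGIN {
    FS = " "    # the default: fields are separated by runs of whitespace
}
# Header line: a new section starts, so switch to a new output file
/^>/ {
    if (filename != "") {
        close(filename)    # close the previous file, so only one is open at a time
    }
    filename = substr($1, 2) ".fa"    # ">SDF123.1" becomes "SDF123.1.fa"
    next    # skip the header line itself; it is not written to the output
}
# Any other line: write it to the current file, once a header has been seen
filename != "" {
    print $0 > filename
}
END {
    close(filename)    # close the last file
}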

Run it from the shell with:

awk -f actg.awk C1_animal.fasta
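
Note that the script above does not write the header lines, because of the `next` statement, whereas the code in the question did write them. As a minimal sketch, assuming the header should be kept in each output file, the same idea can be written inline without `next`. The input file `demo.fasta` and its contents are made up for illustration:

```shell
# demo.fasta is a hypothetical input used only for this illustration
printf '>SDF123.1 blah blah\nATCT\n>SBF123.1 blah blah\nGGCC\nTTAA\n' > demo.fasta

awk '
/^>/ {
    if (filename != "") close(filename)   # close the previous output file
    filename = substr($1, 2) ".fa"        # ">SDF123.1" becomes "SDF123.1.fa"
}
filename != "" { print > filename }       # no "next", so the header is written too
' demo.fasta

# Show the two files that were created
head SDF123.1.fa SBF123.1.fa
```

Each file is still opened once and closed once, so the speed-up over closing after every line should be the same.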

Note: if you are sure that no line precedes the first "> ..." header, you can drop the filename != "" test.
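
As a minimal sketch of that simplification, assuming the input always begins with a header line, the guard on the print rule can be dropped. The file name `sample.fasta` and its contents are made up for illustration:

```shell
# sample.fasta is a hypothetical input whose first line is a header
printf '>SDF123.1 blah blah\nATCT\nAGTG\n>SBF123.1 blah blah\nGGCC\n' > sample.fasta

awk '
/^>/ {
    if (filename != "") close(filename)   # finish the previous section
    filename = substr($1, 2) ".fa"        # ">SDF123.1" becomes "SDF123.1.fa"
    next                                  # the header line is not written
}
{ print > filename }                      # no guard: relies on a header coming first
' sample.fasta

cat SDF123.1.fa
```

If a sequence line did come before the first header, `filename` would still be empty at the print rule and awk would most likely fail to open the output, so the guard is the safer default.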

edited Sep 4, 2021 at 17:21

answered Sep 4, 2021 at 17:16


Thank you, this code worked nicely and was much faster. Could you explain a little how this code works? I am still trying to learn awk.

– user1995, Sep 6, 2021 at 5:54
