
I need to extract the rows whose value in column 6 is greater than or equal to 0.01 from a tab-delimited file (my files contain more than 6 columns). I tried the following code:

for i in $(find ./ `pwd` -name "BC_4_*_*shift.txt" ); do
    awk -F"\t" 'NR==1 || $6>=0.01' $i > $i"_"ctdna_freq.txt;
done

To write this code I took help from "get all rows having a column value greater than a threshold". With this code I can extract the rows whose 6th-column value is greater than 0.01, but not the rows where it is exactly 0.01. Following is my input file:

chr     pos         ref var p.val       freq.var
chr19   9074573     A   C   6.73E-22    0.586593469
chr19   9091288     G   T   5.96E-188   0.508732726
chr8    124518636   C   T   9.55E-21    0.00005
chr12   56490398    G   T   0.005271732 0.010003218
chr12   56477619    G   A   1.40E-15    0.010001069
chr12   56477619    G   A   1.40E-15    0.010001069
chr3    52677261    C   T   5.13E-06    0.01
chr5    67591010    A   G   4.82E-23    0.01

Expected output:

chr     pos         ref var p.val       freq.var
chr19   9074573     A   C   6.73E-22    0.586593469
chr19   9091288     G   T   5.96E-188   0.508732726
chr12   56490398    G   T   0.005271732 0.010003218
chr12   56477619    G   A   1.40E-15    0.010001069
chr12   56477619    G   A   1.40E-15    0.010001069
chr3    52677261    C   T   5.13E-06    0.01
chr5    67591010    A   G   4.82E-23    0.01
  • Cannot reproduce: the awk works fine for me and outputs the lines with 0.01. Maybe break it down (remove the for loop etc.) and try to reproduce the issue with a minimal script, something like awk 'NR==1 || $6>=0.01' file Commented May 13, 2020 at 10:25
  • What awk version are you using? On GNU awk 5.0.1 it seems to work as you would expect. I suspect it might be a floating-point precision error; try awk 'NR==1 || $6>=0.0099' Commented May 13, 2020 at 18:41
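To check the floating-point theory from the comment above, here is a small sketch in Python. It assumes Python's float behaves like the IEEE 754 doubles most awk implementations use for numbers (the asker's awk version is unknown). Parsing the text "0.01" yields exactly the same double as the literal 0.01, so >= keeps rows equal to the threshold; at_least is a hypothetical tolerance-based helper in the spirit of the 0.0099 suggestion.

```python
# Field values arrive as text, just as awk reads $6 before comparing numerically.
fields = ["0.586593469", "0.00005", "0.010003218", "0.01"]
threshold = 0.01

# The string "0.01" and the literal 0.01 round to the same double, so they compare equal.
print(float("0.01") == threshold)  # True

def at_least(value, limit, tol=1e-9):
    """Hypothetical helper: treat values within tol below limit as equal to it."""
    return value >= limit - tol

kept = [f for f in fields if float(f) >= threshold]
print(kept)  # ['0.586593469', '0.010003218', '0.01']

# For these inputs the tolerance changes nothing, because no rounding error is involved.
print([f for f in fields if at_least(float(f), threshold)] == kept)  # True
```

A tolerance only matters when the values were produced by arithmetic (and so may land a hair below 0.01); for numbers read straight from text, plain >= already includes 0.01.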

3 Answers

#!/usr/bin/env bash
while IFS= read -r i; do
    awk -F'\t' 'NR==1 || $6>=0.01' "$i" > "${i}_ctdna_freq.txt"
done < <(find . -name 'BC_4_*_*shift.txt')

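or (the sh -c wrapper makes the > redirection happen once per file; a bare > after xargs would be applied once, by the calling shell, to the whole xargs command):

#!/usr/bin/env bash
find . -name 'BC_4_*_*shift.txt' |
xargs -I {} sh -c 'awk -F"\t" "NR==1 || \$6>=0.01" "$1" > "${1}_ctdna_freq.txt"' sh {}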

Don't loop over find output with for i in $(...); see https://mywiki.wooledge.org/BashFAQ/001. Always quote your variables; see https://mywiki.wooledge.org/Quotes. Run all your shell scripts through http://shellcheck.net until you're familiar with the fundamentals.
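For comparison, the same per-file loop can be sketched in Python with pathlib, which sidesteps word-splitting and quoting entirely. This is a hedged sketch, not part of the original answer: filter_file and main are hypothetical names, and it assumes strictly tab-delimited files with one header line and a numeric 6th column on every data row.

```python
from pathlib import Path

THRESHOLD = 0.01  # keep rows whose 6th column is >= this value

def filter_file(src: Path, threshold: float = THRESHOLD) -> Path:
    """Hypothetical helper: write the header plus matching rows of src
    to '<src name>_ctdna_freq.txt' in the same directory."""
    dst = src.with_name(src.name + "_ctdna_freq.txt")
    with src.open() as fin, dst.open("w") as fout:
        fout.write(fin.readline())  # copy the header line unchanged
        for line in fin:
            fields = line.rstrip("\n").split("\t")  # assumes tab-delimited input
            if len(fields) >= 6 and float(fields[5]) >= threshold:
                fout.write(line)
    return dst

def main(root: str = ".") -> None:
    """Hypothetical driver, roughly: find ROOT -name 'BC_4_*_*shift.txt'."""
    # sorted() collects every match first, so freshly written outputs are never revisited.
    for src in sorted(Path(root).rglob("BC_4_*_*shift.txt")):
        if src.is_file():
            filter_file(src)
```

Calling main() from the directory that holds the BC_4_* files writes one _ctdna_freq.txt per input, like the bash loop. The output names end in _ctdna_freq.txt, so they never match the *shift.txt pattern on a later run.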

  • For your second example, may I recommend adding the -print0 option to the find command and the -0 option to the xargs command? I know that it is not necessary in the OP's use case, but others looking for a similar problem may be dealing with "not-so-well-behaved" filenames where this might be an issue ... Commented May 14, 2020 at 15:19
  • @AdminBee I thought about it, but then it's GNU-specific for both tools and, like you say, it's not necessary in the OP's case. Commented May 14, 2020 at 15:23
  • Ok, valid point ... Commented May 14, 2020 at 15:28

I put your data into a file named data1.txt, then made several manually modified copies of it to get multiple files.

This command handles all of them, but it writes the combined output to a single file:

find . -name "data*.txt" -type f -exec awk 'NR==1 || $6>=0.01' {} + >>output.txt
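Because awk's NR keeps counting across all input files (FNR is the per-file counter), NR==1 in that command prints only the first file's header, provided find runs awk just once; with very many files, {} + may split them across several awk runs, each printing a header. The following Python sketch of the same idea always writes a single header. merge_filtered is a hypothetical name; it overwrites the output rather than appending like >>, and it assumes every file starts with the same header line.

```python
def merge_filtered(paths, out_path, threshold=0.01):
    """Hypothetical helper: write one header, then every row from every file
    in paths whose 6th whitespace-separated column is >= threshold."""
    wrote_header = False
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                header = f.readline()
                if not wrote_header:  # keep only the first file's header
                    out.write(header)
                    wrote_header = True
                for line in f:
                    fields = line.split()  # split on runs of blanks, like awk's default FS
                    if len(fields) >= 6 and float(fields[5]) >= threshold:
                        out.write(line)
```

For example, merge_filtered(["data1.txt", "data2.txt"], "output.txt") produces one header followed by the matching rows of both files.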


Command:

awk '$6 >= 0.01' file.txt

Output:

chr     pos         ref var p.val       freq.var
chr19   9074573     A   C   6.73E-22    0.586593469
chr19   9091288     G   T   5.96E-188   0.508732726
chr12   56490398    G   T   0.005271732 0.010003218
chr12   56477619    G   A   1.40E-15    0.010001069
chr12   56477619    G   A   1.40E-15    0.010001069
chr3    52677261    C   T   5.13E-06    0.01
chr5    67591010    A   G   4.82E-23    0.01
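Unlike the question's command, this one has no NR==1, yet the header still appears. That follows from awk's comparison rules: "freq.var" does not look like a number, so awk compares it with 0.01 as strings (converting the constant with CONVFMT, "%.6g" by default), and "freq.var" sorts after "0.01". The Python sketch below imitates that rule; looks_numeric and awk_ge are hypothetical names, and the numeric test is only a rough stand-in for awk's real numeric-string rules.

```python
def looks_numeric(s: str) -> bool:
    """Rough stand-in for awk's 'numeric string' test."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def awk_ge(field: str, constant: float) -> bool:
    """Approximate awk's `field >= constant`: numeric if the field looks
    numeric, otherwise a string comparison against the constant as text."""
    if looks_numeric(field):
        return float(field) >= constant
    return field >= format(constant, ".6g")  # mimics awk's default CONVFMT "%.6g"

print(awk_ge("freq.var", 0.01))  # True: 'f' sorts after '0', so the header is kept
print(awk_ge("0.00005", 0.01))   # False
print(awk_ge("0.01", 0.01))      # True
```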

Python

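#!/usr/bin/env python3
# Print the header plus every row whose 6th column is >= 0.01.
# split() with no argument splits on any run of tabs or spaces,
# so it works for the tab-delimited files as well as space-aligned ones.
with open('file.txt') as k:
    print(k.readline().rstrip('\n'))  # reuse the file's own header line
    for line in k:
        fields = line.split()
        if len(fields) >= 6 and float(fields[5]) >= 0.01:
            print(line.rstrip('\n'))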



Output:

chr     pos         ref var p.val       freq.var
chr19   9074573     A   C   6.73E-22    0.586593469
chr19   9091288     G   T   5.96E-188   0.508732726
chr12   56490398    G   T   0.005271732 0.010003218
chr12   56477619    G   A   1.40E-15    0.010001069
chr12   56477619    G   A   1.40E-15    0.010001069
chr3    52677261    C   T   5.13E-06    0.01
chr5    67591010    A   G   4.82E-23    0.01
