How to remove a row if the difference between two columns is less than 2000

Question

I have a dataset that looks as follows:

chr1    HAVANA  gene    69091   70008   .   +   .   gene_id "ENSG00000186092.4"; transcript_id "ENSG00000186092.4"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F5"; level 2; havana_gene "OTTHUMG00000001094.1";
chr1    ENSEMBL gene    134901  139379  .   -   .   gene_id "ENSG00000237683.5"; transcript_id "ENSG00000237683.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "AL627309.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "AL627309.1"; level 3;
chr1    HAVANA  gene    367640  368634  .   +   .   gene_id "ENSG00000235249.1"; transcript_id "ENSG00000235249.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F29"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F29"; level 2; havana_gene "OTTHUMG00000002860.1";
chr1    HAVANA  gene    621059  622053  .   -   .   gene_id "ENSG00000185097.2"; transcript_id "ENSG00000185097.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F16"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F16"; level 2; havana_gene "OTTHUMG00000002581.1";
chr1    ENSEMBL gene    738532  739137  .   -   .   gene_id "ENSG00000269831.1"; transcript_id "ENSG00000269831.1"; gene_type "protein_coding"; gene_status "NOVEL"; gene_name "AL669831.1"; transcript_type "protein_coding"; transcript_status "NOVEL"; transcript_name "AL669831.1"; level 3;

I'd like to remove genes where the difference between $5 and $4 is less than 2000, using awk if possible, though sed is acceptable as well.

The expected output is:

chr1    ENSEMBL gene    134901  139379  .   -   .   gene_id "ENSG00000237683.5"; transcript_id "ENSG00000237683.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "AL627309.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "AL627309.1"; level 3;
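To see why only that row survives, you can print the difference $5 - $4 for every row. This is a minimal sketch, assuming the columns are whitespace-separated as shown; `show_spans` is a hypothetical helper name, and the sample rows are trimmed after gene_id because only $4 and $5 matter here.

```shell
#!/bin/sh
# Hypothetical helper: print start ($4), end ($5) and their difference
# for each input row. Reads the named files, or stdin if none are given.
show_spans() {
    awk '{ print $4, $5, $5 - $4 }' "$@"
}

# Trimmed copy of the sample rows from the question.
show_spans <<'EOF'
chr1 HAVANA gene 69091 70008 . + . gene_id "ENSG00000186092.4";
chr1 ENSEMBL gene 134901 139379 . - . gene_id "ENSG00000237683.5";
chr1 HAVANA gene 367640 368634 . + . gene_id "ENSG00000235249.1";
chr1 HAVANA gene 621059 622053 . - . gene_id "ENSG00000185097.2";
chr1 ENSEMBL gene 738532 739137 . - . gene_id "ENSG00000269831.1";
EOF
```

The third column comes out as 917, 4478, 994, 994 and 605, so only the ENSEMBL row starting at 134901 reaches 2000.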

Thank you.

Is column 5 always larger?
– Jeff Schaller ♦, Jan 14, 2016 at 19:30

Yes, column 5 is always larger.
– System, Jan 14, 2016 at 19:32

Dani_l · Accepted Answer · 2016-01-14 19:32:23Z

awk '$5-$4 >= 2000' file

This works as long as $5 is always larger than $4.
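The one-liner relies on $5 being the larger value; a row with $5 smaller than $4 gives a negative difference and is dropped. If that assumption might not hold, a slightly more defensive variant takes the absolute difference and passes the threshold in with `-v`. This is a sketch, not part of the accepted answer, and `keep_min_span` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep rows whose |$5 - $4| is at least the given
# threshold. Usage: keep_min_span THRESHOLD [FILE...] (stdin if no FILE).
keep_min_span() {
    threshold=$1   # minimum allowed absolute difference
    shift          # remaining arguments are input files
    awk -v min="$threshold" '
        { d = $5 - $4; if (d < 0) d = -d }  # absolute difference
        d >= min + 0                         # print rows that are long enough
    ' "$@"
}
```

For example, `keep_min_span 2000 file > filtered.gtf` writes the kept rows to a new file; awk itself does not modify the input file.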
