I have data stored in three columns like this:
3651 3631 3913
3667 3996 4276
3674 4486 4605
3707 4706 5095
3720 5174 5326
3750 5439 5899
3755 5928 6263
3767 6437 7069
3779 7157 7232
3882 7384 7450
3886 7564 7649
3900 7762 7835
4006 7942 7987
4015 8236 8325
4026 8417 8464
4065 8571 8737
4156 6790 7069
4493 7157 7450
4541 7564 7649
4551 7762 7835
4597 7942 7987
4756 8236 8325
4776 8417 8464
where the 1st column is a specific value, the 2nd column is a start, and the 3rd column is an end. There are 825849 values in the 1st column and 58386 in the 2nd and 3rd. I need to count how many values from the 1st column fall between each start and end.
I know that in my file the first 12 values from column 1 are between the first start and end, the next 5 are between the second start and end, and so on. I need to check the whole file. I have tried the following, and it works, but it is really slow:
coords='final_exons.txt'
# Load each column into its own array (awk can read the file directly)
snp=( $(awk '{print $1}' "$coords") )
exon_start=( $(awk '{print $2}' "$coords") )
exon_end=( $(awk '{print $3}' "$coords") )
i=0
counter=0
for value in "${exon_end[@]}"; do
    new_val=$counter    # number of SNPs below the previous exon end
    counter=0
    (( i++ ))
    # The SNP list is rescanned from the top for every exon
    for snps in "${snp[@]}"; do
        # Numeric comparison; [[ $value > $snps ]] would compare as strings
        if (( value > snps )); then
            (( counter++ ))
        else
            break
        fi
    done
    # Absolute difference between this count and the previous one
    final=$(echo "scale=2; sqrt(($counter-$new_val)^2)" | bc)
    echo "Exon $i : $final SNPs"
done
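For comparison, here is a minimal sketch of how the same count might be done in a single awk process instead of nested bash loops. It rests on assumptions that are not stated above: column 1 is sorted in ascending order (as in the sample), positions are integers, both bounds are inclusive, and columns are whitespace-separated so that lines past the last exon have only one field. The function names count_snps and below are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: count column-1 values inside each [start, end] interval.
# Assumes column 1 is sorted ascending, integer positions, inclusive bounds.
# count_snps is a hypothetical helper name. Usage: count_snps final_exons.txt
count_snps() {
    awk '
    # Number of entries in snp[1..n] strictly below x, found by binary search
    function below(x,    lo, hi, mid) {
        lo = 1; hi = n + 1
        while (lo < hi) {
            mid = int((lo + hi) / 2)
            if (snp[mid] < x) lo = mid + 1
            else hi = mid
        }
        return lo - 1
    }
    NF >= 1 { snp[++n] = $1 + 0 }              # every line carries a value
    NF >= 3 { s[++m] = $2 + 0; e[m] = $3 + 0 } # only some lines carry an interval
    END {
        for (i = 1; i <= m; i++)
            printf "Exon %d : %d SNPs\n", i, below(e[i] + 1) - below(s[i])
    }' "$1"
}
```

Each interval is handled with two binary searches rather than a scan from the top of the list, and the intervals themselves do not need to be sorted. If the bounds should be exclusive, the two arguments passed to below would need adjusting.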
Thank you in advance for any hints and tips.