deleted 68 characters in body
Source Link
amisax
  • 3.1k
  • 20
  • 23

The percentage calculation can be reduced to a single operation like this:

 echo "${even##}" | awk '{x=gsub(/[AT]/,""); y=gsub(/[GC]/,""); printf "GC_CONT : %.2f%%\n", (y*100)/(x+y) }'

gsub replaces every match of a pattern and returns the number of substitutions it made, so it can be used to count the bases and calculate the percentage quickly.
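As a small illustration of gsub's return value, the sketch below counts bases in a made-up sequence (the variable names and the sample string are assumptions, not taken from the question):

```shell
# Hypothetical sample sequence; gsub edits $0 in place and returns the match count.
seq="ATGCGCTA"

# Count G/C characters: every match is replaced by "" and n receives the count.
gc_count=$(printf '%s\n' "$seq" | awk '{ n = gsub(/[GC]/, ""); print n }')

# Same idea for A/T characters.
at_count=$(printf '%s\n' "$seq" | awk '{ n = gsub(/[AT]/, ""); print n }')

echo "GC=$gc_count AT=$at_count"
```

Because the replacement string is empty, the matched characters are simply dropped from the line; only the returned counts matter here.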

You could also process the odd and even lines in awk. It is not clear what you are doing with the odd lines, but your complete function can be put in a single awk script:

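awk -F '_' -v Y="$Y" '{
    if (NR % 2 == 1) {
        # Odd lines: fields are split on "_"; $6 is divided by the shell value Y
        printf "%s %s %s %s %s\nnucleotidic_cov : %.4f\n", $1, $2, $3, $4, $5, ($6 / Y)
    } else {
        # Even lines: gsub returns how many characters it removed
        x = gsub(/[AT]/, "")
        y = gsub(/[GC]/, "")
        printf "GC_CONT : %.2f%%\n", (y * 100) / (x + y)
    }
}' large_file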

EDIT: Based on the OP's requirement, I changed the if block for odd lines. The line is now split on "_" (-F '_'), so the number to divide ends up in $6 and no gsub is needed to strip "cov." from it. After passing the shell variable $Y to awk with -v, we can divide and print in the required format.
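A minimal sketch of how -v hands a shell value to awk, assuming a header whose sixth "_"-separated field holds the number to divide (the header layout and the value of Y below are made up for illustration):

```shell
# Hypothetical header and divisor; neither comes from the original question.
header="NODE_1_length_1000_cov_10"
Y=4

# -F '_' splits the header on underscores; -v Y="$Y" copies the shell value into awk.
cov=$(printf '%s\n' "$header" | awk -F '_' -v Y="$Y" '{ printf "%.4f", $6 / Y }')

echo "nucleotidic_cov : $cov"
```

Using -v keeps the shell variable out of the awk program text, so the single-quoted script does not need any quoting tricks.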

Using a single awk script instead of multiple operations will significantly speed the operation up.
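To try the single-awk approach end to end, here is a hedged sketch that wraps it in a function and runs it on a tiny temporary file. The function name gc_report, the sample records, and the divisor 4 are all assumptions; the guard against an empty sequence line is an addition that the script above does not have.

```shell
# gc_report FILE Y: hypothetical wrapper around the single-awk approach.
gc_report() {
    # LC_ALL=C keeps "." as the decimal separator regardless of the user locale.
    LC_ALL=C awk -F '_' -v Y="$2" '
        NR % 2 == 1 {
            printf "%s %s %s %s %s\nnucleotidic_cov : %.4f\n", $1, $2, $3, $4, $5, ($6 / Y)
            next
        }
        {
            at = gsub(/[AT]/, "")
            gc = gsub(/[GC]/, "")
            total = at + gc
            # Added guard: a line with no A/C/G/T would otherwise divide by zero.
            if (total > 0) {
                printf "GC_CONT : %.2f%%\n", (gc * 100) / total
            } else {
                print "GC_CONT : n/a"
            }
        }' "$1"
}

# Made-up two-record input: header line, then sequence line.
tmp=$(mktemp)
printf '%s\n' 'NODE_1_length_8_cov_10' 'ATGCGCTA' 'NODE_2_length_4_cov_6' 'GGGC' > "$tmp"
report=$(gc_report "$tmp" 4)
rm -f "$tmp"
printf '%s\n' "$report"
```

The whole file is handled by one awk process, which is where the speed-up over a per-line shell loop would come from.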

improved answer based on op's clarification

The percentage calculation can be reduced to a single operation like this:

 echo "${even##}" | awk '{x=gsub(/[AT]/,""); y=gsub(/[GC]/,""); printf "GC_CONT : %.2f%%\n", (y*100)/(x+y) }'

gsub replaces every match of a pattern and returns the number of substitutions it made, so it can be used to count the bases and calculate the percentage quickly.

You could also process the odd and even lines in awk. It is not clear what you are doing with the odd lines, but your complete function can be put in a single awk script:

  awk -v Y="$Y" '{ if(NR%2==1) {
             gsub(/cov\./,"",$3);
             printf "%s %s\nnucleotidic_cov : %.4f\n",$1,$2,($3 / Y)
           } else {
               x=gsub(/[AT]/,""); 
               y=gsub(/[GC]/,""); 
               printf "GC_CONT : %.2f%%\n", (y*100)/(x+y)
           }
        }' large_file

EDIT: Based on the OP's requirement, I changed the if block for odd lines. The gsub removes the "cov." from the number. After passing the shell variable $Y to awk with -v, we can divide and print in the required format.

Using a single awk script instead of multiple operations will significantly speed the operation up.

based on OP's input

The percentage calculation can be reduced to a single operation like this:

 echo "${even##}" | awk '{x=gsub(/[AT]/,""); y=gsub(/[GC]/,""); printf "GC_CONT : %.2f%%\n", (y*100)/(x+y) }'

gsub replaces every match of a pattern and returns the number of substitutions it made, so it can be used to count the bases and calculate the percentage quickly.

You could also process the odd and even lines in awk. It is not clear what you are doing with the odd lines, but your complete function can be put in a single awk script:

awk '{ if(NR%2==1) {
         gsub(/cov\./,"nucleotidic_cov : ",$3);
         printf "%s %s\n%s\n",$1,$2,$3
       } else {
           x=gsub(/[AT]/,""); 
           y=gsub(/[GC]/,""); 
           printf "GC_CONT : %.2f%%\n", (y*100)/(x+y)
       }
    }' large_file