I am writing a script to run a piece of software. I am trying to add a function inside while loops that trims the text in a variable, so that the result can be used in another part of the command. What is the correct way to do this?

This code works when only one --msa file is run.

while read -r i; do 
    raxml-ng --msa ../C049.laln_1l --model $i --prefix C049-rT; 
done < ../C049.model

For a very brief introduction: raxml-ng is the software I use, and its run parameters are set through the --msa, --model, and --prefix options. Every --msa file has its own corresponding --model and --prefix. I gave them matching names to make scripting easier. E.g., C049.laln_1l needs to be matched with C049.model and C049-rT.

As in the example above, I can loop over the command when the other --msa files share the same extension, like this:

while read -r i; do 
    while read -r j; do
        raxml-ng --msa ../$i.laln_1l --model $j --prefix $i-rT; 
    done < ../$i.model
done < msalist

Now I have a list of --msa files (listed in msalist) to run, and some of them have a different file extension.

The msalist file contains:

C049.laln_1l
C092.laln_1l
C016.laln_1l
gc30_part.cseq
gc3f.glist.cseq...

I named the model and prefix using only the text before the first dot (.).

E.g., the list for the model parameter:

C049.model
C092.model
C016.model
gc30_part.model
gc3f.model...

The same applies to the prefix parameter.

So when writing the bash script to loop over all the --msa files in msalist, I tried "$( sed 's/\..*//g' "$i" )".model to get C049.model instead of C049.laln_1l.model, but it doesn't work.

trees=$2
threads=$3

while read -r i; do
        while read -r j; do
                raxml-ng --msa ../"$i" --model "$j" --prefix "$( sed 's/\..*//g' "$i" )"-rT;
        done < ../"$( sed 's/\..*//g' "$i" )".model;
done < "$alnlist"

How can I trim the text from msalist so that it can be used for --model and --prefix?

1 Answer

To get the part before the first . in any POSIX shell, you can use ${var%%.*}. So here:

while IFS= read -r i; do
  prefix=${i%%.*}
  while IFS= read -r j; do
    raxml-ng --msa ../"$i" --model "$j" --prefix "$prefix-rT";
  done < "../$prefix.model";
done < "$alnlist"
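As a small illustration of the expansion used above: ${var%%pattern} removes the longest suffix matching the pattern, while ${var%pattern} removes the shortest one. The file names below come from the question's msalist; the function names are made up for this sketch.

```shell
# Hypothetical helper names, used only for this illustration.
before_first_dot() {
  # %% removes the longest suffix matching ".*", i.e. from the first dot on
  printf '%s\n' "${1%%.*}"
}
before_last_dot() {
  # % removes the shortest suffix matching ".*", i.e. from the last dot on
  printf '%s\n' "${1%.*}"
}

before_first_dot gc3f.glist.cseq   # prints gc3f
before_last_dot gc3f.glist.cseq    # prints gc3f.glist
before_first_dot C049.laln_1l      # prints C049
```

A name without any dot is left unchanged by both forms, since the pattern does not match.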

Also note that the syntax to read a line is IFS= read -r line, not read -r line: with the default value of IFS, read strips leading and trailing blanks from the line.
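A minimal sketch of the difference, assuming a made-up input line with spaces around a name from the question (the variable names are only for illustration):

```shell
# Hypothetical input line with leading and trailing spaces.
line_in='  C049.laln_1l  '

# Default IFS (space, tab, newline): read strips the surrounding blanks.
read -r stripped <<EOF
$line_in
EOF

# IFS set to empty for this one command: the line is stored verbatim.
IFS= read -r verbatim <<EOF
$line_in
EOF

printf '[%s]\n[%s]\n' "$stripped" "$verbatim"
```

For file names without surrounding blanks both forms give the same result, so this matters mostly for robustness.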

Here, you could also do:

while IFS=. read -r prefix rest; do
  while IFS= read -r j; do
    raxml-ng --msa ../"$prefix.$rest" --model "$j" --prefix "$prefix-rT";
  done < "../$prefix.model";
done < "$alnlist"
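To see how that split behaves on a name with more than one dot, here is a short sketch using one line from the question's msalist as sample input. The first variable receives the text before the first dot, and the last variable receives everything that remains, further dots included.

```shell
# Sample line taken from the msalist shown in the question.
IFS=. read -r prefix rest <<EOF
gc3f.glist.cseq
EOF

printf 'prefix=%s rest=%s\n' "$prefix" "$rest"   # prints prefix=gc3f rest=glist.cseq
```

One caveat, as far as I can tell: for a line that contains no dot at all, rest ends up empty, so "$prefix.$rest" would gain a trailing dot that the original name did not have.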

If you wanted to use sed to remove everything starting with the first ., first note that sed 's/\..*//' removes . followed by any number of characters from every line of its input, not from the input as a whole. You'd also need to pass the content of $i as input, not as an argument, because sed treats its arguments as the names of files to read the input from, so:

printf '%s\n' "$i" | sed 's/\..*//'

for instance. However, to remove everything starting with the first . in the input as a whole, it would have to be:

printf '%s\n' "$i" | sed '
  :1
  $!{
    # except on the last line, append the next line to the
    # pattern space and loop
    N
    b1
  }
  s/\..*//'
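The difference between the two sed invocations only shows up when the value spans several lines. A minimal sketch, assuming a made-up two-line value (real file names rarely contain newlines) and a sed where . also matches a newline inside the pattern space, as GNU sed does:

```shell
# Hypothetical two-line value, used only to show the difference.
value='gc3f.glist
second.line'

# Per-line substitution: each line is cut at its own first dot.
per_line=$(printf '%s\n' "$value" | sed 's/\..*//')

# Whole-input substitution: gather all lines, then cut once at the first dot.
whole=$(printf '%s\n' "$value" | sed '
  :1
  $!{
    N
    b1
  }
  s/\..*//')

printf 'per line:\n%s\nwhole input:\n%s\n' "$per_line" "$whole"
```

With this input, the per-line form keeps the start of both lines (gc3f and second), while the whole-input form keeps only gc3f.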
  • Thanks for the help! I have been looking into this the whole morning, and it is working now! By the way, would you mind elaborating on ${i%%.*}? What does the %% refer to? Commented Sep 30, 2022 at 16:11