I have a file that lists sample names:

head sample_id.txt
PD26405a--PD26405b
PD26414a--PD26414d
PD26417a--PD26417b
...

I also have a directory containing the outputs of the preprocessing step of the program I am running:

cd ./preprocessing_out
ls
PD26405a--PD26405b_allDirichletProcessInfo.txt
PD26405a--PD26405b_alleleFrequencies.txt
PD26405a--PD26405b_loci.txt
PD26405a--PD26405b_master.txt
PD26414a--PD26414d_allDirichletProcessInfo.txt
PD26414a--PD26414d_alleleFrequencies.txt
PD26414a--PD26414d_loci.txt
PD26414a--PD26414d_master.txt
PD26417a--PD26417b_allDirichletProcessInfo.txt
PD26417a--PD26417b_alleleFrequencies.txt
PD26417a--PD26417b_loci.txt
PD26417a--PD26417b_master.txt

The sample names in sample_id.txt match the file-name prefixes in the preprocessing_out directory.

I want to run my main-step script, which takes only the *_master.txt file. A master file looks like this:

cat PD26405a--PD26405b_master.txt
sample  subsample   datafile    cellularity sex cnadatafile indeldatafiles
PD26405a--PD26405b  PD26405a--PD26405b  PD26405a--PD26405b_allDirichletProcessInfo.txt  0.83    female  NA  NA

To run it for the first sample only, the command is simply:

Rscript --vanilla --slave /projects/dpclust_pipeline.R  -r 1 -d /projects/preprocessing_out -o /projectsdp_out -i /projects/preprocessing_out/PD26405a--PD26405b_master.txt


-r is "run_sample": type="integer", default=NULL, help="Sample to run"
-d is the directory where the preprocessing results are stored
-o is the final output directory
-i is the path to master.txt

I have more than 150 samples in total, and I want to run this Rscript command from a bash script with a for loop. r=1 refers to the first sample (PD26405a--PD26405b), r=2 refers to the second (PD26414a--PD26414d), and so on.

How can I adjust my code?

1 Answer

To simply increment r for each successive sample, you can do something like:

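r=1
# Read sample names one per line; r is the line number (1 = first sample).
# IFS= and -r keep each line exactly as written in the file.
while IFS= read -r sample
do
  Rscript --vanilla --slave /projects/dpclust_pipeline.R -r "${r}" -d /projects/preprocessing_out -o /projectsdp_out -i "/projects/preprocessing_out/${sample}_master.txt"
  r=$(( r + 1 ))
done < path/to/sample_id.txt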
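
Before launching 150+ real runs, it may help to dry-run the loop and check which r gets paired with which master file. The following is only a minimal sketch, assuming one sample name per line. The function name dry_run and the temporary sample list are hypothetical, introduced here so the example is self-contained, and it prints the pairing instead of calling Rscript.

```shell
#!/bin/sh
# Dry run: print the r / master-file pairing instead of executing Rscript.
# dry_run is a hypothetical helper name; its argument is the sample list file.
# The body runs in a subshell ( ... ) so r and sample do not leak out.
dry_run() (
  r=1
  while IFS= read -r sample; do
    # Assumption: blank lines are not samples, so skip them without advancing r.
    [ -n "$sample" ] || continue
    echo "r=${r} master=${sample}_master.txt"
    r=$(( r + 1 ))
  done < "$1"
)

# Stand-in for sample_id.txt, using the three names shown in the question.
sample_list=$(mktemp)
printf '%s\n' 'PD26405a--PD26405b' 'PD26414a--PD26414d' 'PD26417a--PD26417b' > "$sample_list"

dry_run "$sample_list"
rm -f "$sample_list"
```

With those three names, this prints r=1 for PD26405a--PD26405b_master.txt, r=2 for PD26414a--PD26414d_master.txt and r=3 for PD26417a--PD26417b_master.txt. Once the pairing looks right, the echo line can be swapped for the real Rscript command.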
  • Thanks, but where should I put my Rscript command? Commented Jun 6, 2021 at 22:50
  • You can replace the echo command above. I wanted you to try running it with echo first to test and see how it works for you. Then you can replace that line, using the same variables ($masterfile and $r) however you want. Commented Jun 6, 2021 at 22:51
  • Unfortunately, this does not work the way I want. Commented Jun 6, 2021 at 22:54
  • What output do you get when you run it? Commented Jun 6, 2021 at 22:55
  • r=1 should be PD26405a--PD26405b_master.txt and r=2 should be PD26414a--PD26414d_master.txt, based on the order of the samples in sample_id.txt. Commented Jun 6, 2021 at 22:55