I am looping over multiple input files with two programs: bamtofastq-1.3.2 and cellranger count. bamtofastq-1.3.2 creates a folder ($FILE.folder) containing a subfolder, in which the files I need for cellranger count are stored. I am pointing to this subfolder with a wildcard. However, the path to the files is not recognised. Any idea whether the wildcard is wrong?

#!/bin/bash
mapfile -s 1 -t files <  files.txt 
echo "${files[@]}"
for FILE in ${files[@]}; do

    bamtofastq-1.3.2 --nthreads 40 $FILE $FILE.folder
    cellranger count --id=sample_$FILE \
    --transcriptome=refdata-gex-GRCh38-2020-A \
    --fastqs=$FILE.folder/*/;
done

files.txt

scRNA_sorted_25183_Relapse_1_bam
scRNA_sorted_27522_Primary_bam

Error:

error: Invalid value for '--fastqs <PATH>...': No such file or directory: 'scRNA_sorted_25183_Relapse_1_bam.folder/*/'
  • The error message means that there are no non-hidden entries in the directory scRNA_sorted_25183_Relapse_1_bam.folder. BTW, you should use quotes: for FILE in "${files[@]}"; do, in case there are ever spaces in the file names. Commented Feb 24, 2022 at 10:30
  • For correct handling of spaces, quotes need to be added for all variables, i.e. not only for ${files[@]}, but also for all mentions of $FILE. Commented Feb 24, 2022 at 11:27

1 Answer

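The problem is that the wildcard in

--fastqs=$FILE.folder/*/

is not expanded by the shell. The shell treats the whole word, including the --fastqs= prefix, as the pattern. No path starts with --fastqs=, so nothing matches and cellranger receives the argument with the * as is.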
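
The behaviour can be reproduced without cellranger. The following is a minimal sketch that assumes default shell options (no nullglob or failglob); the names sample.folder and sub are made up for the demonstration, and printf stands in for the real command.

```bash
#!/bin/bash
# Work in a scratch directory; "sample.folder" and "sub" are made-up names.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p sample.folder/sub

# The whole word, including "--fastqs=", is the glob pattern.
# No path begins with "--fastqs=sample.folder", so the word stays literal.
with_equals=$(printf '%s\n' --fastqs=sample.folder/*/)

# Here the pattern is a word of its own, so the shell expands it.
without_equals=$(printf '%s\n' --fastqs sample.folder/*/)

echo "$with_equals"
echo "$without_equals"

# Clean up the scratch directory.
cd - >/dev/null
rm -rf "$tmp"
```

The first echo shows the pattern unchanged, while the second shows the option followed by the expanded subfolder path.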

Try it without the =, i.e.:

cellranger count --id=sample_$FILE \
--transcriptome=refdata-gex-GRCh38-2020-A \
--fastqs $FILE.folder/*/

Now, cellranger should receive the argument with the * expanded to the subfolder. I'm assuming here that there is only one subfolder - for multiple subfolders, additional code would be required as --fastqs expects a comma-separated list in that case.
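
For the multiple-subfolder case, the additional code could look roughly like the sketch below. join_fastq_dirs is a hypothetical helper, not part of cellranger; it only relies on the statement above that --fastqs accepts a comma-separated list, and it has not been tested against a real cellranger run.

```bash
#!/bin/bash
# join_fastq_dirs is a hypothetical helper (not a cellranger feature).
# It prints all subfolders of the given directory as one comma-separated string.
join_fastq_dirs() {
    local out d
    set -- "$1"/*/               # one positional parameter per subfolder
    [ -d "$1" ] || return 1      # the glob matched nothing
    out=
    for d in "$@"; do
        out="${out:+$out,}${d%/}"   # append without the trailing slash
    done
    printf '%s\n' "$out"
}

# Intended use inside the loop (assumption, adapt as needed):
#   fastqs=$(join_fastq_dirs "$FILE.folder") || exit 1
#   cellranger count --id="sample_$FILE" \
#       --transcriptome=refdata-gex-GRCh38-2020-A \
#       --fastqs="$fastqs"
```

Because the joined string is built before the call and passed in quotes, the --fastqs= form works here: no wildcard is left for the shell to expand.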


As a side note, to handle files with spaces correctly, you would need to add a few quotes (this is generally a good idea when writing Bash scripts):

#!/bin/bash
mapfile -s 1 -t files < files.txt
echo "${files[@]}"
for FILE in "${files[@]}"; do
    bamtofastq-1.3.2 --nthreads 40 "$FILE" "$FILE.folder"
    cellranger count --id="sample_$FILE" \
    --transcriptome=refdata-gex-GRCh38-2020-A \
    --fastqs "$FILE.folder"/*/
done
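
To illustrate why the quotes around "${files[@]}" matter, here is a small sketch with made-up file names, one of which contains a space. It only counts loop iterations and runs neither bamtofastq nor cellranger.

```bash
#!/bin/bash
# Made-up names for the demonstration; the first one contains a space.
files=("sample one_bam" "sample_two_bam")

# Unquoted: the shell splits "sample one_bam" into two words.
unquoted=0
for FILE in ${files[@]}; do
    unquoted=$((unquoted + 1))
done

# Quoted: each array element stays one word.
quoted=0
for FILE in "${files[@]}"; do
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted iterations, quoted: $quoted iterations"
```

With two array elements, the unquoted loop runs three times and the quoted loop runs twice.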

4 Comments

I understand your point, but shouldn't the error message in this case complain about a missing --fastqs=scRNA_sorted_25183_Relapse_1_bam.folder/*/?
No. scRNA_sorted_25183_Relapse_1_bam.folder/*/ is being passed as the parameter for --fastqs to cellranger. But the * won't get expanded by the shell when using the --param=arg syntax.
In your solution, the expansion is done by the shell. Assume the folder contains two entries, A and B. This would lead to the command line ... --fastqs scRNA_sorted_25183_Relapse_1_bam.folder/A/ scRNA_sorted_25183_Relapse_1_bam.folder/B/ . Wouldn't the extra argument be a problem for cellranger?
Yes, I'm assuming here that cellranger is unable to handle wildcards by itself (its documentation does not mention wildcards), thus the shell needs to perform the expansion. Also, the OP states there is a subfolder, not multiple ones, thus this should work fine. Multiple folders would pose a problem as cellranger expects a comma-separated list in that case.
