I have several folders, each containing two fastq.gz files. Usually such files are named sample_R1.fastq.gz and sample_R2.fastq.gz, where sample can be the folder name or something else.

But in my case the folders are:

1008_a
2085_a
2130_a
2192_a
2221_a
2242_a
2269_a
2482_a

Each of these folders contains files like the following:

1008_a:

C85CBANXX_s6_1_O07_0452_SL137634.fastq.gz C85CBANXX_s6_2_O07_0452_SL137634.fastq.gz

2085_a:

C7V65ANXX_s6_1_M19_0413_SL131164.fastq.gz C7V65ANXX_s6_2_M19_0413_SL131164.fastq.gz

How can I rename these files to 1008_a_R1.fastq.gz and 1008_a_R2.fastq.gz for folder 1008_a, to 2085_a_R1.fastq.gz and 2085_a_R2.fastq.gz for folder 2085_a, and so on? The files in the other folders follow different name patterns.

Thanks, Ron

  • Do all files in each folder have the identical intermediate part _s6_1_ and _s6_2_ in their names? Commented Nov 7, 2017 at 7:20

1 Answer

find + bash solution:

Sample folder structure (e.g. 1008_a and 2085_a):

$ tree 1008_a/ 2085_a/
1008_a/
├── C85CBANXX_s6_1_O07_0452_SL137634.fastq.gz
└── C85CBANXX_s6_2_O07_0452_SL137634.fastq.gz
2085_a/
├── C7V65ANXX_s6_1_M19_0413_SL131164.fastq.gz
└── C7V65ANXX_s6_2_M19_0413_SL131164.fastq.gz

The job:

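# Match the paired files inside each NNNN_a folder, then rename every match
# to <folder>_R<read number>.fastq.gz (requires GNU find for -regextype).
find . -type f -regextype posix-egrep \
-regex ".*/[0-9]{4}_a/[[:alnum:]_]+_[12]_[[:alnum:]_]+\.fastq\.gz$" -exec bash -c \
'path=${0%/*}/; bn=${0##*/}; dir_n=${0%/*}; dir_n=${dir_n##*/};
 new_fn=$(sed -E "s/.+_([12])_.+(\.fastq\.gz)$/${dir_n}_R\1\2/" <<<"$bn");
 mv "$0" "$path$new_fn"' {} \;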

Results:

$ tree 1008_a/ 2085_a/
1008_a/
├── 1008_a_R1.fastq.gz
└── 1008_a_R2.fastq.gz
2085_a/
├── 2085_a_R1.fastq.gz
└── 2085_a_R2.fastq.gz
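
If find's -regextype option is not available, the same renaming can probably be done with a plain shell loop. This is only a minimal sketch under two assumptions: every sample folder sits directly under one root directory, and the read number appears as _1_ or _2_ somewhere in the file name. The function name rename_fastq_pairs is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): rename_fastq_pairs [ROOT]
# Renames ROOT/<folder>/*_1_*.fastq.gz to <folder>_R1.fastq.gz and
# ROOT/<folder>/*_2_*.fastq.gz to <folder>_R2.fastq.gz.
rename_fastq_pairs() {
    local root=${1:-.} dir sample file base mate
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue              # no sub-folders at all
        sample=$(basename "$dir")              # folder name, e.g. 1008_a
        for file in "$dir"*.fastq.gz; do
            [ -e "$file" ] || continue         # folder without fastq.gz files
            base=${file##*/}
            case $base in
                *_1_*_2_*|*_2_*_1_*) continue ;;   # ambiguous name: skip it
                *_1_*) mate=1 ;;
                *_2_*) mate=2 ;;
                *) continue ;;                     # no read number found
            esac
            # -n: never overwrite a file that already has the target name
            mv -n -- "$file" "$dir${sample}_R${mate}.fastq.gz"
        done
    done
}
```

Run it as rename_fastq_pairs . from the directory that holds the sample folders. Files whose names contain both _1_ and _2_ are left untouched, since the loop cannot tell which one is the read number.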
