I have thousands of files that look like this:
org_files:
reference_group1 _CEFNB_
group1 ACBF_BG
group2 ACB_MBM
...
For each file, I need to convert rows to columns (one letter per row) and then add columns holding the ID of the reference group and its letter at the same position (reference_group_id and serials_from_ref_group below), like this:
Converted file:
# explanation of each column
# reference_group_id serials_from_ref_group group_id serials_from_group
reference_group1 _ group1 A
reference_group1 C group1 C
reference_group1 E group1 B
reference_group1 F group1 F
reference_group1 N group1 _
reference_group1 B group1 B
reference_group1 _ group1 G
reference_group1 _ group2 A
reference_group1 C group2 C
reference_group1 E group2 B
reference_group1 F group2 _
reference_group1 N group2 M
reference_group1 B group2 B
reference_group1 _ group2 M
The 2nd column of each group in org_files may contain repeated letters, and the 2nd column always has the same length on every row.
I tried
input="reference_group1 _CEFNB_
group1 ACBF_BG
group2 ACB_MBM"
while IFS=" " read -ra line; do # read input line by line
    # loop over fields
    for (( i = 0 ; i < ${#line[@]}; i++ )); do
        # only split 2nd field
        if [[ $i == 1 ]]
        then
            for j in ${line[$i]}
            do
                # loop over each letter of 2nd field
                for (( j=0; j<${#line[$i]}; j++ ))
                do
                    echo "${line[$i-1]} ${line[$i]:$j:1}"
                done
            done
        fi
    done
done <<< "$input"
But I only got two columns, with the reference group and the other groups printed one after another, like this:
reference_group1 _
...
group1 A
...
group2 M
The code is also a bit messy. Is there a simpler command that does this? Thanks!
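For illustration, here is a minimal sketch of one possible approach in plain bash. It assumes the first line of each file is always the reference group, that every line has exactly two space-separated fields, and that no "..." placeholder line is actually present in the real files. The function name convert and the directory names in the usage comment are hypothetical, and the "#" header lines from the desired output are not printed.

```shell
#!/usr/bin/env bash
# convert (hypothetical name): read one org_file on stdin and print
# "ref_name ref_letter group_name group_letter" for every letter position.
# Assumption: the first line is the reference group.
convert() {
    local ref_name ref_serials name serials i

    # The first line holds the reference group's name and serial string.
    read -r ref_name ref_serials || return 1

    # Every remaining line is an ordinary group.
    # The "|| [[ -n $name ]]" part keeps a last line that lacks a trailing newline.
    while read -r name serials || [[ -n $name ]]; do
        [[ -z $name ]] && continue   # skip blank lines
        for (( i = 0; i < ${#serials}; i++ )); do
            # ${var:i:1} is the single character at 0-based position i.
            printf '%s %s %s %s\n' \
                "$ref_name" "${ref_serials:i:1}" "$name" "${serials:i:1}"
        done
    done
}

# Possible usage over many files (directory names are placeholders):
#   for f in org_files/*; do
#       convert < "$f" > "converted/${f##*/}"
#   done
```

Saving the reference line before the loop is what makes it possible to print both letters on the same row; the original attempt only ever has the current line available, so it can print just two columns.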