bash convert rows to columns

Question

I have thousands of files that look like this:

org_files:

reference_group1 _CEFNB_
group1          ACBF_BG
group2          ACB_MBM
...

For each file, I need to convert rows to columns and then add a column (called id_from_reference_group) consisting of the index of reference_group just like this:

Converted file


# explanation of each column
# reference_group_id    serials_from_ref_group  group_id    serials_from_group
reference_group1            _                group1            A
reference_group1            C                group1            C
reference_group1            E                group1            B
reference_group1            F                group1            F
reference_group1            N                group1            _
reference_group1            B                group1            B
reference_group1            _                group1            G
reference_group1            _                group2            A
reference_group1            C                group2            C
reference_group1            E                group2            B
reference_group1            F                group2            _
reference_group1            N                group2            M
reference_group1            B                group2            B
reference_group1            _                group2            M

In each of the org_files, the 2nd column consists of letters that may be repeated, and the 2nd column always has the same length on every line.

I tried

input="reference_group1 _CEFNB_
group1          ACBF_BG
group2          ACB_MBM"

while IFS=" " read -ra line; do # read input line by line
# loop over fields
  for (( i = 0 ; i < ${#line[@]}; i++ )); do
    # only split 2nd field
    if [[ $i == 1 ]]
    then
      for j in ${line[$i]}
      do
        # loop over each letter of 2nd field
        for (( j=0; j<${#line[$i]}; j++ ))
        do
          echo "${line[$i-1]}  ${line[$i]:$j:1}"
        done
      done
    fi

  done
done <<< "$input"

But I only got the result like this

reference_group1  _
...
group1  A
...
group2  M

And the code is a little bit messy. It would be better if there were a simpler command. Thanks!

so, what have you tried, what's your approach? We're not a free code writing service.
– Marcus Müller, Commented Oct 25, 2022 at 13:03

Marius_Couet · Accepted Answer · 2022-10-25 14:36:29Z

You can use awk with a script like this (tst.awk):

BEGIN{print "#reference_group_id serials_from_ref_group group_id serials_from_group"}
$1 ~ /^reference_/ {ref=$1;ser=$2;next}
{
        for(i=1;i<=length($2);i++){
                print ref, substr(ser,i,1), $1, substr($2,i,1)
        }
}

I assumed that your reference_group_id always begins with reference_, so the script stores it in a variable called ref and stores serials_from_ref_group in ser. Both variables are then used in the loop that runs over every other line.

Then a command like this should work:

awk -f tst.awk file

As your expected output is aligned in columns, you can pipe the output to column -t:

awk -f tst.awk file | column -t
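Since the question mentions thousands of files, the same command can be wrapped in a loop. This is only a sketch: convert_all is a hypothetical helper name, and both the .txt extension for the inputs and the .converted extension for the outputs are assumptions, as the question does not say how the files are named.

```shell
#!/usr/bin/env bash
# convert_all SCRIPT DIR
# Hypothetical helper: runs the awk script SCRIPT on every *.txt file in DIR
# and writes the result next to the input as NAME.converted.
# The .txt and .converted extensions are assumptions, not from the question.
convert_all() {
  script=$1
  dir=$2
  for f in "$dir"/*.txt; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    awk -f "$script" "$f" > "${f%.txt}.converted"
  done
}

# Example call (paths are placeholders):
# convert_all tst.awk /path/to/org_files
```

If aligned output is wanted, column -t can be added to the pipeline inside the loop, before the redirection.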

Explanation of the awk script:

BEGIN: executed only once, before the first input record is read
$1 ~ /^reference_/: true if $1 matches the regular expression ^reference_
length($2): length of the second field
substr(ser,i,1): substring of ser starting at position i with a length of 1
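For comparison with the loop in the question, the same logic can also be sketched in pure bash using substring expansion (${var:offset:length}). The function name convert_rows_bash is hypothetical, and the sketch makes the same assumption as the awk script: the reference line starts with reference_. For thousands of files awk is likely to be faster, so treat this as an illustration rather than a replacement.

```shell
#!/usr/bin/env bash
# convert_rows_bash: hypothetical pure-bash equivalent of tst.awk.
# Reads one org_file on stdin and prints one output line per letter.
convert_rows_bash() {
  local name serials ref ser i
  echo "#reference_group_id serials_from_ref_group group_id serials_from_group"
  # "|| [[ -n $name ]]" also handles a last line without a trailing newline.
  while read -r name serials || [[ -n $name ]]; do
    # Skip blank lines and lines with fewer than two fields.
    [[ -z $serials ]] && continue
    if [[ $name == reference_* ]]; then
      # Remember the reference line for the group lines that follow.
      ref=$name
      ser=$serials
      continue
    fi
    # Pair the i-th letter of the reference with the i-th letter of the group.
    for (( i = 0; i < ${#serials}; i++ )); do
      printf '%s %s %s %s\n' "$ref" "${ser:i:1}" "$name" "${serials:i:1}"
    done
  done
}

# Example call (file name is a placeholder):
# convert_rows_bash < file | column -t
```

Unlike the attempt in the question, this reads each line into two named variables instead of an array, so there is no need to loop over fields or to test the field index.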
