awk: replace a character on a given line and column if it doesn't match a character on the same given column in the first line

Question

I've tried for ages and I haven't even gotten close. Using awk, how do I replace every '*' with '-' on every line after the first, but only in columns where line 1 does not have a '*'?

Example input:
a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|*|*|f|g|*|*|*|*|g|c|a|*|A|*
*|s|*|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|*|g|*|T|*|C|g|c|a|a|A|T

Example output:
a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T

score 4 · Accepted Answer · 2020-03-27 06:03:28Z

The header line (line 1) needs to be scanned for every column that is not "*".
The index of each such column can be stored as a key in an array a[].
On every later line, only the columns whose index exists in a[] can need a change.

That could be implemented as:

awk -F'|' 'BEGIN{OFS=FS}
           NR==1 {
                   for(i=1;i<=NF;i++) if( $i != "*" ) a[i]
                 } 
           NR>1  {
                   for(i in a)        if( $i == "*" ) $i="-"
                 } 
           1
          ' file

a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T

This visits only the columns that can possibly need a change, so it makes the fewest changes needed and should be the fastest.
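To try the approach above without creating a file first, here is a minimal sketch that feeds the question's sample input through a pipe. It assumes a POSIX shell and any POSIX awk; the variable names `input` and `out` are only illustrative and not part of the answer.

```shell
# Sample input copied from the question.
input='a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|*|*|f|g|*|*|*|*|g|c|a|*|A|*
*|s|*|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|*|g|*|T|*|C|g|c|a|a|A|T'

out=$(printf '%s\n' "$input" | awk -F'|' '
  BEGIN { OFS = FS }
  # Line 1: remember the index of every column that is not "*".
  NR == 1 { for (i = 1; i <= NF; i++) if ($i != "*") a[i] }
  # Later lines: only the remembered columns can need a change.
  NR > 1  { for (i in a) if ($i == "*") $i = "-" }
  1
')

printf '%s\n' "$out"
```

The order in which `for (i in a)` visits the indices is unspecified in awk, but that does not matter here because each field is tested and replaced independently of the others.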

edited Mar 27, 2020 at 6:03

answered Mar 27, 2020 at 5:57

user232326

Yup, that's how you do it - nice solution!

– Ed Morton, commented Mar 27, 2020 at 13:22

steeldriver · Accepted Answer · 2020-03-27 01:07:10Z

4

One possible way (may not be the best)

awk -F'|' '
  BEGIN{OFS=FS} 
  NR==1 {
    for(i=1;i<=NF;i++) if($i=="*") a[i]
  } 
  {
    for(i=1;i<=NF;i++) if($i=="*" && !(i in a)) $i="-"
  } 
  1
' file

a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T

answered Mar 27, 2020 at 1:07

steeldriver

