2

I've tried for ages and haven't even gotten close. Using awk, how do I replace every '*' with '-' in every column of every line after the first, but only if the corresponding column in line 1 is not a '*'?

Example input:
a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|*|*|f|g|*|*|*|*|g|c|a|*|A|*
*|s|*|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|*|g|*|T|*|C|g|c|a|a|A|T

Example output:
a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T

2 Answers

4

The header line needs to be scanned to find every column that is not "*".
The indices of those columns can be stored in an array a[].
On all following lines, only the columns recorded in a[] may need to change.

That could be implemented as:

awk -F'|' 'BEGIN{OFS=FS}
           NR==1 {
                   # remember every header column that is not "*"
                   for(i=1;i<=NF;i++) if( $i != "*" ) a[i]
                 }
           NR>1  {
                   # only those columns can need a replacement
                   for(i in a)        if( $i == "*" ) $i="-"
                 }
           1
          ' file

a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T

This touches only the columns that can actually change, so it does the least work per line and should be fast.
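
A side note on the `a[i]` idiom used above: in awk, merely referencing an array element creates it, so the bare expression `a[i]` is enough to record a column index, and `i in a` tests membership without creating anything. A minimal sketch on a made-up three-column header (the names `n`, `k`, and the shell variable `result` are illustrative only and not part of the answer):

```shell
# Record the indices of header columns that are not "*", then inspect the array.
result=$(printf 'x|*|y\n' | awk -F'|' '
  {
    for (i = 1; i <= NF; i++) if ($i != "*") a[i]   # a bare reference creates a[i]
    n = 0
    for (k in a) n++                                 # count the stored indices
    print n, (1 in a), (2 in a), (3 in a)            # "in" yields 1 or 0
  }')
echo "$result"   # prints: 2 1 0 1
```

Columns 1 and 3 are stored and column 2 is not. The order in which `for (k in a)` visits the indices is unspecified, which does not matter here because each column is handled independently.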

1
  • Yup, that's how you do it - nice solution! Commented Mar 27, 2020 at 13:22
4

One possible way (may not be the best):

awk -F'|' '
  BEGIN{OFS=FS} 
  NR==1 {
    for(i=1;i<=NF;i++) if($i=="*") a[i]
  } 
  {
    for(i=1;i<=NF;i++) if($i=="*" && !(i in a)) $i="-"
  } 
  1
' file

a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T
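
To check this kind of command without a separate file, the sample from the question can be piped in and the result compared with the expected output as a string. This is only a sketch: the shell variables `input`, `expected`, and `result` are names introduced here, and the awk program is the one above, lightly reformatted.

```shell
# Sample input and expected output, copied from the question.
input='a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|*|*|f|g|*|*|*|*|g|c|a|*|A|*
*|s|*|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|*|g|*|T|*|C|g|c|a|a|A|T'

expected='a|s|d|f|g|*|A|*|*|g|c|a|*|A|*
a|-|-|f|g|*|-|*|*|g|c|a|*|A|*
-|s|-|f|g|*|a|t|*|g|c|a|*|A|*
a|s|d|-|g|*|T|*|C|g|c|a|a|A|T'

result=$(printf '%s\n' "$input" | awk -F'|' '
  BEGIN { OFS = FS }
  # store the header columns that ARE "*"
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "*") a[i] }
  # replace "*" only in columns that were not "*" in the header
  { for (i = 1; i <= NF; i++) if ($i == "*" && !(i in a)) $i = "-" }
  1
')

if [ "$result" = "$expected" ]; then echo "match"; else echo "mismatch"; fi
```

The second rule has no `NR>1` guard, so it also runs on the header. That is harmless, because every "*" in line 1 sits in a column that is stored in `a`.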