
I used the same awk file to process two different strings, and the results are weird.

cat test.awk
BEGIN{FPAT="([^,])*|(\"[^\"]+\")";OFS=","}{$4="TDP,-1,-1,0,0"OFS$4;print $0}
  1. echo "a,b,b,b,b,b,b,b,b,b,b,b,\"a,b\"" | gawk -f test.awk

    a,b,b,TDP,-1,-1,0,0,b,,b,b,b,b,b,b,b,b,"a,b"

  2. echo "a,b,,\"a,b\""|gawk -f test.awk

    a,b,,TDP,-1,-1,0,0,"a,b"

In the first case there is a ",," (an extra empty field) in the result. I expected the first case to behave like the second one, with no empty field added.

1 Answer

[Not really an answer - but too big for a comment]

I think the behavior you're seeing is related to the first alternative of your FPAT, which can produce a zero-length match: ([^,])* matches zero or more non-comma characters. Exactly how it is related eludes me at this point, though. Consider for example (GNU Awk 4.0.1):

$ echo "a,b,c,d,e,f,g,h,i,j,k,l,\"m,n\"" |
  gawk '
    BEGIN{FPAT="([^,])*|(\"[^\"]+\")";OFS=","}
    {print $0; $4=$4; print $0; print NF}
  '
a,b,c,d,e,f,g,h,i,j,k,l,"m,n"
a,b,c,d,,e,f,g,h,i,j,k,l,"m,n"
14

whereas if we access the value of NF before the reassignment, no extra empty field appears and NF stays at 13:

$ echo "a,b,c,d,e,f,g,h,i,j,k,l,\"m,n\"" | 
  gawk '
    BEGIN{FPAT="([^,])*|(\"[^\"]+\")";OFS=","}
    {print $0; print NF; $4=$4; print $0; print NF}
  '
a,b,c,d,e,f,g,h,i,j,k,l,"m,n"
13
a,b,c,d,e,f,g,h,i,j,k,l,"m,n"
13
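The zero-length match mentioned above can be observed directly with awk's match() function, which sets RSTART (1-based start) and RLENGTH (length of the match). The sketch below uses plain POSIX awk with no FPAT involved; the helper name demo_match and the test string ",b" are only illustrations. At a leading comma, ([^,])* succeeds with an empty match at position 1, whereas [^,]+ has to skip ahead and matches the b at position 2.

```shell
#!/bin/sh
# demo_match (hypothetical helper): show where each candidate field regex
# first matches in the given string, via RSTART and RLENGTH.
demo_match() {
  printf '%s\n' "$1" | awk '{
    match($0, /([^,])*/)    # zero or more non-commas: may match the empty string
    print "star:", RSTART, RLENGTH
    match($0, /[^,]+/)      # one or more non-commas: needs at least one character
    print "plus:", RSTART, RLENGTH
  }'
}

demo_match ',b'
```

For ",b" this prints "star: 1 0" (an empty match at the comma) followed by "plus: 2 1".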

Regardless, the behavior seems to be unambiguous if you change FPAT to only match non-empty sequences:

$ echo "a,b,c,d,e,f,g,h,i,j,k,l,\"m,n\"" | 
  gawk '
    BEGIN{FPAT="([^,]+)|(\"[^\"]+\")";OFS=","}
    {$4="TDP,-1,-1,0,0" OFS $4; print $0}
  '
a,b,c,TDP,-1,-1,0,0,d,e,f,g,h,i,j,k,l,"m,n"
  • Thanks steeldrive. Actually, I can't ignore the empty column. I found a workaround for this by using NF-$number. Commented Jul 23, 2017 at 15:10
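With the [^,]+ pattern, empty columns are treated as separators rather than fields: for a,b,,"a,b" it yields only three fields, so the quoted value becomes $3. If empty columns must be kept, one option is to avoid FPAT and split the line by hand. The sketch below is a swapped-in technique, not the method from the answer: a hypothetical split_csv() awk function walks the line character by character and toggles an in-quotes flag on each double quote, so commas inside "..." do not split and empty columns survive as empty fields. It assumes fields never contain escaped (doubled) quotes or embedded newlines, and it hard-codes the inserted text and column 4 from the question.

```shell
#!/bin/sh
# insert_before_4 (hypothetical name): read CSV lines on stdin and insert
# TDP,-1,-1,0,0 before the 4th column, keeping empty columns intact.
# Assumption: no escaped ("") quotes and no newlines inside quoted fields.
insert_before_4() {
  awk '
    # split_csv: split line into arr[1..n]; commas inside double quotes do not split.
    function split_csv(line, arr,    n, i, c, inq, field) {
      n = 0; inq = 0; field = ""
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"")
          inq = !inq                      # entering or leaving a quoted field
        if (c == "," && !inq) {
          arr[++n] = field                # end of a field (possibly empty)
          field = ""
        } else
          field = field c                 # quotes are kept as part of the field
      }
      arr[++n] = field                    # last field
      return n
    }
    {
      n = split_csv($0, f)
      out = f[1]
      for (i = 2; i <= n; i++) {
        if (i == 4) out = out ",TDP,-1,-1,0,0"
        out = out "," f[i]
      }
      print out
    }'
}

echo 'a,b,b,b,b,b,b,b,b,b,b,b,"a,b"' | insert_before_4
echo 'a,b,,"a,b"' | insert_before_4
```

With the two inputs from the question this prints a,b,b,TDP,-1,-1,0,0,b,b,b,b,b,b,b,b,b,"a,b" and a,b,,TDP,-1,-1,0,0,"a,b", i.e. the first line no longer gets the extra empty field. Lines with fewer than four columns are passed through unchanged.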
