Replace string with a new value

Question

I'm trying to process a huge file (more than 5,000,000 records) so that I can replace the value of, for example, the 8th column:

If $8 = 1, replace it with success
If $8 = 2, replace it with check
If $8 = null, replace it with undefined

Here's a sample of the data, whose fields are separated by a comma (,):

"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"

"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"

The field I want to replace is USER_TYPE, located at $8.

I tried this but it doesn't replace the values:

awk '{if($8 = 1) print $1, $2, $3, $4, $5, $6, $7, "success", $9, $10}' input_file

How can I get this done?

A single = assigns the value on the RHS to the name on the LHS. To test for equality, use ==, so if($8==1).... You also need to tell awk to split fields on the , char, with either awk -F, '{...}' file OR awk 'BEGIN{FS=","}{....}' file. What happens when there are ,s inside the dbl-quoted data? Kaboom! ... So much better to use a <tab> char to separate fields in your data file (or maybe a | char). Good luck.
– shellter, Commented Feb 13, 2017 at 20:43
When you need to deal with CSV, awk is definitely not the right way to go, since it doesn't handle cases where a value contains the delimiter. You should use a tool designed to deal with CSV, for example csvtool.
– Casimir et Hippolyte, Commented Feb 13, 2017 at 21:02
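Putting the fixes from the first comment together gives a minimal corrected version of the attempt in the question. This is a sketch, not a CSV parser: it assumes no field contains an embedded comma, and it assumes "null" means an empty quoted field (""), which is what the asker described in a later comment. Because every field in the sample is wrapped in double quotes, $8 holds the three characters "1" rather than the number 1, so the comparisons include the quotes. The function name fix_user_type is hypothetical.

```shell
# Hypothetical helper; reads the files given as arguments, or stdin.
fix_user_type() {
    awk -F, 'BEGIN { OFS = "," }           # -F, splits on commas; OFS rejoins with commas
    $8 == "\"1\"" { $8 = "\"success\"" }   # == compares; a single = would assign
    $8 == "\"2\"" { $8 = "\"check\"" }     # fields keep their quotes, e.g. "2"
    $8 == "\"\""  { $8 = "\"undefined\"" } # assumed form of a null value
    { print }' "$@"                        # print every line, changed or not
}
```

Usage: fix_user_type input_file > output_file. The header line passes through unchanged because "USER_TYPE" matches none of the tests.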

RavinderSingh13 · Accepted Answer · 2017-02-13 21:18:48Z

@sandatomo: Try this (untested):

awk -F, -vs1="\"" 'NR>1{gsub(/\"/,"",$8);if($8==1){sub(/.*/,s1 "success" s1,$8)};if($8==2){sub(/.*/,s1 "check" s1,$8)};if($8=="null"){sub(/.*/,s1 "undefined" s1,$8)};print}' OFS=, Input_file

EDIT: Adding a multi-line (non-one-liner) form of the solution too.

awk -F, -vs1="\"" 'NR>1{
        gsub(/\"/,"",$8);
        if($8==1){
                sub(/.*/,s1 "success" s1,$8)
        };
        if($8==2){
                sub(/.*/,s1 "check" s1,$8)
        };
        if($8=="null"){
                sub(/.*/,s1 "undefined" s1,$8)
        };
        print
}
' OFS=, Input_file

EDIT2: I tested my previous code and it was missing the "," field separator, so I have edited it now.

EDIT3: Explanation of the above.

awk -F, -vs1="\"" 'NR>1{                  ##### Set the field separator to a comma (,) and create a variable s1 holding a double quote ("). Then check whether the current line number is greater than 1.
                                          ##### If that condition is TRUE, all of the following statements are executed.
        gsub(/\"/,"",$8);                 ##### Remove all double quotes (") from $8.
        if($8==1){                        ##### Check whether the value of field 8 is 1; if so, execute the following statement.
                sub(/.*/,s1 "success" s1,$8)    ##### Replace everything in $8 with s1 "success" s1.
        };
        if($8==2){                        ##### Similarly, check whether the value of $8 is 2.
                sub(/.*/,s1 "check" s1,$8)      ##### If so, replace the value of $8 with s1 "check" s1.
        };
        if($8=="null"){                   ##### Check whether the value of $8 is the string null.
                sub(/.*/,s1 "undefined" s1,$8)  ##### Replace the complete value of $8 with s1 "undefined" s1.
        };
        print                             ##### Print the whole line.
}
' OFS=, Input_file                        ##### Set the output field separator to a comma, then name the Input_file.
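Two details of the script above may surprise: because print sits inside the NR>1 block, the header line is never printed, and because gsub() strips the quotes from $8 before the tests, a value that matches none of them (say "3") comes out unquoted. A sketch of one way to address both, keeping the same strip-then-test idea, is shown here. It assumes, as the asker later clarified, that a null value is the empty quoted field "", so after the quotes are stripped it is tested as an empty string rather than as the word null. The function name replace_user_type is hypothetical.

```shell
# Hypothetical helper; same strip, test, re-quote idea as the answer above.
replace_user_type() {
    awk -F, -v s1='"' 'BEGIN { OFS = "," }
    NR == 1 { print; next }              # pass the header line through
    {
        gsub(/"/, "", $8)                # "1" becomes 1
        if ($8 == "1")      $8 = "success"
        else if ($8 == "2") $8 = "check"
        else if ($8 == "")  $8 = "undefined"
        $8 = s1 $8 s1                    # put the quotes back on every value
        print
    }' "$@"
}
```

Re-quoting every value, matched or not, keeps column 8 consistent with the other quoted columns.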

Hello all (mods), someone has downvoted this solution. I would kindly ask that person to give the reason, so that I can improve my answer or fix any mistakes or problems in it.
@RavinderSingh13 .. I was granted an equally helpful downvote by some anonymous passer-by as well. These things happen from time to time. Blame the trolls.
@ghoti: Thank you for letting me know (I am also fairly new to Stack Overflow). I strongly believe there should be a mandatory section when someone downvotes, because the other person has every right to know what was wrong with the solution, so that we can improve ourselves and the post/thread too.
As a convention, that already exists. But as a policy, it would require some sort of enforcement, and therefore a change to the underlying code that runs SO. I am certain this has been discussed before in Meta, but it wouldn't hurt to bring it up there again, as it's a problem that doesn't go away by itself.

ghoti · Accepted Answer · 2017-02-13 21:01:43Z

Here's a shorter one-liner:

$ awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=a[$8]} 1' input.txt

Broken out for commenting:

BEGIN {
  FS=OFS=","       # set our field separators
  a[1]="success"   # populate an array with replacement values
  a[2]="check"
}

{
  gsub(/"/,"",$8)   # remove quotes in field 8, for easier processing
}

$8 in a {           # check to see if field 8 is a member of our array
  $8=a[$8]          # replace field 8 with the contents of the array at that index
}

1                   # print the line

If it's important to keep the quotes around each field, you can do that by replacing the assignment with a sprintf() that includes them:

  $8=sprintf("\"%s\"",a[$8])
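A sketch of the one-liner with that sprintf() replacement folded in. Two assumptions go beyond the answer: the quotes are stripped from a copy (v, a hypothetical name) rather than from $8 itself, so lines whose value is not in the array are printed unchanged with their quotes intact, and an extra entry a[""] maps the empty field "" (which the asker later said is what "null" looks like) to undefined. The function name map_user_type is also hypothetical.

```shell
# Hypothetical helper wrapping the array-lookup one-liner.
map_user_type() {
    awk 'BEGIN { FS = OFS = ","; a[1] = "success"; a[2] = "check"; a[""] = "undefined" }
    { v = $8; gsub(/"/, "", v) }                 # unquoted copy of field 8
    v in a { $8 = sprintf("\"%s\"", a[v]) }      # replace and re-quote
    1' "$@"                                      # print every line
}
```

Since $8 is only assigned when a replacement happens, every other line, including the header, is printed exactly as read.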

Remember that awk only knows about your field separator, not your quotes. If you have a field that includes a comma inside the quoted field, awk will consider it to be a field separator. You might add protection for this sort of occurrence with something like this at the top of the awk script:

NF != 10 { print "ERROR: wrong number of fields in line " NR > "/dev/stderr"; exit(1) }
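To see the guard in action, here is a sketch that puts it at the top of the one-liner. A line whose quoted field contains a comma splits into 11 fields, so the run stops at that line with exit status 1 rather than carrying on with shifted columns. The message is built by concatenation rather than with a comma, because with OFS set to "," a comma in print would insert a comma into the message. The function name checked_map is hypothetical, and like the original one-liner it leaves field 8 unquoted after replacement.

```shell
# Hypothetical helper: the array-lookup one-liner with the field-count guard first.
checked_map() {
    awk 'BEGIN { FS = OFS = ","; a[1] = "success"; a[2] = "check" }
    NF != 10 { print "ERROR: wrong number of fields in line " NR > "/dev/stderr"; exit 1 }
    { gsub(/"/, "", $8) }                # strip quotes from field 8
    $8 in a { $8 = a[$8] }               # replace known values
    1' "$@"                              # print every line
}
```

Lines before the bad one have already been printed when awk exits, so check the exit status before trusting the output file.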

right on the money, it works perfectly, thank you so much :)
Yay, glad I could help! :) (I do wonder who downvoted without a comment though.)

Lars Fischer · Accepted Answer · 2017-02-18 17:56:10Z


You could try something like this:

awk  'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1' input_file

Explanation

The BEGIN part sets up a replacement mapping in r, e.g. r["\"1\""]= "\"success\""; maps the literal token "1" (a 1 with the quotes!) to the literal value "success" (also including the quotes!).
Additionally, FS and OFS are set in the BEGIN part, so a comma is used as both the input and the output separator.
The part after the definition of r tests whether the value of field $8 is a key in the map; if it is, $8 is replaced by the value that r defines for that key.
The question does not make it 100% clear whether there are unmapped values in column $8, so use this as a starting point for your own experiments.
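Because it isn't clear which form "null" takes (the comments on this answer list "", null, "null", or just the empty string between two commas), one option is to give every candidate its own key in r. This sketch does that under the assumption that all of them should become "undefined"; the function name map_quoted and the helper variable u are hypothetical.

```shell
# Hypothetical helper: quoted-key map with every candidate form of null.
map_quoted() {
    awk 'BEGIN {
        OFS = FS = ","
        r["\"1\""] = "\"success\""
        r["\"2\""] = "\"check\""
        u = "\"undefined\""                      # shared replacement value
        r["\"\""] = u; r["null"] = u; r["\"null\""] = u; r[""] = u
    }
    $8 in r { $8 = r[$8] }                       # unmapped values stay as they are
    1' "$@"
}
```

Keys that never occur in the data cost nothing, so listing extra forms is a cheap way to cover the ambiguity.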

edited Feb 18, 2017 at 17:56

answered Feb 13, 2017 at 20:38

Lars Fischer

karakfa Over a year ago

probably best to qualify replacement with $8 in r otherwise will remove unmatched fields.

sandatomo Over a year ago

@LarsFischer thank you for your help. I don't know why, but it didn't work with the null fields; maybe it's the awk implementation, since I'm using an HP-UX Unix machine. Greetings

ghoti Over a year ago

Lars, if you don't set OFS, your output will have fields separated by space instead of commas. Also, it would be great if you could explain what your solution actually does, so that the OP can learn from your answer more easily.

Lars Fischer Over a year ago

@sandatomo Please specify in the question what you mean by null fields. Best give an example input and output line. null field could mean anything from "", null, "null" or just the empty string between two commas.

sandatomo Over a year ago

@LarsFischer, hi, sorry it took me so long to respond. In this case the null string was like this: "", just double quotes with nothing in between.
