1

I'm trying to process a huge file (more than 5,000,000 records) and replace the value in one column, for example the 8th.

If $8 = 1, replace it with success
If $8 = 2, replace it with check
If $8 = null, replace it with undefined

Here's a piece of the data which is separated by a , character:

"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"

"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"

The field I want to replace is USER_TYPE located at $8

I tried this but it doesn't replace the values:

awk '{if($8 = 1) print $1, $2, $3, $4, $5, $6, $7, "success", $9, $10}' input_file

How can I get this done?

2
  • As below, a single = assigns the value on the right-hand side to the name on the left-hand side. To test for equality, use ==, so if($8==1).... You also need to tell awk to split fields on the , character, with either awk -F, '{...}' file or awk 'BEGIN{FS=","}{....}' file. What happens when there are commas inside the double-quoted data? Kaboom! It is much better to use a <tab> character (or maybe a | character) to separate fields in your data file. Good luck. Commented Feb 13, 2017 at 20:43
  • When you need to deal with CSV, awk is definitely not a good way to go, since it doesn't handle cases where a value contains the delimiter. You should use a tool designed to deal with CSV, for example csvtool. Commented Feb 13, 2017 at 21:02
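
To make the two points from these comments concrete, here is a minimal sketch using the header and data row from the question. It assumes no field contains an embedded comma. Because every field is double-quoted, $8 holds the three characters "1" (quotes included), so the test compares against the quoted string rather than the number 1. The variable names sample and result are only for illustration.

```shell
# Header and data row copied from the question.
sample='"APPLICATION_ID","ORIGIN_ID","SERVICE_ID","PROVIDER_ID","RATING_ID","ATO","DATE","USER_TYPE","ESTATUS","OPERATION_ID"
"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"'

# -F, splits input on commas and OFS=, joins output with commas.
# == tests equality (a single = would assign to $8 instead).
# The fields still carry their quotes, so compare against "1" with quotes.
result=$(printf '%s\n' "$sample" |
    awk -F, -v OFS=, '$8 == "\"1\"" { $8 = "\"success\"" } { print }')

printf '%s\n' "$result"
```

The header passes through unchanged because its 8th field is "USER_TYPE", which does not match the test.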

3 Answers 3

1

@sandatomo: Try (untested):

awk -F, -vs1="\"" 'NR>1{gsub(/\"/,"",$8);if($8==1){sub(/.*/,s1 "success" s1,$8)};if($8==2){sub(/.*/,s1 "check" s1,$8)};if($8=="null"){sub(/.*/,s1 "undefined" s1,$8)};print}' OFS=, Input_file

EDIT: Adding a multi-line form of the solution too.

awk -F, -vs1="\"" 'NR>1{
    gsub(/\"/,"",$8);
    if($8==1){
        sub(/.*/,s1 "success" s1,$8)
    };
    if($8==2){
        sub(/.*/,s1 "check" s1,$8)
    };
    if($8=="null"){
        sub(/.*/,s1 "undefined" s1,$8)
    };
    print
}' OFS=, Input_file

EDIT2: I tested my previous code and found that it did not set the field separator to ",", so I have edited it.

EDIT3: Explanation of above.

awk -F, -vs1="\"" 'NR>1{                   ##### Set the field separator to a comma. Create a variable s1 whose value is a double quote ("). Then check whether the current line number is greater than 1.
                                           ##### If that condition is TRUE, all of the following statements are executed.
    gsub(/\"/,"",$8);                      ##### Remove all double quotes (") from $8.
    if($8==1){                             ##### Check whether the 8th field is 1; if so, execute the following statement.
        sub(/.*/,s1 "success" s1,$8)       ##### Replace everything in $8 with s1 "success" s1.
    };
    if($8==2){                             ##### Likewise, check whether the value of $8 is 2.
        sub(/.*/,s1 "check" s1,$8)         ##### If so, replace the value of $8 with s1 "check" s1.
    };
    if($8=="null"){                        ##### Check whether the value of $8 is "null".
        sub(/.*/,s1 "undefined" s1,$8)     ##### Replace the complete value of $8 with s1 "undefined" s1.
    };
    print                                  ##### Print the whole line.
}
' OFS=, Input_file                         ##### Set the output field separator to a comma, then name the Input_file.
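
Note that, as written, the script above skips the header line (NR>1 guards the print as well), leaves field 8 unquoted when none of the three tests match, and looks for the literal text null. Here is a possible variant, offered as a sketch rather than a tested drop-in. It assumes the "null" value is an empty quoted field (""), which is how the asker later describes it in a comment on another answer, and that the header should be kept. The sample rows are made up for illustration.

```shell
# Made-up rows with ten quoted fields; field 8 is 1, 2, empty, and 3.
sample='"c1","c2","c3","c4","c5","c6","c7","USER_TYPE","c9","c10"
"a","b","c","d","e","f","g","1","i","j"
"a","b","c","d","e","f","g","2","i","j"
"a","b","c","d","e","f","g","","i","j"
"a","b","c","d","e","f","g","3","i","j"'

out=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = OFS = ","; q = "\"" }
NR > 1 {
    v = $8
    gsub(/"/, "", v)                 # strip quotes from a copy, not from $8
    if (v == "1")      $8 = q "success" q
    else if (v == "2") $8 = q "check" q
    else if (v == "")  $8 = q "undefined" q
}
{ print }                            # header and unmatched rows pass through
')

printf '%s\n' "$out"
```

Working on a copy of the field means a value such as "3" keeps its original quotes.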

5 Comments

Hi @RavinderSingh13 I'll give it a try, thank you :)
Hello all (mods), someone has downvoted this solution. I would ask that person to kindly provide the reason, so that I can improve my answer or edit it if it has any mistakes or problems.
@RavinderSingh13 .. I was granted an equally helpful downvote by some anonymous passer-by as well. These things happen from time to time. Blame the trolls.
@ghoti: Thank you for letting me know (I am also fairly new to Stack Overflow). I strongly believe a reason should be mandatory when someone downvotes: the person who posted has every right to know what was wrong with the solution, so that we can improve ourselves and the post/thread too.
As a convention, that already exists. But as a policy, it would require some sort of enforcement, and therefore a change to the underlying code that runs SO. I am certain this has been discussed before in Meta, but it wouldn't hurt to bring it up there again, as it's a problem that doesn't go away by itself.
0

Here's a shorter one-liner:

$ awk 'BEGIN{FS=OFS=",";a[1]="success";a[2]="check"} {gsub(/"/,"",$8)} $8 in a{$8=a[$8]} 1' input.txt

Broken out for commenting:

BEGIN {
  FS=OFS=","       # set our field separators
  a[1]="success"   # populate an array with replacement values
  a[2]="check"
}

{
  gsub(/"/,"",$8)   # remove quotes in field 8, for easier processing
}

$8 in a {           # check to see if field 8 is a member of our array
  $8=a[$8]          # replace field 8 with the contents of the array at that index
}

1                   # print the line

If it's important to keep the quotes around each field, you can do that by replacing the assignment with a sprintf() that includes them:

  $8=sprintf("\"%s\"",a[$8])
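
Put together, a quote-preserving version of the array approach might look like the sketch below. Two things here are assumptions that go beyond the original one-liner: the a[""] entry, which maps an empty quoted field ("") to undefined, and the choice to re-quote field 8 on every row, so that the header and unmapped values keep their quotes too. The sample rows are made up.

```shell
# Made-up rows with ten quoted fields; field 8 is 1, 2, empty, and 3.
sample='"c1","c2","c3","c4","c5","c6","c7","USER_TYPE","c9","c10"
"a","b","c","d","e","f","g","1","i","j"
"a","b","c","d","e","f","g","2","i","j"
"a","b","c","d","e","f","g","","i","j"
"a","b","c","d","e","f","g","3","i","j"'

quoted=$(printf '%s\n' "$sample" | awk '
BEGIN {
    FS = OFS = ","
    a["1"] = "success"
    a["2"] = "check"
    a[""]  = "undefined"     # assumption: "null" means an empty quoted field
}
{
    gsub(/"/, "", $8)                                # strip the quotes from field 8
    $8 = sprintf("\"%s\"", ($8 in a) ? a[$8] : $8)   # map if known, then re-quote
}
1
')

printf '%s\n' "$quoted"
```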

Remember that awk only knows about your field separator, not your quotes. If you have a field that includes a comma inside the quoted field, awk will consider it to be a field separator. You might add protection for this sort of occurrence with something like this at the top of the awk script:

NF != 10 { print "ERROR: wrong number of fields in line",NR > "/dev/stderr"; exit(1) }
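
To see what that guard reacts to, here is a small sketch. The first record is the sample row from the question; the second is a made-up copy with a comma added inside the quoted date field.

```shell
good='"3","2","424","5020","1058","3017292917","30/11/2016 01:14:25 a.m.","1","2004","14804862360104011458"'
# Same record, but with a comma inside the quoted date field (made up).
bad='"3","2","424","5020","1058","3017292917","30/11/2016, 01:14:25 a.m.","1","2004","14804862360104011458"'

# awk splits on every comma, quoted or not, so the second record gains a field.
counts=$(printf '%s\n' "$good" "$bad" | awk -F, '{ print NR ": " NF " fields" }')

printf '%s\n' "$counts"
```

With the extra field, USER_TYPE shifts to $9 on the second record, which is why a field-count check is a cheap way to catch such lines before they are silently mangled.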

2 Comments

right on the money, it works perfectly, thank you so much :)
Yay, glad I could help! :) (I do wonder who downvoted without a comment though.)
0

You could try something like this:

awk  'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1' input_file

Explanation

  • the BEGIN part sets up a replacement mapping in r. e.g. r["\"1\""]= "\"success\""; maps the literal token "1" (a 1 with the quotes!) to the literal value "success" (also including the quotes!)
  • additionally FS and OFS are set to use comma as input and output separator in the BEGIN part
  • the part after the definition of r consists of a test if the value of the field $8 is a key in the map, if yes, then the field $8 is replaced by the value defined in the map r for this key
  • the question is not 100% clear if there are unmapped values in column $8, so use this as a starting point for your own experiments
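
As a starting point for such experiments, the command can be run against a few made-up rows to see which values are mapped and which pass through. This is only a sketch: the rows below are invented, and it assumes a POSIX-style awk; it cannot confirm how other implementations behave.

```shell
# Made-up rows with ten quoted fields; field 8 is 1, 2, empty (""), and 3.
sample='"c1","c2","c3","c4","c5","c6","c7","USER_TYPE","c9","c10"
"a","b","c","d","e","f","g","1","i","j"
"a","b","c","d","e","f","g","2","i","j"
"a","b","c","d","e","f","g","","i","j"
"a","b","c","d","e","f","g","3","i","j"'

# The command from the answer, reading the sample from standard input.
mapped=$(printf '%s\n' "$sample" |
    awk 'BEGIN {OFS=FS=",";r["\"\""] = "\"undefined\""; r["\"1\""]= "\"success\""; r["\"2\""]="\"check\""} {if($8 in r ) $8 = r[$8]} 1')

printf '%s\n' "$mapped"
```

Values without an entry in r, such as "3" and the header's "USER_TYPE", are left exactly as they were.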

5 Comments

probably best to qualify replacement with $8 in r otherwise will remove unmatched fields.
@LarsFischer thank you for your help. I don't know why, but it didn't work with the null fields; maybe it's because of the awk implementation. I'm using an HP-UX Unix machine. Greetings.
Lars, if you don't set OFS, your output will have fields separated by space instead of commas. Also, it would be great if you could explain what your solution actually does, so that the OP can learn from your answer more easily.
@sandatomo Please specify in the question what you mean by null fields. Best give an example input and output line. null field could mean anything from "", null, "null" or just the empty string between two commas.
@LarsFischer, hi, sorry it took me so long to respond. In this case the null string was like this: "", just double quotes with nothing in between.
