The following awk command is meant to remove spaces and unnecessary quotes from a CSV file, but only around single words between separators:

awk 'gsub(/("[ ]+|[ ]+")/,"\""){$0=gensub(/"([[:alnum:]]+)"/,"\\1","g")}1' file.csv

Example (before):

1,"1.0348    54 35.5",""45356",""4"""""35,"578 "

Example (after):

1,"1.0348    54 35.5","45356,"4""""35,578

The problem is that this awk command can’t handle fields that contain non-alphanumeric characters.

Example:

1,"  jde7@&^%  "," &^!@  ",)(*&^," (*^%%^&*( "

My target is to work with all kinds of characters, non-alphanumeric and alphanumeric,

such as

(  A-Z , !@#$%^&&**( , 1-9 , etc )

I guess I need to replace the [[:alnum:]] with some other syntax.

What do I need to change in my awk syntax in order to support all kinds of characters?

1 Answer

The [[:alnum:]] character class matches alphabetic and numeric characters; you can use [^[:alnum:]] for non-alphanumeric characters. So for your goal:

my target is to work with all kind of characters non alpha numeric and alpha numeric

you can use the expression [[:alnum:]]|[^[:alnum:]], which matches any single character.

The awk command will then be something like this:

awk 'gsub(/("[ ]+|[ ]+")/,"\""){$0=gensub(/"(([[:alnum:]]|[^[:alnum:]])+)"/,"\\1","g")}1' file.csv
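
One caveat with the alternation: because it matches any character, quotes and commas included, the greedy + can run from the first quote on a line to the last one, so only the outermost pair of quotes would be removed. A minimal sketch of an alternative follows, assuming that a "single word" means a quoted run containing no double quote, comma, or space. It uses match() and substr() instead of gensub so that it should also run in awk implementations other than gawk; the names clean_csv and unquote_words are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# Sketch only: clean_csv and unquote_words are hypothetical names, and
# "single word" is assumed to mean: no double quote, comma, or space inside.
clean_csv() {
  awk '
    # Strip the quotes around every quoted run that has no quote, comma,
    # or space in it, scanning left to right without rescanning output.
    function unquote_words(s,   out) {
      out = ""
      while (match(s, /"[^", ]+"/)) {
        out = out substr(s, 1, RSTART - 1) substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
      }
      return out s
    }
    {
      gsub(/" +| +"/, "\"")   # drop spaces directly after or before a double quote
      print unquote_words($0)
    }
  ' "$@"
}

# Usage with the sample line from the question:
printf '%s\n' '1,"  jde7@&^%  "," &^!@  ",)(*&^," (*^%%^&*( "' | clean_csv
```

On the sample line this should print 1,jde7@&^%,&^!@,)(*&^,(*^%%^&*( and on the first "before" example it reproduces the "after" line shown in the question. With gawk, the same idea fits in the one-liner by changing the gensub pattern to /"([^", ]+)"/.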
