If you can have at most 1 quoted field per line, then you could do the following using any awk:
$ awk '
    match($0,/".*"/) {                      # find the string from the first to the last `"` on this line
        fld = substr($0,RSTART+1,RLENGTH-2) # save the text between those quotes in the variable `fld`
        gsub(/"/,"",fld)                    # remove all `"`s from it
        $0 = substr($0,1,RSTART) fld substr($0,RSTART+RLENGTH-1) # piece `$0` back together, replacing the original text with the modified `fld`
    }
    { print }                               # print $0
' file
"Hi there, we are from XYZ team, we have an Opportunity at our organization"
invoice number,invoice date,vendor number,vendor site ID,supplier site CODE,invoice description,invoice currency code,invoice total amount,line number,line amount,line description,account code,business unit,business center,department,issue code,project,task number
1686,2024-03-28,258,9845,NEWYORK,CA Project: Content,USD,538,1,26,,232130,,,,,2915,"Review new applications, and instruct the same.The deposits. Review correspondence applications. Review and applications. Research Material Included and artwork , and email. Communications with team website. Call, and communications.",230,,,,,295,10
Or this, with any sed that interprets \n to mean newline (otherwise use \<literal newline> instead):
$ sed 's/"\(.*\)"/\n\1\n/; s/"//g; s/\n/"/g' file
"Hi there, we are from XYZ team, we have an Opportunity at our organization"
invoice number,invoice date,vendor number,vendor site ID,supplier site CODE,invoice description,invoice currency code,invoice total amount,line number,line amount,line description,account code,business unit,business center,department,issue code,project,task number
1686,2024-03-28,258,9845,NEWYORK,CA Project: Content,USD,538,1,26,,232130,,,,,2915,"Review new applications, and instruct the same.The deposits. Review correspondence applications. Review and applications. Research Material Included and artwork , and email. Communications with team website. Call, and communications.",230,,,,,295,10
If you can have more than 1 quoted field per line, then it's impossible to do this job robustly with any tool without additional information on how to tell quotes within fields from quotes around fields.
The above were run on this input file constructed from the sample lines in the question:
$ cat file
"Hi there, we are from XYZ team, we have an "Opportunity" at our organization"
invoice number,invoice date,vendor number,vendor site ID,supplier site CODE,invoice description,invoice currency code,invoice total amount,line number,line amount,line description,account code,business unit,business center,department,issue code,project,task number
1686,2024-03-28,258,9845,NEWYORK,CA Project: Content,USD,538,1,26,,232130,,,,,2915,"Review new applications, and instruct the same.The deposits. Review correspondence applications. Review and applications. Research "Material Included" and artwork , and email. Communications with team website. Call, and communications.",230,,,,,295,10
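The sketch below makes the 1-quoted-field-per-line limitation concrete. It wraps the awk and sed commands above in shell functions; strip_inner_quotes_awk and strip_inner_quotes_sed are names made up for this example. It then runs them on two inputs: a made-up line with 1 quoted field, and "foo","bar", which was meant as 2 quoted fields. It assumes a sed, such as GNU sed, that interprets \n as newline.

```shell
#!/bin/sh
# Hypothetical helper names. Each reads CSV lines on stdin and removes every `"`
# between the first and last `"` on a line, assuming at most 1 quoted field per line.
strip_inner_quotes_awk() {
    awk '
    match($0,/".*"/) {
        fld = substr($0,RSTART+1,RLENGTH-2)
        gsub(/"/,"",fld)
        $0 = substr($0,1,RSTART) fld substr($0,RSTART+RLENGTH-1)
    }
    { print }
    '
}

strip_inner_quotes_sed() {
    # Assumes a sed (e.g. GNU sed) where \n in the replacement means newline.
    sed 's/"\(.*\)"/\n\1\n/; s/"//g; s/\n/"/g'
}

# 1 quoted field containing inner quotes: both print
# 1,"Research Material Included and artwork, email",2
good='1,"Research "Material Included" and artwork, email",2'
printf '%s\n' "$good" | strip_inner_quotes_awk
printf '%s\n' "$good" | strip_inner_quotes_sed

# Intended as 2 quoted fields, but the greedy ".*" spans from the first `"` to
# the last, so the quotes that delimited the fields are deleted: prints "foo,bar"
bad='"foo","bar"'
printf '%s\n' "$bad" | strip_inner_quotes_awk
```

Nothing in the text itself can tell the script that the second input was meant as 2 fields. That missing information is exactly what the multi-quoted-field case would need.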
EDIT: if you really want the quoted string to appear as a single field in $0, then here's one way you could make that happen, again assuming you only have 1 quoted string per record:
$ cat file
1686,2024-03-28,258,9845,NEWYORK,CA Project: Content,USD,538,1,26,,232130,,,,,2915,"Review new applications, and instruct the same.The deposits. Review correspondence applications. Review and applications. Research Material Included and artwork , and email. Communications with team website. Call, and communications.",230,,,,,295,10
$ cat tst.awk
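BEGIN { FS=OFS="," }
match($0,/".*"/) {                                              # find the quoted string (first to last `"`)
    fld = substr($0,RSTART,RLENGTH)                             # save it, quotes included
    $0 = substr($0,1,RSTART-1) "\"" substr($0,RSTART+RLENGTH)   # replace it with a lone `"` placeholder; this re-splits $0
    for ( i=1; i<=NF; i++ ) {
        if ( $i == "\"" ) {                                     # the placeholder is now a field of its own
            $i = fld                                            # put the original string back; $0 is rebuilt but NOT re-split
        }
    }
}
{
    for ( i=1; i<=NF; i++ ) {                                   # print each field number and its value
        print i "\t" $i
    }
}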
$ awk -f tst.awk file
1 1686
2 2024-03-28
3 258
4 9845
5 NEWYORK
6 CA Project: Content
7 USD
8 538
9 1
10 26
11
12 232130
13
14
15
16
17 2915
18 "Review new applications, and instruct the same.The deposits. Review correspondence applications. Review and applications. Research Material Included and artwork , and email. Communications with team website. Call, and communications."
19 230
20
21
22
23
24 295
25 10
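Modifying individual fields after the above is fine, because assigning to a field rebuilds $0 without re-splitting it. The sketch below reuses the tst.awk logic on a made-up one-line sample. edit_first_field is a hypothetical name, and upper-casing $1 is just an arbitrary field edit.

```shell
#!/bin/sh
# Sketch: protect the quoted field as in tst.awk, then modify another field.
# Assigning to $1 rebuilds $0 with OFS but does NOT re-split it, so NF and the
# protected quoted field survive.
edit_first_field() {   # hypothetical helper name
    awk '
    BEGIN { FS=OFS="," }
    match($0,/".*"/) {
        fld = substr($0,RSTART,RLENGTH)
        $0 = substr($0,1,RSTART-1) "\"" substr($0,RSTART+RLENGTH)
        for ( i=1; i<=NF; i++ ) {
            if ( $i == "\"" ) {
                $i = fld
            }
        }
    }
    {
        $1 = toupper($1)   # field assignment: $0 is rebuilt, not re-split
        print NF "\t" $2 "\t" $0
    }
    '
}

# Prints: 3<tab>"a, b, c"<tab>X,"a, b, c",y
printf '%s\n' 'x,"a, b, c",y' | edit_first_field
```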
Note that you cannot modify $0 as a whole after doing the above, or awk will re-split $0 on commas again:
$ cat tst.awk
BEGIN { FS=OFS="," }
match($0,/".*"/) {
    fld = substr($0,RSTART,RLENGTH)
    $0 = substr($0,1,RSTART-1) "\"" substr($0,RSTART+RLENGTH)
    for ( i=1; i<=NF; i++ ) {
        if ( $i == "\"" ) {
            $i = fld
        }
    }
}
{
    $0 = $0 # even this will cause awk to re-split `$0` at `,`s
    for ( i=1; i<=NF; i++ ) {
        print i "\t" $i
    }
}
$ awk -f tst.awk file
1 1686
2 2024-03-28
3 258
4 9845
5 NEWYORK
6 CA Project: Content
7 USD
8 538
9 1
10 26
11
12 232130
13
14
15
16
17 2915
18 "Review new applications
19 and instruct the same.The deposits. Review correspondence applications. Review and applications. Research Material Included and artwork
20 and email. Communications with team website. Call
21 and communications."
22 230
23
24
25
26
27 295
28 10
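If you also need a whole-record change, one way to stop it from breaking the protected field is to make it before the match() block, while re-splitting is still harmless. In this sketch the whole-record change is an assumed need to strip a trailing DOS carriage return. The sample line is made up, and clean_then_protect is a hypothetical name.

```shell
#!/bin/sh
# Sketch: do any whole-record edit BEFORE protecting the quoted field.
# sub() on $0 re-splits the record, which is harmless at that point because
# nothing has been protected yet.
clean_then_protect() {   # hypothetical helper name
    awk '
    BEGIN { FS=OFS="," }
    { sub(/\r$/,"") }   # whole-record edit first (assumed: strip a trailing CR)
    match($0,/".*"/) {
        fld = substr($0,RSTART,RLENGTH)
        $0 = substr($0,1,RSTART-1) "\"" substr($0,RSTART+RLENGTH)
        for ( i=1; i<=NF; i++ ) {
            if ( $i == "\"" ) {
                $i = fld
            }
        }
    }
    { print NF "\t" $2 }
    '
}

# Prints: 3<tab>"a, b, c"
printf 'x,"a, b, c",y\r\n' | clean_then_protect
```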
It's important when using awk to understand the difference between these 2 ways of modifying the current record:
- Modifying any field, e.g. $1, causes awk to reconstruct $0 by joining the fields with OFS (so all FSs are replaced with OFSs), but it does NOT re-split the record.
- Modifying the record as a whole, i.e. $0, causes awk to re-split the record into fields separated by FS, but it does NOT reconstruct $0 replacing FSs with OFSs.
If you understand that, then you'll understand this output, especially NF being 1 in the 4th case:
$ echo 'a,b,c' | awk -F',' -v OFS='@' '{$1=$1; print NF "\t" $0}'
3 a@b@c
$ echo 'a,b,c' | awk -F',' -v OFS='@' '{$0=$0; print NF "\t" $0}'
3 a,b,c
$ echo 'a,b,c' | awk -F',' -v OFS='@' '{$0=$0; $1=$1; print NF "\t" $0}'
3 a@b@c
$ echo 'a,b,c' | awk -F',' -v OFS='@' '{$1=$1; $0=$0; print NF "\t" $0}'
1 a@b@c
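The same rules apply to sub() and gsub(), because a successful substitution assigns to its target. With a field as the target, it behaves like the $1=$1 case. With $0 (the default target), it behaves like $0=$0. Here is a quick sketch on the same input:

```shell
#!/bin/sh
# sub() on a field: $0 is rebuilt with OFS, not re-split -> prints 3<tab>A@b@c
echo 'a,b,c' | awk -F',' -v OFS='@' '{sub(/a/,"A",$1); print NF "\t" $0}'

# sub() on $0 (the default target): re-split, not rebuilt -> prints 3<tab>A,b,c
echo 'a,b,c' | awk -F',' -v OFS='@' '{sub(/a/,"A"); print NF "\t" $0}'
```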
"foo","bar"is a single field that contains 2 quotes and a comma or 2 fields separated by a comma.