Using any awk in any shell on every Unix box:
$ echo "some comment char '\;' embedded in strings ; along with inline comments" |
awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}'
some comment char ';' embedded in strings
and borrowing @Stéphane's sample input file:
$ cat file
foo\;bar;baz
foo\\;bar;baz
$ awk -F';' '{gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); gsub(/\\\\/,";",$1); gsub(RS,"\\",$1); print $1}' file
foo;bar
foo\
and extending that to include a line with more fields:
$ cat file
foo\;bar;baz
foo\\;bar;baz
foo\\;bar\;this\;that\\;baz;here\;and\;there
we can print any or all of the fields as we like (here also outputting the original line first and the field number at the start of each output line that contains a single field):
$ awk -F';' '{print; gsub(/\\\\/,RS); gsub(/\\;/,"\\\\"); for (i=1; i<=NF; i++) { gsub(/\\\\/,";",$i); gsub(RS,"\\",$i); print " " i, $i }; print "---" }' file
foo\;bar;baz
 1 foo;bar
 2 baz
---
foo\\;bar;baz
 1 foo\
 2 bar
 3 baz
---
foo\\;bar\;this\;that\\;baz;here\;and\;there
 1 foo\
 2 bar;this;that\
 3 baz
 4 here;and;there
---
The above:
- converts every \\ in the current input line ($0) into a newline (the default value of RS), which is a string that cannot exist within a newline-separated record, so we can handle \\; in the input as an escaped backslash rather than an escaped semi-colon, then
- converts every \; in $0 into \\, which is also now a string that cannot exist in $0 since we just converted them all to RSs, to get rid of the troublesome ; in it, then
- the act of modifying $0 causes awk to resplit $0 into fields at every remaining ; which puts our desired target string in $1, then
- we convert every \\ (created at step 2 above) in $1 to ;, then
- convert every RS (created at step 1 above) in $1 to \, the unescaped form of the original \\, then
- we print that field, $1
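The steps above can be laid out as a commented script. This is only a sketch: the function name first_field and the variables probe and two_bs are made up for illustration. It also adds one thing that is not in the one-liner: awks differ in what "\\\\" produces as a gsub() replacement (GNU awk documents that in --posix mode it yields one backslash rather than two), so the BEGIN block probes for that and picks the spelling that yields exactly two.

```shell
# Hypothetical helper (the name first_field is ours): print the first
# ;-separated field of each line, reading \; as ; and \\ as \.
first_field() {
    awk -F';' '
    BEGIN {
        # Probe (an addition, not in the one-liner): does "\\\\" used as a
        # replacement yield two backslashes or one? Pick the form that yields two.
        probe = "x"; sub(/x/, "\\\\", probe)
        two_bs = (length(probe) == 2) ? "\\\\" : "\\\\\\\\"
    }
    {
        gsub(/\\\\/, RS)         # 1. hide every \\ as a newline (RS)
        gsub(/\\;/, two_bs)      # 2. turn every \; into \\ so no escaped ; is left;
                                 #    changing $0 re-splits it at the remaining ;
        gsub(/\\\\/, ";", $1)    # 3. in $1, turn each \\ from step 2 into ;
        gsub(RS, "\\", $1)       # 4. in $1, turn each RS from step 1 into \
        print $1
    }'
}
```

It reads its input on stdin, e.g. `first_field < file`.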
That approach will work for every RS that is a literal string, as defined by POSIX. If your RS is a regexp, as supported by some awks such as GNU awk, then come up with a string without regexp metachars that matches that regexp and use it as the replacement instead of RS.
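As a sketch of the regexp case, assume an awk that accepts a regexp RS (e.g. GNU awk) and input with CRLF or LF line endings, so RS is \r?\n. A plain newline matches that regexp and contains no metachars, so it can serve as the placeholder. The names first_field_crlf, ph, probe and two_bs are made up for illustration, and the BEGIN probe is an addition that only guards against awks that collapse "\\\\" in a replacement to a single backslash.

```shell
# Hypothetical helper (the name first_field_crlf is ours). Assumes an awk
# that accepts a regexp RS, e.g. GNU awk.
first_field_crlf() {
    awk -F';' -v RS='\r?\n' -v ph='\n' '
    BEGIN {
        # Probe (an addition): pick the replacement spelling that yields exactly \\
        probe = "x"; sub(/x/, "\\\\", probe)
        two_bs = (length(probe) == 2) ? "\\\\" : "\\\\\\\\"
    }
    {
        gsub(/\\\\/, ph)         # hide every \\ as ph, a literal string that matches RS
        gsub(/\\;/, two_bs)      # turn every \; into \\ ; $0 is re-split at the ; left
        gsub(/\\\\/, ";", $1)    # in $1, turn each \\ into ;
        gsub(ph, "\\", $1)       # in $1, turn each ph (not RS) into \
        print $1
    }'
}
```

The only changes from the newline-RS version are the -v RS=... setting and the use of ph wherever RS was used as the placeholder.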
cut as opposed to other more versatile tools like sed, awk, perl, python, etc.
Tools like tr or at most grep are fine.
cut simply isn't that fancy. If you tell it that ; is your delimiter, then every ; counts; there is no escaping.
awk would've been more readable but you can use grep's PCRE as follows: .... | grep -oP '.*(?=(;.*?))' and get the result you want
grep -Po '.*?(?=(?<!\\);)' although I think perhaps plain perl perl -F'(?<!\\);' -lne 'print $F[0]' is clearer
\\;? Can you be sure there will be no quoted backslash? If not then you have to actually parse the string which would be ugly.