Revisions to How do I use grep, awk, or sed to get a substring of a line up until a string literal?

added 617 characters in body

edited Jul 20, 2023 at 20:16

Using any awk:

$ awk 'n=index($0 RS,", characters I don\047t want" RS){$0=substr($0,1,n-1)} 1' file
ABC 123
DEF
GHI, these characters are ok

That's doing a literal string comparison, so it'd work even if the string you're trying to match contained regexp metachars, for example using this input:

$ cat file2
ABC 123
DEF, .*, .*
GHI, .* ok

We get the expected output:

$ awk 'n=index($0 RS,", .*" RS){$0=substr($0,1,n-1)} 1' file2
ABC 123
DEF, .*
GHI, .* ok
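To make it clearer why appending RS to both arguments pins the literal match to the end of the line, here is the same logic spelled out in a longer form. This is only a sketch under a few assumptions: the temp file, the `str` environment variable, and the `out` variable are names invented for illustration. The string is passed through ENVIRON rather than -v because -v would have awk expand backslash escapes in it.

```shell
# Recreate the sample input shown above in a temp file (hypothetical path).
tmp=$(mktemp)
printf '%s\n' 'ABC 123' 'DEF, .*, .*' 'GHI, .* ok' > "$tmp"

# Pass the literal string via the environment so awk does not
# expand escape sequences in it, as it would with -v.
out=$(str=', .*' awk '
    {
        # Appending RS (a newline by default) to both the record and the
        # target means index() can only succeed when the target ends the line.
        n = index($0 RS, ENVIRON["str"] RS)
        if (n) $0 = substr($0, 1, n - 1)
        print
    }
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Changing the value of `str` is then enough to strip a different literal suffix, with no escaping needed.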

If you didn't care about regexp metachars you could just do:

$ awk '{sub(/, characters I don\047t want$/,"")} 1' file
ABC 123
DEF
GHI, these characters are ok

but then you'd get unexpected output from:

$ awk '{sub(/, .*$/,"")} 1' file2
ABC 123
DEF
GHI

and you'd have to escape the metachars to make them literal to get the expected output:

$ awk '{sub(/, \.\*$/,"")} 1' file2
ABC 123
DEF, .*
GHI, .* ok

which is getting kludgy given that all you really wanted was a literal string comparison.
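Another way to keep the comparison literal, without relying on RS, is to compare the tail of each line against the target using substr(). The following is a minimal sketch and not part of the approach above; `strip_suffix`, `str`, and `out2` are made-up names for illustration.

```shell
out2=$(printf '%s\n' 'ABC 123' 'DEF, .*, .*' 'GHI, .* ok' |
    str=', .*' awk '
    # Hypothetical helper: return line with a trailing literal suffix removed.
    function strip_suffix(line, suffix,    start) {
        start = length(line) - length(suffix) + 1
        # Compare the tail of the line to the suffix as plain strings.
        if (start > 0 && substr(line, start) == suffix)
            return substr(line, 1, start - 1)
        return line
    }
    { print strip_suffix($0, ENVIRON["str"]) }
')
printf '%s\n' "$out2"
```

This is longer than the index() one-liner, but no regexp is involved at any point, so nothing in the target string needs escaping.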

See http://awk.freeshell.org/PrintASingleQuote for why I'm using \047 instead of '.

As for why to use awk instead of python - awk is a mandatory POSIX tool and so is guaranteed to exist on all POSIX-compliant Unix installations while python is not, and it usually takes much less code to manipulate text with awk than it does with python. I suspect we will have to agree to disagree on which is more easily readable and maintainable.

added 17 characters in body

edited Jul 20, 2023 at 20:06

Ed Morton

Using any awk:

$ awk 'n=index($0 RS,", characters I don\047t want" RS){$0=substr($0,1,n-1)} 1' file
ABC 123
DEF
GHI, these characters are ok

That's doing a literal string comparison so it'd work even if the string you're trying to match with contained regexp metachars.

If you didn't care about regexp metachars you could just do:

$ awk '{sub(/, characters I don\047t want$/,"")} 1' file
ABC 123
DEF
GHI, these characters are ok

See http://awk.freeshell.org/PrintASingleQuote for why I'm using \047 instead of '.

As for why to use awk instead of python - awk is a mandatory POSIX tool and so is guaranteed to exist on all POSIX-compliant Unix installations while python is not, and it usually takes much less code to manipulate text with awk than it does with python. I suspect we will have to agree to disagree on which is more easily readable and maintainable.

added 336 characters in body

edited Jul 20, 2023 at 20:01

Ed Morton

Using any awk:

$ awk 'n=index($0 RS,", characters I don\047t want" RS){$0=substr($0,1,n-1)} 1' file
ABC 123
DEF
GHI, these characters are ok

That's doing a literal string comparison so it'd work even if the string you're trying to match with contained regexp metachars.

If you didn't care about regexp metachars you could just do:

$ awk '{sub(/, characters I don\047t want$/,"")} 1' file
ABC 123
DEF
GHI, these characters are ok

See http://awk.freeshell.org/PrintASingleQuote for why I'm using \047 instead of '.

As for why to use awk instead of python - awk is a mandatory POSIX tool and so is guaranteed to exist on all POSIX-compliant Unix installations while python is not, and it usually takes much less code to manipulate text with awk than it does with python. I suspect we will have to agree to disagree on which is more easily readable.

Post Undeleted by Ed Morton

occurred Jul 20, 2023 at 19:56

Post Deleted by Ed Morton

occurred Jul 20, 2023 at 19:53

answered Jul 20, 2023 at 19:53

Ed Morton
