Revisions to How do I use grep, awk, or sed to get a substring of a line up until a string literal?

added 1201 characters in body


edited Jul 21, 2023 at 7:22


Most obvious here would be to use sed:

<source sed "s/, characters I don't want\$//"

That substitutes that string with nothing when found at the end of the line ($, which we escape as \$ for the shell, to be future-proof in case $/ means something in the shell in the future).

To also remove whatever follows that string if any, replace the \$ with .*, though we'd need to change the locale to C to guarantee .* matches everything up to the end even if that's not valid text in the user's locale:

<source LC_ALL=C sed "s/, characters I don't want.*//"

<source LC_ALL=C grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or to remove everything if any after that string as well:

<source LC_ALL=C grep -Po "^.*?(?=, characters I don't want|\$)"

Or to remove everything if any after that string as well:

<source pcregrep -o1 "^(.*?)(, characters I don't want|\$)"

If the text to remove may contain anything including / or regex operators (but not newline characters which wouldn't make sense, nor NUL characters which cannot be passed in command arguments nor environment variables) and is stored in a shell variable, you do not want to use ~~sed "s/$string\$//"~~ as that would make it a command injection vulnerability.

string='/.*\^$'
<source LC_ALL=C grep -Po "^.*?(?=(\Q$string\E)?\$)"
<source pcregrep -o1 "^(.*?)(\Q$string\E)?\$"

Or to remove everything if any after that string as well:

<source LC_ALL=C grep -Po "^.*?(?=\Q$string\E|\$)"
<source pcregrep -o1 "^(.*?)(\Q$string\E|\$)"

Or to remove everything if any after that string as well:

<source perl -spe 's/\Q$string\E.*$//' -- -string="$string"

perl treats input as bytes not as if encoded in the user's locale charset by default, so we don't need to change the locale there.

To substitute it with nothing would be the most obvious.

<source grep -Po "^.*?(?=(, characters I don't want)?\$)"

If the text to remove may contain anything including / or regex operators (but not newline characters which wouldn't make sense, nor NUL characters which cannot be passed in command arguments nor environment variables) and is stored in a shell variable, you do not want to use ~~sed "s/$string\$//"~~ as that would make it a command injection vulnerability.

string='/.*\^$'
<source grep -Po "^.*?(?=(\Q$string\E)?\$)"
<source pcregrep -o1 "^(.*?)\Q$string\E\$"

Most obvious here would be to use sed:

<source sed "s/, characters I don't want\$//"

That substitutes that string with nothing when found at the end of the line ($, which we escape as \$ for the shell, to be future-proof in case $/ means something in the shell in the future).

To also remove whatever follows that string if any, replace the \$ with .*, though we'd need to change the locale to C to guarantee .* matches everything up to the end even if that's not valid text in the user's locale:

<source LC_ALL=C sed "s/, characters I don't want.*//"
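To make the difference between the two sed forms concrete, here is a small sketch. The function names and the sample lines are made up for illustration; the sed commands are the ones given above, reading from standard input rather than from a file called source.

```shell
# Remove the literal only when it ends the line (note the \$ escaped for the shell).
strip_at_end() {
  sed "s/, characters I don't want\$//"
}

# Remove the literal and whatever follows it.
# LC_ALL=C makes .* match any byte, even bytes that are invalid in the user's locale.
strip_from_match() {
  LC_ALL=C sed "s/, characters I don't want.*//"
}

# The string ends the line: both forms would remove it.
printf '%s\n' "keep this, characters I don't want" | strip_at_end

# Something follows the string: the anchored form leaves the line alone...
printf '%s\n' "keep this, characters I don't want and more" | strip_at_end

# ...while the .* form cuts from the string to the end of the line.
printf '%s\n' "keep this, characters I don't want and more" | strip_from_match
```

The first and third commands should print "keep this", while the second prints its input unchanged.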

<source LC_ALL=C grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or to remove everything if any after that string as well:

<source LC_ALL=C grep -Po "^.*?(?=, characters I don't want|\$)"

Or to remove everything if any after that string as well:

<source pcregrep -o1 "^(.*?)(, characters I don't want|\$)"

If the text to remove may contain anything including / or regex operators (but not newline characters which wouldn't make sense, nor NUL characters which cannot be passed in command arguments nor environment variables) and is stored in a shell variable, you do not want to use ~~sed "s/$string\$//"~~ as that would make it a command injection vulnerability.

string='/.*\^$'
<source LC_ALL=C grep -Po "^.*?(?=(\Q$string\E)?\$)"
<source pcregrep -o1 "^(.*?)(\Q$string\E)?\$"

Or to remove everything if any after that string as well:

<source LC_ALL=C grep -Po "^.*?(?=\Q$string\E|\$)"
<source pcregrep -o1 "^(.*?)(\Q$string\E|\$)"

Or to remove everything if any after that string as well:

<source perl -spe 's/\Q$string\E.*$//' -- -string="$string"

perl treats input as bytes not as if encoded in the user's locale charset by default, so we don't need to change the locale there.

added 1470 characters in body


edited Jul 21, 2023 at 7:03

Stéphane Chazelas


<source sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

<source grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or with pcregrep (when perl-like regex support is enabled in GNU grep, that's actually via libpcre which comes with pcregrep as an example application though has features beyond those of GNU grep):

<source pcregrep -o1 "^(.*?)(, characters I don't want)?\$"

If the text to remove may contain anything including / or regex operators (but not newline characters which wouldn't make sense, nor NUL characters which cannot be passed in command arguments nor environment variables) and is stored in a shell variable, you do not want to use ~~sed "s/$string\$//"~~ as that would make it a command injection vulnerability.
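As a harmless illustration of that risk, the sketch below feeds sed a made-up value for $string that closes the s command early and smuggles in a second one. The value and the function name are hypothetical; with a real sed the injected part could just as well be a command that writes files.

```shell
# Hypothetical hostile value: the first / ends the pattern, the rest is extra sed code.
string='x//;s/.*/INJECTED/;s/y'

# Deliberately unsafe, for demonstration only. After shell expansion sed runs:
#   s/x//;s/.*/INJECTED/;s/y$//
unsafe_strip() {
  sed "s/$string\$//"
}

# The input line does not contain the string at all, yet it gets rewritten.
printf '%s\n' 'harmless line' | unsafe_strip
```

This should print INJECTED instead of the input line, which is why the value needs to reach the tool as data rather than as part of the script.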

With the perl-grep ones, you can use:

string='/.*\^$'
<source grep -Po "^.*?(?=(\Q$string\E)?\$)"
<source pcregrep -o1 "^(.*?)\Q$string\E\$"

That still chokes on $strings that contain \E, though not with as dramatic consequences as with sed.

Or you could use perl directly. Its -p option gives it a sed mode, it has mechanisms to pass arbitrary strings (here using -s for crude option passing, though you could also use @ARGV directly, the equivalent of python's sys.argv, or environment variables, which are mapped to the %ENV associative array), and it can \Quote strings inside regexps (here with \E in $string not being a problem):

<source perl -spe 's/\Q$string\E$//' -- -string="$string"

Note that contrary to sed, perl includes the line delimiter in the pattern space by default ($_ in perl, on which s/// acts by default), and its $ regex operator matches either at the end of the subject or before a line delimiter at the end of the subject, so it is able to cope with both delimited and undelimited lines.

sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or with pcregrep (when perl-like regex support is enabled in GNU grep, that's actually via libpcre which comes with pcregrep as an example application though has features beyond those of GNU grep):

pcregrep -o1 "^(.*?)(, characters I don't want)?\$"

<source sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

<source grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or with pcregrep (when perl-like regex support is enabled in GNU grep, that's actually via libpcre which comes with pcregrep as an example application though has features beyond those of GNU grep):

<source pcregrep -o1 "^(.*?)(, characters I don't want)?\$"

If the text to remove may contain anything including / or regex operators (but not newline characters which wouldn't make sense, nor NUL characters which cannot be passed in command arguments nor environment variables) and is stored in a shell variable, you do not want to use ~~sed "s/$string\$//"~~ as that would make it a command injection vulnerability.

With the perl-grep ones, you can use:

string='/.*\^$'
<source grep -Po "^.*?(?=(\Q$string\E)?\$)"
<source pcregrep -o1 "^(.*?)\Q$string\E\$"

That still chokes on $strings that contain \E, though not with as dramatic consequences as with sed.

Or you could use perl directly. Its -p option gives it a sed mode, it has mechanisms to pass arbitrary strings (here using -s for crude option passing, though you could also use @ARGV directly, the equivalent of python's sys.argv, or environment variables, which are mapped to the %ENV associative array), and it can \Quote strings inside regexps (here with \E in $string not being a problem):

<source perl -spe 's/\Q$string\E$//' -- -string="$string"

Note that contrary to sed, perl includes the line delimiter in the pattern space by default ($_ in perl, on which s/// acts by default), and its $ regex operator matches either at the end of the subject or before a line delimiter at the end of the subject, so it is able to cope with both delimited and undelimited lines.

added 274 characters in body


edited Jul 20, 2023 at 17:37

Stéphane Chazelas


sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or with pcregrep (when perl-like regex support is enabled in GNU grep, that's actually via libpcre which comes with pcregrep as an example application though has features beyond those of GNU grep):

pcregrep -o1 "^(.*?)(, characters I don't want)?\$"

sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

grep -Po "^.*?(?=(, characters I don't want)?\$)"

sed "s/, characters I don't want\$//"

To substitute it with nothing would be the most obvious.

With GNU grep or compatible, when built with perl-like regexp support, that could be:

grep -Po "^.*?(?=(, characters I don't want)?\$)"

Or with pcregrep (when perl-like regex support is enabled in GNU grep, that's actually via libpcre which comes with pcregrep as an example application though has features beyond those of GNU grep):

pcregrep -o1 "^(.*?)(, characters I don't want)?\$"


answered Jul 20, 2023 at 17:29

Stéphane Chazelas

