Update 5/26/2020

This looked like a bug, so I filed a bug report. Its ID is #41558.


I was just messing around with sed and came up with this exercise: replace the 3rd-to-last occurrence of "and" (the whole word, not a substring) in the input used below, to produce:

dog XYZ foo and bar and baz land good

I thought this would work:

echo 'dog and foo and bar and baz land good' |
    sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'

but it actually replaces the 2nd-to-last occurrence of "and". The only explanation I can think of is that it counts "land" as one of the \band\b matches, but that shouldn't happen, since I included the \b word boundaries.

  • What's your OS? You might try [[:<:]]and[[:>:]] instead of \band\b Commented Apr 14, 2020 at 2:23
  • That looks very much like a bug. Also in GNU grep: echo lala | grep -Eo '(\bla){2}' => lala. As a workaround, use \< and \> instead of \b. Commented Apr 14, 2020 at 2:24
  • @mosvy the BSD sed on macOS for example doesn't accept either \b or the \< and \> pair Commented Apr 14, 2020 at 2:26
  • @Fox I think that the OP is using GNU sed. I've added the [gnu] tag. Let them revert it if it's not true ;-) Commented Apr 14, 2020 at 2:29
  • Fancy seeing this example, am I right that you're using my tutorial? I converted it into a book as well. Commented Apr 14, 2020 at 5:27

2 Answers

This is difficult to do because sed does not support look-arounds and similar features (which PCRE provides). It is easier to reverse the string, replace the third occurrence of the reversed word counting from the start, and then reverse the result again.

$ echo 'dog and foo and bar and baz land good' | rev | sed 's/\<dna\>/XXX/3' | rev
dog XXX foo and bar and baz land good
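The reversal trick above can be wrapped in a small function. This is only a sketch: replace_nth_last is a made-up name, and it assumes GNU sed (for \< and \>), rev from util-linux, and a word and replacement that contain no slashes or regex metacharacters. Note that the replacement has to be reversed as well, which the example above sidesteps by using a palindrome.

```shell
# Hypothetical helper: replace the Nth-to-last whole-word occurrence of a word.
# Usage: ... | replace_nth_last WORD REPLACEMENT N
replace_nth_last() {
    word=$1 repl=$2 n=$3
    # Reverse both the search word and the replacement, since they are
    # applied to the reversed line.
    rword=$(printf '%s' "$word" | rev)
    rrepl=$(printf '%s' "$repl" | rev)
    # Reverse each line, replace the Nth occurrence from the start, reverse back.
    rev | sed "s/\\<${rword}\\>/${rrepl}/${n}" | rev
}

echo 'dog and foo and bar and baz land good' | replace_nth_last and XYZ 3
```

With the input from the question this should print the line with the first "and" replaced by XYZ, the same result the one-liner gives with XXX.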

As for why your expression does not work, this looks like a bug. The back-reference \3 seems to be the string " baz land", as if the \b before "and" in .*\band\b never had any effect.

The command

sed -E 's/(.*)\<and\>((.*\<and\>){2})/\1XYZ\2/'

seems to do the right thing on OpenBSD with its native sed (which uses \< and \> instead of \b).

I have yet to find an existing bug report against GNU sed or glibc about this, although I wouldn't be surprised if it were at least related to glibc bug 25322 (see the workaround below).

You can work around it by being a bit more verbose:

sed -E 's/(.*)\band\b(.*\band\b.*\band\b)/\1XYZ\2/'
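Writing the repetition out by hand gets tedious for larger counts, so the expression can be generated instead. A rough sketch, assuming a POSIX shell; nth_last_expr is a made-up name, and the word and replacement are assumed to contain no slashes or regex metacharacters:

```shell
# Hypothetical helper: print a sed -E expression that replaces the
# Nth-to-last whole-word occurrence of WORD, spelling out the repetition
# instead of using a {N} interval on a group.
# Usage: nth_last_expr WORD REPLACEMENT N
nth_last_expr() {
    word=$1 repl=$2 n=$3
    unit="\\b${word}\\b"          # expands to \bWORD\b
    rest=''
    i=1
    # Append ".*\bWORD\b" once for each occurrence that must follow the target.
    while [ "$i" -lt "$n" ]; do
        rest="${rest}.*${unit}"
        i=$((i + 1))
    done
    printf '%s\n' "s/(.*)${unit}(${rest})/\\1${repl}\\2/"
}

nth_last_expr and XYZ 3
echo 'dog and foo and bar and baz land good' | sed -E "$(nth_last_expr and XYZ 3)"
```

For "and", "XYZ" and 3 the first command prints the same expression as the workaround above; the second applies it with GNU sed.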
1

I'd suggest filing an issue. I've tested the examples below, and they behave the same way with GNU grep, GNU sed and GNU awk, except for one case, which is noted below.

  • Wrong output:

    $ echo 'cocoa' | sed -nE '/(\bco){2}/p'
    cocoa
    

    sed -nE '/(\<co){2}/p' and awk '/(\<co){2}/' have the buggy behavior too, but grep -E '(\<co){2}' correctly gives no output

  • Correct behavior, no output:

    $ echo 'cocoa' | sed -nE '/\bco\bco/p'
    
  • Wrong output: there is only one whole word "it" after "with"

    $ echo 'it line with it here sit too' | sed -E 's/with(.*\bit\b){2}/XYZ/'
    it line XYZ too
    
  • Correct behavior, input isn't modified

    $ echo 'it line with it here sit too' | sed -E 's/with.*\bit\b.*\bit\b/XYZ/'
    it line with it here sit too
    
  • Changing word boundaries to \< and \> results in a different problem.

    This correctly doesn't modify the input:

    $ echo 'it line with it here sit too' | sed -E 's/with(.*\<it\>){2}/XYZ/'
    it line with it here sit too
    

    This correctly modifies the input

    $ echo 'it line with it here it too' | sed -E 's/with(.*\<it\>){2}/XYZ/'
    it line XYZ too
    

    But this one fails to modify the input

    $ echo 'it line with it here it too sit' | sed -E 's/with(.*\<it\>){2}/XYZ/'
    it line with it here it too sit
    

Also, the problematic behavior is seen only if the conflicting word has extra characters at the beginning (for example, it and sit), not if the extra characters are at the end (for example, it and site or item).

$ echo 'it line with it here item too' | sed -E 's/with(.*\bit\b){2}/XYZ/'
it line with it here item too
$ echo 'it line with it here it too item' | sed -E 's/with(.*\<it\>){2}/XYZ/'
it line XYZ too item
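To see whether a particular sed build is affected, some of the cases above can be collected into a small script. This is a sketch, not a full test suite: check is a made-up helper, and the expected strings are what a correct implementation should print according to the cases above, so a MISMATCH line points at the buggy behavior.

```shell
# Hypothetical helper: run one sed -E expression and compare with the
# expected output.
# Usage: check EXPECTED INPUT EXPRESSION
check() {
    expected=$1 input=$2 expr=$3
    actual=$(printf '%s\n' "$input" | sed -E "$expr")
    if [ "$actual" = "$expected" ]; then
        printf 'ok: %s\n' "$expr"
    else
        printf 'MISMATCH: %s -> %s\n' "$expr" "$actual"
    fi
}

# Only one whole word "it" follows "with", so the input should be unchanged.
check 'it line with it here sit too' \
      'it line with it here sit too' 's/with(.*\bit\b){2}/XYZ/'
# Two whole words "it" follow "with", so the substitution should apply.
check 'it line XYZ too' \
      'it line with it here it too' 's/with(.*\<it\>){2}/XYZ/'
# Same, with a trailing "sit" that should not prevent the match.
check 'it line XYZ too sit' \
      'it line with it here it too sit' 's/with(.*\<it\>){2}/XYZ/'
```

Which lines report a mismatch will depend on the sed and glibc versions in use.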
  • I've filed a bug and mentioned it in the main post, thanks! Commented May 27, 2020 at 4:19
