The ESC character is 0x1B. In GNU sed, it can be written as \x1b. It can also be written as \c[ ²; this, however, behaves inconsistently:

sed "s/XXX/\c[/"    # right-hand side, no problem
sed "s/\c[/XXX/"    # error: unterminated `s' command
sed "s/\c\[/XXX/"   # error¹: recursive escaping after \c not allowed

Is this a bug in sed (more precisely: in the GNU implementation of the sed command) or am I missing something? Tried in version 4.8 that came with Git for Windows (yes, this is on Windows) and version 4.9 downloaded from Github/mbuilov, errors are identical in both versions. Using single quotes does not make a difference.

The same s/// commands work fine in Perl.

¹ The second try does not correspond to the manual, it is simply a blind shot.

² Quote from the GNU sed manual:

\cx Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
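The rule quoted from the manual can be checked with a little shell arithmetic. This is only a sketch: `ctrl_code` is a hypothetical helper name, not something sed provides, and it assumes a single ASCII character as its argument.

```shell
# Hypothetical helper: print the hex code that \cX would stand for,
# following the rule quoted from the GNU sed manual.
ctrl_code() {
    c=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')  # lower case -> upper case
    code=$(printf '%d' "'$c")                           # numeric value of the character
    printf '%02X\n' $(( code ^ 0x40 ))                  # invert bit 6 (hex 40)
}

ctrl_code '['   # 1B, i.e. ESC
ctrl_code 'z'   # 1A
ctrl_code '{'   # 3B
ctrl_code ';'   # 7B
```

So by the manual's own definition, \c[ should be 0x1B on either side of the s command.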

  • I'm not familiar with the encoding \c[ to represent an Escape character (0x1B) in sed. I've seen it referenced as an ANSI C encoding in the bash man page, however it must be enclosed within the $'...' quoting notation, and there's a simpler encoding for Escape: \E. Can you cite the source that your computer's sed supports \c[ or \E as an encoding for Escape without special quoting like Bash uses? Commented Sep 23 at 20:29
  • I think you found a bug in sed ... there was something similar on savannah in 2017. lists.gnu.org/archive/html/bug-sed/2017-02/msg00017.html Commented Sep 23 at 20:42
  • ... at least with GNU sed 4.9 it appears to work if the sequence is enclosed in a bracket expression e.g. printf 'foo\x1bbar\n' | sed "s/[\c[]/XXX/" Commented Sep 23 at 20:45
  • @SottoVoce, I've added the reference to the GNU manual to my post. — @tink, good find, indeed there's ambiguity. Also encountered the double backslash requirement; which in cmd.exe, for some reason, with GNU sed 4.8 (but not 4.9 !) is a triple backslash requirement: echo hallo | sed "s/\c\\\/X/". — @steeldriver, I confirm your suggestion works fine in both GNU sed 4.8 and 4.9. Commented Sep 24 at 8:33
  • Hint: It is NEVER a bug in SED!! Commented Sep 25 at 22:02

1 Answer

While \c[, \C-[, \33, \o33, \d27, \x1b, \u001b, \U0000001b, \e, \E might work in a particular implementation and version of sed on the left- or right-hand side of an s command to represent the ASCII ESC character (\33 is unlikely to, as it would conflict with back-references), none of them is standard or portable.

The GNU implementation of sed supports \c[, \d27, \o33, \x1b, but when \c[ is used on the left-hand side of s/LHS/RHS/ or in an address, it runs into a bug and incorrectly claims that the s command is unterminated (as if the RHS were missing).

It most likely boils down to this: since sed allows you to do s/[/]/replacement/, where the / inside [...] is not treated as the separator between pattern and replacement, sed first needs to look for matching [...] pairs to find that separator. At that point it overlooks that the [ in \c[ is not the opening [ of a bracket expression but part of the \c[ representation of ESC, aka ^[.
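The behaviour this explanation rests on, that a / inside [...] is not taken as the delimiter, is easy to see in isolation. A minimal sketch (the sample input a/b is made up for illustration):

```shell
# The / inside the bracket expression belongs to the pattern; it does not
# terminate it, so this replaces the slash in the input with X.
printf 'a/b\n' | sed 's/[/]/X/'
# prints: aXb
```

To support this, sed has to pair up [ and ] while scanning for the closing delimiter, which is where a stray [ after \c appears to get miscounted.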

Also, something like sed '/\c[/b]/=', which should branch to the ]/= label for lines that contain ESC, instead runs the = command for lines that contain ESC/b].

As @steeldriver notes in comments, you can however use:

sed 's/[\c[]/XXX/g'

This works as a workaround because once that [ is inside a bracket expression (following \c, \m or not), sed knows it is not opening another bracket expression and need not be matched by another ]¹.

Here of course, with GNU sed, you can also just do:

sed 's/\x1b/XXX/g'
sed 's/\o33/XXX/g'
sed 's/\d27/XXX/g'

Portably, you'd pass a literal ESC character, either by embedding it literally or, with shells compliant with the 2024 edition of the POSIX standard, by using the $'...' form of quotes from ksh93, where you can use $'\e', $'\33', $'\x1b', $'\c[':

sed $'s/\e/XXX/g'

A few shells also support $'\u001b', which like \e refers to the ESC character (U+001B) rather than to the 0x1b byte (as \33 and \x1b do), and so in theory would also work on systems where ESC is not encoded as 0x1b (not that you'd come across any). zsh also supports \C-[ and ksh93 also \x{1b} in there.

With shells compliant only with older versions of POSIX, you can always do:

ESC=$(printf '\33')
sed "s/$ESC/XXX/g"

(some printf implementations also support one or more of \e, \E, \x1b, \u001b, none of which is standard).
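As a quick end-to-end check of the printf approach, here is a sketch using a made-up input line (foo, an ESC byte, then bar); the `$ESC` variable is the one defined above:

```shell
# Store a literal ESC byte using only the standard octal escape of printf.
ESC=$(printf '\33')

# Feed sed a line containing ESC and replace it; no sed-specific escapes needed.
printf 'foo\033bar\n' | sed "s/$ESC/XXX/g"
# prints: fooXXXbar
```

Since sed only ever sees the literal byte, this sidesteps the \c[ parsing problem entirely and does not depend on GNU extensions.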

Or you could use perl, where you can use \e, \x1b, \c[ on either the left- or right-hand side (even \33 if there are fewer than 33 capture groups in the regex; note that \E is not ESC in perl, it ends a \Q, \L or \U section) and which can cope with non-text input (unlike many sed implementations):

perl -pe 's/\e/XXX/g'

¹ Unless it's part of [:class:] (character class), [.x.] (collating element), [=x=] (equivalence class).

  • Great answer, merci. Not quite sure about the POSIX compliance of cmd.exe … probably not quite on 2024 level yet. :) Also, in practice, the means of passing literal ESC control characters are somewhat limited. But anyway, that doesn't matter as I don't have a practical problem as there are other ways to match the ESC character. The issue is only about concordance between manual and implementation. Either one would have to be adapted, I think. Commented Sep 24 at 8:52
  • @Lumi, Microsoft's CMD.EXE would be off-topic here, but you can install a number of Unix shells (POSIX-compliant or otherwise) and GNU sed via Cygwin which is on topic here. Commented Sep 24 at 9:13
