This is the content of input.txt:

foo-bar
foo-baz

I want to substitute - with _ and join the two lines.

Notice the difference between using -e and a pipe when using sed:

$ sed -e 's/-/_/g' -e ':a; N; $!ba; s/\n//g' input.txt
foo_barfoo-baz
$ sed -e 's/-/_/g' input.txt | sed ':a; N; $!ba; s/\n//g'
foo_barfoo_baz

Why does the -e option give the wrong output? How does it work?

  • Note: In several sed implementations, including the original one, :a; N; $!ba; s/\n//g defines a label called a; N; $!ba.
  • How does your sample input.txt differ from the actual file you want to modify? Does it contain only two lines? If it contains more than two lines, do you want to merge all lines together, join every second line to the previous line, or something else? Do you need to insert a space or tab between the joined lines?
  • Related, but not a dupe: Make multiple edits with a single call to sed
  • FWIW re: "I want to substitute - with _ and join the two lines" ... sed -z 's/-/_/g; s/\n//g' should do what you want, at least with GNU sed.
  • @Raffa It works... I tried using \n but it didn't work without the -z option. For future reference: unix.stackexchange.com/q/26284/589116

3 Answers

-e accumulates lines in the “script” that sed will run; so

sed -e 's/-/_/g' -e ':a; N; $!ba; s/\n//g' input.txt

is the same as running sed with the following sed script:

s/-/_/g
:a; N; $!ba; s/\n//g

The problem you’re seeing doesn’t come from the use of -e, but from the order of the commands. The first line in the sed script replaces - with _ in the pattern space; the second line reads the next line and appends it, removing newlines — but the - replacement isn’t applied again!

In more detail:

  1. sed reads the first line into the pattern space, foo-bar.
  2. The first line of the script changes this to foo_bar.
  3. The first three commands of the second line (:a; N; $!ba) loop over the remaining lines, appending them to the pattern space, which ends up containing foo_bar\nfoo-baz.
  4. The last command of the second line removes the newline, leaving foo_barfoo-baz.
  5. sed outputs the pattern space before moving to the next line — but everything has been consumed by N commands, so it exits.
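The steps above can be observed directly. The following is a minimal sketch: it replaces the final s/\n//g with the standard l command, which prints the pattern space unambiguously (newlines as \n, end marked with $). The function name show_pattern_space is hypothetical, and the loop is split across separate -e options so that it does not depend on GNU label parsing.

```shell
# Hypothetical helper: dump the pattern space after the N loop,
# at the point where s/\n//g would otherwise run.
show_pattern_space() {
  printf 'foo-bar\nfoo-baz\n' |
    sed -n -e 's/-/_/g' -e ':a' -e 'N' -e '$!ba' -e 'l'
}

show_pattern_space
```

This should print foo_bar\nfoo-baz$, showing that only the first line had its hyphen replaced before the second line was appended.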

The second invocation avoids this by replacing all hyphens in the file before removing newlines.

If you flip the lines in the script you’ll get the expected result:

sed -e ':a; N; $!ba; s/\n//g' -e 's/-/_/g' input.txt

(If you use GNU sed, you can run it with --debug to see exactly what it does.)
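As a quick check of the ordering argument, here is a small sketch that runs both orders on the sample input. The names sample, subst_first and join_first are made up for illustration, and each command gets its own -e so the label is parsed the same way by non-GNU implementations.

```shell
# Hypothetical helper that reproduces the question's input.txt on stdout.
sample() { printf 'foo-bar\nfoo-baz\n'; }

# Substitute first, then join: the second line's hyphen is never revisited.
subst_first=$(sample | sed -e 's/-/_/g' -e ':a' -e 'N' -e '$!ba' -e 's/\n//g')

# Join first, then substitute: s/-/_/g sees the whole joined text.
join_first=$(sample | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' -e 's/-/_/g')

printf '%s\n%s\n' "$subst_first" "$join_first"
```

The first line of output should be foo_barfoo-baz and the second foo_barfoo_baz.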

sed takes all the strings passed as arguments to -e options and the contents of files passed to -f options and concatenates them together with newlines in between to make up the sed script to interpret.

perl (which has no -f equivalent) and GNU awk do the same with their own -e options.

For instance doing:

sed -eA -f file1 -e 'C
D' -f file2 -e 'F;G'

Where file1 contains B and file2 contains E is the same as running:

sed -e 'A
B
C
D
E
F;G'

If not passed any -e or -f argument, the first non-option argument is taken as the literal sed code to interpret, so it's also the same as:

sed 'A
B
C
D
E
F;G'
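The A to G placeholders are not real commands, so here is a runnable sketch of the same idea with real substitutions. The temporary directory, the file name file1 and the variable names are assumptions made for the illustration.

```shell
# Scratch directory holding a one-line script file (hypothetical name file1).
tmp=$(mktemp -d)
printf 's/b/c/\n' > "$tmp/file1"

# Script assembled from -e, -f and -e, in command-line order.
split_out=$(echo a | sed -e 's/a/b/' -f "$tmp/file1" -e 's/c/d/')

# The same script written out as one newline-separated argument.
joined_out=$(echo a | sed 's/a/b/
s/b/c/
s/c/d/')

rm -r "$tmp"
printf '%s %s\n' "$split_out" "$joined_out"
```

Both invocations should print d, since a becomes b, then c, then d in each case.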

In the rudimentary sed language (from the 70s; itself inspired by ed from the 60s), commands can be separated with newlines (or with separate -e options, which is equivalent to adding newlines) and some of them can be separated with ;.

There are exceptions: some for commands that take arguments, and others for historical reasons rather than good ones.

You can't separate a :, b or t command from the next one with ; because in the original sed implementation, as still allowed by POSIX and still the case in many sed implementations:

b foo;bar

branches to the foo;bar label. That's not b foo followed by b ar like it is in GNU sed.
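One way to stay clear of the ambiguity is to end every :, b or t command with a newline or a new -e. The following is only a sketch; the label name end and the variable branch_out are made up for the example.

```shell
# The label is terminated by the end of each -e chunk, never by ';'.
# 'b end' jumps over the substitution, so the input passes through unchanged.
branch_out=$(echo x | sed -e 'b end' -e 's/x/y/' -e ':end')

printf '%s\n' "$branch_out"
```

This should print x, because the branch skips s/x/y/ regardless of how the implementation treats ; inside labels.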

The same goes for r and w (and the w flag of the s command): even in GNU sed, r foo;bar reads from the file named foo;bar.
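The file name behaviour can be checked with a scratch file whose name really contains a semicolon. This is a sketch only; the temporary directory, the file content and the variable r_out are assumptions for the demonstration.

```shell
# Create a file literally named "foo;bar" in a scratch directory.
tmp=$(mktemp -d)
printf 'from file\n' > "$tmp/foo;bar"

# 'r foo;bar' takes everything up to the end of the line as the file name,
# so the file's content is appended after the input line.
r_out=$(cd "$tmp" && echo x | sed 'r foo;bar')

rm -r "$tmp"
printf '%s\n' "$r_out"
```

The output should be x followed by from file on the next line.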

In the original implementation, the closing } also cannot be followed by anything on the same line, for no good reason.

Those are good reasons to use -e when, for some reason, you want your command on a single line (in the csh shell from the late '70s, entering a multi-line command was a real pain).

But in your:

sed -e 's/-/_/g' -e ':a; N; $!ba; s/\n//g' input.txt

you're using -e where it isn't needed, and not using it where it is needed (for non-GNU sed implementations).

Besides the fact that you're running those commands in the wrong order, syntactically, that code is neither standard nor portable. You should not have other commands on the same line after the : and b commands.

It also loads the whole input into the pattern space, whose size is limited in most sed implementations (and where it isn't, it could use up all the system's memory for no good reason). Written in standard syntax, the script becomes:

sed '
  :a
  $ ! {
    N
    b a
  }
  y/-/_/
  s/\n//g' input.txt

Or if you need it on one line:

sed -e :a -e '$!{N;ba' -e '}' -e 'y/-/_/; s/\n//g' input

Either form is syntactically and functionally correct with standard sed, but still loads the whole input into memory.
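To compare the two spellings on the question's sample, they can be wrapped in functions that read standard input. The names join_multiline and join_oneline are hypothetical; the sed scripts themselves are the ones given above.

```shell
# Multi-line form: one command per line, label and branch on their own lines.
join_multiline() {
  sed '
    :a
    $ ! {
      N
      b a
    }
    y/-/_/
    s/\n//g'
}

# One-line form: separate -e options stand in for the newlines.
join_oneline() {
  sed -e :a -e '$!{N;ba' -e '}' -e 'y/-/_/; s/\n//g'
}

printf 'foo-bar\nfoo-baz\n' | join_multiline
printf 'foo-bar\nfoo-baz\n' | join_oneline
```

Both calls should print foo_barfoo_baz.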

But here you only need:

<input tr - _ | paste -sd '\0' -

Here tr transliterates each - to _ and paste joins the lines together, neither using more than a small buffer of memory.

Beware, though, that except with the busybox implementation of paste, this produces non-empty output (one empty line) for an empty input.
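A small sketch of that pipeline as a reusable filter, run on the sample data. The function name join_lines is made up, and the byte count for empty input is printed rather than assumed, since it depends on the paste implementation.

```shell
# Hypothetical wrapper around the tr | paste pipeline; reads stdin.
join_lines() {
  tr - _ | paste -sd '\0' -
}

printf 'foo-bar\nfoo-baz\n' | join_lines

# Empty input: report how many bytes come out (implementation-dependent).
printf '' | join_lines | wc -c
```

The first command should print foo_barfoo_baz; the second reports the size of whatever paste emits for empty input.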

If you're ready to swap those '70s tools for one from the (late) '80s (though it has kept evolving since), you could do:

perl -pe 'tr/-/_/; chomp unless eof' < input

Here we transliterate each - to _ (for sed devotees, y is provided as a synonym for tr) and chomp (remove the line delimiter from) every line except the last.

perl -p is perl's sed mode, but contrary to sed, perl keeps the input line delimiter in the pattern space ($_ in perl), which allows us to remove it conditionally.

Why is the dash in foo-baz not replaced?

  1. The s/-/_/g command runs once, on the first line only, before N appends the remaining lines.

  2. The second line (foo-baz) still contains its dash when it is appended.

  3. The replacement command does not run again on the combined, newline-removed string.

Result: the dash remains in the final joined output.
