4

I want sed to split lines and then process those lines.

E.g. sed 's/,/\n/g' | sed -n '/foo/p'

Can that be done in one sed command in general so that I do not need shell piping?

Example:

echo oof,foo | sed 's/,/\n/g' | sed -n '/foo/p'

Outputs (in Ubuntu):

foo

But if I just run

echo oof,foo | sed -n 's/,/\n/g;/foo/p'

it prints

oof
foo

which is not what I expect.

10
  • 1
    Please add the expected input and output, and what you tried. Commented Aug 29, 2024 at 7:10
  • 2
    Is there an issue with the pipe that you need to solve? If not, then making the command more complicated does not sound like an improvement. Commented Aug 29, 2024 at 7:11
  • 1
    Note that using \n in the replacement regex is not portable. Using a backslash followed by a newline should work in most implementations of sed, since it is defined by POSIX. Commented Aug 29, 2024 at 7:20
  • 1
    The s/,/\n/g can also be replaced by the (portable) y/,/\n/. Commented Aug 29, 2024 at 7:36
  • 1
    @Vilinkameni In this case, the initial sed could be replaced by tr , '\n'. However, I have a hunch that the reason they want to change commas into newlines might be to process CSV fields (which means they should be using other tools entirely), but that's just my mind-reading neurons flashing randomly. Commented Aug 29, 2024 at 7:42

3 Answers

2

TL;DR

sed -En 'y/,/\n/;/^[^\n]*foo[^\n]*(\n|$)/P;D'

or, portable but less elegant:

sed -En 'y/,/\n/;y/_\n/\n_/;/^[^_]*foo[^_]*(_|$)/{y/\n_/_\n/;P;y/_\n/\n_/;}
y/\n_/_\n/;D;y/_\n/\n_/'

Full answer

The sed(1p) manual page in the POSIX standard describes the s command:

Substitute the replacement string for the first instance of the regular expression RE in the pattern space.

and

g Make the substitution for all non-overlapping matches of the regular expression, not just the first one.

Also, the definition of pattern space:

In default operation, sed cyclically shall append a line of input, less its terminating <newline> character, into the pattern space.

So the pattern space is a location in memory which initially holds the entire input line. The s/RE/replacement/ command, even with the g modifier, does not split the pattern space; it just substitutes the replacement for each match of the RE. After executing

printf "a,b,c\n" | sed 's/,/\
/g'

the pattern space will hold the literal text:

a
b
c

To prove this point:

$ printf "one,two,three\nfour,five,six\n" | sed 's/,/\
/g;s/^\(.*\)$/[\1]/'
[one
two
three]
[four
five
six]

or, using GNU sed (in OpenBSD named gsed):

$ printf "a,b,c\n" | gsed --debug 'y/,/\n/'
SED PROGRAM:
  y/,/
/
INPUT:   'STDIN' line 1
PATTERN: a,b,c
COMMAND: y/,/
/
PATTERN: a\nb\nc
END-OF-CYCLE:
a
b
c

When the pattern space is subsequently matched, the RE from the match is applied to the entire pattern space as above, not its individual "lines".
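
A small sketch of that behaviour, assuming a sed whose y command accepts \n (POSIX specifies it); the function names are made up for illustration. Anchors refer to the ends of the whole pattern space, so /^foo$/ does not match even though one embedded "line" is exactly foo:

```shell
# Both functions are hypothetical helpers, named here only for illustration.

# /foo/ matches somewhere in the pattern space, so p writes all of it
# (both segments), not just the segment containing foo.
match_anywhere() {
    sed -n 'y/,/\n/;/foo/p'
}

# ^ and $ anchor to the start and end of the entire pattern space
# ("oof<newline>foo"), not to each embedded segment, so nothing matches.
match_anchored() {
    sed -n 'y/,/\n/;/^foo$/p'
}

printf 'oof,foo\n' | match_anywhere   # prints "oof" then "foo"
printf 'oof,foo\n' | match_anchored   # prints nothing
```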

What can manipulate the "lines" in the pattern space, however, are the commands P and D:

[2addr]P Write the pattern space, up to the first <newline>, to standard output.

and

[2addr]D If the pattern space contains no <newline>, delete the pattern space and start a normal new cycle as if the d command was issued. Otherwise, delete the initial segment of the pattern space through the first <newline>, and start the next cycle with the resultant pattern space and without reading any new input.
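
To see the two commands in isolation, here is a minimal sketch (helper names are hypothetical, and y/,/\n/ is assumed to be supported as POSIX describes). P on its own writes only the first segment; P followed by D walks through every segment, because D restarts the script on whatever is left:

```shell
# Hypothetical helper names, for illustration only.

# P alone: writes just the first segment of the pattern space.
first_segment() {
    sed -n 'y/,/\n/;P'
}

# P then D: print the first segment, delete it, and rerun the script on the
# remainder without reading new input, so every segment is written in turn.
each_segment() {
    sed -n 'y/,/\n/;P;D'
}

printf 'a,b,c\n' | first_segment   # prints "a"
printf 'a,b,c\n' | each_segment    # prints "a", "b" and "c" on separate lines
```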

So, we can use the following command:

$ printf "a,b,c foo\nd foo,e,foo f\ng,h,i\n" |
sed -En 'y/,/\n/;/^[^\n]*foo[^\n]*(\n|$)/P;D'
c foo
d foo
foo f

or, in the more readable form:

$ printf "a,b,c foo\nd foo,e,foo f\ng,h,i\n" |
sed -En 'y/,/\n/
/^[^\n]*foo[^\n]*(\n|$)/P
D'
c foo
d foo
foo f

Explanation

  • y/,/\n/ - Transliterate every , into a newline (similar to the tr(1) utility).
  • /^[^\n]*foo[^\n]*(\n|$)/P - Print the initial portion of pattern space up to the newline, but only if the initial part of the pattern space up to the newline contains foo.
  • D - Delete the initial portion of pattern space up to the newline and start the next cycle (reading a new input line into the pattern space only if there was no newline, i.e. at the final "line" in the pattern space).
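
One way to watch the cycles described in the list above is to swap the conditional P for the l command, which writes the pattern space unambiguously (embedded newlines appear as \n, the end as $). This is only a sketch; trace_cycles is a made-up name:

```shell
# Hypothetical helper: list the pattern space at the start of every cycle.
# y is a no-op on the restarted cycles, since the commas are already gone.
trace_cycles() {
    sed -n 'y/,/\n/;l;D'
}

printf 'a,b foo,c\n' | trace_cycles
```

For this input it should print a\nb foo\nc$, then b foo\nc$, then c$: each D removes one leading segment and reruns the script without reading new input.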

Notes

  • As @Kusalananda noted, if the intention is to parse CSV, sed is not a good tool for the job. For just the case from this question, awk(1) could be used (note: @Ed Morton's solution is more concise):
printf "a,b foo,c\nd,e,foo g\n" | awk '
BEGIN{FS=","}
{
    for (i = 1; i <= NF; i++)
    {
        if ($i ~ /foo/)
        {
            print $i
        }
    }
}'

but it doesn't handle quoted fields (e.g. for printf "a,b foo,\"not a foo, delimiter\"\nd,e,foo g\n" it would output "not a foo instead of not a foo, delimiter), etc.

  • Regarding the comment by @Stéphane Chazelas about portability, there is a possibility that \n inside of the [] might not match a newline on some systems. However, as of this writing (2024-09-01), both OpenBSD 7.5 sed and GNU sed 4.9 from OpenBSD (without setting POSIXLY_CORRECT) interpret [^\n] as "match any character except newline". Using the method from the accepted answer to sed: Portable solution to match "any character but newline", which juggles with exchanging the characters _ and newlines by using y, but is less elegant, we get:
$ printf "a,b,c foo\nd foo,e,foo f\ng,h,i\n" |
sed -En 'y/,/\n/;y/_\n/\n_/;/^[^_]*foo[^_]*(_|$)/{y/\n_/_\n/;P;y/_\n/\n_/;}
y/\n_/_\n/;D;y/_\n/\n_/'
6
  • I could use echo oof,foo1,foo2 | sed -nE -e 's/,/\n/g' -e 's/.*(\n|^)([^\n]*foo[^\n]*).*/\2/p' but it prints only the last match for an input line. Though replacing commas by newlines would not be needed in this approach. Commented Aug 29, 2024 at 9:20
  • In OpenBSD, that command outputs oofnfoo1nfoo2 because it uses \n inside the replacement part of s, which is non-POSIX. When changed to a backslash followed by a newline, it outputs foo2. The problem is that the s command matches and affects the entire pattern space, as I mentioned. The pattern space extends over the whole initial input line, which was oof,foo1,foo2. Changing the commas in the pattern space into newlines does not change the extent of the pattern space (sed does not "split" the input line and reparse it). Commented Aug 29, 2024 at 12:52
  • 1
    Something else could be used for this filtering, which is exactly what @Kusalananda suggested and I agree with. For example, advanced languages like Perl or Python, or some other general-purpose programming language. Commented Aug 29, 2024 at 12:56
  • For CSVs with quoted fields, Miller (mlr) could be used (Perl at least would require external modules to process CSVs, such as Text::CSV, I don't know about Python, but at least for relatively simple one-liners using Miller would most likely prove less cumbersome, and it's a pretty powerful tool anyways) Commented Aug 29, 2024 at 18:13
  • @Vilinkameni I added the GNU tag so the example is valid. Unfortunately, even adding the g modifier would not make it print all the foos. Commented Aug 29, 2024 at 18:53
1

sed is great for doing s/old/new/ on individual lines but for anything more complicated just use awk for clarity, simplicity, maintainability, portability, robustness, etc. e.g. using GNU awk for multi-char RS:

$ echo 'oof,foo' | awk -v RS='[,\n]' '/foo/'
foo

or using any awk:

$ echo 'oof,foo' | awk -v RS=',' '{sub(/\n$/,"")} /foo/'
foo

You probably could come up with some non-portable, write-only incantation to do this in GNU sed involving hold space, pattern space, single letters, punctuation characters, the batman symbol and sacrificing a goat but it'd just be pointless to do so when you have that better alternative.

2
  • As I noted in my answer, if the intention is to parse CSV, this doesn't work for the following input: printf "a,b foo,\"not a foo, delimiter\"\nd,e,foo g\n". Also, this doesn't decisively answer the question "Can that be done in one sed command in general so that I do not need shell piping?" Commented Sep 1, 2024 at 9:15
  • 1
    @Vilinkameni there's no indication in the question that this is to parse CSV in general (if there was then I'd have directed the OP to whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) and it does answer the question by telling the OP to just do something simpler instead. Commented Sep 1, 2024 at 12:19
0

sed 's/,/\n/g' is not standard. To replace every , with a newline in a standard way, either use:

sed 'y/,/\n/'

Or:

sed 's/,/\
/g'

Though of course there's no need for sed here as that's a job for tr:

tr , '\n'

(and sed -n /foo/p is grep foo).
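
Put together, the tr and grep equivalents give the following sketch (fields_with_foo is a made-up name; it still uses a pipe, which the question hoped to avoid):

```shell
# Hypothetical helper: split on commas with tr, then keep the pieces
# that contain foo with grep.
fields_with_foo() {
    tr , '\n' | grep foo
}

printf 'oof,foo\n' | fields_with_foo   # prints "foo"
```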

But after you've done y/,/\n/ in sed, you still have the contents of the line in the pattern space, only with every , replaced by a newline, and since one can't use [^\n] portably¹ in a sed regexp, you've basically shot yourself in the foot, as it has become much more difficult to process.

If you wanted to extract the ,-delimited fields that contain foo in sed, you'd be better off doing it with the separator still being ,:

sed '
  s/.*/&\
/
  :1
    s/\([^,]*foo[^,]*\)\(.*\n.*\)/\2,\1/
  t1
  s/.*\n,\{0,1\}//
  /./!d'
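
As a rough check, the same script can be wrapped in a function (foo_fields is a hypothetical name) and fed a small made-up sample. Each output line then holds the matching fields of one input line, joined by commas, and lines without a match are dropped:

```shell
# Hypothetical wrapper around the script above, so it can be fed sample input.
# The lone "/" must stay at the start of its line: anything before it would
# become part of the replacement text.
foo_fields() {
    sed '
      s/.*/&\
/
      :1
        s/\([^,]*foo[^,]*\)\(.*\n.*\)/\2,\1/
      t1
      s/.*\n,\{0,1\}//
      /./!d'
}

printf 'a,b,c foo\nd foo,e,foo f\ng,h,i\n' | foo_fields
```

On that sample it should print c foo and then d foo,foo f.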

Though using more modern tools like perl (or even awk even if that's still from the 70s) would make more sense.

perl -F, -lane '
  if (@matches = grep /foo/, @F) {
    print join ",", @matches;
  }'

¹ POSIX even requires that [^\n] not work at matching characters other than newline even if several implementations ignore that requirement (at least by default).

2
  • "one can't use [^\n] portably in a sed regexp" - can you comment on this answer then, since it quotes the POSIX specification of sed(1p) as explicitly allowing to match \n in pattern space? (Updated link to spec.) Commented Sep 1, 2024 at 9:32
  • @Vilinkameni, thanks. I just did. Commented Sep 1, 2024 at 10:27
