How can I grep/awk/sed a file looking for some pattern, and print the entire line (including continuation lines if the matched line ends with \)?

File foo.txt contains:

something
whatever
thisXXX line \
    has a continuation line
blahblah
a \
multipleXXX \
continuation \
line

What should I run to get the following output (not necessarily joined into one line, and not necessarily with repeated spaces removed)?

thisXXX line has a continuation line
a multipleXXX continuation line

BTW, I'm using bash on Fedora 21, so it doesn't need to be POSIX-compliant (but I'd appreciate a POSIX solution).

  • Do you want the search to span continuation lines? That is, if you're searching for hello, should hel\␤lo match? Commented Jun 4, 2015 at 22:08
  • @gilles, yes, same as with sh Commented Jun 5, 2015 at 13:43

7 Answers

Another approach: use perl to replace each backslash-newline pair with a space, then grep the joined lines:

$ perl -pe 's/\\\n/ /' file | grep XXX
thisXXX line      has a continuation line
a  multipleXXX  continuation  line

To remove extra spaces, pass it through sed:

$ perl -pe 's/\\\n/ /' file | grep XXX | sed 's/  */ /g'
thisXXX line has a continuation line
a multipleXXX continuation line
  • According to the shell rules for continuation lines, 's/\\\n/ /' should be changed to 's/\\\n//', i.e. backslash + newline should be replaced by nothing, not by a space. Commented Jan 13, 2022 at 13:53
  • @vinc17 if there is no trailing space in the first line, you will change hello\nworld to helloworld instead of hello world. That's why I want to replace with a space. Commented Jan 13, 2022 at 15:38
  • @terdon But this is not the correct rule. For instance, if you type echo "foo\, then Enter, then bar" in a shell, you get foobar, not foo bar. If you want a space between foo and bar, then put one either before the backslash or at the beginning of the second line (just before bar). Commented Jan 15, 2022 at 0:09
  • @vnc but why are you thinking about shell commands? The question doesn't mention shell commands, the file's extension doesn't suggest a shell script and the OP's example is words, not code. You would be right for code, of course, but this doesn't seem to be about code. Commented Jan 15, 2022 at 10:43
  • Nice, this did the trick. I wrapped the perl in a shell script so that I could invoke it easily from find(1): find . -type f -exec mysearch.sh {} +. To squeeze spaces (and tabs), I use sed 's/[[:space:]][[:space:]]*/ /g'. Commented Sep 24, 2024 at 14:16

With POSIX sed:

$ sed -e '
:1
/\\$/{N
  s/\n//
  t1
}
/\\/!d
s/\\[[:blank:]]*//g
' file
  • @don_crissti Pipe this into grep XXX Commented Jun 4, 2015 at 22:10
  • @don_crissti: I don't see matching XXX in requirement. Commented Jun 5, 2015 at 1:07
  • @Gilles - no, it doesn't work like that. Change the OP's input by replacing something with XXX (without a trailing backslash) on the first line, then try piping this sed command into grep. You won't get the XXX line in the final output. choroba's solution fails in a similar manner, while jimmij's prints the second line too (it shouldn't). Commented Jun 5, 2015 at 17:04
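The objection in the last comment is that /\\/!d drops every line that was never continued, so a plain XXX line never reaches grep. One possible variant (a sketch assuming only POSIX sed, grep and tr; join_continued is a hypothetical name) keeps the same :1 / N / t1 loop but only deletes each backslash-newline pair and leaves ordinary lines alone:

```shell
# join_continued: hypothetical helper name (POSIX sed).
join_continued() {
    # While the pattern space ends in "\", append the next line (N) and
    # delete the backslash-newline pair between them; t1 loops back to
    # check whether the joined line ends in "\" again. Other lines pass through.
    sed -e ':1' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 't1' -e '}' "$@"
}
```

Usage: join_continued foo.txt | grep XXX | tr -s ' '. Like the original script, it relies on the space before each backslash to keep words apart; tr -s ' ' squeezes the runs of spaces left by the indentation.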

With pcregrep, without changing the structure of the lines:

pcregrep -M '^(.|\\\n)*XXX(.|\n)*?[^\\]$' file
  • As long as a backslash + newline cannot appear in the regexp XXX, this is a nice solution, as it can be used recursively with -r (with the other solutions, one would need to use find in such a case). The only possible issue is with [^\\]$ if the matched text is at the end of the file. I think that this should be corrected to '^(\\\n|.)*XXX(\\\n|.)*' (with \\\n before .). Commented Jan 13, 2022 at 13:34
  • I meant: As long as a backslash + newline cannot appear in the text that should be matched. Indeed, as said in a comment to the question, when searching for hello, hel\␤lo should match. In such a case, this would need to introduce (\\\n)* at each "point" of the regexp! Commented Jan 13, 2022 at 13:50

Perl to the rescue:

perl -ne 'if (/\\$/) { $l .= $_ }
          else { print $l, $_ if "$l$_" =~ /XXX/;
                 $l = "";
          }' foo.txt

$l works as an accumulator. -n processes the input line by line (as sed does). If a line ends in a backslash, it is appended to the accumulator; otherwise the accumulator plus the current line is printed, provided it matches XXX, and the accumulator is emptied.

My twist:

perl -0777 -ne '                           # read the entire file into $_
    s{ [[:blank:]]* \\ \n [[:blank:]]* }   # join continued lines
     { }gx;
    print grep {/XXX/} split /(?<=\n)/     # print the matching lines
' foo.txt 
thisXXX line has a continuation line
a multipleXXX continuation line

I'd say Perl is the simplest here. It isn't POSIX, though it's in the default installation of most non-embedded unices. If you want POSIX, use awk.

awk '{if (/\\$/) printf "%s", $0; else print}'

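This joins each continued line onto the next one (the trailing backslash itself is kept). If you want to find patterns that span a continuation, pipe this into grep. If you want to match only uninterrupted patterns, i.e. patterns that occur within a single physical line, let awk accumulate the continued lines and do the matching itself: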

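awk '{
    if (/XXX/) found = 1;          # XXX occurs within this physical line
    if (sub(/\\$/, "")) {          # line is continued: drop the "\" and accumulate
        line = line $0;
    } else {
        if (found) print line $0;  # print the whole logical line
        line = "";
        found = 0;
    }
}'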
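The first approach (collapse, then grep) can also be done in a single awk pass, which lets a match span a continuation (hel\␤lo matches hello). A sketch assuming a POSIX awk; awk_grep_joined is a hypothetical name, and since the pattern is passed with -v, it is used as a dynamic ERE and any backslashes in it are interpreted by -v first:

```shell
# awk_grep_joined PATTERN FILE...: hypothetical helper (POSIX awk).
awk_grep_joined() {
    pat=$1; shift
    awk -v pat="$pat" '
        sub(/\\$/, "") { line = line $0; next }   # continued: drop the "\" and accumulate
        {
            line = line $0                        # final physical piece
            if (line ~ pat) print line            # match the whole logical line
            line = ""
        }
        END { if (line != "" && line ~ pat) print line }   # input ended on a backslash
    ' "$@"
}
```

Usage: awk_grep_joined XXX foo.txt | tr -s ' ' (tr only squeezes the spaces left by the indentation).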

This is a small improvement to Gilles' awk solution (thanks, Gilles!), but it does require nawk:

nawk '{if (/\\$/) {$0=substr($0,1,length($0)-2); printf "%s",$0} else print}'

This joins a wrapped line with its continuation, without the trailing "\" and the space before it. (I found this helpful when grepping for PATH statements, since the "\" can lead to confusion when interpreting the results.)
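The -2 assumes exactly one space before each backslash and none at the start of the next line; with the question's second example, whose continuation lines are not indented, it yields amultipleXXXcontinuationline. A sketch of a variant that strips whatever blanks surround the break and leaves a single space in their place (POSIX awk; join_lines is a hypothetical name):

```shell
# join_lines FILE...: hypothetical helper (POSIX awk).
join_lines() {
    awk '
        {
            if (cont) sub(/^[ \t]+/, "")    # drop the indentation of a continuation line
            cont = sub(/[ \t]*\\$/, " ")    # 1 if the line ends in "\": blanks + "\" become one space
            printf "%s", $0
            if (!cont) printf "\n"          # end of a logical line
        }
        END { if (cont) printf "\n" }       # input ended on a backslash
    ' "$@"
}
```

Usage: join_lines foo.txt | grep XXX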
