With grep I can filter lines, but when the lines are long the output gets messy. How can I get only some characters around my search string?

f.txt

this is a red cat in the room
this is a blue house at the street
this is a white mouse in the corner
this is a blue mouse in the bowl

What I do

cat f.txt | grep blue

What I get

this is a blue house at the street
this is a blue mouse in the bowl

But what I want is, e.g., only 10 chars starting at my search word (wherever it is in the line).

blue house
blue mouse

How can I get that?

4 Answers

9

Using GNU grep:

$ grep -oP '\w*blue\w*(\s*\w+)?' input.txt 
blue house
blue mouse

This shows the complete word containing the match and the following word (if any).

Note that with \w, underscores are considered to be "word" characters, so, for example, "light_blue" would be considered one word, while "light-blue" would not. If you want the regex to treat any non-space character as part of the word, use \S instead of \w, for example:

$ grep -oP '\S*blue\S*(\s*\S+)?' input.txt 
blue house
blue mouse
light_blue mouse
light-blue mouse
blue-grey mouse
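Where grep was built without PCRE support and -P is unavailable, POSIX bracket expressions with -E can approximate the same two patterns. This is only a minimal sketch, assuming a grep that supports the non-standard -o option; the function names word_ctx and token_ctx are hypothetical, chosen just for this illustration:

```shell
# Hypothetical helpers; both read stdin.
# Assumes grep supports -o (GNU, BSD and busybox greps do).

# Approximates \w: letters, digits and underscore count as word characters.
word_ctx() {
    grep -oE '[[:alnum:]_]*blue[[:alnum:]_]*([[:space:]]*[[:alnum:]_]+)?'
}

# Approximates \S: any non-whitespace character counts as part of the word.
token_ctx() {
    grep -oE '[^[:space:]]*blue[^[:space:]]*([[:space:]]*[^[:space:]]+)?'
}

printf '%s\n' 'a light-blue mouse here' | word_ctx    # blue mouse
printf '%s\n' 'a light-blue mouse here' | token_ctx   # light-blue mouse
```

The hyphen is not in [[:alnum:]_], so word_ctx starts its match at blue, while token_ctx keeps the whole light-blue token.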
  • For anything more complicated than this, you should use perl, or maybe awk (though awk would be more work). Commented Oct 8 at 10:41
  • Using gawk: gawk '{ match($0,"blue"); delete a; if(RSTART!=0) { split(substr($0,RSTART+RLENGTH+1),a); print substr($0,RSTART,RLENGTH),a[1]; }}' f.txt Commented Oct 8 at 19:40
7
-E for extended regex

grep -oE 'blue.{0,6}' f.txt

Basic regex (braces escaped)

grep -o 'blue.\{0,6\}' f.txt

grep & cut

grep -o 'blue.*' f.txt | cut -c1-10

Perl-RegEx

grep -Po 'blue.{0,6}' f.txt
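The 6 in these patterns is the requested 10 characters minus the 4 characters of blue. A hedged sketch that derives the repetition count from the word length instead of hard-coding it; it assumes the search word contains no regex metacharacters and is no longer than the requested total, and the function name around is hypothetical:

```shell
# Hypothetical helper: around WORD TOTAL [FILE...]
# Prints WORD plus up to TOTAL - length(WORD) following characters.
# Assumes WORD has no regex metacharacters and TOTAL >= length(WORD).
around() {
    word=$1
    total=$2
    shift 2
    # ${#word} is the length of WORD, so the brace count is what remains of TOTAL.
    grep -oE -e "${word}.{0,$((total - ${#word}))}" "$@"
}

printf '%s\n' 'this is a blue house at the street' | around blue 10   # blue house
```

With no file arguments the function reads standard input, so it can sit at the end of a pipeline as well as take f.txt directly.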

  • Beware that the GNU implementation of cut still counts bytes, not characters, with -c (just like with -b), so it would not be equivalent to the other approaches and could end up cutting in the middle of a multibyte character. Commented Oct 8 at 18:40
  • Beware that on an input like blue12345blue, it would output blue12345b and not show the second blue. Commented Oct 8 at 18:41
2
$ cat file
This is the blue tips of blue teeth of a blue mouse in a blue house
$ grep -Eo 'blue.{0,10}' file
blue tips of b
blue mouse in
blue house

Notice that blue teeth... is missing above: the b at the start of that blue was consumed by the .{0,10} of the previous match.

Alternatively, you could do:

$ pcre2grep -o1 -o2 '(blue)(?=(.{0,10}))' file
blue tips of b
blue teeth of
blue mouse in
blue house

Or:

$ pcre2grep -o -o1 'blue(?=(.{0,10}))' file
blue tips of b
blue teeth of
blue mouse in
blue house

Either one shows all four matches; the b of the second blue is printed twice, once at the end of the first line and once at the start of the second.

pcre2grep is the example command that comes with PCRE2, the library that GNU grep uses for its -P option (if enabled at build time, which is not the case by default but is often done by distributions, as perl regexps have become a de-facto standard these days).

-o is a non-standard extension originally introduced by the GNU implementation of grep. pcre2grep (and pcregrep before that) extended it so it can take an optional number argument to print what's matched by the corresponding capture group instead of by the whole regexp with -o alone (and a --om-separator to put between each capture group if given more than one -o<n>).

The trick here is the (?=...) look-ahead operator: what is matched by the pattern inside it is not part of the overall match. We only look ahead to check whether .{0,10} matches (which it always does, as it matches even the empty string), yet we still capture what .{0,10} matches (using (...)), so we can report it without consuming any input. So after finding the first blue in blue tips... and outputting that blue and the next 10 characters, pcre2grep resumes looking for more blues just after the first blue, not after blue tips of b.

Or you could use the real thing (the p in pcre2grep or grep -P):

$ perl -C -lne 'print $1.$2 while /(blue)(?=(.{0,10}))/g' file
blue tips of b
blue teeth of
blue mouse in
blue house

Or:

$ perl -C -lne 'print $&.$1 while /blue(?=(.{0,10}))/g' file
blue tips of b
blue teeth of
blue mouse in
blue house

Where $& is for the full match (the equivalent of -o), and $1/$2 for each capture group (equivalent of -o1/-o2).

1

Just use awk instead of grep, e.g. using any awk in any shell on every Unix box:

$ awk 'match($0,/blue/){print substr($0,RSTART,10)}' f.txt
blue house
blue mouse

The above assumes you just want to print the 10 chars starting from the first blue on each input line. If instead you wanted to print the 10 chars starting from every blue on each input line then, again using any awk, you could do:

$ echo 'there is a blueblue mouse blue foo in blue house' |
awk '{
    while ( match($0,"blue") ) {
        print substr($0,RSTART,10)
        $0 = substr($0,RSTART+RLENGTH)
    }
}'
blueblue m
blue mouse
blue foo i
blue house
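If the search string may contain regex metacharacters, the same loop can be written with index(), which does a plain substring search. A minimal sketch under stated assumptions: the function name around_all and the variable names s and n are hypothetical, and the search string is assumed to be non-empty and free of backslashes (awk -v interprets escape sequences in the assigned value):

```shell
# Hypothetical helper: around_all STRING COUNT  (reads stdin)
# Assumes STRING is non-empty and contains no backslashes, since awk -v
# interprets escape sequences in the assigned value.
around_all() {
    awk -v s="$1" -v n="$2" '
    BEGIN { if (s == "") exit 1 }   # an empty search string would never advance
    {
        line = $0
        # index() is a plain substring search, so s is not treated as a regex.
        while ((pos = index(line, s)) > 0) {
            print substr(line, pos, n)
            line = substr(line, pos + length(s))
        }
    }'
}

printf '%s\n' 'a+b=c and a+b=d' | around_all 'a+b' 5
# a+b=c
# a+b=d
```

Working on a copy of the line rather than assigning to $0 avoids re-splitting the fields on every iteration; the output is otherwise the same as with the match() loop.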

If your requirements are for something other than that then edit your question to include more truly representative sample input/output that covers all of your requirements.
