To test the basic functioning of the asterisk metacharacter in search operations using grep, I used a file (regex.txt) with the following contents:

$ cat regex.txt
1
11
111
1111
11111





$

There are 6 newlines after the 11111 in my example, intentionally added.

Question 1.

Why does the output of grep "11*" regex.txt not include the empty lines after 11111, unlike the output of grep "1*" regex.txt?

Question 2.

The output of grep "111*" regex.txt is

11
111
1111
11111

The output of grep "1111*" regex.txt is

111
1111
11111

The output of grep "11111*" regex.txt is

1111
11111

Why does one more line disappear from the top of the output each time I add a '1' to the search pattern?

  • Please read the help on formatting tools to learn how to format your posts correctly. Commented Nov 14, 2015 at 12:42
  • Also, please take the time to read a basic regular expression primer as I asked before. You need to understand the basics before trying to use them. Commented Nov 14, 2015 at 12:47

2 Answers

Your two questions are basically the same. In regular expressions, * means "match the previous character 0 or more times". So 1* matches zero or more 1s, while 11* matches one 1 followed by zero or more 1s. This means that

  • grep "11*" will only print lines containing at least one 1. The * only applies to the second 1, so the first is obligatory. That's why you don't see the empty lines: they contain no 1 at all.

  • grep "1*" will match zero or more 1s. Here there is no preceding character that must be matched, so the blank lines are also matched because they contain zero 1s.

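  • grep "111*", grep "1111*" etc. drop the shorter lines for the same reason as the first point above. 111* will only print lines containing at least two consecutive 1s, possibly more. 1111* will only print lines containing at least three consecutive 1s, possibly more. Each 1 you add before the * raises the minimum by one, so one more line disappears from the top of the output.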
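The three points above can be checked by counting matching lines instead of printing them. This is only a minimal sketch: it recreates the sample file in a temporary location (the use of mktemp and the variable names are my own assumptions, not part of the question) and relies on grep -c, which prints the number of matching lines.

```shell
# Recreate the sample file: five lines of 1s followed by five empty lines.
# mktemp is assumed to be available; any scratch file would do.
tmp=$(mktemp)
printf '1\n11\n111\n1111\n11111\n\n\n\n\n\n' > "$tmp"

# grep -c prints the number of matching lines rather than the lines themselves.
count_star=$(grep -c '1*' "$tmp")      # zero or more 1s: every line matches
count_one=$(grep -c '11*' "$tmp")      # at least one 1
count_two=$(grep -c '111*' "$tmp")     # at least two consecutive 1s
count_three=$(grep -c '1111*' "$tmp")  # at least three consecutive 1s

echo "$count_star $count_one $count_two $count_three"   # prints: 10 5 4 3
rm -f "$tmp"
```

The count falls by one for each mandatory 1 added in front of the starred 1, which is the pattern seen in the question.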

The main point here is that the * only affects the character immediately before it, not the entire pattern. To give a quantifier to an entire pattern you can use parentheses:

$ grep '\(111\)*' regex.txt 
1
11
111
1111
11111





$    

Here, the parentheses group the characters together so the * is applied to the whole group. In grep's default basic regular expression syntax they need to be escaped with \ to act as grouping operators; unescaped, they would be treated as literal characters in the search pattern. As you see above, the pattern matches lines containing 0 or more occurrences of 111, so it prints all lines.
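Because an unanchored "zero or more" pattern matches every line, the effect of the grouping is easier to see when the whole line has to match. The sketch below uses the -x option for that, along with -E, where parentheses group without a backslash. The temporary file and the variable names are assumptions made for illustration.

```shell
# Same sample data as in the question, written to a scratch file.
tmp=$(mktemp)
printf '1\n11\n111\n1111\n11111\n\n\n\n\n\n' > "$tmp"

# -x requires the pattern to match the entire line.
# Basic syntax: escaped parentheses form a group.
whole_bre=$(grep -c -x '\(111\)*' "$tmp")
# Extended syntax (-E): plain parentheses form a group.
whole_ere=$(grep -c -x -E '(111)*' "$tmp")
# Without a group, the * applies only to the last 1, so 11, 111, 1111 and 11111 match.
whole_plain=$(grep -c -x '111*' "$tmp")

echo "$whole_bre $whole_ere $whole_plain"   # prints: 6 6 4
rm -f "$tmp"
```

With the group, only the line 111 and the five empty lines (zero repetitions of 111) match in full. Without it, every line made of two or more 1s matches.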

  • repeated sub-expressions select only the last successive occurrence of the match. probably not terribly important with grep, but could be interesting with GNU's -o, I guess. Commented Nov 14, 2015 at 12:58
'1*' selects zero or more characters that are 1. '1' selects exactly one character that is 1. A blank line doesn't have any characters that are 1, so it cannot match '1', but it does have zero characters that are 1, so it matches '1*'.
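A short way to see this distinction, again as a sketch on a scratch copy of the sample data (the file and variable names are assumptions): a pattern that can match zero characters behaves like the empty pattern, which matches every line.

```shell
tmp=$(mktemp)
printf '1\n11\n111\n1111\n11111\n\n\n\n\n\n' > "$tmp"

needs_one=$(grep -c '1' "$tmp")    # requires one 1: the five non-empty lines
zero_ok=$(grep -c '1*' "$tmp")     # may match nothing: all ten lines
empty_pat=$(grep -c '' "$tmp")     # the empty pattern also matches all ten lines

echo "$needs_one $zero_ok $empty_pat"   # prints: 5 10 10
rm -f "$tmp"
```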
