ilkkachu

That part from the GNU grep manual says:

\< Match the empty string at the beginning of word.

\> Match the empty string at the end of word.

They match at the start and end of a "word", so \<bar matches in the string foo bar, or in just bar, but not in foobar. They are described as matching the empty string because they consume no characters: when matching \<bar against foo bar, the match is just bar, not e.g. <space>bar. The \< doesn't add any characters to the matched string (which is relevant for e.g. grep -o).
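A quick way to see this, as a minimal sketch assuming GNU grep (both \< and -o are extensions, so other implementations may behave differently). The variable names are only for illustration:

```shell
# Lines where "bar" starts a word: "foo bar" and "bar", but not "foobar".
matched=$(printf '%s\n' 'foo bar' 'bar' 'foobar' | grep '\<bar')
printf '%s\n' "$matched"

# With -o only the matched text is printed. The brackets show that
# the space before "bar" is not part of the match.
only=$(printf '%s\n' 'foo bar' | grep -o '\<bar')
printf '[%s]\n' "$only"
```

The first command prints foo bar and bar, the second prints [bar].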

They're not standard, i.e. POSIX doesn't specify them for regular expressions.

\w Match word constituent, it is a synonym for [_[:alnum:]].

This is what the manual says next. Note the small print: word characters include alphanumerics (whatever that means in the current locale) and the underscore. So someword\> doesn't match in someword_something, because the underscore after someword is itself a word character and the word doesn't end there. That match is effectively what your second grep tries to look for.
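To illustrate the effect of the underscore, here is a small sketch, again assuming GNU grep. The sample lines are made up for the example and are not from the question:

```shell
# Hypothetical sample input, one candidate per line.
lines=$(printf '%s\n' 'someword' 'someword_something' 'someword-something' 'someword2')

# someword\> needs a non-word character (or end of line) right after "someword".
# "_" and "2" are word characters, "-" is not.
end_matches=$(printf '%s\n' "$lines" | grep 'someword\>')
printf '%s\n' "$end_matches"

# \w covers letters, digits and underscore, so the hyphen splits the "words".
words=$(printf '%s\n' 'someword_something-else' | grep -o '\w\w*')
printf '%s\n' "$words"
```

The first grep prints someword and someword-something. The second prints someword_something and else on separate lines.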

And yes, that's because in many programming languages alphanumerics and underscores are allowed in identifier names. The hyphen isn't; it's usually the minus operator.

Though of course in JavaScript (and in C as a common compiler extension), $ is also valid in identifier names, and identifier names can't start with digits, but you can't have everything.
