Gilles 'SO- stop being evil'

Unfortunately, for historical reasons, different tools have slightly different regular expression syntax, and sometimes some implementations have extensions that are not supported by other tools. While there is a common ground, it seems like every tool writer made some different choices.

The consequence is that if you have a regular expression that works in one tool, you may need to modify it to work in another tool. The main differences between common tools are:

  • whether the operators +?|(){} require a backslash;
  • what extensions are supported beyond the basics .[]*^$ and usually +?|().
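To make the first difference concrete, here is a minimal sketch that feeds the same pattern to grep in BRE mode and in ERE mode (grep -E). The sample input and the variable names are made up for illustration.

```shell
# Made-up sample input: three lines.
sample='a+b
aab
ab'

# BRE: a bare + is an ordinary character, so only the literal line "a+b" matches.
bre_plus=$(printf '%s\n' "$sample" | grep -c 'a+b')

# ERE: + is an operator (one or more "a"), so "aab" and "ab" match instead.
ere_plus=$(printf '%s\n' "$sample" | grep -E -c 'a+b')

echo "BRE: $bre_plus, ERE: $ere_plus"   # prints "BRE: 1, ERE: 2"
```

The pattern text is identical in both calls; only the syntax flavor changes which lines it selects.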

In this answer, I list the main standards. Check the documentation of the tools you're using for the details.

Wikipedia's comparison of regular expression engines has a table listing the features supported by common implementations.

Basic regular expressions (BRE)

Basic regular expressions are codified by the POSIX standard. It is the syntax used by grep, sed and vi. This syntax provides the following features:

  • ^ and $ match only at the beginning and end of a line.
  • . matches any character (or any character except a newline).
  • […] matches any one character listed inside the brackets (character set). If the first character after the opening bracket is a ^, the characters which are not listed are matched instead.
  • Backslash before any of ^$.*\[ quotes the next character.
  • * matches the preceding character or subexpression 0, 1 or more times.
  • \(…\) is a syntactic group, for use with * or \DIGIT replacements.

The following features are also standard, but missing from some restricted implementations:

  • \{m,n\} matches the preceding character or subexpression between m and n times; n can be omitted (\{m,\} means at least m times), and \{m\} means exactly m times.
  • Inside brackets, character classes can be used, for example [[:alpha:]] matches any letter.
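As a rough illustration of these two features with grep in its default BRE mode, note that the interval braces take backslashes in BRE. The sample lines and variable names below are hypothetical.

```shell
# Hypothetical sample lines.
words='ab
aab
aaab
a1b'

# \{2,3\}: two or three "a" before "b"; the anchors force a whole-line match.
# Only "aab" and "aaab" qualify.
interval=$(printf '%s\n' "$words" | grep -c '^a\{2,3\}b$')

# [[:alpha:]] matches any letter, so "a1b" is the only line rejected.
letters=$(printf '%s\n' "$words" | grep -c '^[[:alpha:]]*$')

echo "interval: $interval, letters: $letters"   # prints "interval: 2, letters: 3"
```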

The following are common extensions (especially in GNU tools), but they are not found in all implementations. Check the manual of the tool you're using.

  • \| for alternation: foo\|bar matches foo or bar.
  • \? and \+ match the preceding character or subexpression at most 1 time, or at least 1 time respectively.
  • \n matches a newline, \t matches a tab, etc.
  • Backreferences \1, \2, … match the exact text matched by the corresponding group, e.g. \(fo*\)\(ba*\)\1 matches foobaafoo but not foobaafo.
  • \w matches any word constituent and \W matches any character that isn't a word constituent.
  • \< and \> match the empty string only at the beginning or end of a word respectively; \b matches either, and \B matches where \b doesn't.
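The backreference example from the list can be checked with grep. This is only a sketch: the variable names are illustrative, and the other operators in the list (\|, \+, \w and so on) are deliberately left out because they depend on the grep implementation.

```shell
# Pattern taken from the list above: two groups, then a repeat of the first group.
pattern='\(fo*\)\(ba*\)\1'

# "foobaafoo": group 1 is "foo", group 2 is "baa", and \1 repeats "foo".
if printf '%s\n' 'foobaafoo' | grep -q "$pattern"; then first=match; else first=nomatch; fi

# "foobaafo": the text after "baa" is only "fo", which is not a copy of group 1.
if printf '%s\n' 'foobaafo' | grep -q "$pattern"; then second=match; else second=nomatch; fi

echo "$first $second"   # prints "match nomatch"
```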

Note that tools without the \| operator do not have the full power of regular expressions.

Extended regular expressions (ERE)

Extended regular expressions are codified by the POSIX standard. Their major advantage over BRE is regularity: all standard operators are bare punctuation characters, and a backslash before a punctuation character always quotes it. It is the syntax used by awk, grep -E or egrep, GNU sed -r, and bash's =~ operator. This syntax provides the following features:

  • ^ and $ match only at the beginning and end of a line.
  • . matches any character (or any character except a newline).
  • […] matches any one character listed inside the brackets (character set). If the first character after the opening bracket is a ^, the characters which are not listed are matched instead. Character classes can be used but are missing from a few implementations.
  • (…) is a syntactic group, for use with * or \DIGIT replacements.
  • | for alternation: foo|bar matches foo or bar.
  • *, + and ? match the preceding character or subexpression a number of times: 0 or more for *, 1 or more for +, 0 or 1 for ?.
  • Backslash quotes the next character if it is not alphanumeric.
  • {m,n} matches the preceding character or subexpression between m and n times; n can be omitted ({m,} means at least m times), and {m} means exactly m times (missing from some implementations).
  • \n matches a newline, \t matches a tab, etc.
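A short sketch of the ERE operators using grep -E, assuming a made-up word list; the variable names are illustrative only.

```shell
# Made-up sample lines.
lines='color
colour
colouur
cat
dog'

# ? makes the "u" optional: "color" and "colour" match.
optional=$(printf '%s\n' "$lines" | grep -E -c '^colou?r$')

# | is alternation; the group makes both anchors apply to each branch.
alternation=$(printf '%s\n' "$lines" | grep -E -c '^(cat|dog)$')

# {2,3} needs no backslashes in ERE: only "colouur" has two or three "u" in a row.
repeated=$(printf '%s\n' "$lines" | grep -E -c 'u{2,3}')

echo "$optional $alternation $repeated"   # prints "2 2 1"
```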

PCRE (Perl-compatible regular expressions)

PCRE are extensions of ERE, originally introduced by Perl and adopted by many modern tools and programming languages, usually via the PCRE library. See the Perl documentation for a well-formatted description with examples. Not all features of the latest version of Perl are supported by PCRE (e.g. Perl code execution); see the PCRE manual for a summary of supported features.

Emacs

Emacs's syntax is intermediate between BRE and ERE. In addition to Emacs, it is the default syntax for -regex in GNU find. Emacs offers the following operators:

Shell globs

Shell globs (wildcards) perform pattern matching with a syntax that is completely different from regular expressions and less powerful. POSIX patterns include the following features:

  • ? matches any single character.
  • […] is a character set as in common regular expression syntaxes. Some shells do not support character classes. Some shells require ! instead of ^ to negate the set.
  • * matches any sequence of characters, including the empty sequence.
  • Backslash quotes the next character.
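These patterns can be tried without touching the file system by using a case statement, which applies the same matching rules to a string. A minimal sketch follows; glob_match is a hypothetical helper name, not a standard command.

```shell
# glob_match STRING PATTERN
# Succeeds if the whole STRING matches the glob PATTERN.
# (Hypothetical helper; the unquoted $2 is deliberately treated as a pattern.)
glob_match() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

glob_match 'notes.txt' '*.txt' && echo '*.txt matches notes.txt'
glob_match 'ab.c' '?.c' || echo '?.c does not match ab.c'
glob_match 'x1' '[a-z][0-9]' && echo '[a-z][0-9] matches x1'
```

Unlike a regular expression search, a glob must match the whole string, which is why ?.c rejects ab.c.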

Ksh offers additional features which give its pattern matching the full power of regular expressions. These features are also available in bash after running shopt -s extglob. Zsh has a different syntax but can also support ksh's syntax after running setopt ksh_glob.
