There's no good reason why
[[ $a = a|b ]]
should report an error instead of testing whether $a is the a|b string, while [[ $a =~ a|b ]] doesn't return an error.
The only reason is that | is a special character in general, both outside and inside [[ ... ]]. In the position after [[ $a =, bash expects a normal WORD token, like an argument or a redirection target in an ordinary shell command line (except that, since bash 4.1, it's parsed as if the extglob option were enabled).
(By WORD here, I mean a word in a hypothetical shell grammar like the one described by the POSIX specification, that is, something the shell would parse as one token in a simple shell command line, not other definitions of a word such as a sequence of letters (the English sense) or a sequence of non-blank characters. foo"bar baz" and $(echo x y) are two such WORDs.)
In a normal shell command line:
echo a|b
runs echo a with its output piped to b. a|b is not a WORD; it's three tokens: a (a WORD), a | token and b (another WORD).
In [[ $a = a|b ]], bash expects a WORD, which it gets (a), but then finds an unexpected | token, hence the syntax error.
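To see the difference in practice, here is a minimal bash sketch (assuming bash 4.x or later, with a bash binary on $PATH for the child shell): the quoted and backslash-escaped forms compare $a with the literal string a|b, while the unquoted form is rejected at parse time. The unquoted form runs in a child bash, with LC_ALL=C so the message is in English, so that its syntax error doesn't abort the script.

```bash
#!/usr/bin/env bash
a='a|b'

# Quoted or backslash-escaped, the | stays part of the WORD, so both
# of these compare $a with the literal string a|b.
[[ $a = 'a|b' ]] && echo "quoted form: match"
[[ $a = a\|b ]]  && echo "escaped form: match"

# Unquoted, the | ends the WORD and the parser rejects the [[ ... ]].
# Run it in a child bash so the syntax error doesn't stop this script.
err=$(LC_ALL=C bash -c '[[ $1 = a|b ]]' sh "$a" 2>&1)
printf 'unquoted form:\n%s\n' "$err"
```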
Interestingly, bash doesn't complain in:
[[ $a = a||b ]]
because that's now an a token followed by a || token followed by b, so it's parsed the same way as:
[[ $a = a || b ]]
which tests whether $a is a or whether the string b is non-empty (and b is always non-empty).
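A quick bash sketch of that pitfall (assuming bash 4.x or later; a=xyz is just an arbitrary value): the unquoted a||b form succeeds whatever $a contains, while quoting it restores a real comparison with the string a||b.

```bash
#!/usr/bin/env bash
a=xyz

# Parsed as [[ $a = a || b ]]: ($a matches a) OR (b is non-empty).
# The second condition is always true, so this succeeds for any $a.
if [[ $a = a||b ]]; then
  echo "a||b: true for a=$a"
fi

# Quoted, a||b is a single WORD, so this really compares with "a||b".
if [[ $a = 'a||b' ]]; then
  echo "'a||b': true"
else
  echo "'a||b': false for a=$a"
fi
```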
Now, in:
[[ $a =~ a|b ]]
bash can't use the same parsing rule there. If it did, the above would give an error and you would need to quote that | to make a|b a single WORD. But since bash 3.2, if you do:
[[ $a =~ 'a|b' ]]
that no longer matches against the a|b regexp but against the a\|b regexp. That is, shell quoting has the side effect of removing the special meaning of regexp operators. That's a deliberate feature, meant to make the behaviour similar to that of [[ $a = "?" ]] (where the quoted ? only matches a literal question mark), but wildcard patterns (as used in [[ $a = pattern ]]) are shell WORDs to begin with (they're used in globs, for instance), while regexps are not.
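A minimal bash sketch of that quoting effect (assuming bash 3.2 or later with no compat31 setting; compare_quoting is a hypothetical helper name): for a few sample strings it reports whether each matches a|b written unquoted (a regexp) and quoted (a literal string), then shows the wildcard analogue.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "<unquoted> <quoted>" for the string $1,
# each field saying whether $1 matched a|b in that form.
compare_quoting() {
  local unquoted=no quoted=no
  [[ $1 =~ a|b ]]   && unquoted=yes   # regexp: an "a" or a "b" anywhere
  [[ $1 =~ 'a|b' ]] && quoted=yes     # quoted: the literal substring a|b
  printf '%s %s\n' "$unquoted" "$quoted"
}

for s in a b 'a|b' x; do
  printf '%-4s -> %s\n' "$s" "$(compare_quoting "$s")"
done

# The wildcard analogue: a quoted ? only matches a literal "?".
[[ x = ? ]]   && echo '? matches x'
[[ x = "?" ]] || echo '"?" does not match x'
```

Only the string that literally contains a|b matches the quoted form, which is the a\|b behaviour described above.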
So, when parsing the argument of the =~ operator, bash has to handle differently the extended regexp operators that are also special shell characters, such as |, ( and ).
Still, note that while
[[ $a =~ (ab)*c ]]
now works,
[[ $a =~ [)}] ]]
doesn't. You need:
[[ $a =~ [\)}] ]]
[[ $a =~ [')'}] ]]
which in previous versions of bash would incorrectly match on backslash. That one was fixed, but
[[ $a =~ [^]')'] ]]
does not match on backslash, for instance, as it should. That's because bash fails to realise that the ) is within the brackets, so it escapes it, resulting in a [^]\)] regexp, which matches any character other than ], \ and ).
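A small bash check of the forms that do work (a sketch assuming bash 4.x or later). It deliberately doesn't test whether a backslash matches, since, as described above, that depends on the bash version.

```bash
#!/usr/bin/env bash
# Grouping parentheses are accepted unquoted on the right of =~.
[[ ababc =~ ^(ab)*c$ ]] && echo 'grouping: ababc matches ^(ab)*c$'

# A ")" inside a bracket expression has to be quoted, either with a
# backslash or with quotes. Both spellings should match ")" and "}"
# but not "x".
for s in ')' '}' x; do
  bs=no q=no
  [[ $s =~ [\)}] ]]  && bs=yes
  [[ $s =~ [')'}] ]] && q=yes
  printf '%s: backslash-form=%s quote-form=%s\n' "$s" "$bs" "$q"
done
```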
ksh93 has much worse bugs on that front.
In zsh, a normal shell WORD is expected there, and quoting regexp operators doesn't remove their special meaning, so
[[ $a =~ 'a|b' ]]
matches against the a|b regexp.
That means the =~ can also be added to the [/test command:
[ "$a" '=~' 'a|b' ]
test "$a" '=~' 'a|b'
(This also works in yash. In zsh, the =~ needs to be quoted, as a word starting with = is subject to =command expansion there.)
bash 3.1 used to behave like zsh. That changed in 3.2, presumably to align with ksh93 (even though bash was the shell that first came up with [[ =~ ]]), but you can still set BASH_COMPAT=31 or run shopt -s compat31 to revert to the previous behaviour (except that, while [[ $a =~ a|b ]] returned an error in bash 3.1, it no longer does with bash -O compat31 in newer versions of bash).
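A minimal bash sketch of that switch, assuming a bash version that still provides the compat31 shopt option. The option is set inside a subshell so it doesn't leak into the rest of the script.

```bash
#!/usr/bin/env bash
# Default (bash >= 3.2): a quoted regexp operand is matched literally,
# so "a" does not contain the literal string a|b.
[[ a =~ 'a|b' ]] || echo "default: 'a|b' does not match a"

# With 3.1 compatibility, quoting no longer disables the regexp
# operators, so 'a|b' is the regexp "a or b" again.
(
  shopt -s compat31
  [[ a =~ 'a|b' ]] && echo "compat31: 'a|b' matches a"
)
```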
I hope this clarifies why I said the rules were confusing, and why using:
[[ $a =~ $var ]]
helps, including for portability to other shells.
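A minimal bash sketch of that approach (re is an arbitrary variable name). The regexp is stored with ordinary shell quoting and expanded unquoted after =~, so its content reaches the regexp engine unchanged. That also sidesteps the [^]')'] problem, because no shell quoting appears inside the bracket expression. The same [[ $a =~ $re ]] line should also work in zsh and ksh93, though only bash is exercised here.

```bash
#!/usr/bin/env bash
re='a|b'                          # ERE: "a" or "b"
[[ a =~ $re ]]   && echo 'unquoted $re: a matches a|b'
[[ a =~ "$re" ]] || echo 'quoted "$re": matched literally, so a does not match'

# No shell quoting inside the brackets, so [^])] reaches the regexp
# engine as-is: any character other than "]" and ")", backslash included.
re='[^])]'
[[ '\' =~ $re ]] && echo 'a backslash matches [^])]'
[[ ')' =~ $re ]] || echo 'a ")" does not match [^])]'
```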