I thought that bash's pattern substitution in parameter expansion (and globbing in general) worked at character granularity, so I was rather surprised to see it acting at the byte level.
All my locale settings are en_AU.UTF-8.
When the pattern allows zero or more occurrences and there is nothing to match, the empty match is replaced at every byte position, as the spaces inserted between the bytes show. I would have expected the scan to move along to the next character after each empty match, but it only advances by one byte.
Maybe this is just a wacky fringe-case pattern, or I'm missing something obvious, but I do wonder what is going on here. Can I expect this behaviour anywhere other than with this particular pattern?
Here is the script (which started as an attempt to split a string into characters).
I expected the last test, with the character ळ, to end up with a single space preceding the ळ; instead, each of the character's three UTF-8 bytes is preceded by a space, which produces invalid UTF-8 output.
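shopt -s extglob   # enable extended patterns such as *(pattern)
for str in $'\t' "ab" ळ ;do
    # *($'\x01') matches zero or more \x01 bytes; none of the test strings
    # contain one, so every match is empty and each is replaced by a space
    printf -- '%s' "${str//*($'\x01')/ }" |xxd
done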
Output:
0000000: 2009 .
0000000: 2061 2062 a b
0000000: 20e0 20a4 20b3 . . .