Zsh variable names have to be made of alphanumeric characters and underscores only, and the first one can't be an ASCII digit, as names starting with a digit are reserved for positional parameters. When the posixidentifiers option is enabled (as in sh or ksh emulation), that's restricted to ASCII ones.
So you need a locale where iswalpha() returns true for バ and iswalnum() returns true for ナ and ス.
Those are functions from the C library zsh is linked against (generally the system's one) that use character classification information from the system's locale (as determined by the LC_ALL, LC_CTYPE and LANG environment variables).
If [[ 'バ' = [[:alpha:]] ]] or print -r -- 'バ' | grep -xq '[[:alpha:]]' don't succeed in your locale, then you can't use that as a variable name.
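To run that check from a script, something like the sketch below can be used. is_alpha_here is a name made up for this example, and it assumes the system's grep uses the same classification data as zsh, which is generally but not necessarily the case:

```sh
# Hypothetical helper: succeeds if $1 is a single character that the
# current locale classifies as alpha, according to the system's grep.
is_alpha_here() {
  printf '%s\n' "$1" | grep -xq '[[:alpha:]]'
}

# What gets reported for バ depends on the locale this runs in.
for c in a 1 'バ'; do
  if is_alpha_here "$c"; then
    printf '%s: alpha in this locale\n' "$c"
  else
    printf '%s: not alpha in this locale\n' "$c"
  fi
done
```

Run it with different values of LC_ALL to compare locales.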
On a GNU system:
$ print -r -- 'バナナス' | LC_ALL=C.UTF-8 grep -o '[[:alpha:]]'
バ
ナ
ナ
ス
They're all classified as alpha (a subset of alnum), even in the C.UTF-8 locale.
Note that alpha isn't limited to letters of alphabetic scripts. Per ISO/IEC TR 14652 at least, it covers the characters used to spell out the words of natural languages, such as letters and syllabic or ideographic characters.
So バ (U+30D0 KATAKANA LETTER BA) should be classified as alpha. It is definitely classified as a letter by Unicode.
Note that on most systems, environment variable names, as opposed to the variable names of many languages including shells, can contain any sequence of bytes other than the NUL byte and the encoding of =, but beware that some tools, including some shells, remove the ones they don't like.
For instance, mksh removes all those it can't map to shell variables, whose names in that shell are limited to ASCII alnums and underscores. It even removes bash's exported functions, which since Shellshock have names like BASH_FUNC_funcname%%.
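A quick way to see whether a given tool passes such a variable along is to put it between two env invocations. In this sketch, survives and the foo.bar variable name are made up for illustration, and what the sh case reports depends on which shell sh is on your system:

```sh
# Hypothetical helper: run "$@" with foo.bar=baz in its environment and
# check whether the variable still shows up in that command's output.
survives() {
  env 'foo.bar=baz' "$@" | grep -q '^foo\.bar=baz$'
}

# No shell in between: the outer env executes the inner env directly.
if survives env; then echo 'env: kept'; else echo 'env: removed'; fi

# Same with a shell in between.
if survives sh -c 'exec env'; then echo 'sh: kept'; else echo 'sh: removed'; fi
```

Replace sh with mksh, bash, zsh and so on to compare shells.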
So, in general, it's a bad idea to export to the environment shell variables whose names contain characters other than ASCII letters, digits and underscores.
Also, while characters in the ASCII set (Unicode characters U+0000 to U+007F) have an encoding that is invariant across locales on most systems¹, it's not the case for the other ones (what I think you meant by unicode characters), so you may find that if your script contains:
バナナス=BANANAS
It may be treated as a variable assignment in one locale but invoke a バナナス=BANANAS command in another.
So I would also advise not to use variable names with non-ASCII characters, even if you don't export them to the environment.
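If you need to enforce that, for instance on names that come from user input, a check along these lines should work in any POSIX-like shell, zsh included. is_portable_name is a made-up name, and the characters are listed explicitly rather than as ranges such as a-z, because what a range matches can vary with the locale:

```sh
# Hypothetical helper: succeeds only if $1 is non-empty, does not start
# with a digit, and contains nothing but ASCII letters, digits and
# underscores.
is_portable_name() {
  case $1 in
    '' | [0123456789]* | \
    *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*)
      return 1 ;;
    *)
      return 0 ;;
  esac
}

for name in BANANAS _tmp1 1abc 'foo.bar' 'バナナス'; do
  if is_portable_name "$name"; then
    printf '%s: portable\n' "$name"
  else
    printf '%s: not portable\n' "$name"
  fi
done
```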
For reference, in the rc shell, variables can have any name (and you can even assign to the one with the empty name in the original Plan 9 implementation), and they're all exported to the environment.
In the original one, they're exported as-is (which causes problems at least on Unix for the ones whose name contains =) while with the public domain clone by Byron Rakitzis and derivatives, they're encoded there using ASCII-only alnums and underscores:
; '++' = zzz
; 'バナナス' = zzz
; env | grep zzz
__2b__2b=zzz
__e3__83__90__e3__83__8a__e3__83__8a__e3__82__b9=zzz
Of course, only other instances of rc or derivatives executed in that environment decode those back into the original variable names.
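To make sense of such names outside of rc, a decoder can be sketched in POSIX sh. This one only assumes what the output above suggests, namely that each __ followed by two lowercase hex digits stands for one byte; the actual rules rc applies (for literal underscores, for instance) may well differ, and rc_decode_name is a made-up name:

```sh
# Hypothetical decoder for the __xx encoding seen above. The body runs
# in a subshell, so its variables don't leak into the caller.
rc_decode_name() (
  name=$1
  while [ -n "$name" ]; do
    case $name in
      __[0123456789abcdef][0123456789abcdef]*)
        hex=${name#__}          # drop the leading __
        hex=${hex%"${hex#??}"}  # keep the two hex digits only
        # printf %b understands \0ooo octal escapes, so go via octal.
        printf '%b' "$(printf '\\0%03o' "0x$hex")"
        name=${name#__??} ;;
      *)
        # Anything else is copied as-is, one character at a time.
        printf '%s' "${name%"${name#?}"}"
        name=${name#?} ;;
    esac
  done
  echo
)

rc_decode_name __2b__2b
rc_decode_name __e3__83__90__e3__83__8a__e3__83__8a__e3__82__b9
```

The second call outputs the UTF-8 encoding of the original name, so it only displays properly in a terminal using UTF-8.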
Functions are another matter. Some shells have the same restrictions on their names as on variable names (when functions were added to the Bourne shell, they shared the same namespace: you couldn't have both a variable and a function by the same name); some, like bash, allow a few extra characters, but those are rather confusing and unnecessary restrictions.
Functions share the namespace of command arguments, so it would seem normal that they can have the same values as those. In zsh, a function name can be any sequence of bytes, including the NUL byte (which zsh allows in command arguments, though that won't work for external commands because of a limitation of the execve() system call) and including the empty sequence, regardless of whether those bytes form valid characters in any locale.
$ ''() echo empty
$ $'\0'() echo NUL
$ $'\xde\xad\xbe\xef'() echo Dead Beef
$ $'\xDE\xAD\xBE\xEF'
Dead Beef
$ $'\u0000'
NUL
$ ""
empty
en_GB.UTF-8 on 26.0.1 (25A362) (zsh: 5.9 (arm-apple-darwin22.1.0)) and バナナス=BANANAS, echo $バナナス works fine
zsh 5.9 (x86_64-apple-darwin19.6.0), but still seeing same behavior.
バ 바 文 are correctly classified with iswalpha. I'm guessing that can be answered by looking at source files near libc/include/_ctype.h.