This question asks how to get a list of environment variable names in POSIX sh. The top answer suggests invoking awk via the shell, but gives this caveat:
The output is ambiguous if the name of an environment variable contains a newline, which is extremely unusual but technically possible.
My initial reaction to this sentence was that it is incorrect, in the sense that there is no way a POSIX shell executing awk would export such a name. Environment variable names in general can contain anything except =, as POSIX defines the environment as an array of name=value strings. However, utilities (and thus sh) are specifically limited in the names they use. For example:
$ cat >main.c <<EOF
#include <unistd.h>
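int main(int argc, char *argv[]) {
    /* The second entry's name contains an embedded newline. */
    char *env[] = {"A=1", "B\nC=2", (char *)0};
    execve(argv[1], argv + 1, env);
    return 1; /* reached only if execve fails */
}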
EOF
$ cc main.c
$ ./a.out /usr/bin/env
A=1
B
C=2
$ ./a.out /bin/sh -c env
A=1
The relevant portion of the standard seems to be this:
Environment variable names used by the utilities in the Shell and Utilities volume of POSIX.1-2024 consist solely of uppercase letters, digits, and the <underscore> ('_') from the characters defined in Portable Character Set and do not begin with a digit. Other characters, and byte sequences that do not form valid characters, may be permitted by an implementation; applications shall tolerate the presence of such names.
The last sentence of that passage seems to suggest I'm wrong. Still, I don't think names that fall outside these criteria can become exported shell variables, since the shell standard says:
Shell variables shall be initialized only from environment variables that have valid names.
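To make "valid name" concrete, here is a minimal sketch of the check as I understand it. It assumes "valid" means the shell's definition of a name (letters, digits, and underscores from the portable character set, not starting with a digit), which is broader than the uppercase-only convention quoted earlier. The function name is_shell_name is mine, not something from the standard.

```shell
# is_shell_name STRING
# Succeeds if STRING looks like a valid shell variable name.
# Assumption: "valid" means letters, digits and underscores from the
# portable character set, with no leading digit.
# The character sets are spelled out so the result does not depend on how
# the current locale treats ranges such as [A-Z].
is_shell_name() {
    case $1 in
        '' | [0123456789]*)
            return 1 ;;   # empty, or starts with a digit
        *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*)
            return 1 ;;   # contains some other character, e.g. a newline
    esac
    return 0
}
```

Under this check, the name from my C program (B, a newline, then C) is rejected, so per the sentence above it should never become a shell variable.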
But could an implementation of POSIX sh keep invalid names in the environment it passes to the utilities it executes, and simply never make them shell variables?
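For anyone who wants to probe a particular shell without compiling anything, here is a rough sketch of the same experiment in shell. It relies on an assumption POSIX does not guarantee: that the env utility accepts a name=value operand whose name contains a newline (the implementations I know of pass it straight through). The helper name odd_name_survives is made up for this example.

```shell
# odd_name_survives COMMAND [ARG...]
# Runs COMMAND in a near-empty environment that contains A=1 and a variable
# whose name is "B<newline>C" with value 2. COMMAND is expected to print its
# environment in env format. Succeeds if a line reading exactly "C=2" shows
# up, i.e. the odd entry reached whatever produced the listing.
# Assumption: env accepts a name containing a newline.
# The body is a subshell so odd_name does not leak into the caller.
odd_name_survives() (
    odd_name=$(printf 'B\nC')
    env -i PATH="$PATH" A=1 "${odd_name}=2" "$@" | grep -qx 'C=2'
)

# Probe: does sh hand the odd entry on to a child env?
if odd_name_survives sh -c env; then
    echo "sh passed the odd name through to its child"
else
    echo "sh dropped the odd name (or env refused to set it)"
fi
```

Running odd_name_survives env first is a useful baseline: if that fails, env itself rejected the name and the probe says nothing about the shell. The result of the probe depends on which shell sh is, which is really what I am asking about.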