System: Python 3.4.2 on Linux.
I'm working on a Django application (the framework is probably irrelevant), and I ran into a problem where it throws
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
when print is called (!). After quite a bit of digging, I discovered I should check
>>> sys.getdefaultencoding()
'utf-8'
but it was utf-8, as expected. I also noticed that os.path.exists throws the same exception when given a Unicode string, so I checked
>>> sys.getfilesystemencoding()
'ascii'
When I used LANG=en_US.UTF-8, the issue disappeared. I now understand why os.path.exists had problems. But I have absolutely no clue why the print function is affected by the filesystem setting. Is there a third setting I'm missing? Or does Python just assume the LANG environment variable is to be trusted for everything?
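For reference, here is a minimal sketch (standard library only; I'm assuming CPython 3.4 on Linux, where, as far as I can tell, sys.stdout.encoding is derived from the locale via locale.getpreferredencoding()) that dumps every encoding setting I could find:

import locale
import sys

print(sys.getdefaultencoding())       # str/bytes default: always 'utf-8' in Python 3
print(sys.getfilesystemencoding())    # used for filenames: 'ascii' here under LANG=C
print(sys.stdout.encoding)            # what print() encodes to
print(locale.getpreferredencoding())  # derived from LANG/LC_ALL/LC_CTYPE

Only the first of these is fixed; the other three all follow the locale environment.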
Also... I don't get the reasoning here. LANG does not say which encoding the filenames on disk use; it has nothing to do with that. It is set separately for the current environment, not for the filesystem. Why does Python use this setting for filesystem filenames? It makes applications very fragile, since all file operations simply break when run in an environment where LANG is unset or set to C (not uncommon, especially when a web app is run as root or as a new user created specifically for the daemon).
Test code (no literal Unicode input is needed, which avoids terminal-encoding pitfalls):
x = b'\xc4\x8c\xc5\xbd'  # UTF-8 bytes for 'ČŽ'
y = x.decode('utf-8')    # decoding always succeeds; the failure is on output
print(y)                 # UnicodeEncodeError when stdout's encoding is ASCII
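The same failure can be reproduced without touching LANG at all, which is what convinced me that print simply encodes to sys.stdout.encoding. A sketch, with io.TextIOWrapper over an in-memory buffer standing in for stdout under LANG=C (nothing here beyond the standard library):

import io

x = b'\xc4\x8c\xc5\xbd'  # UTF-8 bytes for 'ČŽ'
y = x.decode('utf-8')    # decoding succeeds regardless of locale

ascii_out = io.TextIOWrapper(io.BytesIO(), encoding='ascii')  # stand-in for stdout under LANG=C
ascii_out.write(y)  # UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1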
Questions:

- Is there a good and accepted way of making the application robust to the LANG setting? (I sketch the best workaround I've found below.)
- Is there any real-world reason to guess the filesystem's capabilities from the environment instead of asking the filesystem driver?
- Why is print affected?
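For the first question, the best workaround I have found so far (hence the question; I don't know whether it's the accepted practice) is to re-wrap sys.stdout at startup so it no longer depends on LANG:

import io
import sys

# Force UTF-8 output regardless of the locale; errors='replace' keeps a
# misconfigured environment from crashing the app outright. Note this only
# fixes the standard output stream, not filename handling (os.path.exists
# and friends still follow the locale).
if sys.stdout.encoding.lower() not in ('utf-8', 'utf8'):
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                  encoding='utf-8',
                                  errors='replace',
                                  line_buffering=True)

Setting PYTHONIOENCODING=utf-8 in the daemon's environment has the same effect on the standard streams, but as far as I can tell neither approach changes sys.getfilesystemencoding().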
Comments:

- What is sys.stdout.encoding set to?
- LANG. LANG dictates everything in a POSIX locale; see "Debian thinks my file system is encoded as ISO-8859-1".