
I have an application where users can log in via SAML2. I'm using the Apache Mellon module and reading the user data like this:

name = request.environ['MELLON_name']
email = request.environ['MELLON_mail']

From that data I create a JWT using the flask_jwt_simple library. When I then call get_jwt_identity(), the name in the response has the wrong encoding: it shows JiÅí Manes instead of Jiří Manes (Czech). How can I solve this problem?

Edit #1: locale command output

LANG=en_US.utf8
LANGUAGE=
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8

Edit #2: I solved it on my VPS with the following Python code:

name = bytearray(request.environ['MELLON_name'], 'iso-8859-1').decode('utf-8')

But I would like to have a more universal solution :-/

  • 1
    The strings in environ are passed through environment variables by, presumably, Apache/Mellon. It's storing UTF-8, but apparently Python/Flask doesn't know that, so it assumes environment variables are in your default locale, which appears to be Latin-1. So, you need to read them as raw bytes (so you can explicitly decode('utf-8') them), or you need to configure Flask to override the default encoding, or you need to configure your system to en_US.UTF-8 or something else appropriate. I'm not sure how you do the first two, but I'm sure it's in the Flask docs. Commented Mar 30, 2018 at 22:56
  • 1
    You might want to add the flask tag to attract the resident Flask experts (and make it clear exactly how your server is getting launched/dispatched, or whatever else seems relevant). Commented Mar 30, 2018 at 23:28
  • 1
    If you run sys.getdefaultencoding(), what is returned? Commented Apr 5, 2018 at 0:16
  • 1
    @user3216673 If print(request.environ['MELLON_name']) prints out raw bytes, it must be a bytestring. Is it prefixed with a b, like b'Ji\xc3\x85\xc2\x99\xc3\x83\xc2\xad Manes'? In that case something is not right, because you can't pass a bytestring together with an encoding to bytearray(): bytearray(request.environ['MELLON_name'], 'iso-8859-1') should throw an exception. Commented Apr 6, 2018 at 9:38
  • 1
    print(type(request.environ['MELLON_name'])) and print(repr(request.environ['MELLON_name'])) would be helpful. Commented Apr 6, 2018 at 9:44

1 Answer


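You have hit the WSGI encoding dance. On Python 3, WSGI (PEP 3333) hands environ values to the application as native strings whose bytes were decoded as ISO-8859-1, so UTF-8 data has to be encoded back to ISO-8859-1 and then decoded as UTF-8. Unfortunately, there isn't really a better solution than the one you've already found.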

Your workaround does the same thing that werkzeug (the WSGI package used by Flask) does internally to solve this issue.
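For illustration, the round trip can be wrapped in a small helper. This is only a sketch: `wsgi_decode` is a hypothetical name, not a werkzeug or Flask function, and it assumes the server really did decode the original UTF-8 bytes as ISO-8859-1.

```python
def wsgi_decode(value, charset="utf-8", errors="replace"):
    """Undo the ISO-8859-1 decoding applied to a WSGI environ value.

    Hypothetical helper, not part of werkzeug or Flask.
    """
    # Encoding back to Latin-1 recovers the original bytes exactly,
    # because Latin-1 maps code points 0-255 one-to-one onto bytes.
    raw = value.encode("latin-1")
    # Decode those bytes with the charset the identity provider really used.
    return raw.decode(charset, errors)


# Simulate what ends up in environ for the UTF-8 encoded name "Jiří Manes".
mangled = "Jiří Manes".encode("utf-8").decode("latin-1")
fixed = wsgi_decode(mangled)
```

In the view this would be used as name = wsgi_decode(request.environ['MELLON_name']), which is equivalent to the bytearray(...) workaround from the question.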

If you wanted, you could use the compatibility functions in that package, but they live in a private module and may change without notice, so you're probably best off sticking with your own equivalent code.
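If the goal is to apply your own equivalent code in one place rather than at every lookup, one option is a small WSGI middleware. The following is a sketch under assumptions: `MellonUtf8Middleware` and its `prefix` parameter are made-up names, and it assumes every `MELLON_` value is UTF-8 that was decoded as ISO-8859-1. Values that fail the round trip are left untouched.

```python
class MellonUtf8Middleware:
    """Re-decode MELLON_* environ values as UTF-8 (hypothetical helper)."""

    def __init__(self, app, prefix="MELLON_"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        # Copy the items so the dict can be modified while looping.
        for key, value in list(environ.items()):
            if key.startswith(self.prefix) and isinstance(value, str):
                try:
                    environ[key] = value.encode("latin-1").decode("utf-8")
                except (UnicodeEncodeError, UnicodeDecodeError):
                    # Not Latin-1-decoded UTF-8: keep the value as it is.
                    pass
        return self.app(environ, start_response)


# Minimal demonstration with a dummy WSGI app that records its environ.
seen = {}

def dummy_app(environ, start_response):
    seen.update(environ)
    start_response("200 OK", [])
    return [b""]

wrapped = MellonUtf8Middleware(dummy_app)
wrapped(
    {
        "MELLON_name": "Jiří Manes".encode("utf-8").decode("latin-1"),
        "PATH_INFO": "/",
    },
    lambda status, headers: None,
)
```

With Flask this would typically be installed as app.wsgi_app = MellonUtf8Middleware(app.wsgi_app). Note that the rewritten values no longer follow the PEP 3333 convention, so any other code that applies the same round trip to them would fail or corrupt the text.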
