For query:
SHOW VARIABLES LIKE 'char%';
MySQL Database returns:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/local/mysql-5.7.27-macos10.14-x86_64/share/charsets/
In my Python script:
import pyodbc

conn = get_database_connection()  # my own helper; returns a pyodbc connection
conn.setdecoding(pyodbc.SQL_CHAR, encoding='latin1')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='latin1')
For one of the columns that has the following value:
N’a pas
Python returns:
N?a pas
Between the N and the a there is a star-shaped question mark. How do I read the value as is? What's the best way to handle it? I have been reading about converting my database to UTF-8, but that seems like a long shot with a good chance of breaking other things. Is there a more efficient way to do it?
In some places in the code, I have done:
value = value.encode('utf-8', 'ignore').decode('utf-8')
to handle UTF-8 data such as accented characters, but the apostrophe was not handled by this and I ended up with ? instead of '
(1) ’ (right single quotation mark, U+2019) is not part of Latin-1. Upgrading to UTF-8 is definitely the best option. It's 2020 now, UTF-8 is everywhere.
(2) There are very rare cases where value.encode('utf8', 'ignore').decode('utf8') has an effect. Typographic quotes are not among them. 99.9% of the time, this expression returns the original value unchanged.
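Both points can be checked with plain Python and no database. The sketch below is only an illustration: the helper name fits_in and the sample strings are made up for the example, and it does not show where in the ODBC driver or connection settings the ? is actually introduced. It assumes Python 3, where 'latin-1' means ISO-8859-1.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, written as an escape so the example
# does not depend on the encoding of the source file.
value = "N\u2019a pas"


def fits_in(text, encoding):
    """Hypothetical helper: True if every character of text is encodable."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The typographic quote has no code point in ISO-8859-1.
in_latin1 = fits_in(value, "latin-1")

# Encoding with errors="replace" substitutes '?' for the missing character,
# which is one way a question mark can end up in the data.
replaced = value.encode("latin-1", "replace")

# The UTF-8 round trip returns an ordinary string unchanged.
round_trip = value.encode("utf-8", "ignore").decode("utf-8")

# One of the rare inputs the round trip does alter: a lone surrogate,
# which UTF-8 cannot encode and errors="ignore" silently drops.
with_surrogate = "a\ud800b"
stripped = with_surrogate.encode("utf-8", "ignore").decode("utf-8")

print(in_latin1, replaced, round_trip == value, repr(stripped))
```

So the round trip cannot repair a character that was already replaced before it reached Python; the encoding has to be right at the database and connection level.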