
I am trying to print some Unicode characters or write them to a text file, and I am running into errors. Googling gave me a few hints, but those attempts raised errors too. My code is below. What might I be doing wrong? Please advise.

Eventually I want to use 'requests' to fetch and parse JSON whose data contains Unicode values, starting with this URL:

https://api.discogs.com/releases/7828220

import requests
import json
url = 'https://api.discogs.com/releases/7828220'
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
art = requests.get(url, headers=headers)
json_object = json.loads(art.text)
try:
    print str(json_object['companies'][0][name])
except:
    print "Genre list isn't defined"

    {u'name': u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"', u'entity_type': u'10', u'catno': u'PARAD-432', u'resource_url': u'https://api.discogs.com/labels/210403', u'id': 210403, u'entity_type_name': u'Manufactured By'}

Here json_object['companies'][0][name] contains a few Unicode characters that won't display on the command line terminal and also won't write to a file with the required (Unicode) output.

The actual value should look like ООО "Парадиз".

How can I get Python to display these values as they actually appear?

  • What is the error? What is the question? Commented Dec 4, 2016 at 8:52
  • By the way, bytes = u'' is already a unicode string. Commented Dec 4, 2016 at 9:42
  • Are you sure that your terminal font supports those missing characters? Commented Dec 4, 2016 at 11:30

2 Answers

Your "bytes" is already unicode, so there should be no error.

>>> bytes = u'\xd0\x9e\xd0\x9e\xd0\x9e "\xd0\x9f\xd0\xb0\xd1\x80\xd0\xb0\xd0\xb4\xd0\xb8\xd0\xb7"'
>>> print unicode(bytes) 
ÐÐÐ "ÐаÑадиз"

However, if you convert a Python 2 byte string (one without the u"" prefix) to unicode without naming an encoding, Python falls back to the default encoding, which is ASCII.

>>> bytes = '\xd0\x9e\xd0\x9e\xd0\x9e "\xd0\x9f\xd0\xb0\xd1\x80\xd0\xb0\xd0\xb4\xd0\xb8\xd0\xb7"'
>>> print unicode(bytes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

The correct encoding to use here is UTF-8. You can tell unicode() which encoding to use.

>>> print unicode(bytes, 'utf8')
ООО "Парадиз"
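The same idea can be sketched with an explicit bytes.decode call, which behaves the same way on Python 2 and Python 3. This is only an illustration: the names raw, text, garbled and ascii_failed are made up here and do not come from the question.

```python
# -*- coding: utf-8 -*-

# UTF-8 encoded bytes for: ООО "Парадиз"
raw = b'\xd0\x9e\xd0\x9e\xd0\x9e "\xd0\x9f\xd0\xb0\xd1\x80\xd0\xb0\xd0\xb4\xd0\xb8\xd0\xb7"'

# Decoding with the right codec recovers the Cyrillic text.
text = raw.decode('utf-8')

# Decoding with the wrong codec does not fail; it silently yields mojibake,
# one character per byte.
garbled = raw.decode('latin-1')

# ASCII cannot represent byte 0xd0, so this decode raises UnicodeDecodeError.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Here text holds the intended string, garbled holds the "Ð..." style output shown earlier, and ascii_failed ends up True.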

2 Comments

Unicode is handled in a much more sensible way in Python 3. If you are new to Python, I highly recommend using that instead of Python 2.
I would recommend that even if you are not new to Python.

won't display on the command line terminal

What errors do you get? In any event, once you remove the unnecessary str() conversion and quote 'name', the following works on a terminal that supports UTF-8, such as a typical Linux terminal:

import requests
import json

url = 'https://api.discogs.com/releases/7828220'
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
art = requests.get(url, headers=headers)
json_object = json.loads(art.text)
print json_object['companies'][0]['name']

Output:

ООО "Парадиз"
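To see why dropping str() matters without calling the API, here is a small offline sketch. The sample string is a hand-trimmed stand-in for one entry of the response, not the real payload, and the flag names are illustrative. In Python 2, str() on a unicode value implicitly encodes it to ASCII, which is roughly what the explicit encode call below reproduces.

```python
import json

# Offline stand-in for a fragment of the API response (an assumption, trimmed by hand).
sample = r'{"companies": [{"name": "\u041e\u041e\u041e \"\u041f\u0430\u0440\u0430\u0434\u0438\u0437\"", "catno": "PARAD-432"}]}'

name = json.loads(sample)['companies'][0]['name']

# json.loads has already turned the \uXXXX escapes into real characters.
is_cyrillic = name.startswith(u'\u041e\u041e\u041e')

# Encoding to ASCII is what fails; the parsed text itself is fine.
try:
    name.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

So the parsed value is already correct text; the failure only appears when something forces it through the ASCII codec.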

On Windows, the command console may not default to an encoding that supports the characters you are trying to print. One easy fix is to switch to an encoding that does: in this case, chcp 1251 changes the code page to one that supports Russian and makes the code above work.
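If changing the code page is not an option, one possible fallback is to replace whatever the console cannot encode before printing. This is a rough sketch, not part of the original answer: console_safe is a hypothetical helper, and it assumes sys.stdout.encoding reflects the active console encoding.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to ASCII in that case.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'replace').decode(encoding)

name = u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"'

# cp1251 covers Cyrillic, so the text survives unchanged.
kept = console_safe(name, 'cp1251')

# ASCII does not, so every Cyrillic letter becomes '?'.
replaced = console_safe(name, 'ascii')
```

Printing console_safe(name) then degrades to question marks instead of raising UnicodeEncodeError on a console that lacks the characters.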

To write it to a file, use io.open with an explicit encoding:

import io
with io.open('output.txt','w',encoding='utf8') as f:
    f.write(json_object['companies'][0]['name'])
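To check that the file really contains the expected text, a round trip like the following can help. It is a minimal sketch that writes to a throwaway temp directory and uses a literal value in place of the API result; the variable names are illustrative.

```python
import io
import os
import tempfile

# Literal stand-in for json_object['companies'][0]['name'].
name = u'\u041e\u041e\u041e "\u041f\u0430\u0440\u0430\u0434\u0438\u0437"'

# Write to a throwaway file in a fresh temp directory.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with io.open(path, 'w', encoding='utf8') as f:
    f.write(name)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf8') as f:
    round_tripped = f.read()

# On disk the file holds UTF-8 bytes, two per Cyrillic letter.
with io.open(path, 'rb') as f:
    raw = f.read()
```

If the file looks wrong in an editor, the usual cause is the editor guessing a different encoding, so open it explicitly as UTF-8.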

1 Comment

Thank you for the detailed explanation. I am using Windows 10 and the command line terminal doesn't support the fonts, but I was wondering why it wouldn't write those Russian characters to a file. With the code you showed, it works! Thanks!
