
I have characters \u002d, \u2019, \u2022, \u25ba, \u2013, etc. coming in my data, and I have to do json.loads(data).

I tried doing

data1 = data.encode('utf-8')
json.loads(data1)

I still get an error.

I also tried the below, but it also ended in an error:

b1 = data.encode('ascii', 'ignore')
b2 = json.loads(b1)

It works if I replace the characters in my data, e.g. '\u002d' with '-', but I do not know what other characters might creep in, so I am looking for a solution that handles these characters in general.

1 Answer


There is no need to encode the data.

Feed it directly to json.loads(); the JSON standard uses \u.... escape codes to denote unicode values too.
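
For example, a quick interpreter check (the sample JSON string below is made up for illustration; the doubled backslashes make \u2019 etc. literal six-character JSON escapes rather than Python escapes):

>>> import json
>>> json.loads('"\\u2019 \\u2022 \\u002d"')
u'\u2019 \u2022 -'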

The values are not encoded in UTF-8; the Python json module will handle them for you.

Even if the data were encoded in UTF-8, the json module would handle that for you as well. And even if it didn't, you'd use str.decode(), not str.encode().
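
As a small illustration of that (assuming Python 2.7, as in the question's traceback), json.loads() also accepts a UTF-8 encoded byte string directly:

>>> import json
>>> json.loads(u'{"quote": "\u2019"}'.encode('utf8'))
{u'quote': u'\u2019'}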

UTF-8 data looks different as well; the U+2019 codepoint looks like:

>>> u'\u2019'.encode('utf8')
'\xe2\x80\x99'

when encoded to UTF-8.


2 Comments

Yes, it is working. But now I am not able to write it to a file. It says:

Traceback (most recent call last):
  File "C:\Python27\AureusBAXProject.py", line 202, in <module>
    outfile.writerows(outlist)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 0: ordinal not in range(128)
@user1946217: Then use io.open() to open your output file too. Your unicode data needs to be encoded in that case; what encoding you use depends on what you need to do with the output CSV.
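
For reference, a minimal sketch of that suggestion (the filename and the sample text are placeholders, and Python 2 is assumed as in the traceback):

import io

# io.open() takes an explicit encoding, so unicode values such as
# u'\u2022' are encoded on write instead of hitting the ASCII default.
with io.open('output.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(u'\u2022 example line\n')

If the rows go through csv.writer on Python 2, the values would instead need to be encoded to UTF-8 byte strings before calling writerows(), since that module does not accept unicode input directly.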
