I am trying to use data from an API. I am using request for the API access, but have also tried axios.

const request = require('request')
request('https://remoteok.io/api', function (error, response, body) {
  // Surface network failures instead of trying to parse an undefined body
  if (error) throw error
  const data = JSON.parse(body)
  console.log(data)
})

When accessing the website remoteok.io/api in a browser, I can see sequences like \u00e2\u0080\u0099. This sequence should be a backtick apostrophe, but when I log to the console in JavaScript or use express to render res.json(body), I get the characters †instead.

How can I fix this encoding issue? Shouldn't JSON always just be plain UTF-8?

UPDATE: Here is a simple glitch project that shows the behavior.

  • What encoding does your console use? Commented Jul 30, 2019 at 4:50
  • Also, what is the actual problem? What do you expect to see? Commented Jul 30, 2019 at 4:52
  • Possible duplicate of JSON and escaping characters Commented Jul 30, 2019 at 4:53
  • I have updated the question to be more precise... When I rerender the received JSON using express, it differs. And so far I couldn't find out what the issue is. If it's a JSON parsing issue in request or axios or both; or whether it's an issue with the JSON rendering of express; or if there's something wrong with the actual content of the API - but a simple rerender should not convert the encoding/characters... Commented Jul 30, 2019 at 5:09
  • Sounds like mojibake at the source. \u00e2\u0080\u0099 does not directly resolve to a backtick in any way. Commented Jul 30, 2019 at 7:22

2 Answers

The problem is in the source data: the JSON sequence "\u00e2\u0080\u0099" does not represent a right single quotation mark. There are three Unicode code points here: the first represents "â", while the other two are control characters.

You can verify this in a dev console, or by running the snippet below:

console.log(JSON.parse('"\u00e2\u0080\u0099"'));
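To make the three code points visible rather than relying on how a console renders control characters, here is a minimal sketch that lists them. The variable names are illustrative, and the backslashes are doubled so that JSON.parse, not the JavaScript string literal, decodes the escapes:

```javascript
// Parse the JSON text with the same escapes the API sends.
const parsedText = JSON.parse('"\\u00e2\\u0080\\u0099"');

// Split the string into code points and format each one as U+XXXX.
const codePointList = [...parsedText].map(
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

console.log(parsedText.length); // 3
console.log(codePointList);     // [ 'U+00E2', 'U+0080', 'U+0099' ]
```

U+0080 and U+0099 are C1 control characters, which is why nothing resembling a quotation mark shows up.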

Apparently the author of that JSON mixed up two things:

  • JSON is encoded in UTF
  • A \u notation represents a Unicode Code Point

The first means that the file or stream that encodes the JSON text into bytes should be UTF encoded (preferably UTF-8). The second has nothing to do with that. JSON syntax lets you specify 16-bit Unicode code points using the \u syntax. It is not intended to produce a UTF-8 byte sequence with a sequence of \u encodings.¹ One should not be concerned about the lower-level UTF-8 byte stream encoding when defining JSON text.

¹ I should at least mention surrogate pairs here, but they are unrelated to UTF-8; they concern how Unicode code points beyond the 16-bit range can be encoded in JSON.
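As a small illustration of the footnote, here is a sketch of a surrogate pair in JSON. The code point U+1F600 is chosen only as an example and does not come from the API in question:

```javascript
// U+1F600 lies outside the 16-bit range, so JSON writes it as two \u escapes:
// a high surrogate followed by a low surrogate.
const surrogateText = JSON.parse('"\\ud83d\\ude00"');

const codeUnitCount = surrogateText.length;        // UTF-16 code units
const codePointCount = [...surrogateText].length;  // Unicode code points
const codePointHex = surrogateText.codePointAt(0).toString(16);

console.log(codeUnitCount);  // 2
console.log(codePointCount); // 1
console.log(codePointHex);   // 1f600
```

The two escapes combine into one code point, which is a different mechanism from writing one escape per UTF-8 byte.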

So although the right single quotation mark has a UTF-8 byte sequence of E2 80 99, this is not to be encoded with a \u notation for each of those three bytes.
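The following Node.js sketch shows where those three bytes come from, and how writing one \u escape per byte reproduces the broken sequence. That this is how the API's output was actually produced is an assumption; the variable names are illustrative:

```javascript
// The correct character, U+2019, encoded to UTF-8 bytes.
const quoteChar = '\u2019';
const utf8Bytes = Buffer.from(quoteChar, 'utf8');
const utf8Hex = utf8Bytes.toString('hex');

// Treating each byte as if it were its own code point gives the bad JSON text.
const misencoded = [...utf8Bytes]
  .map((byte) => '\\u' + byte.toString(16).padStart(4, '0'))
  .join('');

console.log(utf8Hex);    // e28099
console.log(misencoded); // \u00e2\u0080\u0099
```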

The right single quotation mark is Unicode code point U+2019, written \u2019 in JSON. So either the source JSON should contain that escape, or it should contain the character ’ literally (which will indeed be a UTF-8 sequence in the byte stream, but that is a level below JSON).

See those two possibilities:

console.log(JSON.parse('"’"'));
console.log(JSON.parse('"\u2019"'));

And now?

I would advise you to contact the service provider of this particular API. They have a bug in their JSON producing service.

Whatever you do, do not try to fix this in the client that consumes this service by recognising such malformed sequences and replacing them as if those characters represented UTF-8 bytes. Such a fix will be hard to maintain, and may even hit false positives.

I don't think this is an error. You can use the JSON Viewer browser extension to view the JSON in your browser.
