type MyStruct struct {
    Value json.RawMessage `json:"value"`
}

var resp *http.Response

if resp, err = http.DefaultClient.Do(req); err == nil {
    if resp.StatusCode == 200 {
        var buffer []byte
        if buffer, err = ioutil.ReadAll(resp.Body); err == nil {

            mystruct = &MyStruct{}
            err = json.Unmarshal(buffer, mystruct)

        }
    }
}

fmt.Println(string(mystruct.Value))

It produces something like:

   \u003Chead>\n  \u003C/head>\n  \u003Cbody>

Doc at: http://golang.org/pkg/encoding/json/#Unmarshal

says: When unmarshaling quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error. Instead, they are replaced by the Unicode replacement character U+FFFD.

I kinda think this is what is going on. I just can't see the answer, as my experience with Go is minimal and I'm tired.

  • Please provide sample code that demonstrates the output you see and your attempt to correct the unexpected behavior. Commented Mar 27, 2015 at 15:29
  • Can you show exactly what is in your []byte slice? The literal value \u003C should be <, since Go always assumes UTF-8. Commented Mar 27, 2015 at 15:30
  • I edited the post a little. Hopefully that provides more context. Thanks. Commented Mar 27, 2015 at 15:46
  • Why are you using json.RawMessage there? If a JSON encoder encoded the message, presumably you want a JSON decoder to decode it. Commented Mar 27, 2015 at 15:48
  • A json.RawMessage hasn't been unmarshaled. You either need to unmarshal it into a string or unescape the Unicode yourself. Commented Mar 27, 2015 at 16:17

2 Answers

There is a way to convert escaped Unicode characters in a json.RawMessage into plain UTF-8 characters without unmarshalling it. (I had to deal with this issue because my primary language is Korean.)

You can use strconv.Quote() and strconv.Unquote() to do the conversion.

package main

import (
    "encoding/json"
    "fmt"
    "strconv"
    "strings"
)

// unescapeUnicodeCharactersInJSON converts \uXXXX escapes in raw JSON into
// UTF-8 characters without unmarshalling it. strconv.Quote doubles every
// backslash, the replace turns `\\u` back into `\u`, and strconv.Unquote
// then interprets those escapes as a Go string literal would.
func unescapeUnicodeCharactersInJSON(jsonRaw json.RawMessage) (json.RawMessage, error) {
    str, err := strconv.Unquote(strings.Replace(strconv.Quote(string(jsonRaw)), `\\u`, `\u`, -1))
    if err != nil {
        return nil, err
    }
    return []byte(str), nil
}

func main() {
    // Both values are valid JSON. Note that '\u263a' == '☺'.
    // jsonRawEscaped holds escaped Unicode characters.
    jsonRawEscaped := json.RawMessage(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`)
    // jsonRawUnescaped holds the same JSON with the characters unescaped.
    jsonRawUnescaped, err := unescapeUnicodeCharactersInJSON(jsonRawEscaped)
    if err != nil {
        fmt.Println("unescape failed:", err)
        return
    }

    fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
    fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
}

https://play.golang.org/p/pUsrzrrcDG-

Hope this helps :D

1 Comment

This helped me a bunch! I was struggling with DecodeRuneInString when retrieving data from a db inserted by php. But this did the trick, thanks!

You chose json.RawMessage, which prevents the value under the key "value" in your JSON message from being parsed.

The string literal "\u003chtml\u003e" is a valid JSON representation of "<html>".

Since you told json.Unmarshal not to parse this part, it does not parse it and returns it to you as-is.

If you want it parsed into a UTF-8 string, change the definition of MyStruct to:

type MyStruct struct {
    Value string `json:"value"`
}
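
A minimal, self-contained sketch of that change, assuming a response body shaped like the output shown in the question (the sample body and the helper name decodeValue are made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type MyStruct struct {
	Value string `json:"value"`
}

// decodeValue (hypothetical helper) unmarshals the body and returns the
// decoded string stored under the "value" key.
func decodeValue(body []byte) (string, error) {
	var m MyStruct
	if err := json.Unmarshal(body, &m); err != nil {
		return "", err
	}
	return m.Value, nil
}

func main() {
	// In a Go raw string literal the backslashes reach the JSON decoder
	// unchanged, so \u003C and \n are JSON escapes here.
	body := []byte(`{"value": "\u003Chead>\n  \u003C/head>\n  \u003Cbody>"}`)

	value, err := decodeValue(body)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	// Prints <head>, </head> and <body> on three separate lines.
	fmt.Println(value)
}
```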
