Node.js convert string from ISO-8859-2 to UTF-8

Question

When I download page content with the Node.js request library and the content is encoded in ISO-8859-2, I cannot convert it to UTF-8.

I am using node-iconv for the conversion.

Code:

const request = require('request');
const { Iconv } = require('iconv');

request('https://www.jakpsatweb.cz', function (err, resp, body) {
    const title = regexToRetrieveTitle(body); // my helper: returns the text inside <title>
    const iconv = new Iconv('ISO-8859-2', 'UTF-8');
    const buffer = iconv.convert(title);
    console.log(buffer);
    console.log(buffer.toString('utf8'));
});

Console:

<Buffer 52 65 6b 6c 61 6d 61 3a 20 6a 61 6b 20 66 75 6e 67 75 6a 65 20 77 65 62 6f 76 c4 8f c5 bc cb 9d 20 72 65 6b 6c 61 6d 61>
Reklama: jak funguje webovďż˝ reklama

Expected result:

Reklama: jak funguje webová reklama

Does anyone know where the problem is?

EDIT:

For example, I download this page. I identified the encoding as ISO-8859-2 from the meta tags (the Chrome browser detects it as well), and I need to convert the page content and save it to a database. My database is UTF-8, so I need to convert the content to UTF-8.

Please provide the expected input and output strings (not just a buffer).
– duncanhall, Commented Oct 19, 2016 at 12:31
They are there. As you can see, there are two console.log() calls: the first prints the buffer and the second prints the string. The expected result is the string only, without the buffer.
– MakoBuk, Commented Oct 19, 2016 at 12:50
The title is the parsed content of <title>content</title>. Question updated.
– MakoBuk, Commented Oct 19, 2016 at 13:35

MakoBuk · Accepted Answer · 2016-10-24 10:42:21Z

2

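The problem is in the Node.js request library. Its encoding option defaults to UTF-8, so the response body is decoded into a string as UTF-8 before iconv ever sees it. I had to set encoding to null so that body is returned as a raw Buffer, and now everything works fine.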

request({ uri: 'https://www.jakpsatweb.cz', encoding: null }, function (err, resp, body) {
    // ... (body is now a Buffer of raw bytes)
});
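
As a rough illustration of why the raw Buffer matters, here is a minimal sketch that uses only Node.js built-ins and hard-coded bytes in place of a real download. The variable names are made up for the example, and it uses TextDecoder rather than node-iconv so that it runs without extra packages. It assumes a Node.js build with full ICU, which TextDecoder needs for 'iso-8859-2'.

```javascript
// Bytes of "webová reklama" as the server sends them in ISO-8859-2
// (á is the single byte 0xE1). Hard-coded stand-in for the response body.
const raw = Buffer.from([
  0x77, 0x65, 0x62, 0x6f, 0x76, 0xe1, 0x20,
  0x72, 0x65, 0x6b, 0x6c, 0x61, 0x6d, 0x61,
]);

// What happens when the bytes are decoded as UTF-8 first:
// the lone 0xE1 is not valid UTF-8 and is replaced by U+FFFD for good.
const decodedAsUtf8 = raw.toString('utf8');

// With the raw Buffer preserved, the real encoding can be applied instead.
// Assumption: this Node.js build ships full ICU (needed for 'iso-8859-2').
const decodedAsLatin2 = new TextDecoder('iso-8859-2').decode(raw);

console.log(JSON.stringify(decodedAsUtf8)); // contains the replacement character
console.log(decodedAsLatin2);
```

In the request callback with encoding: null, body plays the role of raw here. Passing that Buffer to iconv.convert should give the same result, since the original bytes are still intact at that point.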

edited Oct 24, 2016 at 10:42

answered Oct 19, 2016 at 16:01


In my case, I just replaced request with axios.
– Marcelo, Commented over a year ago

Bruno Haible · Answer · 2016-10-19 15:59:34Z

The conversion from ISO-8859-2 to UTF-8 worked fine. It was the input (the title variable) that had the wrong contents: the title contained the bytes EF BF BD, which iconv read as the three ISO-8859-2 characters ď, ż and ˝ and re-encoded as the C4 8F C5 BC CB 9D visible in your buffer. This means that the title was already UTF-8 encoded, but with a U+FFFD (REPLACEMENT CHARACTER) in the place where you would expect the letter á (LATIN SMALL LETTER A WITH ACUTE).

Now, the original web page https://www.jakpsatweb.cz/reklama/index.html is correctly encoded in ISO-8859-2 and also has the required charset declaration in the <head> section.

Therefore, the problem must be in the software that downloads the web page (Node.js) or in the regexToRetrieveTitle function.
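
The byte-level reasoning above can be checked with a small sketch that uses only Node.js built-ins. The three-entry lookup table is a hand-written excerpt of ISO-8859-2 for illustration only, not a real decoder, and the variable names are invented for the example.

```javascript
// "á" as the single ISO-8859-2 byte served by the page.
const served = Buffer.from([0xe1]);

// Step 1: decoding that byte as UTF-8 fails, so Node substitutes U+FFFD.
const misdecoded = served.toString('utf8');

// Step 2: U+FFFD is EF BF BD in UTF-8; these are the bytes the converter received.
const converterInput = Buffer.from(misdecoded, 'utf8');

// Step 3: reading EF BF BD as ISO-8859-2 yields three separate characters.
// Hand-written excerpt of the ISO-8859-2 table (illustration only).
const latin2Excerpt = { 0xef: '\u010f', 0xbf: '\u017c', 0xbd: '\u02dd' }; // ď ż ˝
const mojibake = Array.from(converterInput, (byte) => latin2Excerpt[byte]).join('');

// Step 4: re-encoding those characters as UTF-8 gives the bytes from the log.
const logged = Buffer.from(mojibake, 'utf8');

console.log(converterInput.toString('hex')); // efbfbd
console.log(logged.toString('hex'));         // c48fc5bccb9d
```

The second line of output matches the c4 8f c5 bc cb 9d run in the question's buffer dump, which is consistent with the damage having happened before the conversion step.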
