Convert HTML Character Entities back to regular text using JavaScript

Question

E.g. we have &gt;, we need > using only JavaScript.

It seems jQuery is the easy way out, but it would be nice to have a lightweight solution: a function that can do this by itself.

If you need this, there is a certain probability that you're approaching the problem the wrong way.
– AndreKR, Commented Dec 2, 2010 at 19:32
This is needed in one case where you have data that needs to be HTML friendly for display but can be saved to a text file and downloaded by a user. In that case, it is really necessary because users typically won't even know that it is a character entity, never mind which one it is.
– David Rhoderick, Commented Mar 4, 2016 at 14:43
Googlers: Skip all of these answers. The best solution: stackoverflow.com/a/7394787/114558
– rinogo, Commented Oct 5, 2016 at 21:33

Gumbo · Accepted Answer · 2010-12-02 19:40:42Z

33

You could do something like this:

String.prototype.decodeHTML = function() {
    var map = {"gt":">" /* , … */};
    return this.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);?/gi, function($0, $1) {
        if ($1[0] === "#") {
            return String.fromCharCode($1[1].toLowerCase() === "x" ? parseInt($1.substr(2), 16)  : parseInt($1.substr(1), 10));
        } else {
            return map.hasOwnProperty($1) ? map[$1] : $0;
        }
    });
};
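
For readers who would rather not extend String.prototype, the same regex-based idea can be sketched as a standalone function. This is only an illustration, not the answer's original code: the name decodeHtmlEntities and the short NAMED_ENTITIES table are assumptions made up here, the table is deliberately incomplete, and unlike the regex above this version requires the trailing semicolon. It uses String.fromCodePoint so that numeric references above U+FFFF decode to a single character.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
};

function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: "#x..." is hexadecimal, "#..." is decimal.
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave references beyond the Unicode range untouched.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Named reference: decode only names present in the table.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Usage: decodeHtmlEntities("1 &lt; 2 &amp;&amp; 3 &gt; 2") gives "1 < 2 && 3 > 2"
```

Unknown names such as &unknown; are returned unchanged, so a missing table entry degrades gracefully instead of corrupting the text.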

answered Dec 2, 2010 at 19:40

Gumbo

7 Comments

nuaavee Over a year ago

Neat solution. I have one question though - why are you checking for hexadecimal char code on line 5?

Gumbo Over a year ago

@nuaavee: Because character references can be either in decimal or hexadecimal notation: &#32; = &#x20;.

nuaavee Over a year ago

Is this browser dependent? I mean do hex notations only apply to certain browsers?

Gumbo Over a year ago

@nuaavee: No, that’s basic SGML/HTML.

Ismail Over a year ago

Can anyone share the extended map var?

kennebec · Accepted Answer · 2014-07-24 18:49:06Z

22

function decodeEntities(s){
    var str, temp= document.createElement('p');
    temp.innerHTML= s;
    str= temp.textContent || temp.innerText;
    temp=null;
    return str;
}

alert(decodeEntities('&lt;'))

/*  returned value: (String)
<
*/

edited Jul 24, 2014 at 18:49

answered Dec 2, 2010 at 19:46

kennebec

1 Comment

nickf Over a year ago

This isn't safe to use on untrusted (user-entered) text. See this comment stackoverflow.com/questions/1147359/…
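
One commonly suggested way to reduce that risk is to assign the string to a detached textarea instead of a p element: a textarea's content is parsed as text, so markup in the input is not turned into live elements, while character references are still decoded. The following is a minimal sketch, assuming a browser environment; the function name and the optional doc parameter are hypothetical, added here only so the document object can be swapped out. It should still be checked against your own threat model.

```javascript
// Sketch only: decodeEntitiesWithTextarea and the doc parameter are
// illustrative names, not part of any answer above.
function decodeEntitiesWithTextarea(html, doc = document) {
  // The textarea is never attached to the page.
  const textarea = doc.createElement("textarea");
  // Character references are decoded; tags stay as literal text.
  textarea.innerHTML = html;
  return textarea.value;
}

// Browser usage: decodeEntitiesWithTextarea("&lt;") gives "<"
```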

Nux · Accepted Answer · 2014-11-05 10:39:24Z

Here is a "class" for decoding a whole HTML document.

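var HTMLDecoder = {
    // Scratch element reused across calls; the browser decodes entities assigned to it.
    tempElement: document.createElement('span'),
    decode: function(html) {
        var _self = this;
        // Return the result: replace() does not modify html in place.
        return html.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);/gi,
            function(str) {
                _self.tempElement.innerHTML = str;
                return _self.tempElement.textContent || _self.tempElement.innerText;
            }
        );
    }
};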

Note that I used Gumbo's regexp for catching entities, but for fully valid HTML documents (or XHTML) you could simply use /&[^;]+;/g.

CICDC · Accepted Answer · 2014-07-24 22:53:33Z

I know there are libraries out there, but here are a couple of solutions for browsers. These work well when placing HTML entity data strings into human-editable areas where you want the characters to be shown, such as textareas or input[type=text].

I add this answer as I have to support older versions of IE and I feel that it wraps up a few days' worth of research and testing. I hope somebody finds this useful.

First, this is for more modern browsers using jQuery. Please note that this should NOT be used if you have to support versions of IE before 10 (7, 8, or 9), as it will strip out the newlines, leaving you with just one long line of text.

if (!String.prototype.HTMLDecode) {
    String.prototype.HTMLDecode = function () {
        var str = this.toString(),
            $decoderEl = $('<textarea />');

        str = $decoderEl.html(str)
            .text()
            .replace(/<br((\/)|( \/))?>/gi, "\r\n");

        $decoderEl.remove();

        return str;
    };
}

This next one is based on kennebec's work above, with some differences which are mostly for the sake of older IE versions. This does not require jQuery, but does still require a browser.

if (!String.prototype.HTMLDecode) {
    String.prototype.HTMLDecode = function () {
        var str = this.toString(),
            //Create an element for decoding            
            decoderEl = document.createElement('p');

        //Bail if empty, otherwise IE7 will return undefined when 
        //OR-ing the 2 empty strings from innerText and textContent
        if (str.length == 0) {
            return str;
        }

        //convert newlines to <br/> tags to preserve them
        str = str.replace(/((\r\n)|(\r)|(\n))/gi, " <br/>");            

        decoderEl.innerHTML = str;
        /*
        We use innerText first as IE strips newlines out with textContent.
        There is said to be a performance hit for this, but sometimes
        correctness of data (keeping newlines) must take precedence.
        */
        str = decoderEl.innerText || decoderEl.textContent;

        //clean up the decoding element
        decoderEl = null;

        //replace back in the newlines
        return str.replace(/<br((\/)|( \/))?>/gi, "\r\n");
    };
}

/* 
Usage: 
    var str = "&gt;";
    return str.HTMLDecode();

returned value: 
    (String) >    
*/

Oded · Accepted Answer · 2010-12-02 19:32:05Z

1

There is nothing built in, but there are many libraries that have been written to do this.

Here is one.

And here is one that is a jQuery plugin.
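
There is no dedicated decoding function, but in browsers the standard DOMParser interface can be pressed into service without any library. A rough sketch, assuming a browser environment; decodeWithDomParser and its optional parser parameter are made-up names for illustration. Note that any markup in the input is parsed and dropped, so only the text content survives.

```javascript
// Sketch only: the function and parameter names are illustrative.
function decodeWithDomParser(input, parser = new DOMParser()) {
  // Parse the string as a separate HTML document, not part of the page.
  const doc = parser.parseFromString(input, "text/html");
  // textContent holds the decoded text with all tags removed.
  return doc.documentElement.textContent;
}

// Browser usage: decodeWithDomParser("&gt;") gives ">"
```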

answered Dec 2, 2010 at 19:32

Oded

j.j. · Accepted Answer · 2024-11-19 19:46:00Z

0

This answer isn't really new, but concise without needless cruft.
You can create any element like <p> or <b> or <span> or <foo> or <whatever>.
Unsuitable for untrusted user input.

function entity(x) {
    const foo = document.createElement("foo");
    foo.innerHTML = x;
    return foo.textContent;
}

console.log( entity("&hearts; &nbsp; &clubs;") );  // returns ♥   ♣

alert( entity("&hearts; &nbsp; &clubs;") );

answered Nov 19, 2024 at 19:46

j.j.
