Unescape HTML entities in JavaScript?

Question

I have some JavaScript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:

&lt;img src='myimage.jpg'&gt;

However, when I use JavaScript to insert the strings into HTML, they render literally. I don't see an image, I see the string:

<img src='myimage.jpg'>

I guess that the HTML is being escaped over the XML-RPC channel.

How can I unescape the string in JavaScript? I tried the techniques on this page, unsuccessfully: http://paulschreiber.com/blog/2008/09/20/javascript-how-to-unescape-html-entities/

What are other ways to diagnose the issue?

The huge function included in this article seems to work fine: blogs.msdn.com/b/aoakley/archive/2003/11/12/49645.aspx I don't think that's the cleverest solution, but it works.
– mati, Commented Sep 13, 2010 at 12:52
As strings containing HTML entities are something different than escaped or URI-encoded strings, those functions won't work.
– Marcel Korpel, Commented Sep 13, 2010 at 13:15
@Matias note that new named entities have been added to HTML (e.g. via the HTML 5 spec) since that function was authored in 2003 - for instance, it doesn't recognise &zopf;. This is a problem with an evolving spec; as such, you should pick a tool that's actually being maintained to solve it with.
– Mark Amery, Commented Feb 19, 2017 at 15:03
Possible duplicate of How to decode HTML entities using jQuery?
– lucascaro, Commented Nov 13, 2018 at 19:23
I've just realized how easy it is to confuse this question with encoding HTML entities. I accidentally posted an answer for the wrong question on this question! I've deleted it, though.
– shreyasm-dev, Commented Sep 25, 2020 at 16:59

Wladimir Palant · Accepted Answer · 2024-02-05 10:26:18Z

726

Most answers given here have a huge disadvantage: if the string you are trying to convert isn't trusted then you will end up with a Cross-Site Scripting (XSS) vulnerability. For the function in the accepted answer, consider the following:

htmlDecode("<img src='dummy' onerror='alert(/xss/)'>");

The string here contains an unescaped HTML tag, so instead of decoding anything the htmlDecode function will actually run JavaScript code specified inside the string.

This can be avoided by using DOMParser which is supported in all modern browsers:

function htmlDecode(input) {
  var doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

console.log(  htmlDecode("&lt;img src='myimage.jpg'&gt;")  )    
// "<img src='myimage.jpg'>"

console.log(  htmlDecode("<img src='dummy' onerror='alert(/xss/)'>")  )  
// ""

This function is guaranteed to not run any JavaScript code as a side-effect. Any HTML tags will be ignored, only text content will be returned.

Compatibility note: Parsing HTML with DOMParser requires at least Chrome 30, Firefox 12, Opera 17, Internet Explorer 10, Safari 7.1 or Microsoft Edge. So all browsers without support are way past their EOL and as of 2017 the only ones that can still be seen in the wild occasionally are older Internet Explorer and Safari versions (usually these still aren't numerous enough to bother).
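
If older browsers still matter, support can be probed at runtime before relying on it. The following is only a minimal sketch: the helper name supportsHtmlParsing is hypothetical, and it assumes that engines without HTML support either lack DOMParser, throw, or return null for the "text/html" type.

```javascript
// Hypothetical helper: reports whether DOMParser can parse "text/html".
// The global object is passed in so the check can also be exercised
// outside a browser.
function supportsHtmlParsing(globalObject) {
  try {
    if (!globalObject || !globalObject.DOMParser) {
      return false;
    }
    var doc = new globalObject.DOMParser().parseFromString('', 'text/html');
    // Assumption: engines without HTML support return null or throw.
    return doc !== null && doc !== undefined;
  } catch (e) {
    return false;
  }
}

// In a browser this would be called as supportsHtmlParsing(window).
console.log(supportsHtmlParsing({})); // false
```

When the check fails, decoding only entities with a maintained library is likely the safer fallback than an innerHTML-based stand-in.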

edited Feb 5, 2024 at 10:26

answered Dec 3, 2015 at 11:09

Wladimir Palant

21 Comments

Константин Ван Over a year ago

I think this answer is the best because it mentioned the XSS vulnerability.

PointedEars Over a year ago

Note that (according to your reference) DOMParser did not support "text/html" before Firefox 12.0, and there are still some latest versions of browsers that do not even support DOMParser.prototype.parseFromString(). According to your reference, DOMParser is still an experimental technology, and the stand-ins use the innerHTML property which, as you also pointed out in response to my approach, has this XSS vulnerability (which ought to be fixed by browser vendors).

Wladimir Palant Over a year ago

@PointedEars: Who cares about Firefox 12 in 2016? The problematic ones are Internet Explorer up to 9.0 and Safari up to 7.0. If one can afford not supporting them (which will hopefully be everybody soon) then DOMParser is the best choice. If not - yes, processing entities only would be an option.

Wladimir Palant Over a year ago

@PointedEars: <script> tags not being executed isn't a security mechanism, this rule merely avoids the tricky timing issues if setting innerHTML could run synchronous scripts as a side-effect. Sanitizing HTML code is a tricky affair and innerHTML doesn't even try - already because the web page might actually intend to set inline event handlers. This simply isn't a mechanism intended for unsafe data, full stop.

Wladimir Palant Over a year ago

@ИльяЗеленько: Do you plan to use this code in a tight loop or why does the performance matter? Your answer is again vulnerable to XSS, was it really worth it?

Mark Amery · Accepted Answer · 2017-02-19 15:12:13Z

317

Do you need to decode all encoded HTML entities or just &amp; itself?

If you only need to handle &amp; then you can do this:

var decoded = encoded.replace(/&amp;/g, '&');

If you need to decode all HTML entities then you can do it without jQuery:

var elem = document.createElement('textarea');
elem.innerHTML = encoded;
var decoded = elem.value;

Please take note of Mark's comments below which highlight security holes in an earlier version of this answer and recommend using textarea rather than div to mitigate against potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.

edited Feb 19, 2017 at 15:12

Mark Amery

answered Sep 13, 2010 at 12:31

LukeH

8 Comments

Mark Amery Over a year ago

Beware! This is potentially insecure. If encoded='<img src="bla" onerror="alert(1)">' then the snippet above will show an alert. This means if your encoded text is coming from user input, decoding it with this snippet may present an XSS vulnerability.

Mottie Over a year ago

@MarkAmery I'm not a security expert, but it looks like if you immediately set the div to null after getting the text, the alert in the img isn't fired - jsfiddle.net/Mottie/gaBeb/128

Mark Amery Over a year ago

@Mottie not sure which browser that worked for you in, but the alert(1) still fires for me on Chrome on OS X. If you want a safe variant of this hack, try using a textarea.

Mohammad Kermani Over a year ago

How to do this on Node server?

Waruyama Over a year ago

This fails on Firefox if there is an inline style with the font-family set, because the font's name is put in quotation marks, which are escaped, so the resulting string will look like this: style="font-family: &quot;Roboto&quot;;"

Wladimir Palant · Accepted Answer · 2019-04-30 15:06:33Z

EDIT: You should use the DOMParser API as Wladimir suggests, I edited my previous answer since the function posted introduced a security vulnerability.

The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

function htmlDecode(input){
  var e = document.createElement('textarea');
  e.innerHTML = input;
  // handle case of empty input
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("&lt;img src='myimage.jpg'&gt;"); 
// returns "<img src='myimage.jpg'>"

Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.

It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see comments.

@S.Mark: &apos; doesn't belong to the HTML 4 entities, that's why! w3.org/TR/html4/sgml/entities.html fishbowl.pastiche.org/2003/07/01/the_curse_of_apos
See also @kender's note about the poor security of this approach.
This function is a security hazard: JavaScript code will run even though the element is never added to the DOM. So this is only something to use if the input string is trusted. I added my own answer explaining the issue and providing a secure solution. As a side-effect, the result isn't being cut off if multiple text nodes exist.
This doesn't work if JS is not running in the browser, i.e. with Node.

Mark Amery · Accepted Answer · 2017-02-15 16:33:58Z

A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (see here in MDN). This allows you to use the browser's native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.

If we just want to decode some text content, we can put it as the sole content in a document body, parse the document, and pull out its .body.textContent.

var encodedStr = 'hello &amp; world';

var parser = new DOMParser;
var dom = parser.parseFromString(
    '<!doctype html><body>' + encodedStr,
    'text/html');
var decodedString = dom.body.textContent;

console.log(decodedString);

We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.

The parseFromString(str, type) method must run these steps, depending on type:

"text/html"

Parse str with an HTML parser, and return the newly created Document.

The scripting flag must be set to "disabled".

NOTE
script elements get marked unexecutable and the contents of noscript get parsed as markup.

It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.

Community · Accepted Answer · 2017-05-23 11:55:00Z

Mathias Bynens has a library for this: https://github.com/mathiasbynens/he

Example:

console.log(
    he.decode("J&#246;rg &amp J&#xFC;rgen rocked to &amp; fro ")
);
// Logs "Jörg & Jürgen rocked to & fro"

I suggest favouring it over hacks involving setting an element's HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.

If you really can't bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:

function decodeEntities(encodedString) {
    var textArea = document.createElement('textarea');
    textArea.innerHTML = encodedString;
    return textArea.value;
}

console.log(decodeEntities('1 &amp; 2')); // '1 & 2'

But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.

Mathias Bynens' library he is absolutely great! Thank you very much for the recommendation!

Chris Fulstow · Accepted Answer · 2009-12-16 05:40:02Z

40

If you're using jQuery:

function htmlDecode(value){ 
  return $('<div/>').html(value).text(); 
}

Otherwise, use Strictly Software's Encoder Object, which has an excellent htmlDecode() function.

answered Dec 16, 2009 at 5:40

Chris Fulstow

6 Comments

Michael Lorton Over a year ago

Do not (repeat NOT) use this for user-generated content other than content generated by this user. If there's a <script> tag in the value, the contents of the script will be executed!

TRiG Over a year ago

I can't find a license for that anywhere on the site. Do you know what the license is?

Chris Fulstow Over a year ago

There's a license in the source header, it's GPL.

Dinis Cruz Over a year ago

YES, that function opens the way for XSS: try htmlDecode("<script>alert(12)</script> 123 >")

Echo Yang Over a year ago

What's the meaning of $('<div/>')?

I am L · Accepted Answer · 2018-02-23 09:53:47Z

32

You can use Lodash unescape / escape function https://lodash.com/docs/4.17.5#unescape

import unescape from 'lodash/unescape';

const str = unescape('fred, barney, &amp; pebbles');

str will become 'fred, barney, & pebbles'

answered Feb 23, 2018 at 9:53

I am L

3 Comments

Rick Penabella Over a year ago

probably better to do "import _unescape from 'lodash/unescape';" so it doesn't conflict with the deprecated javascript function of the same name: unescape

Eugene Barsky Over a year ago

The best answer. We already have lodash in our project and it also escapes more correctly than he.

ruffin Over a year ago

Lodash only unescapes five entities ('&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'"), and warns its users, "Note: No other HTML entities are unescaped. To unescape additional HTML entities use a third-party library like _he_." fwiw, 2¢, etc.
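
To make the five-entity limitation concrete, here is a rough sketch of what such a restricted unescape could look like. It is not Lodash's actual source; the names BASIC_ENTITIES and unescapeBasic are made up for illustration.

```javascript
// Illustrative lookup table for the five entities mentioned above.
const BASIC_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'"
};

// Hypothetical helper, not Lodash's implementation.
function unescapeBasic(text) {
  // One pass over the string: each match is replaced exactly once, so
  // "&amp;lt;" becomes "&lt;" rather than being decoded twice into "<".
  return String(text).replace(/&(?:amp|lt|gt|quot|#39);/g, function (match) {
    return BASIC_ENTITIES[match];
  });
}

console.log(unescapeBasic('fred, barney, &amp; pebbles')); // "fred, barney, & pebbles"
console.log(unescapeBasic('&amp;lt; &euro;'));             // "&lt; &euro;"
```

Anything outside the table, such as &euro;, passes through untouched, which is exactly the gap the comment above warns about.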

WaiKit Kung · Accepted Answer · 2014-01-02 10:23:57Z

var htmlEnDeCode = (function() {
    var charToEntityRegex,
        entityToCharRegex,
        charToEntity,
        entityToChar;

    function resetCharacterEntities() {
        charToEntity = {};
        entityToChar = {};
        // add the default set
        addCharacterEntities({
            '&amp;'     :   '&',
            '&gt;'      :   '>',
            '&lt;'      :   '<',
            '&quot;'    :   '"',
            '&#39;'     :   "'"
        });
    }

    function addCharacterEntities(newEntities) {
        var charKeys = [],
            entityKeys = [],
            key, echar;
        for (key in newEntities) {
            echar = newEntities[key];
            entityToChar[key] = echar;
            charToEntity[echar] = key;
            charKeys.push(echar);
            entityKeys.push(key);
        }
        charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
        entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
    }

    function htmlEncode(value){
        var htmlEncodeReplaceFn = function(match, capture) {
            return charToEntity[capture];
        };

        return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
    }

    function htmlDecode(value) {
        var htmlDecodeReplaceFn = function(match, capture) {
            return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
        };

        return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
    }

    resetCharacterEntities();

    return {
        htmlEncode: htmlEncode,
        htmlDecode: htmlDecode
    };
})();

This is from ExtJS source code.

-1; this fails to handle the vast majority of named entities. For instance, htmlEnDeCode.htmlDecode('&euro;') should return '€', but instead returns '&euro;'.

Ben White · Accepted Answer · 2017-10-20 14:51:51Z

20

The trick is to use the power of the browser to decode the special HTML characters, but not allow the browser to execute the results as if it was actual html... This function uses a regex to identify and replace encoded HTML characters, one character at a time.

function unescapeHtml(html) {
    var el = document.createElement('div');
    return html.replace(/\&[#0-9a-z]+;/gi, function (enc) {
        el.innerHTML = enc;
        return el.innerText
    });
}

answered Oct 20, 2017 at 14:51

Ben White

2 Comments

TheAtomicOption Over a year ago

The regex can be matched a bit tighter with /\&#?[0-9a-z]+;/gi since # should only appear as the 2nd character if at all.
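
As a quick way to see what the tighter pattern selects, the sketch below lists the substrings it would hand to the decoding step. The helper name findEntityCandidates is hypothetical, and the pattern only picks candidates; whether each one is a real entity is still left to the browser in the answer's approach.

```javascript
// The tighter pattern suggested in the comment above.
const ENTITY_PATTERN = /&#?[0-9a-z]+;/gi;

// Hypothetical helper: returns every substring the pattern would match.
function findEntityCandidates(text) {
  return text.match(ENTITY_PATTERN) || [];
}

console.log(findEntityCandidates('1 &lt; 2 &amp;&amp; J&#xFC;rgen &a#b; <b>'));
// [ '&lt;', '&amp;', '&amp;', '&#xFC;' ]
```

The original pattern would also have picked up the malformed "&a#b;", because it allowed # at any position.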

Emmanuel Over a year ago

This is the best answer. Avoids XSS vulnerability, and doesn't strip HTML tags.

laggingreflex · Accepted Answer · 2015-03-14 19:02:05Z

17

element.innerText also does the trick.

edited Mar 14, 2015 at 19:02

laggingreflex

answered Nov 19, 2012 at 16:42

avg_joe

1 Comment

kernel Over a year ago

Yup +1 targetElement = document.querySelector('.target'); targetElement.innerHTML = targetElement.textContent;

cslotty · Accepted Answer · 2018-01-14 10:49:06Z

14

In case you're looking for it, like me: there is now a nice and safe jQuery method.

https://api.jquery.com/jquery.parsehtml/

For example, you can type this in your console:

var x = "test &amp;";
> undefined
$.parseHTML(x)[0].textContent
> "test &"

So $.parseHTML(x) returns an array, and if you have HTML markup within your text, the array.length will be greater than 1.

answered Jan 14, 2018 at 10:49

cslotty

7 Comments

Jonathan Nielsen Over a year ago

Worked perfectly for me, this was exactly what i was looking for, thank you.

Andrew Hodgkinson Over a year ago

If x has a value of <script>alert('hello');</script> the above will crash. In current jQuery it won't actually try to run the script, but [0] will yield undefined so the call to textContent will fail and your script will stop there. $('<div />').html(x).text(); looks safer - via gist.github.com/jmblog/3222899

cslotty Over a year ago

@AndrewHodgkinson yeah, but the question was "Decode &amp; back to & in JavaScript" - so you'd test the contents of x first or make sure you only use it in the correct cases.

Andrew Hodgkinson Over a year ago

I don't really see how that follows. The code above works in all cases. And just how exactly would you "make sure" the value of x needed fixing? And what if the script example above alerted '&' so that it really did need correction? We have no idea where the OP's strings come from, so malicious input must be considered.

cslotty Over a year ago

@AndrewHodgkinson I like your consideration, but that's not the question here. Feel free to answer that question, though. I guess you could remove script tags, for example.

Jason Williams · Accepted Answer · 2016-09-28 20:57:21Z

10

jQuery will encode and decode for you. However, you need to use a textarea tag, not a div.

var str1 = 'One & two & three';
var str2 = "One &amp; two &amp; three";
  
$(document).ready(function() {
   $("#encoded").text(htmlEncode(str1)); 
   $("#decoded").text(htmlDecode(str2));
});

function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

<div id="encoded"></div>
<div id="decoded"></div>

answered Sep 28, 2016 at 20:57

Jason Williams

3 Comments

Mark Amery Over a year ago

-1 because there's a (surprising) security hole here for old jQuery versions, some of which probably still have a significant user base - those versions will detect and explicitly evaluate scripts in the HTML passed to .html(). Thus even using a textarea isn't enough to ensure security here; I suggest not using jQuery for this task and writing equivalent code with the plain DOM API. (Yes, that old behaviour by jQuery is mad and awful.)

Jason Williams Over a year ago

Thank you for pointing that out. However, the question does not include a requirement to check for script injection. The question specifically asks about html rendered by the web server. Html content saved to a web server should probably be validated for script injection before save.

Luis Lobo Over a year ago

I used your example and made the vanilla version (down the page)

Community · Accepted Answer · 2017-05-23 12:18:17Z

CMS' answer works fine, unless the HTML you want to unescape is very long, longer than 65536 chars. Because then in Chrome the inner HTML gets split into many child nodes, each one at most 65536 long, and you need to concatenate them. This function works also for very long strings:

function unencodeHtmlContent(escapedHtml) {
  var elem = document.createElement('div');
  elem.innerHTML = escapedHtml;
  var result = '';
  // Chrome splits innerHTML into many child nodes, each one at most 65536.
  // Whereas FF creates just one single huge child node.
  for (var i = 0; i < elem.childNodes.length; ++i) {
    result = result + elem.childNodes[i].nodeValue;
  }
  return result;
}

See this answer about innerHTML max length for more info: https://stackoverflow.com/a/27545633/694469

Łukasz K · Accepted Answer · 2020-07-07 22:02:30Z

5

To unescape HTML entities* in JavaScript you can use the small library html-escaper: npm install html-escaper

import {unescape} from 'html-escaper';

unescape('escaped string');

Or use the unescape function from Lodash or Underscore, if you are already using one of them.

*) Please note that these functions don't cover all HTML entities, but only the most common ones, i.e. &, <, >, ', ". To unescape all HTML entities you can use the he library.

answered Jul 7, 2020 at 22:02

Łukasz K

Chris · Accepted Answer · 2016-06-07 09:30:47Z

4

First create a <span id="decodeIt" style="display:none;"></span> somewhere in the body

Next, assign the string to be decoded as innerHTML to this:

document.getElementById("decodeIt").innerHTML=stringtodecode

Finally,

stringtodecode=document.getElementById("decodeIt").innerText

Here is the overall code:

var stringtodecode="&lt;B&gt;Hello&lt;/B&gt; world&lt;br&gt;";
document.getElementById("decodeIt").innerHTML=stringtodecode;
stringtodecode=document.getElementById("decodeIt").innerText

edited Jun 7, 2016 at 9:30

Chris

answered Jan 9, 2013 at 3:03

Infoglaze.com

1 Comment

Mark Amery Over a year ago

-1; this is dangerously insecure to use on untrusted input. For instance, consider what happens if stringtodecode contains something like <script>alert(1)</script>.

Andrew Hodgkinson · Accepted Answer · 2020-03-11 23:03:32Z

The question doesn't specify the origin of x but it makes sense to defend, if we can, against malicious (or just unexpected, from our own application) input. For example, suppose x has a value of &amp; <script>alert('hello');</script>. A safe and simple way to handle this in jQuery is:

var x    = "&amp; <script>alert('hello');</script>";
var safe = $('<div />').html(x).text();

// => "& alert('hello');"

Found via https://gist.github.com/jmblog/3222899. I can't see many reasons to avoid using this solution given it is at least as short, if not shorter than some alternatives and provides defence against XSS.

(I originally posted this as a comment, but am adding it as an answer since a subsequent comment in the same thread requested that I do so).

kender · Accepted Answer · 2009-12-16 05:34:33Z

2

Not a direct response to your question, but wouldn't it be better for your RPC to return some structure (be it XML or JSON or whatever) with those image data (urls in your example) inside that structure?

Then you could just parse it in your javascript and build the <img> using javascript itself.

The structure you receive from RPC could look like:

{"img" : ["myimage.jpg", "myimage2.jpg"]}

I think it's better this way, as injecting code that comes from an external source into your page doesn't look very secure. Imagine someone hijacking your XML-RPC script and putting something you wouldn't want in there (even some JavaScript...)
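
A minimal sketch of that idea, assuming the response has the shape shown above. The helper name buildImages is hypothetical, and the document is passed in as a parameter so the logic does not depend on a global.

```javascript
// Hypothetical helper: turns the structured response into <img> elements.
// `doc` is a Document, e.g. `document` in a browser.
function buildImages(responseText, doc) {
  var payload = JSON.parse(responseText);
  var sources = payload && Array.isArray(payload.img) ? payload.img : [];
  return sources
    .filter(function (src) { return typeof src === 'string'; })
    .map(function (src) {
      var img = doc.createElement('img');
      // The value is set as an attribute; it is never parsed as HTML markup.
      img.setAttribute('src', src);
      return img;
    });
}

// Browser usage, where `container` is some existing element (assumed):
// buildImages('{"img" : ["myimage.jpg", "myimage2.jpg"]}', document)
//   .forEach(function (img) { container.appendChild(img); });
```

Because no markup from the server is ever inserted, there is nothing to unescape in the first place. Depending on where the data comes from, it may still be worth validating the URLs themselves.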

answered Dec 16, 2009 at 5:34

kender

3 Comments

Joseph Turian Over a year ago

Does the @CMS approach above have this security flaw?

kender Over a year ago

I just checked the following argument passed to htmlDecode function: htmlDecode("<img src='myimage.jpg'><script>document.write('xxxxx');</script>") and it creates the <script></script> element that can be bad, imho. And I still think returning a structure instead of text to be inserted is better, you can handle errors nicely for example.

Roatin Marth Over a year ago

I just tried htmlDecode("<img src='myimage.jpg'><script>alert('xxxxx');</script>") and nothing happened. I got the decoded html string back as expected.

Community · Accepted Answer · 2017-05-23 11:47:30Z

2

a javascript solution that catches the common ones:

var map = {amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'"}
str = str.replace(/&([^;]+);/g, (m, c) => map[c])

this is the reverse of https://stackoverflow.com/a/4835406/2738039
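
One possible refinement, sketched under the same five-entry map: leave anything that is not in the map untouched instead of replacing it with undefined. The names COMMON_ENTITIES and unescapeCommon are made up for this example.

```javascript
const COMMON_ENTITIES = {amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'"};

// Hypothetical variant: unknown names are returned unchanged.
function unescapeCommon(str) {
  // The name may not contain ';', '&' or whitespace, so a stray '&'
  // earlier in the text cannot swallow a real entity that follows it.
  return str.replace(/&([^;&\s]+);/g, function (match, name) {
    // hasOwnProperty avoids hits on inherited keys such as "constructor".
    return Object.prototype.hasOwnProperty.call(COMMON_ENTITIES, name)
      ? COMMON_ENTITIES[name]
      : match;
  });
}

console.log(unescapeCommon('Tom &amp; Jerry &euro; &lt;3')); // "Tom & Jerry &euro; <3"
console.log(unescapeCommon('AT&T &amp; co'));                // "AT&T & co"
```

Coverage is still limited to the entries in the map; the sketch only changes what happens to everything else.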

edited May 23, 2017 at 11:47

CommunityBot

answered Oct 7, 2016 at 19:07

HK JR

4 Comments

Eldelshell Over a year ago

If you use map[c] || '' unrecognized ones won't be shown as undefined

Mark Amery Over a year ago

Very limited coverage; -1.

Trần Quốc Hoài new 2015 Over a year ago

+1, more is

unescapeHtml(str){     var map = {amp: '&', lt: '<', le: '≤', gt: '>', ge: '≥', quot: '"', '#039': "'"}     return str.replace(/&([^;]+);/g, (m, c) => map[c]|| '')   }

Sergio A. Over a year ago

Manual coverage. Not recommended.

ninhjs.dev · Accepted Answer · 2017-07-28 18:03:35Z

2

For one-line guys:

const htmlDecode = innerHTML => Object.assign(document.createElement('textarea'), {innerHTML}).value;

console.log(htmlDecode('Complicated - Dimitri Vegas &amp; Like Mike'));

answered Jul 28, 2017 at 18:03

ninhjs.dev

buycanna.io · Accepted Answer · 2019-08-05 10:26:07Z

You're welcome...just a messenger...full credit goes to ourcodeworld.com, link below.

window.htmlentities = {
        /**
         * Converts a string to its html characters completely.
         *
         * @param {String} str String with unescaped HTML characters
         **/
        encode : function(str) {
            var buf = [];

            for (var i=str.length-1;i>=0;i--) {
                buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
            }

            return buf.join('');
        },
        /**
         * Converts an html characterSet into its original character.
         *
         * @param {String} str htmlSet entities
         **/
        decode : function(str) {
            return str.replace(/&#(\d+);/g, function(match, dec) {
                return String.fromCharCode(dec);
            });
        }
    };

Full Credit: https://ourcodeworld.com/articles/read/188/encode-and-decode-html-entities-using-pure-javascript

This is an incomplete solution; it only handles decimal numeric character references, not named character references or hexadecimal numeric character references.
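
For the numeric part of that gap, a decoder could accept both decimal and hexadecimal references along these lines. This is only a sketch: the name decodeNumericReferences is hypothetical, named references are deliberately left alone, and the special cases an HTML parser applies (for example remapping some values in the 128 to 159 range) are not replicated.

```javascript
// Hypothetical helper: decodes decimal (&#8364;) and hexadecimal (&#x20AC;)
// numeric character references. Named references such as &euro; are left as-is.
function decodeNumericReferences(str) {
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values and surrogate halves untouched instead of throwing.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return match;
    }
    // fromCodePoint, unlike fromCharCode, handles code points above 0xFFFF.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences('&#8364; &#x20AC; &#128512; &euro;'));
// "€ € 😀 &euro;"
```

String.fromCharCode, as used in the answer, would truncate &#128512; to 16 bits, which is why the sketch switches to String.fromCodePoint.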

Slavik Meltser · Accepted Answer · 2021-04-28 17:26:58Z

I know there are a lot of good answers here, but since I have implemented a bit different approach, I thought to share.

This code is a perfectly safe approach security-wise, as the escaping is handled by the browser instead of by the function. So, if a new vulnerability is discovered in the future, this solution will be covered.

const decodeHTMLEntities = text => {
    // Create a new element or use one from cache, to save some element creation overhead
    const el = decodeHTMLEntities.__cache_data_element 
             = decodeHTMLEntities.__cache_data_element 
               || document.createElement('div');
    
    const enc = text
        // Prevent any mixup of existing pattern in text
        .replace(/⪪/g, '⪪#')
        // Encode entities in special format. This will prevent native element encoder to replace any amp characters
        .replace(/&([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+);/gi, '⪪$1⪫');

    // Encode any HTML tags in the text to prevent script injection
    el.textContent = enc;

    // Decode entities from special format, back to their original HTML entities format
    el.innerHTML = el.innerHTML
        .replace(/⪪([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+)⪫/gi, '&$1;')
        .replace(/#⪫/g, '⪫');
   
    // Get the decoded HTML entities
    const dec = el.textContent;
    
    // Clear the element content, in order to preserve a bit of memory (it is just the text may be pretty big)
    el.textContent = '';

    return dec;
}

// Example
console.log(decodeHTMLEntities("<script>alert('&awconint;&CounterClockwiseContourIntegral;&#x02233;&#8755;⪪#x02233⪫');</script>"));
// Prints: <script>alert('∳∳∳∳⪪##x02233⪫');</script>

By the way, I have chosen to use the characters ⪪ and ⪫, because they are rarely used, so the chance of impacting the performance by matching them is significantly lower.

nerijus · Accepted Answer · 2012-06-26 10:32:03Z

1

Chris's answer is nice & elegant, but it fails if value is undefined. A simple improvement makes it solid:

function htmlDecode(value) {
   return (typeof value === 'undefined') ? '' : $('<div/>').html(value).text();
}

answered Jun 26, 2012 at 10:32

nerijus

1 Comment

SynCap Over a year ago

If you're going to improve it, then do: return (typeof value !== 'string') ? '' : $('<div/>').html(value).text();

TheLethalCoder · Accepted Answer · 2018-12-13 17:23:19Z

I tried everything to remove &amp; from a JSON array. None of the above examples worked, but https://stackoverflow.com/users/2030321/chris gave a great solution that led me to fix my problem.

var stringtodecode="&lt;B&gt;Hello&lt;/B&gt; world&lt;br&gt;";
document.getElementById("decodeIt").innerHTML=stringtodecode;
stringtodecode=document.getElementById("decodeIt").innerText

I did not use it, because I did not understand how to insert it into a modal window that was pulling JSON data into an array, but I did try this based upon the example, and it worked:

var modal = document.getElementById('demodal');
$('#ampersandcontent').text(replaceAll(data[0],"&amp;", "&"));

I like it because it was simple, and it works, but not sure why it's not widely used. Searched hi & low to find a simple solution. I continue to seek understanding of the syntax, and if there is any risk to using this. Have not found anything yet.
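
The replaceAll helper called above is not shown in the answer. One way it might be written for an array of strings parsed from JSON is sketched here; the name decodeAmpersands is hypothetical, and only &amp; is handled, nothing else.

```javascript
// Hypothetical helper: replaces every "&amp;" in each string of an array.
function decodeAmpersands(items) {
  return items.map(function (item) {
    // split/join replaces all occurrences without a regular expression.
    return typeof item === 'string' ? item.split('&amp;').join('&') : item;
  });
}

var data = JSON.parse('["Fish &amp; Chips", "R&amp;D", 42]');
console.log(decodeAmpersands(data)); // [ 'Fish & Chips', 'R&D', 42 ]
```

Passing the result to .text(), as the answer does, keeps the decoded value from being interpreted as markup.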

Your first proposal is a bit tricky, but it works nicely without much effort. The second one, on the other hand, uses only brute force to decode characters; this means it could take a LOT of effort and time to build a full decoding function. That's why no one uses that approach to solve the OP's problem.

David Chopin · Accepted Answer · 2019-11-25 04:39:10Z

I was crazy enough to go through and make this function that should be pretty, if not completely, exhaustive:

function removeEncoding(string) {
    // Fixed entity names: &pi;, &rho;, &harr;, &OElig;, &lceil;, &rceil;; added &aacute; and &upsilon;.
    return string
        .replace(/&Agrave;/g, "À").replace(/&Aacute;/g, "Á").replace(/&Acirc;/g, "Â").replace(/&Atilde;/g, "Ã").replace(/&Auml;/g, "Ä").replace(/&Aring;/g, "Å")
        .replace(/&agrave;/g, "à").replace(/&aacute;/g, "á").replace(/&acirc;/g, "â").replace(/&atilde;/g, "ã").replace(/&auml;/g, "ä").replace(/&aring;/g, "å")
        .replace(/&AElig;/g, "Æ").replace(/&aelig;/g, "æ").replace(/&szlig;/g, "ß").replace(/&Ccedil;/g, "Ç").replace(/&ccedil;/g, "ç")
        .replace(/&Egrave;/g, "È").replace(/&Eacute;/g, "É").replace(/&Ecirc;/g, "Ê").replace(/&Euml;/g, "Ë")
        .replace(/&egrave;/g, "è").replace(/&eacute;/g, "é").replace(/&ecirc;/g, "ê").replace(/&euml;/g, "ë").replace(/&#131;/g, "ƒ")
        .replace(/&Igrave;/g, "Ì").replace(/&Iacute;/g, "Í").replace(/&Icirc;/g, "Î").replace(/&Iuml;/g, "Ï")
        .replace(/&igrave;/g, "ì").replace(/&iacute;/g, "í").replace(/&icirc;/g, "î").replace(/&iuml;/g, "ï")
        .replace(/&Ntilde;/g, "Ñ").replace(/&ntilde;/g, "ñ")
        .replace(/&Ograve;/g, "Ò").replace(/&Oacute;/g, "Ó").replace(/&Ocirc;/g, "Ô").replace(/&Otilde;/g, "Õ").replace(/&Ouml;/g, "Ö")
        .replace(/&ograve;/g, "ò").replace(/&oacute;/g, "ó").replace(/&ocirc;/g, "ô").replace(/&otilde;/g, "õ").replace(/&ouml;/g, "ö")
        .replace(/&Oslash;/g, "Ø").replace(/&oslash;/g, "ø").replace(/&#140;/g, "Œ").replace(/&#156;/g, "œ").replace(/&#138;/g, "Š").replace(/&#154;/g, "š")
        .replace(/&Ugrave;/g, "Ù").replace(/&Uacute;/g, "Ú").replace(/&Ucirc;/g, "Û").replace(/&Uuml;/g, "Ü")
        .replace(/&ugrave;/g, "ù").replace(/&uacute;/g, "ú").replace(/&ucirc;/g, "û").replace(/&uuml;/g, "ü")
        .replace(/&#181;/g, "µ").replace(/&#215;/g, "×").replace(/&Yacute;/g, "Ý").replace(/&#159;/g, "Ÿ").replace(/&yacute;/g, "ý").replace(/&yuml;/g, "ÿ")
        .replace(/&#176;/g, "°").replace(/&#134;/g, "†").replace(/&#135;/g, "‡").replace(/&lt;/g, "<").replace(/&gt;/g, ">")
        .replace(/&#177;/g, "±").replace(/&#171;/g, "«").replace(/&#187;/g, "»").replace(/&#191;/g, "¿").replace(/&#161;/g, "¡").replace(/&#183;/g, "·").replace(/&#149;/g, "•").replace(/&#153;/g, "™")
        .replace(/&copy;/g, "©").replace(/&reg;/g, "®").replace(/&#167;/g, "§").replace(/&#182;/g, "¶")
        .replace(/&Alpha;/g, "Α").replace(/&Beta;/g, "Β").replace(/&Gamma;/g, "Γ").replace(/&Delta;/g, "Δ").replace(/&Epsilon;/g, "Ε").replace(/&Zeta;/g, "Ζ").replace(/&Eta;/g, "Η").replace(/&Theta;/g, "Θ")
        .replace(/&Iota;/g, "Ι").replace(/&Kappa;/g, "Κ").replace(/&Lambda;/g, "Λ").replace(/&Mu;/g, "Μ").replace(/&Nu;/g, "Ν").replace(/&Xi;/g, "Ξ").replace(/&Omicron;/g, "Ο").replace(/&Pi;/g, "Π")
        .replace(/&Rho;/g, "Ρ").replace(/&Sigma;/g, "Σ").replace(/&Tau;/g, "Τ").replace(/&Upsilon;/g, "Υ").replace(/&Phi;/g, "Φ").replace(/&Chi;/g, "Χ").replace(/&Psi;/g, "Ψ").replace(/&Omega;/g, "Ω")
        .replace(/&alpha;/g, "α").replace(/&beta;/g, "β").replace(/&gamma;/g, "γ").replace(/&delta;/g, "δ").replace(/&epsilon;/g, "ε").replace(/&zeta;/g, "ζ").replace(/&eta;/g, "η").replace(/&theta;/g, "θ")
        .replace(/&iota;/g, "ι").replace(/&kappa;/g, "κ").replace(/&lambda;/g, "λ").replace(/&mu;/g, "μ").replace(/&nu;/g, "ν").replace(/&xi;/g, "ξ").replace(/&omicron;/g, "ο").replace(/&pi;/g, "π")
        .replace(/&rho;/g, "ρ").replace(/&sigmaf;/g, "ς").replace(/&sigma;/g, "σ").replace(/&tau;/g, "τ").replace(/&upsilon;/g, "υ").replace(/&phi;/g, "φ").replace(/&chi;/g, "χ").replace(/&psi;/g, "ψ").replace(/&omega;/g, "ω")
        .replace(/&bull;/g, "•").replace(/&hellip;/g, "…").replace(/&prime;/g, "′").replace(/&Prime;/g, "″").replace(/&oline;/g, "‾").replace(/&frasl;/g, "⁄")
        .replace(/&weierp;/g, "℘").replace(/&image;/g, "ℑ").replace(/&real;/g, "ℜ").replace(/&trade;/g, "™").replace(/&alefsym;/g, "ℵ")
        .replace(/&larr;/g, "←").replace(/&uarr;/g, "↑").replace(/&rarr;/g, "→").replace(/&darr;/g, "↓").replace(/&harr;/g, "↔").replace(/&crarr;/g, "↵")
        .replace(/&lArr;/g, "⇐").replace(/&uArr;/g, "⇑").replace(/&rArr;/g, "⇒").replace(/&dArr;/g, "⇓").replace(/&hArr;/g, "⇔")
        .replace(/&forall;/g, "∀").replace(/&part;/g, "∂").replace(/&exist;/g, "∃").replace(/&empty;/g, "∅").replace(/&nabla;/g, "∇").replace(/&isin;/g, "∈").replace(/&notin;/g, "∉").replace(/&ni;/g, "∋")
        .replace(/&prod;/g, "∏").replace(/&sum;/g, "∑").replace(/&minus;/g, "−").replace(/&lowast;/g, "∗").replace(/&radic;/g, "√").replace(/&prop;/g, "∝").replace(/&infin;/g, "∞")
        .replace(/&OElig;/g, "Œ").replace(/&oelig;/g, "œ").replace(/&Yuml;/g, "Ÿ")
        .replace(/&spades;/g, "♠").replace(/&clubs;/g, "♣").replace(/&hearts;/g, "♥").replace(/&diams;/g, "♦")
        .replace(/&thetasym;/g, "ϑ").replace(/&upsih;/g, "ϒ").replace(/&piv;/g, "ϖ").replace(/&Scaron;/g, "Š").replace(/&scaron;/g, "š")
        .replace(/&ang;/g, "∠").replace(/&and;/g, "∧").replace(/&or;/g, "∨").replace(/&cap;/g, "∩").replace(/&cup;/g, "∪").replace(/&int;/g, "∫").replace(/&there4;/g, "∴")
        .replace(/&sim;/g, "∼").replace(/&cong;/g, "≅").replace(/&asymp;/g, "≈").replace(/&ne;/g, "≠").replace(/&equiv;/g, "≡").replace(/&le;/g, "≤").replace(/&ge;/g, "≥")
        .replace(/&sub;/g, "⊂").replace(/&sup;/g, "⊃").replace(/&nsub;/g, "⊄").replace(/&sube;/g, "⊆").replace(/&supe;/g, "⊇")
        .replace(/&oplus;/g, "⊕").replace(/&otimes;/g, "⊗").replace(/&perp;/g, "⊥").replace(/&sdot;/g, "⋅")
        .replace(/&lceil;/g, "⌈").replace(/&rceil;/g, "⌉").replace(/&lfloor;/g, "⌊").replace(/&rfloor;/g, "⌋").replace(/&lang;/g, "⟨").replace(/&rang;/g, "⟩").replace(/&loz;/g, "◊")
        .replace(/&#039;/g, "'").replace(/&quot;/g, "\"")
        // &amp; is decoded last so that input such as "&amp;quot;" is decoded only once.
        .replace(/&amp;/g, "&");
}

Used like so:

let decodedText = removeEncoding("Ich hei&szlig;e David");
console.log(decodedText);

Prints: Ich heiße David

P.S. this took like an hour and a half to make.

&amp; needs to be replaced last; otherwise &amp;quot; will be incorrectly decoded twice to " rather than once to &quot;.
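
The ordering point in the comment above can be checked with two tiny decoders. This is only a sketch: decodeAmpFirst and decodeAmpLast are hypothetical names, and each knows just two entities for the sake of illustration.

```javascript
// Hypothetical helpers for illustration; they only handle &amp; and &quot;.
function decodeAmpFirst(s) {
    // Wrong order: "&amp;quot;" becomes "&quot;", which the next step decodes again.
    return s.replace(/&amp;/g, "&").replace(/&quot;/g, '"');
}

function decodeAmpLast(s) {
    // Right order: text produced by the &amp; step is never scanned again.
    return s.replace(/&quot;/g, '"').replace(/&amp;/g, "&");
}

console.log(decodeAmpFirst("&amp;quot;")); // "       (decoded twice)
console.log(decodeAmpLast("&amp;quot;"));  // &quot;  (decoded once)
```

The same reasoning applies to any chain of sequential replacements: whichever step can produce a fresh "&" has to run after every step that looks for one.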

Daniel · Accepted Answer · 2020-02-13 00:57:01Z

0

This is the most comprehensive solution I've tried so far:

const STANDARD_HTML_ENTITIES = {
    nbsp: String.fromCharCode(160),
    amp: "&",
    quot: '"',
    lt: "<",
    gt: ">"
};

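const replaceHtmlEntities = plainTextString => {
    // Decode in a single pass so that already-decoded text is never decoded again.
    return plainTextString.replace(
        /&(?:#(\d+)|(nbsp|amp|quot|lt|gt));/g,
        (match, dec, name) => {
            if (name !== undefined) return STANDARD_HTML_ENTITIES[name];
            const code = Number(dec);
            // fromCodePoint handles characters above U+FFFF; leave invalid values untouched.
            return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
        }
    );
};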
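
The function above handles numeric references only in decimal form. As a sketch of one further case a decoder usually needs, the snippet below decodes numeric references in both decimal (&#233;) and hexadecimal (&#xE9;) form, using String.fromCodePoint so that code points above U+FFFF survive. decodeNumericEntities is a hypothetical name, and leaving out-of-range values untouched is an assumption of this sketch, not a description of what browsers do.

```javascript
// Hypothetical helper: decodes only numeric character references.
function decodeNumericEntities(text) {
    return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
        const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // String.fromCodePoint throws a RangeError above U+10FFFF,
        // so such references are returned unchanged (an assumption of this sketch).
        return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    });
}

console.log(decodeNumericEntities("caf&#233; caf&#xE9; &#128512; &#x1F600;"));
```

Named entities are deliberately left out here; the list of names keeps growing with the HTML specification, which is why a maintained library or the browser's own parser is usually a better source for them.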

1 Comment

Dan Dascalescu Over a year ago

"The most comprehensive"? Have you tried running it against an actually comprehensive test suite?

weiya ou · Accepted Answer · 2020-10-29 08:14:16Z

0

A closure lets you create the helper element once and reuse it on every call, instead of creating a new element each time.

const decodingHandler = (() => {
  const element = document.createElement('div');
  return text => {
    element.innerHTML = text;
    return element.textContent;
  };
})();

A more concise version:

const decodingHandler = (() => {
  const element = document.createElement('div');
  return text => ((element.innerHTML = text), element.textContent);
})();

1 Comment

shwz Over a year ago

Wouldn't innerHTML introduce an XSS vulnerability here, since the string is being passed into it? Better to use innerText.

tmx976 · Accepted Answer · 2017-07-14 08:09:52Z

I use this in my project. It is inspired by other answers but adds an extra safeEscape parameter, which can be useful when you deal with decorated characters:

var decodeEntities=(function(){

    var el=document.createElement('div');
    return function(str, safeEscape){

        if(str && typeof str === 'string'){

            str=str.replace(/\</g, '&lt;');

            el.innerHTML=str;
            if(el.innerText){

                str=el.innerText;
                el.innerText='';
            }
            else if(el.textContent){

                str=el.textContent;
                el.textContent='';
            }

            if(safeEscape)
                str=str.replace(/\</g, '&lt;');
        }
        return str;
    }
})();

And it's usable like:

var label='safe <b> character &eacute;ntity</b>';
var safehtml='<div title="'+decodeEntities(label)+'">'+decodeEntities(label, true)+'</div>';

General Grievance · Accepted Answer · 2024-08-02 12:31:19Z

-1

function decodeEntities(input) {
  const temp = document.createElement('div');
  temp.innerHTML = `<data value="${input.replaceAll('"', '&quot;')}"></data>`;
  return temp.firstElementChild.value;
}

console.log(decodeEntities('&lt;img src="x"&gt;&lt;/img&gt;'));
console.log(decodeEntities('"/><img src="x" onerror="alert(\'xss\')"></img>'));

answered Jul 31, 2024 at 13:38 by firejox; edited Aug 2, 2024 at 12:31 by General Grievance

3 Comments

General Grievance Over a year ago

Sorry about the edit... I accidentally hit "Enter" when testing...

General Grievance Over a year ago

Hm... I guess this does work, but why do this instead of just using DOMParser?

ruud Over a year ago

Given the number of other replies, is a code-only answer the best you can offer?

Luis Lobo · Accepted Answer · 2024-09-01 16:43:02Z

UPDATE 09-01-2024

This worked for my purpose

// html-escaping.js v1
function htmlEscaping(string) { // https://stackoverflow.com/a/7382028/11212275
    return string.replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;")
}
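
Note that htmlEscaping encodes rather than decodes. For the direction the question asks about, a minimal sketch of the inverse for the same five characters might look like this; htmlUnescaping and UNESCAPE_MAP are hypothetical names, and any entity outside this small map is left as-is.

```javascript
// Hypothetical inverse of htmlEscaping, covering only the five entities it emits.
const UNESCAPE_MAP = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&#39;": "'"
};

function htmlUnescaping(string) {
    // One pass over the input, so "&amp;lt;" yields "&lt;" and is not decoded twice.
    return string.replace(/&(?:amp|lt|gt|quot|#39);/g, entity => UNESCAPE_MAP[entity]);
}

console.log(htmlUnescaping("&lt;img src=&#39;myimage.jpg&#39;&gt;"));
// <img src='myimage.jpg'>
```

The result is plain text. Inserting it into a page as markup is a separate step and is only safe if the source of the string is trusted.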

OLD (previous version containing errors)

It only works in simple, specific cases.

// decode-html.js v1
function decodeHtml(html) {
    const textarea = document.createElement('textarea');
    textarea.innerHTML = html;
    const decodedHtml = textarea.textContent;
    textarea.remove();
    return decodedHtml;
};

// encode-html.js v1
function encodeHtml(html) {
    const textarea = document.createElement('textarea');
    textarea.textContent = html;
    const encodedHtml = textarea.innerHTML;
    textarea.remove();
    return encodedHtml;
};

// example of use:
let htmlDecoded = 'one & two & three';
let htmlEncoded = 'one &amp; two &amp; three';

console.log(1, htmlDecoded);
console.log(2, encodeHtml(htmlDecoded));

console.log(3, htmlEncoded);
console.log(4, decodeHtml(htmlEncoded));

EricP · Accepted Answer · 2017-10-07 01:48:48Z

All of the other answers here have problems.

The document.createElement('div') methods (including those using jQuery) execute any JavaScript passed into them (a security issue), and the DOMParser.parseFromString() method trims whitespace. Here is a pure JavaScript solution that has neither problem:

function htmlDecode(html) {
    var textarea = document.createElement("textarea");
    html= html.replace(/\r/g, String.fromCharCode(0xe000)); // Replace "\r" with reserved unicode character.
    textarea.innerHTML = html;
    var result = textarea.value;
    return result.replace(new RegExp(String.fromCharCode(0xe000), 'g'), '\r');
}

A textarea is used specifically to avoid executing JS code. It passes these:

htmlDecode('&lt;&amp;&nbsp;&gt;'); // returns "<& >" with non-breaking space.
htmlDecode('  '); // returns "  "
htmlDecode('<img src="dummy" onerror="alert(\'xss\')">'); // Does not execute alert()
htmlDecode('\r\n') // returns "\r\n", doesn't lose the \r like other solutions.

No, using a different tag does not solve the issue. This is still an XSS vulnerability, try htmlDecode("</textarea><img src=x onerror=alert(1)>"). You posted this after I already pointed out this issue on the answer by Sergio Belevskij.
I'm unable to reproduce the issue you describe. I have your code in this JsFiddle, and no alert displays when running. jsfiddle.net/edsjt15g/1 Can you take a look? What browser are you using?
I'm using Firefox. Chrome indeed handles this scenario differently, so the code doesn't execute - not something you should rely on however.
