2

I need to unescape strings in JavaScript, but sometimes my string is already unescaped and sometimes it is not:

// String 1
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;

// String 2
<?xml version="1.0" encoding="UTF-8"?>

I use the following method:

function htmlDecode(input)
{
  var doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

The problem is that when I "decode" string 2, which is already unescaped, the angle brackets are lost and the result comes out as ?xml version="1.0" encoding="UTF-8"?

Help is appreciated.

2 Answers

3

You can run a regex check on the string to see whether it contains any encoded characters. If it does, decode it; otherwise, return the input unchanged.

var string1 = '&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;';
var string2 = '<?xml version="1.0" encoding="UTF-8"?>';

function decode(input) {
  if (/&amp;|&quot;|&#39;|&lt;|&gt;/.test(input)) {
    var doc = new DOMParser().parseFromString(input, "text/html");
    return doc.documentElement.textContent;
  }
  return input;
}

console.log(decode(string1));
console.log(decode(string2));


Even Simpler (and better):

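This method requires no regex. The content of a textarea is parsed as text rather than as markup, so entities in an escaped string are decoded while an already-unescaped string comes back unchanged instead of being "over-unescaped":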

var string1 = '&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;';
var string2 = '<?xml version="1.0" encoding="UTF-8"?>';

function decode(input) {
  var txt = document.createElement("textarea");
  txt.innerHTML = input;
  return txt.value;
}

console.log(decode(string1));
console.log(decode(string2));


-1

You can use this if your code doesn't have access to DOMParser (for example, in a web worker). It is probably not a good fit for long strings, because it searches the entire string again for each entity.

If you are using this with long strings, you might want to use the second argument of indexOf to pass the index from which the search should start, so that each search resumes where the previous one stopped. That would require restructuring the function rather than using it as written here.

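/**
* Replace the five predefined XML entities with their literal characters.
* "&amp;" is handled last so that "&amp;quot;" decodes to "&quot;", not '"'.
**/
function unescapeXml( str ){
    // XML entity names are case-sensitive, so the regex uses "g" only.
    [ "&lt;", "&gt;", "&quot;", "&apos;", "&amp;" ].forEach(function(x, i){
        if( str.indexOf(x) > -1  )
            str = str.replace(new RegExp(x, "g"), [ '<', '>', '"', '\'', '&' ][i]);
    });
    return str;
}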
var string1 = '&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;';
console.log(unescapeXml(string1));
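
One possible way to avoid scanning the string once per entity is to match all five entities with a single regex and look each match up in a table. This is only a sketch: the names XML_ENTITIES and unescapeXmlOnce are made up for illustration and are not part of the answer above, and it assumes only the five predefined XML entities need decoding (no numeric references such as &#39;).

```javascript
// Hypothetical lookup table: the five predefined XML entities.
var XML_ENTITIES = {
  "&lt;": "<",
  "&gt;": ">",
  "&amp;": "&",
  "&quot;": '"',
  "&apos;": "'"
};

// Hypothetical helper: decode all entities in one pass over the string.
function unescapeXmlOnce(str) {
  // Each match is replaced exactly once and the replacement text is not
  // rescanned, so "&amp;lt;" becomes "&lt;" rather than "<".
  return str.replace(/&(?:lt|gt|amp|quot|apos);/g, function (entity) {
    return XML_ENTITIES[entity];
  });
}

var escaped = '&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;';
var plain = '<?xml version="1.0" encoding="UTF-8"?>';

console.log(unescapeXmlOnce(escaped)); // decoded
console.log(unescapeXmlOnce(plain));   // unchanged, nothing to decode
```

Because it uses no DOM APIs, this should also work inside a web worker. A string that contains none of the entities is returned as is.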
