
I need to replace less-than and greater-than characters (< and >) but keep any HTML tags (simple tags without arguments, like <b>text</b>, will do). So the input below:

<b>> 0.5 < 0.4</b> - <>

Should be:

<b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;

So far, all I've managed to find and adapt is this expression:

<\/?[a-z][a-z0-9]*[^>\s]*>|([<>])

It captures the < and > characters in a group, but it also matches tags, which I don't want to replace.
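One way to keep using the expression above is to pass a callback to `String.prototype.replace`: when capture group 1 is undefined, the match was a tag, so the callback can return it unchanged. This is a minimal sketch that assumes lowercase tag names, as in the original pattern; the function name `escapeStrayBrackets` is hypothetical.

```javascript
// Hypothetical helper: escapes < and > that are not part of a tag matched by the regex above.
function escapeStrayBrackets(text) {
    // Alternative 1 matches a whole tag; alternative 2 captures a single stray bracket in group 1.
    const re = /<\/?[a-z][a-z0-9]*[^>\s]*>|([<>])/g;
    return text.replace(re, (match, bracket) => {
        if (bracket === undefined) return match; // a tag: keep it as-is
        return bracket === '<' ? '&lt;' : '&gt;';
    });
}

console.log(escapeStrayBrackets('<b>> 0.5 < 0.4</b> - <>'));
// <b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;
```

Because the pattern uses `[a-z]` without the `i` flag, an uppercase tag such as `<B>` is escaped rather than kept.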

UPD: Thanks to @Sree Kumar, here are the final functions:

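// Replaces the character at `index` with `char`, which may be a longer string such as '&lt;'.
String.prototype.replaceAt = function (index, char) {
    let arr = this.split('');
    arr[index] = char;
    return arr.join('');
};

// Escapes every < and > that is not part of a simple tag such as <b> or </b>.
String.prototype.escape = function () {
    // Tags are matched but not captured; stray brackets land in the named groups lt / gt.
    let p = /(?:<[a-zA-Z]+>)|(?:<\/[a-zA-Z]+>)|(?<lt><)|(?<gt>>)/g,
        result = String(this), // `this` is a String object in sloppy mode; use a primitive
        match = p.exec(result);

    while (match !== null) {
        if (match.groups.lt !== undefined) {
            result = result.replaceAt(match.index, '&lt;');
        }
        else if (match.groups.gt !== undefined) {
            result = result.replaceAt(match.index, '&gt;');
        }
        // p.lastIndex now points inside the inserted entity, which contains no < or >,
        // so the next exec call finds nothing until it is past the entity.
        match = p.exec(result);
    }
    return result;
};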
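If patching `String.prototype` is undesirable, the same pattern can drive a single `replace` call. When a regex contains named groups, the replacement callback receives the groups object as its last argument, so stray brackets can be detected by name without rebuilding the string after each match. A sketch only; the name `escapeText` is hypothetical.

```javascript
// Hypothetical standalone alternative to String.prototype.escape (same regex, one pass).
function escapeText(text) {
    const re = /(?:<[a-zA-Z]+>)|(?:<\/[a-zA-Z]+>)|(?<lt><)|(?<gt>>)/g;
    return text.replace(re, (...args) => {
        // With named groups present, the last callback argument is the groups object.
        const groups = args[args.length - 1];
        if (groups.lt !== undefined) return '&lt;';
        if (groups.gt !== undefined) return '&gt;';
        return args[0]; // a simple tag such as <b> or </b>: keep it
    });
}

console.log(escapeText('<b>> 0.5 < 0.4</b> - <>'));
// <b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;
```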
  • What tags are "simple"? Please provide clear requirements. Commented Dec 7, 2021 at 12:21
  • Does this answer your question? regex to escape non-html tags' angle brackets Commented Dec 7, 2021 at 12:23
  • @WiktorStribiżew Any tags without arguments will do; I need to keep simple HTML formatting. It would be nice to keep tags with arguments too, but that's not necessary. Commented Dec 7, 2021 at 12:24
  • Something like .replace(/<\s*\/?\s*\w+\s*\/?\s*>|(<)|(>)/g, (m, g1, g2) => g2 ? '&gt;' : g1 ? '&lt;' : m) might work. Or .replace(/<\s*\/?\s*\w+[^>]*>|(<)|(>)/g, (m, g1, g2) => g2 ? '&gt;' : g1 ? '&lt;' : m). This is not precise, but might be enough. To make it more precise, you will need to list the tags to avoid matching strings like <my_word>. Commented Dec 7, 2021 at 12:32
  • Are you open to using named groups? Then you can name the group you are interested in and get only that group. If it is undefined, discard it. Commented Dec 7, 2021 at 12:52
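To see why the replace-callback suggestion above is called "not precise", this sketch runs both of its regexes through the same callback on the question's sample. The wrapper name `escapeWith` is hypothetical. The second, attribute-friendly variant treats `< 0.4</b>` as a tag, because `[^>]*` runs up to the closing `>` of `</b>`.

```javascript
// Both patterns come from the comment; group 1 captures a stray '<', group 2 a stray '>'.
const strict = /<\s*\/?\s*\w+\s*\/?\s*>|(<)|(>)/g;
const loose = /<\s*\/?\s*\w+[^>]*>|(<)|(>)/g;

// Hypothetical wrapper around the comment's replace callback.
function escapeWith(re, text) {
    return text.replace(re, (m, g1, g2) => (g2 ? '&gt;' : g1 ? '&lt;' : m));
}

const sample = '<b>> 0.5 < 0.4</b> - <>';
console.log(escapeWith(strict, sample)); // <b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;
console.log(escapeWith(loose, sample));  // <b>&gt; 0.5 < 0.4</b> - &lt;&gt;
```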

3 Answers


Here is a way to do it using named groups: name the group you want and look only at it. When that alternative didn't take part in the match (for example, because a tag matched instead), the group is undefined, so you have to check for that.

Notice the (?<B>...) wrapped around the "desired" group, and the undefined check on the 5th line.

let p = /(?:<[a-zA-Z]+>)|(?:<\/[a-zA-Z]+>)|(?<B>[<>])/g; // a tag OR a stray bracket in group B
let input = '<b>> 0.5 < 0.4</b> - <>';
let match = p.exec(input);
while (match !== null) {
    if (match.groups.B !== undefined) console.log(match.groups.B); // B is undefined when a tag matched
    match = p.exec(input);
}
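The same named-group idea also works with `String.prototype.matchAll` (ES2020), which replaces the manual `exec` loop. This sketch, using the hypothetical helper name `findStrayBrackets`, records each stray bracket together with its index instead of only logging it; those positions are what a replacement step would need.

```javascript
// Hypothetical helper: lists stray < and > characters with their positions.
function findStrayBrackets(input) {
    const p = /(?:<[a-zA-Z]+>)|(?:<\/[a-zA-Z]+>)|(?<B>[<>])/g; // matchAll requires the g flag
    const found = [];
    for (const match of input.matchAll(p)) {
        // groups.B is undefined when the match was a tag rather than a stray bracket
        if (match.groups.B !== undefined) found.push({ char: match.groups.B, index: match.index });
    }
    return found;
}

console.log(findStrayBrackets('<b>> 0.5 < 0.4</b> - <>'));
```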


Try this regex:

<(?!\/?\w+>)|(?<!<\w+|<\/\w+)>

Explanation

  • <(?!\/?\w+>) finds all '<' symbols (except in tags)
  • (?<!<\w+|<\/\w+)> finds all '>' symbols (except in tags)

You can use them separately:

let str = '<b>> 0.5 < 0.4</b> - <>';
let lessThan = /<(?!\/?\w+>)/g;
let greaterThan = /(?<!<\w+|<\/\w+)>/g;

str = str.replace(lessThan, '&lt;');
str = str.replace(greaterThan, '&gt;');

console.log(str); // <b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;

NB! It only finds '<' and '>' characters that are not part of a tag-like sequence such as <b> or </b>. It doesn't check that the HTML is valid: for text like <a></b> it will not find any matches.
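Since every match of these expressions is exactly one character, the two can also be combined into a single pass. This sketch (the name `escapeBrackets` is hypothetical, and the lookbehind needs an ES2018+ engine) also illustrates two limitations: a mismatched pair like `<a></b>` passes through untouched, and a tag with an attribute is escaped, because `\w+>` requires the `>` right after the tag name.

```javascript
// Hypothetical one-pass version of the two regexes above (lookbehind requires ES2018+).
const strayBracket = /<(?!\/?\w+>)|(?<!<\w+|<\/\w+)>/g;

function escapeBrackets(str) {
    // Each match is a single '<' or '>' that isn't part of a <tag> or </tag>.
    return str.replace(strayBracket, (m) => (m === '<' ? '&lt;' : '&gt;'));
}

console.log(escapeBrackets('<b>> 0.5 < 0.4</b> - <>')); // <b>&gt; 0.5 &lt; 0.4</b> - &lt;&gt;
console.log(escapeBrackets('<a></b>'));                 // <a></b>
console.log(escapeBrackets('<a href="x">link</a>'));    // &lt;a href="x"&gt;link</a>
```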

1 Comment

How would you expand this to also support classes and other HTML attributes that tags can have?

I've come across this very same problem. The solution in the post is good, but it mishandles a very common case: a tag is recognized only when it is closed right after its name, not when there's a space.

For example, <hello> is recognized as a valid tag, while <hello > isn't. This is quite a big problem, because tags with inline styling (or even simpler ones containing a space) won't be recognized. To address this, the regular expression can be modified to handle attributes as well. Here's the updated regex:

(?:<[a-zA-Z]+\s*[^>]*>)|(?:<\/[a-zA-Z]+>)|(?<lt><)|(?<gt>>)

This modification allows for optional whitespace and attributes within the opening tag. Now, it should match tags with or without attributes correctly.

This can be incorporated into your escape function like this:

// Relies on the replaceAt helper defined in the question's update.
String.prototype.escape = function () {
    let p = /(?:<[a-zA-Z]+\s*[^>]*>)|(?:<\/[a-zA-Z]+>)|(?<lt><)|(?<gt>>)/g,
        result = this,
        match = p.exec(result);

    while (match !== null) {
        if (match.groups.lt !== undefined) {
            result = result.replaceAt(match.index, '&lt;');
        } else if (match.groups.gt !== undefined) {
            result = result.replaceAt(match.index, '&gt;');
        }
        match = p.exec(result);
    }
    return result;
};

With these changes, it should now handle HTML tags with attributes correctly while replacing the < and > characters outside of HTML tags.
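As a sketch of the same attribute-aware pattern without `String.prototype` or the `replaceAt` helper, a single `replace` call with a named-group callback works (the name `escapeOutsideTags` is hypothetical). Note that the `\s*` in the opening-tag alternative is redundant, since `[^>]*` already accepts whitespace. The second call shows a limitation worth knowing: a `>` inside an attribute value ends the "tag" early.

```javascript
// Hypothetical standalone version of the attribute-aware pattern above.
function escapeOutsideTags(text) {
    const re = /(?:<[a-zA-Z]+\s*[^>]*>)|(?:<\/[a-zA-Z]+>)|(?<lt><)|(?<gt>>)/g;
    return text.replace(re, (...args) => {
        const groups = args[args.length - 1]; // named groups object is the last argument
        if (groups.lt !== undefined) return '&lt;';
        if (groups.gt !== undefined) return '&gt;';
        return args[0]; // an opening tag (with or without attributes) or a closing tag
    });
}

console.log(escapeOutsideTags('<span style="color:red">a < b</span>'));
// <span style="color:red">a &lt; b</span>
console.log(escapeOutsideTags('<a title="x>y">t</a>'));
// <a title="x>y"&gt;t</a>
```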
