I'm trying to parse some text so that _this is emphasized!_ is wrapped in <em> tags like so: <em>this is emphasized!</em>.

My component currently looks like this:

export default class TextParser extends React.Component {
  render() {
    let text = this.props.text,
        parsed, regex, paragraphs;

    regex = {
      paragraph: /(?:\r\n){2,}/g,
      emphasize: /\_(.*?)\_/g,
      strong: /\*(.*?)\*/g,
    }

    // Apply regex
    text = text.replace(regex.emphasize, (str) => {
      // Strip the leading and trailing underscore from the match
      let parsed = str.slice(1, -1);

      return ('<em>' + parsed + '</em>')
    })

    paragraphs = text.split(regex.paragraph) || []
    paragraphs = paragraphs.map((text, i) => {
      return (
        <p key={i}>
          {text}
        </p>
      )
    })

    return (
      <div className="document">{paragraphs}</div>
    )
  }
}

This does not work, however: the output HTML shows the tags as plain text instead of rendering them as elements. This is, of course, because React escapes strings before rendering them.

I could use dangerouslySetInnerHTML, but I want to avoid that. How can I replace the underscores around the text with <em> tags?

1 Answer

As you noticed, placing the string "<em>" as part of the result of replace just adds that string and not an actual tag.

You will not be able to create tags directly inside of replace, because it operates on a string and returns a string.

Instead, break the string up into separate elements and add the tags where you need them. You already do something like this in the paragraph case.

Because the paragraph case also operates on a string, these kinds of operations can only be done nested: once you complete the operation you no longer have a plain text string, you have an array of objects. So in this example I moved the <em> parsing inside the paragraph parsing.

One last note: I had to modify the regex for emphasize so that it captures the underscores. With a capturing group, split keeps the matched text in the resulting array, and I need to check each piece again to see whether it was a match after I have done the split.

let text = this.props.text,
    parsed, regex, paragraphs;

regex = {
  paragraph: /(?:\r\n){2,}/g,
  emphasize: /(\_.*?\_)/g,
  strong: /\*(.*?)\*/g,
}

paragraphs = text.split(regex.paragraph) || []
paragraphs = paragraphs.map((text, i) => {
  return (
    <p key={i}>
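      {
        // Split on the emphasized runs; the capture group keeps them in the array
        text.split(regex.emphasize).map((str, j) => {
          // Pieces that still match the pattern are the _wrapped_ ones
          return str.search(regex.emphasize) !== -1
            ? (<em key={j}>{str.slice(1, -1)}</em>)
            : str;
        })
      }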
    </p>
  )
})

return (
  <div className="document">{paragraphs}</div>
)
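
To see why the capturing group matters in the split above, here is a small sketch in plain JavaScript, outside of React. The sample sentence is made up for illustration; the two patterns are the emphasize regex with and without the group.

```javascript
const sample = 'plain _emphasized_ plain';

// Without a capturing group, split() throws the matched delimiters away.
const withoutGroup = sample.split(/_.*?_/g);

// With a capturing group, the matched text is kept in the result array,
// so each piece can be checked afterwards and wrapped in a tag if it matched.
const withGroup = sample.split(/(_.*?_)/g);

console.log(withoutGroup); // [ 'plain ', ' plain' ]
console.log(withGroup);    // [ 'plain ', '_emphasized_', ' plain' ]
```

The second array is what the map callback iterates over, which is why it has to test each piece again.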

Based on your comments below, you also want to know how to handle text that mixes both kinds of formatting, so for completeness I have included the code for that here. I combined the formatting patterns into a single regex and explicitly check for '_' or '*' to decide whether to add em or b tags. When there is a match, I call the function recursively in case there are further matches inside it. You may choose to clean this up differently, but I hope this helps.

let text = this.props.text,
    parsed, regex, paragraphs;

regex = {
  paragraph: /(?:\r\n){2,}/g,
  formatting: /(\_.*?\_)|(\*.*?\*)/g,
}

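let applyFormatting = (text) => {
  // filter(n => n) drops the undefined/empty entries left by unmatched groups
  return text.split(regex.formatting).filter(n => n).map((str, i) => {
    let inner = str.slice(1, -1);
    // Only treat a piece as formatted if it is wrapped in the same marker
    let wrapped = str.length > 1 && str[0] === str[str.length - 1];
    return wrapped && str[0] === '_'
      ? (<em key={i}>{applyFormatting(inner)}</em>)
      : wrapped && str[0] === '*'
      ? (<b key={i}>{applyFormatting(inner)}</b>)
      : str;
  });
};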

paragraphs = text.split(regex.paragraph) || []
paragraphs = paragraphs.map((text, i) => {
  return (
    <p key={i}>
      { applyFormatting(text) }
    </p>
  )
})

return (
  <div className="document">{paragraphs}</div>
)
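
If you want to check the recursion without rendering anything, one option is to build a plain tree first. This is only a sketch: parseInline, FORMATTING and TAGS are hypothetical names, and the { tag, children } shape is an assumption made for illustration, not something React requires. It reuses the combined formatting regex from the code above.

```javascript
// Same combined pattern as regex.formatting in the answer.
const FORMATTING = /(_.*?_)|(\*.*?\*)/g;

// Hypothetical marker-to-tag table; add more markers here if needed.
const TAGS = { _: 'em', '*': 'b' };

// Hypothetical helper: turns inline markers into a nested tree of plain
// objects, which a renderer could later map to <em> and <b> elements.
function parseInline(text) {
  return text
    .split(FORMATTING)
    // Unmatched capture groups yield undefined and adjacent matches yield '';
    // drop both.
    .filter(Boolean)
    .map((part) => {
      const marker = part[0];
      // A piece counts as formatted only if the same marker wraps it.
      const wrapped =
        part.length >= 2 && marker in TAGS && part.endsWith(marker);
      return wrapped
        ? { tag: TAGS[marker], children: parseInline(part.slice(1, -1)) }
        : part;
    });
}

const tree = parseInline('a *bold _and italic_ text* b');
console.log(JSON.stringify(tree, null, 2));
```

For that sample input the outer *...* match is found first and becomes a b node, and the _..._ run inside it is found by the recursive call and becomes an em child.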

7 Comments

Great answer! Is there a dry way to make it parse the * characters and turn them into <strong> as well?
This might be a bit trickier if it is possible to nest them. If not, then it is probably not too tough. Can we ignore the possibility of Italicized *and bold* text or do you need to support that type of nested emphasize/strong?
I need both, and possibly more in the future. This is so users can have rich text in their content.
I see, so you cannot just split once and then apply the tags. Each time you find a match, you will need to apply the tags and then keep searching for other matches. Since they can appear in any order (e.g. *bold _and italic_ text*), this has to find the outermost tags first and then continue searching the resulting substrings.
This sounds incredibly complicated. Regex is such a braintwister for me. Can you help me?