1

I have the following regex:

^(<span style=.*?font-weight:bold.*?>.*?</span>)

It matches the following code:

<span style="font-family:Arial; font-size:10pt"> r.</span></p><p style="margin:0pt"><span style="font-family:Arial; font-size:10pt; font-weight:bold">&#xa0;</span>

But I would like to match only this part (the last span, which contains the font-weight:bold style):

<span style="font-family:Arial; font-size:10pt; font-weight:bold">&#xa0;</span>
  • 1
    Do not try to parse HTML with regular expressions. Go get the Html Agility Pack. Commented Jul 30, 2013 at 13:55
  • 2
    Guys! Kamil didn't ask whether parsing HTML using Regex is a good idea. He asked a nice and specific question about how to have his regex match a different part of the provided string. The fact that his string happens to look like HTML is completely irrelevant for this question. No need for the HTML-Regex-kneejerk-reflex... Commented Jul 30, 2013 at 13:57
  • 3
    @Mels - No, Kamil is about to shoot himself in the foot and various other body parts. We cannot, through inaction, allow a human being to come to harm. Commented Jul 30, 2013 at 14:07
  • 1
    @Mels The fact that his string happens to look like HTML is completely relevant as it shines light on the classic XY problem happening here. The OP is asking how to make his "solution" work, when he's clearly using the wrong tools for the job. When he comes back an hour later with another question about matching something else, it'll only add to the pollution on SO. Commented Jul 30, 2013 at 14:11

3 Answers

7

Use HTML Agility Pack to parse the HTML:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlContent);

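// "//span[@style]" skips spans without a style attribute, which would otherwise
// throw a NullReferenceException on s.Attributes["style"].Value.
// Note that SelectNodes returns null (not an empty collection) when nothing matches.
var boldSpans = from s in doc.DocumentNode.SelectNodes("//span[@style]")
                let style = s.GetAttributeValue("style", "")
                where style.Contains("font-weight:bold")
                select s;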

Or, even better, use an XPath expression that selects all matching nodes in one line:

doc.DocumentNode.SelectNodes("//span[contains(@style, 'font-weight:bold')]")
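The XPath approach can be wrapped into a small, null-safe helper. This is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced, the class and method names (BoldSpanXPathDemo, CountBoldSpans) are made up for illustration, and the translate(...) call is an addition that is not in the answer; it strips spaces so that "font-weight: bold" is matched as well.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class BoldSpanXPathDemo
{
    // Sample input copied from the question.
    public const string Sample =
        "<span style=\"font-family:Arial; font-size:10pt\"> r.</span></p><p style=\"margin:0pt\">" +
        "<span style=\"font-family:Arial; font-size:10pt; font-weight:bold\">&#xa0;</span>";

    // Hypothetical helper: counts spans whose style attribute contains font-weight:bold.
    public static int CountBoldSpans(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // translate(@style, ' ', '') removes spaces before the contains() test,
        // so "font-weight: bold" and "font-weight:bold" are treated alike.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//span[contains(translate(@style, ' ', ''), 'font-weight:bold')]");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountBoldSpans(Sample));
    }
}
```

Checking for null before using the result avoids a NullReferenceException on documents that contain no bold spans at all.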

3 Comments

I actually prefer the first - it's easier to read in my opinion.
@dav_i that's why I left both options :)
Thanks!! The HTML is generated by an external library, so I assumed that its structure (the way it is created) would be constant. Anyway, HTML Agility Pack is the better option :)
1

Don't use ^ since the line doesn't start with the span you want to match.

<span style=["'][^'"]*font-weight:bold[^'"]*['"]>[^<]*</span>

Or as an escaped string:

"<span style=[\"'][^'\"]*font-weight:bold[^'\"]*['\"]>[^<]*</span>"

This matches strings starting with <span style= followed by a single or double quote (' or "). Then [^'"]* allows any characters except the closing quote.

Next comes the literal string font-weight:bold, followed again by any number of characters except the closing quote, then the actual closing quote and the end of the opening tag: [^'"]*['"]>.

(Note that you might or might not want to allow more attributes before and after the style attribute. In that case you need to alter the regex.)

The span may contain any number of characters except the tag-opening <, and the match has to end with the closing </span> tag.
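As a quick check, the pattern can be run against the sample string from the question. The sketch below uses .NET's Regex class; the class and method names (BoldSpanRegexDemo, FirstMatch) are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoldSpanRegexDemo
{
    // Sample input copied from the question.
    public const string Sample =
        "<span style=\"font-family:Arial; font-size:10pt\"> r.</span></p><p style=\"margin:0pt\">" +
        "<span style=\"font-family:Arial; font-size:10pt; font-weight:bold\">&#xa0;</span>";

    // The pattern from this answer, written as an escaped C# string.
    public const string Pattern =
        "<span style=[\"'][^'\"]*font-weight:bold[^'\"]*['\"]>[^<]*</span>";

    // Returns the text of the first match, or null when there is none.
    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Prints only the second (bold) span:
        // <span style="font-family:Arial; font-size:10pt; font-weight:bold">&#xa0;</span>
        Console.WriteLine(FirstMatch(Sample));
    }
}
```

The first span is skipped because [^'"]* cannot cross the closing quote of its style attribute, so font-weight:bold has to appear inside the same attribute value.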


0

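Remove the ^, because it anchors the match at the beginning of the line, so the match will always start at the first span. This is made worse by .*?, which matches any characters at all, including the boundaries between tags.

With the ^ removed, the first match may still be the output you have now. On the sample string it is: the match still starts at the first span and .*? runs on until it reaches the bold one, so there is no separate second match. To get only the bold span you also need to restrict what the style part may match, for example with [^>]* instead of .*?.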
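Whether removing the ^ is enough depends on the input, so it is worth testing against the sample string from the question. The sketch below compares the question's pattern without the anchor to a tightened variant; the [^>]* variant and the names AnchorRemovalDemo and AllMatches are assumptions made for this example, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRemovalDemo
{
    // Sample input copied from the question.
    public const string Sample =
        "<span style=\"font-family:Arial; font-size:10pt\"> r.</span></p><p style=\"margin:0pt\">" +
        "<span style=\"font-family:Arial; font-size:10pt; font-weight:bold\">&#xa0;</span>";

    // The question's pattern with only the leading ^ removed.
    public const string WithoutAnchor = "<span style=.*?font-weight:bold.*?>.*?</span>";

    // Hypothetical tightening: [^>]* cannot leave the opening tag it started in.
    public const string Restricted = "<span style=[^>]*font-weight:bold[^>]*>.*?</span>";

    // Returns the text of every match of the pattern in the sample.
    public static string[] AllMatches(string pattern)
    {
        MatchCollection matches = Regex.Matches(Sample, pattern);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // One match that still covers both spans.
        foreach (string m in AllMatches(WithoutAnchor)) Console.WriteLine(m);
        // One match that covers only the bold span.
        foreach (string m in AllMatches(Restricted)) Console.WriteLine(m);
    }
}
```

On this input the unanchored pattern yields a single match that spans the whole string, because the lazy .*? is still free to cross from the first span into the second; the restricted pattern yields only the bold span.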

Furthermore, tools such as RegexBuddy are good for testing regexes.
