This is my HTML string:

<p style="opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;">100 gram n!uts</p>

I want to get the font-weight value, if there is one. How do I do this with a regex?

  • With an HTML parser. Have you considered using one? Commented Aug 31, 2015 at 8:31
  • @stribizhev no, I need to do it with regex Commented Aug 31, 2015 at 8:33
  • @petko_stankoski why do you need to do it with a regex? Regular expressions can't parse every input, and HTML is one of the formats where regex-based parsing works only in limited cases. Commented Aug 31, 2015 at 8:37

2 Answers

This should solve it:

(?<=font-weight: )[0-9A-Za-z]+(?=;)

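Explanation:

(?<=font-weight: ) is a lookbehind: the text immediately before the match must be "font-weight: " (including the single space after the colon).

[0-9A-Za-z]+ is the match itself: one or more letters or digits.

(?=;) is a lookahead: the character immediately after the match must be a ;, so a font-weight declaration that comes last without a trailing semicolon will not match.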

Code:

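using System.Text.RegularExpressions;

// Lookbehind and lookahead are zero-width, so Match(...).Value is just the font-weight value.
string Pattern = @"(?<=font-weight: )[0-9A-Za-z]+(?=;)";
string Value = "<p style=\"opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;\">100 gram n!uts</p>";
string Result = Regex.Match(Value, Pattern).Value; // "bold"; an empty string if there is no match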
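
The pattern above depends on exactly one space after the colon and on a trailing semicolon. As a minimal sketch, assuming the style value might also omit the space or the final semicolon, a capture group with \s* is a little more tolerant. The names FontWeightSketch and Extract are made up for illustration, and this is still a regex over raw HTML, so it will not cope with every input.

```csharp
using System;
using System.Text.RegularExpressions;

static class FontWeightSketch
{
    // Hypothetical helper: returns the font-weight value, or null if none is found.
    public static string Extract(string html)
    {
        // \b requires a word boundary before "font-weight" (so "xfont-weight" is not matched);
        // \s* allows optional whitespace on both sides of the colon;
        // [0-9A-Za-z]+ is the same value class as in the answer above.
        var m = Regex.Match(html, @"\bfont-weight\s*:\s*([0-9A-Za-z]+)", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Extract("<p style=\"color: #000000; font-weight: bold;\">x</p>")); // bold
        Console.WriteLine(Extract("<p style=\"font-weight:700\">x</p>"));                     // 700
        Console.WriteLine(Extract("<p style=\"color: red\">x</p>") ?? "(none)");              // (none)
    }
}
```

The value is read from Groups[1] rather than from the whole match, because the property name is now part of the match instead of sitting in a lookbehind.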

If you decide to use an HTML parser in the future, you might want to have a look at CsQuery. Install the NuGet package in your solution and use it as shown in the snippet below.

var html = "<p style=\"opacity: 1; color: #000000; font-weight: bold; font-style: italic; text-decoration: line-through; background-color: #ffffff;\">100 gram n!uts</p>";
var cq = CsQuery.CQ.CreateFragment(html);
foreach (var obj in cq.Select("p"))
{
    var style = string.Empty;
    var has_attr = obj.TryGetAttribute("style", out style);
    if (has_attr)
    {
       // Using LINQ and string methods
       var fontweight = style.Split(';').FirstOrDefault(p => p.Trim().StartsWith("font-weight:"));
       if (!string.IsNullOrWhiteSpace(fontweight)) // FirstOrDefault returns null when there is no font-weight declaration
           Console.WriteLine(fontweight.Split(':')[1].Trim());
       // Or a regex (note: Replace returns the style string unchanged if it contains no font-weight)
       var font_with_regex = Regex.Replace(style, @".*?\bfont-weight:\s*([^;]+).*", "$1", RegexOptions.Singleline);
       Console.WriteLine(font_with_regex);
    }
}

Note that running a regex replacement is quite safe now, since we only have a short, plain string, with no optional surrounding quotes or tags to take care of.
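
Once the style value is isolated, it can also be split into property/value pairs so that any declaration can be looked up by name. The following is a minimal sketch in plain C#, not part of CsQuery; the names StyleSketch and Parse are hypothetical, and it assumes that values contain no semicolons (true for the example string, not for something like a data URL).

```csharp
using System;
using System.Collections.Generic;

static class StyleSketch
{
    // Hypothetical helper: turns "a: 1; b: 2;" into a case-insensitive name/value map.
    public static Dictionary<string, string> Parse(string style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var declaration in style.Split(';'))
        {
            var colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // skip empty pieces and pieces without a property name

            var name = declaration.Substring(0, colon).Trim();
            var value = declaration.Substring(colon + 1).Trim();
            if (name.Length > 0) result[name] = value; // a later declaration overwrites an earlier one
        }
        return result;
    }

    static void Main()
    {
        var styles = Parse("opacity: 1; color: #000000; font-weight: bold; font-style: italic;");
        string weight;
        Console.WriteLine(styles.TryGetValue("font-weight", out weight) ? weight : "(none)"); // bold
    }
}
```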

If you need to load a URL, use:

var cq = CsQuery.CQ.CreateFromUrl("http://www.example.com");

This is much safer than the following regex, which is hard to read and likely to fail on large input:

<p\s[^<]*\bstyle="[^<"]*\bfont-weight:\s*([^"<;]+)
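
For completeness, a minimal sketch of how that pattern could be applied in C#: the value ends up in capture group 1 rather than in the whole match. The class name TagRegexSketch is hypothetical, and the caveats about running a regex over HTML still apply.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagRegexSketch
{
    // The pattern from the answer, unchanged; in a verbatim string "" stands for one quote.
    const string Pattern = @"<p\s[^<]*\bstyle=""[^<""]*\bfont-weight:\s*([^""<;]+)";

    // Returns the captured font-weight value, or null if the pattern does not match.
    public static string Extract(string html)
    {
        var m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        var html = "<p style=\"opacity: 1; font-weight: bold; font-style: italic;\">100 gram n!uts</p>";
        Console.WriteLine(Extract(html)); // bold
    }
}
```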

1 Comment

Please check my answer; I tried to make it as complete as I could. There are certainly other parsers you might consider: HtmlAgilityPack, Fizzler, AngleSharp. Use a regex only when necessary, not for every string-related task. There are strings and strings, you know.
