I have the following string:

 string error = "<MESSAGES><MESSAGE SEVERITY=\"2\" NUMBER=\"16\" TEXT=\"The Record Case is locked by user\" /></MESSAGES>";

I want to match between the TEXT=\" and the following \

I'm using the following expression: var regex = new Regex(@"TEXT=\\""(.*?)\\");

Expresso tells me this regex is correct. RegExr tells me this regex is correct.

But C# disagrees.

I've tried

  • Groups[] and match.Value.
  • \x22 instead of " as I thought it might be an escape problem.
  • /TEXT=\""(.*?)\/g

All to no avail.

What am I missing?

  • LINQPad (linqpad.net) is useful for this kind of thing; it basically gives you a C# console you can use to practice on. Commented Jun 1, 2015 at 8:15
  • The string doesn't contain the backslashes; they are only there to escape the quotes. You must not have them in your regexp. Commented Jun 1, 2015 at 8:17
  • That's XML. Can't you just parse it as an XElement and read the TEXT attribute with elem.Attribute("TEXT").Value etc.? Commented Jun 1, 2015 at 8:22

3 Answers

8

Use XElement, you have an XML fragment:

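// Requires: using System.Linq; using System.Xml.Linq;
var error = "<MESSAGES><MESSAGE SEVERITY=\"2\" NUMBER=\"16\" TEXT=\"The Record Case is locked by user\" /></MESSAGES>";
var xe = XElement.Parse(error);
var res = xe.Elements("MESSAGE")
            // Attribute("TEXT") returns null when the attribute is missing;
            // Attributes("TEXT") returns a sequence and is never null.
            .Where(p => p.Attribute("TEXT") != null)
            .Select(n => n.Attribute("TEXT").Value)
            .ToList();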

Output: a list with one element, The Record Case is locked by user.

Note that with very large input strings, .*? may cause catastrophic backtracking, which is why you should avoid it whenever a negated character class will do. If you do need a regex for this (for example, because some of your input is not valid XML), you can use:

var attr_vals = Regex.Matches(error, @"(?i)\bTEXT=""([^""]+)""")
             .OfType<Match>()
             .Select(p => p.Groups[1].Value)
             .ToList();

(2 times faster than Karthik's, tested on regexhero.com)

Output: a list with one element, The Record Case is locked by user.

Note that with regex, XML entities come back undecoded (e.g. &amp; and not &). You will have to decode them afterwards, for example with System.Web.HttpUtility.HtmlDecode.
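As a rough sketch of that decoding step: the snippet below uses System.Net.WebUtility.HtmlDecode rather than System.Web.HttpUtility, only because it is available without a reference to System.Web. The input string (with the "Smith &amp; Jones" value) is made up for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical input: the TEXT value contains an XML entity (&amp;).
var error = "<MESSAGES><MESSAGE SEVERITY=\"2\" NUMBER=\"16\" TEXT=\"Locked by Smith &amp; Jones\" /></MESSAGES>";

var decoded = Regex.Matches(error, @"(?i)\bTEXT=""([^""]+)""")
    .OfType<Match>()
    // The regex returns the raw attribute text; HtmlDecode turns "&amp;" into "&".
    .Select(m => WebUtility.HtmlDecode(m.Groups[1].Value))
    .ToList();

Console.WriteLine(decoded[0]); // Locked by Smith & Jones
```

XElement does this decoding for you, which is one more reason to prefer it when the input is valid XML.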


3 Comments

2 times faster... indeed.. +1 :)
+1 - this is the best implementation by a long shot, although I do feel I need to mark Phuong's answer as correct.
Please note that Phuong's solution (which is actually @"TEXT=""(.*?)""") is worse in principle, because lazy (.*?) matching is less safe than a negated character class. If you need to match escaped sequences, it is not correct at all; in that case you need to use @"\bTEXT=""[^""\\]*(?:\\.[^""\\]*)*" instead.
2

Use the following. The backslashes in your string literal are only escape characters, so the compiled string contains no \ at all:

var regex = new Regex(@"TEXT=""([^""]+)""");
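To check this yourself, here is a small sketch (top-level statements, so it assumes C# 9 or later) that runs the question's string against both the original pattern and the corrected one. The variable names original and fixedRegex are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var error = "<MESSAGES><MESSAGE SEVERITY=\"2\" NUMBER=\"16\" TEXT=\"The Record Case is locked by user\" /></MESSAGES>";

// The backslashes exist only in the source code; the runtime string has none.
Console.WriteLine(error.Contains("\\")); // False

// The pattern from the question demands a literal backslash, so it cannot match.
var original = new Regex(@"TEXT=\\""(.*?)\\");
Console.WriteLine(original.IsMatch(error)); // False

// Without the backslashes the pattern matches and group 1 holds the value.
var fixedRegex = new Regex(@"TEXT=""([^""]+)""");
Console.WriteLine(fixedRegex.Match(error).Groups[1].Value); // The Record Case is locked by user
```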

1 Comment

The (.*?) should definitely be optimized into something like ([^"]*) to prevent the catastrophic backtracking that comes with that matcher. If the value may contain quotation marks, a lookbehind expression would be the better option.
1

This works for me:

Regex.Match(error, "TEXT=\\\"(.*?)\\\"")

You need to escape both the \ and the " characters with \.
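For comparison, here is a hedged sketch of three C# literals that should behave identically on the question's string: the doubly escaped form above, a plain escaped literal, and a verbatim literal. It relies on the .NET regex engine treating \" as a plain quote, and it assumes C# 9 or later for top-level statements; the patterns and results names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var error = "<MESSAGES><MESSAGE SEVERITY=\"2\" NUMBER=\"16\" TEXT=\"The Record Case is locked by user\" /></MESSAGES>";

var patterns = new[]
{
    "TEXT=\\\"(.*?)\\\"", // regex engine sees TEXT=\"(.*?)\" where \" is just a quote
    "TEXT=\"(.*?)\"",     // regex engine sees TEXT="(.*?)"
    @"TEXT=""(.*?)""",    // verbatim literal, regex engine sees TEXT="(.*?)"
};

// Capture group 1 for each pattern.
var results = patterns
    .Select(p => Regex.Match(error, p).Groups[1].Value)
    .ToList();

foreach (var r in results)
    Console.WriteLine(r); // The Record Case is locked by user (three times)
```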

2 Comments

This was the correct answer to my question, "Why is my regex not working?": it was an escape error, and this made it work. I am, however, going with stribizhev's answer, as it is a nicer implementation. Thanks, Phuong.
Actually, the regex is the same as "TEXT=\"(.*?)\"". You do not have to escape " in a .NET regular expression (see Character Escapes in Regular Expressions reference).
