5

I got a good email validation regex from: Email regular expression

    public static void Main(string[] args)
    {
        string value = @"cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc";
        var regex = new Regex(
            @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
            RegexOptions.Compiled);
        var x = regex.Match(value); // Hangs here !?!
        return;
    }

It works in most cases, but the code above hangs, burning 100% CPU. I've tested it in a Windows 8 Metro app and in a standard .NET 4.5 app.

Can anyone tell me why this happens, and if there is a good email validation REGEX that doesn't hang, or if there is a way to fix this one?

Many thanks, Jon

2
  • 1
    This may help you find out why it hangs. This may help you find out how to match email addresses properly with regex. Commented Oct 26, 2012 at 13:23
  • You should read this in order to create a proper email-matching regex: regular-expressions.info/email.html Commented Oct 26, 2012 at 13:34

3 Answers

15

The explanation why it hangs: Catastrophic backtracking.

Let's simplify the crucial part of the regex:

(\w*[0-9a-zA-Z])*@

You have

  • an optional part \w* that can match the same characters as the following part [0-9a-zA-Z], so the two combined translate, in essence, to \w+
  • nested quantifiers: (\w+)*

This means that, given s = "cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc", this part of the regex has to try every possible way of splitting s into consecutive chunks (there are 2**(len(s)-1) of them) before it can decide on a non-match, because the @ that must follow is never found.
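To see the growth for yourself without hanging the process, here is a minimal sketch that runs the simplified pattern against inputs of increasing length, using the Regex constructor overload that takes a match timeout (available since .NET 4.5). The class name BacktrackingDemo, the helper FinishesInTime, and the chosen lengths and timeout are made up for illustration; actual timings depend on the machine and on the regex engine version, and newer runtimes may optimize some of this away.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Simplified pattern from the answer: nested quantifiers followed by a literal '@'.
    private const string Pattern = @"(\w*[0-9a-zA-Z])*@";

    // Hypothetical helper: true if the match attempt completed within the timeout,
    // false if the engine gave up with a RegexMatchTimeoutException.
    public static bool FinishesInTime(string input, TimeSpan timeout)
    {
        var regex = new Regex(Pattern, RegexOptions.None, timeout);
        try
        {
            regex.IsMatch(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // The input never contains '@', so every attempt has to fail.
        // On a backtracking engine the time is expected to grow very quickly with the length.
        for (int length = 16; length <= 28; length += 4)
        {
            string input = new string('c', length);
            var watch = Stopwatch.StartNew();
            bool finished = FinishesInTime(input, TimeSpan.FromSeconds(2));
            watch.Stop();
            Console.WriteLine("{0} chars: {1} ms, finished = {2}",
                length, watch.ElapsedMilliseconds, finished);
        }
    }
}
```

Passing a timeout like this is also a reasonable safety net whenever a regex runs on user-supplied input.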

Since you cannot validate an e-mail address with any regex (there are far too many corner cases in the spec), it's usually best to

  • do a minimal regex check (^.*@.*$)
  • use a parser to check validity (like @Fake.It.Til.U.Make.It suggested)
  • try and send e-mail to it - even a seemingly valid address may be bogus, so you'd have to do this anyway.
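As a rough sketch of the first step only, the minimal check could look like this. The class name MinimalEmailCheck and the method LooksLikeEmail are hypothetical, and rejecting empty or whitespace-only input is an extra assumption on top of the pattern above. The check says nothing about whether the address is actually valid or deliverable.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MinimalEmailCheck
{
    // The permissive pattern from the list above: anything, an '@', anything.
    private static readonly Regex HasAtSign = new Regex(@"^.*@.*$");

    // Hypothetical helper: a cheap sanity check, not a validator.
    public static bool LooksLikeEmail(string input)
    {
        // Assumption: empty or whitespace-only input is rejected up front.
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }
        return HasAtSign.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeEmail("jon@example.com")); // True
        Console.WriteLine(LooksLikeEmail("cvcvcvcvvcvvcvcv")); // False
    }
}
```

The pattern has no nested quantifiers, so it does not suffer from the backtracking problem described above.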

Just for completeness, you can avoid the backtracking issues with the help of atomic groups:

    var regex = new Regex(
        @"^([0-9a-zA-Z](?>[-.\w]*[0-9a-zA-Z])*@(?>[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
        RegexOptions.Compiled);
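
A quick way to sanity-check the atomic-group version is to run it against the string from the question and against an ordinary address. This is only a sketch: the class name AtomicGroupCheck and the sample address are made up for illustration, and the one-second match timeout is an added precaution that is not part of the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupCheck
{
    // Same pattern as above; the timeout is an assumption added as a safety net.
    private static readonly Regex Email = new Regex(
        @"^([0-9a-zA-Z](?>[-.\w]*[0-9a-zA-Z])*@(?>[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
        RegexOptions.None,
        TimeSpan.FromSeconds(1));

    public static bool IsMatch(string input)
    {
        return Email.IsMatch(input);
    }

    public static void Main()
    {
        // The input from the question: rejected without the exponential search.
        Console.WriteLine(IsMatch("cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc")); // False

        // An ordinary-looking address still matches.
        Console.WriteLine(IsMatch("john.doe@example.com")); // True
    }
}
```

Once an atomic group has matched, the engine discards the alternative ways of matching inside it, so a failure at the @ cannot trigger a retry of every possible split.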

1 Comment

Hi, Thanks for the detailed answer :-) I'll go with a validation like "do a minimal regex check (^.*@.*$)" - being as we're really just trying to help the user avoid typos like typing e.g. '..'. If they enter the wrong address, it's not the end of the world as we have other email recovery mechanisms. Cheers, Jon
4

Never ever use regex to validate an email.

You can use the MailAddress class to validate it:

    try
    {
        address = new MailAddress(address).Address;
        // address is valid
    }
    catch (FormatException)
    {
        // address is invalid
    }

3 Comments

Hi, I do like that approach, but unfortunately 'System.Net.Mail.MailAddress' isn't available in 'Win8 C#' / WinRT. Do you know an alternative which is available? It also doesn't answer why the above regex is hanging. Thanks, Jon
@JonRea in your regex you are using - inside [], which needs to be escaped like this: \-
@Fake.It.Til.U.Make.It: No, the - only needs to be escaped in a character class if it's not the first or last character.
1

I guess it's because of [-.\w] in the regex; try this instead:

^[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*@(?:(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

Also, .NET 4.5 should have EmailAddressAttribute (in System.ComponentModel.DataAnnotations), though I'm not sure it is available on your platform.

2 Comments

regex is not good for email validation..an actual regex for email id would be far,far,far bigger than this...
It only depends on what you consider a correct email. The MailAddress class may use a regex for email validation too - reflect it :). Also, email formats may be country specific, so regex is the way to go for me
