Email Validation Regular Expression

Posted on October 4, 2007
Filed Under Web Programming, email | 6 Comments

When I was looking for a regular expression to validate email addresses, the examples I found tended to fall into two camps. Camp one consisted of simple regular expressions that were easy to verify and, just as obviously, did little beyond insisting on an @ in the middle. Camp two were more complicated than I wished to reverse engineer, and they did not show their derivation, so I could not check what I was using; often they did not list their trade-offs either.

Therefore, I developed my own and, in order to help those with similar requirements, I present it here, along with its known trade-offs and its derivation (in reverse order).

The RegEx Itself

If you’re in a hurry.

and in a PHP single quoted string:

The Trade-offs

The Derivation

RFC 822 defines an address in the following fashion:

addr-spec   =  local-part "@" domain        ; global address

local-part  =  word *("." word)             ; uninterpreted

word        =  atom / quoted-string

atom        =  1*<any CHAR except specials, SPACE and CTLs>

CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                character and DEL>          ; (    177,     127.)

specials    =  "(" / ")" / "<" / ">" / "@"       ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                            ;   quoted chars.

qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>
quoted-pair =  "\" CHAR                     ; may quote any char

CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)

To take this step by step.

quoted-pair:
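As a sketch of how quoted-pair can be written as a regular expression, here it is using Python's `re` module (the constant name `QUOTED_PAIR` is my own, and this is not necessarily character-for-character the pattern used in this post). The syntax for these constructs is close to the PCRE flavour behind PHP's `preg_match`.

```python
import re

# quoted-pair = "\" CHAR: a backslash followed by any one ASCII
# character (code points 0-127). In the raw string, \\ is a literal
# backslash for the regex and \x00-\x7f is a character range.
QUOTED_PAIR = r'\\[\x00-\x7f]'

print(bool(re.fullmatch(QUOTED_PAIR, '\\"')))       # backslash + quote -> True
print(bool(re.fullmatch(QUOTED_PAIR, '\\\u00e9')))  # backslash + U+00E9, not ASCII -> False
print(bool(re.fullmatch(QUOTED_PAIR, 'a')))         # no backslash -> False
```

Taken literally, this admits a backslash followed by CR or LF. If the header-injection concern raised for qtext applies here too, remove `\x0a` and `\x0d` from the class.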

qtext: (Not strictly to spec: I refuse LF as well as CR, in case something further down the line handles header injection attempts poorly.)
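One possible character class for qtext under that stricter rule, sketched in Python (`QTEXT` is an illustrative name): every ASCII character except the double quote (0x22), the backslash (0x5C), CR (0x0D) and LF (0x0A). Refusing CR and LF also means folded linear-white-space cannot appear, while plain spaces and tabs still can.

```python
import re

# qtext: ASCII 0-127 minus LF (\x0a), CR (\x0d), <"> (\x22) and "\" (\x5c),
# written as the ranges that remain once those four are cut out.
QTEXT = r'[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'

for ch in ['a', ' ', '\t', '"', '\\', '\r', '\n']:
    print(repr(ch), bool(re.fullmatch(QTEXT, ch)))
```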

quoted-string: (Not strictly to spec, as I insist on at least one character between the quotes; the grammar's * would also allow an empty "".)
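A Python sketch of quoted-string built from qtext and quoted-pair, assuming the stricter qtext (no CR or LF) and using `+` rather than `*` so that `""` is rejected. The names are illustrative, and the two smaller patterns are repeated so the block runs on its own.

```python
import re

QTEXT = r'[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'  # qtext, CR and LF refused
QUOTED_PAIR = r'\\[\x00-\x7f]'                            # "\" CHAR

# quoted-string = <"> *(qtext / quoted-pair) <">, tightened to one or more.
# qtext never starts with "\", so each character has exactly one way to match.
QUOTED_STRING = r'"(?:' + QTEXT + '|' + QUOTED_PAIR + r')+"'

for s in ['"john smith"', '"say \\"hi\\""', '""', '"unterminated']:
    print(s, bool(re.fullmatch(QUOTED_STRING, s)))
```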

atom:
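For atom, removing CTLs (0x00-0x1F and 0x7F), SPACE and the specials from ASCII leaves a fixed set of printable characters, so a sketch can simply list them. `ATOM` is an illustrative name and the block uses Python's `re`.

```python
import re

# atom = 1*<any CHAR except specials, SPACE and CTLs>: printable ASCII
# 0x21-0x7E minus ( ) < > @ , ; : \ " . [ ], listed positively.
ATOM = r"[!#$%&'*+\-/0-9=?A-Z^_`a-z{|}~]+"

for s in ["o'reilly", 'user+tag', 'john.smith', 'a b']:
    print(s, bool(re.fullmatch(ATOM, s)))
```

Note that `john.smith` is not a single atom, because "." is one of the specials; dots are handled one level up, in local-part.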

word:
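word is just the alternation of atom and quoted-string. This Python sketch repeats those patterns so it is self-contained (all names are illustrative) and wraps the alternation in a non-capturing group so it stays contained when embedded in larger patterns.

```python
import re

ATOM = r"[!#$%&'*+\-/0-9=?A-Z^_`a-z{|}~]+"
QTEXT = r'[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'
QUOTED_PAIR = r'\\[\x00-\x7f]'
QUOTED_STRING = r'"(?:' + QTEXT + '|' + QUOTED_PAIR + r')+"'

# word = atom / quoted-string. An atom can never start with <">, so the
# two alternatives never compete for the same input.
WORD = '(?:' + ATOM + '|' + QUOTED_STRING + ')'

for s in ['smith', '"john smith"', 'john smith']:
    print(s, bool(re.fullmatch(WORD, s)))
```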

local-part:
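local-part is one word followed by any number of "." word repetitions. A Python sketch, again with illustrative names and the smaller patterns repeated so it runs alone:

```python
import re

ATOM = r"[!#$%&'*+\-/0-9=?A-Z^_`a-z{|}~]+"
QTEXT = r'[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'
QUOTED_PAIR = r'\\[\x00-\x7f]'
QUOTED_STRING = r'"(?:' + QTEXT + '|' + QUOTED_PAIR + r')+"'
WORD = '(?:' + ATOM + '|' + QUOTED_STRING + ')'

# local-part = word *("." word): no leading, trailing or doubled dots,
# except inside a quoted-string, where "." is ordinary qtext.
LOCAL_PART = WORD + r'(?:\.' + WORD + ')*'

for s in ['john.smith', '"john..smith".x', 'john..smith', '.john', 'john.']:
    print(s, bool(re.fullmatch(LOCAL_PART, s)))
```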

As mentioned above, the domain part was not taken from the spec. The regex for the domain assumes there will be two or more dot-separated sections (take note if you want to allow addresses at a bare top-level domain, or at single-name hosts on internal systems), where each section consists solely of letters, numbers or hyphens and cannot start or end with a hyphen.

section:
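The section rule described above can be sketched in Python as a letter or digit, optionally followed by letters, digits or hyphens that end in a letter or digit (`SECTION` is an illustrative name):

```python
import re

# One domain section: letters, digits and hyphens, with no hyphen at
# either end. [A-Za-z0-9] is used instead of \w, which would also
# accept underscores and non-ASCII letters.
SECTION = r'[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?'

for s in ['example', 'a', 'my-host', '-host', 'host-', 'under_score']:
    print(s, bool(re.fullmatch(SECTION, s)))
```

This enforces no length limit, although DNS labels are capped at 63 octets.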

domain:
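A Python sketch of the domain rule: one section, then one or more "." section repetitions, which is what rejects single-name domains such as `localhost`. Names are illustrative.

```python
import re

SECTION = r'[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?'

# Two or more dot-separated sections. Change the final + to * to also
# accept bare top-level domains or internal host names.
DOMAIN = SECTION + r'(?:\.' + SECTION + ')+'

for s in ['example.com', 'mail.example.co.uk', 'localhost', 'example..com']:
    print(s, bool(re.fullmatch(DOMAIN, s)))
```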

Put it all together and it makes the regular expression at the top of the page, repeated here for convenience:

and in a PHP single quoted string:
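For comparison, here is a Python sketch that assembles the grammar pieces from the derivation into one validator, under the same trade-offs (CR and LF refused in qtext, non-empty quoted strings, two or more domain sections). It is not guaranteed to match the pattern above character for character, and `is_valid_email` is a hypothetical helper name.

```python
import re

# Character-level pieces (ASCII only).
QUOTED_PAIR = r'\\[\x00-\x7f]'                             # "\" CHAR
QTEXT = r'[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'   # no <">, "\", CR or LF
ATOM = r"[!#$%&'*+\-/0-9=?A-Z^_`a-z{|}~]+"                  # no specials, SPACE, CTLs

# local-part, per RFC 822 (quoted strings must be non-empty).
QUOTED_STRING = r'"(?:' + QTEXT + '|' + QUOTED_PAIR + r')+"'
WORD = '(?:' + ATOM + '|' + QUOTED_STRING + ')'
LOCAL_PART = WORD + r'(?:\.' + WORD + ')*'

# domain, not from the RFC: two or more sections with no edge hyphens.
SECTION = r'[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?'
DOMAIN = SECTION + r'(?:\.' + SECTION + ')+'

EMAIL_RE = re.compile(LOCAL_PART + '@' + DOMAIN)


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the whole string matches EMAIL_RE."""
    # fullmatch anchors at both ends; unlike a trailing $, it does not
    # accept an address followed by a newline.
    return EMAIL_RE.fullmatch(address) is not None


for candidate in ['john.smith@example.com', '"john smith"@example.com',
                  'john@localhost', 'john@example.com\n']:
    print(repr(candidate), is_valid_email(candidate))
```

If you port the pattern back to PHP's `preg_match`, pick a delimiter that does not occur in it (the atom class contains `/`), and anchor with `^` and `\z` or add the `D` modifier, because PCRE's `$` also matches just before a final newline.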

Comments

6 Responses to “Email Validation Regular Expression”

  1. Rudolf Olah on October 4th, 2007 14:54 UTC

    RFC 822 is obsoleted by RFC 2822. It says so right at the top of the RFC. So why not just go straight for RFC 2822 implementation rather than implementing the obsoleted version?

    In any case, using regular-expressions is stupid for this because they are ugly and unreadable and how do you *really* know that it validates all email addresses properly? If you read the RFCs, you’ll see that there is a language grammar. You can generate a parser/validator using it. Why not hand-code (or generate) a function rather than fucking around with regular-expressions?

  2. admin on October 4th, 2007 15:20 UTC

    Firstly, if I were to implement 2822 in the same fashion as I have 822 (as I still intend to) then it will include this as the 822 local-part definition is used in the local-part definition for 2822. I have not yet had time to do 2822.

    Secondly, the inclusion of the regex formatted for PHP might give you a clue what language I was using when I developed this. I had limited time and could not find a PHP parser generator in time. Still can’t. You don’t always have the option, depending on your hosting, to do things in other languages and connect them to PHP neatly. If you can point me in the direction of a parser generator for PHP then I for one would be happy to stop “fucking around with regular expressions” in this case.

  3. Rudolf Olah on October 4th, 2007 23:45 UTC

    I just Googled for “LALR PHP” and found this:
    http://pear.php.net/pepr/pepr-proposal-show.php?id=416

    That could be useful. I would still recommend writing a custom function that handles email validation without using regular-expressions. That would be far more educational in my opinion, and far more readable.

  4. Mark Woodman on October 5th, 2007 16:31 UTC

    Just popping in to say hi. I didn’t see any author attributions on your site. Get an About page up for those of us who don’t know you.

    Good luck!

    - Mark

  5. NeverFriday » First Issue of Python Magazine!! on October 6th, 2007 00:04 UTC

    [...] email address may not work properly so you may not receive “. I actually ran across another regular-expression validation scheme for email addresses yesterday! I told the writer to try and avoid the use of regular expressions because it appears [...]

  6. admin on October 6th, 2007 08:21 UTC

    I think it could be worth writing a parser to validate input against a grammar, although largely for the ability to easily plug in a new grammar. Just got to find the time :( . It’s a case of investing time up front to save time later on, I hope.

    I think custom written code runs much the same risk as custom regular expressions. Which you are more comfortable with depends on your experience (and I speak as somebody with 15 years more experience of programming than regular expressions). They generally both need thorough testing and debugging before they are ready.
