Email Validation Regular Expression
Posted on October 4, 2007
Filed Under Web Programming, email | 6 Comments
When I was looking for a regular expression to validate email addresses, the examples I found tended to fall into two camps. Camp one consisted of simple, easily checked regular expressions that, just as obviously, did little beyond insisting on an @ in the middle. Camp two consisted of expressions more complicated than I wished to reverse engineer; they did not show their derivation, so I could not check what I was using, and they often did not list their trade-offs either.
Therefore, I developed my own and, in order to help those with similar requirements, I present it here, along with its known trade-offs and its derivation (in reverse order).
The RegEx Itself
If you’re in a hurry.
and in a PHP single quoted string:
The Tradeoffs
- So far this only implements, with a few tweaks, the local part of RFC 822, not RFC 2822. I hope to get around to RFC 2822 at some point; its definition of local-part includes obs-local-part, which is the same as the RFC 822 definition.
- The domain part of this regex has nothing to do with RFC 822. I have gone for simplicity and maintainability: I did not think it was worth keeping the regex up to date as the list of possible TLDs changes. If you are that determined to ensure a valid, emailable domain, then a regular expression is the wrong tool anyway. You’re better off checking for an MX record on the domain.
The Derivation
RFC 822 defines an address in the following fashion:
addr-spec = local-part "@" domain ; global address
local-part = word *("." word) ; uninterpreted
word = atom / quoted-string
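atom = 1*<any CHAR except specials, SPACE and CTLs>
CTL = <any ASCII control ; ( 0- 37, 0.- 31.)
character and DEL> ; ( 177, 127.)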
specials = "(" / ")" / "<" / ">" / "@" ; Must be in quoted-
/ "," / ";" / ":" / "\" / <"> ; string, to use
/ "." / "[" / "]" ; within a word.
quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
; quoted chars.
qtext = <any CHAR excepting <">, ; => may be folded
"\" & CR, and including
linear-white-space>
quoted-pair = "\" CHAR ; may quote any char
CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
Taking this step by step:
quoted-pair:
qtext: (Not strictly to spec: I have excluded LF as well as CR, in case header injection attempts are handled poorly further down the line.)
quoted-string: (Not strictly to spec: I insist on at least one character between the quotes.)
atom:
word:
local-part:
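As a rough illustration of the steps above, here is a minimal Python sketch that builds a local-part pattern piece by piece from the RFC 822 grammar. It is not the expression from this post: the constant names and the helper `is_valid_local_part` are hypothetical, the character classes are my own reading of the grammar, and the only deviations applied are the two noted above (no LF in qtext, no empty quoted-string). Any other tweaks the post makes are not reproduced.

```python
import re

# quoted-pair = "\" CHAR: a backslash followed by any ASCII character.
QUOTED_PAIR = r"\\[\x00-\x7f]"

# qtext: any ASCII character except '"' (\x22), "\" (\x5c), CR (\x0d)
# and, as a deliberate deviation from the spec, LF (\x0a).
QTEXT = r"[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]"

# quoted-string: one or more qtext / quoted-pair items between double quotes
# (the spec would allow zero; this sketch requires at least one).
QUOTED_STRING = rf'"(?:{QTEXT}|{QUOTED_PAIR})+"'

# atom: one or more printable ASCII characters that are not specials,
# SPACE or control characters.
ATOM = r"[!#-'*+\-/-9=?A-Z^-~]+"

# word = atom / quoted-string
WORD = rf"(?:{ATOM}|{QUOTED_STRING})"

# local-part = word *("." word)
LOCAL_PART = rf"{WORD}(?:\.{WORD})*"


def is_valid_local_part(text: str) -> bool:
    """Return True if the whole string matches the sketched local-part pattern."""
    return re.fullmatch(LOCAL_PART, text) is not None
```

With this sketch, `is_valid_local_part("john.smith")` and `is_valid_local_part('"john smith"')` both return `True`, while `"john..smith"`, `"john smith"` (unquoted) and `'""'` are rejected.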
As mentioned above, the domain part was not taken from the spec. The regex for the domain assumes there will be two or more dot-separated sections (take note if you want to allow email addresses at top-level domains or on internal systems), where each section consists solely of letters, numbers or hyphens and cannot start or end with a hyphen.
section:
domain:
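The domain rules described above (two or more dot-separated sections, each made of letters, digits or hyphens, with no leading or trailing hyphen) could be sketched in Python along these lines. Again, this is an assumption-laden illustration rather than the post's own expression; `SECTION`, `DOMAIN` and `is_valid_domain` are hypothetical names.

```python
import re

# section: starts and ends with a letter or digit; hyphens are only
# allowed in between. A single-character section is also accepted.
SECTION = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# domain: at least two sections separated by dots.
DOMAIN = rf"{SECTION}(?:\.{SECTION})+"


def is_valid_domain(text: str) -> bool:
    """Return True if the whole string matches the sketched domain pattern."""
    return re.fullmatch(DOMAIN, text) is not None
```

Under this sketch `example.com` and `mail.example.co.uk` are accepted, whereas `localhost` (a single section), `-example.com` and `example-.com` are not.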
Putting it all together gives the regular expression at the top of the page, repeated here for convenience:
and in a PHP single quoted string:
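Separately from the expressions in this post, the overall shape of the combination (local-part, then "@", then domain) can be sketched in Python as below. This is a reconstruction from the grammar and the domain description under my own assumptions, not the author's regex or its PHP string form; all names, including `is_valid_email`, are hypothetical.

```python
import re

# Local part, following the RFC 822 grammar with two deviations:
# LF is excluded from qtext, and quoted-strings may not be empty.
ATOM = r"[!#-'*+\-/-9=?A-Z^-~]+"
QUOTED_STRING = (
    r'"(?:[\x00-\x09\x0b\x0c\x0e-\x21\x23-\x5b\x5d-\x7f]'  # qtext
    r'|\\[\x00-\x7f])+"'                                    # quoted-pair
)
WORD = f"(?:{ATOM}|{QUOTED_STRING})"
LOCAL_PART = rf"{WORD}(?:\.{WORD})*"

# Domain, not taken from the RFC: two or more dot-separated sections of
# letters, digits and hyphens, none starting or ending with a hyphen.
SECTION = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = rf"{SECTION}(?:\.{SECTION})+"

# addr-spec = local-part "@" domain
ADDR_SPEC = re.compile(rf"{LOCAL_PART}@{DOMAIN}")


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the sketched addr-spec pattern."""
    return ADDR_SPEC.fullmatch(address) is not None
```

Using `fullmatch` rather than `^...$` anchors is a deliberate choice in this sketch: it avoids accepting an address with a trailing newline, which matters given the header injection concern raised for qtext.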
Comments
6 Responses to “Email Validation Regular Expression”
RFC 822 is obsoleted by RFC 2822. It says so right at the top of the RFC. So why not just go straight for RFC 2822 implementation rather than implementing the obsoleted version?
In any case, using regular expressions is stupid for this because they are ugly and unreadable, and how do you *really* know that it validates all email addresses properly? If you read the RFCs, you’ll see that there is a language grammar. You can generate a parser/validator using it. Why not hand-code (or generate) a function rather than fucking around with regular expressions?
Firstly, if I were to implement RFC 2822 in the same fashion as I have RFC 822 (as I still intend to), it would include this work, because the RFC 822 local-part definition is used in the local-part definition of RFC 2822. I have not yet had time to do RFC 2822.
Secondly, the inclusion of the regex formatted for PHP might give you a clue as to what language I was using when I developed this. I had limited time and could not find a PHP parser generator in time. Still can’t. Depending on your hosting, you don’t always have the option of doing things in other languages and connecting them to PHP neatly. If you can point me in the direction of a parser generator for PHP then I for one would be happy to stop “fucking around with regular expressions” in this case.
I just Googled for “LALR PHP” and found this:
http://pear.php.net/pepr/pepr-proposal-show.php?id=416
That could be useful. I would still recommend writing a custom function that handles email validation without using regular-expressions. That would be far more educational in my opinion, and far more readable.
Just popping in to say hi. I didn’t see any author attributions on your site. Get an About page up for those of us who don’t know you.
Good luck!
- Mark
[...] email address may not work properly so you may not receive “. I actually ran across another regular-expression validation scheme for email addresses yesterday! I told the writer to try and avoid the use of regular expressions because it appears [...]
I think it could be worth writing a parser to validate input against a grammar, although largely for the ability to easily plug in a new grammar. Just got to find the time. It’s a case of investing time up front to save time later on, I hope.
I think custom-written code runs much the same risk as custom regular expressions. Which you are more comfortable with depends on your experience (and I speak as somebody with 15 years more experience of programming than of regular expressions). Both generally need thorough testing and debugging before they are ready.