jstedfast / EmailValidation

A simple (but correct) .NET class for validating email addresses

Possible Incorrect Validations?

Yortw opened this issue:

Hi,

I have had users enter the following addresses, which the code currently passes as valid.

02102535517@example
094222523example@com
3327156@example

Given the complexity of the relevant RFC specs, I am not sure whether these are actually valid or whether the system is incorrectly reporting them as valid (not my area of expertise). Third-party systems we integrate with say they are invalid, and several online validators also said they were invalid, but it's possible those tools don't follow the RFC.

Can you advise if these are valid, and if not, how the code should be modified to reject them?

Thanks.

Syntactically they are valid, although I doubt they exist.

The syntax looks like this:

addr-spec       =   local-part "@" domain

local-part      =   dot-atom / quoted-string / obs-local-part

domain          =   dot-atom / domain-literal / obs-domain

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

atom            =   [CFWS] 1*atext [CFWS]

dot-atom-text   =   1*atext *("." 1*atext)

dot-atom        =   [CFWS] dot-atom-text [CFWS]

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE

In your first example, we would break up the tokens as follows:

02102535517
@
example

The first token is the local-part, which matches dot-atom as a single atom (digits are atext).

The second token is the "@" of the addr-spec.

The third token is the domain, which also matches dot-atom as a single atom.
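The grammar and the token breakdown above can be sketched in Python (the repository itself is a .NET class; this is not its code). It is a minimal sketch that handles only the dot-atom forms: CFWS, quoted-string, domain-literal and the obs-* rules are deliberately left out, and the `split_addr_spec` helper name is hypothetical.

```python
import re

# atext: ASCII letters, digits, and the symbols listed in the grammar.
# Only dot-atom is handled; CFWS, quoted-string, domain-literal and the
# obs-* forms are intentionally ignored in this sketch.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def split_addr_spec(address):
    """Return (local_part, domain) if both halves are dot-atom-text, else None."""
    local_part, at, domain = address.partition("@")  # split at the first "@"
    if not at:
        return None  # no "@" separator at all
    if DOT_ATOM_TEXT.fullmatch(local_part) is None:
        return None
    if DOT_ATOM_TEXT.fullmatch(domain) is None:
        return None  # also rejects a second "@", since "@" is not atext
    return local_part, domain


for addr in ["02102535517@example", "094222523example@com", "3327156@example"]:
    print(addr, "->", split_addr_spec(addr))
```

Each of the three addresses from the issue splits into a local-part and a single-atom domain, which is exactly why a grammar-based validator accepts them.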

FWIW, most email address validators use pretty poor regular expressions that assume a domain must consist of two or more atoms. The really bad ones assume an address must end with ".com", ".net", ".org", or ".edu" (and don't even accept addresses that end in ".uk", ".us", etc.).
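To make the contrast concrete, here are two hypothetical patterns of the kind described, in Python. They are made up for illustration and are not copied from any real validator; `accepts` is likewise a hypothetical helper.

```python
import re

# Illustrative over-strict patterns (hypothetical, not from any real library).
# Requires at least two dot-separated labels in the domain:
AT_LEAST_TWO_LABELS = re.compile(r"[^@\s]+@[^@\s.]+(?:\.[^@\s.]+)+")
# Worse: only accepts a fixed set of suffixes:
FIXED_SUFFIXES = re.compile(r"[^@\s]+@[^@\s]+\.(?:com|net|org|edu)")


def accepts(pattern, address):
    """True if the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


for addr in ["02102535517@example", "user@example.co.uk", "user@example.com"]:
    print(addr, accepts(AT_LEAST_TWO_LABELS, addr), accepts(FIXED_SUFFIXES, addr))
```

The first pattern rejects every single-atom domain, and the second also rejects perfectly ordinary addresses such as `user@example.co.uk`.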

Hi,

Thanks for taking the time to answer and explain; it's really appreciated. I was 80% sure they were syntactically valid, but wanted to be certain before going back to the third party that rejects them and telling them we're not going to change our validation. I'll probably lose anyway: it appears various email clients don't accept them, and no doubt my boss will test that (with Outlook :( ) and decide that means they're invalid, even though those mail clients are known to have issues.

Should those addresses (or ones like them) be added to the list of valid test addresses, so the tests explicitly document that they are valid?

Also, if you don't mind, one final piece of advice: while those addresses are syntactically correct, are they 'correct' for 'public' usage? I mean, technically, could someone register one of those addresses and give it to me, and (if my mail client weren't stupid) could I send mail to it from an address on another system? Or are addresses in that style only valid within a single mail system and not between systems? I don't think I've ever seen a domain with fewer than two atoms, so I'm just trying to make sure I'm fully armed before I get into arguments.

Thanks again.

According to RFC 5321, section 2.3.5, single-atom domains are allowed (a domain consisting of a single atom is a top-level domain).

They probably aren't very common in practice, but servers and clients should be ready to deal with them. It looks like this was asked on Server Fault: http://serverfault.com/questions/721927/can-you-have-an-email-address-with-only-the-top-level-domain-as-the-domain-part According to a comment on the accepted answer, it appears that newer specifications are trying to prohibit email addresses at top-level domains.

In your first and third examples, the domain is "example", which is not a registered top-level domain, so it would not be possible to send messages to those addresses. (The second example's domain is "com", which is a registered top-level domain.)

Here's the official list of registered top-level domains.
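Checking a single-atom domain against such a list is a simple set lookup. The Python sketch below assumes a plain-text form with "#" comment lines and one TLD per line (as far as I know, the layout of IANA's tlds-alpha-by-domain.txt); `parse_tld_list` and `is_registered_tld` are hypothetical names, and the sample string is a tiny illustrative excerpt, not the real file.

```python
def parse_tld_list(text):
    """Parse a one-TLD-per-line list, skipping '#' comment lines; returns lowercase names."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tlds.add(line.lower())
    return tlds


def is_registered_tld(label, tlds):
    """Case-insensitive membership test against the parsed list."""
    return label.lower() in tlds


# Tiny illustrative excerpt in the assumed layout; a real check would load the full list.
sample = "# header comment\nCOM\nNET\nUK\n"
known = parse_tld_list(sample)
print(is_registered_tld("com", known), is_registered_tld("example", known))
```

The list changes over time, so a real check would need to refresh it rather than hard-code it.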

I've committed a patch that adds another boolean argument to the Validate() method, allowing you to specify that you do not want to allow top-level domains.
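The patch itself is C#, and its exact signature isn't shown in this thread, so the following is only a Python sketch of the same idea. The hypothetical `allow_top_level_domains` flag rejects a domain made of a single dot-atom label; its default of `False` is an assumption for illustration, and the sketch handles only the dot-atom forms (no CFWS, quoted strings, domain literals or obsolete syntax).

```python
import re

# atext and dot-atom-text from the RFC grammar (dot-atom forms only).
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def validate(address, allow_top_level_domains=False):
    """Hypothetical dot-atom-only check with an option to reject bare top-level domains."""
    local_part, at, domain = address.partition("@")
    if not at or DOT_ATOM_TEXT.fullmatch(local_part) is None:
        return False
    if DOT_ATOM_TEXT.fullmatch(domain) is None:
        return False
    # A valid dot-atom domain with no "." is a single label, i.e. a top-level domain.
    if not allow_top_level_domains and "." not in domain:
        return False
    return True
```

With this sketch, `validate("02102535517@example")` returns `False`, while `validate("02102535517@example", allow_top_level_domains=True)` returns `True`.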

Thanks, I hugely appreciate the help. I'm sure those addresses are garbage; the question is whether our system should have caught that (given we only promise to validate structure, not that the mailbox exists or is contactable, etc.). The patch to support this is awesome too.

no prob