jstedfast / EmailValidation

A simple (but correct) .NET class for validating email addresses

allowInternational breaks with four-byte UTF-8 characters

JakubJanowski opened this issue

Hi,

Because .NET strings are UTF-16, characters outside the Basic Multilingual Plane, such as "\U00010348" (𐍈), are stored as a surrogate pair and have a length of 2. Unfortunately, this breaks the length checks in the validator.

For example:
"test@𐍈" passes validation, while other addresses with a single-character TLD don't.
"𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈𐍈@example.com" doesn't pass the 64-character local-part length check, even though its local part has only 33 characters (66 UTF-16 code units).

I don't think there is a built-in function to get the character count of a string, but maybe char.IsSurrogatePair() will be of use.

Thanks for the bug report.

Yes, char.IsSurrogatePair() is the way to solve this. I use this approach when encoding headers in MimeKit.
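
A minimal sketch of that approach, assuming a hypothetical helper named CountCodePoints (it is not part of EmailValidation or MimeKit, and the actual fix may look different). It walks the string and counts each surrogate pair as one character, so a length limit can be checked against code points rather than UTF-16 code units:

```csharp
using System;
using System.Linq;

static class CodePointCounter
{
    // Hypothetical helper for illustration; not the library's actual code.
    // Counts Unicode code points, treating each surrogate pair as one character.
    public static int CountCodePoints(string text)
    {
        int count = 0;

        for (int i = 0; i < text.Length; i++)
        {
            // A valid high/low surrogate pair occupies two chars but is one code point.
            if (char.IsSurrogatePair(text, i))
                i++; // skip the low surrogate

            count++;
        }

        return count;
    }

    static void Main()
    {
        // 33 copies of U+10348, as in the local part from the report.
        string local = string.Concat(Enumerable.Repeat("\U00010348", 33));

        Console.WriteLine(local.Length);            // 66 UTF-16 code units
        Console.WriteLine(CountCodePoints(local));  // 33 code points
    }
}
```

Counted this way, the TLD in "test@𐍈" is a single character, the same as any other one-character TLD, and the 33-character local part stays under the 64-character limit. An unpaired surrogate is counted as one character in this sketch.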