linkedin / URL-Detector

A Java library to detect and normalize URLs in text

name.lastname@gmail.com parsed as 2 urls

ddosoff opened this issue

I am also seeing this - are there plans to fix it?

I'm seeing the same problem as @davidthemarsh.

I am also affected by this. E-mail addresses containing dots are much more common than I had realized.

Hmm, this is a greedy evaluator, so it matches whatever completes a URL first. What do you think the correct behavior should be in this case? Should it be a single URL, name.lastname@gmail.com, with an identifier saying that the URL is an email?

@tzuhanjan: Expected behavior for me would be "name.lastname@gmail.com" is detected as a single Url with host gmail.com and username name.lastname.
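To make the expected split concrete, here is a minimal sketch that uses the JDK's java.net.URI rather than URL-Detector itself. The class name ExpectedSplit and the prepended "http://" scheme are assumptions made only for illustration; the sketch shows the username/host breakdown described above, not how the library parses input.

```java
import java.net.URI;

// Illustration only: uses java.net.URI from the JDK, not URL-Detector.
class ExpectedSplit {
    // Prepends a scheme so that java.net.URI treats the text before '@'
    // as user info and the text after it as the host.
    static URI parse(String candidate) {
        return URI.create("http://" + candidate);
    }

    public static void main(String[] args) {
        URI uri = parse("name.lastname@gmail.com");
        System.out.println("username = " + uri.getUserInfo());
        System.out.println("host     = " + uri.getHost());
    }
}
```

Here the dot in the local part stays inside the username instead of ending a first match early.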

@tzuhanjan I'm also having this issue. I agree with @worpet: it would be ideal if "name.lastname@gmail.com" were detected as a single URL with host gmail.com and username name.lastname. For my use case, I need to exclude hits that look like pure e-mail addresses, so after detecting, I only include results where url.getUsername().isEmpty() is true. This breaks on any e-mail address with a dot in the local part.
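Until the detector handles this, one possible workaround is to drop tokens that look like bare e-mail addresses before passing the text to the detector. The sketch below is an assumption-laden illustration, not part of URL-Detector: EmailFilter, looksLikeEmail, and withoutEmails are hypothetical names, and the pattern is deliberately loose rather than RFC 5322 compliant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URL-Detector.
class EmailFilter {
    // Loose pattern: dot-separated local part, '@', then a dotted host.
    // Schemes, paths, and ports are not allowed, so "http://user@host.com/x"
    // is not treated as a bare e-mail address.
    private static final Pattern BARE_EMAIL = Pattern.compile(
        "[A-Za-z0-9_%+-]+(\\.[A-Za-z0-9_%+-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+");

    static boolean looksLikeEmail(String token) {
        return BARE_EMAIL.matcher(token).matches();
    }

    // Splits on whitespace and keeps only the tokens that are not bare e-mails.
    static List<String> withoutEmails(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !looksLikeEmail(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(
            withoutEmails("mail name.lastname@gmail.com or visit linkedin.com"));
    }
}
```

The remaining tokens can then be joined and handed to the detector. Note that a token with trailing punctuation, such as an address followed by a comma, would not match this pattern and would need to be trimmed first.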