gernest / mention

Twitter like mentions and #hashtags parser for Go(Golang)

Text with a @ such as email addresses is matched

arp242 opened this issue

This code:

func main() {
	fmt.Printf("%#v\n",
		mention.GetTags('@', strings.NewReader("martin@example.com")))
	fmt.Printf("%#v\n",
		mention.GetTags('@', strings.NewReader("martin@example.com"), '.'))
}

Produces the following output:

[]string{"example.com"}
[]string{"example"}

Which is unexpected – at least it is in our use case. If there is a user with the handle @example and someone writes "send an email to martin@example.com!", then this user will be matched if . is in the terminator list. I think most people have . in the terminator list, since otherwise a @handle at the end of a sentence produces the wrong result.

I wanted to write a patch to fix this, but I'm not sure what the best way to handle this would be, as ,@handle or /@handle should be a match. Maybe ignoring [any-unicode-letter]@handle would be a good solution?
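The rule proposed above can be sketched without the library. This is only an illustration: `findMentions` is a hypothetical helper, not part of the mention package, and it assumes that a handle ends at whitespace or at one of the given terminator runes.

```go
package main

import (
	"fmt"
	"unicode"
)

// findMentions is a hypothetical helper, not part of the mention package.
// It returns the handles that follow '@' in s, skipping any '@' that is
// directly preceded by a Unicode letter (as in an email address).
// A handle ends at whitespace or at any rune listed in terminators.
func findMentions(s string, terminators ...rune) []string {
	isTerm := func(r rune) bool {
		if unicode.IsSpace(r) {
			return true
		}
		for _, t := range terminators {
			if r == t {
				return true
			}
		}
		return false
	}

	var out []string
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		if runes[i] != '@' {
			continue
		}
		// The proposed rule: "[any-unicode-letter]@handle" is not a mention.
		if i > 0 && unicode.IsLetter(runes[i-1]) {
			continue
		}
		// Collect runes up to the next terminator or the end of input.
		j := i + 1
		for j < len(runes) && !isTerm(runes[j]) {
			j++
		}
		if j > i+1 {
			out = append(out, string(runes[i+1:j]))
		}
		i = j - 1 // resume scanning at the terminator
	}
	return out
}

func main() {
	fmt.Printf("%q\n", findMentions("send an email to martin@example.com!", '.')) // []
	fmt.Printf("%q\n", findMentions("ping ,@handle or /@handle", '.'))            // ["handle" "handle"]
	fmt.Printf("%q\n", findMentions("Hello @martin.", '.'))                       // ["martin"]
}
```

Note that this sketch only checks for a letter before the @, so an address such as martin2@example.com would still be matched.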

@Carpetsmoker glad to hear you are playing with this. I think we may need to discuss this before you create a patch, since email addresses have a different context than mentions. It seems logical to think that a space following a mention is a better terminator.

Please share your thoughts; I will be really happy to merge your patch. I can't do much right now as I'm at work, so I will come back later tonight and take another look.

Don't hesitate to start the patch as you see fit; we can go from there.

I created a basic PR at #11. I don't know if this will break stuff for other people, though; but I think this may make more sense for most people.

> I can't do much right now as I'm at work, so I will come back later tonight and take another look.

That's okay, no hurry :-)

@Carpetsmoker I have revisited this issue.

The examples seem to work as the library intends. Only one rule is enforced for a mention, which is that a space terminates it; the rest is completely in the user's hands.

So martin@home .com will match home as expected, because of the space.

If you use martin@home.com, there is no way for the lib to know you are dealing with emails, since that is beyond the scope of the library. One easy fix is to let it match home.com and then post-process the matches to weed out the .com suffixes that indicate emails. Consider a user with a dot in their name, such as @martin.home. It is beyond the scope of the lib to assume a lot of things.

I have a feeling this is better handled on the user's side.
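One way to handle this on the user's side, sketched under assumptions: let the parser return raw tags and keep only those that are exact known handles. The `resolve` function and the `known` set are hypothetical stand-ins for whatever user store the application has; neither is part of the mention package.

```go
package main

import "fmt"

// resolve is a hypothetical caller-side filter. It keeps only the tags
// that exactly match a known handle, so an email remnant such as
// "home.com" is dropped while a dotted handle such as "martin.home" stays.
func resolve(tags []string, known map[string]bool) []string {
	var out []string
	for _, t := range tags {
		if known[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Assumed set of handles; in practice this would be a database lookup.
	known := map[string]bool{"home": true, "martin.home": true}

	// Tags as a parser without '.' in its terminator list might return them.
	tags := []string{"home.com", "martin.home"}

	fmt.Println(resolve(tags, known)) // [martin.home]
}
```

Because the match is exact, the user @home is not notified by martin@home.com, and no guess about which suffixes look like email domains is needed.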

It seems to me that this is a very common use case. Many people will want to put . in the terminator list, as otherwise e.g. "Hello @martin." will get parsed as @martin., which is kind of unexpected. AFAIK most programs don't allow . in a mention anyway.

I understand that this can be fixed on the caller's side by a strings.Trim() on the mention, but I think a lot of people may get bitten by this. We have been using this library for almost two years, and haven't noticed this problem until just recently.
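A minimal sketch of that caller-side cleanup, assuming the parser is called without . in the terminator list. It uses strings.TrimRight rather than strings.Trim so that only trailing punctuation is removed; `cleanTag` and its punctuation set are illustrative choices, not part of the mention package.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTag is a hypothetical caller-side helper. It strips trailing
// sentence punctuation from a tag, leaving dots inside the tag alone.
func cleanTag(tag string) string {
	return strings.TrimRight(tag, ".,!?;:")
}

func main() {
	// Tags as a parser without '.' in its terminator list might return them.
	for _, tag := range []string{"martin.", "example.com", "martin.home"} {
		fmt.Printf("%q -> %q\n", tag, cleanTag(tag))
	}
}
```

With this approach "martin." becomes "martin", while "example.com" stays intact and so does not collide with a user whose handle is @example.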

Like I said, I don't know what the best fix is here; my PR is just what seems to work well for us, specifically. I don't know the other use cases.

This is fixed in the v2 branch (see discussion on #11).