undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit

Home Page: http://undertheseanlp.com

Bug detecting names with hyphens.

home15c6 opened this issue

I know hyphenated names like "Jean-Luc Godard" are not typical in Vietnamese, but they may appear in texts, such as news articles.

For ner('Jean-Luc Godard', deep=True):

Expected: B-PER, I-PER, I-PER -> 1 entity
Actual: B-PER, B-PER, I-PER -> 2 entities

Note: The model works as expected for a hyphenated organization name, 'Công ty TNHH Bảo hiểm Nhân thọ Dai-ichi Việt Nam' -> 1 entity
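
To make the entity counts concrete, here is a minimal sketch of how I am counting entities from a BIO tag sequence. The helper count_entities is hypothetical and not part of underthesea; it only assumes the usual BIO convention, where each B- tag opens a new entity and an I- tag continues the previous one. The two tag lists are the expected and actual sequences reported above.

```python
from typing import List


def count_entities(tags: List[str]) -> int:
    """Count entity spans in a BIO tag sequence (hypothetical helper, not an underthesea API)."""
    count = 0
    prev = "O"
    for tag in tags:
        if tag.startswith("B-"):
            # A B- tag always opens a new entity.
            count += 1
        elif tag.startswith("I-"):
            # An I- tag that does not continue a span of the same type
            # is treated as opening a new entity as well.
            if prev == "O" or prev[2:] != tag[2:]:
                count += 1
        prev = tag
    return count


# Tag sequences copied from this report for 'Jean-Luc Godard'.
expected = ["B-PER", "I-PER", "I-PER"]
actual = ["B-PER", "B-PER", "I-PER"]

print(count_entities(expected))
print(count_entities(actual))
```

The second B-PER in the actual output is what splits the name into two entities; the rest of the sequence is the same.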