yohanboniface / sulci

Sulci is a French text-mining toolkit based on the Libération corpus and thesaurus.

Home Page: http://sulci.enix.org

tokenize_text: better matching for initials

yohanboniface opened this issue

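This is about the textutils.tokenize_text function.
We need to see whether its handling of initials can be improved, because it currently also matches things that are not initials. For example, in the sentence below, the "A." of "triple A" simply ends the sentence but gets matched as an initial: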

... avec des taux en net repli, malgré la perte du triple A. Les montants levés...

Sounds hard to fix :/
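
To make the problem concrete, here is a minimal sketch of the false positive and of one possible guard. This is not the actual tokenize_text code: the patterns, the `find_initials` helper and the `SENTENCE_OPENERS` list are all hypothetical, and the guard is only a heuristic (it assumes that a capital letter followed by a period is not an initial when the next word is a common French sentence opener).

```python
import re

# Hypothetical naive pattern: a single capital letter followed by a period.
NAIVE_INITIAL = re.compile(r"\b[A-Z]\.")

# Illustrative, deliberately tiny list of words that often start a French
# sentence. A real list would need to be much larger.
SENTENCE_OPENERS = {"Le", "La", "Les", "Un", "Une", "Des", "Il", "Elle", "Ce", "Cette"}

# Same pattern, but it also captures the capitalized word that follows,
# so the caller can decide whether a new sentence is starting.
GUARDED_INITIAL = re.compile(r"\b([A-Z])\.(?=\s+([A-Z]\w+))")


def find_initials(text):
    """Return candidate initials, skipping those followed by a sentence opener."""
    found = []
    for match in GUARDED_INITIAL.finditer(text):
        next_word = match.group(2)
        if next_word not in SENTENCE_OPENERS:
            found.append(match.group(0))
    return found


sample = "malgré la perte du triple A. Les montants levés"
print(NAIVE_INITIAL.findall(sample))   # the naive pattern matches "A."
print(find_initials(sample))           # the guarded version rejects it
print(find_initials("selon J. Dupont, le taux"))
```

This still fails in plenty of cases (a surname that happens to be in the opener list, accented capitals, an initial at the very end of the text), so it only narrows the problem rather than fixing it.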