hunspell / hunspell

The most popular spellchecking library.

Home Page: http://hunspell.github.io/


Exact word marked as a near miss

ciesiolka opened this issue

This may not be a bug, but rather my mistake or a misunderstanding of how hunspell dictionaries work.

I am trying to create a dictionary for Latin with accents. Consider the word románus. According to its declension, one of its forms is romanórum. To represent that, I created the following dic and aff files:

dic:

1
románus/A

aff:

SET UTF-8

SFX A N 1
SFX A us órum

This doesn't work because those rules generate the word románórum, which is invalid since it has two accents. So I added an OCONV entry:

(...)

OCONV 1
OCONV ánó anó

Running hunspell with that dictionary gives an odd result: románórum is accepted, but romanórum is considered a near miss with suggested spelling romanórum (exactly the same).

Hunspell 1.7.0
románórum
+ románus

romanórum
& romanórum 1 0: romanórum

Maybe I simply misunderstood how ICONV and OCONV work. The explanation in the man page isn't very detailed.
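
One plausible reading of this output, offered as an assumption rather than a statement about Hunspell's internals: OCONV rewrites only what Hunspell prints (such as suggestions), not the forms the affix rules generate. The rule still produces románórum, so romanórum is rejected, and the suggestion románórum is rewritten to romanórum on its way out. The Python sketch below is a toy model of that reading; `apply_suffix`, `oconv`, `check`, and the one-substitution suggester are hypothetical helpers, not Hunspell APIs.

```python
# Toy model only, NOT Hunspell's real code. Assumption: OCONV is applied to
# printed output (suggestions), never to the forms the affix rules generate.

def apply_suffix(stem, strip, add):
    """Mimic an SFX rule: remove `strip` from the end of `stem`, append `add`."""
    if not stem.endswith(strip):
        return None
    return stem[: len(stem) - len(strip)] + add

def oconv(word, table):
    """Simplified output conversion: plain left-to-right string replacement."""
    for src, dst in table:
        word = word.replace(src, dst)
    return word

def one_substitution_apart(a, b):
    """Stand-in for the suggestion engine: same length, exactly one differing character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

OCONV_TABLE = [("ánó", "anó")]                               # OCONV ánó anó
FORMS = ["románus", apply_suffix("románus", "us", "órum")]  # SFX A us órum

def check(word):
    """Return (accepted, suggestions); suggestions pass through OCONV before printing."""
    if word in FORMS:
        return True, []
    suggestions = [oconv(f, OCONV_TABLE) for f in FORMS if one_substitution_apart(word, f)]
    return False, suggestions

print(check("románórum"))  # (True, [])
print(check("romanórum"))  # (False, ['romanórum'])
```

Under this model the suggestion looks identical to the input only because it was converted after the lookup had already failed.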

I'd suggest changing -ánus into -anórum with this aff file:

SET UTF-8
SFX A N 1
SFX A ánus anórum

Depending on the stress patterns, you'll probably end up with several flags, one for each stress pattern.
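
As a rough illustration of why the longer stripping string helps, the Python sketch below applies both suffix rules to románus and then shows what a second flag for another stress pattern might look like. The flag B, its rule, and the entry divínus/B are hypothetical examples that do not come from this thread, and `apply_suffix` and `expand` are illustrative helpers, not Hunspell functions.

```python
def apply_suffix(stem, strip, add):
    """Mimic an SFX rule: remove `strip` from the end of `stem`, append `add`."""
    if not stem.endswith(strip):
        return None
    return stem[: len(stem) - len(strip)] + add

# Original rule: only "us" is stripped, so the accented "á" survives.
print(apply_suffix("románus", "us", "órum"))  # románórum

# One flag per stress pattern: the stripped part includes the accented vowel.
RULES = {
    "A": ("ánus", "anórum"),  # rule from the suggested aff file
    "B": ("ínus", "inórum"),  # hypothetical flag for a different stress pattern
}

def expand(entry):
    """Expand a dic-style entry such as 'románus/A' into its suffixed form."""
    stem, flag = entry.split("/")
    strip, add = RULES[flag]
    return apply_suffix(stem, strip, add)

print(expand("románus/A"))  # romanórum
print(expand("divínus/B"))  # divinórum (hypothetical entry)
```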

echo "romanórum" | hunspell -d la
+ románus

echo "románus" | hunspell -d la
*