hunspell / hunspell

The most popular spellchecking library.

Home Page: http://hunspell.github.io/

invalid endchars in check compound pattern

shantanuo opened this issue

I have this dictionary and affix file for the word भानूत्सवः, and it works correctly.

# cat dicts/sa.dic
2
भानु/x
उत्सवः/x

# cat dicts/sa.aff
SET UTF-8
COMPOUNDMIN 1
COMPOUNDFLAG x

CHECKCOMPOUNDPATTERN 1
CHECKCOMPOUNDPATTERN  ु उ  ू
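For readers unfamiliar with the directive, here is a minimal Python sketch of what the entry above is meant to express. It follows the field names used in the hunspell(5) man page (endchars, beginchars, optional replacement); the issue text below refers to the third field as the "endchar". The class and function names are hypothetical, and this is a simplified model of the rule, not hunspell's actual implementation.

```python
from typing import NamedTuple, Optional


class CompoundPattern(NamedTuple):
    # Field names follow hunspell(5); the class itself is hypothetical.
    endchars: str     # what the first word must end with
    beginchars: str   # what the second word must begin with
    replacement: str  # what stands at the joint in the simplified form


def simplified_compound(first: str, second: str,
                        pattern: CompoundPattern) -> Optional[str]:
    """Return the fused form if the pattern matches the boundary, else None."""
    if not (first.endswith(pattern.endchars)
            and second.startswith(pattern.beginchars)):
        return None
    stem = first[: len(first) - len(pattern.endchars)]
    tail = second[len(pattern.beginchars):]
    return stem + pattern.replacement + tail


u_rule = CompoundPattern("ु", "उ", "ू")    # the entry from sa.aff
aa_rule = CompoundPattern("ा", "आ", "ा")   # the entry added later in this issue

print(simplified_compound("भानु", "उत्सवः", u_rule))   # the word under test
print(simplified_compound("भानु", "उत्सवः", aa_rule))  # None: boundary does not match
```

In this model the second rule simply does not apply to भानु + उत्सवः, so it has no reason to affect the result for भानूत्सवः.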

But the same word is marked as incorrect if I add this entry:

CHECKCOMPOUNDPATTERN  ा आ ा

I do not see why adding an entry should make a previously accepted word incorrect.

There is no problem if I add an entry like this:

CHECKCOMPOUNDPATTERN ा आ ू

This suggests that hunspell does not accept "ा" as an endchar, and that this single entry stops the entire affix file from working.

This is not expected and looks like a bug.

If some characters are not allowed as endchars, that should be documented.
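Because the fields are bare combining marks, they are hard to verify by eye. The following is a small diagnostic sketch in Python that lists the code points in each CHECKCOMPOUNDPATTERN field and compares the declared entry count with the number of entries found. The hunspell(5) man page describes the first CHECKCOMPOUNDPATTERN line as the number of definitions; whether a count mismatch explains the behavior reported here has not been established. The function names and the `AFF_TEXT` constant are hypothetical, and the text is the affix file from this issue with the extra entry appended.

```python
import unicodedata

# Affix content from this issue, with the second entry added (assumed layout).
AFF_TEXT = """\
SET UTF-8
COMPOUNDMIN 1
COMPOUNDFLAG x

CHECKCOMPOUNDPATTERN 1
CHECKCOMPOUNDPATTERN  ु उ  ू
CHECKCOMPOUNDPATTERN  ा आ ा
"""


def describe(field):
    """List each code point in a field with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in field]


def inspect_patterns(aff_text):
    """Return (declared_count, entries) for CHECKCOMPOUNDPATTERN lines."""
    declared = None
    entries = []
    for line in aff_text.splitlines():
        parts = line.split()
        if not parts or parts[0] != "CHECKCOMPOUNDPATTERN":
            continue
        # The first matching line with a single numeric field is the count.
        if declared is None and len(parts) == 2 and parts[1].isdigit():
            declared = int(parts[1])
        else:
            entries.append(parts[1:])
    return declared, entries


declared, entries = inspect_patterns(AFF_TEXT)
print(f"declared entries: {declared}, entries found: {len(entries)}")
for fields in entries:
    for field in fields:
        print(describe(field))
    print()
```

Running it on a real affix file only requires reading the file with `encoding="utf-8"` and passing the text to `inspect_patterns`.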

Closing this issue: when I tested the same word in Python, it worked as expected.

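import hunspell

# Load the dictionary and affix files (dictionary path first, then affix path).
spellchecker = hunspell.HunSpell(
    "./sa_IN.dic",
    "./sa_IN.aff",
)

# spell() returns True when the word is accepted.
print(spellchecker.spell('भानूत्सवः'))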

It seems that applications like Firefox integrate hunspell differently.