hunspell / hunspell

The most popular spellchecking library.

Home Page: http://hunspell.github.io/

NEEDAFFIX with both PFX and SFX

srtxg opened this issue

Hello,

In the Walloon language, the beginning of a word changes depending on the phonetics of the previous word;
I implemented that with PFX rules.

Then I use various SFX rules.
For verbs I use second-level SFX rules, to reduce the number of rules.

It works quite well. However, when both SFX and PFX flags are used, the stripped stem is accepted as a valid word, despite being explicitly flagged with NEEDAFFIX.
(At least, that is the behaviour in version 1.7.0.)

---- x.aff ----

SET UTF-8
FLAG UTF-8

TRY ersainthocuxdlpéymbzîvjåfèwgkêôûçERSAINTHOCUXDLPÉYMBZÎVJÅFÈWGKÊÔÛ’'Ç-

NEEDAFFIX *

# "v" flag is for verbs;
# if the stem given in dic file ends in "é" it is a verb of 1st group (flag "1"),
# and it is also "stem A" of verbs (they can have several stems, but kept simple here)
# the ending "é" is stripped, but I use the "*" flag to tell this stripped stem is not a word
SFX v Y 2
SFX v   é       /1*     é       po:v
SFX v   é       /A*     é       po:v

# rules for 1st group of verbs, the ending "é" is added back; is stemA (bdjA) and past participle (p.p.) form
SFX 1 Y 1
SFX 1   0       é       .       is:bdjA is:p.p.

# rules for "stemA" of verbs (here only the 1st person plural of the present tense)
SFX A Y 2
SFX A   0       ans     [^k]    is:bdjA is:pr. is:1pl
SFX A   k       cans    k       is:bdjA is:pr. is:1pl

# and prefix rule, di- can be elided to d- (eg: diné -> dné)
PFX i Y 2
PFX i   0       0       di      sp:plin
PFX i   di      d       di      sp:spotch


------- x.dic -----

2
diné/iv*        st:diner
viké/v*         st:viker

------ test.txt -----

diné
dné
dinans
dnans
din
dn

viké
vicans
vik


When only SFX is used (e.g. viké/v*), the stripped stem "vik" is correctly rejected as a word;
however, when PFX is used as well (e.g. diné/iv*), the stripped stems "din" and "dn" are incorrectly accepted as valid:

$ hunspell -d x -m test.txt
diné  sp:plin st:diner
diné  st:diner po:v is:bdjA is:p.p.
diné sp:plin  st:diner po:v is:bdjA is:p.p.

dné  sp:spotch st:diner
dné sp:spotch  st:diner po:v is:bdjA is:p.p.

dinans  st:diner po:v is:bdjA is:pr. is:1pl
dinans sp:plin  st:diner po:v is:bdjA is:pr. is:1pl

dnans sp:spotch  st:diner po:v is:bdjA is:pr. is:1pl

din sp:plin  st:diner po:v   <=== wrong

dn sp:spotch  st:diner po:v <==== wrong

viké    st:viker po:v is:bdjA is:p.p.

vicans          st:viker po:v is:bdjA is:pr. is:1pl

vik
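
To make the expectation concrete, here is a toy Python model of how NEEDAFFIX would ideally combine with PFX and SFX for the two entries above. It is only a sketch of the expected result, not of Hunspell's actual lookup algorithm. The rule tables are hand-copied from x.aff, conditions are treated as plain regular expressions, a prefix flag is assumed to always be applied, and the helper names (`suffixed`, `expected_words`) are made up for illustration.

```python
import re

NEEDAFFIX = "*"

# Rules hand-copied from x.aff as (strip, add, continuation flags, condition).
SUFFIXES = {
    "v": [("é", "", "1*", "é"), ("é", "", "A*", "é")],
    "1": [("", "é", "", ".")],
    "A": [("", "ans", "", "[^k]"), ("k", "cans", "", "k")],
}
PREFIXES = {"i": [("", "", "", "di"), ("di", "d", "", "di")]}


def suffixed(word, flags):
    """Yield (form, flags) for the word itself and every suffix chain from it."""
    yield word, flags
    for flag in flags:
        for strip, add, cont, cond in SUFFIXES.get(flag, []):
            if word.endswith(strip) and re.search(f"(?:{cond})$", word):
                yield from suffixed(word[: len(word) - len(strip)] + add, cont)


def expected_words(entry):
    """Forms that should be accepted for one .dic entry such as 'diné/iv*'."""
    word, _, flags = entry.partition("/")
    pfx_flags = [f for f in flags if f in PREFIXES]
    rest = "".join(f for f in flags if f not in PREFIXES)
    # Apply the prefix rules first; the remaining flags (including "*") stay
    # attached to each prefixed stem.
    stems = [word] if not pfx_flags else [
        add + word[len(strip):]
        for f in pfx_flags
        for strip, add, _cont, cond in PREFIXES[f]
        if word.startswith(strip) and re.match(cond, word)
    ]
    # A form counts as a word only if its own flag set lacks NEEDAFFIX.
    return {form for stem in stems for form, fl in suffixed(stem, rest)
            if NEEDAFFIX not in fl}


accepted = expected_words("diné/iv*") | expected_words("viké/v*")
tests = ["diné", "dné", "dinans", "dnans", "din", "dn", "viké", "vicans", "vik"]
print([w for w in tests if w not in accepted])  # the bare stems: din, dn, vik
```

Under this model the only rejected test words are the three stripped stems, which is the result the two lines marked "wrong" above fail to match.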

I would have expected

diné/iv* st:diner

to be equivalent to

diné/v* sp:plin st:diner
dné/v* sp:spotch st:diner

(If I write it like that, it works; but that would require rewriting a thousand lines.)
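
As a possible stopgap, that rewrite could be scripted instead of done by hand. The following is a minimal sketch, assuming single-character flags (as with FLAG UTF-8), that the only prefix flag is "i" with the two rules from x.aff, that prefix conditions are literal strings, and that every entry carrying a prefix flag also carries NEEDAFFIX. The names `PREFIX_RULES`, `expand_entry` and `rewrite_dic` are hypothetical.

```python
# Prefix rules hand-copied from x.aff: (strip, add, condition, morphological field).
PREFIX_RULES = {
    "i": [("", "", "di", "sp:plin"), ("di", "d", "di", "sp:spotch")],
}


def expand_entry(line):
    """Return explicit .dic lines for one entry, with prefix flags applied by hand."""
    parts = line.split(None, 1)
    entry = parts[0]
    morph = parts[1].strip() if len(parts) > 1 else ""
    word, _, flags = entry.partition("/")
    pfx_flags = [f for f in flags if f in PREFIX_RULES]
    if not pfx_flags:
        return [line.rstrip("\n")]  # nothing to expand, keep the line as it is
    rest = "".join(f for f in flags if f not in PREFIX_RULES)
    out = []
    for f in pfx_flags:
        for strip, add, cond, field in PREFIX_RULES[f]:
            # The condition is treated as a literal prefix (a simplification).
            if word.startswith(cond) and word.startswith(strip):
                new_entry = add + word[len(strip):] + ("/" + rest if rest else "")
                out.append(" ".join(x for x in (new_entry, field, morph) if x))
    return out


def rewrite_dic(lines):
    """Expand every entry after the count line and recompute the count."""
    body = [out for line in lines[1:] if line.strip() for out in expand_entry(line)]
    return [str(len(body))] + body


if __name__ == "__main__":
    for out in rewrite_dic(["2", "diné/iv*        st:diner", "viké/v*         st:viker"]):
        print(out)
```

For the two sample entries this produces the explicit `diné/v*` and `dné/v*` lines shown above and leaves `viké/v*` untouched, but it only sidesteps the question of whether the prefix flag should need this at all.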

Is there something I am missing, or is this behaviour incorrect?

Thanks