jeanm / lijspell

Ligurian spellchecking

Building a spellchecker for Ligurian verbs

This repository contains a minimal spellchecker for the present indicative of first-conjugation Ligurian verbs (those whose infinitive ends in -â).

First conjugation basics

In the most general case, Ligurian regular verbs display apophony: some inflected forms show a vowel change relative to the infinitive. For example, the verb portâ “to carry” alternates between the stem pòrt- [ˈpɔːrt-], which we will call the stressed prefix, and port- [purt-], the unstressed prefix.

Tense        Forms
inf.         portâ
past part.   m: portou, f: portâ, pl: portæ
gerund       portando
pres. ind.   1s: pòrto, 2s: pòrti, 3s: pòrta, 1p: portemmo, 2p: portæ, 3p: pòrtan

Many verbs, however, show no apophony at all, as is the case for lavâ “to wash”. We can treat such verbs as a special case in which the stressed and unstressed prefixes are identical:

Tense        Forms
inf.         lavâ
past part.   m: lavou, f: lavâ, pl: lavæ
gerund       lavando
pres. ind.   1s: lavo, 2s: lavi, 3s: lava, 1p: lavemmo, 2p: lavæ, 3p: lavan
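As a minimal sketch of the idea (in Python, which is not part of this repository), the present indicative can be produced by attaching fixed endings to one of the two prefixes: the 1s, 2s, 3s and 3p forms use the stressed prefix, while 1p and 2p use the unstressed one. The names `PRESENT_ENDINGS` and `present_indicative` are hypothetical, and the sketch does plain concatenation only.

```python
# Present indicative endings of first-conjugation verbs, in table order,
# each paired with the prefix it attaches to.
PRESENT_ENDINGS = [
    ("1s", "o", "stressed"),
    ("2s", "i", "stressed"),
    ("3s", "a", "stressed"),
    ("1p", "emmo", "unstressed"),
    ("2p", "æ", "unstressed"),
    ("3p", "an", "stressed"),
]

def present_indicative(stressed: str, unstressed: str) -> dict[str, str]:
    """Build the present indicative by plain concatenation (no spelling fixes)."""
    forms = {}
    for person, ending, prefix_kind in PRESENT_ENDINGS:
        prefix = stressed if prefix_kind == "stressed" else unstressed
        forms[person] = prefix + ending
    return forms

print(present_indicative("pòrt", "port"))  # portâ: distinct prefixes
print(present_indicative("lav", "lav"))    # lavâ: identical prefixes
```

Applied naively to pagâ (prefix pag-), the same function would give *pagi and *pagemmo, which is why the spelling rules discussed next are needed.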

Much like in Italian, the letters ‹c› and ‹g› represent [tʃ] and [dʒ] respectively when they precede ‹i› or ‹e›, but otherwise they represent [k] and [ɡ].

To represent [tʃ] and [dʒ] in front of vowel letters other than ‹i› or ‹e›, one inserts an ‹i› that represents no sound of its own and only marks the preceding consonant as soft. We therefore have e.g. canta [ˈkaŋta] “he/she sings” → cianta [ˈtʃaŋta] “plant”. Conversely, to write [k] or [ɡ] in front of ‹i› or ‹e›, we insert a silent ‹h›, which turns what would have been [tʃ] and [dʒ] into [k] and [ɡ] respectively, e.g. gia [ˈdʒiːa] “he/she turns” → ghia [ˈɡiːa] “guide”.
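The spelling rule above can be sketched as a lookup from sound to spelling. This is a hypothetical helper, not part of the repository; it assumes, as the text states, that only ‹i› and ‹e› count as front vowel letters (accented variants are not handled), and it keys [ɡ] as a plain ASCII "g" for convenience.

```python
# For each sound: (spelling before ‹i›/‹e›, spelling before any other vowel letter).
# [ɡ] is keyed as plain ASCII "g" here for convenience.
SPELLINGS = {
    "k": ("ch", "c"),    # silent ‹h› keeps [k] before ‹i›/‹e›
    "g": ("gh", "g"),    # silent ‹h› keeps [ɡ] before ‹i›/‹e›
    "tʃ": ("c", "ci"),   # silent ‹i› marks [tʃ] before other vowels
    "dʒ": ("g", "gi"),   # silent ‹i› marks [dʒ] before other vowels
}

FRONT_VOWEL_LETTERS = ("i", "e")  # accented variants are not handled

def spell(sound: str, vowels: str) -> str:
    """Spell one of [k], [ɡ], [tʃ], [dʒ] followed by the vowel letters `vowels`."""
    before_front, before_other = SPELLINGS[sound]
    consonant = before_front if vowels.startswith(FRONT_VOWEL_LETTERS) else before_other
    return consonant + vowels

print(spell("k", "a") + "nta", spell("tʃ", "a") + "nta")  # canta cianta
print(spell("dʒ", "ia"), spell("g", "ia"))                # gia ghia
```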

Verb conjugations therefore sometimes require a “silent” ‹h› or ‹i› to be inserted. Consider the conjugation of pagâ “to pay”:

Tense        Forms
inf.         pagâ
past part.   m: pagou, f: pagâ, pl: pagæ
gerund       pagando
pres. ind.   1s: pago, 2s: paghi, 3s: paga, 1p: paghemmo, 2p: pagæ, 3p: pagan

Note how we have paghemmo [paˈɡemˑu] and not *pagemmo [paˈdʒemˑu], as well as paghi [ˈpaːɡi] and not *pagi [ˈpaːdʒi]. The ‹h› gets inserted to maintain the -[ɡ]- sound in front of ‹e› and ‹i›.
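The ‹h› insertion in pagâ can be sketched as a rule applied when an ending is attached to a stem whose final ‹c› or ‹g› must stay hard. The name `attach_hard` is hypothetical, and the sketch assumes it is only called for stems whose final ‹c›/‹g› is meant to be hard (it would wrongly add ‹h› to a soft stem).

```python
FRONT_VOWEL_LETTERS = ("i", "e")

# Present indicative endings in table order: 1s, 2s, 3s, 1p, 2p, 3p.
ENDINGS = ["o", "i", "a", "emmo", "æ", "an"]

def attach_hard(stem: str, ending: str) -> str:
    """Attach an ending to a stem whose final ‹c›/‹g› must stay hard ([k]/[ɡ]).

    A silent ‹h› is inserted between the stem and an ending that starts
    with ‹i› or ‹e›; otherwise the two are simply concatenated.
    """
    if stem.endswith(("c", "g")) and ending.startswith(FRONT_VOWEL_LETTERS):
        return stem + "h" + ending
    return stem + ending

print([attach_hard("pag", e) for e in ENDINGS])
# ['pago', 'paghi', 'paga', 'paghemmo', 'pagæ', 'pagan']
```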

Conversely, observe the conjugation of mangiâ [maŋˈdʒaː] “to eat”, where the ‹i› is “silent”, its only purpose being to mark [dʒ] rather than [ɡ]:

Tense        Forms
inf.         mangiâ
past part.   m: mangiou, f: mangiâ, pl: mangiæ
gerund       mangiando
pres. ind.   1s: mangio, 2s: mangi, 3s: mangia, 1p: mangemmo, 2p: mangiæ, 3p: mangian

Note how we have mangemmo [maŋˈdʒemˑu] and not *mangiemmo: since ‹g› already has the “soft” sound [dʒ] in front of ‹e›, the “silent” ‹i› is dropped. For the same reason we have mangi and not *mangii.

There are, however, cases where the ‹i› is not silent, such as giâ “to turn” [dʒiˈaː] ~ [ˈdʒjaː]:

Tense        Forms
inf.         giâ
past part.   m: giou, f: giâ, pl: giæ
gerund       giando
pres. ind.   1s: gio, 2s: gii, 3s: gia, 1p: giemmo, 2p: giæ, 3p: gian

Note in this case how we have gii [ˈdʒiː] and giemmo [dʒiˈemˑu] ~ [ˈdʒjemˑu], and not *gi [ˈdʒi] and *gemmo [ˈdʒemˑu].
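Because mangiâ and giâ are both spelled with ‹gi›, spelling alone cannot tell whether the ‹i› should be dropped before ‹e› and ‹i›. A minimal sketch therefore takes the distinction as an explicit per-verb flag; `attach_soft` and its `silent_i` parameter are hypothetical names chosen for illustration.

```python
FRONT_VOWEL_LETTERS = ("i", "e")

# Present indicative endings in table order: 1s, 2s, 3s, 1p, 2p, 3p.
ENDINGS = ["o", "i", "a", "emmo", "æ", "an"]

def attach_soft(stem: str, ending: str, silent_i: bool) -> str:
    """Attach an ending to a stem spelled with a final ‹ci› or ‹gi›.

    If the ‹i› is silent (it only marks [tʃ]/[dʒ]), it is dropped before an
    ending starting with ‹i› or ‹e›, where ‹c›/‹g› are already soft.
    If the ‹i› is pronounced, it is always kept.
    """
    if not stem.endswith(("ci", "gi")):
        raise ValueError(f"expected a stem ending in ‹ci› or ‹gi›: {stem!r}")
    if silent_i and ending.startswith(FRONT_VOWEL_LETTERS):
        return stem[:-1] + ending
    return stem + ending

# mangiâ has a silent ‹i›, giâ a pronounced one: the flag comes from the lexicon.
print([attach_soft("mangi", e, silent_i=True) for e in ENDINGS])
# ['mangio', 'mangi', 'mangia', 'mangemmo', 'mangiæ', 'mangian']
print([attach_soft("gi", e, silent_i=False) for e in ENDINGS])
# ['gio', 'gii', 'gia', 'giemmo', 'giæ', 'gian']
```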

Spellchecker

To build a spellchecker for a moderately inflected language like Ligurian, listing every form of every regular verb by hand would be impractical. Instead, Hunspell, the software we will be using, lets us define inflection rules that generate these forms from a much smaller list of entries.

The files lij.aff and lij.dic define a basic spellchecker for the present indicative of portâ, lavâ, pagâ and mangiâ. Their format is described in hunspell’s section 5 man page, hunspell(5).

Look at the .aff file. Ignoring for the time being the TRY, MAP and REP directives, you will want to focus on the SFX directives, which define suffixation rules to produce inflected forms of the verbs starting from the stressed prefix (the aa rules) and the unstressed prefix (the AA rules). The .dic file is where we tell hunspell what to apply these rules to, with lines such as pòrto/aa (= “apply stressed suffixation rules to pòrto”).

Read through the hunspell(5) man page to see whether you can work out the syntax of these rules.
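To make the SFX directives concrete, the following Python sketch applies rules of the form `SFX flag strip affix condition` to a word: the condition is checked against the end of the word, the `strip` characters are removed, and `affix` is appended, with `0` meaning an empty string. This is a simplified emulation, not hunspell itself: it ignores the header line's cross-product field, continuation flags and morphological fields, and the rules below are hypothetical, not copied from lij.aff.

```python
import re

# One simplified SFX rule: (strip, affix, condition), mirroring the
# "SFX flag strip affix condition" lines of an .aff file.
Rule = tuple[str, str, str]

def apply_suffix_rules(word: str, rules: list[Rule]) -> list[str]:
    """Return the forms obtained by applying each matching rule to word."""
    forms = []
    for strip, affix, condition in rules:
        strip = "" if strip == "0" else strip    # "0" = strip nothing
        affix = "" if affix == "0" else affix    # "0" = add nothing
        # The condition (".", letters, [...] or [^...]) must match the
        # end of the word *before* stripping.
        if not re.search(condition + "$", word):
            continue
        if not word.endswith(strip):
            continue
        forms.append(word[: len(word) - len(strip)] + affix)
    return forms

# Hypothetical rules for a stressed-prefix flag, applied to 1s entries:
# they derive the 2s, 3s and 3p forms.
STRESSED_RULES: list[Rule] = [
    ("o", "i", "[^cg]o"),   # pòrto -> pòrti
    ("go", "ghi", "go"),    # pago  -> paghi (keeps [ɡ] before ‹i›)
    ("co", "chi", "co"),    # same idea for a hard ‹c›
    ("o", "a", "o"),        # pòrto -> pòrta
    ("o", "an", "o"),       # pòrto -> pòrtan
]

print(apply_suffix_rules("pago", STRESSED_RULES))  # ['paghi', 'paga', 'pagan']
```

With these toy rules, mangio would come out as *mangii, so a complete rule set needs additional rules for stems with a silent ‹i›.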

Once you’ve installed hunspell (on macOS, with Homebrew: brew install hunspell), you can verify that this dictionary produces the right forms with the unmunch command, which lists every form the dictionary generates:

$ unmunch lij.dic lij.aff
…
pòrto
pòrti
pòrta
pòrtan
portemmo
portæ
lavo
lavi
lava
lavan
lavemmo
lavæ
pago
paghi
paga
pagan
paghemmo
pagæ
mangio
mangi
mangia
mangian
mangemmo
mangiæ
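To check this output mechanically, one can compare what unmunch prints against the expected forms as sets. A minimal sketch, assuming unmunch is on the PATH and writes UTF-8 to standard output; `run_unmunch` and `compare_forms` are hypothetical helper names. Since unmunch may print lines other than the ones listed above, anything extra ends up in the `unexpected` set rather than causing an error.

```python
import subprocess

def run_unmunch(dic_path: str = "lij.dic", aff_path: str = "lij.aff") -> str:
    """Run unmunch on the dictionary and return what it prints (assumed UTF-8)."""
    result = subprocess.run(
        ["unmunch", dic_path, aff_path],
        capture_output=True,
        encoding="utf-8",
    )
    return result.stdout

def compare_forms(output: str, expected: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected): expected forms absent from the output,
    and non-empty output lines that were not expected."""
    produced = {line.strip() for line in output.splitlines() if line.strip()}
    return expected - produced, produced - expected
```

For example, `compare_forms(run_unmunch(), {"pago", "paghi", "paga", "pagan", "paghemmo", "pagæ"})` should return an empty `missing` set if all six forms of pagâ are generated.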

Next steps

The examples above only describe the present indicative of first conjugation verbs. The full conjugation tables for the first conjugation of the four verbs discussed can be found here.

Can you extend the .aff file to produce the full conjugation?

Can you also deal with the case of giâ? Remember: the ‹i› there is not silent! You may want to define a new set of suffixation rules for cases with a non-silent ‹i›, since whether ‹i› is silent or not cannot simply be determined by the spelling of the word.
