mhulden / foma

Automatically exported from code.google.com/p/foma

Looking for advice on unsuccessful compilation due to an out-of-memory error

GoogleCodeExporter opened this issue · comments

Hi there, I've been implementing a morphological analyzer for a complicated 
language, Mapudungun. I'm dealing with verb forms whose basic shape is 
"root/stem + suffixes"; stems can be compositions, reduplications or "basic verb 
forms".
There are about 100 suffixes filling 36 slots. Not all the slots are filled, of 
course, and there are several kinds of restriction: prohibition (if suffix A 
appears, B is not present); obligation (if A appears, B appears too); dependence 
(B needs A in order to appear); sequence (if A appears, the sequence BC must 
appear as well)...
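
For concreteness, the four kinds of restriction might be written in foma roughly as below. This is only a sketch: a, b and c are placeholder symbols rather than suffixes from the attached file, the names on the left are invented for illustration, and the sequence rule assumes b and c are strictly adjacent.

```
# a, b, c stand for three suffix symbols (placeholders)

# prohibition: a and b never occur in the same form
define Prohibition ~[$a & $b];

# obligation: if a appears, b appears too
define Obligation  [~$a | $b];

# dependence: b appears only if a appears
define Dependence  [~$b | $a];

# sequence: if a appears, the sequence b c appears as well
define Sequence    [~$a | $[b c]];
```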

At the moment I'm dealing with prohibition only, which is the most extensive 
part. Even though I've tried many ways, I cannot make the file compile with the 
prohibition rules. Without them the file compiles and works, but it gives a lot 
of possible analyses, many of which are wrong because the restriction rules are 
missing...

So my question is whether you guys, gurus of FST, could have a look at my file 
(it doesn't need to be a deep look) and give me some advice on what I should 
change or what I could try. That would be great and I would be thankful forever...

I attach my file, thanks again

Original issue reported on code.google.com by andreschandiaf on 11 Mar 2015 at 11:21

Attachments:

Hi, I have split up all the prohibition rules and found many that can be 
dropped or merged into one. This left me with 193 rules, and I still cannot 
compile with all of them: I can apply 117, but when I enable the 118th the 
compilation crashes for lack of memory. I attach the new version of the file; 
maybe it gives a clue that I cannot find...
Thanks again.

Original comment by andreschandiaf on 13 Mar 2015 at 8:27

Attachments:

This seems to be an illustration of a common problem with multiple rules or 
constraints. In general, there is the danger of an exponential growth in the 
size of a combined set of rules or constraints, since each composition or 
intersection with an independent rule or constraint may cause the resulting 
automaton/transducer to grow by a constant factor.
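
As a rough illustration of that growth (a toy sketch, not taken from the attached grammar), the script below intersects three independent prohibition constraints over placeholder symbols a to f. The names C1, C2, C3 are invented for the example. foma prints the size of each network after every regex command, and one would expect the state count to roughly triple with each added constraint, because the combined automaton has to remember, for every pair, which member it has already seen.

```
# Three independent prohibition constraints over placeholder symbols.
# Each one says: the two named symbols never occur in the same string.
define C1 ~[$a & $b];
define C2 ~[$c & $d];
define C3 ~[$e & $f];

# One constraint on its own is a small automaton.
regex C1;

# Intersecting unrelated constraints multiplies the state counts.
regex C1 & C2;
regex C1 & C2 & C3;
```

With 193 such rules and no lexicon to limit which combinations can actually occur, this kind of multiplication is what exhausts memory.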

The way to avoid this is to include a lexicon *before* the first 
composition/intersection with the constraints/rules.

For example, in a traditional rewrite-rule grammar, it is always recommended 
that one use the design:

define Grammar Lexicon .o. Rule1 .o. Rule2 .o. ... .o. RuleN;

instead of:

define Rules Rule1 .o. Rule2 .o. ... .o. RuleN ;
define Grammar Lexicon .o. Rules;

since the second option may produce an intermediate result that is very large 
(Rules), while the final result (Grammar) could still be very small.

For the current grammar, my recommendation is that you focus on introducing 
VERBFORM before the composition of the constraints. How this can be done 
depends on how you designed the grammar. If VERBFORM is an automaton/acceptor 
(which it looks like it is), you can just create that first, and do:

define PrRu VERBFORM .o. PrRu001 .o. PrRu002 .o. ...

since the order of composition doesn't matter in that case.
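
A minimal sketch of the acceptor case, with a toy VERBFORM in place of the real one. Everything here apart from the names VERBFORM and PrRu001 to PrRu003 is a placeholder: the root, the three suffix slots and the symbols A to F are assumptions made up for the example.

```
# Toy acceptor: a root followed by three optional suffix slots, in order.
define Root  {kon};
define Slot1 [A | B];
define Slot2 [C | D];
define Slot3 [E | F];
define VERBFORM Root (Slot1) (Slot2) (Slot3);

# Toy prohibition rules over the suffix symbols.
define PrRu001 ~[$A & $D];
define PrRu002 ~[$B & $E];
define PrRu003 ~[$C & $F];

# Lexicon first: every intermediate result is already restricted to
# forms that VERBFORM allows, so it cannot grow past the lexicon.
define PrRu VERBFORM .o. PrRu001 .o. PrRu002 .o. PrRu003;

regex PrRu;
print words
```

Because VERBFORM is an acceptor in this sketch, each .o. amounts to an intersection, which is why the constraints can be applied in any order.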

If, on the other hand, VERBFORM is a transducer, you may have to invert the 
constraint process and do it maybe like this:

define PrRu [VERBFORM.i .o. PrRu193 .o. PrRu192 ... ].i;
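
A toy version of the transducer case may make the inversion clearer. It assumes the constraints refer to symbols on the upper (analysis) side of VERBFORM; the root, the suffix tags A to D and their surface spellings are placeholders invented for the example.

```
# Toy transducer: suffix tags on the upper side, surface strings below.
define Root  {kon};
define Slot1 [ [A .x. {la}] | [B .x. {me}] ];
define Slot2 [ [C .x. {ke}] | [D .x. {fi}] ];
define VERBFORM Root (Slot1) (Slot2);

# Toy prohibition rules over the upper-side tags.
define PrRu001 ~[$A & $D];
define PrRu002 ~[$B & $C];

# Inverting puts the tags on the lower side, where .o. can filter them
# with the lexicon still first in the chain; the final .i restores the
# original orientation.
define PrRu [VERBFORM.i .o. PrRu002 .o. PrRu001].i;

regex PrRu;
print upper-words
```

The result should be the same relation as composing the constraints on top of VERBFORM, only built in an order that keeps the intermediate networks tied to the lexicon.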

The main point is that you should never do a large composition/intersection of 
rules/constraints freely, but tie it to a lexicon first by introducing the 
lexicon as the first element in a chain of compositions.

See also:

https://code.google.com/p/foma/wiki/FAQ#It_takes_forever_to_compose_all_the_rewrite_rules_together_in_my

Original comment by mans.hul...@gmail.com on 14 Mar 2015 at 7:18

Thanks. After reordering some stuff and applying this strategy: define PrRu 
[VERBFORM.i .o. PrRu193 .o. PrRu192 ... ].i; the thing worked, but the 
reduplication rules do not apply any more. I guess I have to find the 
appropriate place for them, but I haven't yet...
Well, thanks again, and here is the resulting file in case of further 
advice ;)

Original comment by andreschandiaf on 17 Mar 2015 at 3:17

Attachments:

Well, here I am again. As I told you in the previous message, the reduplication 
rules do not apply any more. I have tried many different things but I cannot 
bring them back to life; surely I'm not doing it right. Could you please take 
another look at my file and advise me on what I should try? Sorry, but this is 
driving me a little nuts. I attach the file again because it has a lot of 
changes...

Thanks and sorry again.

Original comment by andreschandiaf on 19 Mar 2015 at 7:15

Attachments: