chrisjbryant / lmgec-lite

A language model-based approach to Grammatical Error Correction for English that uses minimal annotated data.

KenLM setup seems to be broken

nmatthews-asapp opened this issue

When running on the 1b.txt file, I get the following error:

=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:29104080 2:684897168 3:3846225240 4:9279470400 5:14419969256
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
----------------------------------------------------------------------------------------------------Last input should have been poison.
[1]    15749 abort (core dumped)  ~/kenlm/build/bin/lmplz -o 5 -S 95% -T tmp/ < 1b.txt > 1b.arpa

Specifically, the "Last input should have been poison" message seems to be the problem. I'm not sure yet whether it's caused by 1b.txt or something else, and I haven't found any troubleshooting info on KenLM's site related to this issue.

A related issue: kpu/kenlm#177, although there it happens on step 4, with a slightly different command.

I think you ran out of disk space. The problem is that exception unwinding causes a destructor check to fire, which hides the real exception. I've changed the code to not abort so you can see the real error message.

Thanks, I'll reinstall from master and try again.

This is surprising, though, as my machine had about 114 GiB of memory free at the time of running.
In step 2 the estimated memory footprint was < 100 GB, and this repo suggests that a 20-40 GB memory footprint is expected.

disk != memory

Whoops, I misread your message. OK, that could be it. I might not have installed it on the right disk (right = the bigger one).

@kpu confirmed: it was lack of disk space. Thanks for fixing the exception reporting.

got it working!
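
For anyone hitting the same thing: since the root cause here was free disk space rather than free memory, a quick pre-flight check on the directory passed to `-T` may save a long failed run. The following is only a minimal sketch using Python's standard library. The helper names `has_free_space` and `format_gib`, the path `"."`, and the 100 GiB threshold are all hypothetical placeholders, not anything KenLM provides or requires; substitute your own temp directory and a size that fits your corpus.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def format_gib(num_bytes):
    """Format a byte count as GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    # Hypothetical values: point tmp_dir at the directory you pass to lmplz -T,
    # and pick a threshold that suits your data. Neither number comes from KenLM.
    tmp_dir = "."
    required = 100 * 2**30

    free = shutil.disk_usage(tmp_dir).free
    print(f"Free space on the disk holding {tmp_dir!r}: {format_gib(free)}")
    if not has_free_space(tmp_dir, required):
        print(f"Less than {format_gib(required)} free; lmplz may fail while writing temporary files.")
```

Note that this checks the disk that holds the temp directory, which is a separate resource from the RAM limit set with `-S`.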