kpu / kenlm

KenLM: Faster and Smaller Language Model Queries

Home Page: http://kheafield.com/code/kenlm/

WebAssembly support?

davidatsurge opened this issue

Has anyone tried compiling this to WASM? Is it at all on the roadmap?

As a KenLM user I can see this being appealing. From a quick look over the README, and with some (limited) knowledge of WebAssembly, I believe it would be a fairly substantial challenge. Of course, I could easily have overlooked or misinterpreted something, and would be delighted to be proven wrong!

  • Firstly, the README mentions that murmurhash is used, and apparently it won't "work in asm.js / Emscripten compilation targets, due to [its] usage of unaligned reads" (source).
  • Secondly, the README mentions 64-bit hashes, and WebAssembly is (currently) a 32-bit platform, so that seems like it would also present a problem. File sizes being limited to 2 GB would also limit what you could process with it from within wasm.

Edit: the first point may be inaccurate. Going by the Emscripten documentation, unaligned reads may merely be slow in wasm rather than unsupported: https://emscripten.org/docs/porting/guidelines/portability_guidelines.html#other-issues
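For anyone who wants to experiment, here is a minimal sketch of how a hash can avoid unaligned pointer reads altogether by loading each word through std::memcpy, which is valid C++ at any address. The names LoadU64 and ToyHash64 are hypothetical and not part of KenLM, and the mixing step is a simple FNV-style multiply used only for illustration, not KenLM's MurmurHash. The uint64_t arithmetic is ordinary C++ and does not depend on the pointer width of the target. I have not built this with Emscripten, so treat it as a starting point rather than a tested port.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from KenLM): read 8 bytes from any address.
// memcpy makes the load well-defined even when p is not 8-byte aligned.
inline std::uint64_t LoadU64(const void *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Illustrative 64-bit hash (FNV-style mixing over 8-byte words).
// This is NOT KenLM's MurmurHash; it only shows the access pattern.
inline std::uint64_t ToyHash64(const void *data, std::size_t len,
                               std::uint64_t seed = 0) {
  const unsigned char *bytes = static_cast<const unsigned char *>(data);
  const std::uint64_t kPrime = 0x100000001b3ULL;
  std::uint64_t h = seed ^ 0xcbf29ce484222325ULL;

  // Consume whole 8-byte words with alignment-safe loads.
  while (len >= 8) {
    h = (h ^ LoadU64(bytes)) * kPrime;
    bytes += 8;
    len -= 8;
  }
  // Consume the remaining tail one byte at a time.
  while (len > 0) {
    h = (h ^ *bytes) * kPrime;
    ++bytes;
    --len;
  }
  return h;
}
```

Calling ToyHash64(buf + 1, 16) on a deliberately misaligned pointer gives the same result as hashing an aligned copy of the same bytes, which is the property that matters for the unaligned-read concern.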