zseder / hundict

bilingual dictionary extractor from parallel corpora

hundict is an experimental Python project that creates a bilingual dictionary
from parallel corpora.

Features (planned or done):
- easy to use (see hundict -h)
- fast (Python fast, of course, not C fast)
- unigram pairs
  - A - B
- ngram-ngram extraction, not only unigram-unigram
  - ABC - DE
- multiple-choice pairs
  - (A or B) - C
- stopword removal
- printing of the remaining corpora
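
To make the unigram-pair and stopword-removal features concrete, here is a minimal sketch of one common way to pull word pairs out of sentence-aligned text: count how often two words appear in aligned sentences and score each pair with the Dice coefficient. This is an illustration only and not necessarily how hundict works; the function name `dice_pairs` and its parameters are hypothetical and are not part of hundict.

```python
from collections import Counter
from itertools import product


def dice_pairs(src_sents, tgt_sents, stopwords=frozenset(), min_score=0.5):
    """Score unigram pairs by the Dice coefficient over aligned sentences.

    Hypothetical helper for illustration; not hundict's own API.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in zip(src_sents, tgt_sents):
        # Count each word at most once per sentence and drop stopwords.
        src_words = set(src.lower().split()) - stopwords
        tgt_words = set(tgt.lower().split()) - stopwords
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_count.update(product(src_words, tgt_words))

    scored = []
    for (s, t), both in pair_count.items():
        # Dice coefficient: 2 * |S and T| / (|S| + |T|)
        score = 2.0 * both / (src_count[s] + tgt_count[t])
        if score >= min_score:
            scored.append((s, t, score))
    # Best score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))


if __name__ == "__main__":
    hungarian = ["a kutya ugat", "a macska alszik", "a kutya alszik"]
    english = ["the dog barks", "the cat sleeps", "the dog sleeps"]
    for s, t, score in dice_pairs(hungarian, english,
                                  stopwords={"a", "the"}, min_score=1.0):
        print(f"{s} - {t}\t{score:.2f}")
```

Raising `min_score` keeps only pairs that almost always occur together; lowering it admits noisier candidates. Extending the same counting scheme to ngram-ngram or multiple-choice pairs would require generating candidate phrases first, which this sketch does not attempt.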

Languages

Language:Python 100.0%