khazeena / Urdu-ngrams

Unigram, bigram, trigram and 4-gram frequencies

Urdu-ngrams

The Urdu corpus was collected by crawling online newspapers and books.

eins-s.csv has 134,871 unigrams with their frequencies.
zwei-s.txt has 1,337,591 bigrams with their frequencies.
drei-s.txt has 2,742,3554 trigrams with their frequencies.
vier-s.txt has 2,491,480 4-grams with their frequencies.
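The README does not document the column layout of these files. Assuming each line holds an n-gram followed by its count, separated by a tab (or a comma for the .csv file), a minimal Python sketch for loading one of them might look like this. The function name `load_ngram_frequencies` and the separator are assumptions, so check the actual files before relying on it.

```python
from typing import Dict, Iterable


def load_ngram_frequencies(lines: Iterable[str], sep: str = "\t") -> Dict[str, int]:
    """Parse lines of the ASSUMED form '<ngram><sep><count>' into a dict.

    Hypothetical helper: the real file layout is not documented in the README.
    """
    freqs: Dict[str, int] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the LAST separator so multi-word n-grams stay intact.
        ngram, _, count = line.rpartition(sep)
        if not ngram:
            continue  # no separator found: skip malformed line
        try:
            freqs[ngram] = freqs.get(ngram, 0) + int(count)
        except ValueError:
            continue  # non-numeric count, e.g. a header row such as 'word,freq'
    return freqs
```

For example, opening `zwei-s.txt` with `encoding="utf-8"` and passing the file object to `load_ngram_frequencies` would return a dict mapping each bigram to its count, provided the tab-separated layout assumed above holds. For `eins-s.csv`, pass `sep=","`.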

The n-grams were generated using cloud computing.
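The repository does not describe its counting pipeline beyond the use of cloud computing. The following is a hedged, single-machine sketch of how unigram to 4-gram frequencies can be counted from text. The whitespace tokenization and the names `count_ngrams` and `merge_counts` are assumptions for illustration, not the authors' method. `merge_counts` shows how counts from separately processed shards of a corpus could be summed, which is what makes this kind of job easy to distribute.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_ngrams(sentences: Iterable[str], max_n: int = 4) -> Dict[int, Counter]:
    """Count 1- to max_n-grams per order (hypothetical helper)."""
    counts: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        # ASSUMPTION: naive whitespace tokenization; real Urdu text may need more.
        tokens: List[str] = sentence.split()
        for n in range(1, max_n + 1):
            # Slide a window of length n over the token list.
            for i in range(len(tokens) - n + 1):
                counts[n][" ".join(tokens[i:i + n])] += 1
    return counts


def merge_counts(parts: Iterable[Dict[int, Counter]]) -> Dict[int, Counter]:
    """Sum per-shard counts, as a distributed reduce step might (hypothetical)."""
    merged: Dict[int, Counter] = {}
    for part in parts:
        for n, counter in part.items():
            merged.setdefault(n, Counter()).update(counter)  # adds counts
    return merged
```

Counting each shard independently and then summing the `Counter` objects gives the same totals as counting the whole corpus at once, because n-gram counts are additive across sentences.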

The work was done by Hamza Anwar, Manesh Vaswani and Tafseer Ahmed.

License: MIT License