Urdu-ngrams

The Urdu corpus was collected by crawling online newspapers and books.

eins-s.csv has 134,871 unigrams with their frequencies.
zwei-s.txt has 1,337,591 bigrams with their frequencies.
drei-s.txt has 2,742,3554 trigrams with their frequencies.
vier-s.txt has 2,491,480 4-grams with their frequencies.
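
The files above can be read into memory with a few lines of Python. The following is a minimal sketch, not part of the original project: it assumes each line holds an n-gram followed by a delimiter and an integer frequency, and the function name `load_ngram_counts` and the default comma delimiter are illustrative choices. Check the actual files and adjust the delimiter (for example to a tab) if the layout differs.

```python
from collections import Counter
from pathlib import Path


def load_ngram_counts(path, delimiter=","):
    """Read lines of the ASSUMED form '<n-gram><delimiter><frequency>' into a Counter."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Split on the last delimiter so the n-gram itself may contain spaces.
            ngram, sep, freq = line.rpartition(delimiter)
            if not sep:
                continue  # no delimiter found: line does not match the assumed layout
            try:
                counts[ngram.strip()] += int(freq)
            except ValueError:
                continue  # non-numeric frequency, e.g. a header row
    return counts
```

For example, `load_ngram_counts("eins-s.csv").most_common(10)` would list the ten most frequent unigrams, provided the file follows the assumed layout.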

The n-grams were created using cloud computing.

The work was done by Hamza Anwar, Manesh Vaswani and Tafseer Ahmed.

MIT License