The Urdu corpus was collected by crawling online newspapers and books.
eins-s.csv has 134,871 unigrams with their frequencies.
zwei-s.txt has 1,337,591 bigrams with their frequencies.
drei-s.txt has 2,742,3554 trigrams with their frequencies.
vier-s.txt has 2,491,480 4-grams with their frequencies.
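
The on-disk layout of these files is not described here, so the following is only a minimal sketch of how the frequency lists might be read in Python. It assumes each line holds one n-gram followed by a tab and an integer frequency; the function name load_ngram_counts, the tab delimiter, and the sample counts are illustrative assumptions, not part of the corpus. If the files use a comma or another separator, pass it as the delimiter argument.

```python
from collections import Counter


def load_ngram_counts(lines, delimiter="\t"):
    """Parse lines of the ASSUMED form '<n-gram><delimiter><frequency>'.

    The real file layout is not documented, so adjust the delimiter as needed.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split on the last delimiter so n-grams containing spaces stay intact.
        ngram, sep, freq = line.rpartition(delimiter)
        if not sep or not freq.strip().isdigit():
            continue  # skip lines that do not match the assumed layout
        counts[ngram.strip()] += int(freq)
    return counts


# Made-up sample lines (not taken from the corpus) to show the behaviour.
sample = ["یہ کتاب\t12\n", "یہ کتاب\t3\n", "header line\n", "اردو زبان\t7\n"]
counts = load_ngram_counts(sample)
```

To read one of the real files, open it with an explicit encoding (UTF-8 is assumed here) and pass the file object directly, for example load_ngram_counts(open("zwei-s.txt", encoding="utf-8")).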
The n-grams were created using cloud computing.
The work was done by Hamza Anwar, Manesh Vaswani and Tafseer Ahmed.