Source of language corpus

Question

DonaldTsang opened this issue 5 years ago

Where is the source text dataset that the n-grams for the 55 languages were built from? I would like to know whether it differs from the UDHR (Universal Declaration of Human Rights) text used in wooorm/franc#78, and whether it gives more accurate results.
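One rough way to check whether two corpora (for example, UDHR text as in wooorm/franc#78 versus Wikipedia text) yield different n-gram data is to build a character-trigram profile from each and measure how much the top-ranked trigrams overlap. This is a minimal sketch under assumed choices (lowercasing, single-space padding, the top 300 trigrams, Jaccard overlap). It is not how this project actually builds its profiles, and the function names are hypothetical.

```python
from collections import Counter


def trigram_profile(text: str, top_k: int = 300) -> list[str]:
    """Return the top_k most frequent character trigrams of `text`.

    Assumed preprocessing (hypothetical, not this project's): lowercase the
    text and pad it with one space on each side.
    """
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    # Rank by descending count, breaking ties alphabetically so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def profile_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of two trigram profiles (1.0 means identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Usage: compute `trigram_profile(...)` on the same language's text from each corpus and compare them with `profile_overlap`. A low overlap suggests the two sources would produce noticeably different n-gram data, although by itself it says nothing about which one is more accurate.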

Apparently the data comes from Wikipedia, but there is no description of how the text was collected or processed.