Unique words accuracy?
ravidreams opened this issue
Ravishankar Ayyakkannu commented
I ran this today and got
1962477 final-tamil.txt
But the .txt file has a lot of repeated words. Is this normal? I thought final-tamil.txt gives a list of unique words.
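One quick way to confirm repeats like these is to compare the total word count with the distinct word count. The sketch below is not the project's script. It assumes final-tamil.txt is UTF-8 with one word per line, and the function name `duplicate_report` is hypothetical.

```python
from collections import Counter


def duplicate_report(path, top=5):
    """Return (total words, distinct words, most repeated words).

    Hypothetical helper: assumes a UTF-8 file with one word per line.
    Blank lines are skipped.
    """
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    counts = Counter(words)
    # Keep only words that actually occur more than once.
    repeats = [(w, n) for w, n in counts.most_common(top) if n > 1]
    return len(words), len(counts), repeats
```

If `total` is larger than `unique` in `total, unique, repeats = duplicate_report("final-tamil.txt")`, the file contains duplicates, and `repeats` shows the worst offenders.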
Shrinivasan T commented
Thanks for reporting.
Changed the process to find unique words.
Check now.
Ravishankar Ayyakkannu commented
I ran it again and got 1360126 unique words. This differs from the 1197913 shown in the README. Did you get that count today?
Also, you have left out the sort function now. It needs to be added back.
Shrinivasan T commented
My results are for the dump on Jan 6.
Added sorting.
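For reference, deduplicating and sorting can be combined in one pass. The following is a minimal sketch, not the repository's actual code. It assumes the same UTF-8, one-word-per-line format, and `write_sorted_unique` is a hypothetical name. Note that `sorted()` orders strings by Unicode code point, which may differ from traditional Tamil alphabetical order.

```python
def write_sorted_unique(src, dst):
    """Write the distinct words of src to dst, sorted, one per line.

    Hypothetical helper: assumes UTF-8 input with one word per line.
    Sorting is by Unicode code point, not Tamil collation order.
    """
    with open(src, encoding="utf-8") as f:
        unique = {line.strip() for line in f if line.strip()}
    with open(dst, "w", encoding="utf-8") as f:
        for word in sorted(unique):
            f.write(word + "\n")
    # The return value is the unique-word count, comparable to the README figure.
    return len(unique)
```

Counts produced this way still depend on which Wikipedia dump is processed, so they are only comparable when run against the same dump.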