tshrinivasan / tamil-wikipedia-word-list

To get all the Tamil words from the Tamil Wikipedia

Unique words accuracy?

ravidreams opened this issue

I ran this today and got

1962477 final-tamil.txt

But the .txt file has a lot of repeated words. Is this normal? I thought final-tamil.txt was supposed to be a list of unique words.
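For anyone who wants to check the repeats themselves, here is a minimal sketch in Python. It assumes final-tamil.txt holds one word per line in UTF-8; the helper names `read_words` and `count_repeats` are made up for illustration and are not part of this repository.

```python
from collections import Counter


def read_words(path):
    """Read one word per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def count_repeats(words):
    """Return (total, unique, repeated).

    `repeated` maps every word that occurs more than once to its count.
    """
    counts = Counter(words)
    repeated = {word: n for word, n in counts.items() if n > 1}
    return len(words), len(counts), repeated


if __name__ == "__main__":
    import sys

    # Usage: python check_repeats.py final-tamil.txt
    if len(sys.argv) > 1:
        total, unique, repeated = count_repeats(read_words(sys.argv[1]))
        print(f"{total} lines, {unique} unique words, {len(repeated)} words repeated")
```

If the total and unique counts differ, the file still contains repeats.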

Thanks for reporting.

Changed the process to find unique words.

Check now.

I ran it again and got 1360126 unique words. That differs from the 1197913 shown in the README. Did you get that count today?

Also, you have left out the sort step now. Please add it back.

My results are for the dump on Jan 6.
Added sorting.
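
For reference, deduplicating and sorting can be done in a single pass. The sketch below is only an illustration of the idea, not necessarily how the script in this repository does it. It assumes one word per line, and `unique_sorted_words` is a hypothetical name.

```python
def unique_sorted_words(lines):
    """Return the distinct, non-empty words from `lines` in sorted order.

    Sorting is by Unicode code point, which is only an approximation of
    Tamil dictionary order (an assumption made to keep the sketch simple).
    """
    words = {line.strip() for line in lines}
    words.discard("")  # ignore blank lines
    return sorted(words)


if __name__ == "__main__":
    import sys

    # Usage: python unique_sort.py input.txt output.txt
    if len(sys.argv) > 2:
        with open(sys.argv[1], encoding="utf-8") as src:
            words = unique_sorted_words(src)
        with open(sys.argv[2], "w", encoding="utf-8") as dst:
            dst.write("\n".join(words) + "\n")
        print(f"{len(words)} unique words")
```

The count will still change from one Wikipedia dump to the next, so numbers are only comparable when they come from the same dump date.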