boisvert42 / ranking-wikipedia

Perl scripts to rank Wikipedia page titles

ranking-wikipedia

This collection of Perl scripts creates files of ranked Wikipedia pages along the lines of those at http://crosswordnexus.com/wiki. To use:

  1. Download and extract https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
  2. Download and extract https://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2
  3. Run perl WikiExtract.pl to create WikiMonYr.storable
  4. Run perl WiktionaryExtract.pl to create WiktionaryMonYr.storable
  5. Run perl final_rankings.pl WikiMonYr.storable to create RankedWiki.txt and FamousNames.txt
  6. Run perl wiktionary_final_rankings.pl WiktionaryMonYr.storable to create RankedWiktionaryNoInflections.txt
  7. Run python WiktionaryInflect.py to create RankedWiktionary.txt
  8. Run perl combine_wiki_wikt.pl to create RankedWikiWikt.txt
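Steps 1 and 2 only need a streaming decompressor, because the uncompressed dumps are very large and should not be read into memory at once. The following is a minimal sketch in Python (the repo's secondary language) using only the standard-library bz2 and shutil modules. The function name extract_bz2 is hypothetical and is not part of this repo; on the command line, bzip2 -dk <file> does the same job.

```python
import bz2
import shutil
from pathlib import Path
from typing import Optional


def extract_bz2(src: str, dest: Optional[str] = None, chunk_size: int = 1 << 20) -> Path:
    """Stream-decompress a .bz2 dump to disk, chunk_size bytes at a time.

    Hypothetical helper, not part of ranking-wikipedia. If dest is omitted,
    the output path is src minus its trailing ".bz2", e.g.
    enwiki-latest-pages-articles.xml.bz2 -> enwiki-latest-pages-articles.xml.
    """
    src_path = Path(src)
    if dest is None:
        if src_path.suffix != ".bz2":
            raise ValueError(f"expected a .bz2 file, got {src_path.name}")
        dest_path = src_path.with_suffix("")  # strips only the final ".bz2"
    else:
        dest_path = Path(dest)
    # bz2.open also handles multi-stream .bz2 files (Python 3.3+).
    with bz2.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest_path
```

After downloading each dump, call it once per file, e.g. extract_bz2("enwiki-latest-pages-articles.xml.bz2").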
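Steps 3 to 8 must run in the listed order, since later steps read files written by earlier ones. A small driver can chain them and stop at the first failing step. This is a sketch under assumptions: the helper names are hypothetical, the commands are copied verbatim from the list above, and newest() assumes the .storable files follow the WikiMonYr / WiktionaryMonYr naming shown there, where MonYr is whatever tag the extract scripts actually write.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Steps 3-4: build the .storable intermediates from the extracted dumps.
EXTRACT_COMMANDS: List[List[str]] = [
    ["perl", "WikiExtract.pl"],
    ["perl", "WiktionaryExtract.pl"],
]


def ranking_commands(wiki_storable: str, wiktionary_storable: str) -> List[List[str]]:
    """Steps 5-8, in README order, given the .storable files from steps 3-4."""
    return [
        ["perl", "final_rankings.pl", wiki_storable],
        ["perl", "wiktionary_final_rankings.pl", wiktionary_storable],
        ["python", "WiktionaryInflect.py"],
        ["perl", "combine_wiki_wikt.pl"],
    ]


def newest(pattern: str, directory: str = ".") -> str:
    """Return the most recently modified file in `directory` matching `pattern`.

    Hypothetical helper: assumes outputs are named Wiki<MonYr>.storable and
    Wiktionary<MonYr>.storable. "Wiki*.storable" does not match
    "Wiktionary..." because the 4th letter differs ("i" vs "t").
    """
    matches = sorted(Path(directory).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {directory!r}")
    return str(matches[-1])


def run_all(commands: Sequence[Sequence[str]], dry_run: bool = False) -> None:
    """Run each command in order; check=True stops at the first failing step."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)


# Example (run from the directory holding the scripts and extracted dumps):
#   run_all(EXTRACT_COMMANDS)
#   run_all(ranking_commands(newest("Wiki*.storable"), newest("Wiktionary*.storable")))
```

Splitting the steps into two groups lets the driver look up the .storable names only after steps 3 and 4 have created them; passing dry_run=True prints the commands without running anything.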

License: MIT


Languages

Perl 86.0%, Python 14.0%