jeffreyhorner / Wikipedia

Tools to collect Wikipedia string data.

Wikipedia Data Scrape

Tools to create data sets that mimic the SKEW and DISTINCT files from:

Askitis, Nikolas, and Justin Zobel. "Redesigning the string hash table, burst trie, and bst to exploit cache." Journal of Experimental Algorithmics (JEA) 15 (2010): 1-7.
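As a rough illustration of the difference between the two kinds of data set, the sketch below builds both from one small string. It assumes that a SKEW-style file keeps every word occurrence in order (so common words repeat many times) and that a DISTINCT-style file holds each string exactly once. The function names and the tokenizer are hypothetical and are not taken from this repository's scripts.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric strings (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_skew(tokens):
    """Assumed SKEW-style output: every occurrence, in input order."""
    return list(tokens)


def make_distinct(tokens):
    """Assumed DISTINCT-style output: first occurrence of each string only."""
    # dict preserves insertion order, so this de-duplicates while keeping order.
    return list(dict.fromkeys(tokens))


sample = "The cat sat on the mat and the cat slept"
tokens = tokenize(sample)
skew = make_skew(tokens)
distinct = make_distinct(tokens)
counts = Counter(skew)

print(len(skew), len(distinct))   # 10 occurrences, 7 distinct strings
print(counts.most_common(2))      # the most frequent strings dominate the skewed stream
```

On real Wikipedia text the same effect would be far more pronounced: a small number of strings would account for most of the skewed stream, while the distinct set would grow with every new string seen.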

Configure and Build

  1. Execute make all
  2. Execute R --vanilla < create_data_sets.R

Data and code are licensed under the Creative Commons Attribution-ShareAlike License 3.0.

Languages

Python 63.6%, C 17.9%, R 12.4%, Shell 2.2%, C++ 1.7%, Makefile 1.5%, Julia 0.7%