flaviomartins / IndexWikipedia

A simple utility to index wikipedia dumps using Lucene.

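Usage: compile the project with Maven, then run it with two arguments: the path to a Wikipedia XML dump (bzip2-compressed in the example below) and the directory where the Lucene index should be written.

Actual example (both steps run under nohup in the background, so indexing keeps going after you log out):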

nohup mvn compile && nohup mvn -e exec:java -Dexec.args="/home/dlemire/WikipediaDump/enwiki-20130102-pages-articles.xml.bz2 /home/dlemire/WikipediaIndex" &
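The input is a MediaWiki XML export. As a rough illustration only (not this tool's own code), here is a stdlib-only Java sketch of one way to stream the title and wikitext of each page out of such a dump with StAX. The class `WikiPageReader`, its nested `Page` type and the `forEachPage` method are hypothetical names. The sketch assumes the bytes are already decompressed, since the JDK has no built-in bzip2 decoder (a library such as Apache Commons Compress would be needed), and it stops before the Lucene indexing step.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: streams (title, wikitext) pairs out of a MediaWiki XML export. */
final class WikiPageReader {

    /** One <page> element: its title and the wikitext of its revision. */
    static final class Page {
        final String title;
        final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Calls handler once per page, without holding the whole dump in memory. */
    static void forEachPage(InputStream xml, Consumer<Page> handler) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                String title = null;
                while (reader.hasNext()) {
                    // Only element starts matter; skip text, comments, end tags, etc.
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (name.equals("title")) {
                        // Remember the title until this page's <text> element arrives.
                        title = reader.getElementText();
                    } else if (name.equals("text")) {
                        // Assumes one revision (one <text>) per page, as in pages-articles dumps.
                        handler.accept(new Page(title, reader.getElementText()));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML dump", e);
        }
    }
}
```

In a real indexer, the handler would turn each `Page` into a Lucene document instead of collecting it. On a full English dump, the JDK's XML parser may also hit its default entity-size limits (the `jdk.xml.totalEntitySizeLimit` system property is the usual setting to check).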

Extracting word-frequency pairs

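The repository also includes a (somewhat poorly named) utility, CreateFreqSortedDictionary, that extracts all word-frequency pairs from a dump. Here is an example invocation (quoting the classpath keeps the shell from expanding the * wildcard, which java expands itself):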

java -cp "target/classes:target/lib/*" me.lemire.lucene.CreateFreqSortedDictionary /home/dlemire/enwiki-20130102-pages-articles.xml.bz2 garbagedict
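The README does not describe how CreateFreqSortedDictionary works internally. To make "word-frequency pairs" concrete, the stdlib-only sketch below counts word tokens across some texts and sorts them by descending frequency. `FreqDictionarySketch`, `countWords` and `sortByFrequency` are hypothetical names, and the simple regex tokenizer (lowercased runs of letters and digits) stands in for whatever Lucene analyzer the real utility may use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of building a frequency-sorted word dictionary. */
final class FreqDictionarySketch {

    /** Counts lowercase tokens (runs of letters/digits) across all texts. */
    static Map<String, Long> countWords(Iterable<String> texts) {
        Map<String, Long> counts = new HashMap<>();
        for (String text : texts) {
            // Split on anything that is not a letter or decimal digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    /** Sorts entries by descending frequency, breaking ties alphabetically. */
    static List<Map.Entry<String, Long>> sortByFrequency(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries;
    }
}
```

The alphabetical tie-break is only there to make the output deterministic; a `HashMap` alone gives no stable order for words with equal counts.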

License: Apache License 2.0
