taniki / wokipydia

a python library for information retrieval from wikipedia contents with antifascist purposes

Home Page:http://toolkit-python.readthedocs.org/

WeKeyPedia python toolkit

installation

using virtualenv

The PyPI distribution is updated on important releases. During the development phase, this happens approximately every week.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install wekeypedia
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

using development version

If you need the very latest changes, you can install the GitHub master version. It is highly unstable: you get work-in-progress features, their bugs, and their bug fixes in real time.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install https://github.com/wekeypedia/toolkit-python/archive/master.zip
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

usage

get the current content of a page

import wekeypedia

p = wekeypedia.WikipediaPage("Pi")
content = p.get_revision()

print(content)

parse diff result

diff = p.get_diff()
plusminus = p.extract_plusminus(diff)

p.print_plusminus_overview(plusminus)
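
To illustrate what separating additions from deletions in a diff involves, here is a minimal standalone sketch built on the standard library's `difflib`. It is not wekeypedia's implementation: the function name `split_plusminus` and the shape of the returned dictionary are assumptions made for illustration, and it compares two plain-text revisions word by word rather than consuming the output of `get_diff()`.

```python
import difflib


def split_plusminus(old_text, new_text):
    """Return the words added and removed between two revisions.

    Hypothetical helper for illustration only; not part of wekeypedia.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    result = {"added": [], "removed": []}
    # ndiff prefixes each item with "+ " (only in the new sequence),
    # "- " (only in the old sequence), "  " (common to both) or
    # "? " (hint lines, which are skipped here).
    for item in difflib.ndiff(old_words, new_words):
        if item.startswith("+ "):
            result["added"].append(item[2:])
        elif item.startswith("- "):
            result["removed"].append(item[2:])
    return result


if __name__ == "__main__":
    changes = split_plusminus("Pi is roughly 3", "Pi is approximately 3.14")
    print(changes["added"])
    print(changes["removed"])
```

Working at the word level keeps the sketch short; a real revision diff would also need to handle markup and moved paragraphs.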

count stems of a page

print(p.count_stems([content]))
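
As a rough illustration of what counting stems means, the sketch below tokenizes text, reduces each word to a crude stem, and tallies the results. It does not reproduce `count_stems`: wekeypedia relies on NLTK data (see the installation step), whereas the hypothetical `naive_stem` helper here only strips a few common English suffixes so that the example runs with the standard library alone.

```python
import re
from collections import Counter

# A few common English suffixes; an assumption made for illustration,
# far cruder than a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_naive_stems(texts):
    """Count crude stems across a list of texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic runs only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(naive_stem(w) for w in words)
    return counts


if __name__ == "__main__":
    print(count_naive_stems(["Computing digits", "computed digit"]))
```

Like `count_stems` in the usage above, the helper takes a list of texts, so several revisions or pages can be counted together.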

examples and macros

You can explore how the library is currently used by looking at the examples and macros we use to build various datasets.

using virtualenv

$ virtualenv e/py --no-site-packages
$ source e/py/bin/activate
(py)$ pip install -r requirements.txt

License:MIT License


Languages

Language:Python 100.0%