kensho-technologies / kwnlp-preprocessor

Download, parse, and convert raw Wikimedia data into standard formats.



Kensho Wikimedia for Natural Language Processing - Preprocessor

kwnlp_preprocessor is a Python package to help you convert raw Wikimedia data to standard formats.
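This README does not document the `kwnlp_preprocessor` API, so the following is only a rough, standard-library sketch of the download, parse, and convert idea. It streams a Wikimedia `pages-articles` XML dump into JSON Lines. The function names `dump_url`, `iter_pages`, and `convert` are hypothetical. The output fields (`page_id`, `title`, `ns`, `text`) are an assumed schema, not the package's format. The URL pattern follows the public layout of dumps.wikimedia.org.

```python
# Hypothetical sketch, NOT the kwnlp_preprocessor API: convert a Wikimedia
# pages-articles XML dump into JSON Lines using only the standard library.
import json
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Union


def dump_url(wiki: str = "enwiki", date: str = "latest") -> str:
    """Build the download URL of a pages-articles dump on dumps.wikimedia.org."""
    return (
        f"https://dumps.wikimedia.org/{wiki}/{date}/"
        f"{wiki}-{date}-pages-articles.xml.bz2"
    )


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Dict[str, Union[int, str]]]:
    """Yield one record per <page>, parsing the dump incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue  # skip <siteinfo> and the children of <page>
        # Direct children of <page>: title, ns, id, revision (and maybe redirect).
        fields = {_local(child.tag): child for child in elem}
        text = ""
        revision = fields.get("revision")
        if revision is not None:
            for child in revision:
                if _local(child.tag) == "text":
                    text = child.text or ""
        yield {
            "page_id": int(fields["id"].text),
            "title": fields["title"].text,
            "ns": int(fields["ns"].text),
            "text": text,
        }
        elem.clear()  # free the finished page's children to limit memory use


def convert(source: IO[bytes], sink: IO[str]) -> int:
    """Write one JSON object per page to `sink` and return the page count."""
    count = 0
    for page in iter_pages(source):
        sink.write(json.dumps(page, ensure_ascii=False) + "\n")
        count += 1
    return count


# Example usage (assumes the dump was already downloaded from dump_url()):
# import bz2
# with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as src, \
#         open("pages.jsonl", "w", encoding="utf-8") as dst:
#     convert(src, dst)
```

`iterparse` handles the dump one `<page>` at a time instead of loading the whole file. This matters because full English Wikipedia dumps run to tens of gigabytes. Reading the compressed file through `bz2.open` also avoids decompressing it to disk first.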

Quick Install (Requires Python >= 3.6)

# Install the pre-commit hooks (they run our linters)
pip install pre-commit
pre-commit install

pip install .  # This package is not on PyPI yet
# or "pip install -e ." to install in editable mode
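You can confirm from Python that the install worked and that the interpreter meets the stated minimum version. This is a small hypothetical helper, `check_environment`. It assumes the import name matches the package name `kwnlp_preprocessor` given above.

```python
# Hypothetical install check: Python version plus importability of the package.
import importlib.util
import sys
from typing import Dict, Tuple


def check_environment(
    package: str = "kwnlp_preprocessor",  # assumed import name
    min_python: Tuple[int, int] = (3, 6),  # minimum from "Requires Python >= 3.6"
) -> Dict[str, bool]:
    """Report whether Python is new enough and whether `package` is importable."""
    return {
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None for a top-level module that cannot be found
        "package_found": importlib.util.find_spec(package) is not None,
    }


print(check_environment())
```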

Status

This code is not battle-tested production code. It is mostly used by the R&D team to prototype new ideas with Wikimedia data.


License: Apache License 2.0

