svlandeg / wikid

Generate a SQLite database from Wikipedia & Wikidata dumps.

πŸͺ spaCy Project: wikid

No REST for the wikid 🎃: generate a SQLite database and a spaCy KnowledgeBase from Wikipedia & Wikidata dumps. wikid was designed with named entity linking (NEL) in spaCy in mind.
Note that this repository is still experimental, so the public API may change at any time.
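
For NEL, a database like this is mainly used to look up candidate entities for a text mention. This README does not describe wikid's actual schema. The tables `entities` and `aliases`, their columns, and the helpers `build_db`, `add_entity` and `candidates` are hypothetical, and the IDs are made up. The sketch below shows the general alias-to-entity lookup with Python's built-in `sqlite3` module:

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema for illustration only; wikid's real tables may differ.
SCHEMA = """
CREATE TABLE entities (qid TEXT PRIMARY KEY, label TEXT, description TEXT);
CREATE TABLE aliases (alias TEXT NOT NULL, qid TEXT NOT NULL REFERENCES entities(qid));
CREATE INDEX idx_aliases_alias ON aliases(alias);
"""


def build_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the hypothetical tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entity(conn: sqlite3.Connection, qid: str, label: str,
               description: str, aliases: Iterable[str] = ()) -> None:
    """Store one entity; its label is also indexed as an alias."""
    conn.execute("INSERT INTO entities VALUES (?, ?, ?)", (qid, label, description))
    conn.executemany(
        "INSERT INTO aliases VALUES (?, ?)",
        [(alias, qid) for alias in {label, *aliases}],
    )


def candidates(conn: sqlite3.Connection, mention: str) -> List[Tuple[str, str]]:
    """Return (qid, label) pairs whose label or alias matches the mention exactly."""
    rows = conn.execute(
        "SELECT DISTINCT e.qid, e.label FROM aliases AS a "
        "JOIN entities AS e ON a.qid = e.qid WHERE a.alias = ? ORDER BY e.qid",
        (mention,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_db()
    add_entity(conn, "Q0001", "Paris", "capital of France", ["City of Light"])
    add_entity(conn, "Q0002", "Paris Hilton", "media personality", ["Paris"])
    print(candidates(conn, "Paris"))  # [('Q0001', 'Paris'), ('Q0002', 'Paris Hilton')]
```

The ambiguous mention "Paris" returns two candidates. In an NEL pipeline, the linker would then choose between these candidates using context.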

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.
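
spaCy records the state of each command in a `project.lock` file, using checksums of the command's files. The sketch below illustrates only the general idea of skipping a step when its inputs are unchanged. It is simplified: it tracks dependencies only, not outputs. It is not spaCy's implementation, and all function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path) -> str:
    """MD5 of a file, read in 1 MiB chunks so multi-GB dumps never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(deps: List[Path]) -> Dict[str, str]:
    """Map each dependency path to its current checksum."""
    return {str(p): file_checksum(p) for p in deps}


def needs_rerun(deps: List[Path], lock_file: Path) -> bool:
    """A step must run if it never ran or any dependency changed since the last run."""
    if not lock_file.exists():
        return True
    return json.loads(lock_file.read_text()) != snapshot(deps)


def record_run(deps: List[Path], lock_file: Path) -> None:
    """Remember the checksums after a successful run."""
    lock_file.write_text(json.dumps(snapshot(deps)))
```
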

Command          Description
parse            Parse the Wiki dumps. This can take a long time if you're not using the filtered dumps!
download_model   Download the spaCy language model.
create_kb        Create a KnowledgeBase from the SQLite database holding the Wiki content.
delete_db        Delete the SQLite database generated in the parse step, which holds the data parsed from the Wikidata and Wikipedia dumps.
clean            Delete all generated artifacts except for the SQLite database.
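
Part of the parse step's cost comes from reading a large compressed XML file. wikid's own parser may work differently; this is only a sketch of one standard-library way to stream a MediaWiki XML dump such as assets/wikipedia_dump.xml.bz2. It streams pages one at a time, so the full dump never has to fit in memory. The function name `iter_pages` is hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path: str) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (title, wikitext) for each <page> in a bz2-compressed MediaWiki XML dump."""
    with bz2.open(path, "rb") as f:
        title: Optional[str] = None
        text: Optional[str] = None
        for _event, elem in ET.iterparse(f, events=("end",)):
            name = _local(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""  # an empty <text/> element has no .text
            elif name == "page":
                yield title, text
                title = text = None
                elem.clear()  # drop the finished page's children to bound memory use
```

The per-page `elem.clear()` keeps memory roughly flat. The cleared, empty `<page>` elements still remain attached to the root, so a production parser may also prune them.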

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

Workflow   Steps
all        parse → download_model → create_kb
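
The "all" workflow amounts to running its steps in order. The sketch below expands it into one `spacy project run` call per step. The `WORKFLOWS` mapping is copied by hand from the table above; a real tool would read it from the workflows section of project.yml. The helper names are hypothetical.

```python
import shlex
import subprocess
from typing import Dict, List

# Copied from the workflow table; normally this lives in project.yml.
WORKFLOWS: Dict[str, List[str]] = {"all": ["parse", "download_model", "create_kb"]}


def workflow_commands(name: str) -> List[List[str]]:
    """Expand a workflow into one `spacy project run <step>` argv per step."""
    return [["spacy", "project", "run", step] for step in WORKFLOWS[name]]


def run_workflow(name: str, dry_run: bool = True) -> None:
    """Print each command; execute it (stopping on failure) only if dry_run is False."""
    for argv in workflow_commands(name):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Using `check=True` stops the sequence at the first failing step. This mirrors the fact that each step depends on the previous one's output.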

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

File                                            Source   Description
assets/wikidata_entity_dump.json.bz2            URL      Wikidata entity dump. Download can take a long time!
assets/wikipedia_dump.xml.bz2                   URL      Wikipedia dump. Download can take a long time!
assets/wikidata_entity_dump_filtered.json.bz2   URL      Filtered Wikidata entity dump for demo purposes (English only).
assets/wikipedia_dump_filtered.xml.bz2          URL      Filtered Wikipedia dump for demo purposes (English only).
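
Wikidata's standard JSON dump is one large JSON array with one entity per line. The first line is "[", each entity line ends with a comma except the last, and the final line is "]". Assuming assets/wikidata_entity_dump.json.bz2 (and its filtered variant) follow that layout, the dump can be streamed line by line, as sketched below. The helper names are hypothetical, and wikid's own reader may differ.

```python
import bz2
import json
from typing import Any, Dict, Iterator, Optional


def iter_entities(path: str) -> Iterator[Dict[str, Any]]:
    """Stream entities from a bz2-compressed Wikidata JSON dump, one line at a time."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line in ("[", "]", ""):
                continue  # array brackets and blank lines carry no entity
            yield json.loads(line.rstrip(","))  # drop the trailing array separator


def english_label(entity: Dict[str, Any]) -> Optional[str]:
    """Return the entity's English label, or None if it has none."""
    return entity.get("labels", {}).get("en", {}).get("value")
```
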

About

License: MIT License

Languages

Python 99.4%, Shell 0.6%