seamrvaulter / wikidata-dump-processor

Import a Wikidata JSON dump (.json.bz2) into MongoDB

wikidata-dump-processor

Import a Wikidata JSON dump (.json.bz2) into MongoDB.
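As a rough illustration of what such an import involves, here is a minimal sketch that is not taken from this repository. It relies on the Wikidata dump layout (one JSON array with one entity per line, each line ending in a comma) and groups entities into batches, in the spirit of the --chunk_size option mentioned in the performance note. The function names `iter_entities` and `chunked` are hypothetical.

```python
import bz2
import json
from itertools import islice


def iter_entities(path):
    """Yield one entity dict per line of a .json.bz2 Wikidata dump.

    Assumes the usual dump layout: "[" on the first line, "]" on the
    last, and one entity per line followed by a trailing comma.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue  # skip the array brackets and blank lines
            yield json.loads(line)


def chunked(iterable, chunk_size):
    """Group an iterable into lists of at most chunk_size items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

Each batch could then be handed to a bulk write such as PyMongo's `collection.insert_many(batch, ordered=False)`. How the actual tool distributes batches across its workers is not shown here.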

  • Indexed fields: { id: 1 }, { "sitelinks.enwiki.title": 1 }

  • Partial indexes for covered queries: { "sitelinks.enwiki.title": 1, id: 1 }, { "labels.en.value": 1, id: 1 }

  • Performance: about 3 hours to import and about 1 hour to index the 20180717 dump (25 GB), using --nworker 12 and --chunk_size 10000
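The index list above could be created with PyMongo roughly as follows. This is a sketch under assumptions, not the repository's code: the README does not state the partial filter expressions, so the `$exists` filters below are a guess, and `index_specs`, `ensure_indexes`, and `find_id_by_title` are hypothetical names. The functions take any PyMongo `Collection`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def index_specs():
    """Return (keys, options) pairs mirroring the README's index list."""
    return [
        ([("id", ASCENDING)], {}),
        ([("sitelinks.enwiki.title", ASCENDING)], {}),
        # Assumed filters: only index documents that have the field.
        (
            [("sitelinks.enwiki.title", ASCENDING), ("id", ASCENDING)],
            {"partialFilterExpression":
                {"sitelinks.enwiki.title": {"$exists": True}}},
        ),
        (
            [("labels.en.value", ASCENDING), ("id", ASCENDING)],
            {"partialFilterExpression":
                {"labels.en.value": {"$exists": True}}},
        ),
    ]


def ensure_indexes(collection):
    """Create every index on the collection; return the index names."""
    return [
        collection.create_index(keys, **options)
        for keys, options in index_specs()
    ]


def find_id_by_title(collection, title):
    """Look up an entity id by English Wikipedia title.

    The projection keeps only indexed fields and drops _id, which is
    what lets MongoDB answer the query from the index alone.
    """
    return collection.find_one(
        {"sitelinks.enwiki.title": title},
        projection={"_id": 0, "id": 1, "sitelinks.enwiki.title": 1},
    )
```

With a running server, a collection could be obtained with something like `pymongo.MongoClient()["wikidata"]["entities"]`; the database and collection names here are placeholders.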

Languages

Python 100.0%