A Rust-based ZIM extractor, parser and enhancer for the Ethereum Swarm Wikipedia Gitcoin bounty.
This extractor:
- Extracts a `zim` file from https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/.
- Processes the extracted files to:
  a. Remove `head` to minimise space.
  b. Rewrite `src` attributes in `img` and `media` tags.
  c. Gzip the data to minimise storage space (~6x - 8x compression factor achieved on `wiki` articles).
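
As a rough illustration of processing steps (a) and (b), here is a minimal sketch using only the Rust standard library. It is not the code from this repository: the function names `strip_head` and `rewrite_src` and the `prefix` parameter are hypothetical, and the naive string scanning assumes a `<head>` tag without attributes, double-quoted `src` values, and no `>` characters inside attribute values. A real implementation would more likely use an HTML parser.

```rust
/// Remove the first `<head>...</head>` element from an HTML string.
/// Assumption: the opening tag is exactly `<head>` (no attributes).
fn strip_head(html: &str) -> String {
    let start = match html.find("<head>") {
        Some(i) => i,
        None => return html.to_string(),
    };
    let end = match html[start..].find("</head>") {
        Some(i) => start + i + "</head>".len(),
        None => return html.to_string(),
    };
    // Keep everything before and after the head element.
    format!("{}{}", &html[..start], &html[end..])
}

/// Prepend `prefix` to the first double-quoted `src` value of every tag
/// whose name is listed in `tags`. All other text is copied unchanged.
fn rewrite_src(html: &str, tags: &[&str], prefix: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(open) = rest.find('<') {
        // Find the end of this tag; stop scanning if it is unterminated.
        let close = match rest[open..].find('>') {
            Some(i) => open + i + 1,
            None => break,
        };
        out.push_str(&rest[..open]);
        let tag = &rest[open..close];
        // The tag name is the text between '<' and the first space, '/' or '>'.
        let name = tag[1..]
            .split(|c: char| c.is_whitespace() || c == '>' || c == '/')
            .next()
            .unwrap_or("");
        if tags.contains(&name) {
            let replacement = format!("src=\"{}", prefix);
            out.push_str(&tag.replacen("src=\"", &replacement, 1));
        } else {
            out.push_str(tag);
        }
        rest = &rest[close..];
    }
    // Copy whatever remains after the last complete tag.
    out.push_str(rest);
    out
}
```

For example, calling `strip_head` on a page and then `rewrite_src(&page, &["img", "media"], "assets/")` would drop the head element and turn `<img src="a.png">` into `<img src="assets/a.png">`; the `assets/` prefix is only a placeholder. Step (c) is not shown because gzip is not part of the Rust standard library; a crate such as `flate2` would be one option.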
The source code was forked from @dignifiedquire; the original is available at https://github.com/dignifiedquire/zim.