openzim / wikihow

WikiHow scraper

Home Page: https://download.kiwix.org/zim/wikihow/

wikiHow

wikihow2zim is an OpenZIM scraper that creates offline versions of wikiHow websites in all of their supported languages.

⚡ The scraper has a known, very significant issue linked to throttling (#150).

Usage

wikihow2zim works off a language version that you must provide via the --language argument. The list of supported languages is visible in the --help message.
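As a minimal sketch of supplying the --language argument from a Python script, the helper below only builds and prints the command line; it does not run the scraper. The function name build_command and the language code "en" are assumptions made for illustration, and other arguments may be required in practice; the --help message remains the authoritative reference.

```python
def build_command(language):
    """Return the argv list for a wikihow2zim run in one language.

    `language` should be one of the codes listed by --help; the "en"
    used below is only an assumed example value.
    """
    if not language:
        raise ValueError("a language must be provided via --language")
    return ["wikihow2zim", "--language", language]


# Print the command instead of executing it, so the sketch has no side effects.
print(" ".join(build_command("en")))  # "en" is an assumption, check --help
```

Building the command as a list keeps it ready for subprocess.run without shell quoting concerns.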

Docker

docker run -v my_dir:/output ghcr.io/openzim/wikihow wikihow2zim --help
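The same invocation can be assembled from Python when the scraper is driven by a script. This is a sketch only: the helper name docker_command is hypothetical, and it assumes, as the -v option above suggests, that the container writes its results to /output.

```python
import shlex

IMAGE = "ghcr.io/openzim/wikihow"  # image name taken from the command above


def docker_command(host_dir, *scraper_args):
    """Build the `docker run` argv, mounting host_dir on /output."""
    return [
        "docker", "run",
        "-v", f"{host_dir}:/output",
        IMAGE, "wikihow2zim", *scraper_args,
    ]


# Show the resulting shell command without running Docker.
print(shlex.join(docker_command("my_dir", "--help")))
```

Passing the returned list to subprocess.run would start the container; printing it first makes it easy to compare against the command shown above.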

Python

wikihow2zim is Python 3 (3.6+) software. If you are not using the Docker image, you are advised to run it in a virtual environment to avoid installing its dependencies system-wide.

python3 -m venv env
source env/bin/activate

# using published version
pip3 install wikihow2zim
wikihow2zim --help

# running from source
python wikihow2zim/ --help

Call deactivate to quit the virtual environment.
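Before installing, it can help to confirm that the interpreter meets the stated minimum version and that a virtual environment is active. The sketch below uses only the standard library; the function names are hypothetical, and the virtual environment test relies on the venv convention that sys.prefix differs from sys.base_prefix.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this README


def python_is_supported(version_info=None):
    """True when the interpreter version is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv():
    """True when running inside a venv-style virtual environment.

    In a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version supported:", python_is_supported())
print("Inside a virtual environment:", in_virtualenv())
```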

See requirements.txt for the list of Python dependencies.

Contributing

All contributions are welcome!

Please open an issue on GitHub and/or submit a pull request.

Guidelines

  • Don't take assigned issues. Comment if they go stale.
  • If your contribution is far from trivial, open an issue to discuss it first.
  • Ensure your code passes black formatting, isort, and flake8 (88-character line length).

We have a pre-commit hook ready for you. Install it with pip install pre-commit && pre-commit install
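For contributors who want to run the same checks by hand, the sketch below calls the three tools named in the guidelines. The exact options are assumptions (the repository's pre-commit configuration is the authoritative source), and run_checks is a hypothetical helper name.

```python
import shutil
import subprocess

# Assumed invocations; the project's own pre-commit configuration is the
# authoritative source for the exact options.
CHECKS = [
    ["black", "--check", "."],
    ["isort", "--check-only", "."],
    ["flake8", "--max-line-length=88", "."],
]


def run_checks(checks=None):
    """Run each installed check and return the names of those that failed."""
    if checks is None:
        checks = CHECKS
    failed = []
    for cmd in checks:
        if shutil.which(cmd[0]) is None:
            # Tool missing from PATH: report it and move on.
            print("skipping", cmd[0], "(not installed)")
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    return failed
```

Calling run_checks() from the repository root returns an empty list when every installed tool passes, and the names of the failing tools otherwise.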

About

License: GNU General Public License v3.0


Languages

Python 89.4%, HTML 4.6%, CSS 3.5%, Shell 1.1%, Dockerfile 0.9%, JavaScript 0.6%