bgeneto / covid-19

Web scraping script for coronavirus (covid-19) data. It can output PNG graphics, MP4 animated bar chart races, and .dat files. See example output on the home page.

Home Page: https://bgeneto.github.io/

covid-19

This project consists of a single Python 3 script that scrapes the web for coronavirus (covid-19) data and processes it to produce .png graphics files, .mp4 animated bar chart races, and .dat data files that can be used by plotting software such as gnuplot.

You can see the results here:

covid-19 @ github.io

Installing

Clone this repository, install the prerequisite packages, and run the Python script.

cd $HOME
git clone https://github.com/bgeneto/covid-19.git

Prerequisites

This script relies on several Python packages, namely numpy, matplotlib, and pandas (see requirements.txt). You can install all prerequisites by running the following commands:

cd $HOME/covid-19
pip3 install -r requirements.txt
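
To confirm that the install step worked, a quick check like the one below can help. This is a minimal sketch, not part of the project: it only tests the three packages named above, and requirements.txt may list others. The function name missing_packages is made up for this example.

```python
from importlib.util import find_spec

# The three packages named in this README; requirements.txt may list more.
REQUIRED = ("numpy", "matplotlib", "pandas")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```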

Additionally, if you want to create animated bar chart races (-a option), ffmpeg must already be installed on your system. On Windows, download a 64-bit binary manually from one of the links below and extract the executable (.exe) into the same directory as this Python script.

FFmpeg Builds

OR

direct link

As always, life is easier on Linux: run your distribution's install command (apt, yum, etc.) and you are ready to go. For example, on Debian or Ubuntu:

sudo apt update && sudo apt install ffmpeg -y
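
Before using the -a option, you may want to verify that ffmpeg can actually be found. The sketch below is an assumption about how such a check could look, not the script's own logic: it searches the PATH first and then, as described for Windows above, a given directory. The function name find_ffmpeg is hypothetical.

```python
import shutil
from pathlib import Path


def find_ffmpeg(script_dir=None):
    """Return the path to an ffmpeg executable, or None if none is found."""
    # First choice: an ffmpeg available on the system PATH.
    found = shutil.which("ffmpeg")
    if found:
        return found
    # Fallback: a binary placed next to the script (the Windows setup above).
    if script_dir is not None:
        for name in ("ffmpeg.exe", "ffmpeg"):
            candidate = Path(script_dir) / name
            if candidate.is_file():
                return str(candidate)
    return None


if __name__ == "__main__":
    location = find_ffmpeg(Path.cwd())
    print(location or "ffmpeg not found; animations will not work")
```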

Running the code

The script generates an .ini config file on its first run. You can edit this file to suit your needs. There is also an input file named 'countries.txt' where you list the countries to scrape covid-19 data for: number of cases, number of deaths, cases per million people, and all other information generated by the script. The generated files are written to a folder named 'output' in the script's directory.
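
The "create the config on first run, then read it" pattern described above can be sketched with the standard configparser module. Everything specific here is an assumption made for illustration: the section and key names in DEFAULTS, the function names, and the idea that 'countries.txt' holds one country per line with blank lines and '#' comments ignored. The real script may do this differently.

```python
from __future__ import annotations

import configparser
from pathlib import Path

# Hypothetical defaults: the real section and key names are not documented here.
DEFAULTS = {"general": {"language": "en", "graph": "latest"}}


def load_or_create_config(path: Path) -> configparser.ConfigParser:
    """Read the .ini file, writing it with defaults if it does not exist yet."""
    config = configparser.ConfigParser()
    if path.exists():
        config.read(path, encoding="utf-8")
    else:
        config.read_dict(DEFAULTS)
        with path.open("w", encoding="utf-8") as handle:
            config.write(handle)
    return config


def read_countries(path: Path) -> list[str]:
    """Return country names, assuming one per line (an assumed file format)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
```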

python covid19scraper.py 

OR

python covid19scraper.py -a mp4 -l pt -p

Script options

covid19scraper.py [-h] [-v] [-a {gif,html,mp4,png,none}] [-d] [-f] [-g {all,latest,none}] 
                  [-l LANG] [--no-con] [-p]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a {gif,html,mp4,png,none}, --anim {gif,html,mp4,png,none}
                        create (html, mp4, png or gif) animated bar racing charts (requires ffmpeg)
  -d, --dat             output dat files
  -f, --force           force download and regeneration of all data and graphs
  -g {all,latest,none}, --graph {all,latest,none}
                        output line and bar graph files (all = plot every day)
  -l LANG, --lang LANG  output messages in your preferred language (es, de, pt, ...)
  --no-con              do not check for an active Internet connection
  -p, --parallel        parallel execution (min. 6 cores, 8GB RAM)
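
The help text above has the shape produced by Python's argparse module. As an illustration only, a parser with the same flags could be declared as below. The flags, choices, and help strings are copied from the help text; the defaults and the version string are not given in this README, so they are left unset or marked as placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (defaults are unknown)."""
    parser = argparse.ArgumentParser(prog="covid19scraper.py")
    # Placeholder version text: the real version number is not stated here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (version unknown)")
    parser.add_argument("-a", "--anim",
                        choices=["gif", "html", "mp4", "png", "none"],
                        help="create animated bar racing charts (requires ffmpeg)")
    parser.add_argument("-d", "--dat", action="store_true",
                        help="output dat files")
    parser.add_argument("-f", "--force", action="store_true",
                        help="force download and regeneration of all data and graphs")
    parser.add_argument("-g", "--graph", choices=["all", "latest", "none"],
                        help="output line and bar graph files (all = plot every day)")
    parser.add_argument("-l", "--lang",
                        help="output messages in your preferred language")
    parser.add_argument("--no-con", action="store_true",
                        help="do not check for an active Internet connection")
    parser.add_argument("-p", "--parallel", action="store_true",
                        help="parallel execution (min. 6 cores, 8GB RAM)")
    return parser


if __name__ == "__main__":
    # Same arguments as the second example in "Running the code".
    print(build_parser().parse_args(["-a", "mp4", "-l", "pt", "-p"]))
```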

NOTE: Use the -p or --parallel option with caution. It uses up to 6 cores and plenty of memory (8 GB or more, depending on the size of the country list).
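
One common way to enforce a core limit like the one in the note is to cap the size of a process pool. The sketch below shows that idea under stated assumptions: it is not the script's actual implementation, and process_country is a stand-in for whatever per-country work the script does.

```python
import os
from concurrent.futures import ProcessPoolExecutor

MAX_WORKERS = 6  # the cap mentioned in the note


def worker_count(cap=MAX_WORKERS):
    """Use every available core, but never more than `cap` and never fewer than 1."""
    return max(1, min(cap, os.cpu_count() or 1))


def process_country(name):
    """Stand-in for the real per-country work (hypothetical)."""
    return name.upper()


def run_parallel(countries):
    """Process the countries in a pool limited to worker_count() processes."""
    with ProcessPoolExecutor(max_workers=worker_count()) as pool:
        return list(pool.map(process_country, countries))
```

Each worker is a separate process with its own copy of the data it handles, which is one reason memory use grows with the number of workers and countries. On Windows, call run_parallel from inside an `if __name__ == "__main__":` block.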

Translation

Translation files for English and Portuguese are already provided. To add another language, you can use the standard GNU gettext tools or pygettext.py. A template .pot translation file is available in the 'locale' folder. You also have to translate the country names in the 'countries.json' file inside the 'locale/' directory. Never translate the main 'countries.txt' file, or the script will be unable to scrape country data from the web.
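
For reference, loading a gettext catalog from a 'locale' folder in Python usually looks like the sketch below. The domain name "covid19scraper" and the function name load_translator are assumptions for this example; check the .pot file name in 'locale' for the real domain. With fallback=True, a missing translation returns the original English message instead of raising an error.

```python
import gettext


def load_translator(lang, localedir="locale", domain="covid19scraper"):
    """Return a gettext function for `lang`, falling back to untranslated text.

    The domain name is an assumption; it must match the catalog file name.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translation.gettext


if __name__ == "__main__":
    _ = load_translator("pt")
    print(_("Number of cases"))
```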

License

This project is licensed under the GPL v3 License; see the LICENSE file for details.
