This project consists of a single Python 3 script that scrapes the web for coronavirus (COVID-19) data and processes it to output PNG graphics files, MP4 animated bar chart races, and .dat data files that can be used by several plotting programs, such as gnuplot.
You can see the results here:
Just clone this repository, install the prerequisite packages and run the Python script.
cd $HOME
git clone https://github.com/bgeneto/covid-19.git
This script relies on several Python packages, namely numpy, matplotlib and pandas (see requirements.txt). You can install all prerequisites by running the following commands:
cd $HOME/covid-19
pip3 install -r requirements.txt
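If you want to verify the installation before the first run, a small check like the one below can help. It is a sketch, not part of the repository: it only tests that the three packages named above can be imported (requirements.txt may list more), and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Packages named in this README; requirements.txt is the authoritative list.
REQUIRED = ["numpy", "matplotlib", "pandas"]


def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip3 install -r requirements.txt")
    else:
        print("All required packages are installed.")
```

`importlib.util.find_spec` locates a module without importing it, so the check is fast even for heavy packages such as matplotlib.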
Additionally, if you want to create HTML5 bar chart race animations (-a option), you need to have ffmpeg installed on your system. A 64-bit binary for Windows is provided in the link below; if you use Windows, you have to download it manually and then extract the binary (.exe) into the same directory as this Python script.
OR
As always, life is easier on Linux: just run your distribution's install command (apt, yum, etc.) and you are ready to go.
sudo apt update && sudo apt install ffmpeg -y
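To confirm that ffmpeg will be found, the sketch below looks for the executable first in the script's directory (where the Windows instructions above say to put it) and then on the system PATH. This search order is an assumption for illustration; the script's own lookup logic may differ, and `find_ffmpeg` is a hypothetical helper.

```python
import os
import shutil
import sys


def find_ffmpeg(script_dir):
    """Locate an ffmpeg executable: first next to the script, then on PATH."""
    exe = "ffmpeg.exe" if sys.platform.startswith("win") else "ffmpeg"
    local = os.path.join(script_dir, exe)
    if os.path.isfile(local):
        return local
    # shutil.which returns None when the command is not found on PATH
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    # sys.argv[0] is used instead of __file__ so this also works with python -c
    here = os.path.dirname(os.path.abspath(sys.argv[0]))
    path = find_ffmpeg(here)
    print(path or "ffmpeg not found; the -a option will not work")
```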
The script generates a .ini config file on its first run; you can edit this file to suit your needs. There is also an input file named 'countries.txt' where you select the countries for which you want to scrape COVID-19 data: number of cases, number of deaths, cases per million people, and all other information generated by the script. The generated files are written to a folder named 'output' in the script's directory.
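The exact format of 'countries.txt' is not documented here. The sketch below assumes one country name per line and, as an extra assumption, ignores blank lines and lines starting with `#`; both helper names are hypothetical and the real script's parsing rules may differ.

```python
def parse_countries(lines):
    """Return country names from an iterable of lines, one name per line.

    Assumption: blank lines and lines starting with '#' are skipped.
    """
    countries = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            countries.append(name)
    return countries


def read_countries(path="countries.txt"):
    """Read and parse a countries file (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_countries(f)
```

Keeping the parsing separate from the file access makes the rule easy to test without touching the real 'countries.txt'.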
python covid19scraper.py
OR
python covid19scraper.py -a mp4 -l pt -p
usage: covid19scraper.py [-h] [-v] [-a {gif,html,mp4,png,none}] [-d] [-f] [-g {all,latest,none}]
                         [-l LANG] [--no-con] [-p]

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -a {gif,html,mp4,png,none}, --anim {gif,html,mp4,png,none}
                        create (html, mp4, png or gif) animated bar racing charts (requires ffmpeg)
  -d, --dat             output dat files
  -f, --force           force download and regeneration of all data and graphs
  -g {all,latest,none}, --graph {all,latest,none}
                        output line and bar graph files (all = plot every day)
  -l LANG, --lang LANG  output messages in your preferred language (es, de, pt, ...)
  --no-con              do not check for an active Internet connection
  -p, --parallel        parallel execution (min. 6 cores, 8GB RAM)
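The help text above has the layout produced by Python's `argparse` module. The sketch below rebuilds an equivalent parser from that help text alone; it is a reconstruction for reading purposes, not the script's actual source, and the default values and version string are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented command-line interface (reconstruction only)."""
    p = argparse.ArgumentParser(prog="covid19scraper.py")
    # Version string is a placeholder; the real script reports its own version.
    p.add_argument("-v", "--version", action="version",
                   version="%(prog)s (version unknown)")
    p.add_argument("-a", "--anim", choices=["gif", "html", "mp4", "png", "none"],
                   help="create (html, mp4, png or gif) animated bar racing charts (requires ffmpeg)")
    p.add_argument("-d", "--dat", action="store_true", help="output dat files")
    p.add_argument("-f", "--force", action="store_true",
                   help="force download and regeneration of all data and graphs")
    p.add_argument("-g", "--graph", choices=["all", "latest", "none"],
                   help="output line and bar graph files (all = plot every day)")
    p.add_argument("-l", "--lang",
                   help="output messages in your preferred language (es, de, pt, ...)")
    p.add_argument("--no-con", action="store_true",
                   help="do not check for an active Internet connection")
    p.add_argument("-p", "--parallel", action="store_true",
                   help="parallel execution (min. 6 cores, 8GB RAM)")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this parser, the example invocation `python covid19scraper.py -a mp4 -l pt -p` yields `anim='mp4'`, `lang='pt'` and `parallel=True`, while unset flags such as `--dat` stay `False`. Note that `argparse` stores `--no-con` under the attribute name `no_con`.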
NOTE: Use the -p/--parallel option with caution. It uses up to 6 cores and plenty of memory (8 GB or more, depending on the size of the country list).
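How the script distributes its work internally is not described here. As a hedged illustration of the "6 cores (max)" limit, the sketch below caps a `multiprocessing` pool at six worker processes, or fewer when the machine has fewer CPUs; `worker_count` and the `square` task are hypothetical stand-ins for the real per-country processing.

```python
import os
from multiprocessing import Pool

MAX_WORKERS = 6  # the documented maximum number of cores used by -p


def worker_count(max_workers=MAX_WORKERS):
    """Use at most max_workers processes, never more than the available CPUs."""
    # os.cpu_count() can return None, so fall back to a single worker
    return max(1, min(max_workers, os.cpu_count() or 1))


def square(x):
    """Hypothetical stand-in for a per-country processing task."""
    return x * x


if __name__ == "__main__":
    with Pool(processes=worker_count()) as pool:
        print(pool.map(square, range(10)))
```

Each worker is a separate process with its own memory, which is one reason parallel runs can need considerably more RAM than a serial run.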
Translation files for English and Portuguese are already provided. To add another language, you can use the standard GNU gettext tools or pygettext.py. A template .pot translation file is available in the 'locale' folder. Additionally, you have to translate the country names contained in the 'countries.json' file inside the 'locale/' directory. Never translate the main 'countries.txt' file, or the script will be unable to scrape country data from the web.
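For reference, the sketch below shows how compiled gettext catalogs are typically loaded in Python and how a country-name table could be applied. Several details are assumptions: the domain name `messages` is a placeholder (use the name of the .pot/.mo files actually shipped in 'locale/'), compiled catalogs are expected at `locale/<lang>/LC_MESSAGES/<domain>.mo` as gettext requires, and 'countries.json' is assumed to be a flat object mapping English names to translated names.

```python
import gettext
import json


def load_translation(lang, localedir="locale", domain="messages"):
    """Return a gettext translation for lang, falling back to untranslated text.

    The domain "messages" is a hypothetical placeholder.
    """
    # fallback=True returns NullTranslations instead of raising when no .mo exists
    return gettext.translation(domain, localedir=localedir,
                               languages=[lang], fallback=True)


def load_country_names(path):
    """Load a country-name table (assumed flat JSON object: English -> translated)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def translate_country(name, table):
    """Return the translated country name, keeping the original if missing."""
    return table.get(name, name)
```

Because only the displayed names are translated, the English names in 'countries.txt' stay usable as lookup keys for scraping.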
This project is licensed under the GPL v3 License - see the LICENSE file for details