yfiua / gdelt-downloader

Parallel GDELT data downloader with date filter

Home Page: https://hub.docker.com/r/yfiua/gdelt-downloader

Repository from GitHub: https://github.com/yfiua/gdelt-downloader

gdelt-downloader

This Docker image downloads GDELT data. You can set the number of download jobs that run in parallel, as well as the start and end dates, through environment variables.
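To illustrate what these two settings mean, here is a minimal Python sketch that combines a date filter with a pool of parallel workers. It is not the image's actual code: the file names, the assumption that each name starts with a YYYYMMDD date, the inclusive bounds, and the helper names parse_day, in_range and run_parallel are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def parse_day(text):
    """Parse a YYYYMMDD string into a date; raises ValueError if malformed."""
    return datetime.strptime(text, "%Y%m%d").date()


def in_range(name, start_date=None, end_date=None):
    """Return True if the date prefix of `name` lies within the optional bounds.

    Assumption: the first 8 characters of the name are a YYYYMMDD date and
    both bounds are inclusive.
    """
    day = parse_day(name[:8])
    if start_date is not None and day < parse_day(start_date):
        return False
    if end_date is not None and day > parse_day(end_date):
        return False
    return True


def run_parallel(names, worker, njobs=1):
    """Apply `worker` to every name using `njobs` threads, preserving order."""
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        return list(pool.map(worker, names))


# Hypothetical file names, for illustration only.
names = ["20240101.zip", "20240102.zip", "20240103.zip"]
selected = [n for n in names if in_range(n, "20240102", "20240103")]
# A real worker would download the file; this stand-in only echoes the name.
results = run_parallel(selected, lambda n: "fetched " + n, njobs=2)
```

Leaving a bound as None means no limit on that side, which mirrors the fact that both dates are optional.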

Usage

Download historical data

docker run -i -e njobs=N -e start_date=YYYYMMDD -e end_date=YYYYMMDD -v $(pwd)/data:/app/data yfiua/gdelt-downloader
  • -e njobs=N: Number of download jobs to run in parallel. Defaults to 1.
  • -e start_date=YYYYMMDD: Optional start date of the download range, in YYYYMMDD format.
  • -e end_date=YYYYMMDD: Optional end date of the download range, in YYYYMMDD format.
  • -v $(pwd)/data:/app/data: Bind-mounts the local data directory at /app/data inside the container, where the downloaded files are stored.
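The command above can also be assembled from a script. The following Python sketch builds the same docker run invocation as an argument list and validates the dates first. The helper names check_date and build_command and the strict 8-digit check are assumptions made for illustration; they are not part of the image.

```python
import os
from datetime import datetime


def check_date(value):
    """Return `value` if it is a valid YYYYMMDD date, else raise ValueError."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # rejects impossible dates such as 20240230
    return value


def build_command(njobs=1, start_date=None, end_date=None, data_dir="data"):
    """Build the documented `docker run` call as a list for subprocess.run."""
    if njobs < 1:
        raise ValueError("njobs must be at least 1")
    cmd = ["docker", "run", "-i", "-e", f"njobs={njobs}"]
    # Both dates are optional, so only pass the ones that were given.
    if start_date is not None:
        cmd += ["-e", f"start_date={check_date(start_date)}"]
    if end_date is not None:
        cmd += ["-e", f"end_date={check_date(end_date)}"]
    # Mount the local data directory at /app/data, as in the command above.
    cmd += ["-v", f"{os.path.abspath(data_dir)}:/app/data", "yfiua/gdelt-downloader"]
    return cmd


cmd = build_command(njobs=4, start_date="20240101", end_date="20240131")
# To actually start the download: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when the data directory path contains spaces.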

Streaming data

docker run -d -v $(pwd)/data:/app/data yfiua/gdelt-downloader-streaming

Build yourself

docker build -t gdelt-downloader .

cd streaming
docker build -t gdelt-downloader-streaming .

Changelog

  • 0.2.2
    • Bugfix
  • 0.2.1
    • Do not re-download files that have been moved
  • 0.2
    • Add support for streaming data
  • 0.1.1
    • Less verbose output
    • Use Python 3.12
  • 0.1.0
    • Initial release

Author

yfiua

About

License: Apache License 2.0


Languages

Python 58.9%, Dockerfile 36.3%, Shell 4.8%