grc.py - the uncompromising results crawler

Inspired by ggsipu-notice-tracker.

grc aims to automate the extraction and archiving of data from GGSIPU result PDFs.

How it Works ?

It scrapes and processes new result PDFs from the results website (RESULTS_URL) and saves the last processed PDF (LAST_JSON) for future reference, so that later runs only handle PDFs published after it.
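The bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the code in grc.py: the file name, the `{"url": ...}` layout of the saved JSON, the helper names, and the assumption that the results page lists PDFs newest first are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file name; the real value is the LAST_JSON option in grc.py.
LAST_JSON = Path("last.json")


def load_last(path=LAST_JSON):
    """Return the URL of the last processed PDF, or None on the first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("url")


def save_last(url, path=LAST_JSON):
    """Record the most recently processed PDF for the next run."""
    path.write_text(json.dumps({"url": url}))


def new_pdfs(listed, last):
    """Return the PDFs listed before `last`, assuming newest-first order."""
    if last in listed:
        return listed[: listed.index(last)]
    # Unknown or missing marker: treat everything as new.
    return list(listed)
```

With this layout, a run would call `load_last()`, process whatever `new_pdfs()` returns, and finish with `save_last()` on the newest PDF it handled.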

PDF processing is handled by the ggsipu_result module. The extracted data is passed to specialized classes that inherit from the BaseDump class, each of which uploads or archives the data to its own backend.

Currently the only Dumps are Firebase Realtime Database (for JSON data) and Firebase Cloud Storage (for students' images).
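To make the Dump idea concrete, here is a minimal sketch of how such a class hierarchy could look. Only the name BaseDump comes from grc.py; the `dump` method, the `MemoryDump` stand-in, and `run_dumps` are assumptions made for illustration, and the real interface may differ.

```python
from abc import ABC, abstractmethod


class BaseDump(ABC):
    """Hypothetical interface: one method that receives extracted results."""

    @abstractmethod
    def dump(self, results):
        """Store the results in the backend and return how many were stored."""


class MemoryDump(BaseDump):
    """Illustrative stand-in for a real backend such as Firebase."""

    def __init__(self):
        self.stored = []

    def dump(self, results):
        self.stored.extend(results)
        return len(results)


def run_dumps(results, dumps):
    """Pass the same extracted data to every configured dump."""
    return [d.dump(results) for d in dumps]
```

Keeping the backends behind one base class means a new destination can be added by writing another subclass, without touching the crawling or PDF parsing code.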

Requirements

Requires Python >= 3.8

To install requirements:

pip install -r requirements.txt

How to Use ?

There are two ways to use grc:

Local - python grc.py
In Server/CI - bash start.sh

Local - `python grc.py`

Since grc.py uses Firebase as its backend, you need to define two environment variables for authentication:

FIREBASE_CONFIG - Firebase options; databaseURL and storageBucket must be set in it.
GOOGLE_APPLICATION_CREDENTIALS - Path to the Google Cloud auth key file, used to authenticate with Google Cloud.
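Before a local run it can help to verify both variables. The sketch below is not part of grc.py; the function names are made up, and it assumes FIREBASE_CONFIG holds a JSON string rather than a path to a JSON file.

```python
import json
import os

# Keys the README says must be present in FIREBASE_CONFIG.
REQUIRED_KEYS = ("databaseURL", "storageBucket")


def missing_firebase_keys(raw):
    """Return the required keys absent from a FIREBASE_CONFIG JSON string."""
    config = json.loads(raw)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def check_environment(env=os.environ):
    """Return a list of problems found; an empty list means ready to run."""
    problems = []
    raw = env.get("FIREBASE_CONFIG")
    if not raw:
        problems.append("FIREBASE_CONFIG is not set")
    else:
        problems += [f"FIREBASE_CONFIG lacks {k}" for k in missing_firebase_keys(raw)]
    if not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to try out without touching real credentials.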

CI/Server/Container - `bash start.sh`

start.sh is a wrapper script that runs grc.py in an isolated environment where the file system may be temporary (ephemeral), such as Heroku, CI servers, or containers.

It loads the last PDF info from a git repo, starts grc.py, and then uploads the updated last PDF details back to the git repo.

Like running grc.py directly, it requires Firebase and GitHub authentication details, supplied through environment variables:

GCLOUD_KEY - Contents of the Google Cloud auth key file (the file GOOGLE_APPLICATION_CREDENTIALS points to).
ARCHIVE_GIT_REPO - GitHub repo in which to save the last PDF details, for example ashutoshvarma/results_archive
ARCHIVE_GIT_BRANCH - Git branch of ARCHIVE_GIT_REPO to use
GIT_OAUTH_TOKEN - GitHub auth token with push rights to ARCHIVE_GIT_REPO
FIREBASE_CONFIG - Same as for the local run

Extra Configuration

See 'GLOBAL OPTIONs' in grc.py
