Inspired by the ggsipu-notice-tracker, grc aims to automate the extraction and archiving of data from GGSIPU result PDFs.
It scrapes and processes new result PDFs from the results website (`RESULTS_URL`) and saves the details of the last processed PDF (`LAST_JSON`) for future reference.
For PDF processing, the ggsipu_result module is used. The extracted data is passed to specialized classes inherited from the `BaseDump` class, each of which uploads/archives the data to its respective backend.
Currently, the only available Dumps are Firebase Realtime Database (for JSON data) and Firebase Cloud Storage (for students' images).
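The Dump pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not grc's actual API: apart from `BaseDump`, the class and method names (`dump`, `MemoryDump`) are assumptions.

```python
from abc import ABC, abstractmethod


class BaseDump(ABC):
    """Each subclass archives extracted result data to one backend."""

    @abstractmethod
    def dump(self, data: dict) -> None:
        ...


class MemoryDump(BaseDump):
    """Toy backend that keeps records in memory; the real Dumps would
    push JSON to Realtime Database or images to Cloud Storage."""

    def __init__(self):
        self.records = []

    def dump(self, data: dict) -> None:
        self.records.append(data)


# Extracted data is fanned out to every registered Dump.
dumps = [MemoryDump()]
result = {"roll_no": "001", "subject": "Maths", "marks": 85}
for d in dumps:
    d.dump(result)
```

Adding a new archive target then only means writing another `BaseDump` subclass; the extraction code never changes.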
Requires Python >= 3.8.

To install requirements: `pip install -r requirements.txt`
There are two ways to use grc:
- Local - `python grc.py`
- In Server/CI - `bash start.sh`
Since `grc.py` uses Firebase as a backend, you need to define two environment variables for authentication:
- `FIREBASE_CONFIG` - Firebase options; must have `databaseURL` and `storageBucket` set in it. Read more here.
- `GOOGLE_APPLICATION_CREDENTIALS` - For authenticating with Google Cloud. Read more here.
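For a local run, the two variables might be set like this before invoking `grc.py`. All values below are placeholders, not real project settings:

```shell
# Placeholder values -- substitute your own Firebase project's settings.
export FIREBASE_CONFIG='{"databaseURL": "https://your-project.firebaseio.com", "storageBucket": "your-project.appspot.com"}'
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/service-account.json"
# python grc.py   # run once both variables are set
```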
`start.sh` is a wrapper script for `grc.py` that runs it in an isolated environment where the filesystem may be temporary (ephemeral), such as Heroku, CI servers, or containers.
It loads the last PDF info from a git repo, starts `grc.py`, and uploads the updated last-PDF details back to the git repo.
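That load/run/push flow might look roughly like the sketch below. It is not the actual `start.sh`: a local bare repository stands in for the GitHub archive repo so the example is self-contained, and the `last.json` filename is an assumption (the real file is whatever `LAST_JSON` points to).

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/archive.git"   # stand-in for the GitHub archive repo

# Seed the archive with last-processed-PDF info (content is illustrative).
git clone -q "$work/archive.git" "$work/seed"
echo '{"last_pdf": "results_2020.pdf"}' > "$work/seed/last.json"
git -C "$work/seed" add last.json
git -C "$work/seed" -c user.email=ci@example.com -c user.name=ci commit -qm "seed"
git -C "$work/seed" push -q origin HEAD

# 1. Load the last PDF info from the archive repo.
git clone -q "$work/archive.git" "$work/run"
cat "$work/run/last.json"

# 2. Run grc.py (omitted here), which would update the last-PDF info.
echo '{"last_pdf": "results_2021.pdf"}' > "$work/run/last.json"

# 3. Push the updated info back to the archive repo.
git -C "$work/run" add last.json
git -C "$work/run" -c user.email=ci@example.com -c user.name=ci commit -qm "update"
git -C "$work/run" push -q origin HEAD
```

The real script would clone `https://github.com/$ARCHIVE_GIT_REPO` on `$ARCHIVE_GIT_BRANCH`, authenticating with `GIT_OAUTH_TOKEN`.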
Like `grc.py`, it requires Firebase and GitHub authentication details via environment variables:
- `GCLOUD_KEY` - Contents of the Google Cloud auth key file (`GOOGLE_APPLICATION_CREDENTIALS`).
- `ARCHIVE_GIT_REPO` - GitHub repo to save the last PDF details in, for example `ashutoshvarma/results_archive`.
- `ARCHIVE_GIT_BRANCH` - Git branch for `ARCHIVE_GIT_REPO`.
- `GIT_OAUTH_TOKEN` - GitHub auth token with push rights to `ARCHIVE_GIT_REPO`.
- `FIREBASE_CONFIG` - Same as described above.
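In a CI setting, those variables might be exported like this. Every value is a placeholder (the repo name is the example given above):

```shell
# All values are placeholders for illustration.
export GCLOUD_KEY='<contents of your Google Cloud service-account key file>'
export ARCHIVE_GIT_REPO='ashutoshvarma/results_archive'
export ARCHIVE_GIT_BRANCH='master'
export GIT_OAUTH_TOKEN='<token with push access to ARCHIVE_GIT_REPO>'
export FIREBASE_CONFIG='{"databaseURL": "https://your-project.firebaseio.com", "storageBucket": "your-project.appspot.com"}'
# bash start.sh   # run once all variables are set
```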
See 'GLOBAL OPTIONs' in `grc.py`.