OnionKiller / MAV-Event-Horizon

BME Software Architectures assignment in 2022. It scrapes MÁV RSS feed, and analyses data with NLP.

Install

Before installing the software, set the environment variables either in the .env file or in your operating system. The program expects the following environment variables:

  • COGNITIVE_SERVICE_KEY: this is required, it is an Azure API key
  • COGNITIVE_SERVICE_BASE default='bme-mav-nlp': the name of the API endpoint to use; it is automatically inserted into the full endpoint URL.
  • COGNITIVE_SERVICE_ENDPOINT default='https://COGNITIVE_SERVICE_BASE.cognitiveservices.azure.com/': this is the actual endpoint to use for Azure NLP
  • RSS_FEED_STORAGE_LOCATION default='rss_feed_collection.csv': where data from the RSS feed are stored
  • INCIDENTS_STORAGE_LOCATION default='incidents.csv': where NLP results are stored
  • INCIDENTS_STORAGE_LOCATION default='feed.log': where the logs created by the incident storage are written
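
To illustrate how the defaults listed above fit together, here is a minimal sketch that resolves them from the process environment. The function name load_settings and the dictionary keys are hypothetical and not taken from the project; the sketch does not read the .env file, and it leaves out the log file setting.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve the documented variables, applying the documented defaults.

    Hypothetical helper for illustration; not the project's own code.
    """
    key = env.get("COGNITIVE_SERVICE_KEY")
    if not key:
        # The key has no default, so this sketch treats a missing value as an error.
        raise RuntimeError("COGNITIVE_SERVICE_KEY must be set")

    base = env.get("COGNITIVE_SERVICE_BASE", "bme-mav-nlp")
    # The default endpoint is built from the base name; an explicit endpoint wins.
    endpoint = env.get(
        "COGNITIVE_SERVICE_ENDPOINT",
        f"https://{base}.cognitiveservices.azure.com/",
    )
    return {
        "key": key,
        "endpoint": endpoint,
        "rss_feed_storage": env.get("RSS_FEED_STORAGE_LOCATION", "rss_feed_collection.csv"),
        "incidents_storage": env.get("INCIDENTS_STORAGE_LOCATION", "incidents.csv"),
    }


if __name__ == "__main__":
    # A placeholder key is passed explicitly so the demo does not depend on the real environment.
    print(load_settings({"COGNITIVE_SERVICE_KEY": "placeholder"}))
```

Passing the mapping as an argument keeps the sketch easy to try without touching real environment variables.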

The package requires Python 3.10 or later.

To install the packages simply run:

pip install -r requirements.txt
python -m nltk.downloader stopwords
python -m spacy download en_core_web_md
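
After running the commands above, a quick check like the following can confirm that the packages are importable. This is only a sketch: the helper name missing_packages is hypothetical, and it checks for installed packages only, not for the NLTK stopwords data.

```python
from importlib.util import find_spec


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # `spacy download` installs en_core_web_md as a regular Python package,
    # so it can be looked up like any other package.
    missing = missing_packages(["nltk", "spacy", "en_core_web_md"])
    print("missing:", missing or "none")
```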

Run

The main loop is started through the CLI:

python -m scr main-scrape-loop --sleep_time <seconds>

where --sleep_time sets how many seconds to wait between RSS fetches.
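
The effect of the sleep time can be pictured with a small fetch-then-wait loop. This is a sketch under assumptions, not the project's implementation: scrape_loop, fetch, and max_iterations are hypothetical names, and the real command may behave differently.

```python
import time
from typing import Callable, Optional


def scrape_loop(
    fetch: Callable[[], None],
    sleep_time: float,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch repeatedly, pausing sleep_time seconds between calls.

    Runs forever when max_iterations is None; returns the number of fetches otherwise.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        fetch()
        done += 1
        # Skip the pause after the final iteration of a bounded run.
        if max_iterations is None or done < max_iterations:
            sleep(sleep_time)
    return done


if __name__ == "__main__":
    # Stand-in fetch function; a real one would download and store the RSS feed.
    scrape_loop(lambda: print("fetching feed"), sleep_time=0.1, max_iterations=3)
```

The sleep function is a parameter so the loop can be exercised without actually waiting.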

Setup

This repository uses pip-tools. To set up a development environment, run:

pip install -r dev-requirements.txt

To update the project's requirements, modify the appropriate *requirements.in file, and then run:

pip-compile requirements.in --upgrade --resolver=backtracking

To upgrade your virtual environment, run:

pip-sync

For a production setup, install from requirements.txt.

License: MIT License


Languages

Jupyter Notebook 99.6%, Python 0.4%, Dockerfile 0.0%