justinrgarrard / scriter

A web app that scrapes a job posting site and visualizes term frequencies for common technologies

Scriter is a web application that scrapes postings from a job site and visualizes how often common technologies are mentioned in them.

[Image: Visual of Webpage]

Setup and Usage

An installer has been provided to simplify initial setup. Ensure that Ansible is installed on an Ubuntu 20.04 server, then use the following steps:

[user]$ sudo su
[root]# ansible-playbook setup/install.yml
[root]# sudo su scriter
[scriter]$ python setup/first_run.py

Test Local Deployment

Start the Django development server, then navigate to localhost:8000 in a web browser to interact with the application. Press CTRL-C to stop the server.

[scriter]$ python web_server/scriter/manage.py runserver

Production Deployment

Be sure to replace the secret key in setup/deploy.yml before running this command.

[scriter]$ ansible-playbook setup/deploy.yml

Subsystem Usage

Web Scrape

[scriter]$ cd web_scraper/
[scriter]$ scrapy runspider web_scrape.py -o software_engineer.csv -s CLOSESPIDER_PAGECOUNT=1000 -a job_title='software+engineer'

Load Data into PostgreSQL

[scriter]$ cd web_scraper/
[scriter]$ python data_load.py software_engineer.csv software_engineer
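
The command above takes a CSV file and a table name. As a rough illustration of what such a load step involves, the sketch below reads a CSV and writes it to a table named by the caller. It is not the project's data_load.py: it uses the standard library's sqlite3 as a stand-in for SQLAlchemy and PostgreSQL so that it runs without a database server, and the function name load_csv and the all-TEXT column layout are assumptions made for this example.

```python
import csv
import sqlite3


def quote(name):
    """Quote an SQL identifier so names with spaces or quotes stay valid."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, csv_path, table_name):
    """Create table_name from the CSV header and insert every row as text.

    Hypothetical helper: SQLite stands in for PostgreSQL here.
    Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Skip malformed rows whose width does not match the header.
        rows = [row for row in reader if len(row) == len(header)]

    columns = ", ".join(f"{quote(col)} TEXT" for col in header)
    placeholders = ", ".join("?" for _ in header)
    table = quote(table_name)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    import sys

    # Mirrors the argument order of the command above: CSV file, then table name.
    connection = sqlite3.connect("scriter_sketch.db")
    print(load_csv(connection, sys.argv[1], sys.argv[2]))
```

Deriving the columns from the CSV header keeps the sketch independent of whatever fields the scraper actually emits.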

Generate Metrics from Data

[scriter]$ cd data_modeler/
[scriter]$ python model_build.py software_engineer
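
To make the metric concrete, here is a minimal sketch of scoring technology keywords with TF-IDF over a few postings. It is an illustration only, not the project's model_build.py: it uses the standard library instead of sklearn and NLTK, and the keyword list, sample postings, and function names are invented for this example. The IDF term uses the smoothed form ln((1 + N) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a posting and split it into tokens (keeps '+' and '#' for names like c++)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_scores(postings, keywords):
    """Return the mean TF-IDF weight of each keyword across all postings."""
    docs = [Counter(tokenize(p)) for p in postings]
    n_docs = len(docs)
    scores = {}
    for kw in keywords:
        # Number of postings that mention the keyword at least once.
        doc_freq = sum(1 for d in docs if kw in d)
        # Smoothed IDF: a keyword found in every posting still gets weight 1.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        total = 0.0
        for d in docs:
            length = sum(d.values())
            tf = d[kw] / length if length else 0.0
            total += tf * idf
        scores[kw] = total / n_docs if n_docs else 0.0
    return scores


if __name__ == "__main__":
    # Invented sample data; real input would come from the scraped postings.
    sample_postings = [
        "Python and SQL required. Python experience preferred.",
        "Java developer with SQL skills.",
        "Looking for a Python engineer.",
    ]
    sample_keywords = ["python", "java", "sql"]  # hypothetical keyword list
    for keyword, score in sorted(tfidf_scores(sample_postings, sample_keywords).items()):
        print(f"{keyword}: {score:.4f}")
```

Averaging over postings gives one number per keyword, which suits a chart of term weights. A keyword that never appears scores zero.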

Set Keywords

[scriter]$ vim data_modeler/tech.json

Architecture

The application is written in Python and consists of four loosely coupled components:

  • Web Scraper Job to Gather Data (Scrapy)

  • DB to Store Data (SQLAlchemy + PostgreSQL)

  • Model Builder to Convert Raw Job Posting Text into TF-IDF Metrics (scikit-learn + NLTK)

  • Web Server to Display Data (Django + Highcharts)

[Image: Visual of Architecture]

Languages

Python 83.7%, HTML 14.4%, CSS 1.9%