onetonfoot / hkdataton

Code for polyu bigdataton

Hkdataton

Several languages were used. Below are brief install instructions and the rationale for using each.

Python

Install pandas via Anaconda. Below are the libraries used:

  • Scrapy - Web scraping framework
  • Pandas - Data analysis library, used for data cleaning

Scrapy was used to handle sites that serve their content as static HTML, i.e. sites that do not use JavaScript to load content dynamically via XHR requests.
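To illustrate why static sites need no browser, here is a minimal sketch that pulls text out of already-downloaded HTML. It uses only the Python standard library rather than Scrapy, and the class name "stop" and the helper names are hypothetical, not taken from this repository. It also assumes the matched elements do not contain nested elements with the same tag.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._open_tag = None  # tag name of the matching element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._open_tag is None and self.css_class in classes:
            self._open_tag = tag
            self.results.append("")

    def handle_data(self, data):
        # Only accumulate text while inside a matching element.
        if self._open_tag is not None:
            self.results[-1] += data

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.results[-1] = self.results[-1].strip()
            self._open_tag = None


def extract_by_class(html, css_class):
    """Return the stripped text of each element with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

If the data is already present in the raw HTML like this, a plain HTTP fetch plus a parser is enough. If it is not, the page is probably filling it in with JavaScript.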

Node

Install Node > 8.0; the easiest way is probably nvm. The headless Chrome library puppeteer was used. Install it globally using:

npm i puppeteer -g

Puppeteer was used to handle JavaScript-heavy sites, where content is loaded dynamically via XHR requests.

Julia

Install Julia > 0.6 and the following libraries:

  • Cascadia - CSS selector library
  • Gumbo - HTML parsing library
  • Request - HTTP requests
  • AbstractTrees - Defining tree-like structures
  • IJulia - Julia kernel for Jupyter notebooks

Julia was used to reverse engineer the bus website's API.
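Once an XHR endpoint has been spotted in the browser's network tab, it can usually be called directly. The sketch below shows that general idea in Python rather than Julia. The URL, the query parameters, and the JSON field names ("data", "stop", "eta") are all hypothetical placeholders, not the actual bus API used here.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real one would be read off the browser's network tab.
BASE_URL = "https://example.com/api/eta"


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to the endpoint."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body):
    """Turn a JSON response body into (stop, eta) tuples.

    Assumes a payload shaped like {"data": [{"stop": ..., "eta": ...}]}.
    """
    payload = json.loads(body)
    return [(item["stop"], item["eta"]) for item in payload["data"]]
```

The resulting URL can then be fetched with any HTTP client and the body passed to `parse_response`.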

Results

The results can be found in the results folder. The scripts used to generate them are in their own respective folders. The results folder also contains a simple bash script which is used to convert the tsv files into the double \t\t format.
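For reference, the conversion can be sketched in Python as below. This is not the repository's bash script; it assumes the "double \t\t format" simply means every single tab separator becomes two tabs, and the function names are hypothetical.

```python
def to_double_tab(line):
    """Replace each tab separator in a line with two tabs."""
    return line.replace("\t", "\t\t")


def convert_file(src, dst):
    """Copy a TSV file to dst, doubling every tab separator."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_double_tab(line))
```

Note that running it on a file that is already in the double-tab format would double the separators again.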

Languages

HTML 79.6%, Python 10.8%, Jupyter Notebook 8.9%, JavaScript 0.8%