msandt3 / gthockey-stats

Scrapes player statistics from achahockey.org. This is intended to be used as a cron job for the gthockey PHP server.

STATS SCRAPING

This repo contains code to scrape player statistics from the achahockey.org website. It is a proof of concept, developed to reduce the manual labor of data entry.

Dependencies

Running the Code

After cloning the repo, you can scrape the data as follows:

$ scrapy crawl acha

Saving the Results

Scrapy supports several formats for storing scraped data. To store the results as JSON, CSV, or XML, run the corresponding command:

$ scrapy crawl acha -o items.json -t json
$ scrapy crawl acha -o items.csv -t csv
$ scrapy crawl acha -o items.xml -t xml
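
Once a run has produced items.json, the file can be read back with Python's standard library. The sketch below assumes the feed is a JSON array of objects, which is how Scrapy's JSON exporter writes items. The field names `name` and `points` are hypothetical, since the actual item fields are defined by the spider in this repo, and `load_items` and `top_scorers` are illustrative helper names.

```python
import json


def load_items(path):
    """Return the list of scraped items stored in a Scrapy JSON feed."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def top_scorers(items, limit=5):
    """Sort items by a hypothetical 'points' field, highest first."""

    def points(item):
        # Missing or non-numeric values count as 0 so a bad row
        # does not crash the sort.
        try:
            return int(item.get("points", 0))
        except (TypeError, ValueError):
            return 0

    return sorted(items, key=points, reverse=True)[:limit]


if __name__ == "__main__":
    for player in top_scorers(load_items("items.json")):
        print(player.get("name"), player.get("points"))
```

Swap in the real field names from the spider's item definition before relying on the output.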

Scripting the Routines

An automated script for running the scraping routines is also available. It is intended for future use via CGI on a NearlyFreeSpeech.NET web server.

$ python crawl.py

This command scrapes the data and stores it in a JSON file.
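
The contents of crawl.py are not reproduced in this README, so the following is only a sketch of how such a wrapper could work, not the repo's actual script. It assumes the `scrapy` executable is on the PATH and that the script is run from the project root. `build_command` and `run_crawl` are hypothetical helper names.

```python
import subprocess
import sys


def build_command(output_path="items.json", spider="acha"):
    """Build the JSON-export command line from the section above."""
    return ["scrapy", "crawl", spider, "-o", output_path, "-t", "json"]


def run_crawl(output_path="items.json"):
    """Run the crawl and return the process exit code (0 means success)."""
    # Passing a list avoids shell quoting problems; check=False leaves
    # error handling to the caller instead of raising on failure.
    completed = subprocess.run(build_command(output_path), check=False)
    return completed.returncode


if __name__ == "__main__":
    sys.exit(run_crawl())
```

Returning the exit code makes the script easy to schedule, because a cron job or CGI handler can detect a failed crawl from the process status.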

Issues

For more information on how to use Scrapy, please see the Scrapy Reference.

Contributing

This is an open source project. Feel free to fork it and submit pull requests at will.


Languages

Language:Python 100.0%