mperez4 / divvy_datachallenge_2013_toolkit

A collection of tools for visualizing data from the http://divvybikes.com/datachallenge.

Divvy Data Challenge 2013 Toolkit

Developed by Christopher Baker in collaboration with the openLab at the School of the Art Institute of Chicago.

This toolkit consists of a collection of examples, preprocessors and tools to support work on the http://divvybikes.com/datachallenge data set.

Components

DataPreprocessor

The DataPreprocessor is a Processing sketch that makes it easy to pre-process the raw Divvy data. The current version removes several columns, shortens enumeration names and fixes several data anomalies found in the original raw data set. Please see the extensive comments in the Processing sketch. Additionally, the stations data is keyed to the station ids in the trips file, which results in better MySQL table normalization.

To use the DataPreprocessor, place the raw data (available here) in the data folder of the Processing sketch.

Then open the sketch in the latest version of Processing and press the Run button. The sketch will yield several "cleaned" files for use in other Processing sketches. The cleaned files can also be imported into a MySQL database for online access via the API backend.

API Backend Installation

While working with huge CSV files is quite possible, for online visualization and exploration it is helpful to have a JSON-compatible data API. The preprocessed files generated by the DataPreprocessor can be imported into a MySQL database, and when they are used with the PHP API backend, online API queries are straightforward.

To set up your own Divvy data API, you will need a web server with PHP and MySQL. Most modern servers, including shared hosting, offer this capability. To set up the data API, follow these steps:

  1. Generate the Divvy_Stations_2013_Cleaned.csv and Divvy_Trips_2013_Cleaned.csv (~50MB) files using the DataPreprocessor.

  2. On your server, create a MySQL database called divvy_2013 and a MySQL user called divvy. The divvy user should have read access to the divvy_2013 database (if this doesn't make sense, that's ok, there is an easier alternative below).

  3. Next, create a table called Divvy_Stations_2013 in the divvy_2013 database using the following structure:

     CREATE TABLE IF NOT EXISTS `Divvy_Stations_2013` (
       `id` int(11) NOT NULL,
       `name` varchar(255) DEFAULT NULL,
       `latitude` float DEFAULT NULL,
       `longitude` float DEFAULT NULL,
       `capacity` int(11) DEFAULT NULL,
       PRIMARY KEY (`id`)
     ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
    

     Import the Divvy_Stations_2013_Cleaned.csv file into this table.

  4. Next, create a table called Divvy_Trips_2013 in the divvy_2013 database with the following structure:

     CREATE TABLE IF NOT EXISTS `Divvy_Trips_2013` (
       `trip_id` int(11) NOT NULL,
       `start_time` datetime NOT NULL,
       `stop_time` datetime NOT NULL,
       `bike_id` int(11) NOT NULL,
       `from_station_id` int(11) NOT NULL DEFAULT '-1',
       `to_station_id` int(11) NOT NULL DEFAULT '-1',
       `user_type` varchar(255) DEFAULT NULL,
       `gender` varchar(255) DEFAULT NULL,
       `birth_year` int(11) DEFAULT NULL,
       PRIMARY KEY (`trip_id`),
       KEY `from_station_id` (`from_station_id`),
       KEY `to_station_id` (`to_station_id`),
       KEY `user_type` (`user_type`),
       KEY `stop_time` (`stop_time`),
       KEY `start_time` (`start_time`),
       KEY `birth_year` (`birth_year`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    

     Import the Divvy_Trips_2013_Cleaned.csv file into this table. Note: Divvy_Trips_2013_Cleaned.csv is quite large, and it may not be possible to import it through a simple interface like phpMyAdmin. Instead, consider uploading the CSV file to your server, logging in via SSH and running the following command. mysqlimport takes the target table name from the file name, so the file should be named Divvy_Trips_2013.csv on the server:

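    # The --columns list must name columns of the Divvy_Trips_2013 table, in the order they
    # appear in the CSV. This list assumes the cleaned file follows the table's column order.
    mysqlimport --ignore-lines=1 \
    --fields-terminated-by=, \
    --columns='trip_id,start_time,stop_time,bike_id,from_station_id,to_station_id,user_type,gender,birth_year' \
    --local -u root -p divvy_2013 Divvy_Trips_2013.csv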

  5. Next, upload api.php and credentials_example.php to the location of your choice on your server.
  6. Rename credentials_example.php to credentials.php and enter your database location and password.
  7. You are now ready to query the API using the parameters defined below.
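
If SSH access is not available for the trips import, one possible workaround is to split the large CSV into smaller files that a web interface can accept one at a time. The following is only a sketch of that idea and is not part of the toolkit: the function name `split_csv`, the output file naming and the default chunk size are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_chunk=50_000):
    """Split a CSV into numbered part files, repeating the header row in each.

    Hypothetical helper: the name, the part-file naming scheme and the default
    chunk size are illustrative choices, not something the toolkit defines.
    Returns the list of part-file paths in the order they were written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    out = None
    writer = None
    count = 0

    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            # Start a new part file for the first row and whenever the
            # current part has reached the requested number of data rows.
            if writer is None or count >= rows_per_chunk:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}_part{len(written) + 1:03d}.csv"
                out = path.open("w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                written.append(path)
                count = 0
            writer.writerow(row)
            count += 1

    if out is not None:
        out.close()
    return written
```

Calling `split_csv("Divvy_Trips_2013_Cleaned.csv", "parts")` would write the part files into a `parts` folder. Because every part keeps the header row, each import would still need to skip the first line.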

Data API Parameters

The API is currently a single endpoint with a simple set of parameters.

Trip Start and Stop Times

Starting and stopping times are represented in UTC time inside the database. It is the user's responsibility to convert those times back to local Chicago time. Trip selections based on trip start and stop times can be done with the following parameters:

| Parameter | Description | Example Values |
| --- | --- | --- |
| start_min | Set the start time range minimum | 2013-06-01 or 2013-06-01 12:00:00 |
| start_max | Set the start time range maximum | 2013-06-01 or 2013-06-01 12:00:00 |
| stop_min | Set the stop time range minimum | 2013-06-01 or 2013-06-01 12:00:00 |
| stop_max | Set the stop time range maximum | 2013-06-01 or 2013-06-01 12:00:00 |
| from_station_id | Set the starting station id | See the stations list for valid ids |
| to_station_id | Set the ending station id | See the stations list for valid ids |
| bike_id | Set the bike id | See the trips data for valid ids |
| trip_id [1] | Set the trip id | See the trips data for valid ids |
| trip_id_min [1] | Set the minimum trip id | See the trips data for valid ids |
| trip_id_max [1] | Set the maximum trip id | See the trips data for valid ids |
| user_type | Set the user type | subscriber or customer |
| gender | Set the user gender | male or female |
| birth_year [2] | Set the user birth year | Any year >= 0 |
| birth_year_min [2] | Set the minimum birth year | Any year >= 0 |
| birth_year_max [2] | Set the maximum birth year | Any year >= 0 |
| age [2] | Set the user age | Any age >= 0 |
| age_min [2] | Set the minimum age | Any age >= 0 |
| age_max [2] | Set the maximum age | Any age >= 0 |
| page [3] | The results page | Any page >= 0 |
| rpp [3] | Set the number of results per page | 0 <= rpp <= 100 |
| callback | A JSONP callback | Any valid JavaScript function name |
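
Since the database stores times in UTC, a client will usually need to convert in both directions: local Chicago time to UTC when building start_min or stop_max values, and UTC back to Chicago time when displaying results. The sketch below shows one way to do that with Python's standard zoneinfo module. The helper names are hypothetical, and the `YYYY-MM-DD HH:MM:SS` string format is an assumption based on the example values in the table.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

CHICAGO = ZoneInfo("America/Chicago")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, taken from the example values


def utc_to_chicago(value):
    """Parse a UTC timestamp string and return an aware Chicago datetime."""
    utc_dt = datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(CHICAGO)


def chicago_to_utc(value):
    """Interpret a timestamp string as Chicago local time and return a UTC string."""
    local_dt = datetime.strptime(value, TIME_FORMAT).replace(tzinfo=CHICAGO)
    return local_dt.astimezone(timezone.utc).strftime(TIME_FORMAT)
```

Using a named time zone rather than a fixed offset matters here because the 2013 data spans both daylight saving time (UTC-5) and standard time (UTC-6).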

Data API Examples

For convenience the openLab has established a public endpoint for testing. The base endpoint URL is:

http://data.olab.io/divvy/api.php

All query parameter strings build from that endpoint. If you install your own API, your endpoint URL will be different.

Select the first page of 25 results for trips between 2013-06-01 and 2013-07-01 for males over the age of 50:

To get results 26 - 50 from the same query:

To get all trips taken by 33 year old females:
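
As a rough illustration, queries like the three described above could be assembled from the documented parameters with Python's standard library. This is a sketch, not output copied from the API documentation: `build_query_url` is a hypothetical helper, pages are assumed to be numbered from 0 (the table allows page >= 0), and age_min=50 is one reading of "over the age of 50", since the text does not say whether the bound is inclusive.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.olab.io/divvy/api.php"


def build_query_url(base_url=BASE_URL, **params):
    """Return base_url with the given parameters URL-encoded as a query string.

    Hypothetical helper. Parameters whose value is None are left out.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


# First page of 25 results: June 2013 trips by males aged 50 and up (assumed bound).
first_page = build_query_url(
    start_min="2013-06-01", start_max="2013-07-01",
    gender="male", age_min=50, rpp=25, page=0,
)

# Results 26 - 50 of the same query, assuming zero-based page numbering.
second_page = build_query_url(
    start_min="2013-06-01", start_max="2013-07-01",
    gender="male", age_min=50, rpp=25, page=1,
)

# All trips taken by 33 year old females.
females_33 = build_query_url(gender="female", age=33)
```

urlencode also escapes values that contain spaces or colons, such as `2013-06-01 12:00:00`, so full timestamps can be passed in the same way.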

Footnotes

  1. If a trip_id parameter is passed along with a trip_id_min and / or trip_id_max parameter, the trip_id parameter is ignored and the range style parameters are preferred.

  2. Both age and birth_year select on the same birth_year column of the database. Since it's easier to think in terms of age, when both age and birth_year parameters are included, all birth_year parameters will be ignored in favor of the age parameters. As with trip_id, the range-based versions of the age and birth_year parameters are preferred over the single-value versions.

  3. Since this is a massive data set, it is not advisable to let a user return huge quantities of data with a single query. Instead, the trip results are broken down into pages of results. rpp (results per page) is set to 100 by default, which is also the maximum. The page parameter determines which result to begin with. For instance, if pages are numbered from 0, as the valid range for page suggests, passing rpp=100 and page=1 would return the 101st through 200th matching trips.
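
The precedence rules in footnotes 1 and 2 can be restated as a small client-side check, which may help when predicting which parameters a query will actually use. This is an illustration of the rules as described above, not the logic in api.php; the function name `resolve_precedence` is hypothetical, and treating range parameters as overriding the single-value age and birth_year parameters is one reading of footnote 2.

```python
TRIP_RANGE = ("trip_id_min", "trip_id_max")
AGE_RANGE = ("age_min", "age_max")
BIRTH_YEAR_RANGE = ("birth_year_min", "birth_year_max")


def resolve_precedence(params):
    """Return a copy of params without the entries the footnotes say are ignored.

    Hypothetical helper that mirrors the documented rules on the client side.
    """
    resolved = dict(params)

    def has_any(names):
        return any(name in resolved for name in names)

    # Footnote 1: a trip id range takes precedence over a single trip_id.
    if has_any(TRIP_RANGE):
        resolved.pop("trip_id", None)

    # Footnote 2: any age parameter causes all birth_year parameters to be ignored.
    if has_any(("age",) + AGE_RANGE):
        for name in ("birth_year",) + BIRTH_YEAR_RANGE:
            resolved.pop(name, None)

    # Footnote 2 (as read here): ranges take precedence over single values.
    if has_any(AGE_RANGE):
        resolved.pop("age", None)
    if has_any(BIRTH_YEAR_RANGE):
        resolved.pop("birth_year", None)

    return resolved
```

For example, passing `{"age": 33, "birth_year": 1980}` through this function would keep only the age parameter.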

About

License:MIT License