# Project EPIC, Geography 5303
This project looks at all of the geo-coded tweets from Hurricane Sandy.
## About
Roughly 1% of tweets are geo-tagged. What can be learned about a Twitterer's movement behavior during Hurricane Sandy?
## Dependencies
#### Ruby Requirements
- `gem install georuby`
- `gem install rgeo-shapefile`
- `gem install bson`
- `gem install mongo`
- `gem install bson_ext`
#### Mongo Connection
The data for this project is held on Project EPIC's local analytics server on the CU campus. Multiple collections were created under the `sandygeo` database:
- `edited_tweets`: The main collection of tweets, cut to the study timeframe of October 20 to November 7, 2012. Each document is a full tweet as retrieved from the Twitter API.
- `coastal_users`: The final collection of 17,627 users identified as having at least one tweet within the highly affected eastern seaboard area, as defined by FEMA.
- `after_sandy`: Tweets between November 7, 2012 and December 1, 2012 that were excluded from the project analysis.
- `before_sandy`: Tweets between October 1, 2012 and October 22, 2012 that were excluded from the project analysis.
- `tweets`: The original 260,859 geo-coded tweets extracted from the 22-million-tweet keyword collection. Used to identify users who geo-tag their tweets so that their contextual streams could be fetched.
- `userpaths`: Distinct paths for 32,842 users. Each document contains an array of tweets, where each tweet has date, text, entities, and place information, plus a GeoJSON LineString object that tracks the user's path.
- `user_indiv_tweets`: Similar to `userpaths`, but instead of a LineString, each individual tweet is stored with place, text, and timestamp as properties.
- `most_impacted_users`:
## Project Conventions
- Users are referenced by their `id` (a long integer). If a user has used multiple screen names across their aggregated tweets, the `handle` written for that user is a string of the unique screen names separated by commas.
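The handle convention above can be illustrated with a small Ruby sketch. The method name `user_handle`, the tweet layout (`tweet["user"]["screen_name"]`, as in Twitter API v1.1 JSON), the bare `,` separator, and first-seen ordering are all assumptions, not the project's confirmed code.

```ruby
# Hypothetical helper: build the comma-separated handle for one user from the
# tweets aggregated under that user's id.
# Assumption: each tweet is a Hash parsed from Twitter API v1.1 JSON, with the
# screen name stored at tweet["user"]["screen_name"].
def user_handle(tweets)
  tweets
    .map { |t| t.dig("user", "screen_name") }
    .compact   # ignore tweets that lack a screen name
    .uniq      # keep each screen name once, in first-seen order
    .join(",") # assumption: no space after the comma
end
```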
## Project Directories
### fileio/

##### kml_output.rb
Writes `identified_users.kml`.
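A minimal sketch of the kind of output a KML writer produces: one KML `Placemark` per user. The input shape (hashes with `"handle"`, `"lon"` and `"lat"` keys) and the method names are hypothetical; only the KML 2.2 element names, the namespace, and the `lon,lat` coordinate order come from the KML format itself.

```ruby
# Escape the five XML special characters so handles are safe inside <name>.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Assumption: each user is a Hash like { "handle" => ..., "lon" => ..., "lat" => ... };
# kml_output.rb's real input format is not documented in this README.
def users_to_kml(users)
  placemarks = users.map do |u|
    # KML coordinates are written longitude first, then latitude.
    <<~PLACEMARK
      <Placemark>
        <name>#{xml_escape(u["handle"])}</name>
        <Point><coordinates>#{u["lon"]},#{u["lat"]}</coordinates></Point>
      </Placemark>
    PLACEMARK
  end

  <<~KML
    <?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
    <Document>
    #{placemarks.join}</Document>
    </kml>
  KML
end
```

Writing the result is then a single `File.write("identified_users.kml", users_to_kml(users))`.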
##### tweet_io.rb
Includes the two classes for interacting with Mongo. `SandyMongoClient` creates an object for querying the database and returning tweets based on various parameters, and `Tweet_JSON_Reader` reads a text file of valid JSON tweets separated by newlines and imports it into Mongo. The main runtime of this script runs an import; other scripts use the `SandyMongoClient` class to interact with the database.
##### write_geojson.rb
Uses the `json` library to write valid GeoJSON from a variety of inputs. The main runtime of this script writes both a tweet GeoJSON file and a user-path GeoJSON file from a users collection.
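As an illustration of writing tweet GeoJSON, the sketch below turns parsed tweets into a `FeatureCollection` of points using only the `json` standard library. It assumes Twitter API v1.1 tweets, where a geotagged tweet's `coordinates` field is a GeoJSON-style `Point` in `[lon, lat]` order; the method name and the chosen properties are hypothetical.

```ruby
require "json"

# Convert geotagged tweets into a GeoJSON FeatureCollection of points.
# Assumption: Twitter API v1.1 JSON, where a geotagged tweet carries
# "coordinates" => { "type" => "Point", "coordinates" => [lon, lat] }.
def tweets_to_feature_collection(tweets)
  features = tweets.filter_map do |t|
    point = t["coordinates"]
    next unless point && point["type"] == "Point" # drop non-geotagged tweets

    {
      "type" => "Feature",
      "geometry" => point, # already valid GeoJSON geometry
      "properties" => {
        "id" => t["id_str"],
        "user_id" => t.dig("user", "id_str"),
        "created_at" => t["created_at"],
        "text" => t["text"]
      }
    }
  end
  { "type" => "FeatureCollection", "features" => features }
end
```

`JSON.pretty_generate(tweets_to_feature_collection(tweets))` then gives text that can be written straight to a `.geojson` file.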
##### write_user_tweets_geojson.rb
Requires `write_geojson.rb` and uses it to generate a folder containing a valid GeoJSON object for each user's tweets.
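One way to produce a folder of per-user GeoJSON files is sketched here. The input (a hash from user id to `[lon, lat]` pairs), the `<user_id>.geojson` file naming, and the method name are assumptions; the actual script builds on `write_geojson.rb`, which is not reproduced.

```ruby
require "json"
require "fileutils"

# Write one GeoJSON FeatureCollection file per user into out_dir.
# Assumptions: tweets_by_user maps a user id to an Array of [lon, lat] pairs,
# and files are named "<user_id>.geojson"; the real script's layout may differ.
# Returns the list of file paths written.
def write_user_geojson_files(tweets_by_user, out_dir)
  FileUtils.mkdir_p(out_dir) # create the output folder if needed
  tweets_by_user.map do |user_id, positions|
    feature_collection = {
      "type" => "FeatureCollection",
      "features" => positions.map { |lon, lat|
        {
          "type" => "Feature",
          "geometry" => { "type" => "Point", "coordinates" => [lon, lat] },
          "properties" => { "user_id" => user_id.to_s }
        }
      }
    }
    path = File.join(out_dir, "#{user_id}.geojson")
    File.write(path, JSON.pretty_generate(feature_collection))
    path
  end
end
```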
##### tweet_shape.rb
Uses the `georuby` library to create shapefiles from tweets. This functionality is deprecated because shapefiles are less convenient than KML or GeoJSON files for viewing the data.
### mongo/
##### linestring_reduce.js
Map-reduce function that generates the `usertracks` collection from the `edited_tweets` collection.
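The map-reduce pattern behind a user-track collection can be emulated in plain Ruby to show its phases. This is not `linestring_reduce.js`: the field names, the time ordering done in a finalize step, and the method names are assumptions. The property it illustrates is that the reduce output has the same shape as its inputs, which MongoDB requires because reduce may be re-run on partial results.

```ruby
# Ruby emulation of a MongoDB-style map-reduce grouping tweet positions by user.
# Assumption: tweets are Hashes with "user" => { "id_str" => ... },
# "created_at" (a Time) and "coordinates" => { "coordinates" => [lon, lat] }.

# map: emit one [key, value] pair per geotagged tweet.
def map_tweet(tweet)
  coords = tweet.dig("coordinates", "coordinates")
  return [] unless coords
  [[tweet.dig("user", "id_str"), { "points" => [[tweet["created_at"], coords]] }]]
end

# reduce: merge all values for one key. The result has the same shape as
# each input value, so re-reducing partial results would also be safe.
def reduce_points(_key, values)
  { "points" => values.flat_map { |v| v["points"] } }
end

# finalize: order a user's points by time and build the GeoJSON geometry.
def finalize_track(_key, value)
  ordered = value["points"].sort_by(&:first).map(&:last)
  # A GeoJSON LineString needs at least two positions.
  if ordered.length >= 2
    { "type" => "LineString", "coordinates" => ordered }
  else
    { "type" => "Point", "coordinates" => ordered.first }
  end
end

# Run map, group by key (the "shuffle"), reduce, then finalize.
def map_reduce_tracks(tweets)
  tweets.flat_map { |t| map_tweet(t) }
        .group_by(&:first)
        .to_h { |key, pairs|
          reduced = reduce_points(key, pairs.map(&:last))
          [key, finalize_track(key, reduced)]
        }
end
```

Sorting in the finalize step rather than in reduce keeps reduce a plain concatenation, and a user with a single geotagged tweet gets a `Point` instead of an invalid one-position `LineString`.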
### extract_scripts/

##### tracks.rb
Writes two shapefiles from the collection: one of linestrings, one per user, representing each user's path, and one of just the tweets as points.
##### geo_bounded_tracks.rb
Performs the same task as `tracks.rb`, but allows for geospatial queries.
##### mongo_extractor.rb
A very simple Mongo-to-shapefile script for quick visualizations of data.
##### find_users_within_area.rb
A cleaner, more robust version of `geo_bounded_tracks.rb`, built to accept a polygon of any shape as the bounding area.
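The README does not say how `find_users_within_area.rb` tests polygon membership, so the sketch below swaps in the standard even-odd (ray-casting) point-in-polygon algorithm to show the idea. It treats longitude/latitude as planar coordinates, which is a reasonable approximation for a regional polygon but ignores the antimeridian; the method names and input shapes are hypothetical.

```ruby
# Even-odd (ray-casting) point-in-polygon test, swapped in for illustration.
# Assumption: polygon is an Array of [lon, lat] vertices treated as planar
# coordinates; a repeated closing vertex is optional.
def point_in_polygon?(lon, lat, polygon)
  inside = false
  j = polygon.length - 1
  polygon.each_with_index do |(xi, yi), i|
    xj, yj = polygon[j]
    # Toggle when edge (j -> i) straddles the horizontal line through the
    # point and crosses it to the right of the point. The straddle check
    # guarantees yi != yj, so the division is safe.
    if (yi > lat) != (yj > lat) &&
       lon < (xj - xi) * (lat - yi) / (yj - yi).to_f + xi
      inside = !inside
    end
    j = i
  end
  inside
end

# Hypothetical helper: ids of users with at least one position inside polygon.
# Assumption: tweets_by_user maps a user id to an Array of [lon, lat] pairs.
def users_within_area(tweets_by_user, polygon)
  tweets_by_user.select { |_user_id, positions|
    positions.any? { |lon, lat| point_in_polygon?(lon, lat, polygon) }
  }.keys
end
```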
### parsers/

##### extract_geo_json.rb
Parses, line by line, a text file of JSON tweets delimited by newlines. Identifies tweets that are geo-tagged and writes them to a separate text file in the same format (JSON tweets separated by `\n` characters).
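The line-by-line filtering described above can be sketched as a streaming pass that never holds the whole file in memory. Treating "geo-tagged" as a non-null v1.1 `coordinates` field, and skipping malformed lines instead of aborting, are assumptions; the original script may also consider `place` or handle errors differently.

```ruby
require "json"

# Copy geotagged tweets from input to output, one JSON tweet per line.
# Assumption: "geotagged" means the Twitter v1.1 "coordinates" field is non-null.
# input/output can be File or StringIO objects. Returns the number of tweets kept.
def extract_geotagged(input, output)
  kept = 0
  input.each_line do |line|
    next if line.strip.empty?
    begin
      tweet = JSON.parse(line)
    rescue JSON::ParserError
      next # skip malformed lines rather than aborting a long pass
    end
    next unless tweet.is_a?(Hash) && !tweet["coordinates"].nil?
    output.write(line.chomp, "\n") # same format: one tweet per line
    kept += 1
  end
  kept
end
```

A typical call opens both files and streams between them, e.g. `File.open(src) { |i| File.open(dst, "w") { |o| extract_geotagged(i, o) } }`, where `src` and `dst` are placeholder paths.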
##### get_geo_contextual.rb
Parses contextual stream text files, extracting geo-tagged tweets and inserting them into a Mongo collection. The file paths to the contextual streams are built dynamically from the username that the script collects.
##### reformat_date.rb
A small helper function that reformats the string date into an ISODate so that Mongo recognizes it as a date. This logic should be built into the import; as a standalone script it is deprecated, because a JavaScript loop in the Mongo shell is more convenient.
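The date conversion itself is short in Ruby, which is why it fits naturally into an import. Twitter API v1.1 `created_at` values use the format `Mon Oct 29 23:59:59 +0000 2012`; parsing them into a `Time` lets the Ruby Mongo driver store a BSON date, which the shell displays as `ISODate(...)`. The constant and method names are hypothetical.

```ruby
require "time"

# Format of Twitter API v1.1 "created_at" strings,
# e.g. "Mon Oct 29 23:59:59 +0000 2012".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical helper: parse created_at into a UTC Time. Inserting a Time
# through the Ruby Mongo driver stores it as a BSON date rather than a string.
def parse_twitter_time(created_at)
  Time.strptime(created_at, TWITTER_TIME_FORMAT).utc
end
```

During an import, a tweet hash could be updated with `tweet["created_at"] = parse_twitter_time(tweet["created_at"])` before insertion.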
##### user_indiv_path.rb
Creates a new collection where each document represents a single user, with their tweet coordinates stored as a LineString to show their movement path.
##### user_indiv_tweet.rb
Creates a new collection where each document represents a single user, with their tweet coordinates stored as points.
##### user_node_collection.rb
(Unfinished) Stores a user's tweets in three time bins: before, during, and after.
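Since `user_node_collection.rb` is unfinished, here is one possible shape for the binning step. The boundaries `DURING_START` and `DURING_END` are hypothetical placeholders (the README does not define where the bins split), and each tweet is assumed to carry a parsed `Time` in `"created_at"`.

```ruby
# Hypothetical bin boundaries: placeholders only, not the project's definitions.
DURING_START = Time.utc(2012, 10, 28)
DURING_END   = Time.utc(2012, 11, 1)

# Assign one timestamp to a bin; DURING_START is inclusive, DURING_END exclusive.
def time_bin(time)
  if time < DURING_START
    :before
  elsif time < DURING_END
    :during
  else
    :after
  end
end

# Group one user's tweets into the three bins; every bin key is always present.
# Assumption: tweet["created_at"] is already a Time.
def bin_tweets(tweets)
  bins = { before: [], during: [], after: [] }
  tweets.each { |t| bins[time_bin(t["created_at"])] << t }
  bins
end
```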
##### user_track_parser.rb
(Deprecated) Performs a similar function to `user_indiv_path.rb`.
### analysis/

##### Twitter_In_Evac.py
A Python script that uses ArcPy to parse a CSV of before, during, and after locations for a particular user and compare these locations to known evacuation zones. It performs set intersections on lists of users to determine who sheltered in place in an evacuation zone.
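The set logic can be sketched independently of ArcPy. Here the inputs are assumed to be lists of user ids already classified (for example, by the ArcPy zone comparison) as located in an evacuation zone before and during the storm. Treating "sheltered in place" as membership in both lists is an illustrative assumption, not the script's confirmed definition. The sketch is in Ruby to match the other examples, although the original script is Python.

```ruby
require "set"

# Hypothetical helper: users whose "before" and "during" locations both fell in
# an evacuation zone. Assumption: zone membership was computed elsewhere
# (e.g. with ArcPy) and arrives here as lists of user ids.
def sheltered_in_place(in_zone_before, in_zone_during)
  Set.new(in_zone_before) & Set.new(in_zone_during) # set intersection
end
```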
### userAnalysis/
The Visual Studio project that takes the extracted users and outputs diagnostic files, such as a KML file, a CSV of perimeters, and each user's median points.
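A user's median point can be sketched as a coordinate-wise median of their `[lon, lat]` positions. The userAnalysis project may define its median differently (for example, as a geometric median), and the method names here are hypothetical.

```ruby
# Median of a numeric Array; for an even count, the mean of the two middle values.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Hypothetical helper: coordinate-wise median of [lon, lat] positions.
# Assumption: this per-axis version stands in for whatever userAnalysis computes.
def median_point(positions)
  raise ArgumentError, "no positions" if positions.empty?
  lons, lats = positions.transpose # split into a longitude and a latitude Array
  [median(lons), median(lats)]
end
```

A per-axis median is cheap and robust to a few outlying tweets, but unlike a geometric median it is not guaranteed to sit near any actual cluster of positions.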