libbyh / Socrates

A platform for collecting, analyzing, and visualizing social media data.

Home Page: http://peopleanalytics.org/socrates/front-end

#SOCRATES

Live demo of SOCRATES

This document primarily consists of development ideas. For more information on SOCRATES, please visit our documentation.

#Future Goals

  • Multiple working_sets and a user-based system
  • Integration of Flask into Apache
  • Many more modules
  • More documentation and video tutorials

#Example Use Cases:

  • Comparing the sentiment of tweets against the number of followers
  • Comparing the sentiment against the word_count of Reddit comments (there is probably some correlation)

#Ideas

  • Have the back-end store data in MongoDB to avoid passing large datasets back and forth. When a user calls the API to fetch data, it returns the id of the record, the data types, and a SINGLE post that shows what the data looks like. The user can then choose which fields to analyze.
  • Each module will have a single Python file which handles:
    • Specification of whether it is a collection or analysis module and a description
    • Specification of parameters and return values
    • The initial call: a function run(param, working_set) that gets the data
  • Cache the working_set to reduce redundant downloading [done]
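
The single-file module idea above could look roughly like the following. This is only a sketch: the `SPEC` dictionary, its keys, and the word-count behavior are hypothetical and not taken from the SOCRATES code base. Only the `run(param, working_set)` signature comes from the notes above.

```python
# Hypothetical module file, e.g. word_count.py. All names except run() are illustrative.

SPEC = {
    # Whether this is a "collection" or an "analysis" module, plus a description.
    "type": "analysis",
    "description": "Counts the words in a text field.",
    # Parameters and return values, using the field data types listed in this document.
    "param": {"field": {"type": "field_reference"}},
    "returns": {"word_count": {"type": "numeric"}},
}


def run(param, working_set=None):
    """Entry point: return one word count per entry of the referenced field.

    Assumes param["field"] has already been resolved to a list of strings.
    """
    return [{"word_count": len(text.split())} for text in param["field"]]
```

Keeping the specification and the entry point in one file means a new module can be added without touching any other part of the back-end.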

#Quick Notes

  • Primitive parameters are converted to their actual values; fields are converted to arrays of the entry data they refer to
  • Fields defined in the specs for parameters and return values can be either strings or objects
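
One way the conversion described in these notes might work is sketched below. The function name `resolve_params`, the cast table, and the assumption that a spec entry is either a bare type string or an object with a `type` key are all illustrative guesses, not the actual implementation.

```python
def resolve_params(raw_params, param_spec, working_set):
    """Turn raw request strings into values a module can use (hypothetical helper)."""
    # Casts for primitive types; the boolean rule here is an assumption.
    casts = {
        "numeric": float,
        "text": str,
        "boolean": lambda value: str(value).lower() == "true",
    }
    resolved = {}
    for name, spec in param_spec.items():
        # Assumed: a spec entry is either a type string or an object with a "type" key.
        type_name = spec if isinstance(spec, str) else spec["type"]
        raw = raw_params[name]
        if type_name == "field_reference":
            # A field becomes an array holding that field's value from every entry.
            resolved[name] = [entry.get(raw) for entry in working_set["data"]]
        else:
            # A primitive is converted to its actual value.
            resolved[name] = casts[type_name](raw)
    return resolved
```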

#Issues

  • On Chrome locally there is an issue where AJAX calls take 15-20 seconds. This does not occur in Firefox. [Edit 3/4/2014: this is a Chrome bug]
  • When I ran an analysis on a data set, a different entry was returned as the first entry (this was with tw_search)
  • Total for word count is undefined [3/5/2014 fixed]

#Packages Installed

#Data types for fields

  • numeric
  • text
  • date (YYYY-MM-DD hh:mm:ss)
  • array <data type (not array)>
  • boolean
  • geo (longitude,latitude)
  • field_reference (used by analysis modules that require a reference to a field)
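
As a rough illustration of the date and geo types above, the helpers below parse string values into Python objects. The helper names and the assumption that a geo value arrives as a "(longitude,latitude)" string are hypothetical.

```python
from datetime import datetime


def parse_date(value):
    """Parse a date in the YYYY-MM-DD hh:mm:ss format listed above."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def parse_geo(value):
    """Parse "(longitude,latitude)" into a pair of floats (string form is assumed)."""
    longitude, latitude = (float(part) for part in value.strip("()").split(","))
    return longitude, latitude
```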

#Analysis

There are two types of results from analysis:

  • Per Entry: this will likely be more common and will give a value to each entry in the data set. The analysis data will be directly added to the entry under an analysis property
  • Aggregate: a single value describing all of the data (e.g. a sum of all word counts)

Snapshot of JSON after collection and analysis:

{
	meta: {},
	data: [{},{},...],
	analysis: [{
		aggregate_analysis: {},
		entry_analysis: [{},{},...],
		entry_meta: {},
		aggregate_meta: {}
	}]
}
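
To make the two result types concrete, here is a minimal sketch of a word-count analysis that fills in the structure from the snapshot. The function name, the `word_count` and `total_word_count` keys, and the contents of the two meta objects are assumptions for illustration only.

```python
def word_count_analysis(working_set, field):
    """Append a per-entry and an aggregate word count to working_set["analysis"]."""
    # Per entry: one value for each entry in the data set.
    counts = [len(str(entry.get(field) or "").split()) for entry in working_set["data"]]
    result = {
        # Aggregate: a single value describing all of the data.
        "aggregate_analysis": {"total_word_count": sum(counts)},
        "entry_analysis": [{"word_count": count} for count in counts],
        # Assumed: the meta objects map each result name to its data type.
        "entry_meta": {"word_count": "numeric"},
        "aggregate_meta": {"total_word_count": "numeric"},
    }
    working_set.setdefault("analysis", []).append(result)
    return working_set
```

Each call appends a new object to the analysis array, so several analyses can coexist on the same working_set.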

#Useful commands:

Fetch Twitter posts: curl http://localhost:5000/op/collection/twitter/tw_search -d "lang=en&count=1&query=derp"

Sentiment Analysis: curl http://localhost:5000/op/analysis/text/sentiment -d "field=content&reference_id="
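
The same form-encoded POST requests can be built from Python with the standard library. This sketch only constructs the request for the Twitter search command. The helper `build_request` is hypothetical, and sending the request assumes the server is running locally on port 5000 as in the curl commands above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # host and port taken from the curl commands


def build_request(path, params):
    """Build a form-encoded POST request, like curl -d (hypothetical helper)."""
    body = urlencode(params).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(BASE_URL + path, data=body)


search = build_request(
    "/op/collection/twitter/tw_search",
    {"lang": "en", "count": 1, "query": "derp"},
)
# To send it with the server running: urllib.request.urlopen(search).read()
```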
