martinywwan / spark-twitter-streaming

Streaming Twitter data in near real-time using Apache Spark Streaming API


Streaming Twitter data using Apache Spark


Synopsis


A simple Spark application that connects to Twitter and prints tweets, optionally restricted by a filter.
The application can be run as a standalone application or on Hadoop.

Motivation


This project aims to help developers and researchers connect to Twitter using Apache Spark.

Execution


Prerequisites:
1) If you are running on Hadoop, ensure that ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set

Instructions to run the application using an IDE:
1) Edit the run configuration to pass the following program arguments, in this order: [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]
2) Run the SparkApplication class, which contains the main method (Optional: edit the FILTERS array to filter the tweets received)
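
To illustrate what "a filter (if any)" means for the FILTERS array, here is a minimal sketch in plain Java. It is not taken from the repository: the class name TweetFilter and the case-insensitive substring matching are assumptions made for illustration. The real application presumably hands the filter terms to Twitter's streaming API, whose matching rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the repository.
final class TweetFilter {

    // Returns the tweets that contain at least one filter term.
    // An empty filter array means "no filter": every tweet is kept.
    // Assumption: matching is a case-insensitive substring test.
    static List<String> apply(List<String> tweets, String[] filters) {
        if (filters.length == 0) {
            return new ArrayList<>(tweets);
        }
        List<String> kept = new ArrayList<>();
        for (String tweet : tweets) {
            String lower = tweet.toLowerCase(Locale.ROOT);
            for (String term : filters) {
                if (lower.contains(term.toLowerCase(Locale.ROOT))) {
                    kept.add(tweet);
                    break; // one matching term is enough
                }
            }
        }
        return kept;
    }
}
```

With this sketch, an empty array passes every tweet through, while an array such as {"spark"} keeps only the tweets that mention that term.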

Instructions to run the application on the command line:
1) Ensure Maven is installed and run "mvn clean package"
2) In the target folder, you should see a jar file with dependencies. Run "java -jar [generated_jar].jar [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]"
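
Both ways of launching the application pass the same four positional arguments. The sketch below shows one way such arguments could be validated and handed to Twitter4J, which reads its OAuth credentials from the system properties named in the code. The class TwitterCredentials and its methods are hypothetical, and whether SparkApplication forwards its arguments this way is an assumption, not something the README confirms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the repository.
final class TwitterCredentials {

    // Twitter4J OAuth property names, listed in the README's argument order.
    static final String[] PROPERTY_NAMES = {
        "twitter4j.oauth.consumerKey",
        "twitter4j.oauth.consumerSecret",
        "twitter4j.oauth.accessToken",
        "twitter4j.oauth.accessTokenSecret"
    };

    // Maps the four positional arguments to their property names.
    // Throws if fewer than four arguments were supplied.
    static Map<String, String> fromArgs(String[] args) {
        if (args.length < PROPERTY_NAMES.length) {
            throw new IllegalArgumentException(
                "Usage: <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>");
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < PROPERTY_NAMES.length; i++) {
            props.put(PROPERTY_NAMES[i], args[i]);
        }
        return props;
    }

    // Publishes the credentials as JVM system properties.
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

Calling TwitterCredentials.apply(TwitterCredentials.fromArgs(args)) at the top of a main method would fail fast on a missing argument before any connection to Twitter is attempted.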




Languages

Language:Java 100.0%