#Coursework 1 - Twitter Analysis by Alexandre Novais de Medeiros (alemedeiros)
Student ID 140667280
##Compilation
There is an Ant build file in the root directory. For a full recompilation, run:

    ant clean dist

##Running
All the Hadoop job classes are included in the dist/TwitterAnalysis.jar file.
The class to run for each job is:
- Histogram generation: bigdata.twitter.text.Histogram
- Average Tweet length: bigdata.twitter.text.AverageLength
- Time Analysis: bigdata.twitter.time.TimeAnalysis
- Hashtag Counter: bigdata.twitter.hashtag.HashtagCount
- Support Hashtag Analysis: bigdata.twitter.hashtag.HashtagAnalysis
A simple shell file, hadoop.sh, adds a function for running Hadoop jobs. The function runs the job and downloads the merged output from HDFS.
To use the function, first source the file:

    source hadoop.sh

Then call the runhadoop command with the last two components of the desired job's class name, e.g.:

    runhadoop text.AverageLength

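For readers who want a picture of what such a helper might look like, here is a minimal sketch. It is not the contents of hadoop.sh. The function names `job_class` and `runhadoop_sketch`, the input/output arguments, and the local file name are all assumptions made for illustration. It also assumes each job accepts an HDFS input path and an HDFS output path as its arguments. Only `hadoop jar` and `hadoop fs -getmerge` are standard Hadoop commands.

```shell
# Hypothetical sketch; the real hadoop.sh may differ.
JAR="dist/TwitterAnalysis.jar"   # jar named in this README
PKG_PREFIX="bigdata.twitter"     # common package prefix of the job classes

# Expand a short name such as "text.AverageLength" into the full class name.
job_class() {
    printf '%s.%s\n' "$PKG_PREFIX" "$1"
}

# Run one job, then merge its HDFS output into a single local file.
# Usage: runhadoop_sketch <short job name> <HDFS input path> <HDFS output path>
# Assumption: the job takes the input and output paths as its two arguments.
runhadoop_sketch() {
    short_name="$1"
    input="$2"
    output="$3"
    hadoop jar "$JAR" "$(job_class "$short_name")" "$input" "$output" &&
        hadoop fs -getmerge "$output" "output/${short_name}.txt"
}
```

Chaining the two commands with `&&` means the merged download is attempted only if the job itself succeeds.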
##Results
The data generated by the above jobs is available in the output directory.
A version of the data sorted with the Unix sort command is also included and, in some cases, a manually filtered version as well.
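The exact sort invocation used is not recorded here. As an illustration only, the sketch below sorts a merged output file by count, largest first. It assumes the default Hadoop text output layout of key, tab, value. The function name `sort_by_count` and the sample lines are made up for the example and are not results from the coursework.

```shell
# Sort "key<TAB>count" lines by count, largest first.
# Assumes the default Hadoop text output layout: key, a tab, then the value.
sort_by_count() {
    sort -t "$(printf '\t')" -k2,2nr "$1"
}

# Small self-contained demonstration with made-up sample data.
sample="$(mktemp)"
printf '#example_b\t3\n#example_a\t10\n#example_c\t1\n' > "$sample"
sort_by_count "$sample"
rm -f "$sample"
```

The `-k2,2nr` key restricts the comparison to the second field and treats it as a number in reverse order, so the keys themselves do not affect the ordering.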
##Report
The report PDF and its source files are in the report directory.
##Repository
The source code for the coursework is also available in a GitHub repository.