alvalentini / BigDataProject

Big Data course project - UNITN

BigDataProject

Social (Twitter) Data Analysis for User Profiling

Create a distributed MapReduce-style Spark program that processes a large Twitter dataset and generates a set of user profiles. A profile is a vector of terms for a user. Tweets may first need to be enhanced or cleaned; they are then clustered, and profiles are generated for the users. The main point of the project is building the distributed Spark job that processes the tweets. Some techniques from the literature will be provided.
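To make the idea of a profile as a vector of terms concrete, here is a minimal local sketch in plain Java. It is not the project's code and does not use Spark: the class `ProfileSketch`, the `Tweet` type, the stop-word list, and the choice of "top k most frequent terms" as the profile are all assumptions made for illustration, and the clustering step is left out entirely. In a Spark job, the per-user counting shown here would roughly correspond to a keyed aggregation such as `reduceByKey`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: every name here is hypothetical, not from the project.
final class ProfileSketch {

    /** Hypothetical minimal tweet: an author and the raw text. */
    static final class Tweet {
        final String user;
        final String text;

        Tweet(String user, String text) {
            this.user = user;
            this.text = text;
        }
    }

    // Tiny stop-word list, assumed for the example.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "to", "of", "in", "is", "rt", "with");

    /** Cleaning: lowercase, drop URLs and @mentions, keep alphabetic tokens. */
    static List<String> clean(String text) {
        String stripped = text.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")
                .replaceAll("@\\w+", " ");
        List<String> terms = new ArrayList<>();
        for (String token : stripped.split("[^a-z]+")) {
            if (token.length() > 1 && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** "Map" each tweet to (user, term) pairs and "reduce" them to counts. */
    static Map<String, Map<String, Integer>> countTerms(List<Tweet> tweets) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Tweet tweet : tweets) {
            Map<String, Integer> userCounts =
                    counts.computeIfAbsent(tweet.user, u -> new TreeMap<>());
            for (String term : clean(tweet.text)) {
                userCounts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Profile = the k most frequent terms per user, ties broken alphabetically. */
    static Map<String, List<String>> buildProfiles(List<Tweet> tweets, int k) {
        Map<String, List<String>> profiles = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : countTerms(tweets).entrySet()) {
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(e.getValue().entrySet());
            entries.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(k, entries.size()); i++) {
                top.add(entries.get(i).getKey());
            }
            profiles.put(e.getKey(), top);
        }
        return profiles;
    }
}
```

Calling `ProfileSketch.buildProfiles(tweets, k)` on a list of `Tweet` objects returns a map from user name to that user's term list. Raw frequency is only one possible weighting; a scheme such as TF-IDF could be swapped in without changing the overall shape.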

About

License: GNU General Public License v3.0


Languages

Java 94.5%, Python 5.5%