Create a distributed MapReduce-style program in Apache Spark that processes a large Twitter dataset and generates a set of user profiles. A profile is a vector of terms for each user. Tweets may first need to be cleaned or enhanced; they are then clustered, and the profiles are generated for the users from the result. The main point of the project is building the distributed Spark job that processes the tweets. Some techniques from the literature will be provided.
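To make the intended data flow concrete, here is a minimal sketch of the cleaning and profile-building steps as plain map and reduce functions in Python. It assumes tweets arrive as (user, text) pairs and that a profile is the user's most frequent terms; the stop-word list, the sample records, and all function names are hypothetical, and the clustering stage is left out because the brief does not say which technique to use.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real job would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "with", "rt"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9]+")


def clean_tweet(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    return [t for t in TOKEN.findall(text) if t not in STOP_WORDS]


def map_tweet(record):
    """Map step: (user, raw text) -> (user, term counts for that tweet)."""
    user, text = record
    return user, Counter(clean_tweet(text))


def merge_counts(a, b):
    """Reduce step: combine two term-count vectors of the same user."""
    return a + b


def build_profiles(records, top_k=3):
    """Local stand-in for a map followed by a reduce-by-key (illustration only)."""
    merged = {}
    for user, counts in map(map_tweet, records):
        merged[user] = merge_counts(merged[user], counts) if user in merged else counts
    # Keep the top_k terms per user; ties are broken alphabetically.
    return {
        user: [term for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for user, c in merged.items()
    }


# Made-up sample input, assumed to be (user, text) pairs.
SAMPLE = [
    ("alice", "Spark is fast! https://t.co/x #spark"),
    ("bob", "RT @alice: Learning Scala and Spark"),
    ("alice", "Clustering tweets with Spark"),
]

profiles = build_profiles(SAMPLE)
print(profiles)
```

Because `map_tweet` and `merge_counts` are pure functions and the merge is associative and commutative, the same pair could plausibly be handed to PySpark as `rdd.map(map_tweet).reduceByKey(merge_counts)`, with `build_profiles` replaced by the cluster-side shuffle. That wiring is an assumption about how the job would be structured, not something the brief specifies.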