The goal of this project is to study the gaming community of the Steam platform using both standard Python and Big Data tools.
More specifically, the objective is to state a set of hypotheses and use the users' reviews to identify relevant correlations in player behavior (e.g., which factors make a player like a video game, whether a user is loyal to the same genre, ...).
The hypotheses were first studied on a small portion of the overall Big Data source (extracted using code snippets written with Big Data tools).
Once the hypotheses had been tested for statistical significance on this small portion of the data, the same analyses were replicated on the whole data source (using Spark/HDFS/...) to check whether the results still held.
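The sample-then-replicate workflow described above can be illustrated with a minimal sketch in standard Python. Everything in it is an assumption made for illustration only: the data is synthetic, the fields `hours_played` and `recommended` are hypothetical and not taken from the actual datasets, and the hypothesis ("players with more hours played are more likely to recommend a game") is just an example. On the real data the full-scale step would run with Spark rather than in memory.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing the null hypothesis of zero correlation."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def make_synthetic_reviews(n, seed=0):
    """Build synthetic (hours_played, recommended) rows; NOT real Steam data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours_played = rng.uniform(0, 200)
        # Assumed relationship: recommendation probability grows with hours.
        p_recommend = 0.3 + 0.6 * hours_played / 200
        recommended = 1 if rng.random() < p_recommend else 0
        rows.append((hours_played, recommended))
    return rows


def correlation(rows):
    """Correlation between hours played and the recommendation flag."""
    hours = [h for h, _ in rows]
    flags = [f for _, f in rows]
    return pearson(hours, flags)


# "Whole data source" (synthetic stand-in) and a small random portion of it.
reviews = make_synthetic_reviews(20_000)
sample = random.Random(1).sample(reviews, 1_000)

# Step 1: study the hypothesis on the small portion.
r_sample = correlation(sample)
t_sample = t_statistic(r_sample, len(sample))

# Step 2: replicate on the whole source and check it is still relevant.
r_full = correlation(reviews)
t_full = t_statistic(r_full, len(reviews))

print(f"sample: r={r_sample:.3f}, t={t_sample:.1f}")
print(f"full:   r={r_full:.3f}, t={t_full:.1f}")
```

A large absolute t statistic on the sample suggests the hypothesis is worth replicating at full scale; comparing `r_sample` with `r_full` then shows whether the effect seen on the sample persists on the whole source.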
The datasets used (from Kaggle) are the following: