This project was completed as part of a dissertation submitted for the KCL MSc Data Science, 2018, and was awarded a Distinction.
User histograms are created from a variety of data found in check-in services. One such service that provides readily available information is Foursquare, whose check-in data can be found on Kaggle.
Paper Abstract
This paper aims to find and test algorithms that successfully cluster users, using Foursquare's check-in data from New York. Each user in the dataset is represented as a histogram of check-ins over the venue categories they have checked into, so that the users together form a matrix of check-ins. In this paper, I propose a novel method that clusters user histograms according to the semantic spaces users have visited, rather than directly using the venue categories they have checked into. This has a two-fold effect: it reduces the dimensionality of the histograms, and it may give a better understanding of the user clusters. This paper contributes to the understanding of clustering users from Location Based Social Networks, which may pave the way for better applications in location recommendation or collaborative filtering in the future.
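
The representation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy check-ins, the `SEMANTIC_SPACE` mapping from venue categories to semantic spaces, and the function names are hypothetical and are not taken from the dissertation or from the Kaggle dataset. The sketch only shows how per-user category histograms might be collapsed into a smaller set of semantic spaces before any clustering algorithm is applied.

```python
from collections import Counter, defaultdict

# Hypothetical check-ins as (user_id, venue_category) pairs.
# Toy data for illustration only, not drawn from the Foursquare dataset.
CHECKINS = [
    ("u1", "Coffee Shop"),
    ("u1", "Coffee Shop"),
    ("u1", "Italian Restaurant"),
    ("u1", "Gym"),
    ("u2", "Bar"),
    ("u2", "Nightclub"),
    ("u2", "Coffee Shop"),
]

# Hypothetical mapping from venue category to semantic space (an assumption).
SEMANTIC_SPACE = {
    "Coffee Shop": "Food",
    "Italian Restaurant": "Food",
    "Gym": "Fitness",
    "Bar": "Nightlife",
    "Nightclub": "Nightlife",
}


def category_histograms(checkins):
    """Count check-ins per venue category for each user."""
    hists = defaultdict(Counter)
    for user, category in checkins:
        hists[user][category] += 1
    return hists


def semantic_histograms(cat_hists, mapping):
    """Collapse each user's category counts into semantic-space counts."""
    result = {}
    for user, counts in cat_hists.items():
        aggregated = Counter()
        for category, n in counts.items():
            aggregated[mapping[category]] += n
        result[user] = aggregated
    return result


def to_matrix(hists, columns):
    """Turn histograms into normalised rows, one fixed-order row per user."""
    rows = {}
    for user, counts in hists.items():
        total = sum(counts.values())
        # A Counter returns 0 for spaces the user never visited.
        rows[user] = [counts[c] / total for c in columns]
    return rows


cat_hists = category_histograms(CHECKINS)
sem_hists = semantic_histograms(cat_hists, SEMANTIC_SPACE)
spaces = sorted(set(SEMANTIC_SPACE.values()))
matrix = to_matrix(sem_hists, spaces)
```

In this toy case five venue categories collapse into three semantic spaces, which is the dimensionality reduction the abstract refers to. The rows of `matrix` could then be passed to a clustering algorithm of choice; which algorithm the dissertation uses is not stated in this section.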