The goal of this project is to study the performance of topic modeling algorithms on the News Category Dataset.
- Analyse the text corpus (mean document size, types of words used, stopwords, most common words, etc.)
- Select three topic modeling/clustering methodologies suited to our problem
- Define one or multiple metrics to measure the quality of our models
- Compare the models against one another using these metrics
- Determine the best methodology for our use case and identify areas for improvement in our analysis
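As an illustration of the first step, here is a minimal sketch of the corpus statistics it mentions (mean document size, vocabulary, stopword share, most common words), written with the Python standard library only. It assumes each document is a plain string. For the News Category Dataset that might mean concatenating the `headline` and `short_description` fields, though that choice is an assumption. The `STOPWORDS` set, the regex tokenizer, and the `corpus_stats` helper are hypothetical and exist only for this example.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real analysis would use a fuller list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "with"}

# Simple tokenizer assumption: lowercase alphabetic runs (apostrophes kept).
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def corpus_stats(docs: list[str], top_n: int = 5) -> dict:
    """Compute simple descriptive statistics over a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    all_tokens = [t for doc in tokenized for t in doc]

    # Count how many tokens are stopwords, and the frequency of the others.
    n_stop = sum(1 for t in all_tokens if t in STOPWORDS)
    content_counts = Counter(t for t in all_tokens if t not in STOPWORDS)

    return {
        "n_docs": len(docs),
        # Mean document size, measured in tokens.
        "mean_tokens_per_doc": mean(len(doc) for doc in tokenized) if tokenized else 0.0,
        "vocab_size": len(set(all_tokens)),
        "stopword_ratio": n_stop / len(all_tokens) if all_tokens else 0.0,
        # Most frequent non-stopword tokens, as (word, count) pairs.
        "most_common": content_counts.most_common(top_n),
    }
```

On the full dataset, `corpus_stats(texts, top_n=20)` would return a single dictionary that can be printed or tabulated. Keeping the stopword list separate from the tokenizer makes it easy to compare word frequencies with and without stopword removal before choosing a preprocessing pipeline for the topic models.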