LyricsMood

A machine learning approach to classify songs' lyrics by mood

The goal of this project is to build a recommendation system for users who want to listen to happy or sad songs. LyricsMood predicts whether a song is happy or sad based on its lyrics.

Summary

Dataset overview

The titles and artists of happy and sad songs were collected by crawling last.fm. These song details were then used to build the URL of each song's lyrics page on metrolyrics.com, which was crawled to collect the lyrics. Songs whose lyrics were not available were removed from the dataset, as were non-English songs. The final dataset contains 2,800 songs (1,400 happy and 1,400 sad).
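The two filtering steps (drop songs without lyrics, drop non-English songs) can be sketched as below. This is a minimal sketch, not the project's code: the record layout (dicts with hypothetical keys title, artist, mood and lyrics) and the crude stopword-ratio test standing in for a real language detector are assumptions, since the README does not say how non-English songs were identified.

```python
from collections import Counter

# Hypothetical stand-in for a language detector: a song counts as English
# if enough of its tokens are common English function words.
ENGLISH_STOPWORDS = {
    "the", "and", "you", "i", "to", "a", "me", "my", "it", "in",
    "of", "is", "that", "on", "your", "be", "for", "we", "all", "so",
}


def looks_english(lyrics, min_ratio=0.15):
    """Crude heuristic: share of tokens that are English stopwords."""
    tokens = lyrics.lower().split()
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= min_ratio


def clean_dataset(songs):
    """Keep songs that have lyrics and look English.

    Each song is a dict with hypothetical keys 'title', 'artist', 'mood'
    and 'lyrics' (None when the lyrics page could not be fetched).
    Returns the kept songs and the number of songs per mood.
    """
    kept = [s for s in songs if s.get("lyrics") and looks_english(s["lyrics"])]
    return kept, Counter(s["mood"] for s in kept)
```

Returning the per-mood counts makes it easy to check that the cleaned dataset is still balanced between happy and sad songs.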

Exploratory data analysis

[Figure: word clouds of happy and sad song lyrics]
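A word cloud sizes each word by how often it occurs. The frequencies behind such a cloud can be computed as in this sketch, which assumes a simple lowercase tokenizer and a small hypothetical stopword list; the README does not say which tool or preprocessing produced the clouds.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "and", "i", "you", "to", "my", "me", "it", "in", "of"}


def top_words(lyrics_list, n=10):
    """Count word occurrences across all lyrics, skipping stopwords,
    and return the n most frequent (word, count) pairs."""
    counts = Counter()
    for lyrics in lyrics_list:
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", lyrics.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


print(top_words(["Dance dance in the sun", "Sun is shining, let's dance"], n=2))
# [('dance', 3), ('sun', 2)]
```

Running it once on the happy lyrics and once on the sad lyrics gives the two frequency tables; as one option, such counts can be passed as a dict to `WordCloud.generate_from_frequencies` from the `wordcloud` package.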

Results

Model performance of the song lyrics classifier. A: Receiver operating characteristic (ROC) curves of a Bernoulli naive Bayes classifier that uses TF-IDF vectors of 2-gram sequences as features to classify song lyrics by mood, evaluated via 10-fold cross-validation on the lyrics dataset. The true positive rate is the fraction of happy songs that were correctly classified as happy; the false positive rate is the fraction of sad songs that were misclassified as happy. B: Confusion matrix of the classifier on a test dataset.
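The classifier and the ROC computation described in the caption can be sketched from scratch as below. This is an illustrative sketch, not the author's notebook code: the class name `BernoulliNaiveBayes`, whitespace tokenization, word (rather than character) 2-grams, and Laplace smoothing with alpha=1 are all assumptions. It uses binary 2-gram presence features instead of TF-IDF weights; if the original used scikit-learn's `BernoulliNB`, its default `binarize=0.0` turns every nonzero TF-IDF weight into 1, so the model there also sees presence/absence only.

```python
import math


def ngrams(text, n=2):
    """Set of word n-grams present in the text (presence, not counts)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes over binary n-gram presence features."""

    def __init__(self, alpha=1.0, n=2):
        self.alpha = alpha  # Laplace smoothing strength
        self.n = n

    def fit(self, texts, labels):
        docs = [ngrams(t, self.n) for t in texts]
        self.vocab = set().union(*docs)
        self.classes = sorted(set(labels))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # p[c][g] = smoothed P(n-gram g present | class c)
            self.p[c] = {
                g: (sum(g in d for d in class_docs) + self.alpha)
                / (len(class_docs) + 2 * self.alpha)
                for g in self.vocab
            }
        return self

    def log_scores(self, text):
        present = ngrams(text, self.n)  # n-grams outside the vocab are ignored
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for g, p in self.p[c].items():
                # Unlike multinomial NB, the absence of a term also counts.
                s += math.log(p) if g in present else math.log(1 - p)
            scores[c] = s
        return scores

    def prob(self, text, positive):
        """Posterior probability of the positive class (softmax of log scores)."""
        scores = self.log_scores(text)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        return exp[positive] / sum(exp.values())

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def roc_points(scores, labels, positive="happy"):
    """(FPR, TPR) pairs from sweeping a threshold over the scores.

    TPR: fraction of happy songs scored as happy; FPR: fraction of sad
    songs scored as happy. Assumes both classes occur in labels.
    """
    pos = sum(1 for y in labels if y == positive)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == positive)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y != positive)
        points.append((fp / neg, tp / pos))
    return points
```

In a cross-validated setting, `prob(text, "happy")` would be computed for each held-out song in each fold, and `roc_points` applied to those scores gives one ROC curve per fold.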

The accuracy, evaluated via 10-fold cross-validation on the lyrics dataset, is 0.87 ± 0.06. Further results can be found in the technical report.
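A mean ± spread figure like this comes from scoring each of the 10 folds separately. The sketch below shows one way to do that with the standard library; the function names and the `fit_predict` callable interface are hypothetical. The README does not say whether the ± is one or two standard deviations, so this sketch simply reports the population standard deviation of the per-fold accuracies.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size.
    Assumes n >= k so that no fold is empty."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit_predict, X, y, k=10, seed=0):
    """Return (mean, std) of per-fold accuracy.

    fit_predict(train_X, train_y, test_X) is a hypothetical callable that
    trains a classifier and returns predicted labels for test_X.
    """
    accs = []
    for fold in k_fold_indices(len(X), k, seed):
        test = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in test]
        train_y = [t for i, t in enumerate(y) if i not in test]
        preds = fit_predict(train_X, train_y, [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        accs.append(correct / len(fold))
    return statistics.mean(accs), statistics.pstdev(accs)
```

These folds are plain shuffled splits; for classifiers, scikit-learn's `cross_val_score` with an integer `cv` uses stratified folds instead, which would keep the 1,400/1,400 happy/sad balance in every fold.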
