The goal of this project is to build a recommendation system for users who want to listen to happy or sad songs. LyricsMood predicts whether a song is happy or sad based on its lyrics.
- Demo Application
- Data collection (IPython Notebook)
- Model training using binary features (IPython Notebook)
- Model training using TFIDF (IPython Notebook)
- Technical report
- Dataset overview
The happy and sad song details (title and artist) were collected by crawling last.fm. These details were then used to build the URL for each song's lyrics, which were collected by crawling metrolyrics.com. Songs for which no lyrics were available were removed from the dataset, as were non-English songs. The final dataset contains 2,800 songs (1,400 happy and 1,400 sad).
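The filtering step described above can be sketched as follows. This is a minimal illustration, not the project's actual crawler code: the record layout, the `language` field, and the `filter_songs` helper are assumptions made for the example, and the four songs are made-up placeholders.

```python
from collections import Counter

# Hypothetical records; in the project these would come from the crawlers.
songs = [
    {"title": "Song A", "artist": "Artist 1", "mood": "happy",
     "lyrics": "sunshine all day", "language": "en"},
    {"title": "Song B", "artist": "Artist 2", "mood": "sad",
     "lyrics": None, "language": "en"},          # no lyrics found
    {"title": "Song C", "artist": "Artist 3", "mood": "sad",
     "lyrics": "lluvia y lagrimas", "language": "es"},  # not English
    {"title": "Song D", "artist": "Artist 4", "mood": "sad",
     "lyrics": "tears in the rain", "language": "en"},
]


def filter_songs(records):
    """Keep only songs that have lyrics and are tagged as English."""
    return [r for r in records if r["lyrics"] and r["language"] == "en"]


dataset = filter_songs(songs)

# Check the class balance after filtering.
mood_counts = Counter(r["mood"] for r in dataset)
print(mood_counts)
```

How the language of each song was actually determined is not stated here; the `language` field simply stands in for whatever check was used.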
- Exploratory data analysis
Word cloud for happy songs:
Word cloud for sad songs:
- Results
Model performance of the song lyrics classifier. A: Receiver operating characteristic (ROC) curves for the Bernoulli naive Bayes classifier, using TF-IDF feature vectors from a 2-gram sequence model to classify song lyrics by mood. Performance was evaluated via 10-fold cross-validation on the lyrics dataset. The true positive rate is the fraction of songs labeled as happy that were correctly classified, and the false positive rate is the fraction of sad songs that were misclassified as happy. B: Confusion matrix of the classifier on a test dataset.
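To make the two rates concrete, here is a small sketch that treats "happy" as the positive class, as in the description above. The helper names and the toy labels are illustrative assumptions, not output from the actual classifier.

```python
def confusion_counts(y_true, y_pred, positive="happy"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # happy song classified as happy
        elif truth == positive:
            fn += 1  # happy song classified as sad
        elif pred == positive:
            fp += 1  # sad song misclassified as happy
        else:
            tn += 1  # sad song classified as sad
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr


# Toy labels, made up for illustration only.
y_true = ["happy", "happy", "happy", "sad", "sad", "sad", "sad", "happy"]
y_pred = ["happy", "happy", "sad", "sad", "happy", "sad", "sad", "happy"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr, fpr = rates(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn} TPR={tpr:.2f} FPR={fpr:.2f}")
```

The four counts are the cells of a 2x2 confusion matrix. A ROC curve plots the (FPR, TPR) pair obtained at each decision threshold of the classifier.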
The accuracy is 0.87 ± 0.06, evaluated via 10-fold cross-validation on the lyrics dataset. Further results can be found in the technical report.
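A k-fold accuracy estimate of this kind can be sketched in plain Python as shown below. This is not the project's training code: the majority-class baseline stands in for the real classifier, the data are toy placeholders, and reading the reported spread as the standard deviation across folds is an assumption.

```python
from collections import Counter
from statistics import mean, pstdev


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds.

    Real data should be shuffled (ideally stratified by label) first.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(samples, labels, k, fit, predict):
    """Return (mean, standard deviation) of accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores), pstdev(scores)


# Hypothetical stand-in model: always predict the most common training label.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Toy data, made up for illustration only.
lyrics = [f"lyrics {i}" for i in range(10)]
moods = ["happy", "sad"] * 5

acc_mean, acc_std = cross_val_accuracy(lyrics, moods, 5,
                                       fit_majority, predict_majority)
print(f"accuracy: {acc_mean:.2f} +/- {acc_std:.2f}")
```

On this balanced toy set the baseline scores 50% in every fold, which is the chance level a real lyrics classifier should beat.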