This is the group project for COMP 562 (Fall 2019) at UNC. In this project, we help Novelist, a publisher founded by UNC alumni, solve a problem in their work. In brief, they define "tones" for each book to help readers find the books that appeal to them most: https://www.ebscohost.com/promoMaterials/NoveList-Guide-to-Story-Elements.pdf
However, their current workflow relies on human assessors who read each book and assign the labels manually, which is time-consuming. They therefore asked us whether we could develop machine learning algorithms that automatically tag a book using its metadata.
For the report, please refer to the report folder.
For the scripts, please refer to the scripts folder.
For the TeX source files, please refer to the tex folder.
Name | PID | Email |
---|---|---|
Jiaming Qu | 730205251 | jiaming AT ad DOT unc DOT edu |
Ximing Wen | 730347350 | ximing AT live DOT unc DOT edu |
Jiesong He | 730264869 | j DOT he AT unc DOT edu |
Wan Zhang | 730341932 | wanz63 AT live DOT unc DOT edu |
Because we have a data privacy agreement with Novelist, we do not upload the dataset to a public repository. If you are interested in the project or the data, please reach out to the staff at Novelist. Their contact information can be found on their website: https://www.ebscohost.com/novelist/novelist-contact-us
Unigram model with TF-IDF weighting + Logistic Regression
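
As a rough illustration of the pipeline named above, here is a minimal, dependency-free Python sketch: unigram counts are weighted by TF-IDF and fed to a binary logistic regression that decides whether a book carries one tone. It is not the code in the scripts folder. The toy texts, the tone name "suspenseful", the function names, and the hyperparameters are all made up for illustration, and the private dataset is not used. Handling several tones would presumably mean training one such classifier per tone, which is an assumption rather than something stated here.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace to get unigrams."""
    return text.lower().split()

def fit_tfidf(docs):
    """Learn the vocabulary and smoothed IDF weights from training texts."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency per term
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = [0.0] * len(vocab)
    for term, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[term])) + 1.0
    return vocab, idf

def transform(doc, vocab, idf):
    """Return an L2-normalised TF-IDF vector (dense list) for one text."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokenize(doc)).items():
        if term in vocab:  # terms unseen in training are ignored
            vec[vocab[term]] = count * idf[vocab[term]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500, l2=0.001):
    """Binary logistic regression fitted with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # L2 penalty on the weights only
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(x, w, b):
    """Probability that the text carries the tone."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data: label 1 means the hypothetical tone "suspenseful" applies.
train_texts = [
    "a detective hunts a killer through dark city streets",
    "a tense chase as the killer closes in on the witness",
    "a gentle story about a family garden in summer",
    "two friends bake bread and share a quiet summer",
]
train_labels = [1, 1, 0, 0]

vocab, idf = fit_tfidf(train_texts)
X_train = [transform(t, vocab, idf) for t in train_texts]
w, b = train_logreg(X_train, train_labels)

p_dark = predict_proba(transform("the detective hunts the killer", vocab, idf), w, b)
p_calm = predict_proba(transform("friends share a quiet summer garden", vocab, idf), w, b)
print(f"suspenseful probability: {p_dark:.2f} (dark text) vs {p_calm:.2f} (calm text)")
```

The IDF formula used here is the smoothed variant that scikit-learn's `TfidfVectorizer` applies by default. In practice, a library pipeline such as `TfidfVectorizer` followed by `LogisticRegression` would likely replace the hand-written functions above.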
We thank Novelist for providing the data and Dr. Yue Wang from the School of Information and Library Science for supervising the project and offering suggestions.