nlp-machine-learning nlp vietnamese-nlp vietnamese-text-classification text-classification text-classification-python vietnamese vietnamese-language

Vietnamese-News-Classification

The dataset consists of news content and reader comments about smartphones, crawled from VnExpress.

We use LSTM, BiLSTM, BERT, and SVM with TF-IDF, Word2Vec, and bag-of-words features to classify these documents as positive (labeled 1), neutral (labeled 0), or negative (labeled 2).
Accepted at ICSMB 2020 (International Conference for Small and Medium Business 2020).
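
As a rough illustration of the label scheme and of the bag-of-words and TF-IDF features mentioned above, here is a minimal sketch in plain Python. It is not the code from the notebooks: the toy comments, the function names, and the exact TF-IDF variant (tf = count / document length, idf = ln(N / df) + 1) are all assumptions made for this example.

```python
import math
from collections import Counter

# Label scheme stated in this README.
LABELS = {0: "neutral", 1: "positive", 2: "negative"}


def bag_of_words(tokens):
    """Raw term counts for one tokenized document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / doc length, idf = ln(N / df) + 1.
    Libraries may use different smoothing and normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = bag_of_words(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, pre-tokenized toy comments (not taken from the dataset).
toy_docs = [
    ["pin", "tốt", "máy", "đẹp"],
    ["máy", "nóng", "pin", "yếu"],
    ["máy", "bình_thường"],
]
toy_labels = [1, 2, 0]

vectors = tf_idf(toy_docs)
for vec, label in zip(vectors, toy_labels):
    # Show the two highest-weighted terms next to the class name.
    print(LABELS[label], sorted(vec, key=vec.get, reverse=True)[:2])
```

Terms that occur in every document (here "máy") receive the lowest idf, so rarer words dominate the feature vector that a classifier such as an SVM would consume.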

For Word2Vec, we used a pre-trained model retrieved from https://github.com/sonvx/word2vecVN
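
This README does not say how the pre-trained word vectors are turned into classifier input. One common option, shown here purely as an assumption, is to average the vectors of the words in a document. The sketch below uses a tiny hypothetical embedding table in place of the real pre-trained model, and `average_vector` is an illustrative helper, not a function from the repository.

```python
def average_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens; all zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 3-dimensional embeddings standing in for the real model.
toy_embeddings = {
    "pin": [1.0, 0.0, 2.0],
    "tốt": [3.0, 2.0, 0.0],
}

# "quá" is out of vocabulary here and is simply skipped.
doc_vector = average_vector(["pin", "tốt", "quá"], toy_embeddings, dim=3)
print(doc_vector)
```

The resulting fixed-length vector could be fed to an SVM, while sequence models such as LSTM or BiLSTM would normally consume the per-word vectors directly.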

Please feel free to contact us at anhthuan1389@gmail.com if you want to use our data or if you have any questions.

Best regards

Thuan Tran Anh, Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City
Nhat Nguyen Anh, Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City
Thanh Bui Xuan, Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City
An Vo Nguyen Tam, Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City
Su Le Hoanh, Faculty of Information Systems, University of Economics and Law, Vietnam National University Ho Chi Minh City

MIT License

Languages

Language: Jupyter Notebook 100.0%