CRF & Active Learning to Improve Sentiment Identification

Incorporating Conditional Random Fields (CRF) and Active Learning to Improve Sentiment Identification

Many machine learning, statistical, and computational-linguistic methods have been developed to identify the sentiment of sentences in documents, with promising results. However, most current state-of-the-art methods classify each sentence in isolation and ignore the effect of context on its meaning. In this paper, we propose a method based on conditional random fields (CRF) that incorporates sentence structure and context information in addition to syntactic information. We also investigate how human interaction affects the accuracy of sentiment labeling when training data is limited, and we propose and evaluate two active learning strategies for labeling sentiment data.

Publication: Kunpeng Zhang, Yusheng Xie, Yu Cheng, Doug Downey, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Sentiment Identification by Incorporating Syntactics, Semantics and Context Information. 35th ACM SIGIR, 2012.
Publication (under journal review): Incorporating Conditional Random Fields and Active Learning to Improve Sentiment Identification.

Main Challenges

How to take full advantage of sentence structure;
How to use context information to capture the relationships among sentences and to improve document-level sentiment classification;
How to account for the vocabulary of Internet language and for emoticons;
How to incorporate human interaction to improve sentiment identification accuracy and construct a large training dataset.

Why CRF Model

We want to capture the context information among the sentences of a document, for example between neighboring sentences or between sentences connected by transition words. Sentiment identification therefore becomes a sequence labeling problem.
Given the sequence of sentences in a document, the goal of the model is to assign a sentiment label to each sentence. We use a CRF to model this sequence labeling problem.
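To make the sequence labeling view concrete, the following is a minimal sketch of how a linear-chain model can label all sentences of a document jointly. It shows only the decoding step (Viterbi search), not training. The label set, the per-sentence scores, and the transition scores are hypothetical values chosen for illustration; in a trained CRF they would be computed from learned feature weights, and this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for sentence-level sentiment.
LABELS = ["neg", "neu", "pos"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y] is the score of label y for sentence t on its own;
    transitions[(p, y)] is the score of label y following label p.
    Missing entries are treated as 0.0.
    """
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in y at sentence t.
    best = [{y: emissions[0].get(y, 0.0) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS,
                       key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[t - 1][prev]
                         + transitions.get((prev, y), 0.0)
                         + emissions[t].get(y, 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Three sentences of one review; the middle one is ambiguous on its own.
emissions = [
    {"neg": 2.0, "neu": 0.0, "pos": -1.0},  # "The battery life is terrible."
    {"neg": 0.2, "neu": 0.5, "pos": 0.1},   # "It lasts about two hours."
    {"neg": 1.5, "neu": 0.0, "pos": -0.5},  # "I would not buy it again."
]
# Assumed context effect: neighboring sentences tend to share a label.
transitions = {(y, y): 0.8 for y in LABELS}

independent = [max(LABELS, key=lambda y: e[y]) for e in emissions]
joint = viterbi(emissions, transitions)
print(independent)  # ['neg', 'neu', 'neg']
print(joint)        # ['neg', 'neg', 'neg']
```

Labeling each sentence in isolation leaves the middle sentence neutral, while joint decoding lets its negative neighbors pull it to a negative label. This is the kind of context effect the sequence formulation is meant to capture.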