LeoYao / StackExchangeQuestionClassifier

StackExchangeQuestionClassifier

Nowadays, Q&A communities are becoming more and more popular. Almost all of them allow users to tag the topic (category) of their questions. The main purpose of topic tags is to help users find posts on a specific area within an extremely large pool of questions. Despite this benefit, users may still find adding tags to their posts troublesome. One reason is that people don't know which tags are appropriate for their posts; another is that they simply find it tedious to think of and add tags. Our project addresses this problem by building a classifier that assigns appropriate topics to questions based on their text content. Such a classifier could be used as a tag recommendation engine or topic tag generator in real scenarios.

We used about 22,000 questions (with title, excerpt and topic attributes) from 10 different topics as training and testing data. To preprocess the raw data, we used several techniques, including a document-term matrix and word pruning. We then applied Naive Bayes, SVM (Support Vector Machine), Decision Tree, ANN (Artificial Neural Network) and k-NN (k-Nearest Neighbor) algorithms to the data and, using cross-validation, compared their performance (prediction, error rate, recall and F-score). To boost our classifiers' performance, we introduced LSA (Latent Semantic Analysis) and an ensemble method.
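
The pipeline above (document-term matrix, word pruning, a classifier, cross-validation and the error-rate/recall/F-score metrics) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's own R code: it covers only Naive Bayes out of the five algorithms, and it assumes a naive whitespace tokenizer, pruning by a minimum document frequency (`min_df`), multinomial Naive Bayes with add-one (Laplace) smoothing, and folds assigned by `index % k`. All function names and the toy questions are hypothetical.

```python
import math
from collections import Counter


def tokenize(doc):
    # Deliberately naive (assumed) tokenizer: lowercase and split on whitespace.
    return doc.lower().split()


def build_vocab(docs, min_df=2):
    """Word pruning: keep only terms that occur in at least `min_df` documents."""
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return sorted(term for term, n in df.items() if n >= min_df)


def to_row(vocab, doc):
    """One row of the document-term matrix: raw term counts over `vocab`."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def train_nb(rows, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    n_terms = len(rows[0])
    model = {}
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        totals = [sum(r[j] for r in class_rows) for j in range(n_terms)]
        denom = sum(totals) + alpha * n_terms
        model[c] = (
            math.log(len(class_rows) / len(rows)),            # log prior
            [math.log((n + alpha) / denom) for n in totals],  # log P(term | class)
        )
    return model


def predict_nb(model, row):
    """Return the class with the highest log-posterior score."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(n * lp for n, lp in zip(row, log_lik))
    return max(model, key=score)


def metrics(y_true, y_pred, positive):
    """Error rate plus precision, recall and F-score for one topic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    error_rate = sum(t != p for t, p in pairs) / len(pairs)
    return {"error_rate": error_rate, "precision": precision,
            "recall": recall, "f_score": f_score}


def cross_validate(docs, labels, k=2, min_df=1):
    """k-fold CV: fold f holds out every document whose index i has i % k == f."""
    y_true, y_pred = [], []
    for f in range(k):
        train = [i for i in range(len(docs)) if i % k != f]
        test = [i for i in range(len(docs)) if i % k == f]
        # Build the vocabulary from the training fold only, to avoid leakage.
        vocab = build_vocab([docs[i] for i in train], min_df)
        model = train_nb([to_row(vocab, docs[i]) for i in train],
                         [labels[i] for i in train])
        for i in test:
            y_true.append(labels[i])
            y_pred.append(predict_nb(model, to_row(vocab, docs[i])))
    return y_true, y_pred


# Hypothetical toy questions; the project itself used ~22,000 questions over 10 topics.
docs = [
    "how to merge two data frames in r",
    "r data frame column names",
    "python list comprehension syntax",
    "python dictionary iteration order",
]
labels = ["r", "r", "python", "python"]

vocab = build_vocab(docs, min_df=2)  # ['data', 'python', 'r']
model = train_nb([to_row(vocab, d) for d in docs], labels)
print(predict_nb(model, to_row(vocab, "python syntax question")))  # python

y_true, y_pred = cross_validate(docs, labels, k=2)
print(metrics(y_true, y_pred, positive="r"))
# {'error_rate': 0.0, 'precision': 1.0, 'recall': 1.0, 'f_score': 1.0}
```

Pruning with `min_df=2` discards every term that appears in only one toy question, which is the same idea as dropping sparse columns from a document-term matrix: it shrinks the feature space before any classifier sees it. Swapping `train_nb`/`predict_nb` for another model would leave the rest of the loop unchanged.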

Languages

R 98.9%, Python 1.1%