gitabtion / text-classification

text classification in traditional ways

text classification

Text classification using classical machine learning algorithms.

| model | accuracy |
| --- | --- |
| SVC with linear kernel | 0.718 |

download

$ git clone git@github.com:gitabtion/text-classification.git

getting started

$ cd text-classification

$ python3 test.py

directory structure

├── LICENSE
├── README.md
├── data
│   ├── stopwords.txt           # stopwords
│   ├── test_set.txt            # test set
│   ├── test_set_name.txt
│   ├── train_set.txt           # training set
│   └── ver_set.txt             # verification set
├── models
│   ├── __init__.py
│   └── svm.py                  # svm model
├── test.py
└── utils
    ├── __init__.py
    ├── data_helper.py          # preprocessing utilities for raw data files such as test_set.txt above
    └── extract_samples.py      # extracts samples from the ACE data

procedures

extracting samples (optional)

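  1. Extract sentences from the ACE Chinese data set.
  2. Label each sentence with one of the following types:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| not any class | life | movement | transaction | business | conflict | contact | personnel | justice |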
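The label ids listed above can be kept in a small lookup table so that predictions are readable. This is only a sketch: the names `EVENT_TYPES`, `label_name` and `label_id` are hypothetical and are not taken from the repository.

```python
# Label ids and type names, copied from the listing above.
# EVENT_TYPES, label_name and label_id are hypothetical helper names.
EVENT_TYPES = {
    0: "not any class",
    1: "life",
    2: "movement",
    3: "transaction",
    4: "business",
    5: "conflict",
    6: "contact",
    7: "personnel",
    8: "justice",
}

# Reverse mapping, built once, for turning a type name back into its id.
_IDS_BY_NAME = {name: idx for idx, name in EVENT_TYPES.items()}


def label_name(idx: int) -> str:
    """Return the type name for a numeric label id."""
    return EVENT_TYPES[idx]


def label_id(name: str) -> int:
    """Return the numeric label id for a type name."""
    return _IDS_BY_NAME[name]
```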

segment into words (optional)

If you are using a Chinese data set, you have to segment the text with data_helper.py, like:

train_text, train_labels, ver_text, ver_labels, test_text, test_labels = data_helper.get_data_and_labels()

get stopwords

stopwords = data_helper.get_stopwords()
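For readers who want to see what a stopword step typically involves, here is a minimal sketch. It assumes one stopword per line in a UTF-8 file such as data/stopwords.txt; the file format is an assumption, and `load_stopwords` and `remove_stopwords` are hypothetical names, not functions from data_helper.py.

```python
from typing import Iterable, List, Set


def load_stopwords(path: str) -> Set[str]:
    """Read one stopword per line (assumed format), skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only the tokens that are not stopwords, preserving order."""
    return [token for token in tokens if token not in stopwords]
```

A set is used so that each membership check is constant time, which matters when every token of every sentence is checked.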

initialize models

# svm
model = SVM(train_text, train_labels, ver_text, ver_labels, test_text, test_labels, stopwords)

train, verify and test

model.train()

model.verification()

model.test()
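The code in models/svm.py is not reproduced here. As a rough illustration of how a wrapper with the same constructor arguments and the same train / verification / test calls could work, the sketch below uses scikit-learn with TF-IDF features and `SVC(kernel="linear")`. The class name `LinearSvmSketch`, the choice of TF-IDF, and the assumption that texts are already segmented into space-separated tokens are all assumptions, not the repository's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


class LinearSvmSketch:
    """Hypothetical stand-in for the SVM wrapper; not the code in models/svm.py."""

    def __init__(self, train_text, train_labels, ver_text, ver_labels,
                 test_text, test_labels, stopwords):
        self.train_text, self.train_labels = train_text, train_labels
        self.ver_text, self.ver_labels = ver_text, ver_labels
        self.test_text, self.test_labels = test_text, test_labels
        # Assumption: each text is a string of space-separated tokens.
        # This token pattern keeps single-character tokens, which the
        # default pattern would drop.
        vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b",
                                     stop_words=list(stopwords))
        self.pipeline = make_pipeline(vectorizer, SVC(kernel="linear"))
        self.ver_acc = None
        self.test_acc = None

    def train(self):
        # Fit the vectorizer and the classifier on the training set only.
        self.pipeline.fit(self.train_text, self.train_labels)

    def verification(self):
        # Mean accuracy on the verification set.
        self.ver_acc = self.pipeline.score(self.ver_text, self.ver_labels)

    def test(self):
        # Mean accuracy on the held-out test set.
        self.test_acc = self.pipeline.score(self.test_text, self.test_labels)
```

Putting the vectorizer and the classifier in one pipeline means the vocabulary is learned from the training set only, so the verification and test sets do not leak into the features.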

get result

print('verification accuracy: {:.3}'.format(model.ver_acc))
    
print('test accuracy: {:.3}'.format(model.test_acc))
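The accuracy values printed above are presumably the fraction of samples whose predicted label matches the true label. A minimal sketch of that calculation follows; the function name `accuracy` is hypothetical, and how `ver_acc` and `test_acc` are actually computed in the repository is not shown in this README.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Three of the four predictions match, so the accuracy is 0.75.
acc = accuracy([1, 0, 2, 2], [1, 0, 2, 3])
print('test accuracy: {:.3}'.format(acc))
```

Note that `{:.3}` formats to three significant digits rather than three decimal places; `{:.3f}` would always print three decimals.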

License:Apache License 2.0


Languages

Language:Python 100.0%