- Consists of 2225 documents from the BBC news website corresponding to stories in five topical areas from 2004-2005.
- Class Labels: 5 (business, entertainment, politics, sport, tech)
- Multinomial Naive Bayes
- Convert all letters to lowercase
- Remove punctuation
- Tokenize words
- Remove stopwords
- Lemmatize
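The preprocessing steps above can be sketched with the standard library alone. This is a minimal sketch, not the project's code: the stopword set is a hypothetical handful of words, tokenization is plain whitespace splitting after punctuation removal, and lemmatization is a pluggable callable that defaults to the identity function (for example, NLTK's `WordNetLemmatizer().lemmatize` could be passed in).

```python
import string

# Hypothetical, deliberately tiny stopword list; a real setup would use a
# full list such as NLTK's English stopwords corpus.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}


def preprocess(text, lemmatize=lambda w: w):
    """Apply the preprocessing steps in the order listed.

    `lemmatize` is an assumed pluggable function (identity by default).
    Note: string.punctuation covers ASCII punctuation only.
    """
    text = text.lower()                                               # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = text.split()                                             # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                # drop stopwords
    return [lemmatize(t) for t in tokens]                             # lemmatize


print(preprocess("The markets are rallying, and shares rose!"))
# ['markets', 'rallying', 'shares', 'rose']
```

Keeping the lemmatizer as a parameter lets the same function run with or without NLTK's WordNet data installed.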
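For reference, a from-scratch Multinomial Naive Bayes over token lists could look like the sketch below. It predicts the class maximizing log P(c) + Σ log P(w | c), with add-alpha (Laplace) smoothing. The smoothing value `alpha=1.0`, ignoring out-of-vocabulary words, and the class name `SimpleMultinomialNB` are assumptions for illustration; the project may instead use a library implementation such as scikit-learn's `MultinomialNB`.

```python
import math
from collections import Counter


class SimpleMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Per-class word counts pooled over all training documents of that class.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # log P(c): fraction of training documents labelled c.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # log P(w | c) = log((count(w, c) + alpha) / (total(c) + alpha * |V|))
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(w, c) for w in doc if w in self.vocab)  # OOV words ignored
            for c in self.classes
        }
        return max(scores, key=scores.get)


train = [["shares", "rose", "market"], ["goal", "match", "win"], ["market", "profit"], ["match", "team"]]
labels = ["business", "sport", "business", "sport"]
model = SimpleMultinomialNB().fit(train, labels)
print(model.predict(["market", "shares"]))  # business
```

Working in log space avoids underflow when multiplying many small word probabilities over a long news article.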
- Randomly swap characters in each significant word collected from the training corpus; other words are perturbed with a fixed probability.
- Replace some characters with similar ones.
- Drop out some frequent words.
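The three perturbations listed above might be sketched as follows. Several details are assumptions, not taken from the project: a "swap" exchanges two adjacent characters, the set of significant words is supplied by the caller (for example, words with high class-conditional probability), the look-alike table `SIMILAR` is hypothetical, and "frequent" means the top-k tokens by corpus count. A seeded `random.Random` keeps runs reproducible.

```python
import random
from collections import Counter

# Hypothetical look-alike table; the actual substitutions are not specified.
SIMILAR = {"a": "@", "e": "3", "i": "1", "l": "1", "o": "0", "s": "5"}


def swap_adjacent(word, rng):
    """Swap two neighbouring characters at a random position."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def random_swap(tokens, significant, p, rng):
    """Always perturb significant words; perturb every other word with probability p."""
    return [swap_adjacent(t, rng) if t in significant or rng.random() < p else t for t in tokens]


def similar_chars(tokens, p, rng):
    """Replace each character that has a look-alike with probability p."""
    return ["".join(SIMILAR[ch] if ch in SIMILAR and rng.random() < p else ch for ch in t) for t in tokens]


def frequent_words(corpus, k):
    """The k most frequent tokens across a tokenized corpus."""
    return {w for w, _ in Counter(t for doc in corpus for t in doc).most_common(k)}


def drop_frequent(tokens, frequent, p, rng):
    """Drop each occurrence of a frequent word with probability p."""
    return [t for t in tokens if not (t in frequent and rng.random() < p)]


rng = random.Random(42)
doc = ["shares", "rose", "market", "profit", "market"]
print(random_swap(doc, significant={"market"}, p=0.1, rng=rng))
print(similar_chars(doc, p=0.3, rng=rng))
print(drop_frequent(doc, frequent_words([doc], k=1), p=0.5, rng=rng))
```

Because Naive Bayes looks words up exactly, even a single swapped or substituted character turns a known token into an out-of-vocabulary one, which removes its evidence from the prediction.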
Run `python3 controller.py`.
Input | Accuracy | F1 |
---|---|---|
News report | 0.964 | 0.963 |
News report with perturbation | 0.424 | 0.396 |
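The table does not state how F1 is averaged across the five classes. Assuming macro-averaging (the unweighted mean of per-class F1), the two reported metrics could be computed as in this sketch, where per-class F1 = 2TP / (2TP + FP + FN), which equals 2PR / (P + R).

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 taken as 0 when undefined
    return sum(scores) / len(scores)


y_true = ["sport", "sport", "tech", "business"]
y_pred = ["sport", "tech", "tech", "business"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # (1 + 2/3 + 2/3) / 3 = 7/9
```

Macro-averaging weights every class equally, so a perturbation that hurts one topic badly lowers F1 more than it lowers accuracy.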