nomoreoneday / cnews_Suzi

Classifies news articles according to the content of the report

Cnews Chinese text classification

Introduction:

The dataset was filtered and generated by Tsinghua University from historical data of the Sina News RSS subscription channels, covering 2005 to 2011.

  • Training set: 50,000
  • Validation set: 5,000
  • Test set: 10,000
  • Vocabulary (Word): 5,000
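A fixed vocabulary of 5,000 entries implies that each article is turned into a sequence of integer ids before it reaches the model. The README does not show this step, so the following is only a minimal sketch under stated assumptions: the special tokens `<PAD>` and `<UNK>`, the helper names `build_vocab` and `encode`, per-character tokenization, and the `max_len` value are all hypothetical. Left-padding is used here so that the last time step of each sequence holds a real token rather than padding.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens, not taken from the repository


def build_vocab(token_lists, vocab_size=5000):
    """Keep the most frequent tokens so the vocabulary has at most vocab_size entries."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    most_common = [tok for tok, _ in counts.most_common(vocab_size - 2)]
    return {tok: idx for idx, tok in enumerate([PAD, UNK] + most_common)}


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and left-pad with the PAD id."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return [vocab[PAD]] * (max_len - len(ids)) + ids


# Toy corpus, tokenized per character (an assumption made for this example only)
docs = [list("体育新闻"), list("财经新闻报道")]
vocab = build_vocab(docs)
encoded = [encode(doc, vocab, max_len=8) for doc in docs]
```

Every encoded sequence has the same length, so a batch can be stacked into a single integer tensor and passed to an embedding layer.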

There are 10 categories: 'Sports', 'Finance', 'Real Estate', 'Home', 'Education', 'Technology', 'Fashion', 'Times', 'Games', 'Entertainment'
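A classifier with 10 outputs needs each category name mapped to an integer class id. The sketch below shows one way to do that; the index order simply follows the order in which the categories are listed and is an assumption, since the repository's actual label order is not stated.

```python
# Category names as listed in this README; the index order is an assumption
CATEGORIES = ['Sports', 'Finance', 'Real Estate', 'Home', 'Education',
              'Technology', 'Fashion', 'Times', 'Games', 'Entertainment']

# Forward mapping for encoding labels, inverse mapping for decoding predictions
label_to_id = {name: idx for idx, name in enumerate(CATEGORIES)}
id_to_label = {idx: name for name, idx in label_to_id.items()}
```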

Method: TextRNN

  • Model detail:
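import torch.nn as nn
import torch.nn.functional as F

class TextRNN(nn.Module):
    def __init__(self):
        super(TextRNN, self).__init__()
        # Word embedding: 5000-entry vocabulary, 64-dimensional vectors
        self.embedding = nn.Embedding(5000, 64)
        # dropout is omitted: nn.GRU ignores it (with a warning) when num_layers=1
        self.rnn = nn.GRU(input_size=64, hidden_size=128, bidirectional=True, batch_first=True)
        # The bidirectional GRU outputs 2 * 128 = 256 features per time step
        self.fc = nn.Sequential(nn.Linear(256, 10),
                                # nn.Dropout(0.8),
                                nn.Softmax(dim=1))

    def forward(self, x):
        x = self.embedding(x)
        x, _ = self.rnn(x)
        # Apply dropout only in training mode; F.dropout defaults to training=True
        x = F.dropout(x, p=0.8, training=self.training)
        # Classify from the output at the last time step
        x = self.fc(x[:, -1, :])
        return x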
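To see why the linear layer takes 256 input features, it can help to trace tensor shapes through the same layer sizes. The following is a minimal sketch that rebuilds the layers standalone with the dimensions from the model above; the batch size and sequence length are hypothetical values chosen only for illustration.

```python
import torch
import torch.nn as nn

batch_size, seq_len = 4, 20  # hypothetical sizes, not from the repository

# Same layer dimensions as the model definition
embedding = nn.Embedding(5000, 64)
rnn = nn.GRU(input_size=64, hidden_size=128, bidirectional=True, batch_first=True)
fc = nn.Linear(256, 10)

token_ids = torch.randint(0, 5000, (batch_size, seq_len))  # random ids stand in for real text
embedded = embedding(token_ids)          # (batch, seq_len, 64)
outputs, hidden = rnn(embedded)          # outputs: (batch, seq_len, 2 * 128)
last_step = outputs[:, -1, :]            # (batch, 256), forward and backward halves concatenated
logits = fc(last_step)                   # (batch, 10), one score per category
probs = torch.softmax(logits, dim=1)     # each row sums to 1
```

Note that at the last time step the backward direction has processed only one token, so a common alternative is to concatenate the two final hidden states from `hidden` instead; that is a design option, not something the repository is known to do.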

Model Accuracy:

  • Training accuracy: 0.97
  • Validation accuracy: 0.89
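The README does not say how these figures were computed. Assuming the usual definition (the fraction of samples whose highest-probability class equals the true label), accuracy could be calculated roughly as follows; the function name `accuracy` and the toy data are illustrative only.

```python
def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class matches the true label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Toy example with 3 classes and 4 samples; the last sample is misclassified
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]
score = accuracy(probs, labels)  # 3 of 4 correct
```

The same function applies to both the training and validation splits, which makes the two numbers directly comparable.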
