The dataset was filtered and generated by Tsinghua University from historical data of the Sina News RSS subscription channels, covering 2005 to 2011.
- Training set: 50000
- Validation set: 5000
- Test set: 10000
- Vocabulary size (words): 5000
There are 10 categories: 'Sports', 'Finance', 'Real Estate', 'Home', 'Education', 'Technology', 'Fashion', 'Times', 'Games', 'Entertainment'
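
For training, the ten category names are usually mapped to integer class indices. The sketch below is one way to do that; it assumes the indices follow the order in which the categories are listed here (the actual order used is not stated), and `CATEGORIES`, `label_to_id`, `id_to_label`, and `SPLIT_SIZES` are illustrative names only.

```python
# Category names as listed in this README; the index order is an assumption.
CATEGORIES = ['Sports', 'Finance', 'Real Estate', 'Home', 'Education',
              'Technology', 'Fashion', 'Times', 'Games', 'Entertainment']

# Hypothetical lookup tables between category names and class indices.
label_to_id = {name: idx for idx, name in enumerate(CATEGORIES)}
id_to_label = {idx: name for name, idx in label_to_id.items()}

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {'train': 50000, 'validation': 5000, 'test': 10000}
total_samples = sum(SPLIT_SIZES.values())

print(len(CATEGORIES))            # number of classes
print(label_to_id['Technology'])  # class index under the assumed order
print(total_samples)              # combined size of the three splits
```

The number of classes here matches the output size of the final linear layer in the model.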
- Model details:
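import torch.nn as nn
import torch.nn.functional as F


class TextRNN(nn.Module):
    def __init__(self):
        super(TextRNN, self).__init__()
        # Word embedding: vocabulary of 5000 tokens, 64-dimensional vectors
        self.embedding = nn.Embedding(5000, 64)
        # Single-layer bidirectional GRU. The original dropout=0.5 argument is
        # omitted: nn.GRU only applies dropout between stacked layers, so it
        # has no effect (and triggers a warning) when num_layers=1.
        self.rnn = nn.GRU(input_size=64, hidden_size=128, bidirectional=True, batch_first=True)
        # 256 = 2 directions * 128 hidden units; 10 output categories
        self.fc = nn.Sequential(nn.Linear(256, 10),
                                # nn.Dropout(0.8),
                                nn.Softmax(dim=1))

    def forward(self, x):
        x = self.embedding(x)     # (batch, seq_len) -> (batch, seq_len, 64)
        x, _ = self.rnn(x)        # -> (batch, seq_len, 256)
        # F.dropout defaults to training=True, so pass the module's mode
        # explicitly to disable dropout during evaluation.
        x = F.dropout(x, p=0.8, training=self.training)
        x = self.fc(x[:, -1, :])  # classify from the last time step
        return x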
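
As a rough sanity check on model size, the layer dimensions above (a 5000 x 64 embedding, one bidirectional GRU layer with hidden size 128, and a 256-to-10 linear layer) imply a parameter count that can be worked out by hand. The sketch below assumes PyTorch's standard GRU parameterization (three gates, each with input weights, hidden weights, and two bias vectors per direction) and a single GRU layer; the helper names are illustrative and not part of the original code.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def gru_params(input_size, hidden_size, bidirectional=True):
    # Per direction: 3 gates, each with input-to-hidden weights,
    # hidden-to-hidden weights, and two bias vectors (bias_ih, bias_hh).
    per_direction = (3 * hidden_size * (input_size + hidden_size)
                     + 2 * 3 * hidden_size)
    return per_direction * (2 if bidirectional else 1)


def linear_params(in_features, out_features):
    # Weight matrix plus one bias per output unit.
    return in_features * out_features + out_features


counts = {
    'embedding': embedding_params(5000, 64),
    'gru': gru_params(64, 128, bidirectional=True),
    'fc': linear_params(256, 10),
}
total_params = sum(counts.values())

for name, value in counts.items():
    print(f'{name}: {value}')
print(f'total: {total_params}')
```

Under these assumptions the embedding table accounts for roughly two thirds of all parameters, so the vocabulary size and embedding dimension dominate the model's footprint.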
- Training accuracy: 0.97
- Validation accuracy: 0.89
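
The evaluation code is not shown, so the following is only a guess at how these figures were computed: accuracy is presumably the fraction of samples whose highest-probability class matches the true label. A minimal pure-Python sketch of that metric, where `accuracy` and the toy inputs are hypothetical and not drawn from the dataset:

```python
def accuracy(probabilities, labels):
    """Fraction of rows whose arg-max class index equals the true label."""
    if len(probabilities) != len(labels):
        raise ValueError('probabilities and labels must have the same length')
    if not labels:
        raise ValueError('at least one sample is required')
    correct = 0
    for row, label in zip(probabilities, labels):
        # Index of the largest probability in this row.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy example with 3 classes and 4 samples (made-up values).
toy_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1],
             [0.3, 0.3, 0.4],
             [0.6, 0.3, 0.1]]
toy_labels = [0, 1, 2, 1]
toy_acc = accuracy(toy_probs, toy_labels)
print(toy_acc)
```

The gap between the training and validation figures is the usual signal to watch for overfitting when tuning dropout or model size.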