In this project, we mainly fine-tune the mDeBERTa-v3-base model. In addition, we also tried the In-trust loss, FGM, and ChildTuning in our experiments.
Because the samples generated by some models are very similar to, or even identical to, human-written samples, they act as noisy labels that interfere with the mDeBERTa model.
We therefore adopt the Incomplete-trust (In-trust) loss from the ACL 2021 paper "Named Entity Recognition via Noise Aware Training Mechanism with Data Filter", which prevents mDeBERTa from being disturbed by this noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class In_trust_Loss(nn.Module):
    # Incomplete-trust (In-trust) loss: a weighted combination of the standard
    # cross-entropy and a noise-robust DCE term that only partially trusts the labels.
    def __init__(self, alpha=1, beta=0.8, delta=0.5, num_classes=14):
        super().__init__()
        self.alpha = alpha              # weight of the cross-entropy term
        self.beta = beta                # weight of the DCE term
        self.delta = delta              # trust factor mixing predictions and labels
        self.num_classes = num_classes
        self.cross_entropy = torch.nn.CrossEntropyLoss()

    def forward(self, logits, labels):
        # Infer the number of classes from the logits at run time
        self.num_classes = logits.size(1)
        ce = self.cross_entropy(logits, labels)

        # DCE term: cross-entropy against a convex mixture of the prediction
        # and the (possibly noisy) one-hot label
        active_logits = logits.view(-1, self.num_classes)
        active_labels = labels.view(-1)
        pred = F.softmax(active_logits, dim=1)
        pred = torch.clamp(pred, min=1e-7, max=1.0)
        label_one_hot = F.one_hot(active_labels, self.num_classes).float()
        label_one_hot = torch.clamp(label_one_hot, min=1e-4, max=1.0)
        dce = -1 * torch.sum(pred * torch.log(pred * self.delta + label_one_hot * (1 - self.delta)), dim=1)

        # Combine the two terms
        loss = self.alpha * ce - self.beta * dce.mean()
        return loss
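A minimal usage sketch with dummy tensors (the batch size and number of classes below are illustrative assumptions, not values from this repository):

criterion = In_trust_Loss(alpha=1, beta=0.8, delta=0.5)
logits = torch.randn(8, 2, requires_grad=True)   # model outputs for a batch of 8, 2 classes (assumed)
labels = torch.randint(0, 2, (8,))               # gold labels
loss = criterion(logits, labels)
loss.backward()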
FGM (Fast Gradient Method) adversarial training can improve the robustness of the model by adding adversarial perturbations to the word embeddings.
(Experiments show that the improvement is small.)
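Below is a minimal FGM sketch, not necessarily the exact implementation in mdeberta_fgm.py; the epsilon value and the embedding parameter name word_embeddings are assumptions.

import torch

class FGM:
    # Fast Gradient Method: perturb the embedding weights along the gradient
    # direction, compute an adversarial loss, then restore the original weights.
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon      # perturbation size (assumed value)
        self.emb_name = emb_name    # substring identifying embedding parameters (assumed name)
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical training-loop usage (sketch):
#   loss = compute_loss(batch); loss.backward()
#   fgm.attack(); compute_loss(batch).backward(); fgm.restore()
#   optimizer.step(); optimizer.zero_grad()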
ChildTuning is a regularization method that updates only a subset of the parameters and freezes the rest, so the model is less likely to forget the knowledge learned in the pre-training stage.
(Experiments show that the improvement is small.)
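A minimal sketch of the task-free variant (ChildTuning-F), which randomly masks gradients before each optimizer step; the keep probability reserve_p is an assumed value, and the actual implementation in this repository may differ (for example, the task-driven variant built into a custom optimizer).

import torch

def child_tuning_f_step(model, optimizer, reserve_p=0.3):
    # ChildTuning-F style update: keep roughly a fraction reserve_p of each gradient
    # (rescaled by 1/reserve_p to stay unbiased), zero out the rest, then step.
    # Call this after loss.backward() instead of a plain optimizer.step().
    for param in model.parameters():
        if param.grad is not None:
            mask = torch.bernoulli(torch.full_like(param.grad, reserve_p))
            param.grad.mul_(mask / reserve_p)
    optimizer.step()
    optimizer.zero_grad()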
For the final result, we add up the logits of models trained with different random seeds and different strategies, and take the class with the maximum summed logit as the final prediction.
The mDeBERTa-v3-base model is used by default.
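A minimal sketch of this logits-level fusion, assuming each model's logits over the evaluation set have already been collected into tensors of shape (num_samples, num_classes); the variable names are illustrative.

import torch

def fuse_logits(logits_list):
    # Sum the logits from several models and take the argmax as the final label.
    summed = torch.stack(logits_list, dim=0).sum(dim=0)   # (num_samples, num_classes)
    return summed.argmax(dim=1)

# Example: final_preds = fuse_logits([logits_seed1, logits_seed2, logits_fgm])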
Please put the data into /code/data in advance.
The trained models and experiment logs will be saved in the /code/paperlog folder.
We trained the models on an RTX 3090 GPU with 24 GB of memory.
First, run read.py to read the data in /code/data and split it into a training set and a validation set:
cd code
python read.py
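The exact logic of read.py is specific to this repository; the sketch below only illustrates a typical stratified train/valid split, and the file name, file format, and column names are assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed input file and columns; the real read.py may use different names and formats.
df = pd.read_csv("data/train.csv")
train_df, valid_df = train_test_split(df, test_size=0.1, random_state=42, stratify=df["label"])
train_df.to_csv("data/train_split.csv", index=False)
valid_df.to_csv("data/valid_split.csv", index=False)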
Then train the models with different strategies; each command below is a separate training run (with a different random seed), and the best validation score of each run is listed after it:
python mdeberta_baseline.py
Highest score on the valid set: 0.6206
python mdeberta_baseline.py
Highest score on the valid set: 0.6142
python mdeberta_fgm.py
Highest score on the valid set: 0.6189
python mdeberta_fgm.py
Highest score on the valid set: 0.6168
python mdeberta_fgm.py
Highest score on the valid set: 0.6221
Finally, run test.py to fuse the trained models:
python test.py
Through model fusion, the highest score of 64.5 can be achieved on the test set.
@article{liartificial,
title={Artificial Text Detection with Multiple Training Strategies},
author={Li, Bin and Weng, Yixuan and Song, Qiya and Deng, Hanjun}
}