In this project, we mainly fine-tune the mDeBERTa-v3-base model. In addition, we also tried the In-trust loss, FGM, and ChildTuning in our experiments.
Because the samples generated by some models are very similar to, or even identical to, human-written samples, they act as noisy labels that interfere with the mDeBERTa model.
We therefore adopt the Incomplete-trust (In-trust) loss from the ACL 2021 paper "Named Entity Recognition via Noise Aware Training Mechanism with Data Filter", which prevents mDeBERTa from being disturbed by this noise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class In_trust_Loss(nn.Module):
    # Incomplete-trust (In-trust) loss: a weighted combination of the standard
    # cross-entropy and a noise-robust DCE term that only partially trusts the labels.
    def __init__(self, alpha=1, beta=0.8, delta=0.5, num_classes=14):
        super().__init__()
        self.alpha = alpha              # weight of the cross-entropy term
        self.beta = beta                # weight of the DCE term
        self.delta = delta              # trust factor mixing predictions and labels
        self.num_classes = num_classes
        self.cross_entropy = torch.nn.CrossEntropyLoss()

    def forward(self, logits, labels):
        # Infer the number of classes from the logits at run time
        self.num_classes = logits.size(1)
        ce = self.cross_entropy(logits, labels)

        # DCE term: cross-entropy against a convex mixture of the prediction
        # and the (possibly noisy) one-hot label
        active_logits = logits.view(-1, self.num_classes)
        active_labels = labels.view(-1)
        pred = F.softmax(active_logits, dim=1)
        pred = torch.clamp(pred, min=1e-7, max=1.0)
        label_one_hot = F.one_hot(active_labels, self.num_classes).float()
        label_one_hot = torch.clamp(label_one_hot, min=1e-4, max=1.0)
        dce = -1 * torch.sum(pred * torch.log(pred * self.delta + label_one_hot * (1 - self.delta)), dim=1)

        # Combine the two terms
        loss = self.alpha * ce - self.beta * dce.mean()
        return loss
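A minimal usage sketch with dummy tensors (the batch size and number of classes below are illustrative assumptions, not values from this repository):

criterion = In_trust_Loss(alpha=1, beta=0.8, delta=0.5)
logits = torch.randn(8, 2, requires_grad=True)   # model outputs for a batch of 8, 2 classes (assumed)
labels = torch.randint(0, 2, (8,))               # gold labels
loss = criterion(logits, labels)
loss.backward()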
FGM (Fast Gradient Method) adversarial training can improve the robustness of the model by adding adversarial perturbations to the word embeddings.
(Experiments show that the improvement is small.)
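Below is a minimal FGM sketch, not necessarily the exact implementation in mdeberta_fgm.py; the epsilon value and the embedding parameter name word_embeddings are assumptions.

import torch

class FGM:
    # Fast Gradient Method: perturb the embedding weights along the gradient
    # direction, compute an adversarial loss, then restore the original weights.
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon      # perturbation size (assumed value)
        self.emb_name = emb_name    # substring identifying embedding parameters (assumed name)
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical training-loop usage (sketch):
#   loss = compute_loss(batch); loss.backward()
#   fgm.attack(); compute_loss(batch).backward(); fgm.restore()
#   optimizer.step(); optimizer.zero_grad()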
ChildTuning is a regularization method that updates only a subset of the parameters and freezes the rest, so the model is less likely to forget the knowledge learned in the pre-training stage.
(Experiments show that the improvement is small.)
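A minimal sketch of the task-free variant (ChildTuning-F), which randomly masks gradients before each optimizer step; the keep probability reserve_p is an assumed value, and the actual implementation in this repository may differ (for example, the task-driven variant built into a custom optimizer).

import torch

def child_tuning_f_step(model, optimizer, reserve_p=0.3):
    # ChildTuning-F style update: keep roughly a fraction reserve_p of each gradient
    # (rescaled by 1/reserve_p to stay unbiased), zero out the rest, then step.
    # Call this after loss.backward() instead of a plain optimizer.step().
    for param in model.parameters():
        if param.grad is not None:
            mask = torch.bernoulli(torch.full_like(param.grad, reserve_p))
            param.grad.mul_(mask / reserve_p)
    optimizer.step()
    optimizer.zero_grad()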
For the final result, we add up the logits of models trained with different random seeds and different strategies, and take the class with the maximum summed logit as the final prediction.
The mDeBERTa-v3-base model is used by default.
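A minimal sketch of this logits-level fusion, assuming each model's logits over the evaluation set have already been collected into tensors of shape (num_samples, num_classes); the variable names are illustrative.

import torch

def fuse_logits(logits_list):
    # Sum the logits from several models and take the argmax as the final label.
    summed = torch.stack(logits_list, dim=0).sum(dim=0)   # (num_samples, num_classes)
    return summed.argmax(dim=1)

# Example: final_preds = fuse_logits([logits_seed1, logits_seed2, logits_fgm])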
Please put the data into /code/data in advance.
The trained models and experiment logs will be saved in the /code/paperlog folder.
We trained the models on an RTX 3090 GPU with 24 GB of memory.
First, run read.py to read the data in /code/data and split it into a training set and a validation set:
cd code
python read.py
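The exact logic of read.py is specific to this repository; the sketch below only illustrates a typical stratified train/valid split, and the file name, file format, and column names are assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed input file and columns; the real read.py may use different names and formats.
df = pd.read_csv("data/train.csv")
train_df, valid_df = train_test_split(df, test_size=0.1, random_state=42, stratify=df["label"])
train_df.to_csv("data/train_split.csv", index=False)
valid_df.to_csv("data/valid_split.csv", index=False)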
Then train the models with different strategies; each command below is a separate training run (with a different random seed), and the best validation score of each run is listed after it:
python mdeberta_baseline.py
Highest score on the valid set: 0.6206
python mdeberta_baseline.py
Highest score on the valid set: 0.6142
python mdeberta_fgm.py
Highest score on the valid set: 0.6189
python mdeberta_fgm.py
Highest score on the valid set: 0.6168
python mdeberta_fgm.py
Highest score on the valid set: 0.6221
Finally, run test.py to fuse the trained models:
python test.py
Through model fusion, the highest score of 64.5 can be achieved on the test set.
@article{liartificial,
title={Artificial Text Detection with Multiple Training Strategies},
author={Li, Bin and Weng, Yixuan and Song, Qiya and Deng, Hanjun}
}