dropreg/R-Drop Issues

Clarification on Using Concatenated Input for R-Drop Training
Closed 6 months ago, 1 comment
Training configuration for the WMT14 EnDe dataset?
Closed 2 years ago, 5 comments
How to use data parallel in R-Drop?
Closed 2 years ago, 1 comment
Question about the proof
Closed 2 years ago, 1 comment
Some questions about reproducing GLUE
Updated 2 years ago
Cannot reproduce results following the hyperparameters in the paper for fine-tuning ViT on CIFAR-100
Closed 2 years ago, 4 comments
How do the `warmup steps` affect performance?
Closed 2 years ago, 2 comments
Is the KL loss in the ViT example supposed to be divided by 2?
Closed 2 years ago, 1 comment
Can I use R-Drop in Semantic Search?
Closed 2 years ago, 1 comment
Unable to preprocess data for summarization
Closed 2 years ago, 2 comments
error: argument --task: invalid choice: 'rdrop_translation'
Closed 2 years ago, 1 comment
pip install --editable . reports an error
Closed 2 years ago, 1 comment
Cannot reproduce the results following the hyperparameters in the paper
Closed 2 years ago, 1 comment
About the implementation in transformers: why does ce_loss use mean reduction (the default) while KL uses sum reduction?
Closed 2 years ago, 1 comment
Inconsistency in KL loss and CE loss hyper-parameters and baseline results on GLUE
Closed 3 years ago, 5 comments
JS divergence in the research paper?
Closed 2 years ago, 1 comment
R-Drop breaks my model.
Closed 3 years ago, 9 comments
Where is R-Drop code in R-Drop/huggingface_transformer_src/bert_rdrop/run_glue.py?
Closed 3 years ago, 6 comments
Difference between R-Drop and SimCSE + SMART
Closed 3 years ago, 1 comment
unable to reproduce results on GLUE
Closed 3 years ago, 2 comments
Can MSE loss replace KL divergence?
Closed 3 years ago, 1 comment
What's Wrong with my TensorFlow (1.14 or 1.15) implementation?
Closed 3 years ago, 2 comments
Will the KLD loss decrease very fast?
Closed 3 years ago, 9 comments
Readme file for RoBERTa example.
Closed 3 years ago, 1 comment
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
Closed 3 years ago, 2 comments
Summarization task fails with 'Trying to backward through the graph a second time'
Closed 3 years ago, 2 comments
A simple way to double the impact of R-Drop
Closed 3 years ago, 3 comments
What should the dropout be set to when we predict or test?
Closed 3 years ago, 2 comments
Fairseq tasks install work?
Closed 3 years ago, 1 comment
Why do you use (p, q_tec) and (q, p_tec) rather than (p, q) and (q, p) to compute the KL loss?
Closed 3 years ago
What are the core code lines of R-Drop? Thank you very much.
Closed 3 years ago, 1 comment
What are the core code lines of R-Drop? Thank you very much.
Closed 3 years ago