Stargazers:
864
Watchers:
5
Issues:
32
Forks:
107
dropreg/R-Drop Issues
Clarification on Using Concatenated Input for R-Drop Training
Closed
6 months ago
Comments count
1
Training configuration for the WMT14 EnDe dataset?
Closed
2 years ago
Comments count
5
How to use data parallel in R-Drop?
Closed
2 years ago
Comments count
1
Question of the proof
Closed
2 years ago
Comments count
1
Some questions about reproducing GLUE
Updated
2 years ago
Cannot reproduce results following the hyperparameters in the paper for fine-tuning ViT on CIFAR-100
Closed
2 years ago
Comments count
4
How do the `warmup steps` affect performance?
Closed
2 years ago
Comments count
2
Is the KL loss in the ViT example supposed to be divided by 2?
Closed
2 years ago
Comments count
1
Can I use R-Drop in Semantic Search?
Closed
2 years ago
Comments count
1
Unable to preprocess data for summarization
Closed
2 years ago
Comments count
2
error: argument --task: invalid choice: 'rdrop_translation'
Closed
2 years ago
Comments count
1
pip install --editable . raises an error
Closed
2 years ago
Comments count
1
Cannot reproduce the results following the hyperparameters in the paper
Closed
2 years ago
Comments count
1
In the transformers implementation, why does ce_loss use mean reduction (the default) while the KL loss uses sum reduction?
Closed
2 years ago
Comments count
1
Inconsistency in KL loss and CE loss hyper-parameters and baseline results on GLUE
Closed
3 years ago
Comments count
5
JS divergence in the research paper?
Closed
2 years ago
Comments count
1
R-Drop breaks my model.
Closed
3 years ago
Comments count
9
Where is R-Drop code in R-Drop/huggingface_transformer_src/bert_rdrop/run_glue.py?
Closed
3 years ago
Comments count
6
Difference between R-Drop and SimCSE + SMART
Closed
3 years ago
Comments count
1
unable to reproduce results on GLUE
Closed
3 years ago
Comments count
2
Can mseloss replace KL divergence?
Closed
3 years ago
Comments count
1
What's Wrong with my TensorFlow (1.14 or 1.15) implementation?
Closed
3 years ago
Comments count
2
Will the KLD loss decrease very fast?
Closed
3 years ago
Comments count
9
README file for the RoBERTa example.
Closed
3 years ago
Comments count
1
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED
Closed
3 years ago
Comments count
2
Summarization task fails with 'Trying to backward through the graph a second time'
Closed
3 years ago
Comments count
2
A simple way to double the impact of R-Drop
Closed
3 years ago
Comments count
3
What should the dropout be set to when we predict or test?
Closed
3 years ago
Comments count
2
Does the Fairseq tasks install work?
Closed
3 years ago
Comments count
1
Why do you use (p, q_tec) and (q, p_tec) rather than (p, q) and (q, p) to compute kl-loss?
Closed
3 years ago
What are the core code lines of R-Drop? Thank you very much.
Closed
3 years ago
Comments count
1
What are the core code lines of R-Drop? Thank you very much.
Closed
3 years ago