Dropout
riyajatar37003 opened this issue · comments
Hi,
Where exactly is dropout being applied? Can anyone point me to the code/file?
Thanks
Hi,
We reuse the dropout layers implemented in Huggingface's transformers, which are applied to the attention probabilities and to the hidden states of each transformer layer. See modeling_bert.py and modeling_roberta.py in the transformers source code for details.
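For reference, those two application points correspond to the `attention_probs_dropout_prob` and `hidden_dropout_prob` fields of `BertConfig` (both default to 0.1). Here is a minimal NumPy sketch of the inverted-dropout scheme at those two points; the shapes and the `dropout` helper are illustrative, not the actual transformers implementation:

```python
import numpy as np

# BertConfig defaults (both 0.1)
ATTN_DROPOUT_PROB = 0.1
HIDDEN_DROPOUT_PROB = 0.1

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero entries with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
attn_probs = rng.random((2, 12, 16, 16))    # post-softmax attention weights
hidden = rng.standard_normal((2, 16, 768))  # layer output before the residual add

# Application point 1: attention probabilities (cf. BertSelfAttention)
attn_probs = dropout(attn_probs, ATTN_DROPOUT_PROB, rng)
# Application point 2: hidden states after a dense projection (cf. BertSelfOutput / BertOutput)
hidden = dropout(hidden, HIDDEN_DROPOUT_PROB, rng)

print(float((hidden == 0).mean()))  # fraction of zeroed entries, close to 0.1
```

At inference time (`training=False`) both calls are no-ops, so no rescaling is needed at eval, which is why the surviving activations are scaled by `1/(1-p)` during training.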
Thanks, got it.