Use k-nucleus sampling or beam search when training on monolingual data?
Jeevesh8 opened this issue
Jeevesh Juneja commented
We use k-nucleus sampling, which is differentiable. Should we also use this differentiable sampling when training on monolingual data, or keep using only the currently implemented beam search?
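To make the question concrete, here is a rough sketch of what I mean by differentiable sampling. It is not the code in this repo: it assumes a Gumbel-softmax relaxation restricted to the nucleus (top-p) set, and the names `nucleus_mask`, `relaxed_nucleus_sample`, `top_p`, and `tau` are made up for illustration. Pure Python only shows the forward pass; in an autodiff framework the soft output vector is what would let gradients flow back to the logits, which the discrete choices in beam search do not allow.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_mask(probs, top_p):
    """0/1 mask keeping the smallest set of top tokens with mass >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    mask = [0] * len(probs)
    cumulative = 0.0
    for i in order:
        mask[i] = 1
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return mask


def relaxed_nucleus_sample(logits, top_p=0.9, tau=0.5, rng=random):
    """Hypothetical helper: soft one-hot sample restricted to the nucleus.

    Assumption: Gumbel-softmax relaxation. The mask itself is a hard
    (non-differentiable) selection; only the soft weights inside the
    nucleus would carry gradients in an autodiff framework.
    """
    probs = softmax(logits)
    mask = nucleus_mask(probs, top_p)
    kept = [i for i, m in enumerate(mask) if m]

    noisy = []
    for i in kept:
        # Clamp u away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logits[i] + gumbel) / tau)

    soft = softmax(noisy)
    out = [0.0] * len(logits)
    for i, s in zip(kept, soft):
        out[i] = s
    return out


if __name__ == "__main__":
    sample = relaxed_nucleus_sample([2.0, 1.0, 0.1, -3.0], top_p=0.9, tau=0.5)
    print(sample)  # soft weights over the vocabulary, zero outside the nucleus
```

With a small `tau` the output approaches a one-hot vector, so it behaves more like an ordinary sample; with a larger `tau` it is smoother. If the repo's sampling works differently, the question still stands: should the monolingual path use the relaxed output or stick with beam search?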