Edit decoding to force-sample a timestamp after every speaker turn
akashmjn opened this issue
Akash Mahajan commented
This would likely be a small patch to the logit filtering applied during decoding.
Doing so makes transcripts more readable, and the timestamp at each turn boundary sets things up for downstream global diarization (clustering).
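A minimal sketch of what such a filter could look like, in pure Python so it runs on its own. It is modeled on the `apply(logits, tokens)` shape that Whisper-style decoders use for logit filters. That is an assumption about the decoder in question. The class name, `speaker_turn_token`, and `timestamp_begin` are hypothetical, and the sketch assumes timestamp tokens occupy every id at or above `timestamp_begin`. Scores are set to -inf before sampling, so after a speaker turn neither greedy nor temperature sampling can choose anything but a timestamp.

```python
import math
from typing import List


class ForceTimestampAfterSpeakerTurn:
    """Hypothetical logit filter: right after a speaker-turn token, only timestamps may be sampled.

    Assumption: timestamp tokens occupy every vocabulary id >= timestamp_begin.
    """

    def __init__(self, speaker_turn_token: int, timestamp_begin: int):
        self.speaker_turn_token = speaker_turn_token
        self.timestamp_begin = timestamp_begin

    def apply(self, logits: List[List[float]], tokens: List[List[int]]) -> None:
        # logits: one row of vocabulary scores per sequence in the batch (modified in place)
        # tokens: the tokens decoded so far for each sequence
        for row, seq in zip(logits, tokens):
            if seq and seq[-1] == self.speaker_turn_token:
                # Mask every non-timestamp id so the next sampled token must be a timestamp
                for i in range(min(self.timestamp_begin, len(row))):
                    row[i] = -math.inf


# Toy vocabulary (hypothetical): ids 0-4 are text, 5 is the speaker-turn token, 6-9 are timestamps
filt = ForceTimestampAfterSpeakerTurn(speaker_turn_token=5, timestamp_begin=6)
logits = [[1.0] * 10, [1.0] * 10]
tokens = [[0, 3, 5], [0, 3, 4]]  # only the first sequence has just emitted a speaker turn
filt.apply(logits, tokens)
# logits[0] is now six -inf entries followed by four 1.0 entries; logits[1] is unchanged
```

Because the filter only reads the last decoded token, it leaves every other step of decoding untouched, which keeps the patch small.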