What are the best practices for using this lib for STT?

Question


Alex-Kopylov opened this issue 4 years ago

I have no experience building STT models, so please advise me.

I'm using your open_stt (thanks!) with SeanNaren/deepspeech.pytorch to build an STT model. As you know, I must provide labels for training.
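To see why the choice of labels matters, the label set can be derived directly from the transcripts. This is a minimal sketch, assuming the labels file is a JSON list of single characters with a CTC blank symbol (`_` here) at index 0; check the exact format your deepspeech.pytorch version expects. `build_labels` is a hypothetical helper, not part of either library.

```python
import json


def build_labels(transcripts, blank="_"):
    """Collect every distinct character in the transcripts, with the blank symbol first.

    Assumption: the labels file is a JSON list of single characters with the CTC
    blank at index 0; check the format your deepspeech.pytorch version expects.
    """
    chars = sorted({ch for text in transcripts for ch in text})
    if blank in chars:
        raise ValueError(f"blank symbol {blank!r} also occurs in the transcripts")
    return [blank] + chars


# Raw transcripts with mixed case and punctuation: "П" and "п" end up as two
# separate classes, and "," and "!" become classes the model must predict.
labels = build_labels(["привет мир", "Привет, мир!"])
print(json.dumps(labels, ensure_ascii=False))
```

With raw text, every casing variant and punctuation mark becomes its own output class, even though most of them (e.g. "," or "!") have no direct acoustic counterpart in the audio.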

What is the intuition behind using `string.punctuation` together with both uppercase and lowercase letters? Should I provide the vocabulary below as labels, or keep only the space and letters (e.g. lowercase only)?

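```python
from string import punctuation
rus_letters = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'  # assumed: not defined in the original snippet
# punctuation + Russian letters + space + «»—; ids start at 5 (0-4 presumably reserved for special tokens)
tgt_vocab = {token: i + 5 for i, token in enumerate(punctuation + rus_letters + ' ' + '«»—')}
```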
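The alternative ("only space and chars") amounts to normalizing every transcript before training. Below is a minimal sketch, assuming a lowercase Russian alphabet that keeps "ё" as a separate letter, and a hypothetical blank symbol `_` at index 0; `normalize`, `encode`, `LABELS` and `CHAR_TO_ID` are illustrative names, not part of open_stt or deepspeech.pytorch.

```python
import re

# Assumed lowercase Russian alphabet; "ё" is kept as a separate letter here.
RUS_LETTERS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
BLANK = "_"  # hypothetical CTC blank symbol, placed at index 0
LABELS = [BLANK] + list(RUS_LETTERS) + [" "]
CHAR_TO_ID = {ch: i for i, ch in enumerate(LABELS)}


def normalize(text: str) -> str:
    """Lowercase, turn anything outside the alphabet into a space, collapse spaces."""
    text = text.lower()
    # Punctuation, digits and Latin letters are all replaced by a space here.
    text = re.sub(f"[^{RUS_LETTERS} ]", " ", text)
    return re.sub(r" +", " ", text).strip()


def encode(text: str) -> list[int]:
    """Map a transcript to label ids after normalization."""
    return [CHAR_TO_ID[ch] for ch in normalize(text)]


print(normalize("Привет, мир!"))
print(encode("а я"))
```

This keeps the output layer small (34 letters plus space, plus the blank). Note that digits are simply dropped in this sketch; a real pipeline would probably spell numbers out as words instead. Casing and punctuation, if needed, could then be restored as a separate post-processing step on the recognized text.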