# wav2letter-ctc-pytorch

A PyTorch implementation of Wav2Letter (paper) with raw waveform input.
The model was trained on LibriSpeech-960. BatchNorm and Dropout were used during training; at inference, Dropout is a no-op and the BatchNorm statistics can be fused into the convolution weights, making the checkpoint compatible with `Wav2Letter` from `torchaudio.models`.
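Folding BatchNorm into a preceding convolution is a standard transformation; a minimal sketch for the `Conv1d` + `BatchNorm1d` case (the helper name `fuse_conv_bn` is illustrative, not part of this repo or torchaudio):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv1d, bn: nn.BatchNorm1d) -> nn.Conv1d:
    """Fold BatchNorm1d running statistics into the preceding Conv1d."""
    fused = nn.Conv1d(
        conv.in_channels, conv.out_channels, conv.kernel_size[0],
        stride=conv.stride[0], padding=conv.padding[0], bias=True,
    )
    # Per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale[:, None, None]
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    # New bias: (b - running_mean) * scale + beta
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```

The fused layer reproduces `bn(conv(x))` in eval mode exactly, so the BatchNorm modules can be dropped from the checkpoint.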
## Pretrained weights

- for `model.Wav2Letter` (link)
- for `torchaudio.models.Wav2Letter` (link)
## Greedy decoding

| dataset | CER | WER |
|---|---|---|
| dev-clean | 0.111 | 0.331 |
| test-clean | 0.105 | 0.318 |
## Example

```python
import torch
from torchaudio.models import Wav2Letter

# `labels` is the output alphabet (blank, characters, ...) used in training
model = Wav2Letter(num_classes=len(labels)).cuda()
model.load_state_dict(torch.load('./pretrained/states_fused.pth'))
```
## Some filter kernels from the first Conv1d layer