speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Home Page: http://speechbrain.github.io

How to set up an ASR model trained from scratch when the tokenizer is 4257_unigram.model

ChasingStar95 opened this issue

Part of my settings is shown below. I don't know how to find tokenizer.ckpt, because my tokenizer is in *.model format:
pretrained_path: speechbrain/asr-crdnn-rnnlm-librispeech
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref
        model: !ref
    paths:
        lm: !ref <pretrained_path>/lm.ckpt
        tokenizer: !ref <pretrained_path>/tokenizer.ckpt
        model: !ref <pretrained_path>/asr.ckpt

If you use the paths argument, you can provide any file path, so you can simply change
    tokenizer: !ref <pretrained_path>/tokenizer.ckpt
to
    tokenizer: your/path/to/4257_unigram.model
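
A minimal sketch of what the full Pretrainer section could look like, under a few stated assumptions. It assumes the tokenizer loadable is a sentencepiece.SentencePieceProcessor, which, as far as I know, SpeechBrain loads directly from a SentencePiece .model file. It also assumes the ASR model and LM were trained from scratch with this same tokenizer, since checkpoints trained with a different vocabulary would generally not match 4257 output units. The names <tokenizer> and <asr_model>, and the your/path/to/... checkpoint paths, are hypothetical placeholders. <save_folder> and <lm_model> come from the config above and are assumed to be defined elsewhere in the hparams file.

```yaml
# Sketch only: <tokenizer> and <asr_model> are hypothetical names, and
# <save_folder>, <lm_model>, <asr_model> are assumed to be defined elsewhere.

# Hypothetical tokenizer object: an empty SentencePiece processor that the
# Pretrainer fills in from the file listed under paths.
tokenizer: !new:sentencepiece.SentencePieceProcessor

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref <tokenizer>
        model: !ref <asr_model>
    paths:
        # Hypothetical paths to checkpoints from your own training run.
        lm: your/path/to/lm.ckpt
        # Any file path works here, so point straight at the SentencePiece model.
        tokenizer: your/path/to/4257_unigram.model
        model: your/path/to/asr.ckpt
```

In Python, the Pretrainer built from this file is then typically run with pretrainer.collect_files() followed by pretrainer.load_collected().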

If the test loss of the LM is 1.5 or close to that, what can I do to fix this problem?

I don't understand; could you elaborate? Why do you mention the LM test loss here? Or do you mean that you have a separate issue with your LM training? If so, please open a separate issue for that and provide some more context.

OK, thank you.