How to set up an ASR model trained from scratch when the tokenizer is 4257_unigram.model

Question

ChasingStar95 opened this issue 2 years ago · comments

Part of my configuration is shown below. I don't know where to find tokenizer.ckpt, because my tokenizer is saved in *.model format:
pretrained_path: speechbrain/asr-crdnn-rnnlm-librispeech
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        lm: !ref <lm_model>
        tokenizer: !ref
        model: !ref
    paths:
        lm: !ref <pretrained_path>/lm.ckpt
        tokenizer: !ref <pretrained_path>/tokenizer.ckpt
        model: !ref <pretrained_path>/asr.ckpt

Aku Rouhe · Answer 1 · Mon Oct 25 2021 17:55:17 GMT+0800 (China Standard Time)

If you use the paths argument, you can provide any file path, so you can just change
tokenizer: !ref <pretrained_path>/tokenizer.ckpt
to
tokenizer: your/path/to/4257_unigram.model
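For illustration, the tokenizer part of the configuration might then look roughly like the fragment below. This is only a sketch under assumptions: it assumes the hyperparameter file defines a `tokenizer` entry as a `sentencepiece.SentencePieceProcessor` object (which is not shown in the original post), it leaves out the `lm` and `model` entries, and `your/path/to/4257_unigram.model` is a placeholder for the real file location.

```yaml
# Assumption: the tokenizer object is defined in the same hyperparameter file.
# A SentencePiece *.model file is what this object expects to load.
tokenizer: !new:sentencepiece.SentencePieceProcessor

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        tokenizer: !ref <tokenizer>
    paths:
        # Placeholder path; point this at your own 4257_unigram.model file.
        tokenizer: your/path/to/4257_unigram.model
```

The file extension should not matter here, since each entry under `paths` is just the location of the file to load into the loadable with the same key.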

Chasing Star · Answer 2 · Mon Oct 25 2021 21:51:42 GMT+0800 (China Standard Time)

If the test loss of the LM is 1.5 or close to that, what can I do to fix this problem?

Aku Rouhe · Answer 3 · Mon Oct 25 2021 22:02:33 GMT+0800 (China Standard Time)

I don't understand, could you elaborate? Why do you mention the LM test loss here? Or do you mean that you have a separate issue about your LM training? Please use a separate issue for that and provide some more context.

Chasing Star · Answer 4 · Mon Oct 25 2021 22:04:06 GMT+0800 (China Standard Time)

OK, thank you.