For line 675 of soundstorm.py.

Question

0417keito opened this issue 8 months ago · comments

https://github.com/ZhangXInFD/soundstorm-speechtokenizer/blob/main/soundstorm_speechtokenizer/soundstorm.py#L675C8

Why is it written this way?
Shouldn't this part be the following?

all_mask_num_tokens = all_mask_num_tokens if q < num_full_sampling_levels else torch.zeros((1, batch_size), dtype = torch.long, device = device)
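One way to read the proposed line, sketched in plain Python with nested lists standing in for the tensors: quantizer levels below num_full_sampling_levels keep the full per-step masking schedule, while later levels get a single step with zero tokens to mask. The helper name mask_counts_for_level and the (steps, batch) layout of the schedule are assumptions made for illustration, not taken from the repository.

```python
def mask_counts_for_level(q, num_full_sampling_levels, all_mask_num_tokens, batch_size):
    """Return the per-step mask counts to use at quantizer level q.

    Hypothetical helper: lists stand in for torch tensors, and the schedule
    is assumed to be laid out as (steps, batch).
    """
    if q < num_full_sampling_levels:
        # Fully sampled level: keep the whole iterative schedule.
        return all_mask_num_tokens
    # Later level: one step, zero tokens masked for every batch item,
    # mirroring torch.zeros((1, batch_size), dtype=torch.long).
    return [[0] * batch_size]


# Made-up schedule: 3 steps, batch of 2.
schedule = [[6, 6], [3, 3], [0, 0]]

for q in range(3):
    steps = mask_counts_for_level(q, 1, schedule, 2)
    print(q, len(steps), steps)
```

With num_full_sampling_levels = 1 in this sketch, only level 0 runs all three steps; levels 1 and 2 each run one step with nothing masked.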

ZhangXin · Answer 1 · Thu Nov 02 2023 14:21:47 GMT+0800 (China Standard Time)

Thank you very much for pointing out this issue. I have made the necessary corrections. This mistake seems to have caused the audio quality to deteriorate as the number of iterations increased. I really appreciate your feedback!