[Feature] Why use a GroupVQ instead of a simple VQ?
BridgetteSong opened this issue
BridgetteSong commented
Have you tried compressing mels to a token sequence of shape [L, 1] with a simple VQ (like VQ-VAE or FSQ) instead of a GroupVQ? If you have results from such experiments, what were the reasons for choosing GVQ?
Leng Yue commented
GVQ can encode more information than a naive VQ.
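For context, a grouped quantizer splits each feature vector into groups and quantizes each group against its own small codebook, producing several tokens per frame. This is a minimal illustrative sketch, not Fish Speech's actual implementation; all names and sizes here are hypothetical.

```python
import numpy as np

# Hypothetical group-VQ encoder: split an 8-dim vector into 2 groups of 4 dims
# and nearest-neighbor quantize each group against its own 1024-entry codebook.
rng = np.random.default_rng(0)
dim, groups, codebook_size = 8, 2, 1024
codebooks = rng.standard_normal((groups, codebook_size, dim // groups))

def group_vq_encode(x: np.ndarray) -> list[int]:
    """Return one code index per group via nearest-neighbor search."""
    parts = x.reshape(groups, dim // groups)
    return [int(np.argmin(np.linalg.norm(cb - p, axis=1)))
            for cb, p in zip(codebooks, parts)]

tokens = group_vq_encode(rng.standard_normal(dim))
print(len(tokens))  # one token per group, so a sequence becomes shape [L, groups]
```

A simple VQ would instead search one flat codebook over the full 8-dim vector, yielding a single token per frame.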
BridgetteSong commented
To keep more information in the VQ, we could instead increase the codebook size, e.g. from 1024 to 8192 as in Tortoise, since a token sequence of shape [L, 1] is easier for LM training and optimization. Have you tried a simple VQ for Fish Speech training? Or was GVQ the only approach used from the beginning?
Leng Yue commented
Two 1024-entry codebooks give 1024 * 1024 ≈ 1M possible combinations per frame. That is not equivalent to a single codebook with 2048 entries.
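The capacity argument above is just arithmetic: with G independent groups of codebook size N, the number of distinct codes per frame is N**G, while a single flat codebook of size G*N only gives G*N codes. A quick sketch (function names are illustrative):

```python
def group_vq_capacity(codebook_size: int, num_groups: int) -> int:
    """Distinct codes per frame when each group picks an entry independently."""
    return codebook_size ** num_groups

def single_vq_capacity(codebook_size: int) -> int:
    """Distinct codes per frame for one flat codebook."""
    return codebook_size

two_groups = group_vq_capacity(1024, 2)  # 1024 * 1024
one_flat = single_vq_capacity(2048)

print(two_groups)  # 1048576
print(one_flat)    # 2048
```

So doubling the number of groups multiplies capacity, whereas doubling a single codebook only adds entries linearly.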