Some Questions and Comments
tom99763 opened this issue
- Have you considered using a vector-quantized autoencoder (VQVAE) instead of the CNN feature maps in future work? I think the results could be surprising, thanks to its feature compression and its sampleable latent space, both of which suit the image-to-image translation task.
- It seems that the pixel-level correlation between input and output strongly affects the translation results early in training (e.g. in multimodal translation or Animal-to-Human translation). Instead of predicting everything at once, a two-stage model (contour first, then texture) might improve the results.
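To make the VQVAE suggestion more concrete, here is a minimal plain-Python sketch of the vector-quantization step at its core. It is not code from this project: the codebook values and the helper names (`quantize`, `bits_per_code`, `sample_codes`, `decode_codes`) are hypothetical. A real VQVAE learns the codebook jointly with the encoder and decoder and passes gradients through the quantizer with a straight-through estimator, both omitted here. The uniform sampler is only a stand-in for the learned prior over codes that would be used in practice.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Replace each feature vector with its nearest codebook entry.

    Returns the discrete code indices and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for f in features:
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def bits_per_code(codebook_size: int) -> int:
    """Bits needed to store one index (instead of d floats per feature vector)."""
    return max(1, math.ceil(math.log2(codebook_size)))


def sample_codes(num_positions: int, codebook_size: int, rng: random.Random) -> List[int]:
    """Toy prior: draw code indices uniformly at random (hypothetical stand-in
    for a learned prior over codes)."""
    return [rng.randrange(codebook_size) for _ in range(num_positions)]


def decode_codes(codes: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look up the vectors for the given codes; a decoder would map these to pixels."""
    return [list(codebook[i]) for i in codes]


# Hypothetical 2-D codebook with three entries and three feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
features = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
indices, quantized = quantize(features, codebook)
print(indices)  # [0, 1, 2]
print(bits_per_code(len(codebook)))  # 2
samples = decode_codes(sample_codes(4, len(codebook), random.Random(0)), codebook)
```

The two properties mentioned above show up directly: each feature vector is compressed to a single small integer index, and because the latent space is discrete, new latents can be drawn by sampling indices and looking them up in the codebook.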
Thank you.
Hello, thanks for the suggestions.
- I think incorporating VQVAE can be a good direction, particularly for saving compute.
- It may, especially if we go to higher resolution. But two-stage approaches are also more cumbersome to train.