haochen-rye / HNeRV

Official PyTorch implementation for HNeRV: a hybrid video neural representation (CVPR 2023)

Home Page: https://haochen-rye.github.io/HNeRV/

De-synchronized frames after quantization and decoding

aegroto opened this issue

I have tried encoding and decoding a video with the reference software, and it seems that, in the generated comparisons, the original frames and the quantized decoded frames are not synchronized. This also happens when decoding the 'bunny' video with the provided weights. This is the comparison image for the first frame, named "pred_0000_13.83.png":

[comparison image: pred_0000_13.83.png]

I have run the following command, which is the one reported in the README:

python train_nerv_all.py --outf 1120 --data_path data/bunny --vid bunny --conv_type convnext pshuffel --act gelu --norm none --crop_list 640_1280 --resize_list -1 --loss L2 --enc_strds 5 4 4 2 2 --enc_dim 64_16 --dec_strds 5 4 4 2 2 --ks 0_1_5 --reduce 1.2 --modelsize 1.5 -e 300 --eval_freq 30 --lower_width 12 -b 2 --lr 0.001 --eval_only --weight checkpoints/hnerv-1.5m-e300.pth --quant_model_bit 8 --quant_embed_bit 6 --dump_images --dump_videos

The GIF file is not synchronized either. The problem does not seem to affect the unquantized predictions. What could the cause be? I have installed the required dependencies using the provided file.

Hardware specifications:

GPU: Tesla K80
Driver Version: 470.141.03
CUDA Version: 11.4
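
A quick way to quantify the mismatch is to compute per-frame PSNR between the dumped ground-truth frames and the quantized decodes; with misaligned pairs the PSNR collapses to values like the 13.83 dB in the filename above. A minimal sketch (the glob patterns are assumptions about where --dump_images writes its output, not the repo's actual layout):

```python
import glob

import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # Standard 8-bit PSNR between two images of the same shape.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# Hypothetical dump locations; adjust to wherever --dump_images writes frames.
gt_paths = sorted(glob.glob("output/bunny/gt_*.png"))
pred_paths = sorted(glob.glob("output/bunny/quant_pred_*.png"))

for idx, (g, p) in enumerate(zip(gt_paths, pred_paths)):
    score = psnr(np.array(Image.open(g)), np.array(Image.open(p)))
    print(f"frame {idx:04d}: {score:.2f} dB")
```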

commented

For video decoding, we run two models (the un-quantized one and the quantized one) on the full_dataloader; for the quantized model, we feed in the de-quantized frame embeddings (the embeddings produced by the un-quantized model, quantized and then de-quantized).

for model_ind, cur_model in enumerate(model_list):
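
A simplified sketch of this two-pass evaluation (not the exact code in train_nerv_all.py): pass 1 runs the un-quantized model and stores its frame embeddings after quantization / de-quantization, pass 2 runs the quantized model on those stored embeddings, indexed by batch position. quant_fn stands in for the repo's quantization logic; a generic version is sketched further below.

```python
import torch

@torch.no_grad()
def two_pass_eval(model_list, full_dataloader, quant_fn, device="cuda"):
    dequant_vid_embed = []   # filled during the first (un-quantized) pass
    quant_outputs = []       # decoded frames from the second (quantized) pass
    for model_ind, cur_model in enumerate(model_list):
        cur_model.eval()
        for i, sample in enumerate(full_dataloader):
            cur_input = sample["img"].to(device)   # batch format is an assumption
            # The quantized model decodes from the stored de-quantized embeddings.
            embed_in = dequant_vid_embed[i] if model_ind else None
            img_out, embed_list, dec_time = cur_model(cur_input, embed_in)
            if model_ind == 0:
                dequant_vid_embed.append(quant_fn(embed_list[0]))
            else:
                quant_outputs.append(img_out.cpu())
    # The second pass looks embeddings up by batch position i, so they only match
    # the frames in cur_input if both passes yield batches in the same order.
    return quant_outputs
```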

Since frames were shuffled in full_dataloader, the resulting de-quantized frame embeddings (collected via the un-quantized model) are shuffled as well.
full_dataloader = torch.utils.data.DataLoader(full_dataset, batch_size=args.batchSize, shuffle=(sampler is None),
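
A tiny, self-contained illustration (not from the repo) of what goes wrong: anything stored by batch position from a shuffled pass is permuted relative to frame-index order, and a second shuffled pass will generally use yet another order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

frames = torch.arange(10).float().unsqueeze(1)   # stand-ins for 10 video frames
dataset = TensorDataset(frames)

shuffled = DataLoader(dataset, batch_size=2, shuffle=True)
ordered = DataLoader(dataset, batch_size=2, shuffle=False)

print("shuffled pass:", [int(v) for (batch,) in shuffled for v in batch])  # e.g. [6, 1, 9, 0, ...]
print("ordered pass: ", [int(v) for (batch,) in ordered for v in batch])   # [0, 1, ..., 9]
```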

The decoded frames from the quantized model (which takes as input the de-quantized embeddings produced by the un-quantized model) are therefore shuffled.
img_out, embed_list, dec_time = cur_model(cur_input, dequant_vid_embed[i] if model_ind else None)
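
For reference, a generic min-max quantize / de-quantize sketch of what the de-quantized embedding means here (the repo's actual quantization code may differ): with --quant_embed_bit 6, the embedding is mapped to 6-bit integer levels and back to floats before the quantized decoder consumes it. This could serve as the quant_fn placeholder used in the sketch above.

```python
import torch

def quantize_dequantize(t: torch.Tensor, bits: int = 6) -> torch.Tensor:
    # Map the tensor to integer levels in [0, 2**bits - 1] and back to floats.
    t_min, t_max = t.min(), t.max()
    scale = (t_max - t_min).clamp(min=1e-8) / (2 ** bits - 1)
    levels = torch.round((t - t_min) / scale)
    return levels * scale + t_min   # de-quantized float embedding
```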

We have now fixed the frame order for full_dataloader, so decoding should work correctly.
full_dataloader = torch.utils.data.DataLoader(full_dataset, batch_size=args.batchSize, shuffle=False,
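
Sketch of the fix under the same variable names as the snippet above; the extra keyword arguments are assumptions about the truncated call, not a copy of the repo's exact settings:

```python
import torch

full_dataloader = torch.utils.data.DataLoader(
    full_dataset,
    batch_size=args.batchSize,
    shuffle=False,             # deterministic, index-ordered frames for evaluation
    num_workers=args.workers,  # assumed
    pin_memory=True,           # assumed
    drop_last=False,           # assumed
)
```

An alternative that stays robust even with shuffling is to have the dataset return each frame's index and to reorder the stored embeddings and decoded frames by that index before comparing.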