haochen-rye / HNeRV

Official PyTorch implementation for HNeRV: a hybrid video neural representation (CVPR 2023)

Home Page: https://haochen-rye.github.io/HNeRV/

De-synchronized frames after quantization and decoding

aegroto opened this issue

I have tried encoding and decoding a video with the reference software, and it seems that, in the generated comparisons, the original frames and the quantized decoded frames are not synchronized. This also happens when decoding the 'bunny' video with the provided weights. This is the comparison image for the first frame, named "pred_0000_13.83.png":

[comparison image: pred_0000_13.83.png]

I have run the following command, which is the one reported in the README:

python train_nerv_all.py --outf 1120 --data_path data/bunny --vid bunny --conv_type convnext pshuffel --act gelu --norm none --crop_list 640_1280 --resize_list -1 --loss L2 --enc_strds 5 4 4 2 2 --enc_dim 64_16 --dec_strds 5 4 4 2 2 --ks 0_1_5 --reduce 1.2 --modelsize 1.5 -e 300 --eval_freq 30 --lower_width 12 -b 2 --lr 0.001 --eval_only --weight checkpoints/hnerv-1.5m-e300.pth --quant_model_bit 8 --quant_embed_bit 6 --dump_images --dump_videos

The GIF file is not synchronized either. The problem does not seem to affect the unquantized predictions. What could the cause be? I have installed the required dependencies using the provided file.

Hardware specifications:

GPU: Tesla K80
Driver Version: 470.141.03
CUDA Version: 11.4
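
A quick way to quantify the mismatch is to compute per-frame PSNR between the dumped ground-truth frames and the quantized decodes; with misaligned pairs the PSNR collapses to values like the 13.83 dB in the filename above. A minimal sketch (the glob patterns are assumptions about where --dump_images writes its output, not the repo's actual layout):

```python
import glob

import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # Standard 8-bit PSNR between two images of the same shape.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# Hypothetical dump locations; adjust to wherever --dump_images writes frames.
gt_paths = sorted(glob.glob("output/bunny/gt_*.png"))
pred_paths = sorted(glob.glob("output/bunny/quant_pred_*.png"))

for idx, (g, p) in enumerate(zip(gt_paths, pred_paths)):
    score = psnr(np.array(Image.open(g)), np.array(Image.open(p)))
    print(f"frame {idx:04d}: {score:.2f} dB")
```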

commented

For video decoding, we run two models (the un-quantized one and the quantized one) on the full_dataloader; for the quantized model, we feed in the de-quantized frame embeddings (the embeddings produced by the un-quantized model, quantized and then de-quantized).

for model_ind, cur_model in enumerate(model_list):
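
A simplified sketch of this two-pass evaluation (not the exact code in train_nerv_all.py): pass 1 runs the un-quantized model and stores its frame embeddings after quantization / de-quantization, pass 2 runs the quantized model on those stored embeddings, indexed by batch position. quant_fn stands in for the repo's quantization logic; a generic version is sketched further below.

```python
import torch

@torch.no_grad()
def two_pass_eval(model_list, full_dataloader, quant_fn, device="cuda"):
    dequant_vid_embed = []   # filled during the first (un-quantized) pass
    quant_outputs = []       # decoded frames from the second (quantized) pass
    for model_ind, cur_model in enumerate(model_list):
        cur_model.eval()
        for i, sample in enumerate(full_dataloader):
            cur_input = sample["img"].to(device)   # batch format is an assumption
            # The quantized model decodes from the stored de-quantized embeddings.
            embed_in = dequant_vid_embed[i] if model_ind else None
            img_out, embed_list, dec_time = cur_model(cur_input, embed_in)
            if model_ind == 0:
                dequant_vid_embed.append(quant_fn(embed_list[0]))
            else:
                quant_outputs.append(img_out.cpu())
    # The second pass looks embeddings up by batch position i, so they only match
    # the frames in cur_input if both passes yield batches in the same order.
    return quant_outputs
```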

Since frames were shuffled in full_dataloader, the resulting de-quantized frame embeddings (collected via the un-quantized model) are shuffled as well.
full_dataloader = torch.utils.data.DataLoader(full_dataset, batch_size=args.batchSize, shuffle=(sampler is None),
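
A tiny, self-contained illustration (not from the repo) of what goes wrong: anything stored by batch position from a shuffled pass is permuted relative to frame-index order, and a second shuffled pass will generally use yet another order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

frames = torch.arange(10).float().unsqueeze(1)   # stand-ins for 10 video frames
dataset = TensorDataset(frames)

shuffled = DataLoader(dataset, batch_size=2, shuffle=True)
ordered = DataLoader(dataset, batch_size=2, shuffle=False)

print("shuffled pass:", [int(v) for (batch,) in shuffled for v in batch])  # e.g. [6, 1, 9, 0, ...]
print("ordered pass: ", [int(v) for (batch,) in ordered for v in batch])   # [0, 1, ..., 9]
```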

The decoded frames from the quantized model (which takes as input the de-quantized embeddings produced by the un-quantized model) are therefore shuffled.
img_out, embed_list, dec_time = cur_model(cur_input, dequant_vid_embed[i] if model_ind else None)
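
For reference, a generic min-max quantize / de-quantize sketch of what the de-quantized embedding means here (the repo's actual quantization code may differ): with --quant_embed_bit 6, the embedding is mapped to 6-bit integer levels and back to floats before the quantized decoder consumes it. This could serve as the quant_fn placeholder used in the sketch above.

```python
import torch

def quantize_dequantize(t: torch.Tensor, bits: int = 6) -> torch.Tensor:
    # Map the tensor to integer levels in [0, 2**bits - 1] and back to floats.
    t_min, t_max = t.min(), t.max()
    scale = (t_max - t_min).clamp(min=1e-8) / (2 ** bits - 1)
    levels = torch.round((t - t_min) / scale)
    return levels * scale + t_min   # de-quantized float embedding
```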

We have now fixed the frame order for full_dataloader, so decoding should work correctly.
full_dataloader = torch.utils.data.DataLoader(full_dataset, batch_size=args.batchSize, shuffle=False,
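
Sketch of the fix under the same variable names as the snippet above; the extra keyword arguments are assumptions about the truncated call, not a copy of the repo's exact settings:

```python
import torch

full_dataloader = torch.utils.data.DataLoader(
    full_dataset,
    batch_size=args.batchSize,
    shuffle=False,             # deterministic, index-ordered frames for evaluation
    num_workers=args.workers,  # assumed
    pin_memory=True,           # assumed
    drop_last=False,           # assumed
)
```

An alternative that stays robust even with shuffling is to have the dataset return each frame's index and to reorder the stored embeddings and decoded frames by that index before comparing.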