Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

ValueError: range() arg 3 must not be zero - Need to Identify the Root Cause

YuyaWake opened this issue

Bug description

I am encountering a ValueError: range() arg 3 must not be zero while processing video frames in batches. The relevant code section is provided below.
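
For context, the error itself comes from Python's built-in `range()`, which rejects a step of zero regardless of the other arguments; a minimal sketch (independent of Lightning and of my code, with placeholder paths) reproduces the same message:

```python
frame_paths = ["frame_0.jpg", "frame_1.jpg"]  # placeholder paths, for illustration only
frame_batch_size = 0

try:
    for i in range(0, len(frame_paths), frame_batch_size):
        pass
except ValueError as err:
    print(err)  # prints: range() arg 3 must not be zero
```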

What version are you seeing the problem on?

v2.0

How to reproduce the bug

## Code Snippet

### Definition of the `VideoDataset` Class


```python
class VideoDataset:
    def __init__(self, frame_batch_size=1, ...):
        self.frame_batch_size = frame_batch_size
        print(f"Initialized frame_batch_size: {self.frame_batch_size}")

        # Other initialization code
        # ...

    def __getitem__(self, idx):
        print(f"self.get_j_frames: {self.get_j_frames}")  # Debug output
        print(f"frame_batch_size before call: {self.get_j_frames.frame_batch_size}")  # Debug output

        # Set a default value if frame_batch_size is zero
        if self.get_j_frames.frame_batch_size == 0:
            print("Warning: frame_batch_size is 0, setting to default value 1")
            self.get_j_frames.frame_batch_size = 1

        top_j_sim_video_embeddings_list = self.get_j_frames(df)
        print(f"Video frames for index {idx} fetched")

        video_output_avg = self.video_processor(top_j_sim_video_embeddings_list)
        return video_output_avg
```


### Initialization of self.get_j_frames
```python
class GetJFrames:
    def __init__(self, frame_batch_size):
        self.frame_batch_size = frame_batch_size
        print(f"Initialized GetJFrames frame_batch_size: {self.frame_batch_size}")

# Initialization within VideoDataset
class VideoDataset:
    def __init__(self, frame_batch_size=1, ...):
        self.frame_batch_size = frame_batch_size
        print(f"Initialized VideoDataset frame_batch_size: {self.frame_batch_size}")

        # Initialize get_j_frames here
        self.get_j_frames = GetJFrames(frame_batch_size)
        print(f"Initialized self.get_j_frames with frame_batch_size: {self.get_j_frames.frame_batch_size}")

        # Other initialization code
        # ...
```

Error messages and logs

```
ValueError                                Traceback (most recent call last)
Cell In[95], line 4
      2 print(f"Length of dataset: {len(dataset)}")
      3 print("Fetching first item from dataset...")
----> 4 first_item = dataset[0]
      5 print("First item fetched:", first_item)

File ~/main/reproduct/choi/video_dataset.py:179, in VideoDataset.__getitem__(self, idx)
    177 print(f"self.get_j_frames: {self.get_j_frames}")  # Debug output
    178 print(f"frame_batch_size before call: {self.get_j_frames.frame_batch_size}")  # Debug output
--> 179 top_j_sim_video_embeddings_list = self.get_j_frames(df)
    180 print(f"Video frames for index {idx} fetched")
    182 video_output_avg = self.video_processor(top_j_sim_video_embeddings_list)

File ~/anaconda3/envs/choi_venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/choi_venv/lib/python3.8/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
...
--> 259 for i in range(0, len(frame_paths), frame_batch_size):
    260     batch_frame_paths = frame_paths[i:i+frame_batch_size]
    261     batch_frames = [load_image(frame_path).unsqueeze(0) for frame_path in batch_frame_paths]

ValueError: range() arg 3 must not be zero
```

Environment

Current environment
```
#- PyTorch Lightning Version: 2.0.8
#- PyTorch Version: 2.3.0
#- Python version: 3.8.18
#- OS: Ubuntu 20.04
#- CUDA/cuDNN version: 11.8
#- How you installed Lightning: `conda`
#- Running environment of LightningApp: remote server
```

More info

Debug Output
The following debug output shows that frame_batch_size is zero at some point:

```
frame_batch_size before call: 0
self.frame_batch_size: 0
```

What I Have Tried

  1. Added debug statements to trace where frame_batch_size becomes zero.
  2. Set a default value for frame_batch_size when it is zero to prevent the error, but I want to identify the root cause (a fail-fast alternative is sketched below).
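
Instead of silently patching the value inside `__getitem__`, I am considering validating it where it is first stored, so a zero fails immediately and the traceback points at whoever passed it. A rough sketch (not yet in my code, names follow the snippet above):

```python
class GetJFrames:
    def __init__(self, frame_batch_size):
        # Fail fast: a zero or negative batch size would later make
        # range(0, len(frame_paths), frame_batch_size) raise ValueError.
        if frame_batch_size <= 0:
            raise ValueError(
                f"frame_batch_size must be a positive integer, got {frame_batch_size!r}"
            )
        self.frame_batch_size = frame_batch_size
```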

Questions

  1. What could be causing frame_batch_size to be zero at initialization or at some point in the code execution?
  2. What are the best practices to prevent such issues where default values are overridden unexpectedly? (One candidate pattern is sketched after this list.)
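
For question 2, one pattern I am considering is a validating property, so that any later code overwriting the attribute with zero is caught at the assignment rather than deep inside the batching loop (a sketch, not part of my current code):

```python
class GetJFrames:
    def __init__(self, frame_batch_size):
        self.frame_batch_size = frame_batch_size  # goes through the setter below

    @property
    def frame_batch_size(self):
        return self._frame_batch_size

    @frame_batch_size.setter
    def frame_batch_size(self, value):
        # Reject zero/negative values at the point of assignment so the
        # culprit shows up in the traceback, not a later range() call.
        if value <= 0:
            raise ValueError(f"frame_batch_size must be positive, got {value!r}")
        self._frame_batch_size = value
```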

All Relevant Code

All relevant code can be downloaded from the following link:

https://note.com/rafo/n/n979dc84fdf14

The issue likely resides within the video_dataset.py file.
Any help or guidance to identify the root cause and fix this issue would be greatly appreciated. Thank you!

@YuyaWake I don't see any direct relationship with Lightning here. Can you provide details why you believe this is a bug caused by Lightning? If you're requesting implementation help, please consider posting in our forum or Discord channels.

```
--> 259 for i in range(0, len(frame_paths), frame_batch_size):
    260     batch_frame_paths = frame_paths[i:i+frame_batch_size]
    261     batch_frames = [load_image(frame_path).unsqueeze(0) for frame_path in batch_frame_paths]

ValueError: range() arg 3 must not be zero
```

Have you checked the values passed to the `range()` call there, as the error message suggests? `frame_batch_size` seems to be 0.
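
For example, you could wrap that loop in a small helper that validates its input first (a sketch, not a Lightning API; rename things to match your `video_dataset.py`):

```python
def iter_frame_batches(frame_paths, frame_batch_size):
    """Yield frame_paths in chunks, failing loudly on a non-positive batch size."""
    if frame_batch_size <= 0:
        raise ValueError(f"frame_batch_size must be positive, got {frame_batch_size!r}")
    for i in range(0, len(frame_paths), frame_batch_size):
        yield frame_paths[i:i + frame_batch_size]

# Example usage with placeholder paths:
for batch in iter_frame_batches(["frame_0.jpg", "frame_1.jpg", "frame_2.jpg"], 2):
    print(batch)
```

That way the traceback points at the caller that supplied the zero instead of the `range()` call.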