fuankarion / active-speakers-context

Code for the Active Speakers in Context Paper (CVPR2020)

Noticed bug in STE data augmentation (core/dataset.py#L158-L162)

btamm12 opened this issue · comments

In core/dataset.py#L158-L162

# random crop
width, height = video_data[0].size
f = random.uniform(0.5, 1)
i, j, h, w = RandomCrop.get_params(video_data[0], output_size=(int(height*f), int(width*f)))
video_data = [s.crop(box=(j, i, w, h)) for s in video_data]

[Source]

You pass the arguments (left, upper, width, height) into Image.crop(), but it expects (left, upper, right, lower). The result is that the training crops are smaller than intended.
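To make the size mismatch concrete, here is a minimal sketch (the 100×100 image and the crop parameters are made up for illustration) showing what Image.crop() returns for the buggy and the corrected box:

```python
from PIL import Image

# Hypothetical 100x100 frame, standing in for video_data[0].
img = Image.new("RGB", (100, 100))

# Suppose RandomCrop.get_params returned top=10, left=20, height=50, width=60.
i, j, h, w = 10, 20, 50, 60

# Buggy call: (left, upper, width, height) is interpreted as (left, upper, right, lower).
buggy = img.crop(box=(j, i, w, h))          # box = (20, 10, 60, 50)
print(buggy.size)                           # (40, 40) instead of (60, 50)

# Correct call: right = left + width, lower = top + height.
fixed = img.crop(box=(j, i, j + w, i + h))  # box = (20, 10, 80, 60)
print(fixed.size)                           # (60, 50), as intended
```

Note that PIL's `size` is (width, height), so the intended crop comes out as (60, 50).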

PyTorch's (torchvision) implementation is the following:

def crop(img: Image.Image, top: int, left: int, height: int, width: int) -> Image.Image:
    if not _is_pil_image(img):
        raise TypeError('img should be PIL Image. Got {}'.format(type(img)))

    return img.crop((left, top, left + width, top + height))

[Source]

where (top, left, height, width) is exactly the output of RandomCrop.get_params(). I would recommend using the following to avoid argument-conversion mistakes.

# random crop
width, height = video_data[0].size
f = random.uniform(0.5, 1)
crop_module = RandomCrop(size=(int(height*f), int(width*f)))
video_data = [crop_module(img) for img in video_data]

Visualization (first existing code, then fixed code)

Note: the exact position of the crop should be ignored; only the crop size is relevant. Also notice that the old crops are generally not square.

f == 0.98 : negligible
[image: old-f98]
[image: new-f98]

f == 0.52 : significant
[image: old-f52]
[image: new-f52]

I haven't run your code with this fix, so I don't know how much the results would improve (if at all).

Hi,

Thanks, we also discovered this bug recently. The effect on performance is minimal, but after fixing it the model does converge faster (it needs only 70-80 epochs). I'll post the updated code this week.