sithu31296 / semantic-segmentation

SOTA Semantic Segmentation Models in PyTorch

Augmentation configuration

markdjwilliams opened this issue

Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).

Are there any future plans to allow for augmentation to be configurable?

All of the augmentations provided in augmentations.py are customized to work well for semantic segmentation, which means that in the case of a horizontal flip, the segmentation labels are flipped along with the image. Take a look here.
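For reference, the mechanism is roughly the following (a minimal sketch of the idea; the class name and signature are illustrative, not the repo's exact code):

```python
import random
import torch

class RandomHorizontalFlip:
    """Illustrative paired transform: the image and the segmentation
    mask are flipped together so pixels and labels stay aligned."""
    def __init__(self, p: float = 0.5):
        self.p = p  # probability of applying the flip

    def __call__(self, img: torch.Tensor, mask: torch.Tensor):
        if random.random() < self.p:
            # Flip both tensors along the width (last) dimension.
            img = torch.flip(img, dims=[-1])
            mask = torch.flip(mask, dims=[-1])
        return img, mask
```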

You can customize the augmentation pipeline as you want by manually editing the Compose call. I think this way is faster for experimenting, because each augmentation has its own parameters. So I don't have a plan to make it configurable.
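For concreteness, here is a simplified sketch of what that editing looks like, assuming a paired Compose in the style of semseg/augmentations.py and reusing the illustrative flip above:

```python
class Compose:
    """Chains paired (image, mask) transforms (simplified sketch)."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, mask):
        for t in self.transforms:
            img, mask = t(img, mask)
        return img, mask

train_transform = Compose([
    RandomHorizontalFlip(p=0.5),  # delete this line for orientation-sensitive datasets
    # ... other paired transforms (resize, crop, normalize, ...)
])
```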

Flipping the labels along with the image is sufficient for many datasets, but not for those where the labels have a specific spatial meaning. In those cases, specific labels must be swapped in addition to flipping the image and the label map. For example, on a symmetrical CelebAHQ-Mask face, the current horizontal flip augmentation will produce two conflicting training samples, because this particular dataset uses labels such as "left ear" and "right ear".
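To make this concrete, a flip that also remaps paired class IDs could look roughly like this (the IDs below are hypothetical, not CelebAHQ-Mask's real indices):

```python
import random
import torch

# Hypothetical left/right class-ID pairs (illustrative only).
LEFT_RIGHT_PAIRS = {
    4: 5, 5: 4,  # left eye  <-> right eye
    6: 7, 7: 6,  # left brow <-> right brow
    8: 9, 9: 8,  # left ear  <-> right ear
}

class SwapAwareHorizontalFlip:
    """Flips image and mask together, then remaps left/right class IDs
    so the labels remain semantically correct after the flip."""
    def __init__(self, pairs: dict, p: float = 0.5):
        self.pairs = pairs
        self.p = p

    def __call__(self, img: torch.Tensor, mask: torch.Tensor):
        if random.random() < self.p:
            img = torch.flip(img, dims=[-1])
            mask = torch.flip(mask, dims=[-1])
            remapped = mask.clone()
            for src, dst in self.pairs.items():
                remapped[mask == src] = dst  # swap paired IDs
            mask = remapped
        return img, mask
```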

I don't get what you mean. All of the spatial-level transforms are like that. How will you make them configurable?

Translating the image and the corresponding labels is not an issue, because that operation does not change the semantics of the labels. With a horizontal flip, issues arise for any dataset whose labelling relies on concepts such as "left" and "right". Flipping the image swaps the meaning of those labels: in a face, a screen-left eye would now be incorrectly labelled "left eye" when it should actually be "right eye". A hypothetical dataset with "top" and "bottom" labels would suffer the same problem under a vertical flip augmentation.

When training a segmentation model on the CelebAHQ-Mask dataset, you will see that the eyes, eyebrows, and ears are frequently mislabelled (the left ear is incorrectly classified as the right ear, and so on), particularly when only one of those paired features is visible.

For now I'll simply modify the code to remove that specific augmentation. Configuration could be achieved by leveraging a package such as mlconfig.
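As a sketch of that idea, here is a minimal YAML-driven pipeline builder. It deliberately uses PyYAML and a hand-rolled registry rather than mlconfig's actual API, and it reuses the illustrative classes from the sketches above:

```python
import yaml  # PyYAML

# Hypothetical registry mapping config names to the sketch classes above.
TRANSFORMS = {
    'RandomHorizontalFlip': RandomHorizontalFlip,
    'SwapAwareHorizontalFlip': SwapAwareHorizontalFlip,
}

def build_pipeline(path: str) -> Compose:
    """Builds the augmentation pipeline from a YAML file such as:

    augmentations:
      - name: SwapAwareHorizontalFlip
        p: 0.5
        pairs: {4: 5, 5: 4, 6: 7, 7: 6, 8: 9, 9: 8}
    """
    with open(path) as f:
        cfg = yaml.safe_load(f)
    transforms = []
    for entry in cfg['augmentations']:
        cls = TRANSFORMS[entry.pop('name')]  # look up the class by name
        transforms.append(cls(**entry))      # remaining keys are kwargs
    return Compose(transforms)
```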

As I said above, the augmentation pipeline is hardcoded intentionally. I am asking because you are emphasizing spatial-level transforms like horizontal flip, and I understood that you wanted to make them configurable. The first issue you reported was that the labels are not swapped by the horizontal flip, in your words: "flipping the image would also require swapping these two labels". If a specific augmentation does not suit your case, you can simply not use it at all.