ecs-vlc / FMix

Official implementation of 'FMix: Enhancing Mixed Sample Data Augmentation'

Home Page: https://arxiv.org/abs/2002.12047

On using FMix my dataloader is unable to mix the data in the training loop

IamSparky opened this issue

I have used the following training loop for my plant image dataset

def train_loop_fn(data_loader, model, optimizer, device, scheduler=None):
    running_loss = 0.0
    running_corrects = 0
    
    model.train()
    
    alpha, decay_power = 1.0, 3.0
    
    for batch_index, dataset in enumerate(data_loader):
        image = dataset["image"]
        label = dataset["label"]
        
        image = image.to(device, dtype=torch.float)
        label = label.to(device, dtype=torch.float)
        
        # sample_and_apply returns the mixed batch, the permutation used and lambda
        image, perm, lambda_value = sample_and_apply(image, alpha, decay_power, (224, 224))
        optimizer.zero_grad()

        outputs = model(image)
        
        # FMix loss: weight the losses on the original and permuted labels by lambda
        loss = loss_fn(outputs, label) * lambda_value + loss_fn(outputs, label[perm]) * (1 - lambda_value)

        loss.backward()
        xm.optimizer_step(optimizer)

        running_loss += loss.item()

    scheduler.step()
            
    train_loss = running_loss / float(len(train_dataset))
    
    xm.master_print('training Loss: {:.4f} '.format(train_loss))

and my dataset class looks like this

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms
import albumentations
from PIL import Image

class leaf_classification(Dataset):
    def __init__(self, ids, image_id, label, mean , std , is_valid):
        self.ids = ids
        self.image_id = image_id
        self.label = label
        self.is_valid = is_valid
        if self.is_valid == 1: # transforms for validation images
            self.aug = albumentations.Compose([
               albumentations.Normalize(mean , std , always_apply = True) 
            ])
        else:                  # transforms for training images
            self.aug = albumentations.Compose([
                albumentations.Normalize(mean , std , always_apply = True),
                albumentations.ShiftScaleRotate(shift_limit = 0.0625,
                                                scale_limit = 0.1 ,
                                                rotate_limit = 5,
                                                p = 0.9)
            ]) 
        
    def __len__(self):
        return len(self.ids)
    
    def __getitem__(self, index):
        # converting jpg format of images to numpy array
        img = np.array(Image.open('../input/cassava-leaf-disease-classification/train_images/' + self.image_id[index])) 
        
        img = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
        img = self.aug(image = img)['image']
        img = np.transpose(img, (2, 0, 1)).astype(np.float32) # (2, 0, 1) because PyTorch expects channels first, then the spatial dimensions
        
       
        return {
            'image' : torch.tensor(img, dtype = torch.float) , 
            'label' : torch.tensor(self.label[index], dtype = torch.float)
        }

And while training it's generating this error:

[screenshot of the error]

Please help me in resolving this error. Here's the link to my notebook.

Hi, looks like you're on the right lines. I've created a copy of your notebook and made a few changes here: https://www.kaggle.com/ethanwharris/fmix-cassava-leaf-disease-classification

Changes made

  • used from FMix.fmix import ... instead of cd FMix (which caused the file error, since after the cd everything was one level down)
  • used sample_mask instead of sample_and_apply, which used a numpy mask and didn't seem to work with the XLA device (see the sketch after this list)
  • moved model.to(device), as the torchsummary package was moving the model to the CPU
  • added train_loss to the scheduler.step call (although this should probably be a validation loss)
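
To make the first two bullets concrete, here is a minimal sketch of what the sample_mask-based mixing can look like inside the training loop. The helper name fmix_batch is made up for illustration; loss_fn, model and the (224, 224) input size come from the notebook above, and the mask is assumed to come back from sample_mask as a numpy array with a leading singleton dimension so that it broadcasts over the batch and channel dimensions.

import torch
from FMix.fmix import sample_mask  # import from the package instead of cd FMix

def fmix_batch(image, alpha=1.0, decay_power=3.0, shape=(224, 224)):
    # sample_mask returns (lambda, mask); the mask is a numpy array sampled at `shape`
    lam, mask = sample_mask(alpha, decay_power, shape)
    mask = torch.from_numpy(mask).float().to(image.device)  # match the batch's dtype and device
    perm = torch.randperm(image.size(0))                    # partners to mix each image with
    mixed = image * mask + image[perm] * (1 - mask)
    return mixed, perm, lam

# inside train_loop_fn, once image and label are on the device:
#     image, perm, lam = fmix_batch(image)
#     outputs = model(image)
#     loss = loss_fn(outputs, label) * lam + loss_fn(outputs, label[perm]) * (1 - lam)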

There are still some errors in eval_loop_fn, but these aren't related to FMix.

Hope that helps!

Thanks brother, I really appreciate your work and your help; it started working after I made the necessary changes. But I just want to know: am I going wrong with regard to your point number 4?

Hello Ethan,
I have started facing an error again on this line: x1, x2 = image * mask, image[perm] * (1 - mask)

I'm getting this error:

[screenshot of the error]

in the function defining the training loop:

def train_loop_fn(data_loader, model, optimizer, device, scheduler=None):
    running_loss = 0.0
    running_corrects = 0
    
    model.train()
    
    alpha, decay_power = 1.0, 3.0
    
    for batch_index, dataset in enumerate(data_loader):
        image = dataset["image"]
        label = dataset["label"]
        
        
        # sample an FMix lambda and binary mask at the input resolution
        lambda_value, mask = sample_mask(alpha, decay_power, (224, 224), 0.0, False)
        mask = torch.from_numpy(mask).to(device)
        perm = torch.randperm(image.size(0))

        # mix each image with a shuffled partner using the mask
        x1, x2 = image * mask, image[perm] * (1 - mask)
        image = x1 + x2
        
        image = image.to(device, dtype=torch.float)
        label = label.to(device, dtype=torch.float)
        
        optimizer.zero_grad()

        outputs = model(image)
        
        loss = loss_fn(outputs, label) * lambda_value + loss_fn(outputs, label[perm]) * (1 - lambda_value)
#         loss = loss_fn(outputs, label)

        loss.backward()
        xm.optimizer_step(optimizer)

        running_loss += loss.item()
            
    train_loss = running_loss / float(len(train_data))
    scheduler.step(train_loss)
    
    return train_loss

Don't know why it was working fine earlier.

Notebook link

Hi, sorry I missed this.

Not sure what the error was here; it looks like the tensors are the wrong sizes, so you might need to squeeze / unsqueeze in places to get it to work. Closing this issue as it looks like it's not a bug in our code.
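
For anyone who hits the same error: a small, hypothetical shape / dtype / device check for the image * mask step, assuming sample_mask returns the mask as a (1, 224, 224) float64 numpy array and the batch is (B, C, H, W). Dummy tensors stand in for the real data here; whether the explicit unsqueeze is needed depends on how the rest of the loop is set up.

import numpy as np
import torch

image = torch.rand(8, 3, 224, 224)        # stands in for the real (B, C, H, W) batch
mask = np.random.rand(1, 224, 224)        # stands in for the mask from sample_mask

mask = torch.from_numpy(mask).float()     # numpy float64 -> float32, matching the batch
mask = mask.unsqueeze(0)                  # (1, 1, 224, 224): broadcasts over batch and channels
mask = mask.to(image.device)              # keep the mask on the same device as the batch

perm = torch.randperm(image.size(0))
mixed = image * mask + image[perm] * (1 - mask)
print(mixed.shape)                        # torch.Size([8, 3, 224, 224])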