Very slow loss.backward() when running PointMLP on custom task

Question

kaimingkuang opened this issue 2 years ago · comments

Hi,

I am trying to use the PointMLP model from classification_ModelNet40/models/pointmlp.py on my own task, with the default hyperparameter settings. However, loss.backward() is extremely slow: around 9 seconds per backward pass for one batch of 64 point clouds with 1024 points each. When I run your ModelNet40 experiments with the same configs and the same hardware/software environment, the training speed is normal. Here is my code:

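        self.model.train()

        for i, sample in enumerate(self.dl_train):
            self.optimizer.zero_grad()

            # Move the point clouds and the target image features to the GPU.
            pc = sample["xyz"].cuda()
            img_feats = sample["img"].cuda()

            # Forward pass through PointMLP, then the contrastive loss.
            pc_feats = self.model(pc)
            pc2img_loss = self.criterion(pc_feats, img_feats)

            # This backward call is the slow step (around 9 seconds per batch).
            pc2img_loss.backward()
            self.optimizer.step()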
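
A note on how a figure like the 9 seconds above can be measured: CUDA kernels are launched asynchronously, so a plain wall-clock timer around loss.backward() can end up including forward-pass work that is still queued on the GPU. Below is a minimal sketch of a timing helper that synchronizes before starting and before stopping the clock. The name `timed` and its `sync` parameter are hypothetical, not part of PointMLP or PyTorch; the helper itself only uses the standard library so it can be tried without a GPU.

```python
import time


def timed(fn, sync=lambda: None):
    """Run fn() and return (result, elapsed_seconds).

    `sync` is called before starting and before stopping the clock so that
    asynchronously queued work is finished first. For GPU code, pass
    torch.cuda.synchronize; the default does nothing.
    """
    sync()
    start = time.perf_counter()
    result = fn()
    sync()
    return result, time.perf_counter() - start


# Hypothetical usage inside the training loop above (assumes torch is imported):
#   pc_feats, fwd_s = timed(lambda: self.model(pc), sync=torch.cuda.synchronize)
#   _, bwd_s = timed(pc2img_loss.backward, sync=torch.cuda.synchronize)
```

Timing the forward and backward passes separately this way should show whether the backward pass itself is slow or whether it is only absorbing the cost of earlier kernels.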

The loss function is a simple contrastive loss:

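import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):

    def forward(self, feat_0, feat_1, labels=None):
        # labels is unused; matching pairs share the same batch index.
        feat_0 = F.normalize(feat_0, dim=1)
        feat_1 = F.normalize(feat_1, dim=1)
        # Pairwise cosine similarities, shape (batch, batch).
        dot_prods = torch.einsum("mi,ni->mn", feat_0, feat_1)
        # Cross-entropy in both directions, with the diagonal as positives.
        loss_0_1 = -F.log_softmax(dot_prods, dim=0).diag().mean()
        loss_1_0 = -F.log_softmax(dot_prods, dim=1).diag().mean()
        loss = 0.5 * (loss_0_1 + loss_1_0)

        return loss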
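
To make it easier to sanity-check what this loss computes, here is a dependency-free sketch of the same calculation on plain Python lists. It is meant for tiny hand-checkable inputs only, not as a replacement for the PyTorch module; the function names are illustrative, and unlike F.normalize it assumes no feature vector is all zeros.

```python
import math


def normalize(v):
    # L2-normalize one feature vector (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(xs):
    # Numerically stable log-softmax over a 1-D list.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def contrastive_loss(feats_0, feats_1):
    a = [normalize(v) for v in feats_0]
    b = [normalize(v) for v in feats_1]
    n = len(a)
    # sims[i][j] is the cosine similarity between a[i] and b[j].
    sims = [[sum(x * y for x, y in zip(a[i], b[j])) for j in range(n)]
            for i in range(n)]
    # Softmax over each column (dim=0 in the PyTorch version).
    col_term = -sum(
        log_softmax([sims[i][j] for i in range(n)])[j] for j in range(n)
    ) / n
    # Softmax over each row (dim=1 in the PyTorch version).
    row_term = -sum(log_softmax(sims[i])[i] for i in range(n)) / n
    return 0.5 * (col_term + row_term)


eye = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_loss(eye, eye))      # correctly matched pairs
print(contrastive_loss(eye, swapped))  # mismatched pairs give a larger loss
```

For two perfectly matched orthogonal pairs the loss works out to log(1 + e^-1), since each row and column of the similarity matrix is [1, 0] up to ordering.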

Here are my hardware/software configs:
CPU: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
GPU: NVIDIA A100
CUDA: 11.1
PyTorch: 1.8.1
Python: 3.7.16

Can you help me with this? Many thanks.

kaimingkuang · Answer 1 · Thu Feb 16 2023 07:43:44 GMT+0800 (China Standard Time)

I switched from PyTorch 1.8.1 to PyTorch 1.12.1, and training now runs much faster.
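
Since the difference here came down to the PyTorch version, it may help to record the versions actually in use when comparing a slow setup against a fast one. A minimal sketch follows; `environment_report` is a hypothetical helper name, and it falls back to None for the PyTorch fields if torch is not installed.

```python
import importlib
import platform


def environment_report():
    # Collect the interpreter version, plus PyTorch details when available.
    report = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report
    report["torch"] = torch.__version__
    # CUDA version the PyTorch build was compiled against (None on CPU builds).
    report["cuda"] = torch.version.cuda
    return report


print(environment_report())
```

Running this in both environments gives a quick side-by-side of the Python, PyTorch, and CUDA build versions.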