jhultman / vision3d

Research platform for 3D object detection in PyTorch.

About GPU memory usage

muzi2045 opened this issue · comments

I tried to specify that the model should train on the second GPU card in the server, but some GPU memory is still allocated on the first card.
Is there any way to force the training process to use only a single card's memory?

BTW: starting the training with Python 3 multiprocessing is really slow.

@jhultman

Hi, did you try with export CUDA_VISIBLE_DEVICES=1?
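For example, a minimal sketch (the script layout is illustrative, not from the repo) of pinning the process to the second card from inside Python:

import os

# Hide every GPU except physical card 1; this must be set before any CUDA
# context is created in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())  # -> 1; "cuda:0" now maps to physical GPU 1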

OK, I can force CUDA_VISIBLE_DEVICES=1,
but the main reason for the extra allocation is that the dataset loader does some work on GPU:0;
this part in proposal_targets.py is computed in a forked Python 3 process.
Can the training data preparation part be switched to the CPU device?
Then the multiprocessing.set_start_method('spawn') call could be commented out.

Update:
The proposal target computation is really slow on the CPU...
Looking for a faster way to prepare the data without the GPU.

@jhultman

I agree that the target assignment needing to run on the GPU is a problem. I have some fixes for this:

The proposal target part is probably slow on the CPU because of the rotated IoU computation. But actually, the original SECOND does not use rotated IoU for assigning anchors to ground truth. Instead it uses the "nearest standing/lying axis-aligned IoU", which I think can run fast on the CPU. In my private branch I have replaced the rotated IoU with the nearest standing/lying IoU; I will hopefully push this change soon.

The database sampling also uses rotated IoU for the collision check, but the dimensions of the IoU matrix are much smaller (ground truth x ground truth as opposed to anchors x ground truth), so I think that can run fast on the CPU.
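For reference, a hedged CPU sketch of such a collision check (this is not the repo's implementation; it assumes shapely is installed and uses the same (x, y, dx, dy, theta) box convention as the code below):

import numpy as np
from shapely.geometry import Polygon


def _bev_polygon(box):
    """box: (x, y, dx, dy, theta) in bird's-eye view."""
    x, y, dx, dy, theta = box
    corners = np.array([[-dx, -dy], [-dx, dy], [dx, dy], [dx, -dy]]) / 2.0
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return Polygon(corners @ rot.T + [x, y])


def sample_collides(sampled_boxes, existing_boxes):
    """Boolean mask over sampled_boxes: True where a box overlaps any existing box."""
    existing = [_bev_polygon(b) for b in existing_boxes]
    mask = np.zeros(len(sampled_boxes), dtype=bool)
    for i, box in enumerate(sampled_boxes):
        poly = _bev_polygon(box)
        mask[i] = any(poly.intersects(other) for other in existing)
    return mask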

Here is the code I use for the nearest standing/lying IoU.

import math
import torch
from torchvision.ops import boxes as box_ops


def _snap_boxes_axis_aligned(boxes):
    # Split (x, y, dx, dy, theta) into position, extent, and rotation.
    xy, dxy, r = boxes.split([2, 2, 1], -1)
    # If the rotation is closer to 90 degrees than to 0, swap dx and dy.
    flip = r.sin().abs() > 1 / math.sqrt(2)
    dxy = torch.where(flip, dxy.flip(-1), dxy)
    # Return axis-aligned boxes in the (x1, y1, x2, y2) format expected by box_iou.
    boxes = torch.cat((xy, xy + dxy), -1)
    return boxes


def box_iou_snapped(boxes1, boxes2):
    """Boxes in (x, y, dx, dy, theta) format."""
    iou = box_ops.box_iou(
        _snap_boxes_axis_aligned(boxes1),
        _snap_boxes_axis_aligned(boxes2),
    )
    return iou
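For illustration, a hedged usage sketch (the anchor and ground-truth tensors below are made up, not taken from the repo):

anchors = torch.tensor([[0.0, 0.0, 4.0, 2.0, 0.0],
                        [5.0, 5.0, 4.0, 2.0, math.pi / 2]])
gt_boxes = torch.tensor([[0.5, 0.0, 4.0, 2.0, 0.1]])
iou = box_iou_snapped(anchors, gt_boxes)  # shape (num_anchors, num_gt) = (2, 1)
best_anchor = iou.argmax(0)               # best-matching anchor index per ground truth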

Thanks for the reply!
In my opinion, the anchor matching with IoU is useful for large objects such as trucks and cars,
but small objects such as pedestrians don't need rotated IoU computation; just matching by nearest distance is fine.
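A rough sketch of that kind of nearest-distance matching (the function name, threshold, and shapes are made up for illustration, not part of vision3d):

import torch


def match_by_center_distance(anchors, gt_boxes, max_dist=0.5):
    """anchors: (N, 5), gt_boxes: (M, 5), both in (x, y, dx, dy, theta) format."""
    # Pairwise BEV center distances between anchors and ground-truth boxes.
    dist = torch.cdist(anchors[:, :2], gt_boxes[:, :2])  # (N, M)
    min_dist, gt_index = dist.min(dim=1)                  # nearest ground truth per anchor
    positive = min_dist < max_dist                        # made-up distance threshold
    return positive, gt_index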
@jhultman