jhultman / vision3d

Research platform for 3D object detection in PyTorch.

About GPU memory usage

muzi2045 opened this issue · comments

I tried to specify that the model should train on the second GPU card in the server, but some GPU memory is still allocated on the first card.
Is there any way to force the training process to use only a single card's memory?

BTW: starting the training with Python 3 multiprocessing is really slow.

@jhultman

Hi, did you try with export CUDA_VISIBLE_DEVICES=1?
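For example, a minimal sketch (the script layout is illustrative, not from the repo) of pinning the process to the second card from inside Python:

import os

# Hide every GPU except physical card 1; this must be set before any CUDA
# context is created in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print(torch.cuda.device_count())  # -> 1; "cuda:0" now maps to physical GPU 1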

OK, I can force CUDA_VISIBLE_DEVICES=1,
but the main reason for the extra allocation is that the dataset loader does some work on GPU:0;
this part in proposal_targets.py is computed in a forked Python 3 process.
Can the training data preparation part be switched to the CPU device?
Then the multiprocessing.set_start_method('spawn') call could be commented out.

Update:
The proposal target computation is really slow on the CPU...
Looking for a faster way to prepare the data without the GPU.

@jhultman

I agree that the target assignment needing to run on the GPU is a problem. I have some fixes for this:

The proposal target part is probably slow on the CPU because of the rotated IoU computation. But actually, the original SECOND does not use rotated IoU for assigning anchors to ground truth. Instead it uses the "nearest standing/lying axis-aligned IoU", which I think can run fast on the CPU. In my private branch I have replaced the rotated IoU with the nearest standing/lying IoU; I will hopefully push this change soon.

The database sampling also uses rotated IoU for the collision check, but the dimensions of the IoU matrix are much smaller (ground truth x ground truth as opposed to anchors x ground truth), so I think that can run fast on the CPU.
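For reference, a hedged CPU sketch of such a collision check (this is not the repo's implementation; it assumes shapely is installed and uses the same (x, y, dx, dy, theta) box convention as the code below):

import numpy as np
from shapely.geometry import Polygon


def _bev_polygon(box):
    """box: (x, y, dx, dy, theta) in bird's-eye view."""
    x, y, dx, dy, theta = box
    corners = np.array([[-dx, -dy], [-dx, dy], [dx, dy], [dx, -dy]]) / 2.0
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return Polygon(corners @ rot.T + [x, y])


def sample_collides(sampled_boxes, existing_boxes):
    """Boolean mask over sampled_boxes: True where a box overlaps any existing box."""
    existing = [_bev_polygon(b) for b in existing_boxes]
    mask = np.zeros(len(sampled_boxes), dtype=bool)
    for i, box in enumerate(sampled_boxes):
        poly = _bev_polygon(box)
        mask[i] = any(poly.intersects(other) for other in existing)
    return mask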

Here is the code I use for the nearest standing/lying IoU.

import math
import torch
from torchvision.ops import boxes as box_ops


def _snap_boxes_axis_aligned(boxes):
    # Split (x, y, dx, dy, theta) into position, extent, and rotation.
    xy, dxy, r = boxes.split([2, 2, 1], -1)
    # If the rotation is closer to 90 degrees than to 0, swap dx and dy.
    flip = r.sin().abs() > 1 / math.sqrt(2)
    dxy = torch.where(flip, dxy.flip(-1), dxy)
    # Return axis-aligned boxes in the (x1, y1, x2, y2) format expected by box_iou.
    boxes = torch.cat((xy, xy + dxy), -1)
    return boxes


def box_iou_snapped(boxes1, boxes2):
    """Boxes in (x, y, dx, dy, theta) format."""
    iou = box_ops.box_iou(
        _snap_boxes_axis_aligned(boxes1),
        _snap_boxes_axis_aligned(boxes2),
    )
    return iou
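For illustration, a hedged usage sketch (the anchor and ground-truth tensors below are made up, not taken from the repo):

anchors = torch.tensor([[0.0, 0.0, 4.0, 2.0, 0.0],
                        [5.0, 5.0, 4.0, 2.0, math.pi / 2]])
gt_boxes = torch.tensor([[0.5, 0.0, 4.0, 2.0, 0.1]])
iou = box_iou_snapped(anchors, gt_boxes)  # shape (num_anchors, num_gt) = (2, 1)
best_anchor = iou.argmax(0)               # best-matching anchor index per ground truth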

Thanks for the reply!
In my opinion, the anchor matching with IoU is useful for large objects such as trucks and cars,
but small objects such as pedestrians don't need rotated IoU computation; just matching by nearest distance is fine.
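A rough sketch of that kind of nearest-distance matching (the function name, threshold, and shapes are made up for illustration, not part of vision3d):

import torch


def match_by_center_distance(anchors, gt_boxes, max_dist=0.5):
    """anchors: (N, 5), gt_boxes: (M, 5), both in (x, y, dx, dy, theta) format."""
    # Pairwise BEV center distances between anchors and ground-truth boxes.
    dist = torch.cdist(anchors[:, :2], gt_boxes[:, :2])  # (N, M)
    min_dist, gt_index = dist.min(dim=1)                  # nearest ground truth per anchor
    positive = min_dist < max_dist                        # made-up distance threshold
    return positive, gt_index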
@jhultman