lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥

PyTorch training code may have a memory leak

DrRyanHuang opened this issue

commented

I ran into memory overflow on another server, which froze the system. This may also be the cause of the following problems (related issues: #93, #172).

Can you run more tests locally and try to solve this problem?

commented

Two days ago, I used gc to analyze the memory leak.
It seemed that the dataset was not released after one epoch of training/eval, but I am far from sure, because I didn't have enough time to look into it further.
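One way to check whether dataset objects pile up between epochs is to count the matching objects the garbage collector still tracks after each epoch. The snippet below is only a minimal sketch of that idea, not the code used in this thread: `count_live_objects`, `looks_like_annotation`, and `fake_epoch` are hypothetical names, and `fake_epoch` is a stand-in for `train_one_epoch` that leaks on purpose so the growth is visible.

```python
import gc
from collections import Counter


def count_live_objects(predicate=None):
    """Count objects currently tracked by the garbage collector, by type name.

    If `predicate` is given, only objects for which it returns True are counted.
    """
    counts = Counter()
    for obj in gc.get_objects():
        if predicate is None or predicate(obj):
            counts[type(obj).__name__] += 1
    return counts


def looks_like_annotation(obj):
    # Assumption: a COCO-style annotation is a dict with a 'bbox' key.
    return isinstance(obj, dict) and "bbox" in obj


def fake_epoch(leak_store):
    # Hypothetical stand-in for train_one_epoch: it creates annotation-like
    # dicts and deliberately keeps them alive in `leak_store`.
    leak_store.extend(
        {"bbox": [0, 0, 10, 10], "category_id": i} for i in range(100)
    )


leaked = []
for epoch in range(3):
    fake_epoch(leaked)
    gc.collect()  # drop anything that is merely waiting for collection
    live = sum(count_live_objects(looks_like_annotation).values())
    print(f"epoch {epoch}: {live} live annotation-like dicts")
```

If the count keeps growing from epoch to epoch in a real run, something is still holding references to the annotations. A flat count would point the search elsewhere.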

Hope this helps you solve this problem. I added this code after train_one_epoch.

    # if cuda_empty_cache:
    #     del metric_logger
    #     gc.collect()
    #     # torch.cuda.empty_cache()
    
    # print(f"Number of objects in gc.garbage: {len(gc.garbage)}")

    # ann = []
    # for cycle in cycles:
    #     if isinstance(cycle, dict) and 'bbox' in cycle:
    #         ann.append(cycle)

    # for obj in ann: 
    #     referrers = gc.get_referrers(obj)
    #     print(f"Referrers of {obj}: {referrers}")
    #     break

Hello, would you mind providing the full file? I'm confused about how to use your solution. For example, I don't understand what the cycles variable contains.
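The snippet above never defines cycles, so what it held is unknown. One plausible reading, and it is only an assumption, is that it was a list of objects taken from the garbage collector, for example a copy of gc.garbage collected with gc.DEBUG_SAVEALL enabled. Without that flag, gc.garbage is normally empty on Python 3.4 and later, because it then only holds objects the collector could not free. The sketch below shows that reading on a toy class. `Node` and `collect_unreachable` are hypothetical names, not part of RT-DETR.

```python
import gc


class Node:
    """Toy object that forms a reference cycle through an annotation dict."""

    def __init__(self):
        self.ann = {"bbox": [0, 0, 10, 10], "owner": self}


def collect_unreachable(make_garbage):
    """Run `make_garbage`, then return the objects one full GC pass found unreachable."""
    gc.collect()  # start from a clean state
    previous_flags = gc.get_debug()
    # With DEBUG_SAVEALL, unreachable objects are appended to gc.garbage
    # instead of being freed, so they can be inspected afterwards.
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        make_garbage()
        gc.collect()
        return list(gc.garbage)
    finally:
        gc.set_debug(previous_flags)
        gc.garbage.clear()


# Assumed definition of `cycles`: everything the collector found unreachable.
cycles = collect_unreachable(lambda: [Node() for _ in range(5)])

ann = [obj for obj in cycles if isinstance(obj, dict) and "bbox" in obj]
print(f"annotation-like dicts found in cyclic garbage: {len(ann)}")

for obj in ann:
    # Show what still points at the first annotation dict.
    referrers = gc.get_referrers(obj)
    print("referrer types:", sorted({type(r).__name__ for r in referrers}))
    break
```

This only surfaces objects caught in unreachable reference cycles. Annotations that stay reachable through a live reference, such as a dataset or logger that is never released, will not appear in gc.garbage and have to be found by walking gc.get_objects() and gc.get_referrers() instead.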