[Performance] gpu memory doesn't get released when ort session gets deleted
baqt99 opened this issue
Describe the issue
When I load a model session with the CUDA and CPU providers and then set the session to None (or delete it), around 320 MB of VRAM remains in use, and I can't find any method to release it.
To reproduce
import onnxruntime
session = onnxruntime.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
session = None  # or: del session
input('notice how the VRAM is still not freed until the program terminates; press Enter to terminate')
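To make the leak measurable rather than eyeballing Task Manager, a small sketch that polls used VRAM via nvidia-smi (my addition; assumes nvidia-smi is on PATH, and gpu_memory_used_mib / parse_used_mib are hypothetical helper names):

```python
import subprocess

def parse_used_mib(output: str) -> list:
    # One integer per GPU line, e.g. "320\n512\n" -> [320, 512]
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

def gpu_memory_used_mib() -> list:
    # Query used VRAM per GPU through nvidia-smi's CSV output
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Printing gpu_memory_used_mib() before creating the session, after creating it, and after del session should show the ~320 MB that never comes back.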
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU, CUDA
Execution Provider Library Version
No response
Model File
webface_r50.onnx from insightface https://drive.google.com/file/d/1N0GL-8ehw_bz2eZQWz2b0A5XBdXdxZhg/view?usp=sharing
Is this a quantized model?
Unknown
Solution: the function below frees all VRAM used by the current script. I found that creating many sessions and deleting them all with del session always leaves the same amount allocated, so this trick is only usable when no model will run later in the current script; otherwise the remaining VRAM usage is necessary and already as low as it can be.
I think a function like this should be added to core onnxruntime, similar to torch.cuda.empty_cache() in PyTorch.
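For what it's worth, ONNX Runtime does expose a partial equivalent already: a run-level config entry that asks the memory arena to shrink after a Run call, instead of resetting the whole device. A minimal sketch, assuming the CUDA arena lives on device 0 (untested here, and it only trims the arena rather than freeing everything):

```python
def shrink_run_options():
    # Build RunOptions that request CUDA arena shrinkage on gpu:0
    # after session.run() completes. Returns None if onnxruntime is
    # not installed, so the sketch degrades gracefully.
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    ro = ort.RunOptions()
    ro.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
    return ro

# usage: session.run(None, inputs, run_options=shrink_run_options())
```

Unlike cudaDeviceReset, this keeps the session usable for later inference, so it is closer in spirit to torch.cuda.empty_cache().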
import ctypes

def clear_cuda():
    # Load the CUDA runtime DLL and reset the device, which releases
    # every allocation this process made on the GPU. Note: the ORT
    # session cannot be used afterwards.
    cuda = ctypes.CDLL("PATH/TO/CUDART64_version.dll")
    cuda.cudaDeviceReset()
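A slightly more defensive variant of the hack above (a sketch only: find_cudart is a hypothetical helper of mine, the DLL name suffix varies by CUDA version, and cudaDeviceReset destroys all CUDA state in the process, so no session can run afterwards):

```python
import ctypes
import ctypes.util
import glob
import os

def find_cudart():
    # Try the system loader first, then the CUDA_PATH\bin folder that
    # the CUDA toolkit installer sets on Windows. Returns None if no
    # CUDA runtime DLL can be located.
    path = ctypes.util.find_library("cudart64_110")  # suffix varies by version
    if path:
        return path
    hits = glob.glob(os.path.join(
        os.environ.get("CUDA_PATH", ""), "bin", "cudart64_*.dll"))
    return hits[0] if hits else None

def clear_cuda():
    # cudaDeviceReset frees every allocation this process made on the
    # device, including ORT's arena, but invalidates all CUDA state.
    dll = find_cudart()
    if dll is None:
        raise OSError("CUDA runtime DLL not found")
    cuda = ctypes.CDLL(dll)
    return cuda.cudaDeviceReset()  # 0 == cudaSuccess
```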