[Performance] gpu memory doesn't get released when ort session gets deleted
baqt99 opened this issue
Describe the issue
When I load a model session with the CUDA and CPU providers and then set the session to None (or delete it), around 320 MB of VRAM remains in use, and I can't find any method to release it.
To reproduce
import onnxruntime
session = onnxruntime.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
session = None  # or: del session
input('notice how the VRAM is still not freed until the program terminates; press Enter to terminate')
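To make the leak measurable rather than eyeballing Task Manager, a small sketch that polls used VRAM via nvidia-smi (my addition; assumes nvidia-smi is on PATH, and gpu_memory_used_mib / parse_used_mib are hypothetical helper names):

```python
import subprocess

def parse_used_mib(output: str) -> list:
    # One integer per GPU line, e.g. "320\n512\n" -> [320, 512]
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

def gpu_memory_used_mib() -> list:
    # Query used VRAM per GPU through nvidia-smi's CSV output
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Printing gpu_memory_used_mib() before creating the session, after creating it, and after del session should show the ~320 MB that never comes back.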
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU, CUDA
Execution Provider Library Version
No response
Model File
webface_r50.onnx from insightface https://drive.google.com/file/d/1N0GL-8ehw_bz2eZQWz2b0A5XBdXdxZhg/view?usp=sharing
Is this a quantized model?
Unknown
Solution: the function below frees all VRAM used by the current script. I found that creating many sessions and deleting them all with del session always leaves the same amount allocated, so this trick is only usable when no model will run later in the current script; otherwise the remaining VRAM usage is necessary and already as low as it can be.
I think a function like this should be added to core onnxruntime, similar to torch.cuda.empty_cache() in PyTorch.
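For what it's worth, ONNX Runtime does expose a partial equivalent already: a run-level config entry that asks the memory arena to shrink after a Run call, instead of resetting the whole device. A minimal sketch, assuming the CUDA arena lives on device 0 (untested here, and it only trims the arena rather than freeing everything):

```python
def shrink_run_options():
    # Build RunOptions that request CUDA arena shrinkage on gpu:0
    # after session.run() completes. Returns None if onnxruntime is
    # not installed, so the sketch degrades gracefully.
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    ro = ort.RunOptions()
    ro.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
    return ro

# usage: session.run(None, inputs, run_options=shrink_run_options())
```

Unlike cudaDeviceReset, this keeps the session usable for later inference, so it is closer in spirit to torch.cuda.empty_cache().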
import ctypes

def clear_cuda():
    # Load the CUDA runtime DLL and reset the device, which releases
    # every allocation this process made on the GPU. Note: the ORT
    # session cannot be used afterwards.
    cuda = ctypes.CDLL("PATH/TO/CUDART64_version.dll")
    cuda.cudaDeviceReset()
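A slightly more defensive variant of the hack above (a sketch only: find_cudart is a hypothetical helper of mine, the DLL name suffix varies by CUDA version, and cudaDeviceReset destroys all CUDA state in the process, so no session can run afterwards):

```python
import ctypes
import ctypes.util
import glob
import os

def find_cudart():
    # Try the system loader first, then the CUDA_PATH\bin folder that
    # the CUDA toolkit installer sets on Windows. Returns None if no
    # CUDA runtime DLL can be located.
    path = ctypes.util.find_library("cudart64_110")  # suffix varies by version
    if path:
        return path
    hits = glob.glob(os.path.join(
        os.environ.get("CUDA_PATH", ""), "bin", "cudart64_*.dll"))
    return hits[0] if hits else None

def clear_cuda():
    # cudaDeviceReset frees every allocation this process made on the
    # device, including ORT's arena, but invalidates all CUDA state.
    dll = find_cudart()
    if dll is None:
        raise OSError("CUDA runtime DLL not found")
    cuda = ctypes.CDLL(dll)
    return cuda.cudaDeviceReset()  # 0 == cudaSuccess
```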