oobabooga / text-generation-webui-extensions


Out of GPU memory: 12 GB card running 7B models (~8 GB), which should be easy, but it quickly runs out of memory

PGTBoos opened this issue · comments

Well, it seems this often crashes: long chats frequently fail with CUDA out-of-memory errors. I tested another chat client which doesn't have this problem, but it wasn't a great chat app. I just like to investigate these models; the character part in this one is quite strong, I think.
Essentially it's a huge pre-prompt or so. I'm on Windows 11 and have just run the latest updates.
But as always I run out of memory quickly, despite my GPU (an RTX 3080 Ti) having about 12 GB.
So I was thinking: maybe it's possible to do a cache clean before a model load, something like

  • inside the menu:
    gc.collect()
    torch.cuda.empty_cache()
    --
    then continue loading the model

so at least you can easily reload it again without having to reboot the PC...
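A minimal sketch of what that reload-time cleanup could look like, assuming a standard PyTorch setup. The `Linear` layer is just a stand-in for the real model, and the surrounding menu/loader code is not shown:

```python
import gc
import torch

# Stand-in for the currently loaded model.
model = torch.nn.Linear(8, 8)

# Before loading the next model: drop the Python reference,
# force a collection, then release cached CUDA blocks.
del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # returns cached blocks to the driver

# Now load the new model into the freed memory.
model = torch.nn.Linear(8, 8)
```

Note that `empty_cache()` only releases memory the allocator has *cached*; anything still referenced from Python stays allocated, which is why the `del` and `gc.collect()` come first.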

There is also a nice article here about how to handle memory fragmentation in combination with torch, though I myself am not deeply into this code base: https://medium.com/@soumensardarintmain/manage-cuda-cores-ultimate-memory-management-strategy-with-pytorch-2bed30cab1#:~:text=The%20recommended%20way%20is%20to,first%20and%20then%20call%20torch.
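For the fragmentation angle specifically, PyTorch's caching allocator exposes a knob via the `PYTORCH_CUDA_ALLOC_CONF` environment variable; capping `max_split_size_mb` can reduce fragmentation-driven OOMs on long sessions. A hedged sketch (the `128` value is just an example to tune, and `server.py` is assumed to be the webui's launcher script):

```shell
# Limit how large a cached block the allocator may split.
# Smaller values trade some speed for less fragmentation (value in MiB).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python server.py
```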

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.