oobabooga / text-generation-webui-extensions


Out of GPU memory: 12 GB card running 7B models (~8 GB), which should be easy, but it quickly runs out of memory

PGTBoos opened this issue · comments

Well, it seems this often crashes: long chats frequently fail with CUDA out-of-memory errors. I tested another chat client which doesn't have this problem, but it wasn't a great chat app. I just like to investigate these models; the character part in this one is quite strong, I think.
Essentially it's a huge pre-prompt or so. I'm on Windows 11 and have just run the latest updates.
But as always I run out of memory quickly, despite my GPU (an RTX 3080 Ti) having about 12 GB.
So I was thinking: maybe it's possible to do a cache clean before a model load, something like

  • inside the menu:
    gc.collect()
    torch.cuda.empty_cache()
    --
    then continue loading the model

so at least you can easily reload it again without having to reboot the PC...
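A minimal sketch of what that reload-time cleanup could look like, assuming a standard PyTorch setup. The `Linear` layer is just a stand-in for the real model, and the surrounding menu/loader code is not shown:

```python
import gc
import torch

# Stand-in for the currently loaded model.
model = torch.nn.Linear(8, 8)

# Before loading the next model: drop the Python reference,
# force a collection, then release cached CUDA blocks.
del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # returns cached blocks to the driver

# Now load the new model into the freed memory.
model = torch.nn.Linear(8, 8)
```

Note that `empty_cache()` only releases memory the allocator has *cached*; anything still referenced from Python stays allocated, which is why the `del` and `gc.collect()` come first.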

There is also a nice article here about how to handle memory fragmentation in combination with torch, though I myself am not deeply into this code base: https://medium.com/@soumensardarintmain/manage-cuda-cores-ultimate-memory-management-strategy-with-pytorch-2bed30cab1#:~:text=The%20recommended%20way%20is%20to,first%20and%20then%20call%20torch.
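For the fragmentation angle specifically, PyTorch's caching allocator exposes a knob via the `PYTORCH_CUDA_ALLOC_CONF` environment variable; capping `max_split_size_mb` can reduce fragmentation-driven OOMs on long sessions. A hedged sketch (the `128` value is just an example to tune, and `server.py` is assumed to be the webui's launcher script):

```shell
# Limit how large a cached block the allocator may split.
# Smaller values trade some speed for less fragmentation (value in MiB).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python server.py
```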

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.