Out of GPU memory when quantizing llama3 70b on a 3090

Question

lg123666 opened this issue a month ago

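On a 3090 GPU with 24 GB of VRAM, quantizing llama3 70b with lmdeploy lite awq runs out of GPU memory at layer 79. Following the suggestion, I added PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
nvidia-smi shows that only the first GPU is being used. Is it possible to use multiple GPUs?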
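
For reference, here is a minimal sketch of how the allocator setting above might be passed to a child process from Python. The helper name `run_with_alloc_conf` and the probe command are hypothetical and used only for illustration; a real run would pass the actual quantization command line in place of the probe. As far as I understand, this setting only changes how PyTorch's caching allocator grows memory on a device, so it would not by itself spread work across several GPUs.

```python
import os
import subprocess
import sys

# Value copied from the issue text above.
ALLOC_CONF = "expandable_segments:True"


def run_with_alloc_conf(cmd):
    """Run cmd in a child process with PYTORCH_CUDA_ALLOC_CONF set (hypothetical helper)."""
    env = dict(os.environ)  # copy so the parent environment is left untouched
    env["PYTORCH_CUDA_ALLOC_CONF"] = ALLOC_CONF
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in command that only echoes the variable back; the real quantization
# command would be passed here instead.
probe = [sys.executable, "-c", "import os; print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])"]
result = run_with_alloc_conf(probe)
print(result.stdout.strip())
```

Setting the variable in the child's environment, rather than after the process has started, means it is already in place by the time PyTorch first initializes CUDA.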