predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Home Page: https://loraexchange.ai


[Question] Usage about the `adapter-memory-fraction`

thincal opened this issue · comments

commented

Feature request

  1. Does `adapter-memory-fraction` include the base model's memory?
  2. What is the difference between `adapter-memory-fraction` and `cuda-memory-fraction`? What happens if both are set?

Motivation

Just a question.

Your contribution

Just a question.

Btw, maybe we could create a new issue category for questions?

commented
# From the lorax server source: the kv-cache budget is what remains after
# subtracting the slice excluded by cuda-memory-fraction (1 - MEMORY_FRACTION)
# plus the slice reserved for adapters (ADAPTER_MEMORY_FRACTION).
free_memory = max(
    0, total_free_memory - (1 - MEMORY_FRACTION + ADAPTER_MEMORY_FRACTION) * total_gpu_memory
)
logger.info("Memory remaining for kv cache: {} MB", free_memory / 1024 / 1024)

OK, so the memory reserved via `cuda-memory-fraction` and the adapter reservation are both counted toward total usage, and the kv cache gets whatever is left over.
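To make the formula above concrete, here is a small self-contained sketch of the same computation with illustrative numbers (the GPU size, free-memory figure, and fraction values below are assumptions for the example, not values from lorax):

```python
# Sketch of the kv-cache budget formula quoted above.
# All sizes and fractions here are illustrative assumptions.
GIB = 1024**3

def kv_cache_budget(total_free_memory, total_gpu_memory,
                    memory_fraction, adapter_memory_fraction):
    # Reserved memory = the slice excluded by cuda-memory-fraction
    # (1 - memory_fraction) plus the adapter reservation, both taken
    # as fractions of the whole GPU.
    reserved = (1 - memory_fraction + adapter_memory_fraction) * total_gpu_memory
    return max(0, total_free_memory - reserved)

# Example: 80 GiB GPU with 70 GiB still free after loading base weights,
# cuda-memory-fraction=0.9, adapter-memory-fraction=0.1.
# Reserved = (1 - 0.9 + 0.1) * 80 GiB = 16 GiB, so 70 - 16 = 54 GiB
# remains for the kv cache.
budget = kv_cache_budget(70 * GIB, 80 * GIB, 0.9, 0.1)
print(budget / GIB)  # 54.0
```

This also suggests an answer to question 2: the two flags interact additively in the reservation term, so raising `adapter-memory-fraction` shrinks the kv-cache budget by the same amount as lowering `cuda-memory-fraction`.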