[Question] Usage about the `adapter-memory-fraction`
thincal opened this issue · comments
LS commented
Feature request
- Does `adapter-memory-fraction` include the base model's memory?
- What's the difference between `adapter-memory-fraction` and `cuda-memory-fraction`? What will happen if both are set?
Motivation
Just a question.
Your contribution
Just a question.
By the way, maybe we could create a new issue category for questions?
LS commented
free_memory = max(
    0, total_free_memory - (1 - MEMORY_FRACTION + ADAPTER_MEMORY_FRACTION) * total_gpu_memory
)
logger.info("Memory remaining for kv cache: {} MB", free_memory / 1024 / 1024)
OK, so the memory held back by `cuda-memory-fraction` and the adapter memory are both counted as usage, each as a fraction of total GPU memory, and whatever remains after that goes to the KV cache.
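To make the arithmetic concrete, here is a minimal sketch of the formula quoted above with made-up numbers. The function name `kv_cache_budget` and the example values are hypothetical, and it assumes `total_free_memory` is the free memory measured after the base model has been loaded, which the snippet itself does not confirm.

```python
def kv_cache_budget(
    total_free_memory: float,
    total_gpu_memory: float,
    memory_fraction: float,
    adapter_memory_fraction: float,
) -> float:
    """Memory left for the KV cache, in the same unit as the inputs.

    Mirrors the quoted expression: the share of the GPU not allowed by
    memory_fraction, plus the share set aside for adapters, is subtracted
    from the free memory. The result is clamped at zero.
    """
    reserved_fraction = (1 - memory_fraction) + adapter_memory_fraction
    return max(0.0, total_free_memory - reserved_fraction * total_gpu_memory)


# Hypothetical example: an 80 GiB GPU with 60 GiB still free (assumed to be
# measured after loading the base model), memory fraction 0.9, adapter
# memory fraction 0.1.
budget = kv_cache_budget(60.0, 80.0, 0.9, 0.1)
print(f"Memory remaining for kv cache: {budget:.1f} GiB")
```

With these numbers, 8 GiB is held back by the memory fraction and another 8 GiB is set aside for adapters, so about 44 GiB is left for the KV cache. Both fractions are taken from total GPU memory, so in this formula setting both simply adds their reservations together.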