33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU
ageorgios opened this issue 6 months ago · comments
Mac M1 Max 32GB user here, without the ability to quantize via bitsandbytes.
Is there a way to configure the chunk size so that inference is quicker? I think the 32GB of memory is not being used efficiently.
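For context, a toy simulation (not AirLLM's actual API; `simulate`, `NUM_LAYERS`, and `LAYER_BYTES` are illustrative assumptions) of why a larger "chunk size" — the number of layers kept resident at once during layer-by-layer inference — trades fewer disk reads for higher peak memory:

```python
NUM_LAYERS = 80            # e.g. a 70B Llama-style model has 80 decoder layers
LAYER_BYTES = 400 * 2**20  # rough per-layer weight size, illustrative only

def simulate(chunk_size):
    """Return (disk_loads, peak_bytes) for one forward pass that
    streams layers from disk in groups of `chunk_size`."""
    disk_loads = 0
    peak_bytes = 0
    for start in range(0, NUM_LAYERS, chunk_size):
        group = range(start, min(start + chunk_size, NUM_LAYERS))
        disk_loads += 1  # one read per chunk of layers
        peak_bytes = max(peak_bytes, len(group) * LAYER_BYTES)
    return disk_loads, peak_bytes

for k in (1, 4, 8):
    loads, peak = simulate(k)
    print(f"chunk_size={k}: {loads} disk reads, peak ~{peak / 2**30:.1f} GiB")
```

With 32GB of unified memory, keeping several layers resident at once could cut the number of disk reads substantially while staying well under the memory ceiling, which is presumably what this request is after.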