lyogavin / Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

Optimize for consumer GPUs, e.g. 11GB or 16GB

profintegra opened this issue · comments

I'm not sure it makes sense to load more than one layer from a performance standpoint, but using only 1.6GB out of the 11GB/16GB on a typical consumer GPU is not optimal (and super slow).

I've read on Hugging Face that it doesn't make sense to load more layers because only one layer is evaluated at a time. But maybe we could split the model into bigger chunks (several layers each), so that a whole chunk is loaded and evaluated at once? See the sketch below.
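To illustrate what I mean, here is a minimal sketch of chunked execution. It is not AirLLM's actual code: `load_layer` is a hypothetical helper standing in for however AirLLM materializes one layer's weights from disk.

```python
import torch

def run_layers_chunked(load_layer, hidden_states, num_layers, chunk_size=4):
    """Run a layer-by-layer model in chunks of `chunk_size` layers.

    `load_layer(i)` is a hypothetical helper that returns layer i as an
    nn.Module with weights on CPU (however AirLLM reads them from disk).
    Loading several layers per transfer should amortize the disk->GPU
    overhead and use more of the available VRAM.
    """
    for start in range(0, num_layers, chunk_size):
        end = min(start + chunk_size, num_layers)
        # Move a whole chunk of layers onto the GPU in one go.
        chunk = [load_layer(i).to("cuda") for i in range(start, end)]
        with torch.no_grad():
            for layer in chunk:
                hidden_states = layer(hidden_states)
        # Free the chunk's VRAM before loading the next chunk.
        del chunk
        torch.cuda.empty_cache()
    return hidden_states
```

Note the layers still run sequentially, so this doesn't reduce compute; the hope is that fewer, larger disk-to-GPU transfers beat many small ones.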

I could even try doing this myself; it would be nice to get a bit of guidance, though: is it even feasible, where in the code should the optimization happen, etc.