bigcode-project / starcoder2

Home of StarCoder2!

Official Support for GGUF Quantization in BigCode Starcoder2 to Enhance Accessibility and Efficiency

babycommando opened this issue · comments

Dear BigCode team, what a wonderful project!

I am writing this feature request for official implementation of GGUF quantization for Starcoder2 to enhance its adoption with coding platforms and APIs such as Ollama and LMStudio.

Despite the model's advanced capabilities across its released sizes, its integration and usability in the OpenAI-API-style coding ecosystem, including extensions like "Continue" for VS Code, could be significantly improved. The current lack of official GGUF quantization limits its potential reach and utility.

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds. I urge you to consider this proposal as a step towards making BigCode Starcoder2 a more versatile and inclusive tool for the developer community. Official GGUF quantization could significantly impact its adoption and effectiveness across diverse development environments.

Thank you for your time and consideration of this important enhancement. I look forward to your positive response and the future success of BigCode Starcoder2.

Hi there, sorry, but I'm not sure how this is related to my request. This is exactly what I am complaining about - random users sweating to build some ugly cross-compatibility that you, the owners of the project, should be taking care of and including in the release.

How hard is it to start quantizing your own models alongside the release? C'mon 😛

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds.

random users sweating to build some ugly cross-compatibility that you, the owners of the project, should be taking care of and including in the release.

I have to disagree, things implemented by the community are in many cases much better than what the authors can come up with, hence the great power of open-source 🙂

And FYI, the llama.cpp integration is perfectly functional and was done by an HF employee; you can use it in Ollama as mentioned in the tweet.
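For anyone landing here who just wants a local GGUF, producing one from the released weights with llama.cpp looks roughly like this. This is an illustrative sketch, not official BigCode instructions: the checkpoint path, output filenames, and the Q4_K_M quantization type are all assumptions you should adapt.

```shell
# Illustrative sketch of quantizing StarCoder2 with llama.cpp.
# Paths, filenames and quant type below are assumptions, not official steps.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt

# Convert the downloaded Hugging Face checkpoint to a float16 GGUF file
python convert-hf-to-gguf.py /path/to/starcoder2-15b \
    --outfile starcoder2-15b-f16.gguf

# Quantize, e.g. to 4-bit (Q4_K_M), for use with llama.cpp or Ollama
./quantize starcoder2-15b-f16.gguf starcoder2-15b-Q4_K_M.gguf Q4_K_M
```

The resulting `.gguf` file can then be pointed at from an Ollama Modelfile or loaded directly by llama.cpp-based tools.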

Can't see where the GGUF quantized models are - the Hugging Face repo does not seem to contain them yet? I suggest reading the initial post again. Is it hard to start quantizing your own models from day zero? Why would you want someone else to do that for you?

Open source is awesome, but this time it looks more like a lazy effort from the team, especially for a project like StarCoder2 that had some very nice monetary support from Nvidia, ServiceNow and others.

The links you mentioned are more like a pull-request thing with a lot of problem solving involved. For the average end user this is barely useful.

The project is beautiful, useful and very well crafted. Hope to see more people using it.

Thanks.