bigcode-project / starcoder2

Home of StarCoder2!

Official Support for GGUF Quantization in BigCode Starcoder2 to Enhance Accessibility and Efficiency

babycommando opened this issue · comments

Dear BigCode team, what a wonderful project!

I am writing this feature request for official implementation of GGUF quantization for Starcoder2 to enhance its adoption with coding platforms and APIs such as Ollama and LMStudio.

Despite the model's advanced capabilities across its released sizes, its integration and usability in the OpenAI-API-style coding ecosystem, including extensions like "Continue" for VS Code, could be significantly improved. The current lack of official GGUF quantization limits its potential reach and utility.

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds. I urge you to consider this proposal as a step towards making BigCode Starcoder2 a more versatile and inclusive tool for the developer community. Official GGUF quantization could significantly impact its adoption and effectiveness across diverse development environments.

Thank you for your time and consideration of this important enhancement. I look forward to your positive response and the future success of BigCode Starcoder2.

Hi there, sorry, but I'm not sure how this is related to my request. This is exactly what I am complaining about - random users sweating to build some ugly cross-compatibility that you, the owners of the project, should be taking care of and including in the release.

How hard is it to start quantizing your own models alongside the release? C'mon 😛

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds.

random users sweating to build some ugly cross-compatibility that you, the owners of the project, should be taking care of and including in the release.

I have to disagree, things implemented by the community are in many cases much better than what the authors can come up with, hence the great power of open-source 🙂

And FYI, the llama.cpp integration is perfectly functional and was done by an HF employee; you can use it in Ollama as mentioned in the tweet.
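For anyone landing here who just wants a local GGUF, producing one from the released weights with llama.cpp looks roughly like this. This is an illustrative sketch, not official BigCode instructions: the checkpoint path, output filenames, and the Q4_K_M quantization type are all assumptions you should adapt.

```shell
# Illustrative sketch of quantizing StarCoder2 with llama.cpp.
# Paths, filenames and quant type below are assumptions, not official steps.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt

# Convert the downloaded Hugging Face checkpoint to a float16 GGUF file
python convert-hf-to-gguf.py /path/to/starcoder2-15b \
    --outfile starcoder2-15b-f16.gguf

# Quantize, e.g. to 4-bit (Q4_K_M), for use with llama.cpp or Ollama
./quantize starcoder2-15b-f16.gguf starcoder2-15b-Q4_K_M.gguf Q4_K_M
```

The resulting `.gguf` file can then be pointed at from an Ollama Modelfile or loaded directly by llama.cpp-based tools.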

Can't see where the GGUF quantized models are - the Hugging Face repo does not seem to contain them yet? I suggest reading the initial post again. Is it hard to start quantizing your own models from day zero? Why would you want someone else to do that for you?

Open source is awesome, but this time it looks more like a lazy effort from the team, especially for a project like StarCoder2 that had some very nice monetary support from Nvidia, ServiceNow and others.

The links you mentioned are more like a pull-request thing with a lot of problem solving involved. For the average end user this is barely useful.

The project is beautiful, useful and very well crafted. Hope to see more people using it.

Thanks.