qwopqwop200/GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
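For context on what the repository does: GPTQ quantizes weights to low precision while compensating the resulting error using second-order (Hessian-based) information. As a minimal point of comparison, the simple round-to-nearest baseline that GPTQ improves upon can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the repository's implementation, and the function names are hypothetical:

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Asymmetric 4-bit round-to-nearest quantization per group.
    A baseline sketch: GPTQ itself additionally propagates and
    compensates the quantization error across remaining weights."""
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - wmin) / 15.0  # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1.0, scale)              # guard constant groups
    q = np.clip(np.round((w - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_4bit(q, scale, wmin):
    """Reconstruct approximate float weights from 4-bit codes."""
    return q.astype(np.float32) * scale + wmin
```

Per-group `scale` and `wmin` must be stored alongside the 4-bit codes; the reconstruction error of each element is bounded by half a quantization step.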
Stargazers: 2983 · Watchers: 42 · Issues: 217 · Forks: 457
qwopqwop200/GPTQ-for-LLaMa Issues
- add support for minicpm (updated 3 months ago)
- GPTQ vs bitsandbytes (updated 6 months ago)
- Error when load GPTQ model (updated 7 months ago)
- Dependency conflicts for `safetensors` (closed 7 months ago, 1 comment)
- datasets.utils.info_utils.ExpectedMoreSplits: {'validation'} (updated 8 months ago, 1 comment)
- Syntax changed in triton.testing.do_bench() causing error when running llama_inference.py (updated 9 months ago)
- _pickle.UnpicklingError: invalid load key, 'v'. (updated 10 months ago, 1 comment)
- inference with the saved model error: AttributeError: module 'torch.backends.cuda' has no attribute 'sdp_kernel' (updated 10 months ago, 2 comments)
- Porting GPTQ to CPU? (updated 10 months ago, 2 comments)
- the inference speed of GPTQ 4bit quantized model (updated 10 months ago, 2 comments)
- Support Mistral. (updated a year ago)
- error: block with no terminator, has llvm.cond_br %5624, ^bb2, ^bb3 (updated a year ago)
- neox.py needs to add "import math" (updated a year ago)
- LoRa and diff with bitsandbytes (updated a year ago)
- Transformers broke again (AttributeError: 'GPTQ' object has no attribute 'inp1') (updated a year ago, 1 comment)
- Would GPTQ be able to support LLaMa2? (updated a year ago, 1 comment)
- Can i quantize HF version of llama model (updated a year ago)
- Why does the model quantization prompt KILLED at the end? (updated a year ago, 2 comments)
- Help: Quantized llama-7b model with custom prompt format produces only gibberish (updated a year ago, 1 comment)
- Proposed changes to reduce VRAM usage. Potentially quantize larger models on consumer hardware. (updated a year ago, 3 comments)
- Issue with GPTQ (updated a year ago, 1 comment)
- High PPL when groupsize != -1 for OPT model after replace linear layer with quantlinear. (updated a year ago, 1 comment)
- An error is reported when running python setup_cuda.py install (updated a year ago, 2 comments)
- can it support openllama model? (closed a year ago)
- Could not obtain official perplexity using bloom_eval() (updated a year ago)
- llama_inference 4bits error (updated a year ago)
- AttributeError: 'QuantLinear' object has no attribute 'weight' (t5 branch) (Google/flan-ul2) (closed a year ago, 2 comments)
- CUDA out of memory on flan-ul2 (closed a year ago, 1 comment)
- [Question] What is the expected discrepancy between simulated and actually computed values? (updated a year ago, 4 comments)
- The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.7) (updated a year ago, 2 comments)
- Sample code does not work (updated a year ago, 2 comments)
- SqueezeLLM support? (updated a year ago)
- What is the right perplexity number? (updated a year ago)
- Finetuning Quantized LLaMA (updated a year ago)
- compare with llama.cpp int4 quantize? (updated a year ago)
- How to quantize bloom after lora/ptuning? (updated a year ago)
- AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention' (closed a year ago, 2 comments)
- I use python llama.py to generate a quantized model, but I can't find the .safetensors model (closed a year ago, 1 comment)
- Wondering whether some of the triton or cuda kernel also speedup fp16 or not? (updated a year ago)
- Errors encountered when running benchmark FP16 baseline on multiple GPUs (updated a year ago, 2 comments)
- Does this work for gptj specifically the cuda branch? Thanks! (updated a year ago)
- Does not support 3bit quantization? (updated a year ago)
- No CUDA_ENV / conda-froce cudatoolkit-dev freezes (closed a year ago)
- Unable to run 'python setup_cuda.py install' (updated a year ago)
- Build issue with newer torch pybind11 cast.h - workaround inside (updated a year ago)
- 6-bit quantization (updated a year ago, 1 comment)
- no module named quant_cuda (fastest-inference-4bit branch) (updated a year ago, 1 comment)
- fastest-inference-4bit fails to build (closed a year ago, 3 comments)
- Giepeto (closed a year ago)
- Benchmark broken on H100 (updated a year ago)