fpgaminer/GPTQ-triton
GPTQ inference Triton kernel
Stargazers: 271
Watchers: 12
Issues: 19
Forks: 20
fpgaminer/GPTQ-triton Issues
GPTQ guide? (Updated 10 months ago, 1 comment)
multi-gpu and triton kernel problem (Closed a year ago)
rotary embedding and layer norm (Updated a year ago, 1 comment)
question about the quantization formula (Updated a year ago, 3 comments)
Can I use a CUDA kernel with a model quantized using triton & vice-versa? (Closed a year ago, 3 comments)
Get C++ exception when trying to load model (Closed a year ago, 5 comments)
Cache auto-tuning? (Updated a year ago, 3 comments)
Does this support non -1 groupsize? (Closed a year ago, 1 comment)
Apply flash attention (Closed a year ago, 1 comment)
warmup_autotune and 4090 observations (Closed a year ago, 2 comments)
percdamp clarification for dummies (Closed a year ago, 2 comments)
Cuda vs Triton on an RTX 3060 12GB (Updated a year ago, 12 comments)
num_beams > 1 sometimes breaks inference (Closed a year ago, 9 comments)
1-bit acceleration support (Updated a year ago, 2 comments)
Inference throwing: TypeError: forward() got an unexpected keyword argument 'position_ids' (Closed a year ago, 7 comments)
Weight conversion help (Closed a year ago, 14 comments)
Testing triton on 30b model vs quant_cuda (Closed a year ago, 2 comments)
Getting "CUDA error: an illegal memory access was encountered" with model.generate (Closed a year ago, 5 comments)
Needs more VRAM than normal GPTQ CUDA version? (Updated a year ago, 3 comments)