AnswerDotAI/fsdp_qlora
Training LLMs with QLoRA + FSDP
Stargazers: 1263
Watchers: 20
Issues: 34
Forks: 171
AnswerDotAI/fsdp_qlora Issues
Dual GPU training instantly powers off my desktop
Closed
a month ago
Comments count
6
train.py
Updated
a month ago
Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference
Updated
a month ago
Comments count
2
ValueError report
Updated
a month ago
Question about GPU memory usage.
Updated
a month ago
DeepSeek VL support
Updated
a month ago
train.py script crashes when using HQQ
Updated
a month ago
Comments count
3
How does one load and run inference on a fine-tuned Llama 3 model using the bnb_dora train script?
Updated
a month ago
BOFT support?
Updated
a month ago
Can I use this script to pre-train models?
Updated
a month ago
Fine tuning only runs on CPU
Updated
a month ago
Comments count
4
Issues with LLaMA-3-70B
Closed
a month ago
Comments count
1
ProcessExitedException: process 0 (2x 4090)
Updated
a month ago
Comments count
39
llama3?
Updated
2 months ago
What if I have three graphics cards?
Updated
2 months ago
Comments count
1
Results after running
Updated
2 months ago
How to load the saved model?
Updated
2 months ago
process 0 terminated with signal SIGKILL
Updated
2 months ago
Comments count
4
nan when the input length is large
Updated
2 months ago
Comments count
5
Question about adding / training Mixtral
Updated
2 months ago
Comments count
1
How do I run inference with the 70B model? Or do we need to implement it ourselves, the same way as training?
Updated
2 months ago
Comments count
1
Why is o_proj not targeted?
Updated
2 months ago
Q on comparison with SFTTrainer
Updated
2 months ago
/opt/conda/conda-bld/pytorch_1708025847130/work/aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
Updated
2 months ago
Bigger context size?
Updated
2 months ago
Torch Compile?
Updated
3 months ago
Example with AMD ROCm/HIP
Closed
3 months ago
Comments count
4
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase
Closed
3 months ago
Comments count
3
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Updated
3 months ago
Running into CUDA out of memory with hqq_lora
Closed
3 months ago
Comments count
3
Bugs when fine-tuning with FSDP on multiple nodes
Updated
3 months ago
Comments count
1
NCCL issue training with two GPUs
Updated
3 months ago
Comments count
2
Training from e
Closed
3 months ago
Comments count
1
License
Closed
3 months ago