Does the MiniCPM-Llama3-V 2.5 int4 version support fine-tuning?

myBigbug opened this issue a month ago · comments

mybigbug commented a month ago

Is there an existing issue / discussion for this?

I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

I have searched FAQ

Current Behavior

MiniCPM-Llama3-V 2.5 supports fine-tuning, but my GPU has only 24 GB of memory, which is not enough. Does MiniCPM-Llama3-V 2.5 int4 support fine-tuning?
When I try to fine-tune it, I currently get this error: ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details
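
For a rough sense of why 24 GB falls short for full fine-tuning while the int4 weights fit easily, the arithmetic can be sketched as below. The parameter count (about 8 billion) and the 16 bytes per parameter rule of thumb for mixed-precision Adam training are assumptions, not figures from this issue, and activations and framework overhead are ignored.

```python
# Back-of-envelope GPU memory estimate. All constants are assumptions.
GIB = 1024 ** 3


def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


def full_finetune_gib(num_params: float) -> float:
    """Rule of thumb for mixed-precision Adam, per parameter:
    fp16 weights (2) + fp16 gradients (2)
    + fp32 master weights, momentum and variance (4 + 4 + 4) = 16 bytes.
    """
    return num_params * 16 / GIB


NUM_PARAMS = 8e9  # assumed: roughly 8B parameters

print(f"fp16 weights only : {weights_gib(NUM_PARAMS, 2):.1f} GiB")
print(f"int4 weights only : {weights_gib(NUM_PARAMS, 0.5):.1f} GiB")
print(f"full fine-tuning  : {full_finetune_gib(NUM_PARAMS):.1f} GiB")
```

Under these assumptions, full fine-tuning needs several times more than 24 GB, which is why a quantized base model with small trainable adapters is the usual route on a single consumer GPU.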

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS: CentOS 7
- Python: 3.10
- Transformers: 4.40.0
- PyTorch: 2.1.2
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

pip packages
accelerate 0.30.1
addict 2.4.0
aiofiles 23.2.1
altair 5.3.0
annotated-types 0.7.0
anyio 4.4.0
attrs 23.2.0
bitsandbytes 0.42.0
blis 0.7.11
catalogue 2.0.10
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
colorama 0.4.6
confection 0.1.5
contourpy 1.2.1
cycler 0.12.1
cymem 2.0.8
editdistance 0.6.2
einops 0.7.0
et-xmlfile 1.1.0
exceptiongroup 1.2.1
fairscale 0.4.0
fastapi 0.110.3
ffmpy 0.3.2
filelock 3.14.0
fonttools 4.53.0
fsspec 2024.6.0
gradio 4.26.0
gradio_client 0.15.1
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.23.3
idna 3.7
importlib_resources 6.4.0
Jinja2 3.1.4
joblib 1.4.2
jsonlines 4.0.0
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langcodes 3.4.0
language_data 1.2.0
lxml 5.2.2
marisa-trie 1.2.0
markdown-it-py 3.0.0
markdown2 2.4.10
MarkupSafe 2.1.5
matplotlib 3.7.4
mdurl 0.1.2
more-itertools 10.1.0
mpmath 1.3.0
murmurhash 1.0.10
networkx 3.3
nltk 3.8.1
numpy 1.24.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
opencv-python-headless 4.5.5.64
openpyxl 3.1.2
orjson 3.10.4
packaging 23.2
pandas 2.2.2
Pillow 10.1.0
pip 24.0
portalocker 2.8.2
preshed 3.0.9
protobuf 4.25.0
psutil 5.9.8
pydantic 2.7.3
pydantic_core 2.18.4
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.7.1
rpds-py 0.18.1
ruff 0.4.8
sacrebleu 2.3.2
safetensors 0.4.3
scipy 1.13.1
seaborn 0.13.0
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 70.0.0
shellingham 1.5.4
shortuuid 1.0.11
six 1.16.0
smart-open 6.4.0
sniffio 1.3.1
socksio 1.0.0
spacy 3.7.2
spacy-legacy 3.0.12
spacy-loggers 1.0.5
srsly 2.4.8
starlette 0.37.2
sympy 1.12.1
tabulate 0.9.0
thinc 8.2.4
timm 0.9.10
tokenizers 0.19.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.2
torchvision 0.16.2
tqdm 4.66.1
transformers 4.40.0
triton 2.1.0
typer 0.9.4
typing_extensions 4.8.0
tzdata 2024.1
urllib3 2.2.1
uvicorn 0.24.0.post1
wasabi 1.1.3
weasel 0.3.4
websockets 11.0.3
wheel 0.43.0

Anything else?

No response

whyiug · Answer 1 · Thu Jun 13 2024 19:51:29 GMT+0800 (China Standard Time)

Another question: could you (I mean the authors) share the quantization script? We need it after running SFT on this model.

1SingleFeng · Answer 2 · Thu Jun 20 2024 11:17:55 GMT+0800 (China Standard Time)

> Another question: could you (I mean the authors) share the quantization script? We need it after running SFT on this model.

I would also like to know how the quantization was done. Did you get an answer?

mybigbug · Answer 3 · Thu Jun 27 2024 17:40:18 GMT+0800 (China Standard Time)

> Another question: could you (I mean the authors) share the quantization script? We need it after running SFT on this model.
>
> I would also like to know how the quantization was done. Did you get an answer?

No, I'm still waiting.

Nicky Cheng · Answer 4 · Wed Jul 03 2024 11:14:15 GMT+0800 (China Standard Time)

Yes, it does.

In finetune_lora.sh, change these settings:
MODEL="openbmb/MiniCPM-Llama3-V-2_5-int4"

--tune_vision false
--deepspeed ds_config_zero3.json

That is all it takes.

mybigbug · Answer 5 · Wed Jul 03 2024 20:23:36 GMT+0800 (China Standard Time)

> Yes, it does.
>
> In finetune_lora.sh, change MODEL="openbmb/MiniCPM-Llama3-V-2_5-int4"
>
> --tune_vision false --deepspeed ds_config_zero3.json
>
> That is all it takes.

@nickyisadog
I fine-tuned the int4 model with the finetune_ds.sh script, not the LoRA script, and got this error: ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details. Could you help me work out what causes this?
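
The error text itself points at the cause: the transformers library refuses to train a model whose weights are all quantized unless trainable adapters are attached on top. A minimal sketch of attaching LoRA adapters with PEFT is shown below. The hyperparameters and the target-module pattern are illustrative assumptions, not values taken from the project's scripts, and the pattern assumes the language model lives under a submodule named `llm`.

```python
# Sketch only: attach LoRA adapters to a quantized checkpoint with PEFT.
# Hyperparameters and module names below are illustrative assumptions.

MODEL_ID = "openbmb/MiniCPM-Llama3-V-2_5-int4"  # model name from this thread

# Hypothetical LoRA settings; tune them for your task.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    # A string is treated by PEFT as a regex matched against full module names.
    # Assumed layout: attention projections of the language model only,
    # so the vision encoder is left untouched.
    "target_modules": r"llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)",
}


def load_with_lora(model_id: str = MODEL_ID):
    """Load the quantized checkpoint and wrap it with trainable LoRA adapters."""
    # Imported lazily so the settings above can be inspected without a GPU.
    from transformers import AutoModel
    from peft import LoraConfig, get_peft_model

    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model = get_peft_model(model, LoraConfig(**LORA_KWARGS))
    model.print_trainable_parameters()  # only the adapter weights should be trainable
    return model
```

Calling `load_with_lora()` returns a PEFT-wrapped model whose quantized base weights stay frozen, which is the situation the error message asks for.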

Shreyanshu Bhushan · Answer 6 · Fri Jul 05 2024 17:34:42 GMT+0800 (China Standard Time)

@nickyisadog

I am getting this error: RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I ran with these changes:

In finetune_lora.sh, change
MODEL="openbmb/MiniCPM-Llama3-V-2_5-int4"

--tune_vision false
--deepspeed ds_config_zero3.json
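
The names `is_sm80` and `is_sm90` in that message appear to refer to CUDA compute capabilities 8.0 and 9.0 (for example A100 and H100), so the check is probably failing because the GPU reports a different capability. A minimal sketch for seeing what the local GPU reports follows; the helper names are hypothetical, and which kernel raises the error is not established in this thread.

```python
# Sketch: report the GPU's compute capability to compare against sm80 / sm90.


def is_sm80_or_sm90(major: int, minor: int) -> bool:
    """True only for compute capability exactly 8.0 or 9.0."""
    return (major, minor) in {(8, 0), (9, 0)}


def describe_gpu() -> str:
    """Return a one-line summary of the first visible CUDA device."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "passes" if is_sm80_or_sm90(major, minor) else "fails"
    return f"{name}: sm{major}{minor} ({verdict} the sm80/sm90 check)"
```

Running `print(describe_gpu())` on the training machine shows whether the card is one of the two capabilities named in the message; a consumer card reporting, say, 8.6 would fail this check.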