X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

How to fine-tune this model?

chuangzhidan opened this issue

:)
Another thing: will this model support Chinese OCR soon?

Hi @chuangzhidan, both the training code and the Chinese-and-English model are scheduled for release within this month. Our model was trained with Megatron, so it will take some time to prepare training code based on DeepSpeed. If you need to fine-tune our model urgently, you can refer to the training code of mPLUG-Owl2 and revise it to fit our model. For hyperparameters, you can refer to our paper.

Thank you for your prompt reply. The demo doesn't work well on Chinese at all, so I'm looking forward to your new model :)

Will the Megatron training code be open-sourced?

Hi @whalefa1I, we are not planning to release the Megatron code.

Hi @chuangzhidan, we have released the training code for fine-tuning DocOwl1.5 at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/DocOwl1.5. For now, training is supported with DeepSpeed ZeRO-2. You can try fine-tuning a Chinese model with your own data~
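
For readers unfamiliar with DeepSpeed ZeRO-2, here is a minimal sketch of what a stage-2 config generally looks like. It is not taken from the DocOwl1.5 repository: the helper name `build_zero2_config`, the output file name, and every numeric value are assumptions for illustration only. The keys themselves are standard DeepSpeed config keys. Check the scripts in the linked directory for the settings actually used.

```python
import json


def build_zero2_config(micro_batch_size=1, grad_accum_steps=8):
    """Return a DeepSpeed config dict that enables ZeRO stage 2.

    Hypothetical helper: all values are illustrative placeholders,
    not the settings used to train or fine-tune DocOwl1.5.
    """
    return {
        # Per-GPU batch size for one forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Number of micro-batches accumulated before each optimizer step.
        "gradient_accumulation_steps": grad_accum_steps,
        # Clip the global gradient norm to this value.
        "gradient_clipping": 1.0,
        # Mixed precision in bfloat16 (assumes hardware that supports it).
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 partitions optimizer states and gradients
            # across data-parallel ranks; parameters stay replicated.
            "stage": 2,
            # Overlap gradient reduction with the backward pass.
            "overlap_comm": True,
            # Copy gradients into a contiguous buffer to limit fragmentation.
            "contiguous_gradients": True,
        },
    }


if __name__ == "__main__":
    # Write the config to a JSON file that a launcher could point to.
    with open("zero2_example.json", "w") as f:
        json.dump(build_zero2_config(), f, indent=2)
```

A file like this is usually passed to the training script through its DeepSpeed config argument; the exact flag depends on the script in the repository.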