Requirements.txt and Trained weights
abdur75648 opened this issue · comments
Thanks for the good quality work.
It would be great if you could kindly upload a requirements.txt (or specify the important library versions).
Also, could the trained weights be released as well?
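For reference, a minimal requirements.txt might look like the sketch below. Only the DeepSpeed version is confirmed (it appears as 0.11.1 in the log later in this thread); the other packages and their unpinned versions are assumptions based on the repo being a LLaVA-style project.

```
# Sketch only: deepspeed version taken from the training log in this thread;
# the remaining entries are assumed and unpinned.
deepspeed==0.11.1
torch
transformers
```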
Thanks for your attention. We will upload requirements.txt and release the pre-trained weights.
Thanks a lot @sunsmarterjie
Somehow, I was able to set up the environment and run the training script.
However, after loading the dataset and the model, when initializing deepspeed distributed training with backend nccl, I'm getting the following error:
[2024-02-15 22:41:11,208] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.1, git-hash=unknown, git-branch=unknown
[2024-02-15 22:41:11,209] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-15 22:41:11,209] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-02-15 22:41:21,484] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 4989
[2024-02-15 22:41:38,785] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 4990
[2024-02-15 22:41:38,785] [ERROR] [launch.py:321:sigkill_handler] ['/home/chemical/dual/ch7190150/.conda/envs/chatterbox/bin/python3.1', '-u', 'train_custom1.py', '--local_rank=1', '--version', 'llava-llama-2-13b-chat-lightning-preview'] exits with return code = -11
I found a similar issue here but still couldn't solve it. It seems to be an NCCL-related problem: the user there says the NCCL backend is not implemented separately in DeepSpeed (DeepSpeed goes through torch.distributed and uses its TorchBackend), but in your code I see NCCL being used.
Kindly help if you have any idea about this; I'd be thankful.
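One detail worth decoding from the log above: the launcher reports `exits with return code = -11`, and by the Python `subprocess` convention a negative return code means the worker was killed by that signal number, here signal 11 (SIGSEGV, a segmentation fault) rather than a clean Python exception. A small helper (hypothetical, not from the repo) shows the convention:

```python
import signal

def describe_exit(returncode):
    """Interpret a subprocess return code.

    Following the Python subprocess convention on Unix, a negative
    value means the process was terminated by that signal number;
    a non-negative value is an ordinary exit code.
    """
    if returncode < 0:
        return f"killed by signal {-returncode} ({signal.Signals(-returncode).name})"
    return f"exited with code {returncode}"

# The return code from the DeepSpeed launcher log above:
print(describe_exit(-11))  # -> killed by signal 11 (SIGSEGV)
```

A segfault during `init_distributed` often points to a native-library problem (mismatched CUDA/NCCL/PyTorch builds) rather than a Python-level bug, which is consistent with pinning library versions being the fix.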
We did not encounter this issue. You can try the LLaVA model we used, available at: https://huggingface.co/sunsmarterjieleaf/ChatterBox/tree/main/llava-llama-2-13b-chat-lightning-preview
Thank you very much