microsoft / DeepSpeedExamples

Example models using DeepSpeed

microsoft/DeepSpeedExamples Issues

CPU OOM when inferencing Llama3-70B-Chinese-Chat
Updated 12 days ago
Confusion about Deepspeed Inference
Updated 15 days ago (1 comment)
cannot pickle 'Stream' object
Updated 15 days ago
can not run the test-gpt.sh because of assertionError
Updated 16 days ago
Does FastGen support long-text and sequence-parallel inference?
Updated 18 days ago
run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely
Updated a month ago (8 comments)
[Error] AutoTune: `connect to host localhost port 22: Connection refused`
Updated a month ago
How to use deepspeed for multi-node and multi-card task in slurm cluster
Updated a month ago
Does Zero-Inference support TP?
Updated a month ago (11 comments)
Deepspeed support finetune extra model with lora ?
Updated 2 months ago (1 comment)
Python environment paths differ across machines; after launch, DeepSpeed cannot find the Python environment on the other machines. How can this be solved?
Updated 2 months ago
when calculating actor loss, why the mask is "action_mask[:, start: ] "
Closed 2 months ago
The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled
Updated 2 months ago
About multiple-thread attention computation on CPU using zero-inference example.
Updated 2 months ago
Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat)
Updated 2 months ago
[REQUEST] More fine-grained distributed strategies for RLHF training
Updated 2 months ago
RLHF problems when using Qwen model
Updated 2 months ago (1 comment)
The reward value did not increase.
Updated 2 months ago (1 comment)
`AttributeError: readonly attribute` while trying to run training/HelloDeepSpeed
Updated 2 months ago
Benchmark mii stalled and crashed
Updated 2 months ago
[BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?
Updated 3 months ago (2 comments)
[Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)
Updated 3 months ago (3 comments)
zero3 and enable hybrid engine are not suitable for llama2, how to solve it?
Updated 3 months ago (1 comment)
Codellama finetune
Updated 3 months ago
Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?
Updated 4 months ago
The inaccurate flop results after several rounds
Updated 4 months ago (1 comment)
How to resume Deepspeed-Chat RLHF step-3 training?
Closed 5 months ago
remove redundant code
Updated 5 months ago
Why is the shape of rm model all 0
Updated 5 months ago (2 comments)
Question: Why not padding to the same sequence length within the batch during the sft training phase?
Updated 5 months ago
running gpt2-xl/test_tune.sh fails - ParquetConfig.__init__() got an unexpected keyword argument 'token'
Closed 5 months ago
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, remote process exited or there was a network error, NCCL version 2.18.6
Updated 5 months ago (3 comments)
async_pipeline is not exposed in the library
Updated 5 months ago (1 comment)
Step3 hanging for a long time
Closed 5 months ago (1 comment)
Invalidate trace cache @ step 0: expected module 0, but got module 6
Updated 5 months ago
[Step2 RewardModel] Why use the last token as the reward of sentence ?
Closed 5 months ago (1 comment)
Step3 PPO print error when enable --print_answers
Closed 5 months ago (1 comment)
step3 use same memory when I increase GPUs
Updated 5 months ago (1 comment)
[Discussion] Can anyone show the performance on every step with any dataset
Updated 5 months ago
[BUG] DeepSpeed-Chat Step3 - actor model repeats generating the same token when hybrid engine enabled
Updated 6 months ago (6 comments)
Mistral and Orca Training
Updated 6 months ago
Llama2 as actor using zero_stage3
Updated 6 months ago (1 comment)
Question: Why did you implement LoRA by hand instead of using peft?
Updated 6 months ago (1 comment)
Error when running e2e_rlhf
Closed 6 months ago
Does DeepSpeed-Chat support pipeline parallelism?
Updated 6 months ago
Something wrong at step1_supervised_finetuning/main.py
Updated 6 months ago
Should it use global_rank as the condition for shared-disk?
Updated 6 months ago
DeepSpeed-VisualChat Tensor shape mismatch
Updated 6 months ago
Does the DeepSpeedVisualChat model have the capability to locate targets, such as generating coordinates for bounding box positions?
Updated 6 months ago
DeepSpeed-Chat Step-1 training error
Updated 6 months ago (1 comment)