microsoft/DeepSpeedExamples
Example models using DeepSpeed
Stargazers: 5761
Watchers: 75
Issues: 522
Forks: 976
microsoft/DeepSpeedExamples Issues
CPU OOM when inferencing Llama3-70B-Chinese-Chat
Updated
12 days ago
Confusion about Deepspeed Inference
Updated
15 days ago
Comments count
1
cannot pickle 'Stream' object
Updated
15 days ago
Cannot run test-gpt.sh because of AssertionError
Updated
16 days ago
Does FastGen support long-text and sequence-parallel inference?
Updated
18 days ago
run-example.sh fails with urllib3.exceptions.ProtocolError: Response ended prematurely
Updated
a month ago
Comments count
8
[Error] AutoTune: `connect to host localhost port 22: Connection refused`
Updated
a month ago
How to use deepspeed for multi-node and multi-card task in slurm cluster
Updated
a month ago
Does Zero-Inference support TP?
Updated
a month ago
Comments count
11
Does DeepSpeed support fine-tuning an extra model with LoRA?
Updated
2 months ago
Comments count
1
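The Python environment path differs across machines, and after launching DeepSpeed it cannot find the Python environment on the other machines; how can this be resolved?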
Updated
2 months ago
when calculating actor loss, why the mask is "action_mask[:, start: ] "
Closed
2 months ago
The actor constantly generates ['</s>'] or ['<|endoftext|></s>'] after 200 steps in RLHF with hybrid engine disabled
Updated
2 months ago
About multiple-thread attention computation on CPU using zero-inference example.
Updated
2 months ago
Suggested GPU to run the demo code of step2_reward_model_finetuning (DeepSpeed-Chat)
Updated
2 months ago
[REQUEST] More fine-grained distributed strategies for RLHF training
Updated
2 months ago
RLHF problems when using Qwen model
Updated
2 months ago
Comments count
1
The reward value did not increase.
Updated
2 months ago
Comments count
1
`AttributeError: readonly attribute` while trying to run training/HelloDeepSpeed
Updated
2 months ago
Benchmark mii stalled and crashed
Updated
2 months ago
[BUG in Stable Diffusion inference] There's an error on CUDAGraph when using deepspeed inference. How to fix it?
Updated
3 months ago
Comments count
2
[Bug] DeepSpeed Inference Does not Work with LLaMA (Latest version)
Updated
3 months ago
Comments count
3
zero3 and enable hybrid engine are not suitable for llama2, how to solve it?
Updated
3 months ago
Comments count
1
Codellama finetune
Updated
3 months ago
Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?
Updated
4 months ago
The inaccurate flop results after several rounds
Updated
4 months ago
Comments count
1
How to resume Deepspeed-Chat RLHF step-3 training?
Closed
5 months ago
remove redundant code
Updated
5 months ago
Why is the shape of rm model all 0
Updated
5 months ago
Comments count
2
Question: Why not padding to the same sequence length within the batch during the sft training phase?
Updated
5 months ago
running gpt2-xl/test_tune.sh fails - ParquetConfig.__init__() got an unexpected keyword argument 'token'
Closed
5 months ago
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, remote process exited or there was a network error, NCCL version 2.18.6
Updated
5 months ago
Comments count
3
async_pipeline is not exposed in the library
Updated
5 months ago
Comments count
1
Step3 hanging for a long time
Closed
5 months ago
Comments count
1
Invalidate trace cache @ step 0: expected module 0, but got module 6
Updated
5 months ago
[Step2 RewardModel] Why use the last token as the reward of sentence ?
Closed
5 months ago
Comments count
1
Step3 PPO print error when enable --print_answers
Closed
5 months ago
Comments count
1
step3 use same memory when I increase GPUs
Updated
5 months ago
Comments count
1
[Discussion] Can anyone show the performance on every step with any dataset
Updated
5 months ago
[BUG] DeepSpeed-Chat Step3 - actor model repeats generating the same token when hybrid engine enabled
Updated
6 months ago
Comments count
6
Mistral and Orca Training
Updated
6 months ago
Llama2 as actor using zero_stage3
Updated
6 months ago
Comments count
1
Question: Why did you implement LoRA by hand instead of using peft?
Updated
6 months ago
Comments count
1
Error when running e2e_rlhf
Closed
6 months ago
Does DeepSpeed-Chat support pipeline parallelism?
Updated
6 months ago
Something wrong at step1_supervised_finetuning/main.py
Updated
6 months ago
Should it use global_rank as the condition for shared-disk?
Updated
6 months ago
DeepSpeed-VisualChat Tensor shape mismatch
Updated
6 months ago
Does the DeepSpeedVisualChat model have the capability to locate targets, such as generating coordinates for bounding box positions?
Updated
6 months ago
DeepSpeed-Chat Step-1 training error
Updated
6 months ago
Comments count
1