microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Qs

zws98 opened this issue · comments

I trained an MoE model on 8 GPUs with 8 experts. When I ran inference in parallel, each process produced similar but slightly different results. Could you tell me what might cause this?

Maybe you can check whether drop-less MoE mode solves your issue; it is enabled by setting capacity_factor=0.
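
For reference, drop-less mode is selected through the gate's capacity_factor. Below is a minimal sketch, assuming the moe_layer constructor and argument names from Tutel's helloworld example; they may differ slightly between versions:

```python
import torch
from tutel import moe as tutel_moe

model_dim, hidden_size, num_local_experts = 2048, 2048, 2

# capacity_factor=0 asks the gate to grow expert capacity to fit all routed
# tokens (drop-less mode); a positive value such as 1.25 caps the per-expert
# capacity, so overflow tokens can be dropped.
moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2, 'capacity_factor': 0},
    model_dim=model_dim,
    experts={
        'type': 'ffn',
        'count_per_node': num_local_experts,
        'hidden_size_per_expert': hidden_size,
        'activation_fn': lambda x: torch.nn.functional.relu(x),
    },
)

x = torch.randn(4, 1024, model_dim)
y = moe_layer(x)
```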

With capacity_factor=0, the results still differ across processes, and they also differ from the results obtained with capacity_factor=1.25.

Do you have more information? I don't quite understand what you mean.

Outputs from different GPUs:

STEP-10: loss = 21.11541, step_time = 3.628716 sec, perf = 0.08 tflops.
[Summary] Average synchronized step_time = 0.3628715753555298 sec.
STEP-10: loss = 21.11541, step_time = 3.670310 sec, perf = 0.07 tflops.
[Summary] Average synchronized step_time = 0.36703104972839357 sec.
STEP-10: loss = 21.11541, step_time = 3.689584 sec, perf = 0.07 tflops.
[Summary] Average synchronized step_time = 0.3689584493637085 sec.
STEP-10: loss = 21.11541, step_time = 3.675405 sec, perf = 0.07 tflops.
[Summary] Average synchronized step_time = 0.36754045486450193 sec.
STEP-10: loss = 21.11541, step_time = 3.681213 sec, perf = 0.07 tflops.
[Summary] Average synchronized step_time = 0.36812126636505127 sec.
STEP-10: loss = 21.11541, step_time = 3.629702 sec, perf = 0.08 tflops.
[Summary] Average synchronized step_time = 0.3629701852798462 sec.
STEP-10: loss = 21.11541, step_time = 3.700365 sec, perf = 0.07 tflops.
[Summary] Average synchronized step_time = 0.37003653049468993 sec.
STEP-10: loss = 21.11541, step_time = 3.658189 sec, perf = 0.08 tflops.
[Summary] Average synchronized step_time = 0.3658188819885254 sec.