huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Home Page: https://huggingface.co/transformers

RWKV split CPU & GPU results in high perplexity

3outeille opened this issue

System Info

Using the PR from #22797 (comment), I tried to evaluate perplexity on wikitext2 with the HuggingFace RWKV model but found a weird behavior (gist to reproduce the bug: https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d).

  • When the model is fully loaded on CPU or GPU, perplexity is fine
  • When some blocks of RWKV are loaded on CPU and the rest on GPU, perplexity is very high

Any idea?
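
For reference, the three loading configurations look roughly like this (a minimal sketch, not the exact gist code; the checkpoint name, layer count, and CPU/GPU split point are assumptions):

```python
import torch
from transformers import AutoTokenizer, RwkvForCausalLM

model_id = "RWKV/rwkv-4-169m-pile"  # assumed checkpoint; the gist may use a different size
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Full CPU
model = RwkvForCausalLM.from_pretrained(model_id)

# Full GPU (fp16)
model = RwkvForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# Split: first blocks on GPU 0, remaining blocks (and the head) on CPU,
# dispatched by accelerate through an explicit device_map
device_map = {
    "rwkv.embeddings": 0,
    **{f"rwkv.blocks.{i}": 0 if i < 6 else "cpu" for i in range(12)},  # assumed split point
    "rwkv.ln_out": "cpu",
    "head": "cpu",
}
model = RwkvForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map=device_map
)
```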

Who can help?

@sgugger, @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

https://gist.github.com/3outeille/e74ec833ec2800a94325f8dad8e0da3d
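
The evaluation presumably follows the standard fixed-window perplexity recipe over wikitext-2, along these lines (a sketch reusing the `tokenizer` and one of the `model` variants from the snippet above; not the exact gist code, and the window size is an assumption):

```python
import torch
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 1024  # assumed evaluation window
seq_len = encodings.input_ids.size(1)

nlls = []
for begin in range(0, seq_len, max_length):
    end = min(begin + max_length, seq_len)
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    with torch.no_grad():
        # loss is the mean token-level cross-entropy over this window
        loss = model(input_ids, labels=input_ids.clone()).loss
    nlls.append(loss)

nlls = torch.stack(nlls)
print("nlls:", nlls[:3])
print("Perplexity:", torch.exp(nlls.mean()).item())
```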

Expected behavior

  • Full CPU ✔️ :
    • nlls: tensor([2.0129, 2.3220, 2.3500])
    • Perplexity: 9.284077644348145
  • Full GPU ✔️ :
    • nlls: tensor([2.0137, 2.3223, 2.3496], device='cuda:0', dtype=torch.float16)
    • Perplexity: 9.2890625
  • Split 🔴 :
    • nlls: tensor([15.6641, 15.9141, 16.5469], device='cuda:0', dtype=torch.float16)
    • Perplexity: 9312564.0

Hi @3outeille
Sadly I didn't have time to check that out; are you still facing the issue with the latest main branch of transformers & accelerate?

Hi @younesbelkada, I updated transformers & accelerate to the latest release versions as shown here: https://github.com/3outeille/hf_rwkv_bug/blob/master/requirements.txt and the bug is still there.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.