BaguaSys / bagua

Bagua Speeds up PyTorch

Home Page: https://tutorials.baguasys.com/


What's wrong with this? Do I need to do anything else? Will it affect my result?

lixiangMindSpore opened this issue

Describe the bug
[screenshot of the warning message attached to the issue]

Environment

  • Your operating system and version: Ubuntu 18.04
  • Your Python version: 3.8.12
  • Your PyTorch version: 11.1
  • How did you install Python (e.g. apt or pyenv)? Did you use a virtualenv?: conda create -n torch python=3.8
  • Have you tried using latest bagua master (python3 -m pip install --pre bagua)?: 0.8.1.post1

Reproducing

Please provide a minimal working example, i.e., runnable code.

Please also list the exact commands required to reproduce your results.

Additional context
Add any other context about the problem here.

commented

The message means the memory layout of your PyTorch tensor is inconsistent with the latest PyTorch, so Bagua will fall back to a less efficient way of getting a tensor's memory address. It will not affect your results; you can safely ignore it if your training runs fine.
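
For illustration, the address lookup mentioned above can be seen in plain PyTorch: torch.Tensor.data_ptr() returns the raw address of a tensor's data. The sketch below is only an analogy, not Bagua's actual code path; the contiguity check is an illustrative stand-in for the "memory layout" inconsistency the warning refers to.

import torch

# Illustrative sketch only -- not Bagua's actual implementation.
# data_ptr() returns the address of the first element of the tensor's
# data; communication libraries use such addresses for fused transfers.
x = torch.randn(4, 4)
print(hex(x.data_ptr()))    # raw memory address of the tensor data

# A transposed view shares storage but has a different memory layout,
# so naive pointer arithmetic over it would walk elements in the wrong
# order -- the kind of case where a slower fallback path is needed.
y = x.t()
print(y.is_contiguous())    # False: layout differs from a fresh tensor
print(hex(y.data_ptr()))    # same starting address as x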

Will it affect my training speed?

commented

Only about a 0.5%-2% difference in training speed in our tests. We actually plan to remove this warning in the next release.

Now I want to suppress this warning. How can I do that?

commented

Try launching your program with the environment variable LOG_LEVEL=error.

Like this:

export LOG_LEVEL=error
python -m bagua.distributed.launch ....
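
Equivalently, you can set the variable from inside Python before Bagua is imported. This is a minimal sketch assuming LOG_LEVEL is read when Bagua initializes its logging:

import os

# Assumption: Bagua reads LOG_LEVEL when it sets up logging, so the
# variable must be in the environment before the import below runs.
os.environ["LOG_LEVEL"] = "error"

import bagua.torch_api as bagua

Note that when launching through python -m bagua.distributed.launch, exporting the variable in the shell (as shown above) is inherited by the launcher and all worker processes, so the shell export is the more thorough option.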