mlverse / torch

R Interface to Torch

Home Page: https://torch.mlverse.org


Sometimes running on device cuda and sometimes on device cpu automatically, why?

icejean opened this issue · comments

Hi, I'm new to R torch and torch itself; I'm just trying to set up the environment and run the classic MNIST example.
But something strange happens: the 1st network below runs on cuda automatically, while the 2nd runs on cpu.
However, if I run the 1st network first and then switch to the 2nd network, it runs on cuda too.
Any idea why?

# 1. Set up running environment.
#    WSL2 Ubuntu22 behind the GFW
Sys.setenv(http_proxy = "http://127.0.0.1:7890")
Sys.setenv(https_proxy = "http://127.0.0.1:7890")
# 2. Point to CUDA 11.8 + cuDNN 8.9.2, which are supported by R torch.
Sys.setenv(CUDA_HOME = "/usr/local/cuda-11")
Sys.setenv(LD_LIBRARY_PATH = "/usr/local/lib:/usr/local/cuda-11/lib64")
# Note: Sys.setenv() does not expand "$PATH", so prepend the CUDA bin dir explicitly.
Sys.setenv(PATH = paste("/usr/local/cuda-11/bin", Sys.getenv("PATH"), sep = ":"))
# 3. Have a check.
Sys.getenv("http_proxy")
Sys.getenv("https_proxy")
Sys.getenv("CUDA_HOME")
Sys.getenv("LD_LIBRARY_PATH")
Sys.getenv("PATH")
# 4. Path to MNIST dataset.
getwd()
dir <- "./dataset/mnist"

# 5. Load the libraries.
# install.packages("torch")
# install.packages("torchvision")
# install.packages("luz")
library(torch)
library(torchvision)
library(luz)
library(reshape2)
library(ggplot2)

# 6. Check if CUDA is available.
cuda_available <- torch::cuda_is_available()
device <- if (cuda_available) torch_device("cuda:0") else torch_device("cpu")
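
# Optional sanity check: a tensor created with this device should report
# "cuda:0" when CUDA is available, otherwise "cpu".
torch::cuda_device_count()
x_check <- torch_randn(2, 2, device = device)
x_check$device
rm(x_check)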


# 7. Load the MNIST dataset.
train_ds <- mnist_dataset(
  dir,
  download = TRUE,
  transform = transform_to_tensor
)

test_ds <- mnist_dataset(
  dir,
  train = FALSE,
  transform = transform_to_tensor
)

train_dl <- dataloader(train_ds, batch_size = 128, shuffle = TRUE)
test_dl <- dataloader(test_ds, batch_size = 128)
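
# Optional: peek at one batch to confirm the shapes going into the network.
# dataloader_make_iter() / dataloader_next() are torch's manual iterator helpers.
batch <- dataloader_next(dataloader_make_iter(train_dl))
batch[[1]]$shape   # should be roughly 128 x 1 x 28 x 28
batch[[2]]$shape   # should be roughly 128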

# 8. Check the first image.
image <- train_ds$data[1,1:28,1:28]
image_df <- melt(image)
ggplot(image_df, aes(x=Var2, y=Var1, fill=value))+
  geom_tile(show.legend = FALSE) + 
  xlab("") + ylab("") +
  scale_fill_gradient(low="white", high="black")

# 9. Define a network.
net <- nn_module(
  "Net",
  ## The 1st network will be loaded onto the cuda device automatically.
  # initialize = function() {
  #   self$conv1 <- nn_conv2d(1, 32, 3, 1)
  #   self$conv2 <- nn_conv2d(32, 64, 3, 1)
  #   self$dropout1 <- nn_dropout2d(0.25)
  #   self$dropout2 <- nn_dropout2d(0.5)
  #   self$fc1 <- nn_linear(9216, 128)
  #   self$fc2 <- nn_linear(128, 10)
  # },
  # 
  # forward = function(x) {
  #   x %>%                                  # N * 1 * 28 * 28
  #     self$conv1() %>%                     # N * 32 * 26 * 26
  #     nnf_relu() %>%
  #     self$conv2() %>%                     # N * 64 * 24 * 24
  #     nnf_relu() %>%
  #     nnf_max_pool2d(2) %>%                # N * 64 * 12 * 12
  #     self$dropout1() %>%
  #     torch_flatten(start_dim = 2) %>%     # N * 9216
  #     self$fc1() %>%                       # N * 128
  #     nnf_relu() %>%
  #     self$dropout2() %>%
  #     self$fc2()                           # N * 10
  # }
  ## Epoch 10/10
  ## Train metrics: Loss: 0.0375 - Acc: 0.988                                                                    
  ## Valid metrics: Loss: 0.0272 - Acc: 0.9912
  
  ## The 2nd network will be loaded onto the cpu device automatically.
  ## If I run the 1st network first, then switch to the 2nd, it'll be run on the cuda device too.
  ## Why?
  
  initialize = function() {
    self$layer1 <- nn_linear(in_features = 784, out_features = 512)
    self$layer2 <- nn_linear(in_features = 512, out_features = 10)
  },
  
  forward = function(x) {
    x %>%
      torch_flatten(start_dim = 2) %>% # start_dim = 2
      self$layer1() %>%
      nnf_relu() %>%
      self$layer2() %>%
      nnf_softmax(dim = 2)
  }
  ## Epoch 10/10
  ## Train metrics: Loss: 1.476 - Acc: 0.9871                                                                     
  ## Valid metrics: Loss: 1.4838 - Acc: 0.9782  
)
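
# Note: a freshly instantiated module lives on the cpu; during fit() luz moves
# the model and each training batch onto the accelerator's device (cuda when
# available), which is why no explicit $to(device) call appears anywhere below.
model_check <- net()
model_check$parameters[[1]]$device   # prints "cpu" here, even when CUDA is present
rm(model_check)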


# 10. Train.
fitted <- net %>%
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics = list(
      luz_metric_accuracy()
    )
  ) %>%
  fit(train_dl, epochs = 10, valid_data = test_dl)

# 11. Predict.
preds <- predict(fitted, test_dl)
preds$shape
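
# Optional: turn the raw outputs into one predicted class index per test image.
pred_labels <- torch_argmax(preds, dim = 2)
pred_labels$shape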

Well, it was my fault. Actually everything is O.K.: both models are running on device cuda.
The cause is that I'm running R torch on WSL2 Ubuntu 22 and watching the GPU load with Windows Task Manager, which led to the misunderstanding.
The fact is that the CUDA toolkit for WSL2 Ubuntu is a special build that doesn't include the Ubuntu Nvidia driver; Windows exports its own Nvidia driver into Ubuntu at /usr/lib/wsl/lib/, and Windows Task Manager doesn't seem to reflect all of the load on the Nvidia GPU, especially work running inside WSL2 Ubuntu. What Task Manager mainly shows for the Nvidia GPU is copy load, which doesn't tell the real story.
When I run nvidia-smi -l 1 on WSL2 Ubuntu, I can see the actual load: the GPU-Util and Compute M. columns show that it really is running on the cuda device. Maybe nvidia-smi only looks for processes running on Windows, not those on WSL2 Ubuntu, so it reports 'No running processes found', but that isn't the truth.
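A more compact way to watch the same thing is to ask nvidia-smi for just the utilization and memory columns; the query flags below are standard nvidia-smi options, and running the command through system() blocks the R session until interrupted, so it's best run from a second shell:

# Print GPU utilization and memory use once per second until interrupted.
system("nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1")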
The following is the nvidia-smi output and Windows Task Manager snapshot for the 1st model:
[screenshot: R-13]
The following is the snapshot for the 2nd model:
[screenshot: R-14]
After all, the GPU fan is spinning fast, which means the GPU is working hard.
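For anyone who would rather not rely on automatic placement at all, the device can be pinned explicitly. This is only a sketch, assuming luz's accelerator() helper and the accelerator argument of fit(); check the luz documentation for your installed version:

# Force CPU training; drop cpu = TRUE to use the default CUDA device instead.
fitted_cpu <- net %>%
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics = list(luz_metric_accuracy())
  ) %>%
  fit(train_dl, epochs = 1, valid_data = test_dl,
      accelerator = accelerator(cpu = TRUE))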