THUDM / CogDL

CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)

Home Page: https://cogdl.ai

RuntimeError: CUDA out of memory with 6M nodes, 8M edges on A100 GPU

chi2liu opened this issue · comments

πŸ› Bug

|-------------------------------------------------------------------------------------------------------|
    *** Running (`tmp_data.pt`, `unsup_graphsage`, `node_classification_dw`, `unsup_graphsage_mw`)
|-------------------------------------------------------------------------------------------------------|
Model Parameters: 1568
  0%|                                                                                | 0/500 [00:00<?, ?it/s]OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
  0%|                                                                                | 0/500 [00:47<?, ?it/s]
Traceback (most recent call last):
  File "generate_emb.py", line 12, in <module>
    outputs = generator(edge_index, x=x)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/pipelines.py", line 204, in __call__
    model = train(self.args)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/experiments.py", line 216, in train
    result = trainer.run(model_wrapper, dataset_wrapper)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/trainer/trainer.py", line 188, in run
    self.train(self.devices[0], model_w, dataset_w)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/trainer/trainer.py", line 334, in train
    training_loss = self.train_step(model_w, train_loader, optimizers, lr_schedulers, rank, scaler)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/trainer/trainer.py", line 468, in train_step
    loss = model_w.on_train_step(batch)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/wrappers/model_wrapper/base_model_wrapper.py", line 73, in on_train_step
    return self.train_step(*args, **kwargs)
  File "/home/chiliu/miniconda3/envs/cogdl/lib/python3.7/site-packages/cogdl/wrappers/model_wrapper/node_classification/unsup_graphsage_mw.py", line 43, in train_step
    neg_loss = -torch.log(torch.sigmoid(-torch.sum(x.unsqueeze(1).repeat(1, self.num_negative_samples, 1) * x[self.negative_samples], dim=-1))).mean()
RuntimeError: CUDA out of memory. Tried to allocate 11.02 GiB (GPU 0; 39.45 GiB total capacity; 29.23 GiB already allocated; 8.01 GiB free; 30.03 GiB reserved in total by PyTorch)
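
For context, the allocation that fails is the repeat() in the negative-sampling loss: it materializes an [N, K, hidden] tensor for all N ≈ 6M nodes at once, on top of the equally large x[self.negative_samples] gather. Below is a minimal sketch of an equivalent loss that bounds peak memory by chunking over nodes and broadcasting instead of repeating. The shapes of x and negative_samples are read off the traceback; the chunk size is an arbitrary assumption, and this is not CogDL's actual fix:

import torch
import torch.nn.functional as F

def negative_sampling_loss(x, negative_samples, chunk_size=100_000):
    # x: [N, d] node embeddings; negative_samples: [N, K] node indices.
    # Computes the same mean as the failing line, chunk by chunk.
    n, k = negative_samples.shape
    total = x.new_zeros(())
    for start in range(0, n, chunk_size):
        xc = x[start:start + chunk_size]                     # [C, d]
        neg = x[negative_samples[start:start + chunk_size]]  # [C, K, d]
        # broadcast xc over the K negatives instead of repeat()
        scores = (xc.unsqueeze(1) * neg).sum(dim=-1)         # [C, K]
        # -log(sigmoid(-s)) == -logsigmoid(-s); logsigmoid is the stabler form
        total = total + (-F.logsigmoid(-scores)).sum()
    return total / (n * k)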

To Reproduce

Steps to reproduce the behavior:

import numpy as np
import pandas as pd

from cogdl import pipeline

# Build a pipeline for generating embeddings with an unsupervised GNN;
# pass the model name and num_features with its hyper-parameters to this API.
graph = pd.read_csv("G1.weighted.edgelist", header=None, sep=" ")
edge_index = graph[[0, 1]].to_numpy()
edge_weight = graph[[2]].to_numpy(dtype=np.float16)  # loaded but never passed to the generator
e = pd.read_csv("vertex_embeddings.csv", header=None, sep=" ")
x = e.iloc[:, :32].to_numpy(dtype=np.float16)
generator = pipeline("generate-emb", model="unsup_graphsage", no_test=True, num_features=32, hidden_size=16, walk_length=2, sample_size=[4, 2], is_large=True)
outputs = generator(edge_index, x=x)
pd.DataFrame(outputs).to_csv("embeddings.csv", header=False, index=False)

The graph has 6M nodes and 8M edges; the GPU is an A100 with 40 GB of memory.
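
A back-of-the-envelope check of the failing 11.02 GiB allocation, assuming float32 activations and the hidden_size=16 from the pipeline call; the negative-sample count per node is inferred from the numbers, not read from the model config:

# each [N, K, hidden] float32 temporary costs N * K * hidden * 4 bytes
N, hidden = 6_000_000, 16
per_negative_gib = N * hidden * 4 / 2**30  # ~0.36 GiB per negative sample
print(f"{per_negative_gib:.2f} GiB per negative; 30 negatives ≈ {30 * per_negative_gib:.1f} GiB")

That comes to roughly 10.7 GiB for about 30 negatives per node, on the order of the reported 11.02 GiB, so even a modest negative-sample count exhausts the 40 GB card once the other full-batch temporaries are counted.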

Expected behavior

Training should run to completion and write the node embeddings without a CUDA out-of-memory error.

Environment

  • CogDL version: 0.5.3
  • OS (e.g., Linux): Ubuntu
  • Python version: 3.7
  • PyTorch version: 1.9.1.post3
  • CUDA/cuDNN version (if applicable): 11.7
  • Any other relevant information:

Additional context

Hi @chi2liu,

Thanks for your interest in CogDL. It seems that unsupervised GraphSAGE uses full-batch training, which does not scale to a graph of this size. We are looking into this issue now.
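
Until that is fixed, one possible stopgap is to run the pipeline on CPU, so the full-batch temporaries land in host RAM instead of the 40 GB of device memory. CogDL's CLI exposes a --cpu flag; whether the generate-emb pipeline forwards a cpu=True keyword to the trainer in 0.5.3 is an assumption here, so treat this as a sketch rather than a verified recipe:

from cogdl import pipeline

# Assumption: cpu=True mirrors CogDL's --cpu CLI flag and is forwarded
# through pipeline(**kwargs); not verified against the 0.5.3 pipelines API.
generator = pipeline("generate-emb", model="unsup_graphsage", no_test=True, num_features=32, hidden_size=16, walk_length=2, sample_size=[4, 2], is_large=True, cpu=True)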