Oracen-zz / MIDAS

Multiple imputation utilising denoising autoencoders for approximate Bayesian inference

GPU utilization in AWS

MarKo9 opened this issue

Hi,

Once again thanks for the effort.
I'm running the previous version of the library on AWS (p2.8xlarge) against a ~250 GB dataset. All eight GPUs appear to be available to TensorFlow:
Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:17.0, compute capability: 3.7)
2018-10-17 06:30:29.402721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:00:18.0, compute capability: 3.7)
2018-10-17 06:30:29.402734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:00:19.0, compute capability: 3.7)
2018-10-17 06:30:29.402745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:00:1a.0, compute capability: 3.7)
2018-10-17 06:30:29.402757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: Tesla K80, pci bus id: 0000:00:1b.0, compute capability: 3.7)
2018-10-17 06:30:29.402768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:5) -> (device: 5, name: Tesla K80, pci bus id: 0000:00:1c.0, compute capability: 3.7)
2018-10-17 06:30:29.402779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:6) -> (device: 6, name: Tesla K80, pci bus id: 0000:00:1d.0, compute capability: 3.7)
2018-10-17 06:30:29.402801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:7) -> (device: 7, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
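
(As a sanity check, TensorFlow 1.x can also list the devices it has registered directly; the snippet below is generic TensorFlow, not a MIDAS API:)

```python
# Generic TensorFlow 1.x snippet: list the devices the runtime has registered.
# Nothing here is specific to MIDAS.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)  # e.g. "/device:GPU:0 GPU"
```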

However, when checking with nvidia-smi, only one GPU is actually utilized:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:17.0 Off |                    0 |
| N/A   77C    P0    85W / 149W |  10931MiB / 11439MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:00:18.0 Off |                    0 |
| N/A   54C    P0    69W / 149W |  10877MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:00:19.0 Off |                    0 |
| N/A   78C    P0    60W / 149W |  10877MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:00:1A.0 Off |                    0 |
| N/A   57C    P0    70W / 149W |  10875MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   74C    P0    61W / 149W |  10875MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   56C    P0    70W / 149W |  10875MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   77C    P0    62W / 149W |  10873MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   59C    P0    70W / 149W |  10871MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2244      C   /home/ubuntu/src/anaconda3/bin/python      10912MiB |
|    1      2244      C   /home/ubuntu/src/anaconda3/bin/python      10858MiB |
|    2      2244      C   /home/ubuntu/src/anaconda3/bin/python      10858MiB |
|    3      2244      C   /home/ubuntu/src/anaconda3/bin/python      10856MiB |
|    4      2244      C   /home/ubuntu/src/anaconda3/bin/python      10856MiB |
|    5      2244      C   /home/ubuntu/src/anaconda3/bin/python      10856MiB |
|    6      2244      C   /home/ubuntu/src/anaconda3/bin/python      10854MiB |
|    7      2244      C   /home/ubuntu/src/anaconda3/bin/python      10854MiB |
+-----------------------------------------------------------------------------+

Is the library designed to utilize all of the GPUs available on the system by default?
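
(For context: the output above is consistent with stock TensorFlow 1.x behaviour, which reserves almost all memory on every visible GPU but, absent explicit multi-GPU placement in the graph, runs ops on /gpu:0 only. Below is a minimal, generic sketch of confining TF to one GPU; it assumes the environment variable can be set before TensorFlow is imported, and it is not a MIDAS-specific API:)

```python
# Generic TF 1.x sketch: restrict TensorFlow to a single GPU so the other
# devices are left free for other processes. The environment variable must
# be set before TensorFlow is imported. This is standard CUDA/TF behaviour,
# not a MIDAS API.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import tensorflow as tf

# Alternatively (or additionally), stop TF from pre-allocating all memory
# on every visible device via the session config:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True        # allocate GPU memory on demand
config.gpu_options.visible_device_list = "0"  # use only the first visible GPU
sess = tf.Session(config=config)
```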

main library versions:
tensorflow 1.4.0rc0
numpy 1.13.3
pandas 0.20.3 py36h6022372_2
Cuda compilation tools, release 9.0, V9.0.176

Thanks in advance.

@MarKo9 Did you figure this out? I am having the same issue.

Nice manners towards people who test your library and bother to share any issues they find, even if those issues turn out to be wrong.
BTW, I was concerned enough that I coded my own solution, just not based on your library.
A MICE-like, xgboost-based custom imputation was considerably more accurate, at least on my dataset (not to mention the speed); you may want to benchmark against it before your next release.
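
(For readers landing here: a MICE-style loop with gradient-boosted trees is roughly the sketch below. This illustrates the general technique, not @MarKo9's actual code; the function name, hyperparameters, and the all-numeric-DataFrame assumption are illustrative:)

```python
# Illustrative MICE-like imputation with XGBoost, sketching the approach
# described above. Not @MarKo9's code; assumes an all-numeric DataFrame.
import pandas as pd
from xgboost import XGBRegressor

def mice_xgb_impute(df: pd.DataFrame, n_rounds: int = 5) -> pd.DataFrame:
    """Iteratively re-impute each incomplete column by regressing it on
    all other columns, MICE-style, with XGBoost as the conditional model."""
    miss_mask = df.isna()
    data = df.fillna(df.mean())  # crude initial fill so every regression has complete inputs
    incomplete_cols = [c for c in df.columns if miss_mask[c].any()]
    for _ in range(n_rounds):
        for col in incomplete_cols:
            observed = ~miss_mask[col]
            X = data.drop(columns=[col])
            model = XGBRegressor(n_estimators=100, max_depth=4)
            model.fit(X[observed], df.loc[observed, col])
            # Overwrite only the originally-missing entries with predictions.
            data.loc[~observed, col] = model.predict(X[~observed])
    return data
```

Note that this produces a single completed dataset; proper multiple imputation would repeat the loop over bootstrapped samples or different seeds to generate several of them.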

@MarKo9 do you have code for this? I would be interested in checking it out. Thanks for the reply @ranjitlall.