Different behaviour of Keras 2.0.9 and 2.0.8
mdoulaty opened this issue · comments
With a pip installation of Keras, importing Keras with the TensorFlow backend under 2.0.9 (the latest as of now) allocates all available GPU resources immediately at import time. Under 2.0.8, no GPU allocation happened on import. Is this expected behaviour in 2.0.9?
I am seeing the same behavior. The following code works under 2.0.8 but not 2.0.9:
import tensorflow as tf

# Ask TensorFlow to grow GPU memory on demand rather than grabbing it all.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Hand the pre-configured session to Keras before it creates its own.
from keras import backend as K
K.set_session(sess)
Maybe this was implemented:
#8311
I encountered the same problem.
The code here https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L50
_LOCAL_DEVICES = device_lib.list_local_devices()
attempts to allocate the GPU resources. I'm not sure this is expected behavior.
This is a known issue. The specific method call re-registers all the GPUs/resources instead of just counting the number of available devices. I intend to send a patch over the weekend.
The problem is that device_lib.list_local_devices() initialises a TF session and registers all available GPUs on the system. Judging from the function's name, I believe this is a bug in TensorFlow: I don't see why listing the devices should require registering them in a session.
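To sketch the distinction, a process could in principle count GPUs without ever creating a TF session, for example by parsing `nvidia-smi -L`. The helper below (`count_gpus_without_tf` is a hypothetical name, not part of Keras or TensorFlow, and the sample output is illustrative) shows the idea: counting devices need not touch them.

```python
import re
import subprocess

def count_gpus_without_tf(smi_output=None):
    """Count GPUs by parsing `nvidia-smi -L` output.

    Hypothetical workaround sketch: unlike device_lib.list_local_devices(),
    this never creates a TF session, so no GPU memory is registered.
    Pass `smi_output` directly for testing; otherwise shell out to nvidia-smi.
    """
    if smi_output is None:
        smi_output = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    # nvidia-smi -L prints one line per device, e.g.
    # "GPU 0: Quadro K2200 (UUID: GPU-...)"
    return len(re.findall(r"^GPU \d+:", smi_output, flags=re.M))

sample = (
    "GPU 0: Quadro K2200 (UUID: GPU-aaaa)\n"
    "GPU 1: Quadro K2200 (UUID: GPU-bbbb)\n"
)
print(count_gpus_without_tf(sample))  # -> 2
```

This only illustrates why listing should be separable from registering; the actual fix has to live inside the backend code.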
Reproducing the problem is tricky, as you also need more than one GPU. Here is a pure TF example that shows the problem:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
>>> tf.__version__
'1.4.0'
>>>
>>> config = tf.ConfigProto()
>>> config.gpu_options.per_process_gpu_memory_fraction = 0.9
>>> config.gpu_options.visible_device_list = str('1')
>>> sess = tf.Session(config=config)
2017-11-03 13:02:14.730453: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-03 13:02:14.966925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:81:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-11-03 13:02:14.967000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)
As we can see, it has registered only GPU 1. This can be confirmed with nvidia-smi:
$ nvidia-smi
Fri Nov  3 13:03:05 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     2W /  39W |    613MiB /  4040MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   41C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397    G    /usr/bin/X                                    399MiB |
|    0      2772    G    compiz                                        200MiB |
|    1     16788    C    python                                       3664MiB |
+-----------------------------------------------------------------------------+
In the same Python shell, let's call the method that lists the available GPUs:
>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2017-11-03 13:04:08.611074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:03:00.0
totalMemory: 3.95GiB freeMemory: 3.31GiB
2017-11-03 13:04:08.611198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-11-03 13:04:08.611248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2017-11-03 13:04:08.611263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y N
2017-11-03 13:04:08.611275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: N Y
2017-11-03 13:04:08.611293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0)
2017-11-03 13:04:08.611315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13475891616555218543
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3240755200
locality {
bus_id: 1
}
incarnation: 15732487042202847083
physical_device_desc: "device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 147456000
locality {
bus_id: 2
}
incarnation: 7726238831769587034
physical_device_desc: "device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0"
]
Oops! It just registered both GPUs! Let's confirm with nvidia-smi:
$ nvidia-smi
Fri Nov  3 13:04:28 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     4W /  39W |   3740MiB /  4040MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   42C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397    G    /usr/bin/X                                    410MiB |
|    0      2772    G    compiz                                        200MiB |
|    0     16788    C    python                                       3116MiB |
|    1     16788    C    python                                       3664MiB |
+-----------------------------------------------------------------------------+
As we can see, the process has now also acquired GPU 0 and is using all of its available resources.
To work around this temporarily, you can make only specific GPUs visible before any Keras import:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
from keras.engine import Model
This way, Keras will only use the GPU with ID 1.
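The workaround above can be wrapped in a small helper (`restrict_gpus` is a hypothetical name, not a Keras or TensorFlow API). The key constraint is that the environment variable must be set before the first TensorFlow/Keras import, because CUDA reads it only once at initialisation.

```python
import os

def restrict_gpus(device_ids):
    """Expose only the given GPU ids to CUDA (hypothetical helper).

    Must be called before any TensorFlow/Keras import; setting the
    variable afterwards has no effect on an already-initialised process.
    """
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in device_ids)

restrict_gpus([1])
print(os.environ['CUDA_VISIBLE_DEVICES'])  # -> 1
# ...only now import keras:
# from keras.engine import Model
```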
This affects Horovod as well. Unfortunately, the CUDA_VISIBLE_DEVICES workaround is not desirable there, as it prevents NCCL from doing CUDA IPC.
Please take a look at the outstanding fix: #8377
Closing as this is resolved.