keras-team / keras

Deep Learning for humans

Home Page: http://keras.io/

Different behaviour of Keras 2.0.9 and 2.0.8

mdoulaty opened this issue

With a pip installation of Keras 2.0.9 (the latest as of now), importing Keras with the TensorFlow backend allocates all available GPU resources immediately after the import. In 2.0.8, no GPU allocation happened at import time. Is this expected behaviour in 2.0.9?
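
For reference, a minimal way to reproduce what I am seeing, assuming the TensorFlow backend and at least one visible GPU:

# Under 2.0.8 no GPU memory is allocated at this point; under 2.0.9 the
# process claims memory on every visible GPU as soon as the import finishes
# (watch nvidia-smi in a second terminal).
import keras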

I am seeing the same behavior. The following code works for 2.0.8 but not for 2.0.9: with 2.0.9, importing the Keras backend claims all GPU memory by itself, even though an allow_growth session has already been created:

import tensorflow as tf

# Create a session that grows its GPU memory on demand instead of
# allocating everything up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Hand the session to Keras so it does not create its own.
from keras import backend as K
K.set_session(sess)

Maybe this was implemented:
#8311

I am encountering the same problem.

The code here, https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L50

_LOCAL_DEVICES = device_lib.list_local_devices()

attempts to allocate the GPU resources. I am not sure this is expected behaviour.

This is a known issue. The specific method call re-registers all the GPUs/resources instead of just counting the number of available devices. I intend to send a patch over the weekend.

The problem is that device_lib.list_local_devices() initialises a TF session and registers all available GPUs on the system. Judging from the name of the function, I believe this is a bug in TensorFlow: I don't see why listing the devices should require registering them in a session.

Reproducing the problem is tricky, as you also need more than one GPU. Here is a pure TF example that shows the problem:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> 
>>> tf.__version__
'1.4.0'
>>> 
>>> config = tf.ConfigProto()
>>> config.gpu_options.per_process_gpu_memory_fraction = 0.9
>>> config.gpu_options.visible_device_list = str('1')
>>> sess = tf.Session(config=config)
2017-11-03 13:02:14.730453: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-03 13:02:14.966925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:81:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-11-03 13:02:14.967000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)

As we can see, it has registered only GPU 1. This can be confirmed with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:03:05 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     2W /  39W |    613MiB /  4040MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   41C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   399MiB |
|    0      2772      G   compiz                                       200MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

In the same Python shell, let's call the method that lists the available GPUs:

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2017-11-03 13:04:08.611074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:03:00.0
totalMemory: 3.95GiB freeMemory: 3.31GiB
2017-11-03 13:04:08.611198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-11-03 13:04:08.611248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2017-11-03 13:04:08.611263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N 
2017-11-03 13:04:08.611275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y 
2017-11-03 13:04:08.611293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0)
2017-11-03 13:04:08.611315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13475891616555218543
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3240755200
locality {
  bus_id: 1
}
incarnation: 15732487042202847083
physical_device_desc: "device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 147456000
locality {
  bus_id: 2
}
incarnation: 7726238831769587034
physical_device_desc: "device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0"
]

Oops! It just registered both GPUs! Let's confirm with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:04:28 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     4W /  39W |   3740MiB /  4040MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   42C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   410MiB |
|    0      2772      G   compiz                                       200MiB |
|    0     16788      C   python                                      3116MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

As we can see, the process has now also acquired GPU 0 and is using all of its available memory.
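
One way to at least keep this lookup out of Keras' import path would be to defer it until the device list is actually needed and cache the result. A rough sketch of that idea (hypothetical helper, not necessarily what the actual patch will do):

from tensorflow.python.client import device_lib

_LOCAL_DEVICES = None  # populated lazily, on first use

def _get_available_gpus():
    """Return the names of the GPUs visible to this process."""
    # The lookup still registers the GPUs, but only when somebody actually
    # asks for them, which gives the user a chance to restrict visibility
    # (e.g. via CUDA_VISIBLE_DEVICES) or configure a session first.
    global _LOCAL_DEVICES
    if _LOCAL_DEVICES is None:
        _LOCAL_DEVICES = device_lib.list_local_devices()
    return [d.name for d in _LOCAL_DEVICES if d.device_type == 'GPU']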

As a temporary workaround, you can make only specific GPUs visible before any Keras import:

import os

# Must be set before Keras (and therefore TensorFlow) is imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

from keras.engine import Model

This way, Keras will only use the GPU with ID 1.
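
To double-check what the process can actually see after setting the variable, you can list the devices directly from TensorFlow; this is the same call Keras makes internally, so only run it once you are happy with the visibility settings:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before TensorFlow is imported

from tensorflow.python.client import device_lib

# Only the CPU and the selected GPU should appear; the physical GPU with
# ID 1 is exposed inside the process as /device:GPU:0.
print([d.name for d in device_lib.list_local_devices()])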

This affects Horovod as well. Unfortunately, the CUDA_VISIBLE_DEVICES workaround is not desirable there, as it prevents NCCL from using CUDA IPC.

Please take a look at the outstanding fix: #8377

Closing as this is resolved