keras-team / keras

Deep Learning for humans

Home Page: http://keras.io/

Different behaviour of Keras 2.0.9 and 2.0.8

mdoulaty opened this issue

With a pip installation of Keras 2.0.9 (the latest as of now), importing Keras with the TensorFlow backend allocates all available GPU resources immediately after the import. In 2.0.8, no GPU allocation happened at import time. Is this expected behaviour in 2.0.9?
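
For reference, a minimal way to reproduce what I am seeing, assuming the TensorFlow backend and at least one visible GPU:

# Under 2.0.8 no GPU memory is allocated at this point; under 2.0.9 the
# process claims memory on every visible GPU as soon as the import finishes
# (watch nvidia-smi in a second terminal).
import keras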

I am seeing the same behavior. The following code works for 2.0.8 but not for 2.0.9: with 2.0.9, importing the Keras backend claims all GPU memory by itself, even though an allow_growth session has already been created:

import tensorflow as tf

# Create a session that grows its GPU memory on demand instead of
# allocating everything up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Hand the session to Keras so it does not create its own.
from keras import backend as K
K.set_session(sess)

Maybe this was implemented:
#8311

I am encountering the same problem.

The code here, https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L50

_LOCAL_DEVICES = device_lib.list_local_devices()

attempts to allocate the GPU resources. I am not sure this is expected behaviour.

This is a known issue. The specific method call re-registers all the GPUs/resources instead of just counting the number of available devices. I intend to send a patch over the weekend.

The problem is that device_lib.list_local_devices() initialises a TF session and registers all available GPUs on the system. Judging from the name of the function, I believe this is a bug in TensorFlow: I don't see why listing the devices should require registering them in a session.

Reproducing the problem is tricky, as you also need more than one GPU. Here is a pure TF example that shows the problem:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> 
>>> tf.__version__
'1.4.0'
>>> 
>>> config = tf.ConfigProto()
>>> config.gpu_options.per_process_gpu_memory_fraction = 0.9
>>> config.gpu_options.visible_device_list = str('1')
>>> sess = tf.Session(config=config)
2017-11-03 13:02:14.730453: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-03 13:02:14.966925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:81:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-11-03 13:02:14.967000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)

As we can see, it has registered only GPU 1. This can be confirmed with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:03:05 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     2W /  39W |    613MiB /  4040MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   41C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   399MiB |
|    0      2772      G   compiz                                       200MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

In the same Python shell, let's call the method that lists the available GPUs:

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2017-11-03 13:04:08.611074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:03:00.0
totalMemory: 3.95GiB freeMemory: 3.31GiB
2017-11-03 13:04:08.611198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-11-03 13:04:08.611248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2017-11-03 13:04:08.611263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N 
2017-11-03 13:04:08.611275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y 
2017-11-03 13:04:08.611293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0)
2017-11-03 13:04:08.611315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13475891616555218543
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3240755200
locality {
  bus_id: 1
}
incarnation: 15732487042202847083
physical_device_desc: "device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 147456000
locality {
  bus_id: 2
}
incarnation: 7726238831769587034
physical_device_desc: "device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0"
]

Oops! It just registered both GPUs! Let's confirm with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:04:28 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     4W /  39W |   3740MiB /  4040MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   42C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   410MiB |
|    0      2772      G   compiz                                       200MiB |
|    0     16788      C   python                                      3116MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

As we can see, the process has now also acquired GPU 0 and is using all of its available memory.
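
One way to at least keep this lookup out of Keras' import path would be to defer it until the device list is actually needed and cache the result. A rough sketch of that idea (hypothetical helper, not necessarily what the actual patch will do):

from tensorflow.python.client import device_lib

_LOCAL_DEVICES = None  # populated lazily, on first use

def _get_available_gpus():
    """Return the names of the GPUs visible to this process."""
    # The lookup still registers the GPUs, but only when somebody actually
    # asks for them, which gives the user a chance to restrict visibility
    # (e.g. via CUDA_VISIBLE_DEVICES) or configure a session first.
    global _LOCAL_DEVICES
    if _LOCAL_DEVICES is None:
        _LOCAL_DEVICES = device_lib.list_local_devices()
    return [d.name for d in _LOCAL_DEVICES if d.device_type == 'GPU']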

As a temporary workaround, you can make only specific GPUs visible before any Keras import:

import os

# Must be set before Keras (and therefore TensorFlow) is imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

from keras.engine import Model

This way, Keras will only use the GPU with ID 1.
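
To double-check what the process can actually see after setting the variable, you can list the devices directly from TensorFlow; this is the same call Keras makes internally, so only run it once you are happy with the visibility settings:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before TensorFlow is imported

from tensorflow.python.client import device_lib

# Only the CPU and the selected GPU should appear; the physical GPU with
# ID 1 is exposed inside the process as /device:GPU:0.
print([d.name for d in device_lib.list_local_devices()])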

This affects Horovod as well. Unfortunately, the CUDA_VISIBLE_DEVICES workaround is not desirable there, as it prevents NCCL from using CUDA IPC.

Please take a look at the outstanding fix: #8377

Closing as this is resolved