tensorflow / profiler

A profiling and performance analysis tool for TensorFlow

Memory breakdown table: invalid data type, no region type and no shape for operations in the memory profile tab

RocaVincent opened this issue

Hi,

I use the profiler with TensorFlow 2 and an NVIDIA Quadro RTX 6000 GPU. When I inspect the memory usage of my model, some operations in the memory breakdown table of the memory profile tab show an invalid data type, no shape, and no region type. I'm wondering whether this is normal or whether I should be concerned about it for my model. Below is a minimal code sample that reproduces this kind of result.

import tensorflow as tf
import keras

IMAGE_SHAPE = [256,256,3]

def Discriminator():
    return keras.Sequential([
        keras.layers.Flatten(input_shape=IMAGE_SHAPE),
        keras.layers.Dense(1, activation="sigmoid")
    ])

def Generator():
    return keras.Sequential([
        keras.layers.Conv2D(filters=IMAGE_SHAPE[-1], kernel_size=3, strides=1, padding="same", use_bias=False,
                           input_shape=IMAGE_SHAPE)
    ])

generator_BtoA = Generator()
discriminator_A = Discriminator()

loss_obj = keras.losses.MeanSquaredError()

discriminator_A_optimizer = keras.optimizers.Adam(0.0002)

BATCH_SIZE = 32

@tf.function
def train_step():
    # discriminator training
    imagesA = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE)
    imagesB = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE)
    fakesA = generator_BtoA(imagesB, training=False)
    with tf.GradientTape(persistent=True) as tape:
        disc_fakesA = discriminator_A(fakesA, training=True)
        discA_loss = loss_obj(tf.zeros_like(disc_fakesA), disc_fakesA)
    gradients_discA = tape.gradient(discA_loss, discriminator_A.trainable_variables)
    discriminator_A_optimizer.apply_gradients(zip(gradients_discA, discriminator_A.trainable_variables))


from tensorflow.profiler.experimental import Trace as Trace_profiler, start as start_profiler, stop as stop_profiler

start_profiler("toy_logdir/")
with Trace_profiler("train", step_num=1, _r=-1):
    train_step()
stop_profiler()

With this code, I get the following results in the memory profile tab:

| Op Name | Allocation Size (GiBs) | Requested Size (GiBs) | Occurrences | Region type | Data type | Shape |
|---|---|---|---|---|---|---|
| sequential/conv2d/Conv2D | 0.227 | 0.227 | 1 | | INVALID | |
| sequential/conv2d/Conv2D | 0.039 | 0.039 | 1 | | INVALID | |
| sequential/conv2d/Conv2D | 0.023 | 0.023 | 1 | output | float | [32,3,256,256] |
| sequential/conv2d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer | 0.023 | 0.023 | 1 | output | float | [32,3,256,256] |

Do you know why I get these strange results?

Thanks for reporting and especially for the reproducer.

I think the two memory allocations with INVALID data type are real and actually consume memory. They may come from the implementation of Conv2D, and for some unknown reason their (region type, data type, shape) cannot be inferred. I have filed a bug internally to investigate further.
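If the unlabeled Conv2D allocations do turn out to be scratch buffers from the convolution implementation (an assumption; cuDNN allocates workspace memory for its algorithms), one knob that can bound their size is the `TF_CUDNN_WORKSPACE_LIMIT_IN_MB` environment variable. A minimal sketch, assuming this is the cause; the variable must be set before TensorFlow initializes the GPU:

```python
import os

# Assumption: the INVALID-typed Conv2D allocations are cuDNN workspace
# buffers. This variable caps the scratch space cuDNN may request (in MiB)
# and must be set before TensorFlow is imported / the GPU is initialized.
os.environ["TF_CUDNN_WORKSPACE_LIMIT_IN_MB"] = "1024"

# import tensorflow as tf  # import only after setting the variable
```

Note that constraining the workspace can make cuDNN pick a slower convolution algorithm, so this trades speed for a tighter memory bound.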

Hi RocaVincent,
In your code reproducing this issue, you have
imagesA = tf.random.uniform([BATCH_SIZE]+IMAGE_SHAPE)
but imagesA is not used by any code afterwards.

I think the invalid data type in the memory breakdown table is related to this unused code, which leaves TF unable to infer the data type.
When I remove this line, the memory allocation table looks normal:

| Op Name | Allocation Size (GiBs) | Requested Size (GiBs) | Occurrences | Region type | Data type | Shape |
|---|---|---|---|---|---|---|
| sequential_2/conv2d_1/Conv2D | 0.125 | 0.105 | 1 | temp | uint8 | [113247504] |
| sequential_2/conv2d_1/Conv2D | 0.031 | 0.023 | 1 | output | float | [32,3,256,256] |
| sequential_2/conv2d_1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer | 0.031 | 0.023 | 1 | output | float | [32,3,256,256] |
| preallocated/unknown | 0.002 | 0.002 | 1 | persist/dynamic | INVALID | unknown |

Hi @Terranlee
Thank you for the answer.

When I remove this line, I get exactly the same memory breakdown table as the first one I showed. In addition, the bug concerns convolution operations, which are not linked to this tensor creation. What also surprises me is that you get a uint8 data type for your first operation.

I'm a user, not a contributor, but if you end up needing more memory, my current workaround is to explicitly limit the memory usage to a precalculated estimate of the model's size.
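The workaround above can be sketched as follows. The sizing heuristic (4 bytes per float32 parameter, multiplied by a safety factor for gradients, optimizer slots, and activations) is an assumption for illustration, not an exact formula, and `estimate_memory_mib` is a hypothetical helper:

```python
def estimate_memory_mib(num_params, bytes_per_param=4, overhead_factor=4.0):
    """Rough upper bound in MiB: weights + gradients + optimizer slots + activations.

    The 4x overhead_factor is a guess; tune it for your model and optimizer.
    """
    return int(num_params * bytes_per_param * overhead_factor / (1024 * 1024))

# Example: a model with ~25M float32 parameters -> a few hundred MiB budget.
budget = estimate_memory_mib(25_000_000)

# Applying the cap (requires a GPU; must run before any tensors are created):
# import tensorflow as tf
# gpus = tf.config.list_physical_devices("GPU")
# if gpus:
#     tf.config.set_logical_device_configuration(
#         gpus[0],
#         [tf.config.LogicalDeviceConfiguration(memory_limit=budget)])
```

As noted in the reply below, this only caps what TensorFlow may allocate; it doesn't add capacity or explain the profiler's output.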

Hi @parkournerd
This solution doesn't increase GPU capacity, and it doesn't explain the strange results reported by the profiler.