tensorflow / profiler

A profiling and performance analysis tool for TensorFlow


Inference profiling

ghjeong12 opened this issue

Hi, I would like to ask whether there is any way to use the profiler for inference. I was able to run it for training, but it didn't work for inference (passing a callback to the predict function).


Hi, thank you for your quick answer.

From the guide you suggested, I can see the following code for the programmatic mode:

import tensorflow as tf

tf.profiler.experimental.start('logdir')
# Train the model here
tf.profiler.experimental.stop()

Instead of the training section in the comment, is it enough to put the predict call there to profile inference? Or do you have any sample code for profiling inference? Also, could you tell me which TF version I should be using?
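A minimal sketch of the same programmatic start/stop calls wrapped around inference instead of training. The tiny Sequential model and random input are hypothetical stand-ins for a real model; the warm-up call is an assumption about good practice, meant to keep one-time graph tracing out of the captured window.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model; replace with your own (e.g. a loaded SavedModel).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])
x = np.random.rand(32, 8).astype("float32")

# Warm-up call outside the trace, so one-time function tracing
# does not dominate the captured profile.
model.predict(x, verbose=0)

tf.profiler.experimental.start("logdir")
preds = model.predict(x, verbose=0)
tf.profiler.experimental.stop()

print(preds.shape)  # (32, 4)
```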


I appreciate it, and I was able to profile the inference.

Another quick question: could you tell me what the _FusedConv2D operation type is?

It takes 70% of my inference time (for ResNet50 from tensorflow.keras.applications.resnet50), but I couldn't find any documentation explaining the operation.

Another interesting point is that I couldn't find any Conv2D in the operation names. Instead, a BiasAdd operation (also marked as type _FusedConv2D) took the longest time. Do you think this is reasonable? Where is the Conv2D / MatMul operation time shown?
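As far as I know, _FusedConv2D is an internal op type (not a public API) produced by Grappler's remapper optimizer, which can rewrite a Conv2D followed by BiasAdd (and optionally an activation such as Relu) into one fused kernel. If the fused node keeps the name of the last node it replaces, that would explain why the Conv2D time shows up under a BiasAdd name. A minimal sketch, assuming graph mode via tf.function so that Grappler runs, that produces this pattern for profiling; whether fusion actually happens depends on device, dtype and TF build:

```python
import tensorflow as tf

@tf.function
def conv_bias_relu(x, w, b):
    # Conv2D -> BiasAdd -> Relu: a pattern Grappler's remapper may fuse into
    # one _FusedConv2D kernel (fusion depends on device, dtype and TF build).
    y = tf.nn.conv2d(x, w, strides=1, padding="SAME")
    y = tf.nn.bias_add(y, b)
    return tf.nn.relu(y)

x = tf.random.normal([1, 32, 32, 3])
w = tf.random.normal([3, 3, 3, 16])
b = tf.zeros([16])

conv_bias_relu(x, w, b)  # trace and optimize once, outside the profile

tf.profiler.experimental.start("logdir")
out = conv_bias_relu(x, w, b)
tf.profiler.experimental.stop()

print(out.shape)  # (1, 32, 32, 16)
```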


Hi @ckluk,

I have a similar question. I want to profile not a model's inference but simply a single op. How can I do that?

I basically want to do something like:

import tensorflow as tf

with tf.device('/GPU:0'):
  a = tf.constant([7] * 100000)

with tf.profiler.experimental.Profile("logs"):
  with tf.device('/device:GPU:0'):
    b = tf.math.floormod(a, 3)

This is what I try to do in this colab, where I also enable TensorBoard for visualization.

You can see that the main problem is that the function is executed eagerly, and therefore no GPU use is recorded. This is related to this SO question I wrote.

Basically, I want to be able to compare ops between PyTorch and TensorFlow, to produce benchmarks like this one, which currently suffer from eager execution.

I guess I could always put the op in a model on its own and call predict on it, but that seems overcomplicated.

As a matter of fact, for a model not using Keras layers, it seems that no GPU activity is recorded at all. See for example this colab.

In essence, I am doing the following:

import tensorflow as tf
from tensorflow.keras.models import Model

a = tf.constant([7] * 100000)
a = a[None, :]

class MyModel(Model):
  def call(self, inputs):
    with tf.device('/GPU:0'):
      return inputs + 3

model = MyModel()

with tf.profiler.experimental.Profile("logs"):
  for i in range(1):
    model.predict(a)

EDIT

Actually, it's not even a question of using a Keras layer or not: if I use a Lambda layer containing the op, it doesn't work either.

Actually, it's just a matter of dtype...

If I set the dtype of the constant a to tf.float32, I get the GPU computations as expected, without needing a model.

Maybe I can write another issue for that particular matter.

Anyway I would love to know how to profile without eager execution if possible.
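A minimal sketch of profiling a single op in graph mode rather than eagerly: wrap it in tf.function, warm it up once outside the profiled region, and run several iterations inside. It uses float32 (following the dtype observation above) and falls back to the CPU when no GPU is visible; the function name add_three is hypothetical.

```python
import tensorflow as tf

device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"

with tf.device(device):
    a = tf.constant([7.0] * 100000)  # float32 input

@tf.function
def add_three(x):
    # Traced into a graph once; later calls run the graph, not eager ops.
    return x + 3.0

with tf.device(device):
    add_three(a)  # warm-up: tracing happens here, outside the profile
    with tf.profiler.experimental.Profile("logs"):
        for _ in range(10):
            b = add_three(a)

print(float(b[0]))  # 10.0
```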

EDIT

Actually, for floormod it doesn't work even with dtype tf.float32, so I don't understand what's going on for this op in particular. It might be because tf.math.floormod doesn't have a GPU implementation yet (see this).


Hi @ghjeong12,
I was trying to do the same thing you mentioned above.
I tried to add

tf.profiler.experimental.start('logdir')
# inference part
tf.profiler.experimental.stop()

but I got

2020-07-21 07:44:50.141421: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 0 callback api events and 0 activity events.

May I ask how you got it working?


Hi, I don't see anything wrong with how you invoke the profiler. It should work. Are you sure that your inference code is using GPU? Do you see CPU activities in the profile that you collected?
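One quick way to answer "is the inference code using the GPU?" before profiling is to ask TensorFlow what it sees. A minimal sketch using the standard tf.config and tf.test calls; the tiny matmul is just a hypothetical probe op:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", gpus)

# Run a small probe op and report the device its output was produced on.
with tf.device("/GPU:0" if gpus else "/CPU:0"):
    y = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))
print("Result device:", y.device)
```

An empty GPU list would mean the profiler has no GPU activity to record, regardless of how it is invoked.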

Thank you so much for your reply.

I was using the tensorflow:2.2.0-gpu, 2.3.0rc1-gpu, and nightly-gpu Docker images, but it turns out they all give the same result.

2020-07-21 08:57:12.514005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
2020-07-21 08:57:12.514097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-21 08:57:13.635530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-21 08:57:13.635584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 1 
2020-07-21 08:57:13.635594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N Y 
2020-07-21 08:57:13.635601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1:   Y N 

I don't see any CPU activity either; I get a blank page on the Profile tab in TensorBoard.

The inference code I modified is the predict_dataset function in calamari_ocr/ocr/predictor.py from Calamari OCR:

def predict_dataset(self, dataset, progress_bar=True, tf_profiler=None):
    start_time = time.time()
    with StreamingInputDataset(dataset, self.predictors[0].data_preproc, self.predictors[0].text_postproc, None,
                               processes=self.processes,
                               ) as input_dataset:
        def progress_bar_wrapper(l):
            if progress_bar:
                return tqdm(l, total=int(np.ceil(len(dataset) / self.batch_size)), desc="Prediction")
            else:
                return l

        def batched_data_params():
            batch = []
            for data_idx, (image, _, params) in enumerate(input_dataset.generator(epochs=1)):
                batch.append((data_idx, image, params))
                if len(batch) == self.batch_size:
                    yield batch
                    batch = []

            if len(batch) > 0:
                yield batch

        def process_one_batch(one_batch):
            sample_ids, batch_images, batch_params = zip(*one_batch)
            batch_samples = [dataset.samples()[i] for i in sample_ids]
            batch_prediction = self.predict_raw(batch_images, params=batch_params, progress_bar=False,
                                                apply_preproc=False)
            return batch_prediction, batch_samples

        for batch in progress_bar_wrapper(batched_data_params()):
            if tf_profiler:
                tf.profiler.experimental.start(tf_profiler)  # WHERE PROFILER START
                prediction, samples = process_one_batch(batch)
                tf.profiler.experimental.stop()
            else:
                prediction, samples = process_one_batch(batch)
            for result, sample in zip(prediction, samples):
                yield result, sample

    print("Prediction of {} models took {}s".format(len(self.predictors), time.time() - start_time))
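Starting and stopping the profiler around every batch, as in the code above, produces a separate short profiling session per batch. An alternative, sketched below under the assumption that one continuous window of a few batches is enough, is to wrap the batch iterator so that a single session covers steps start_step through start_step + num_steps - 1. profile_window is a hypothetical helper; start and stop are passed in so it can be exercised without a GPU.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def profile_window(
    batches: Iterable[T],
    logdir: str,
    start_step: int,
    num_steps: int,
    start: Callable[[str], None],
    stop: Callable[[], None],
) -> Iterator[T]:
    """Yield batches unchanged, keeping one profiler session open while the
    caller processes batches start_step .. start_step + num_steps - 1."""
    active = False
    try:
        for step, batch in enumerate(batches):
            if step == start_step:
                start(logdir)
                active = True
            yield batch  # the caller processes this batch while we are suspended
            if active and step == start_step + num_steps - 1:
                stop()
                active = False
    finally:
        # Close the session if iteration ends early or the window runs past the data.
        if active:
            stop()
```

In predict_dataset, the per-batch start/stop could then become a loop over profile_window(progress_bar_wrapper(batched_data_params()), tf_profiler, 2, 5, tf.profiler.experimental.start, tf.profiler.experimental.stop) with a plain process_one_batch(batch) call inside; skipping the first couple of batches keeps warm-up work out of the trace.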

Where is your GPU located? Is it on the same machine that you launch the tensorboard?

Yes, they are in the same machine. It is an instance on GCP with 2 Tesla T4.


Your usage looks fine. It would be helpful if you posted the log from running your model. Is there anything interesting in the log related to profiling?

Hi Qiuminxu,

Thanks for your reply. I checked the log again while running the profiler.

Here is the complete log.

Among the output, I found the following lines that seem related to the profiler. There is an error called CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, which I looked up a little in the NVIDIA documentation.
Since I am running the profiler inside a Docker container, I assumed I have admin privileges (please correct me if this assumption is wrong). So, according to the documentation for CUDA 10.2, I should get CUPTI_SUCCESS.

2020-07-23 10:58:42.644930: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-23 10:58:44.165413: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2020-07-23 10:58:44.167708: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 2 GPUs
2020-07-23 10:58:44.197236: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.1
2020-07-23 10:58:44.299258: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1441] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-07-23 10:58:44.299437: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 0 callback api events and 0 activity events. 
2020-07-23 10:58:44.314346: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: logs/profiler2/plugins/profile/2020_07_23_10_58_44Dumped tool data for xplane.pb to logs/profiler2/plugins/profile/2020_07_23_10_58_44/f8b4d99f5406.xplane.pb
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   63C    P0    19W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This line is causing the error:
2020-07-23 10:58:44.299258: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1441] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

This error occurs because CUPTI 10.1 requires root permission to profile. You can try the possible solutions here or at https://www.tensorflow.org/guide/profiler#resolve_privilege_issues

Hi Qiumin,

Thank you so much for your help. After I added the docker run option --privileged=true, the error went away, but unfortunately I still couldn't see anything on the Profile page.

2020-07-24 10:13:41.001116: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-07-24 10:13:42.340081: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2020-07-24 10:13:42.340176: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 2 GPUs
2020-07-24 10:13:42.341347: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.1
2020-07-24 10:13:42.723911: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
2020-07-24 10:13:42.724007: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 0 callback api events and 0 activity events. 
2020-07-24 10:13:42.728144: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: logs/profiler6/plugins/profile/2020_07_24_10_13_42Dumped tool data for xplane.pb to logs/profiler6/plugins/profile/2020_07_24_10_13_42/44f687585eb8.xplane.pb

The log now shows that the CUPTI activity buffer was flushed, but, as the next line says, GpuTracer has collected 0 callback api events and 0 activity events. I think that's why I don't see anything in TensorBoard.
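Since the decisive signals in this thread are plain log lines, a small script can scan a captured log for them. A sketch that assumes the message formats shown in the logs here (they may differ between TF versions); diagnose_profiler_log is a hypothetical helper:

```python
import re

TRACER_RE = re.compile(
    r"GpuTracer has collected (\d+) callback api events and (\d+) activity events"
)


def diagnose_profiler_log(text: str) -> dict:
    """Summarize the GPU-tracing lines of a TensorFlow profiler log."""
    counts = [(int(cb), int(act)) for cb, act in TRACER_RE.findall(text)]
    return {
        "insufficient_privileges": "CUPTI_ERROR_INSUFFICIENT_PRIVILEGES" in text,
        "tracer_counts": counts,
        # True only if every GpuTracer summary line reported zero activity events.
        "no_gpu_activity": bool(counts) and all(act == 0 for _, act in counts),
    }
```

Applied to the two logs above, it would flag the privilege error in the first and zero activity events in both.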

@edwardpwtsoi did you ever get this resolved? I have a similar issue with this simple code:

import tensorflow as tf

def main():
    with tf.compat.v1.Session() as sess:
        with tf.device("/GPU:0"):
            a = tf.random.uniform((4,4))
            b = tf.random.uniform((4,4))
            c = tf.matmul(a,b)

            tf.profiler.experimental.start("./logs")
            sess.run(c)
            tf.profiler.experimental.stop()

if __name__ == '__main__':
    main()

I get the following output, which seems fine, but nothing shows up in TensorBoard:

2021-01-07 17:08:55.627734: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2021-01-07 17:08:55.627795: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 4 GPUs
2021-01-07 17:08:55.628709: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.1
2021-01-07 17:08:56.372003: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-07 17:08:56.901856: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
2021-01-07 17:08:56.902300: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 5 callback api events and 5 activity events.
2021-01-07 17:08:56.912179: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: ./logs/plugins/profile/2021_01_07_17_08_56
2021-01-07 17:08:56.912569: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for trace.json.gz to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.trace.json.gz
2021-01-07 17:08:56.915778: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: ./logs/plugins/profile/2021_01_07_17_08_56
2021-01-07 17:08:56.916089: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for memory_profile.json.gz to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.memory_profile.json.gz
2021-01-07 17:08:56.916256: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: ./logs/plugins/profile/2021_01_07_17_08_56Dumped tool data for xplane.pb to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.xplane.pb
Dumped tool data for overview_page.pb to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.overview_page.pb
Dumped tool data for input_pipeline.pb to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to ./logs/plugins/profile/2021_01_07_17_08_56/moxli001.kernel_stats.pb
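When TensorBoard shows nothing, it helps to confirm what was actually written to disk. The logs above show the layout logdir/plugins/profile/RUN/; a sketch that lists each run and its files (list_profile_runs is a hypothetical helper). As far as I know, TensorBoard's --logdir should point at the top-level log directory itself, not at plugins/profile.

```python
from pathlib import Path


def list_profile_runs(logdir: str) -> dict:
    """Map each run under <logdir>/plugins/profile to its sorted file names."""
    root = Path(logdir) / "plugins" / "profile"
    if not root.is_dir():
        return {}
    return {
        run.name: sorted(f.name for f in run.iterdir() if f.is_file())
        for run in sorted(root.iterdir())
        if run.is_dir()
    }
```

For the run above, list_profile_runs("./logs") should list the trace.json.gz, memory_profile.json.gz and *.pb files named in the log.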

Hi @qiuminxu @ckluk, I think I'm the third person to hit "nothing shows up on the profile page".

Here's my code and logs:

import tensorflow as tf
import pandas as pd
import numpy as np


mo = tf.keras.models.load_model('/home/allxu/Desktop/tensorflow/model_1')
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

x_train_2d = x_train.reshape(x_train.shape[0], x_train.shape[1]* x_train.shape[2])
x_train_2d_60k = x_train_2d


from datetime import datetime
logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")

#  Straight and Naive
tf.profiler.experimental.start(logs)
out_60k = mo(x_train_2d_60k)
tf.profiler.experimental.stop()

Logs:

2021-04-05 23:53:30.102056: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2021-04-05 23:53:30.102078: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2021-04-05 23:53:30.102103: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2021-04-05 23:53:30.102712: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.11.0
2021-04-05 23:53:30.403486: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-04-05 23:53:30.672671: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-04-05 23:53:30.686260: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2021-04-05 23:53:30.686424: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2021-04-05 23:53:30.839188: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:228]  GpuTracer has collected 9 callback api events and 9 activity events.
2021-04-05 23:53:30.840027: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2021-04-05 23:53:30.841110: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logs/20210405-235328/plugins/profile/2021_04_05_23_53_30
2021-04-05 23:53:30.841501: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.trace.json.gz
2021-04-05 23:53:30.843534: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: logs/20210405-235328/plugins/profile/2021_04_05_23_53_30
2021-04-05 23:53:30.843732: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for memory_profile.json.gz to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.memory_profile.json.gz
2021-04-05 23:53:30.843874: I tensorflow/core/profiler/rpc/client/capture_profile.cc:251] Creating directory: logs/20210405-235328/plugins/profile/2021_04_05_23_53_30Dumped tool data for xplane.pb to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.xplane.pb
Dumped tool data for overview_page.pb to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/20210405-235328/plugins/profile/2021_04_05_23_53_30/allxu-pc.kernel_stats.pb

I encountered the same issue. Are there any workarounds?

The logs show some events being collected, but there is no "Profile" tab in TensorBoard; it says "No dashboards are active for the current data set."

Code:

root=tf.saved_model.load(...)
infer = root.signatures['serving_default']
tf.profiler.experimental.start('/tmp/tensorboard')
infer(...)
tf.profiler.experimental.stop()

TensorBoard command:

tensorboard --logdir /tmp/tensorboard --port 6006 --bind_all

Logs:

2021-09-23 16:46:29.486077: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
2021-09-23 16:46:29.552133: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 1245 callback api events and 1245 activity events.
2021-09-23 16:46:29.699533: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29
2021-09-23 16:46:29.748353: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for trace.json.gz to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.trace.json.gz
2021-09-23 16:46:29.916101: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29
2021-09-23 16:46:29.927338: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for memory_profile.json.gz to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.memory_profile.json.gz
2021-09-23 16:46:29.933658: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29Dumped tool data for xplane.pb to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.xplane.pb
Dumped tool data for overview_page.pb to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.overview_page.pb
Dumped tool data for input_pipeline.pb to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to /tmp/tensorboard/plugins/profile/2021_09_23_16_46_29/nvidia-desktop.kernel_stats.pb
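"No dashboards are active for the current data set" is what TensorBoard shows when none of its installed plugins find data to display. One possible cause, stated here as an assumption since nobody in the thread confirms it, is that the Profile dashboard ships as a separate pip package, tensorboard_plugin_profile, which must be installed in the same environment that runs TensorBoard. A sketch to check that (has_module is a hypothetical helper):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` is importable in the current environment."""
    return importlib.util.find_spec(name) is not None


for mod in ("tensorboard", "tensorboard_plugin_profile"):
    print(f"{mod}: {'installed' if has_module(mod) else 'missing'}")
```

If the second line prints "missing", installing the plugin and restarting TensorBoard is worth trying.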

@ckluk Hi, I'm also getting 'No dashboards are active for the current data set.'
My inference and profiling code:

tf.profiler.experimental.start('logdir')
result = session.run(output_data, feed_dict={input_t1: input_data})
tf.profiler.experimental.stop()

Then I open TensorBoard:
tensorboard --logdir=./logdir

Attached: logdir.zip