pytorch / kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

On demand profiling example / code changes

shradhasehgal opened this issue · comments

Hi, is there an example of how we can enable on-demand profiling with kineto?
The libkineto README mentions that we can send a 'signal' to 'trigger' on-demand profiling, but I am unclear on how to do so from outside the PyTorch script.

I would highly appreciate it if somebody could provide an example or point me to the relevant APIs / source files. Thank you!

I also tried using the third-party library Dynolog but ran into dependency issues during installation.
Since it uses Kineto under the hood, I was wondering whether there is a way to implement on-demand profiling with Kineto directly, rather than using the Dynolog library on top of it.

@shradhasehgal Yes, the README needs a lot more details and updates.
To enable profiling via a signal, you first need to set the config for it.
When kineto starts up, it will read this file:

echo "ENABLE_SIGUSR2=YES" >  /etc/libkineto.conf

When you run your application, set the env variable: export KINETO_USE_DAEMON=1
Then, while the application is running, you can send it a SIGUSR2:

kill -USR2 <pid>

This should dump the trace to /tmp/libkineto.<>_.json.
PS: you can pass additional config options for your on-demand run by populating /tmp/libkineto.conf.
Please let us know if this works.
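Putting those steps together, here is a minimal shell sketch of the flow. The conf path under /tmp and the sleep workload are stand-ins so the sketch can run anywhere; in real use you would write ENABLE_SIGUSR2=YES to /etc/libkineto.conf and start your actual kineto-linked PyTorch script.

```shell
# Sketch of the signal-based flow; `sleep` stands in for a kineto-linked
# PyTorch script (a plain sleep simply exits on the unhandled signal).
CONF=/tmp/libkineto_demo.conf          # real flow: /etc/libkineto.conf
echo "ENABLE_SIGUSR2=YES" > "$CONF"
export KINETO_USE_DAEMON=1

sleep 300 &                            # placeholder workload
APP_PID=$!
kill -USR2 "$APP_PID"                  # with kineto, this starts the trace
wait "$APP_PID" 2>/dev/null            # reap the placeholder

cat "$CONF"                            # prints: ENABLE_SIGUSR2=YES
```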

@shradhasehgal Could you also explain the issue you saw with dynolog?
The flow for collecting traces is much simpler with dynolog (assuming you are using a Linux-based OS):
https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md#triggering-an-on-demand-trace

cc @anupambhatnagar

You don't need to set up all these files, and it will also let you configure the trace duration, which process to target, and so on.

Hi @briancoutinho thanks for your reply! I tried building Dynolog from source. Although I am able to start the Dynolog server, I am not able to capture the traces.

Here's what all I tried:
I explicitly set the environment variable export KINETO_USE_DAEMON=1 and then ran the python script. I also tried running ./build/bin/dyno gputrace --pids <pid>, but I get the 'no processes matched' error.

Hey Brian, thanks for your answer. I wish to enable on-demand profiling so that I can trigger the start and stop of the profiler from outside a Python job.

Along with the file setup you mentioned above, should I declare a sigusr2_handler in the python script I wish to profile? So we would initialize the profiler in the handler the first time we receive a SIGUSR2 signal and stop the profiler the next time we receive it - is my understanding correct?

Hi @shradhasehgal,
On dynolog:

I explicitly set the environment variable export KINETO_USE_DAEMON=1 and then ran the python script. I also tried running ./build/bin/dyno gputrace --pids <pid>, but I get the 'no processes matched' error.

When you run with the environment variable set, do you see the line 'Registering daemon config loader'? This does require a PyTorch version from the 2.0 release onward.

If so, running the command ./build/bin/dyno gputrace in another terminal/shell should collect the GPU trace for you. This assumes the dynolog server is already running, for example via systemctl.
Let me know if you have any questions.

PS: You can always download the binaries from a dynolog release instead of building from source:
- https://github.com/facebookincubator/dynolog/tree/main#installation
- https://github.com/facebookincubator/dynolog/tree/main#no-sudo-access
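Condensed into a shell sketch (commands as given in this thread; the workload and the dyno invocation are shown as comments because they need a GPU box with a running dynolog server, and your_script.py is a placeholder name):

```shell
# Dynolog-based flow; assumes Linux, PyTorch >= 2.0, and a dynolog server
# already running (e.g. via systemctl).
export KINETO_USE_DAEMON=1
# Terminal 1 - run the workload and check its output for the line
# "Registering daemon config loader":
#   python your_script.py
# Terminal 2 - request a GPU trace:
#   ./build/bin/dyno gputrace --log-file /tmp/libkineto_trace.json
echo "daemon flag: $KINETO_USE_DAEMON"
```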

On your question about the SIGUSR2 approach (hoping you are able to get dynolog to work :)):

Along with the set up of the files you mentioned above, should I declare a sigusr2_handler in the python script I wish to profile?

You need not set up a SIGUSR2 handler; Kineto will do that when it sees ENABLE_SIGUSR2=YES in the config file.

So we would initialize the profiler in the handler the first time we receive a sigusr2 signal and stop the profiler the next time we receive it - is my understanding correct?

Yes, that is correct. Kineto/PyTorch will start the trace when you send it a sigusr2. Please let me know if either approach above works.

Hey Brian, thank you for your reply! You're right, I was previously on PyTorch version 1.13 and porting to 2.0 made it work.

However, due to the application that I am building, I do require torch version < 2.0.
Would there be any workarounds to make Dynolog work in such a case? Wondering what functionality of torch 2.0 enabled the registration of the daemon config loader.

Does this method also require torch version >= 2.0?
On torch 1.13, if I send a SIGUSR2 signal to the PyTorch job, it kills the job and outputs 'User defined signal 2'.

Would there be any other way to enable on-demand profiling using an older torch version? I was thinking we could set up a SIGUSR2 handler that toggles the state of the torch profiler (calling profiler.start() and profiler.stop() accordingly).
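For what it's worth, that handler-toggle idea can be sketched with the standard signal module. DummyProfiler below is a hypothetical stand-in for torch.profiler.profile (a real profiler would also export the trace on stop); the toggle logic itself does not depend on torch.

```python
import signal

class DummyProfiler:
    """Stand-in for torch.profiler.profile; records start/stop calls."""
    def __init__(self):
        self.running = False
    def start(self):
        self.running = True
    def stop(self):
        # A real torch profiler would export the trace here, e.g. via
        # export_chrome_trace("/tmp/trace.json").
        self.running = False

profiler = DummyProfiler()

def toggle_profiler(signum, frame):
    # First SIGUSR2 starts the profiler, the next one stops it.
    if profiler.running:
        profiler.stop()
    else:
        profiler.start()

signal.signal(signal.SIGUSR2, toggle_profiler)

# Simulate two external `kill -USR2 <pid>` deliveries:
signal.raise_signal(signal.SIGUSR2)
print(profiler.running)   # True
signal.raise_signal(signal.SIGUSR2)
print(profiler.running)   # False
```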

@shradhasehgal

Would there be any workarounds to make Dynolog work in such a case? Wondering what functionality of torch 2.0 enabled the registration of the daemon config loader.

Oh, it has nothing to do with 2.0 as such; unfortunately that is just when our commits landed in PyTorch, so versions from 2.0 onward make it easier to run this flow with both dynolog and SIGUSR2.
pytorch/pytorch@f4b804e
The reason we need these commits is to initialize the kineto library, which otherwise happens lazily.

Here is one trick you could try. Somewhere near the start of your PyTorch program, just call the profiler once. See this example code: https://github.com/facebookresearch/param/blob/bbd06456832b188777ca1d91cfe0bad751f93fdc/train/compute/python/pytorch/run_benchmark.py#L284-L290
Now kineto should be initialized.

You can then try sending SIGUSR2 to the program as discussed above.

[Screenshot, 2023-06-22: Kineto 'Failed to parse config' error output]

Hi @briancoutinho, I ran the code with the function you shared above. However, when I send the SIGUSR2 signal, it leads to the error "Failed to parse config: Invalid PROFILE_START_TIME: 00:00:00 - start time is more than 10s in the past ; line: PROFILE_START_TIME=0" (see screenshot) and does not generate traces.

I ensured that I let the script run for sufficient time and then sent the SIGUSR2 signal, so Kineto had sufficient time for initialization.

Here is the code I ran, in case it's needed:

import math
import os
import time

import torch
import torch.profiler

with torch.autograd.profiler.profile(
    enabled=True,
    use_cuda=True,
    use_kineto=True,
) as _:
    print("Running dummy profiler warmup for CUPTI.")

print(os.getpid())

dtype = torch.float
device = torch.device("cuda:0")  # Uncomment this to run on GPU

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

p = torch.tensor([1, 2, 3], device=device)
xx = x.unsqueeze(-1).pow(p)
print(xx.device)
model = torch.nn.Sequential(torch.nn.Linear(3, 1).to(device), torch.nn.Flatten(0, 1))
loss_fn = torch.nn.MSELoss(reduction="sum")

learning_rate = 1e-6
for t in range(20000000):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 10000 == 99:
        time.sleep(200000)

    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
linear_layer = model[0]

@shradhasehgal I could not reproduce the error using your test code on trunk, i.e. I could collect traces using SIGUSR2.
Could you try this out: you can add options for the signal-based trace by putting this in /tmp/libkineto.conf:

PROFILE_START_TIME=0
ACTIVITIES_DURATION_MSECS=1000

You can change the trace duration if you like; it is set to 1 second for now. PROFILE_START_TIME=0 makes kineto automatically fill in a reasonable start time. Please let us know if this works.

I'll take a stab at using the torch 1.13 release to see if the issue pops up there. Just to confirm, which of these versions are you using?
https://github.com/pytorch/pytorch/releases/tag/v1.13.1 or https://github.com/pytorch/pytorch/releases/tag/v1.13.0
I found a bugfix from March 2022 in this area (#554), but this should be included in torch 1.13.

@briancoutinho I'm having trouble enabling stack, memory, and module tracing through the on-demand config file in /tmp/libkineto.conf:

# /tmp/libkineto.conf
PROFILE_REPORT_INPUT_SHAPES=true
PROFILE_PROFILE_MEMORY=true
PROFILE_WITH_STACK=true
PROFILE_WITH_FLOPS=true
PROFILE_WITH_MODULES=true

I run scripts/pytorch/linear_model_example.py from dynolog after doing export KINETO_USE_DAEMON=1, which gives me 'Registering daemon config loader'.

Running in a separate terminal on the same machine:

dyno gputrace --log-file /tmp/libkineto_trace.json

which gives me

Kineto config = 
ACTIVITIES_LOG_FILE=/tmp/libkineto_trace.json\nPROFILE_START_TIME=0\nACTIVITIES_DURATION_MSECS=500
response length = 143
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[19837],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[19837]}
Matched 1 processes
Trace output files will be written to:
    /tmp/libkineto_trace_19837.json

but the output JSON contains no shape, memory, or stack information.

Hi @briancoutinho, I am using torch 1.12: https://pytorch.org/blog/pytorch-1.12-released/

Is the on-demand tracing feature not available with it?

@shradhasehgal Yes, please try torch 1.13; you are likely hitting the bug, and it is fixed in 1.13.