pytorch / kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

torch-tb-profiler TypeError: list indices must be integers or slices, not str

RaulPPelaez opened this issue

I am unable to visualize a torch profiler trace in TensorBoard using torch-tb-profiler. In the pytorch_profiler tab of the TensorBoard dashboard, the interface hangs indefinitely while loading the trace data. The trace produced by this minimal example fails for me:

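    import torch

    a = torch.ones((10, 10))
    with torch.profiler.profile(
            # Each cycle: skip 1 step, warm up for 1, record 3; run 2 cycles.
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
            profile_memory=True,
            record_shapes=True,
            with_stack=True,
            # Called after each recorded cycle; writes a trace file under ./log/logfile.
            on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/logfile')
    ) as p:
        for _ in range(100):
            b = a * a
            p.step()  # mark the end of one profiler step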

I then start tensorboard with:

    $ tensorboard --logdir=./log/logfile

After I navigate to the pytorch_profiler tab in Chrome, TensorBoard prints this error to the terminal:

    W0223 13:42:06.020374 140055204267840 loader.py:102] Failed to parse profile data for Run logfile on host_1553185. Exception=list indices must be integers or slices, not str
    Traceback (most recent call last):
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/loader.py", line 88, in _process_data
        data = RunProfileData.parse(worker, span, local_file, self.caches.cache_dir)
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 101, in parse
        trace_path, trace_json = RunProfileData._preprocess_file(path, cache_dir)
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 142, in _preprocess_file
        event_list = trace_json['traceEvents']
    TypeError: list indices must be integers or slices, not str
    W0223 13:42:06.020555 139911474751296 loader.py:102] Failed to parse profile data for Run logfile on host_1553185. Exception=list indices must be integers or slices, not str
    Traceback (most recent call last):
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/loader.py", line 88, in _process_data
        data = RunProfileData.parse(worker, span, local_file, self.caches.cache_dir)
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 101, in parse
        trace_path, trace_json = RunProfileData._preprocess_file(path, cache_dir)
      File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 142, in _preprocess_file
        event_list = trace_json['traceEvents']
    TypeError: list indices must be integers or slices, not str
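The failing expression in data.py is `trace_json['traceEvents']`. The minimal sketch below uses only the standard `json` module, not torch-tb-profiler itself. It shows that indexing a parsed JSON array with a string key raises exactly this TypeError. The one-event trace string is trimmed from the file quoted in this issue.

```python
import json

# A trace whose top level is a JSON array (the shape reported in this issue),
# not a JSON object with a "traceEvents" key.
bare_list_trace = (
    '[{"name": "aten::mul", "ph": "X", "ts": 549, "dur": 3, '
    '"tid": 1, "pid": "CPU functions", "args": {}}]'
)

trace_json = json.loads(bare_list_trace)  # parses to a Python list
error_message = None
try:
    event_list = trace_json['traceEvents']  # same expression as data.py line 142
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # list indices must be integers or slices, not str
```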

The JSON file produced by the Python code above contains:

    [
      {"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 519, "dur": 4, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::zero_", "ph": "X", "ts": 527, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "ProfilerStep#1", "ph": "X", "ts": 541, "dur": 45, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 542, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::mul", "ph": "X", "ts": 549, "dur": 3, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::zeros", "ph": "X", "ts": 604, "dur": 3, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 605, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::zero_", "ph": "X", "ts": 607, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "ProfilerStep#2", "ph": "X", "ts": 614, "dur": 13, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 615, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::mul", "ph": "X", "ts": 620, "dur": 2, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::zeros", "ph": "X", "ts": 637, "dur": 2, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 638, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::zero_", "ph": "X", "ts": 639, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "ProfilerStep#3", "ph": "X", "ts": 644, "dur": 9, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::empty", "ph": "X", "ts": 645, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}},
      {"name": "aten::mul", "ph": "X", "ts": 649, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}}
    ]
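A small helper can check which shape a trace file has before you load it into TensorBoard. The sketch below is one way to do that. `trace_top_level_kind` is a hypothetical name, and the gzip branch assumes compressed traces end in `.gz`. The helper only inspects the top-level JSON type and whether `traceEvents` is present, since that is the key torch-tb-profiler reads in the traceback above.

```python
import gzip
import json
import os
import tempfile


def trace_top_level_kind(path):
    """Hypothetical helper: describe the top-level shape of a trace file."""
    # Assumption: compressed traces are gzip files whose names end in ".gz".
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        data = json.load(f)
    if isinstance(data, dict):
        if "traceEvents" in data:
            return "object with traceEvents"
        return "object without traceEvents"
    if isinstance(data, list):
        return "bare event array"
    return "unexpected " + type(data).__name__


# Usage: write a trace shaped like the one above, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    with open(path, "w") as f:
        json.dump([{"name": "aten::mul", "ph": "X", "ts": 549, "dur": 3}], f)
    kind = trace_top_level_kind(path)

print(kind)  # bare event array
```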

Relevant package versions:

    tensorboard               2.12.0             pyhd8ed1ab_0    conda-forge
    tensorboard-data-server   0.7.0           py310h34c0648_0    conda-forge
    tensorboard-plugin-profile 2.11.1                   pypi_0    pypi
    tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
    pytorch                   1.13.1          cuda112py310he33e0d6_200    conda-forge
    torch-tb-profiler         0.4.1                    pypi_0    pypi
    python                    3.10.9          he550d4f_0_cpython    conda-forge

@RaulPPelaez, quite a few people use the tensorboard_trace_handler API in the TorchBench repository, and @xuzhao9 also verified that it still works in TensorBoard. Do you want to try that run.py script? https://github.com/pytorch/benchmark/blob/main/run.py#L227

(Also, at first glance the JSON file doesn't look correct. It seems to be a bare list of events, rather than a dict with a "traceEvents" key whose value is the list of events, i.e. {"metadata": "etc", "traceEvents": [{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11, "tid": 1, "pid": "CPU functions", "args": {}}, {...}, ...]}.)
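The object shape described in the reply can be sketched as a conversion from a bare list. `wrap_bare_trace` is a hypothetical helper and is not part of torch or torch-tb-profiler. This thread does not establish that torch-tb-profiler fully renders a trace converted this way, because it may depend on other fields that a Kineto-generated trace carries. Note also that the Chrome Trace Event Format accepts both a JSON array and a JSON object with `traceEvents`. The bare list is therefore valid JSON, just not the shape `_preprocess_file` expects.

```python
import json


def wrap_bare_trace(trace_json, metadata=None):
    """Hypothetical helper: return trace data in the object shape.

    A dict that already has a "traceEvents" key is returned unchanged; a bare
    list of events is placed under "traceEvents", alongside any metadata.
    """
    if isinstance(trace_json, dict) and "traceEvents" in trace_json:
        return trace_json
    if not isinstance(trace_json, list):
        raise TypeError("expected a list of events or a dict with 'traceEvents'")
    wrapped = dict(metadata or {})
    wrapped["traceEvents"] = list(trace_json)
    return wrapped


# Usage: wrap the first event from the trace quoted in this issue.
events = [{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11,
           "tid": 1, "pid": "CPU functions", "args": {}}]
wrapped = wrap_bare_trace(events)
print(list(wrapped))  # ['traceEvents']
print(json.dumps(wrapped)[:16])  # {"traceEvents":
```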

Thanks for the quick response!
I ran the suggested script with:

    python run.py -d cuda --profile --profile-detailed  maml

I get the same error as with my example. The JSON file looks similar to the one from my example; it starts with:

    [{"name": "aten::zeros", "ph": "X", "ts": 1897, "dur": 30, "tid": 1, "pid": "CPU functions", "args":

From your comment, I understand that the problem is torch.profiler.tensorboard_trace_handler writing a malformed trace file (a bare list of events instead of an object with a "traceEvents" key). That makes this a PyTorch issue unrelated to this project.
Thanks for your help.
For future reference, this might be related: pytorch/pytorch#92988
I also opened a new issue on pytorch: pytorch/pytorch#95460