torch-tb-profiler TypeError: list indices must be integers or slices, not str
RaulPPelaez opened this issue · comments
I am unable to visualize a torch profile trace in TensorBoard using torch-tb-profiler. In the TensorBoard dashboard, the pytorch_profiler tab hangs forever while loading the trace data. The trace data produced by this minimal example fails for me:
import torch

a = torch.ones((10, 10))
with torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    profile_memory=True,
    record_shapes=True,
    with_stack=True,
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/logfile')
) as p:
    for _ in range(100):
        b = a * a
        p.step()
I then start TensorBoard with:
$ tensorboard --logdir=./log/logfile
After I navigate to the pytorch_profiler tab in Chrome, TensorBoard prints this error to the terminal:
W0223 13:42:06.020374 140055204267840 loader.py:102] Failed to parse profile data for Run logfile on host_1553185. Exception=list indices must be integers or slices, not str
Traceback (most recent call last):
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/loader.py", line 88, in _process_data
data = RunProfileData.parse(worker, span, local_file, self.caches.cache_dir)
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 101, in parse
trace_path, trace_json = RunProfileData._preprocess_file(path, cache_dir)
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 142, in _preprocess_file
event_list = trace_json['traceEvents']
TypeError: list indices must be integers or slices, not str
W0223 13:42:06.020555 139911474751296 loader.py:102] Failed to parse profile data for Run logfile on host_1553185. Exception=list indices must be integers or slices, not str
Traceback (most recent call last):
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/loader.py", line 88, in _process_data
data = RunProfileData.parse(worker, span, local_file, self.caches.cache_dir)
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 101, in parse
trace_path, trace_json = RunProfileData._preprocess_file(path, cache_dir)
File "/home/raul/mambaforge/envs/test/lib/python3.10/site-packages/torch_tb_profiler/profiler/data.py", line 142, in _preprocess_file
event_list = trace_json['traceEvents']
TypeError: list indices must be integers or slices, not str
The JSON file produced by the Python code above contains:
[{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 519, "dur": 4, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::zero_", "ph": "X", "ts": 527, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "ProfilerStep#1", "ph": "X", "ts": 541, "dur": 45, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 542, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::mul", "ph": "X", "ts": 549, "dur": 3, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::zeros", "ph": "X", "ts": 604, "dur": 3, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 605, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::zero_", "ph": "X", "ts": 607, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "ProfilerStep#2", "ph": "X", "ts": 614, "dur": 13, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 615, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::mul", "ph": "X", "ts": 620, "dur": 2, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::zeros", "ph": "X", "ts": 637, "dur": 2, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 638, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::zero_", "ph": "X", "ts": 639, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "ProfilerStep#3", "ph": "X", "ts": 644, "dur": 9, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::empty", "ph": "X", "ts": 645, "dur": 0, "tid": 1, "pid": "CPU functions", "args": {}}, {"name": "aten::mul", "ph": "X", "ts": 649, "dur": 1, "tid": 1, "pid": "CPU functions", "args": {}}]
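The traceback and the file content together suggest the mechanism: the plugin looks up the key 'traceEvents' on whatever the JSON parser returns, and for this file the parser returns a list. A minimal sketch that reproduces the same TypeError with only the standard library follows. The variable names mirror the traceback, and the string `raw` is an abbreviated copy of the first event above, not the full file.

```python
import json

# Abbreviated copy of the trace shown above: the top level is a JSON array.
raw = (
    '[{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11, '
    '"tid": 1, "pid": "CPU functions", "args": {}}]'
)

trace_json = json.loads(raw)
print(type(trace_json).__name__)  # the parsed top-level value is a list

error_message = None
try:
    # Same lookup as data.py line 142 in the traceback.
    event_list = trace_json["traceEvents"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The lookup fails before any event is inspected, so the content of the individual events does not matter here; only the top-level type does.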
Relevant package versions:
tensorboard 2.12.0 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.7.0 py310h34c0648_0 conda-forge
tensorboard-plugin-profile 2.11.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
pytorch 1.13.1 cuda112py310he33e0d6_200 conda-forge
torch-tb-profiler 0.4.1 pypi_0 pypi
python 3.10.9 he550d4f_0_cpython conda-forge
@RaulPPelaez, we have quite a few people running the tensorboard_trace_handler API in the TorchBench repository. @xuzhao9 also verified that it's still working in TB. Do you want to give that run.py script a try? https://github.com/pytorch/benchmark/blob/main/run.py#L227
(Also, at first glance the JSON file doesn't look correct: it seems to be a bare list of events rather than a dict whose "traceEvents" key holds the list of events, i.e. {"metadata": "etc", "traceEvents": [{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11, "tid": 1, "pid": "CPU functions", "args": {}}, {...}, ...]})
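To check which of the two shapes a given trace file has, a small standard-library helper is enough. The sketch below is illustrative only: `describe_trace` and `wrap_events` are hypothetical names and not part of torch or torch-tb-profiler. Whether a wrapped file would then load in the plugin has not been verified, since the plugin may expect other fields besides "traceEvents".

```python
import json


def describe_trace(trace_json):
    """Classify the top-level shape of parsed trace JSON (hypothetical helper)."""
    if isinstance(trace_json, dict) and isinstance(trace_json.get("traceEvents"), list):
        return "object"   # dict with a "traceEvents" list, the shape described above
    if isinstance(trace_json, list):
        return "array"    # bare list of events, as in the failing file
    return "unknown"


def wrap_events(trace_json):
    """Wrap a bare event list in a dict under "traceEvents" (assumption: untested with the plugin)."""
    if describe_trace(trace_json) == "array":
        return {"traceEvents": trace_json}
    return trace_json


# Usage with an abbreviated copy of the failing trace.
bare = json.loads('[{"name": "aten::zeros", "ph": "X", "ts": 516, "dur": 11}]')
wrapped = wrap_events(bare)
print(describe_trace(bare), describe_trace(wrapped))
```

For a file on disk, the same check applies to the result of `json.load` on the open file; a gzip-compressed trace would need to be opened with `gzip.open` first.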
Thanks for the quick response!
I ran the suggested script with:
python run.py -d cuda --profile --profile-detailed maml
I get the same error as with my example. The JSON file looks similar to the one from my example; it starts with:
[{"name": "aten::zeros", "ph": "X", "ts": 1897, "dur": 30, "tid": 1, "pid": "CPU functions", "args":
From your comment, I understand that the issue is that torch.profiler.tensorboard_trace_handler is writing an ill-formed JSON file, which makes this a PyTorch issue unrelated to this project.
Thanks for your help.
For future reference, this might be related: pytorch/pytorch#92988
I also opened a new issue on PyTorch: pytorch/pytorch#95460