"list index out of range" when using torch Conv1d
ThePhoenixCoding opened this issue
For debugging purposes I built a simple model based on PyTorch Lightning:
class TCNModel(nni.retiarii.evaluator.pytorch.lightning.LightningModule):
    def __init__(self):
        super().__init__()
        # A single 1x1 Conv1d is enough to trigger the error
        self.output = nn.Conv1d(1, 1, kernel_size=1)

    def forward(self, x):
        x = self.output(x)
        return x
and:
from nn_meter import load_latency_predictor

def compute_model_latency_in_ms(model, batch_size, latency_platform):
    predictor = load_latency_predictor(latency_platform)
    # Input shape is [batch, channels, length] for a Conv1d model
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size, 1, 88201])
    return latency
When trying to predict the latency of this model with nn-meter, I get the following error:
PS C:\Users\alexa\Desktop\Code\NAS_New_Trial> python -u "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py"
Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2022-05-12 12:03:43] INFO (root/MainThread) checking local kernel predictors at C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21
[2022-05-12 12:03:43] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\add.pkl
C:\Users\alexa\Python308\lib\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
C:\Users\alexa\Python308\lib\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
[2022-05-12 12:03:43] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\addrelu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\avgpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\bn.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\bnrelu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\channelshuffle.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\concat.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\conv-bn-relu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\dwconv-bn-relu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\fc.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\global-avgpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\hswish.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\maxpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\relu.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\se.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\split.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 12:03:45] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
Traceback (most recent call last):
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 107, in predict
self.kd.load_graph(graph)
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\kernel_detector\kernel_detector.py", line 19, in load_graph
new_graph = convert_nodes(graph)
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\kernel_detector\utils\ir_tools.py", line 42, in convert_nodes
cin = node["attr"]["input_shape"][0][3]
IndexError: list index out of range
After a lot of fiddling around, I noticed that this happens with a Conv1D but not with Conv2D.
Alternatively, I could change
cin = node["attr"]["input_shape"][0][3]
to
cin = node["attr"]["input_shape"][0][2]
in nn_meter\kernel_detector\utils\ir_tools.py, but I don't know how this affects the prediction itself or whether it leads to weird behaviour with models that use Conv2D.
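To make the failure concrete, here is a minimal sketch in plain Python. It is not nn-meter's code: the helper `read_dim` is hypothetical and only mimics the indexing expression from the traceback, and the 4-D shape is an assumed channels-last example. The 3-D shape is the one passed to `predict` in this issue.

```python
def read_dim(input_shape, index):
    """Mimic node["attr"]["input_shape"][0][index] on a list of shapes."""
    return input_shape[0][index]


# Assumed 4-D channels-last shape, for illustration only: index 3 exists.
conv2d_shape = [[4, 224, 224, 3]]
# 3-D shape from this issue, [batch, channels, length]: only indices 0..2 exist.
conv1d_shape = [[4, 1, 88201]]

cin_2d = read_dim(conv2d_shape, 3)

try:
    read_dim(conv1d_shape, 3)
    raised = False
except IndexError:
    raised = True  # same error type as in the traceback

# The proposed patch reads index 2, which is the sequence length here,
# not the channel count (which is 1 at index 1).
patched = read_dim(conv1d_shape, 2)
print(cin_2d, raised, patched)
```

Under these assumptions, switching to index 2 avoids the exception but hands the sequence length to the predictor as `cin`, which matches the `'cin': 88201` entry in the log further down.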
At this point I'd like to ask you for
a) a quick fix I can apply safely (I need nn-meter for my bachelor's thesis ;) )
b) an update for future users.
Also, I find it odd that both fixes (using Conv2D and modifying ir_tools.py) lead to a prediction of 0 ms latency for this simple model. Is that plausible?
Thank you very much!
EDIT:
Using a more complex model leads to even more errors regarding the Conv1D:
import pytorch_lightning as pl
import torch.nn as nn

class TCNModel(pl.LightningModule):
    def __init__(self,
                 ninputs=1,
                 noutputs=1,
                 kernel_size=13,
                 dilation_growth=10,
                 channel_growth=1,
                 channel_width=32,
                 stack_size=10,
                 grouped=False,
                 causal=True,
                 lr=5e-3,
                 train_loss="l1+stft",  # 'stft' or 'l1+stft' or 'l1'
                 save_dir="UnknownEffect",
                 num_examples=5):
        super().__init__()
        self.save_hyperparameters()

        # Output channels of the four blocks
        out1_ch = ninputs * channel_width * channel_growth
        out2_ch = out1_ch * channel_growth
        out3_ch = out2_ch * channel_growth
        out4_ch = out3_ch * channel_growth

        # Dilation grows by a factor of dilation_growth per block
        dilation1 = 1
        dilation2 = dilation_growth ** (1 % stack_size)
        dilation3 = dilation_growth ** (2 % stack_size)
        dilation4 = dilation_growth ** (3 % stack_size)

        # Keyword arguments, because TCNBlock declares grouped before causal
        self.block1 = TCNBlock(ninputs, out1_ch, kernel_size, dilation1, causal=causal, grouped=grouped)
        self.block2 = TCNBlock(out1_ch, out2_ch, kernel_size, dilation2, causal=causal, grouped=grouped)
        self.block3 = TCNBlock(out2_ch, out3_ch, kernel_size, dilation3, causal=causal, grouped=grouped)
        self.block4 = TCNBlock(out3_ch, out4_ch, kernel_size, dilation4, causal=causal, grouped=grouped)
        self.output = nn.Conv1d(out4_ch, noutputs, kernel_size=1)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.output(x)
        return x

class TCNBlock(nn.Module):
    def __init__(self,
                 in_ch,
                 out_ch,
                 kernel_size=3,
                 dilation=1,
                 grouped=False,
                 causal=True):
        super().__init__()
        self.in_ch = in_ch
        self.out_ch = out_ch
        self.kernel_size = kernel_size
        self.dilation = dilation
        self.grouped = grouped
        self.causal = causal
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=kernel_size, dilation=dilation)
        # self.bn = nn.BatchNorm1d(out_ch)
        # self.relu = nn.PReLU(out_ch)
        # self.res = nn.Conv1d(in_ch, out_ch, kernel_size=1, groups=in_ch)

    def forward(self, x):
        x = self.conv1(x)
        return x
leads to:
[2022-05-12 13:14:00] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:14:00] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['conv#4']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#4', 'input_tensors': [[4, 32, 74869]], 'ks': [1], 'strides': [1], 'cin': 74869, 'cout': 74869, 'inbounds': ['conv#3'], 'outbounds': []}
Traceback (most recent call last):
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
features = get_predict_features(kernel_units)
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 49, in get_predict_features
ks = item["ks"][1]
IndexError: list index out of range
and after modifying extract_feature.py line 49
to ks = item["ks"][-1]
I get
[2022-05-12 13:17:00] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:17:00] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['conv#4']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#4', 'input_tensors': [[4, 32, 74869]], 'ks': [1], 'strides': [1], 'cin': 74869, 'cout': 74869, 'inbounds': ['conv#3'], 'outbounds': []}
Traceback (most recent call last):
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
features = get_predict_features(kernel_units)
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 50, in get_predict_features
s = item["strides"][1] if "strides" in item else 1
IndexError: list index out of range
and after modifying extract_feature.py line 50 to
strides = item["strides"][-1]
I get
[2022-05-12 13:17:14] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:17:14] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['con
File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
features = get_predict_features(kernel_units)
File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 51, in get_predict_features
inputh = item["inputh"]
KeyError: 'inputh'
and so on.
Is it safe to assume that 1D Convolutions are not supported at this point?
Hi, thanks for raising the issue. Conv1D is not supported in nn-meter, which means that (1) the shape parser fails when it meets Conv1D layers, and (2) there is no latency predictor for this op. Therefore, even if you edit ir_tools.py and get the whole pipeline to run, nn-meter will treat every Conv1D layer as a Conv2D layer and return a wrong latency value. If most of the operators in your network are Conv1D, nn-meter is probably not suitable for your case.
BTW, if nn-meter is essential for your thesis, could you let me know your deadline? We will discuss whether we have time to fix this problem and add a Conv1D op to the nn-meter project.
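The kernels logged in the original post are consistent with this. As a rough check, the sketch below applies the output-length formula from the PyTorch `Conv1d` documentation, assuming padding 0 and stride 1 as in the posted model, with the dilations 1, 10, 100 and 1000 that follow from `dilation_growth=10`. The helper name `conv1d_out_length` is made up for this example.

```python
def conv1d_out_length(l_in, kernel_size, dilation=1, stride=1, padding=0):
    """Output length of a 1-D convolution (formula from the Conv1d docs)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


length = 88201              # sequence length passed as input_shape[2]
lengths = [length]
for dilation in (1, 10, 100, 1000):      # dilation of block1..block4
    length = conv1d_out_length(length, kernel_size=13, dilation=dilation)
    lengths.append(length)

# The final 1x1 convolution keeps the length unchanged.
final_length = conv1d_out_length(length, kernel_size=1)
print(lengths, final_length)
```

The resulting lengths are exactly the `cin` and `cout` values in the log (88201, 88189, 88069, 86869, 74869), while the real channel counts are 1 and 32. This suggests that the patched parser reads the length axis as the channel axis, so any latency computed from those features would not describe the actual model.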
Thanks for the quick reply!
In fact my model only uses Conv1D operators (besides activation/normalization).
Using nn-meter would have been a nice choice for my work, but it's not necessary to get the latency predictions from this specific package. After realizing that Conv1D is not supported at all at the moment (thanks again for confirming!), I implemented the prediction with the Torch Profiler.
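For readers in the same situation, a profiler-based measurement might look roughly like the sketch below. This is not the code used in the thesis: the function name `measure_cpu_latency_ms`, the number of runs and the warm-up count are assumptions. Note that it measures latency on the machine running the script, whereas nn-meter predicts latency for a target device.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile


def measure_cpu_latency_ms(model, example_input, runs=10, warmup=3):
    """Average profiled CPU time per forward pass, in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes are not profiled
            model(example_input)
        with profile(activities=[ProfilerActivity.CPU]) as prof:
            for _ in range(runs):
                model(example_input)
    # self_cpu_time_total is in microseconds; summing "self" times avoids
    # counting nested operators twice.
    total_us = sum(evt.self_cpu_time_total for evt in prof.key_averages())
    return total_us / runs / 1000.0


# Same toy model and input length as in the first post, batch size 1.
model = nn.Conv1d(1, 1, kernel_size=1)
latency_ms = measure_cpu_latency_ms(model, torch.randn(1, 1, 88201))
print(f"{latency_ms:.3f} ms per forward pass")
```

Averaging over several runs after a warm-up reduces the effect of one-time setup costs, though the numbers still vary from run to run.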
If you choose to implement support for Conv1D anyway, there's no need to hurry :)