makerlin1 / MMT

MNN Meter


Tools for quickly building operator latency tables and for accurately predicting model latency (based on Pytorch and MNN)

Chinese version

1.Installation

MMT is used on both the server side and the inference side:

  • On the server side, the operator list is generated according to the specified operator space, and the latency of a given model is predicted from the operator latency table.
  • On the inference side, the operators in the list are benchmarked to obtain the operator latency table.

The server side must have both PyTorch and MNN (C++) installed; the inference side only needs MNN (C++).

Note: Be sure to add the build folder generated when compiling MNN to your PATH environment variable!
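On Linux or macOS, that update might look like the following (the build path here is an assumption; replace it with the directory where you actually compiled MNN):

```shell
# Hypothetical location -- point this at your own MNN build directory
export PATH="$PATH:/path/to/MNN/build"
```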

After configuring the above dependencies, install MMT

pip install mnn-meter

2.Start

2.1 Modify your models

For your custom model (layer), please override __repr__() so that it returns a unique representation of the parameters, for example:

    def __init__(self, ...):
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (in_channels, out_channels, stride, kernel)
        ...

    def __repr__(self):
        return self.name

If __repr__() returns results that cannot distinguish operators of the same type instantiated with different parameters, running errors or measurement errors are very likely!

See how to modify your model
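A minimal, torch-free sketch of this convention (in practice the class would subclass nn.Module and define forward; the layer body is omitted here because only the __repr__ behavior matters for MMT):

```python
class ResNetBasicBlock:  # in practice: class ResNetBasicBlock(nn.Module)
    def __init__(self, in_channels, out_channels, stride, kernel):
        # Encode every latency-relevant parameter in the name, so two
        # differently parameterized blocks never share a representation.
        self.name = "ResNetBasicBlock-%d-%d-%d-%d-" % (
            in_channels, out_channels, stride, kernel)

    def __repr__(self):
        return self.name

# Blocks with different parameters get distinct representations:
a = ResNetBasicBlock(64, 64, 1, 3)
b = ResNetBasicBlock(64, 128, 1, 3)
assert repr(a) != repr(b)
```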

2.2 Export the operators

Since mmt 2.x, both description-file generation and functional generation are supported.

2.2.1 Method 1: Write an operator description file

The parameters that determine an operator's latency are (operator type, operator instantiation parameters, input shape). The operator space needs to be described in the following way:

resnet18:
    ResNetBasicBlock:
        in_channels: [64, 128, 256, 512]
        out_channels: [64, 128, 256, 512]
        stride: [1]
        kernel: [3, 5, 7]
        input_shape: [[1, 64, 112, 112], [1, 128, 56, 56], [1, 256, 28, 28], [1, 512, 14, 14]]

torch.nn:
    Conv2d:
        in_channels: [3]
        out_channels: [64]
        kernel_size: [7]
        stride: [2]
        padding: [3]
        input_shape: [[1, 3, 224, 224]]

    BatchNorm2d:
        num_features: [64]
        input_shape: [[1, 64, 112, 112]]

    ReLU:
        no_params: true
        input_shape: [[1, 64, 112, 112]]

Refer to how to describe your operator

Then use the following commands to create the operator list and export the operators to .mnn format.

from mmt.converter import generate_ops_list

generate_ops_list("ops.yaml", "/path/ops_folder")

ops.yaml is the operator description file, and /path/ops_folder is the directory where the operators are saved; a meta.pkl file recording the operators' metadata will be generated in this directory.
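Conceptually (this is a sketch of the idea, not MMT's actual implementation), Method 1 expands every combination of the listed parameter values, so the number of generated operators is the product of the list lengths:

```python
from itertools import product

# Parameter lists as in the ResNetBasicBlock description above
space = {
    "in_channels": [64, 128, 256, 512],
    "out_channels": [64, 128, 256, 512],
    "stride": [1],
    "kernel": [3, 5, 7],
}

keys = list(space)
combos = [dict(zip(keys, vals)) for vals in product(*space.values())]
print(len(combos))  # 4 * 4 * 1 * 3 = 48 operator configurations
```

Many of these 48 combinations may never occur in a real network, which is exactly the redundancy that Method 2 below avoids.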

2.2.2 Method 2: Functional Generation

Highly similar to Method 1, operators are instead registered and generated directly with the mmt.register function. Registering the same operator type multiple times is supported, which reduces the redundant operators caused by unnecessary parameter combinations (a drawback of Method 1). For example:

from mmt.converter import register
import torch.nn as nn
fp = "./mbv3_ops"
reg = lambda ops, **kwargs: register(ops, fp, **kwargs)
reg(nn.Linear,
    in_features=[576, 1024],
    out_features=[1024, 1000],
    bias=[True],
    input_shape=[[1, 576], [1, 1024]],
    )

The equivalent description in Method 1 would be:

torch.nn:
    Linear:
        in_features: [576, 1024]
        out_features: [1024, 1000]
        bias: [True]
        input_shape: [[1, 576], [1, 1024]]

Running the written file directly generates the corresponding operators. For more details, please refer to the Example.

mmt also supports training a machine-learning model, using a small number of real and predicted model latencies as training samples, to build a more accurate predictor.

Here we use the function export_models to save models and convert them to the .mnn format. You can create models with different configs and save them all in one folder.

from mmt.converter import export_models
for i in range(16):
    cfg_ = generate_cfg(cfgs)  # generate config
    net = MobileNetV3(cfg_, id=i, mode="small")  # create a model
    export_models(net, [1, 3, 224, 224], "mbv3")
    # save this model at "./mbv3" and convert it to .mnn

2.3 Record operator delays on the deployment side, and build an operator latency table

from mmt.meter import meter_ops

meter_ops("./ops", times=100)

ops is the folder where the operators and meta.pkl are saved, and times is the number of repeated tests. Run this program and the operator latencies will be measured; the operator latency table will be saved as ./ops/meta_latency.pkl, which records the metadata and corresponding latency of every operator.
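The measurement itself amounts to running each operator repeatedly and averaging the elapsed time; a simplified sketch of the idea (MMT's real measurement runs through the MNN runtime, not Python timing):

```python
import time

def mean_latency_ms(fn, times=100):
    """Run fn `times` times and return the average latency in milliseconds."""
    start = time.perf_counter()
    for _ in range(times):
        fn()
    return (time.perf_counter() - start) / times * 1000.0

# Example: time a cheap Python workload 100 times
lat = mean_latency_ms(lambda: sum(range(1000)), times=100)
```

Averaging over `times` runs smooths out scheduling jitter, which is why the repeat count is exposed as a parameter.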

Similarly, you can directly test the models in the folder and record their latencies.

from mmt.meter import meter_models
meter_models("mbv3")

2.4 Predicting model latency on the server side

from mmt.parser import predict_latency

...
model = ResNet18()
pred_latency = predict_latency(model, path, [1, 3, 224, 224], verbose=False)

path is the path to meta_latency.pkl. Note that the shape of the input tensor must match the input_shape set in the operator description.
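Under the hood the idea is a table lookup: each operator's representation keys into the latency table, and the model's predicted latency is the sum over its operators. A simplified sketch under that assumption (the table contents here are made-up numbers; the real table is the meta_latency.pkl produced in step 2.3):

```python
# Hypothetical latency table: repr string -> measured latency in ms
latency_table = {
    "ResNetBasicBlock-64-64-1-3-": 1.5,
    "ResNetBasicBlock-64-128-1-3-": 3.0,
}

def predict_from_table(op_reprs, table):
    """Sum the table latencies of every operator appearing in the model."""
    return sum(table[r] for r in op_reprs)

model_ops = ["ResNetBasicBlock-64-64-1-3-",
             "ResNetBasicBlock-64-128-1-3-"]
print(predict_from_table(model_ops, latency_table))  # 4.5
```

This is also why input_shape must match: an operator measured with a different input shape is a different table entry.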

Build more accurate predictors with machine learning models:

from mmt.converter import validation
from mmt.predictor import latency_predictor
validation("mbv3", "mbv3_ops/meta_latency.pkl", save_path="train_error.csv")
lp = latency_predictor("mbv3_ops", "train_error.csv")
pred_latency = lp(model, path, [1, 3, 224, 224])
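The predictor trained here can be thought of as a correction model fitted on (predicted, measured) latency pairs; a minimal least-squares sketch of that idea with toy numbers (MMT's actual predictor and its training-data format are defined by latency_predictor and train_error.csv):

```python
def fit_linear(preds, reals):
    """Least-squares fit: reals ~= a * preds + b."""
    n = len(preds)
    mp = sum(preds) / n
    mr = sum(reals) / n
    a = (sum((p - mp) * (r - mr) for p, r in zip(preds, reals))
         / sum((p - mp) ** 2 for p in preds))
    b = mr - a * mp
    return a, b

# Toy data: the raw table-sum prediction underestimates by a factor of 2
preds = [1.0, 2.0, 3.0, 4.0]
reals = [2.0, 4.0, 6.0, 8.0]
a, b = fit_linear(preds, reals)
corrected = a * 2.5 + b  # corrected prediction for a raw estimate of 2.5
```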

3.Test the prediction error of MMT

For details, see the MobileNetV3 test.

| Model | Num | err(%) | device |
| --- | --- | --- | --- |
| MobileNet | 334 | 4.1% | 40 Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz |
| Models for UNSWNB-15 | 414 | 15% | openmesh |
| Models for UNSWNB-15 | 414 | 11% | h3c |
| Models for UNSWNB-15 | 414 | 8% | edgecore |
