microsoft / nn-Meter

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Roadmap

mydmdm opened this issue · comments

nn-Meter is not only a latency predictor but also a critical component in hardware-aware model design. It empowers existing NAS (neural architecture search) and other efficient model design tasks to be specialized for the target hardware platform.
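
As a concrete illustration of the latency-prediction use case, below is a minimal sketch assuming nn-Meter's `load_latency_predictor` / `predict` interface; the predictor name, `model_type` value, and input shape are illustrative and may differ from the released API.

```python
# Minimal sketch of predicting inference latency with nn-Meter (assumed interface;
# the predictor name and input shape are examples, not a definitive usage).
from nn_meter import load_latency_predictor
import torchvision.models as models

# Load a pre-trained predictor for a target edge device.
predictor = load_latency_predictor("cortexA76cpu_tflite21")

# Predict end-to-end inference latency (ms) without running the model on the device.
model = models.resnet18()
latency_ms = predictor.predict(model, model_type="torch", input_shape=(1, 3, 224, 224))
print(f"predicted latency: {latency_ms:.2f} ms")
```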

Multiple aspects will be covered in this and related repositories, including:

  • latency prediction and pre-trained predictors
    • the IR converter and kernel detection tools
    • builtin kernel predictors and pre-trained weights
  • algorithm integration (mainly in NNI): the integration of latency prediction into existing NAS and compression algorithms (a rough sketch of this pattern follows this list)
  • model latency dataset: the collected latencies of thousands of model architectures, along with data loaders and an improved GNN predictor
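
For the algorithm-integration item above, the intended pattern is roughly to use the predictor as a hardware constraint inside the search loop. The sketch below is hypothetical and not the NNI integration itself: the architecture sampler and accuracy evaluator are placeholders supplied by the search algorithm, and the `predict` call assumes the interface sketched earlier.

```python
# Hypothetical sketch: a latency predictor used as a hardware constraint in a
# multi-trial NAS loop. `sample_candidate` and `evaluate_accuracy` are callables
# provided by the search algorithm; they are not nn-Meter or NNI APIs.
def latency_constrained_search(predictor, sample_candidate, evaluate_accuracy,
                               latency_budget_ms=30.0, num_trials=100):
    best = None
    for _ in range(num_trials):
        model = sample_candidate()                       # propose an architecture
        latency = predictor.predict(model, model_type="torch")
        if latency > latency_budget_ms:                  # discard over-budget candidates
            continue
        accuracy = evaluate_accuracy(model)
        if best is None or accuracy > best["accuracy"]:
            best = {"model": model, "accuracy": accuracy, "latency_ms": latency}
    return best
```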

Release Plan

version 1.0-alpha

  • Date: 2021 August
  • Latency prediction
    • basic framework and utilities for latency prediction (e.g., config management, artifacts downloading, builtin predictors)
    • basic CI workflow with integrated test
    • documentation and examples
  • Algorithm integration
    • initial multi-trial NAS example

version 1.0-beta

  • Date: 2021 November
  • Algorithm integration
    • SPOS / Proxyless NAS in NNI
    • SPOS: first integrate nn-meter in the evolution search (moved to 2.0)
    • Proxyless NAS: predict the block latencies in the search space and provide the lookup table (a rough sketch follows this list)
  • Dataset
    • make model-latency dataset public
    • reference design of an improved GNN latency predictor
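
For the Proxyless NAS item above, the lookup table maps each candidate block in the search space to its predicted latency. The sketch below shows one way such a table could be built with a loaded predictor; the candidate blocks and names are illustrative placeholders, not the shipped integration, and the `predict` arguments are assumptions.

```python
# Hypothetical sketch: build a block-latency lookup table for a ProxylessNAS-style
# search space. The candidate blocks are simple placeholders; a real search space
# would enumerate its own operator choices (e.g., MBConv variants).
import torch.nn as nn

def candidate_blocks(cin, cout, stride):
    return {
        "conv3x3": nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
        "conv5x5": nn.Conv2d(cin, cout, 5, stride=stride, padding=2),
        "skip": nn.Identity(),
    }

def build_lookup_table(predictor, cin, cout, stride, input_shape):
    table = {}
    for name, block in candidate_blocks(cin, cout, stride).items():
        # Predicted latency (ms) of this block at the given feature-map shape.
        table[name] = predictor.predict(block, model_type="torch",
                                        input_shape=input_shape)
    return table

# Example: candidate-block latencies at a 1x32x56x56 input.
# table = build_lookup_table(predictor, cin=32, cout=32, stride=1,
#                            input_shape=(1, 32, 56, 56))
```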

version 2.0

  • Date: 2021 December
  • Algorithm integration
    • SPOS: first integrate nn-meter in the evolution search
  • Latency predictor building tools
    • fusion rule detection
    • adaptive data sampler (a rough sketch of the idea follows this list)
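
For context on the adaptive data sampler item above, the idea in the paper is roughly to concentrate sampling where the current kernel predictors are least accurate. The loop below is only one plausible reading of that idea, not the actual implementation: all callables (prior sampler, fine-grained sampler, on-device measurement, predictor fitting/inference) are placeholders supplied by the user.

```python
# Rough sketch of an error-guided adaptive sampling loop (illustrative only):
# sample configurations, measure their latency on the device, fit a predictor,
# then sample more densely around the configurations with the largest errors.
def adaptive_sampling(sample_prior, sample_near, measure, fit, predict,
                      rounds=5, init_size=200, top_k=20, per_config=2):
    configs = sample_prior(init_size)                  # prior-based initial samples
    latencies = [measure(c) for c in configs]          # measured on the target device
    for _ in range(rounds):
        model = fit(configs, latencies)                # (re)fit the kernel predictor
        errors = [abs(predict(model, c) - y) / y for c, y in zip(configs, latencies)]
        # Pick the configurations the predictor currently gets most wrong.
        ranked = sorted(zip(errors, configs), key=lambda pair: pair[0], reverse=True)
        new_configs = [c for _, cfg in ranked[:top_k] for c in sample_near(cfg, per_config)]
        configs += new_configs
        latencies += [measure(c) for c in new_configs]
    return fit(configs, latencies)
```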

Hello,

the paper mentions methods for:

  1. detecting the fusion rules on a device
  2. adaptive sampling for creating the latency dataset

Will these be added to the repository?
If so, do you have a rough time frame for when they will be available?

@gmimsgt Hi, we plan to add the fusion rule detection and adaptive sampling algorithms. We will start after version 1.0-beta is finished.

@Lynazhang Thanks for the quick answer.

I appreciate the effort put into polishing the code base, as it allowed me to get started quickly.
The fusion rule detection and adaptive sampling are especially interesting, as I am currently trying to predict/benchmark a new device. The paper has been very helpful in this regard, and I would love to try out the implementation.

If it is not an inconvenience, would it be possible to get the current state of the code?

Hi, I'm wondering if you would be willing to share your modifications to TFLite that implement the GPU operator-level profiling?

Hi @liuyibox, we will soon share a patch for the GPU operator-level profiling.