JK-97 / ai-serving

A tensorflow-serving proxy background service.


AI-Serving

[Attention] This repo's .gitignore excludes all *.c files

[Attention] On some systems, protobuf falls back to its pure-Python implementation, which is much slower

  • export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp can help to solve this problem
  • Please verify that run.py:L6 sets the correct protobuf implementation
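A quick way to confirm which protobuf implementation is actually active is sketched below; it relies on an internal protobuf module, so treat it as a diagnostic snippet rather than a stable API:

# check_protobuf.py -- diagnostic sketch only
import os
# must be set before any protobuf module is imported (this is what run.py:L6 is expected to do)
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "cpp")

from google.protobuf.internal import api_implementation
print(api_implementation.Type())  # "cpp" (or "upb" on newer protobuf) means the fast implementation; "python" means the slow fallback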

[Attention] Currently verified only in Python 3 environments

Introduction

  • AIServing OnBoard is designed to:

    • provide a unified Deep Learning (DL) detection interface
    • act as a delegate agent of AIServing Cloud in a distributed deployment
  • Currently supported DL frameworks:

    • Tensorflow (Tensorflow Serving and Tensorflow-Lite are planned but not yet supported)
    • PyTorch
    • RKNN
    • MXNet

Components

  • src stores all source files
  • build_release.py and setup.py are used to build the binaries
  • CHANGELOG keeps a log of fixes, changes, and added features
  • Dockerfile is used to build the container image
  • ld.conf is used to link dynamic libraries during the container build

Installation

  • Requirements (Outdated)
    • Depending on the backend you are using, install all of its dependencies
    • Example 1: Tensorflow on a Jetson GPU
apt-get -y install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta
pip3 install tensorflow-estimator==1.13.0 tensorboard==1.13.0
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3
  • Build for Release (this helps to protect the source code)

    • python build_release.py (in some environments this needs to be python3 build_release.py)
    • after the step above, all release binaries are stored in release-pack
    • ATTENTION: the generated binaries are tightly tied to the version of Python that executed the build command above, so always use the same Python version to build and to run
  • Deployment

    • Copy the whole release-pack folder to the path you want
  • Deployment by Docker

    • TODO

Concept

  • AIServing Model (JSM)

    • A JSM is a folder that bundles all the information needed to serve a model
  • AIServing has the following concepts

    • For each inference, all data passes through the workflow shown below
      • Image -> Pre-DataProcess -> DL Backend -> Post-DataProcess
      • Here, Pre/Post-DataProcess are implemented by the model provider
    • Pre-DataProcess
      • def pre_dataprocess(infer_data):
      • infer_data comes from the HTTP Restful request (/api/v1alpha/detect)
      • expected to return a dictionary object that includes the data frame needed by the DL framework
    • Post-DataProcess
      • def post_dataprocess(pre_p, prediction, classes):
      • pre_p is the Pre-DataProcess result, prediction is the DL inference result
      • classes is an array containing all classes supported by this DL model
      • expected to return a dictionary object that includes the data frame you want to return to the actual users (see the sketch below)
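A minimal sketch of how a model provider might implement the two hooks described above, assuming an image-classification model whose backend consumes a numpy batch; the field names ("frame", "input_shape") and the OpenCV-based preprocessing are illustrative only, not part of the AIServing contract:

# pre_dataprocess.py / post_dataprocess.py -- illustrative sketch only
import cv2          # assumption: OpenCV is available for image decoding
import numpy as np

def pre_dataprocess(infer_data):
    # infer_data comes from the /api/v1alpha/detect request body, e.g. {"path": "/absolute/path/of/image"}
    image = cv2.imread(infer_data["path"])
    image = cv2.resize(image, (224, 224))            # hypothetical input size
    frame = np.expand_dims(image.astype(np.float32) / 255.0, axis=0)
    # return a dictionary that includes the data frame needed by the DL backend
    return {"frame": frame, "input_shape": image.shape}

def post_dataprocess(pre_p, prediction, classes):
    # pre_p is the pre_dataprocess result, prediction is the raw DL inference output
    scores = np.asarray(prediction).reshape(-1)
    best = int(np.argmax(scores))
    # return a dictionary with whatever you want to hand back to the caller
    return {"class": classes[best], "score": float(scores[best])}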

Usage

  • Configure settings.py (see the sketch below)

    • set storage: path of the directory that stores all JSM bundles
    • set preheat: path of an image used to preheat the neural network
    • set redis.host: redis server host
    • set redis.port: redis server port, "6379" by default
    • other configurations depend on the backend; check each backend's specific list
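As an illustration only, a settings.py covering the options above might look like the following; the exact layout (flat variables vs. a nested redis mapping) depends on how the project actually reads its settings:

# settings.py -- illustrative sketch, not the canonical format
storage = "/data/jsm"              # directory that stores all JSM bundles
preheat = "/data/preheat.jpg"      # image used to warm up the neural network after loading

# redis connection used by the service
redis = {
    "host": "127.0.0.1",
    "port": "6379",
}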
  • Prepare JSM

    • Copy JSM bundle to the storage path you set in settings.py
    • See more details in Serving Bundle section
  • Run (w/ options) (not available since the gRPC API was checked in)

    • format: python3 run.py [options]
    • --port, run on a specific port, 8080 by default
    • --debug, enable debug mode and print many more debug messages
    • --profile, enable profile mode, used to profile functions
  • Call gRPC API

    • Please check the .proto files stored in src/serving/interface
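The service, stub, and message names below are placeholders; the real ones come from compiling the .proto files in src/serving/interface with grpcio-tools. A typical grpcio client call would nevertheless follow this shape:

# Hypothetical gRPC client sketch
import grpc
import serving_pb2          # assumed name of the generated message module
import serving_pb2_grpc    # assumed name of the generated stub module

channel = grpc.insecure_channel("localhost:8080")                      # assumed host/port
stub = serving_pb2_grpc.ServingStub(channel)                           # "ServingStub" is a placeholder name
request = serving_pb2.DetectRequest(path="/absolute/path/of/image")    # placeholder message type
response = stub.Detect(request)                                        # placeholder RPC name
print(response)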
  • Call HTTP Restful API (Deprecated)

    • API version (prefix): /api/v1alpha
    • POST /api/v1alpha/detect: blocking, used to run inference on an image
{
  path: "/absolute/path/of/image"
}
      • Response
{
  status: "success" or "failed"     # status of the inference
  result:                           # inference result, depends on the model
  message: "inference message"
}
    • POST /api/v1alpha/switch: non-blocking, used to load or switch a model
{
  bid:     "backend id"             # backend id; a new backend will be created if not given
  btype:   "backend type"           # backend type; must be given if a new backend needs to be created
  model:   "model name"             # the model's folder name under collection_path
  mode:    "model type"             # indicates which type of model is going to be loaded
  device:  "device name"            # assign the workload to a specific device if available
}
      • Response
{
  status: "success"                 # this API always returns success
}
    • GET /api/v1alpha/switch: blocking, used to get the loading status
      • Response
{
  model:  "model name"             # the name of the model being loaded
  status: "status"                 # one of: cleaning, loading, preheating, loaded, or failed
  error:  "error message"          # error message if the model failed to load
}
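Since this HTTP API is deprecated in favour of gRPC, the snippet below is only a sketch of how the endpoints above could be exercised with the requests library, assuming the service runs on its default port 8080; the btype/mode values are placeholders:

# Sketch of calls against the deprecated HTTP API (assumes the default --port 8080)
import requests

BASE = "http://localhost:8080/api/v1alpha"

# load or switch a model (non-blocking)
requests.post(BASE + "/switch", json={
    "btype": "tensorflow",     # hypothetical backend type name
    "model": "my_model",       # folder name under the configured storage path
    "mode":  "frozen",         # hypothetical model type
})

# poll the loading status until it reports "loaded" (or "failed")
print(requests.get(BASE + "/switch").json())

# run inference on an image (blocking)
result = requests.post(BASE + "/detect", json={"path": "/absolute/path/of/image"})
print(result.json())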

Serving Bundle

  • AIServing will automatically serve a model once the model is constructed with the conventional structure shown below

  • Tensorflow (Frozen)

    • model_core
    • pre_dataprocess.py
    • post_dataprocess.py
    • distros.json
