How to Use ONNX Runtime Server for Prediction
ONNX Runtime Server provides an easy way to start an inferencing server for prediction with gRPC endpoints.
The CLI command to start the server is shown below:
$ ./onnxruntime_server --helpfull
onnxruntime_server: ./onnxruntime_server --model_path trained.onnx
Flags from onnxruntime_server.cpp:
--address (The base server address); default: "0.0.0.0";
--grpc_port (GRPC port to listen to requests); default: 50051;
--log_level (Logging level. Allowed options (case sensitive): info, warning,
error, fatal); default: INFO;
--model_path (Path to ONNX model); default: ;
--num_threads (Number of server threads); default: 0;
Note: the only mandatory argument is --model_path.
Start the Server
To host an ONNX model as an inferencing server, simply run:
./onnxruntime_server --model_path /<your>/<model>/<path>
Dependencies
The Abseil C++ library is cloned as a submodule. Run the following commands after cloning this repository:
git submodule init
git submodule update
Download the ONNX Runtime Release for your architecture.
You also need to build and install gRPC. Follow the gRPC Quick start guide.
View ONNX Model Properties
View and inspect ONNX model properties using Netron, and note the model's input names and expected tensor shapes.
onnxruntime_server implements a gRPC service whose request input names and response output names map directly to those ONNX model properties.
Build
Generate the Makefile:
% mkdir -p build && cd build
% cmake -DCMAKE_PREFIX_PATH=/<your>/<grpc>/<path> -DONNXRuntime_ROOT_DIR=/<your>/<onnxruntime>/<path>/onnxruntime-osx-x86_64-1.16.3 ..
Build the sources:
% make
...
[ 55%] Built target inference_grpc_proto
[ 66%] Building CXX object CMakeFiles/onnxruntime_serving.dir/serving.cc.o
[ 77%] Linking CXX static library libonnxruntime_serving.a
[ 77%] Built target onnxruntime_serving
[ 88%] Linking CXX executable onnxruntime_server
[100%] Built target onnxruntime_server
Built With
- Abseil - An open-source collection of C++ code (compliant with C++11) designed to augment the C++ standard library.
gRPC Endpoint
The protobuf definition for the gRPC endpoint can be found here. You can generate a client from it and make gRPC calls to the server. To learn more about generating client code and calling the server, refer to the gRPC tutorials.
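For orientation, a service definition of this shape might look like the sketch below. This is a hypothetical illustration only: the service, message, and field names are assumptions, and the authoritative definition is the .proto file shipped in this repository.

```proto
// Hypothetical sketch only -- consult the repository's actual .proto file.
syntax = "proto3";

service Inference {
  rpc Predict (PredictRequest) returns (PredictResponse);
}

message Tensor {
  repeated int64 dims = 1;       // tensor shape
  repeated float float_data = 2; // flattened tensor values
}

message PredictRequest {
  // Keys must match the input names shown by Netron for your model.
  map<string, Tensor> inputs = 1;
}

message PredictResponse {
  // Keys map directly to the ONNX model's output names.
  map<string, Tensor> outputs = 1;
}
```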
Advanced Topics
Number of Worker Threads
Use the --num_threads flag to tune server utilization. The default of 0 uses the number of CPU cores on the host machine.
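The fallback to the CPU core count can be checked from any language; as a minimal standard-library sketch in Python:

```python
# The --num_threads default of 0 means the server falls back to the host's
# CPU core count. This shows that count, i.e. the effective default thread
# pool size on this machine.
import os

default_threads = os.cpu_count() or 1  # cpu_count() can return None
print(f"Effective default worker threads: {default_threads}")
```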
Extensions
The following Visual Studio Code extensions are highly recommended for working with this project:
- C/C++ for Visual Studio Code - Provides rich C and C++ language support, including features such as IntelliSense, debugging, and code navigation.
- CMake for Visual Studio Code - Enables convenient configuration and building of CMake projects within VS Code.
- CMake Tools - Provides additional CMake support, including capabilities for configuring, building, and testing CMake projects.
License
This project is licensed under the MIT License - see the LICENSE.md file for details.