We consider online deep learning inference services to be a key component of user-facing applications in the future. The goal of this project: once you have trained a deep neural network with Paddle, you can easily deploy the model online. Key features of Paddle Serving:
- Integrates seamlessly with the Paddle training pipeline; most Paddle models can be deployed with a single command.
- Supports industrial serving features such as model management, online loading, and online A/B testing.
- Supports distributed key-value indexing, which is especially useful for large-scale sparse features as model inputs.
- Supports highly concurrent and efficient communication between clients and servers.
- Supports multiple programming languages on the client side, such as Golang, C++, and Python.
- Extensible framework design that can support model serving beyond Paddle.
We highly recommend running Paddle Serving in Docker; please see Run in Docker.
```shell
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker exec -it test bash
```

```shell
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker exec -it test bash
```
```shell
pip install paddle-serving-client
pip install paddle-serving-server      # CPU
pip install paddle-serving-server-gpu  # GPU
```
You may need to use a domestic mirror source to speed up the download; in China, you can use the Tsinghua mirror by adding `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command.
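For example, to install the CPU server package through the Tsinghua mirror:

```shell
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
```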
The client package supports CentOS 7 and Ubuntu 18; alternatively, you can use the HTTP service without installing the client.
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
Paddle Serving provides both HTTP- and RPC-based services for users to access.

Paddle Serving provides a built-in Python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, we get an HTTP service whose URL is `$IP:$PORT/uci/prediction`.
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
Argument | Type | Default | Description |
---|---|---|---|
thread | int | 4 | Concurrency of current service |
port | int | 9292 | Exposed port of current service to users |
name | str | "" | Service name; can be used to generate the HTTP request URL |
model | str | "" | Path of the Paddle model directory to be served |
mem_optim | bool | False | Enable memory / graphics memory optimization |
ir_optim | bool | False | Enable analysis and optimization of the computation graph |
use_mkl (only for CPU version) | bool | False | Run inference with MKL |
Here we use `curl` to send an HTTP POST request to the service we just started. Users can also send the HTTP POST with any Python library, e.g., `requests`.
```shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
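The same request can also be sent from Python; a minimal sketch using the third-party `requests` library, posting the same payload to the service started above:

```python
import requests

# The same 13 input features and fetch list as in the curl example above.
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
print(resp.json())  # expected to include the predicted "price"
```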
A user can also start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although it requires some coding against Paddle Serving's Python client API. Note that we do not specify `--name` here.
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
```python
# A user can access the RPC service through the paddle_serving_client API
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```
Here, the `client.predict` function takes two arguments. `feed` is a Python dict that maps model input variable alias names to values. `fetch` specifies which prediction variables the server should return. In this example, the names `"x"` and `"price"` were assigned when the servable model was saved during training.
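Since `fetch_map` is a Python dict keyed by the fetch variable names, the returned prediction can be read out directly; a minimal sketch continuing the example above:

```python
# fetch_map maps each requested fetch name to its predicted value,
# so the predicted house price is available under the "price" key.
price = fetch_map["price"]
print(price)
```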
- Description: Chinese word segmentation HTTP service that can be deployed with a single command.
- Download servable package:
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- Host web service:
```shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- Request sample:
```shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京***"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- Request result:
```shell
{"word_seg":"我|爱|北京|***"}
```
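For reference, here is a sketch of the same segmentation request issued from Python with the third-party `requests` library (assuming the LAC service above is running locally on port 9292):

```python
import requests

# Ask the word-segmentation service to split a Chinese sentence.
payload = {"feed": [{"words": "我爱北京"}], "fetch": ["word_seg"]}
resp = requests.post("http://127.0.0.1:9292/lac/prediction", json=payload)
print(resp.json())  # e.g. {"word_seg": "我|爱|北京"}
```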
- Description: Image classification model trained on the ImageNet dataset; a label and its corresponding probability will be returned. Note: this demo requires paddle-serving-server-gpu.
- Download servable package:
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- Host web service:
```shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- Request sample:
```shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- Request result:
```shell
{"label":"daisy","prob":0.9341403245925903}
```
Key | Value |
---|---|
Model Name | Bert-Base-Baike |
URL | https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
Description | Get semantic representation from a Chinese sentence |
Key | Value |
---|---|
Model Name | Resnet50-Imagenet |
URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
Description | Get image semantic representation from an image |
Key | Value |
---|---|
Model Name | Resnet101-Imagenet |
URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
Description | Get image semantic representation from an image |
Key | Value |
---|---|
Model Name | CNN-IMDB |
URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
Description | Get category probability from an English sentence |
Key | Value |
---|---|
Model Name | LSTM-IMDB |
URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
Description | Get category probability from an English sentence |
Key | Value |
---|---|
Model Name | BOW-IMDB |
URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
Description | Get category probability from an English sentence |
Key | Value |
---|---|
Model Name | Jieba-LAC |
URL | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
Description | Get word segmentation from a Chinese sentence |
Key | Value |
---|---|
Model Name | DNN-CTR |
URL | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz |
Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
Description | Get click probability from an item's feature vector |
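Each package in the model zoo above can be served the same way as the quick-start example. A sketch for the Bert-Base-Baike package follows (the extracted directory name below is illustrative; check the archive contents and the linked client/server code for the exact names):

```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz
tar -xzf bert_seq128.tar.gz
# "bert_seq128_model" is an assumed directory name; adjust it to what the
# archive actually contains.
python -m paddle_serving_server.serve --model bert_seq128_model --thread 10 --port 9292
```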
- How to save a servable model?
- An end-to-end tutorial from training to inference service deployment
- Write Bert-as-Service in 10 minutes
- How to configure Serving native operators on the server side?
- How to develop a new Serving operator?
- How to develop a new Web Service?
- Golang client
- Compile from source code
- How to profile Paddle Serving latency?
- How to optimize performance? (Chinese)
- Deploy multiple services on one GPU (Chinese)
- CPU Benchmarks (Chinese)
- GPU Benchmarks (Chinese)
To connect with other users and contributors, you are welcome to join our Slack channel.
If you want to contribute code to Paddle Serving, please refer to the Contribution Guidelines.
For any feedback or to report a bug, please file a GitHub Issue.