dimitreOliveira / hf_tf_serving_examples

Simple examples of serving HuggingFace models with TensorFlow Serving


Repository content

Setup

Start TensorFlow Serving

*Requires Docker.

*The parameters below refer to the "DistilBERT (embedding)" sample model.

MODEL_SOURCE=$(pwd)/models/embedding/saved_model/1 MODEL_TARGET=/models/embedding/1 MODEL_NAME=embedding sh scripts/start_tf_serving.sh

Parameters:

  • MODEL_SOURCE: path to the model on your local filesystem.
  • MODEL_TARGET: path to the model inside the Docker container.
  • MODEL_NAME: model name used by TF Serving; this name becomes part of the API URL.

When you are finished, use docker ps to list active containers and docker stop <container_id> to stop the serving container.
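
Once the container is up, you can sanity-check the deployment with TF Serving's REST model-status endpoint. A minimal Python sketch, assuming the default REST port 8501 and the MODEL_NAME=embedding from the example above:

import requests

# Model status endpoint: GET /v1/models/<MODEL_NAME>.
# Port 8501 is TF Serving's default REST port; "embedding" is the
# MODEL_NAME used in the example above.
response = requests.get("http://localhost:8501/v1/models/embedding")
print(response.json())  # reports each loaded version's state, e.g. "AVAILABLE"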

If you don't have a model to serve, you can create one with the commands below (see the sketch after the list for the idea behind these scripts).

Available sample models:

  • DistilBERT (embedding)
python sample_models/text_models.py get_distilbert_embedding
  • DistilBERT (sequence classification)
python sample_models/text_models.py get_distilbert_sequence_classification
  • DistilBERT (token classification - NER)
python sample_models/text_models.py get_distilbert_token_classification
  • DistilBERT (multiple choice)
python sample_models/text_models.py get_distilbert_multiple_choice
  • DistilBERT (question answering)
python sample_models/text_models.py get_distilbert_qa
  • DistilGPT2 (text generation)
python sample_models/text_models.py get_distilgpt2_text_generation
  • DistilBERT (custom)
python sample_models/text_models.py get_distilbert_custom
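
For reference, these scripts boil down to exporting a SavedModel under a numeric version directory, which is the layout TF Serving expects. A minimal sketch of the idea behind the embedding sample (the actual code in sample_models/text_models.py may differ in its details):

import tensorflow as tf
from transformers import TFDistilBertModel

model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")

# Wrap the model in a serving function with an explicit input signature;
# the shapes and dtypes here are illustrative.
@tf.function(input_signature=[{
    "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
    "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
}])
def serving_fn(inputs):
    return {"last_hidden_state": model(inputs).last_hidden_state}

# The trailing "1" is the model version that TF Serving will load.
tf.saved_model.save(
    model, "models/embedding/saved_model/1",
    signatures={"serving_default": serving_fn},
)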

Inference

There are two ways to access the model and run inference.

Notebook

  • Just use the notebook at notebooks/text_inference.ipynb

Gradio app

  • Run the app script for your specific use case from the gradio_apps folder
  • Available use cases:
    • Text:
      • Generic
        TF_URL="http://localhost:8501/v1/models/embedding:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_app.py
      • Token classification - NER
        TF_URL="http://localhost:8501/v1/models/token_classification:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_ner_app.py
      • Multiple choice
        TF_URL="http://localhost:8501/v1/models/multiple_choice:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_multiple_choice_app.py
      • Question answering
        TF_URL="http://localhost:8501/v1/models/qa:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_qa_app.py
      • Text generation
        TF_URL="http://localhost:8501/v1/models/text_generation:predict" TOKENIZER_PATH="./tokenizers/distilgpt2" python gradio_apps/text_generation_app.py

*To keep them generic, the Gradio apps return the models' raw outputs.

*The Gradio apps require you to define the environment variables below.

For all use cases:

  • TF_URL: REST API URL exposed by your TF Serving instance.
    • e.g. "http://localhost:8501/v1/models/embedding:predict"
      • Replace embedding with your model's name.

Text use case:

  • TOKENIZER_PATH: path to the tokenizer in your local system.
    • e.g. "./tokenizers/distilbert-base-uncased"
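
Besides the notebook and the Gradio apps, you can also query the REST endpoint directly. A minimal Python sketch using TF Serving's standard "instances" payload, assuming the embedding model and tokenizer paths from the examples above:

import requests
from transformers import AutoTokenizer

# Assumes the "embedding" sample model is being served and the tokenizer
# files exist locally; adjust both to your setup.
tokenizer = AutoTokenizer.from_pretrained("./tokenizers/distilbert-base-uncased")
encoded = tokenizer("Serving HuggingFace models with TF Serving!", return_tensors="np")

payload = {"instances": [{
    "input_ids": encoded["input_ids"][0].tolist(),
    "attention_mask": encoded["attention_mask"][0].tolist(),
}]}

response = requests.post(
    "http://localhost:8501/v1/models/embedding:predict", json=payload)
print(response.json()["predictions"])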

