dimitreOliveira / hf_tf_serving_examples

Simple examples of serving HuggingFace models with TensorFlow Serving


Repository content

Setup

Start TensorFlow Serving

*Requires Docker.

*The parameters below refer to the "DistilBERT (embedding)" sample model.

MODEL_SOURCE=$(pwd)/models/embedding/saved_model/1 MODEL_TARGET=/models/embedding/1 MODEL_NAME=embedding sh scripts/start_tf_serving.sh

Parameters:

  • MODEL_SOURCE: path to the model on your local filesystem.
  • MODEL_TARGET: path to the model inside the Docker container.
  • MODEL_NAME: model name used by TF Serving; this name becomes part of the API URL.

When you are finished, use docker ps to list active containers and docker stop <container_id> to stop the serving container.
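
Once the container is up, you can sanity-check the deployment with TF Serving's REST model-status endpoint. A minimal Python sketch, assuming the default REST port 8501 and the MODEL_NAME=embedding from the example above:

import requests

# Model status endpoint: GET /v1/models/<MODEL_NAME>.
# Port 8501 is TF Serving's default REST port; "embedding" is the
# MODEL_NAME used in the example above.
response = requests.get("http://localhost:8501/v1/models/embedding")
print(response.json())  # reports each loaded version's state, e.g. "AVAILABLE"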

If you don't have a model to serve, you can create one with the commands below (see the sketch after the list for the idea behind these scripts).

Available sample models:

  • DistilBERT (embedding)
python sample_models/text_models.py get_distilbert_embedding
  • DistilBERT (sequence classification)
python sample_models/text_models.py get_distilbert_sequence_classification
  • DistilBERT (token classification - NER)
python sample_models/text_models.py get_distilbert_token_classification
  • DistilBERT (multiple choice)
python sample_models/text_models.py get_distilbert_multiple_choice
  • DistilBERT (question answering)
python sample_models/text_models.py get_distilbert_qa
  • DistilGPT2 (text generation)
python sample_models/text_models.py get_distilgpt2_text_generation
  • DistilBERT (custom)
python sample_models/text_models.py get_distilbert_custom
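
For reference, these scripts boil down to exporting a SavedModel under a numeric version directory, which is the layout TF Serving expects. A minimal sketch of the idea behind the embedding sample (the actual code in sample_models/text_models.py may differ in its details):

import tensorflow as tf
from transformers import TFDistilBertModel

model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")

# Wrap the model in a serving function with an explicit input signature;
# the shapes and dtypes here are illustrative.
@tf.function(input_signature=[{
    "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
    "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
}])
def serving_fn(inputs):
    return {"last_hidden_state": model(inputs).last_hidden_state}

# The trailing "1" is the model version that TF Serving will load.
tf.saved_model.save(
    model, "models/embedding/saved_model/1",
    signatures={"serving_default": serving_fn},
)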

Inference

There are two ways to access the model and run inference.

Notebook

  • Just use the notebook at notebooks/text_inference.ipynb

Gradio app

  • Run the app script for your specific use case from the gradio_apps folder
  • Available use cases:
    • Text:
      • Generic
        TF_URL="http://localhost:8501/v1/models/embedding:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_app.py
      • Token classification - NER
        TF_URL="http://localhost:8501/v1/models/token_classification:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_ner_app.py
      • Multiple choice
        TF_URL="http://localhost:8501/v1/models/multiple_choice:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_multiple_choice_app.py
      • Question answering
        TF_URL="http://localhost:8501/v1/models/qa:predict" TOKENIZER_PATH="./tokenizers/distilbert-base-uncased" python gradio_apps/text_qa_app.py
      • Text generation
        TF_URL="http://localhost:8501/v1/models/text_generation:predict" TOKENIZER_PATH="./tokenizers/distilgpt2" python gradio_apps/text_generation_app.py

*To keep them generic, the Gradio apps return the models' raw outputs.

*The Gradio apps require you to define the environment variables below.

For all use cases:

  • TF_URL: REST API URL exposed by your TF Serving instance.
    • e.g. "http://localhost:8501/v1/models/embedding:predict"
      • Replace embedding with your model's name.

Text use case:

  • TOKENIZER_PATH: path to the tokenizer in your local system.
    • e.g. "./tokenizers/distilbert-base-uncased"
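
Besides the notebook and the Gradio apps, you can also query the REST endpoint directly. A minimal Python sketch using TF Serving's standard "instances" payload, assuming the embedding model and tokenizer paths from the examples above:

import requests
from transformers import AutoTokenizer

# Assumes the "embedding" sample model is being served and the tokenizer
# files exist locally; adjust both to your setup.
tokenizer = AutoTokenizer.from_pretrained("./tokenizers/distilbert-base-uncased")
encoded = tokenizer("Serving HuggingFace models with TF Serving!", return_tensors="np")

payload = {"instances": [{
    "input_ids": encoded["input_ids"][0].tolist(),
    "attention_mask": encoded["attention_mask"][0].tolist(),
}]}

response = requests.post(
    "http://localhost:8501/v1/models/embedding:predict", json=payload)
print(response.json()["predictions"])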

