ra312 / model-server

A FastAPI web application for real-time inference


Inference service

flowchart TD
    A[ModelArtifact] --> B(Model Instance)
    E[Restaurants_Index:Elastic] --> B
    F[Restaurant API] --> B
    B --> C[VenueRatings] --> H(Elastic)
    C -- 100 rps, 2 s response, single CPU --> D(Search List)
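A minimal sketch of how the model instance behind this diagram can be exposed as the predict endpoint with FastAPI. This is illustrative only: the module layout and feature order are assumptions, not the actual code in src/recommendation_model_server; the request fields match the example further down.

import pickle
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

class Venue(BaseModel):
    venue_id: int
    conversions_per_impression: float
    price_range: int
    rating: float
    popularity: float
    retention_rate: float
    session_id_hashed: int
    position_in_list: int
    is_from_order_again: int
    is_recommended: int

app = FastAPI()

# Load the pre-trained model artifact once at startup.
with open("artifacts/rate_venues.pickle", "rb") as fh:
    model = pickle.load(fh)

@app.post("/predict")
def predict(venues: List[Venue]) -> List[float]:
    # Assemble the feature matrix; the exact feature order is an assumption.
    rows = [
        [v.conversions_per_impression, v.price_range, v.rating, v.popularity,
         v.retention_rate, v.position_in_list, v.is_from_order_again,
         v.is_recommended]
        for v in venues
    ]
    return [float(score) for score in model.predict(rows)]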



Training pipeline source code: https://github.com/ra312/personalization
Model server source code: https://github.com/ra312/model-server


A service to rate venues

Installation

python3 -m pip install recommendation-model-server

Running locally on the host

To use the pre-trained model in artifacts/rate_venues.pickle and start the service on 0.0.0.0:8000, run:

./scripts/start_inference_service.sh

In a separate tab, run:

curl -X 'POST' \
'http://0.0.0.0:8000/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
  {
    "venue_id": -4202398962129790000,
    "conversions_per_impression": 0.3556765815,
    "price_range": 1,
    "rating": 8.6,
    "popularity": 4.4884057024,
    "retention_rate": 8.6,
    "session_id_hashed": 3352618370338455600,
    "position_in_list": 31,
    "is_from_order_again": 0,
    "is_recommended": 0
  }
]'
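
The same request from Python, assuming the requests library is installed and the service is listening on 0.0.0.0:8000 as above:

import requests

payload = [
    {
        "venue_id": -4202398962129790000,
        "conversions_per_impression": 0.3556765815,
        "price_range": 1,
        "rating": 8.6,
        "popularity": 4.4884057024,
        "retention_rate": 8.6,
        "session_id_hashed": 3352618370338455600,
        "position_in_list": 31,
        "is_from_order_again": 0,
        "is_recommended": 0,
    }
]

response = requests.post("http://0.0.0.0:8000/predict", json=payload, timeout=5)
response.raise_for_status()
print(response.json())  # one rating per venue in the payload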

Running in a container

docker pull akylzhanov/search-api
docker run -d --name search-api-container -p 8000:8000 --rm akylzhanov/search-api

Start search UI

Start Elasticsearch locally

./scripts/run_elastic_locally.sh

Create the restaurant index by querying restaurants near lat=52.5024674, lon=13.2810506 (Café Am Neuen See, Tiergarten, Mitte):

poetry run python3 src/recommendation_model_server/indexer.py
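
Under the hood this boils down to a geo query against the local Elasticsearch. A minimal sketch with the official Python client (elasticsearch-py 8.x style); the index name, field name, and radius are assumptions, the actual ones are defined in indexer.py:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Find restaurants within 2 km of the given point; assumes a geo_point
# field named "location" on a hypothetical "restaurants" index.
hits = es.search(
    index="restaurants",
    query={
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "2km",
                    "location": {"lat": 52.5024674, "lon": 13.2810506},
                }
            }
        }
    },
)
for hit in hits["hits"]["hits"]:
    print(hit["_source"])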

Head to the search UI at localhost:8000


Start Redis locally

./scripts/run_redis_cache.sh
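
The Redis instance can back a cache-aside layer in front of the model. A sketch with redis-py; the key scheme and TTL are assumptions, not the project's actual caching code:

import json

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def rating_with_cache(venue_id: int, compute_rating):
    """Return a cached rating, computing and storing it on a miss."""
    key = f"venue-rating:{venue_id}"  # hypothetical key scheme
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    rating = compute_rating()
    cache.setex(key, 300, json.dumps(rating))  # keep for 5 minutes
    return rating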

Development

  • Clone this repository
  • Requirements: Poetry
  • Create a virtual environment and install the dependencies:
poetry install
  • Activate the virtual environment:
poetry shell

Testing

pytest tests
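
An endpoint-level test of the kind that could live under tests/, using FastAPI's TestClient; the import path of app is an assumption:

from fastapi.testclient import TestClient

from recommendation_model_server.app import app  # hypothetical import path

client = TestClient(app)

def test_predict_returns_one_rating_per_venue():
    payload = [
        {
            "venue_id": -4202398962129790000,
            "conversions_per_impression": 0.3556765815,
            "price_range": 1,
            "rating": 8.6,
            "popularity": 4.4884057024,
            "retention_rate": 8.6,
            "session_id_hashed": 3352618370338455600,
            "position_in_list": 31,
            "is_from_order_again": 0,
            "is_recommended": 0,
        }
    ]
    response = client.post("/predict", json=payload)
    assert response.status_code == 200
    assert len(response.json()) == len(payload)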

Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.

You can install the hooks with (runs for each commit):

pre-commit install

Or if you want them to run only for each push:

pre-commit install -t pre-push

Or if you want to run all checks manually for all files:

pre-commit run --all-files

How to run load tests

  1. Start the service locally on host=0.0.0.0, port=8000:
./scripts/start_inference_service.sh
  2. Run the load test with locust, ramping to 1,000,000 users at a spawn rate of 100 users per second (a locustfile sketch follows the sample output below):
poetry shell && pytest tests/test_invokust_load.py -s

The output is similar to the following (times are in milliseconds):

Ramping to 1000000 users at a rate of 100.00 per second
Type     Name          # reqs      # fails |    Avg     Min     Max     Med |   req/s  failures/s
--------|-------------|-----------|--------|-------|-------|-------|-------|--------|-----------
POST     /predict        1453     0(0.00%) |    448       5    1948     390 |  167.83        0.00
--------|-------------|-----------|--------|-------|-------|-------|-------|--------|-----------
         Aggregated      1453     0(0.00%) |    448       5    1948     390 |  167.83        0.00
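
For reference, a minimal locustfile that exercises the same endpoint (a sketch: the actual load test is driven through invokust in tests/test_invokust_load.py):

from locust import HttpUser, between, task

# Same single-venue payload as the curl example above.
PAYLOAD = [
    {
        "venue_id": -4202398962129790000,
        "conversions_per_impression": 0.3556765815,
        "price_range": 1,
        "rating": 8.6,
        "popularity": 4.4884057024,
        "retention_rate": 8.6,
        "session_id_hashed": 3352618370338455600,
        "position_in_list": 31,
        "is_from_order_again": 0,
        "is_recommended": 0,
    }
]

class PredictUser(HttpUser):
    # Each simulated user waits 0.1-0.5 s between requests.
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        self.client.post("/predict", json=PAYLOAD)

You can also run it interactively with locust -f locustfile.py --host http://0.0.0.0:8000 instead of the pytest wrapper.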

Is this production-ready?

No 😅 It's missing:

  • defensive programming around network calls
  • logging/tracing
  • using the predict endpoint to re-rank the response of the search endpoint
  • load testing was done with locust, but test coverage still needs to grow (e2e tests, stress tests, unit tests)

This project was generated using the wolt-python-package-cookiecutter template.


License: MIT

