Shavvimal / RayLLM


Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay to serve our own LLM

For Ray Serve, the docs recommend deploying to production on Kubernetes, using the RayService controller provided as part of KubeRay.

This integration offers the scalability and user experience of Ray Serve with the operational benefits of Kubernetes, including the ability to integrate with existing Kubernetes-based applications. RayService simplifies production deployment by managing health checks, status reporting, failure recovery, and updates for you.

This repo is intended to go alongside my YouTube video on the same topic:

Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay to serve our own LLM

Getting started

Install Ray:

pip install -U "ray[default, serve]"

As well as the other dependencies:

pip install -r requirements.txt

This repo contains minimal code to test a Ray Serve deployment integrated with FastAPI.
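
As a rough illustration, a Serve deployment wrapping a FastAPI app looks something like the sketch below. The module contents, route names, and handler logic are assumptions for illustration, not the repo's actual code in app/main.py.

from fastapi import FastAPI
from ray import serve

fastapi_app = FastAPI()

@serve.deployment
@serve.ingress(fastapi_app)
class APIIngress:
    @fastapi_app.get("/healthz")
    def health(self) -> dict:
        return {"status": "ok"}

    @fastapi_app.post("/generate")
    async def generate(self, prompt: str) -> dict:
        # A real deployment would call the LLM here; this sketch just echoes the prompt.
        return {"completion": f"echo: {prompt}"}

# serve run app.main:app imports this top-level bound application.
app = APIIngress.bind()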

  1. Test a local Ray Serve deployment (a quick request sketch for checking the endpoint follows this list)
    • serve run app.main:app
  2. Deploy a Serve config locally
    • serve build app.main:app -o config/serve-deployment.yaml
    • Start a local Ray cluster: ray start --head
    • Start the application: serve deploy config/serve-deployment.yaml
    • Stop the application: serve shutdown
    • Stop the local Ray cluster: ray stop
  3. Push to Docker Hub:
    • docker build . -t shavvimal/ray_llm:latest
    • docker image push shavvimal/ray_llm:latest
  4. Deploy on Kubernetes Locally
  5. Deploy on Kubernetes on AWS
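
To smoke-test the local deployment from steps 1 and 2, you can send a request to the HTTP endpoint that serve run exposes on 127.0.0.1:8000 by default. The /generate route and its query parameter are carried over from the hypothetical sketch above, not the repo's actual API.

import requests

# Hit the locally running Serve application started with serve run app.main:app.
resp = requests.post(
    "http://127.0.0.1:8000/generate",
    params={"prompt": "Hello, Ray!"},
)
resp.raise_for_status()
print(resp.json())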

Notes

See my wiki for more details on the deployment process, including Log Persistence, Autoscaling, and more.
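
As one example, autoscaling is configured per deployment directly in the Python code; the replica bounds below are illustrative values rather than the settings used in this repo.

from ray import serve

@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 4},
    ray_actor_options={"num_cpus": 1},
)
class ScalableLLM:
    async def __call__(self, request) -> str:
        return "ok"

app = ScalableLLM.bind()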

Languages

Python 86.3%, Dockerfile 13.7%