The Ray Serve docs recommend deploying to production on Kubernetes, using the RayService controller provided as part of KubeRay.
This integration offers the scalability and user experience of Ray Serve with the operational benefits of Kubernetes, including the ability to integrate with existing Kubernetes-based applications. RayService simplifies production deployment by managing health checks, status reporting, failure recovery, and updates for you.
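A RayService custom resource bundles the Serve application config together with the Ray cluster spec. A minimal sketch of what one looks like — the resource name, image tag, and application name here are hypothetical, not taken from this repo:

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: ray-llm            # hypothetical resource name
spec:
  # Serve config embedded as a string (same shape as `serve build` output)
  serveConfigV2: |
    applications:
    - name: app1
      import_path: app.main:app
      route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
          - name: ray-head
            image: shavvimal/ray_llm:latest
    workerGroupSpecs:
    - groupName: worker
      replicas: 1
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: shavvimal/ray_llm:latest
```

The KubeRay operator watches this resource and handles the health checking, recovery, and zero-downtime updates mentioned above.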
This repo is intended to go alongside my YouTube video on the same topic:
Install Ray:

```shell
pip install -U "ray[default, serve]"
```
As well as the other dependencies:

```shell
pip install -r requirements.txt
```
This repo contains minimal code for testing a Ray Serve deployment integrated with FastAPI.
- Test a local Ray Serve deployment:

  ```shell
  serve run app.main:app
  ```
- Build a Serve config for local deployment:

  ```shell
  serve build app.main:app -o config/serve-deployment.yaml
  ```
- Start a local Ray cluster:

  ```shell
  ray start --head
  ```
- Deploy the application:

  ```shell
  serve deploy config/serve-deployment.yaml
  ```
- Stop the application:

  ```shell
  serve shutdown
  ```
- Stop the local Ray cluster:

  ```shell
  ray stop
  ```
- Push the image to Docker Hub:

  ```shell
  docker build . -t shavvimal/ray_llm:latest
  docker image push shavvimal/ray_llm:latest
  ```
- Deploy on Kubernetes Locally
- Deploy on Kubernetes on AWS
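The image pushed in the Docker steps above might be built from a Dockerfile along these lines — the base image and file layout are assumptions, not the repo's actual Dockerfile:

```dockerfile
# Sketch: package the Serve app on top of the official Ray image.
FROM rayproject/ray:latest

WORKDIR /serve_app

# Install the app's dependencies first so this layer caches well.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (assumed to live under app/).
COPY app ./app
```

Starting from `rayproject/ray` keeps the Ray version in the image aligned with what the cluster nodes run.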
See my wiki for more details on the deployment process, including log persistence, autoscaling, and more.
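For reference, the file that `serve build` generates looks roughly like the following; deployment names and replica counts here are illustrative, and the `autoscaling_config` block is where the autoscaling settings mentioned above would go:

```yaml
# Sketch of a generated Serve config (config/serve-deployment.yaml)
proxy_location: EveryNode
http_options:
  host: 0.0.0.0
  port: 8000
applications:
- name: app1
  route_prefix: /
  import_path: app.main:app
  deployments:
  - name: APIServer            # illustrative deployment name
    autoscaling_config:
      min_replicas: 1
      max_replicas: 4
```

Editing this file and re-running `serve deploy` is how deployment options are changed without touching application code.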