Building this for Rust & ML practice purposes (for now), hope I can finish this 😅
The goal of this project is not to make a general ML model serving framework; there are plenty out there. Instead, it will focus on specific areas of serving, such as model autoscaling and orchestration.
- Fast ML inference
- Model orchestration
- Inference via HTTP endpoints (see the endpoint sketch after this list)
- Rust web server, coordinator, and control plane (blazingly fast)
- TensorFlow model
  - Runs a Python script to perform inference
- Robust message delivery from Rust => Python through some TBD network protocol/message queue/stream (see the protocol sketch after this list)
- Simple Python interface to configure model serving (instead of a script)
- Deployment
  - Containerized
- Batch processing
- Observability
- Kubernetes
  - Autoscaling
  - Model orchestration
  - Load balancing
- MLOps
  - Training-to-production pipeline CI/CD
  - Model validation
  - Realtime, continuous learning
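
Here's a minimal sketch of what the HTTP prediction endpoint could look like, assuming axum as the web framework; the route, the request schema, and the dummy sum-as-prediction handler body are placeholders rather than a committed API:

```rust
// Hypothetical Cargo.toml deps: axum = "0.7", serde = { version = "1", features = ["derive"] },
// tokio = { version = "1", features = ["full"] }
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct PredictRequest {
    inputs: Vec<Vec<f32>>, // one row per example; shape is illustrative
}

#[derive(Serialize)]
struct PredictResponse {
    outputs: Vec<f32>,
}

// Stand-in handler: a real server would forward the request to the
// Python model worker instead of summing each input row.
async fn predict(Json(req): Json<PredictRequest>) -> Json<PredictResponse> {
    let outputs: Vec<f32> = req.inputs.iter().map(|row| row.iter().sum()).collect();
    Json(PredictResponse { outputs })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/predict", post(predict));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```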
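
The Rust => Python transport is still TBD; as one candidate, here's a sketch of length-prefixed framing over plain TCP using only the standard library. The address, port, and JSON payload schema are invented for illustration, and the Python worker is assumed to reply with the same framing:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

fn send_frame(stream: &mut TcpStream, payload: &[u8]) -> std::io::Result<()> {
    // 4-byte big-endian length header, then the payload bytes.
    stream.write_all(&(payload.len() as u32).to_be_bytes())?;
    stream.write_all(payload)
}

fn recv_frame(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> std::io::Result<()> {
    // Assumes a Python worker is listening on this (made-up) address.
    let mut stream = TcpStream::connect("127.0.0.1:5555")?;
    // An inference request; the schema here is made up for illustration.
    let request = br#"{"model":"resnet50","inputs":[[0.1,0.2,0.3]]}"#;
    send_frame(&mut stream, request)?;
    let response = recv_frame(&mut stream)?;
    println!("prediction: {}", String::from_utf8_lossy(&response));
    Ok(())
}
```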
Should feel like npm:

- `swerve build`
- `swerve dev`
- `swerve serve` or `swerve start` to start the inference server
- `swerve test`
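
A rough sketch of that CLI skeleton, assuming clap's derive API; the subcommand set mirrors the list above, but the `start` alias and all of the behavior are placeholders:

```rust
// Hypothetical Cargo.toml dep: clap = { version = "4", features = ["derive"] }
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "swerve", about = "npm-style CLI for model serving")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Package the model environment + server image
    Build,
    /// Run the server locally for development
    Dev,
    /// Start the inference server (`swerve start` works too)
    #[command(alias = "start")]
    Serve,
    /// Run tests against the served model
    Test,
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Command::Build => println!("TODO: build"),
        Command::Dev => println!("TODO: dev"),
        Command::Serve => println!("TODO: serve"),
        Command::Test => println!("TODO: test"),
    }
}
```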
- Python model environment using `envd` + Rust web server, all packaged on top of Docker under the hood.
- Maybe have a `swerve.config` file?
  - Used to declaratively specify model deployment (see the schema sketch at the end of this list)
- Separate the server container and the model container so they can scale individually.
- Separate instances and scaling policies for CPU and GPU workloads.
- Multi-model serving
- Model repository
- Non-HTTP entrypoints
  - Online: gRPC
  - Offline: Kafka, Dagster, etc. (see the consumer sketch at the end of this list)
    - data pipeline => predict => output storage
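
A sketch of what `swerve.config` could declare, written as the Rust structs the CLI might deserialize it into; the format (TOML here) and every field name are assumptions, since no config format exists yet:

```rust
// Hypothetical Cargo.toml deps: serde = { version = "1", features = ["derive"] }, toml = "0.8"
use serde::Deserialize;

// All field names below are invented for illustration.
#[derive(Debug, Deserialize)]
struct SwerveConfig {
    model: ModelSpec,
    serving: ServingSpec,
}

#[derive(Debug, Deserialize)]
struct ModelSpec {
    name: String,
    path: String,
}

#[derive(Debug, Deserialize)]
struct ServingSpec {
    port: u16,
    min_replicas: u32,
    max_replicas: u32,
}

fn main() {
    // Inline stand-in for the contents of a `swerve.config` file.
    let raw = r#"
        [model]
        name = "resnet50"
        path = "./models/resnet50"

        [serving]
        port = 8080
        min_replicas = 1
        max_replicas = 4
    "#;
    let cfg: SwerveConfig = toml::from_str(raw).expect("valid config");
    println!("{cfg:?}");
}
```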
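
And for the offline entrypoint, a minimal consumer-loop sketch assuming the rdkafka crate; the broker address, group id, and topic name are placeholders for the eventual data pipeline => predict => output storage flow:

```rust
// Hypothetical Cargo.toml deps: rdkafka = "0.36",
// tokio = { version = "1", features = ["full"] }
use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::Message;

#[tokio::main]
async fn main() {
    // Consume batched inference requests from Kafka; broker address,
    // group id, and topic name are all placeholders.
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "swerve-offline")
        .set("auto.offset.reset", "earliest")
        .create()
        .expect("consumer creation failed");

    consumer
        .subscribe(&["inference-requests"])
        .expect("subscribe failed");

    loop {
        match consumer.recv().await {
            Ok(msg) => {
                if let Some(payload) = msg.payload() {
                    // data pipeline => predict => output storage:
                    // decode the payload, run the model, then write the
                    // prediction to an output sink (object store, DB, topic).
                    println!("got {} bytes to score", payload.len());
                }
            }
            Err(e) => eprintln!("kafka error: {e}"),
        }
    }
}
```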