This is a small project I built to solve a problem I had:

Users can submit requests with multiple configurations and different arguments, and we want to cache the resulting predictions whenever possible.
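The caching idea boils down to deriving a deterministic key from the configuration plus the request arguments. A minimal sketch of such a helper (illustrative only, not the project's actual code):

```python
import hashlib
import json


def cache_key(config: dict, request_args: dict) -> str:
    """Build a deterministic cache key from a model config and request arguments.

    Sorting the keys makes the JSON serialization order-independent, so two
    logically identical requests map to the same key.
    """
    payload = json.dumps({"config": config, "args": request_args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```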
This project will:
- Use FastAPI to create a REST API
- Use SQLModel to store predictions, keyed by the configuration and the request.
- Use HuggingFace Transformers to run the prediction.
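The store-or-reuse flow can be sketched as below, using the stdlib's sqlite3 in place of SQLModel to keep the example self-contained (the table and column names are hypothetical):

```python
import sqlite3


def get_or_compute(conn: sqlite3.Connection, key: str, compute) -> str:
    """Return a cached prediction for `key`, computing and storing it on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (key TEXT PRIMARY KEY, result TEXT)"
    )
    row = conn.execute(
        "SELECT result FROM predictions WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip the model entirely
    result = compute()  # cache miss: run the (expensive) prediction
    conn.execute("INSERT INTO predictions (key, result) VALUES (?, ?)", (key, result))
    conn.commit()
    return result
```

In the real project the `compute` step would be a HuggingFace Transformers call, and the lookup would go through SQLModel models rather than raw SQL.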
Hopefully, this can help others.
DISCLAIMER

I am aware that `model_runner.py` should be split into multiple "Task" objects. This is out of scope for this project. If you want to use this project, let me know, and we can refactor that part together; it won't be too hard.
Future work
- Expand set of capabilities
- Work on entire datasets
- Uncertainty estimation
- Progress bar
- Async routes
- Run the app: `make DEVICE=cpu compose` (optionally `DEVICE=gpu` to use a GPU)
- Go to 0.0.0.0:8080/docs
- Execute the `/predictions` API with different requests.
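As an illustration, a request body for `/predictions` might look like the following (the field names are hypothetical; check the generated `/docs` page for the actual schema):

```json
{
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
  "inputs": "I really enjoyed this movie!"
}
```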
- Run the tests: `make test`
- DEV autoformatting: `make black`