Isolation of evaluations
Munsio opened this issue
Going forward, we need to isolate evaluation runs. This will ultimately allow us to evaluate multiple models in parallel, either on a single host or in a cluster.
Iteration 1:
- Create a Docker image
  - Contains the `eval-dev-quality` binary
  - Has all the necessary prerequisites installed from an archive with fixed versions
    - Java
    - Maven
    - Gradle
    - Go
  - `eval-dev-quality install-all`
- Contains documentation
  - How to build locally (bash script)
  - How to run (bash script)
- Script to run multiple local Docker instances simultaneously
- Script to run multiple instances inside Kubernetes
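The "run multiple local Docker instances" script could be sketched roughly as below. The image tag and the `evaluate --model` CLI are assumptions of this sketch, not the final interface:

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_models: start one detached container per model name given as argument.
# The image tag and the `evaluate --model` CLI are assumptions of this sketch.
run_models() {
    local image="${IMAGE:-ghcr.io/symflower/eval-dev-quality:latest}"
    local model
    for model in "$@"; do
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            # Print the command instead of executing it.
            echo "docker run --rm --detach $image eval-dev-quality evaluate --model $model"
        else
            docker run --rm --detach "$image" eval-dev-quality evaluate --model "$model"
        fi
    done
}
```

The `DRY_RUN` switch makes the script testable without a Docker daemon; dropping it leaves the plain `docker run` loop.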
Iteration 2:
- Build the image on each PR (+ `main`) and publish it to the GitHub registry
- Add an additional option `--runtime docker` (default is `local`, which runs as before)
  - If specified, each model is run inside a local Docker container
- Add an additional option `--parallel $uint` (default is `1`)
  - `--parallel` defines how many models run in parallel
  - The option is only allowed if `runtime != local`
- Add a check that `--sequential` is only allowed when `runtime == local`
  - Print an informational message that `--sequential` is skipped if `runtime != local` but passed on to the subsequent runs
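A minimal sketch of the flag rules above (the function name is hypothetical; the rules are the ones listed in this iteration):

```shell
#!/usr/bin/env bash
set -euo pipefail

# validate_flags RUNTIME PARALLEL SEQUENTIAL
# Encodes the rules above: --parallel needs a non-local runtime, and
# --sequential is only honored for the local runtime (otherwise it is
# skipped here but forwarded to the inner runs).
validate_flags() {
    local runtime="$1" parallel="$2" sequential="$3"
    if [[ "$runtime" == "local" && "$parallel" -gt 1 ]]; then
        echo "error: --parallel is only allowed if runtime != local" >&2
        return 1
    fi
    if [[ "$runtime" != "local" && "$sequential" == "true" ]]; then
        echo "info: --sequential is skipped for runtime=$runtime but passed on to subsequent runs"
    fi
}
```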
Iteration 3:
- Add an additional runtime `kubernetes`
  - Runs all the models simultaneously on a Kubernetes cluster
  - Uses the locally installed `kubectl` command and the default context
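A rough sketch of how the `kubernetes` runtime could start one Job per model via the locally installed `kubectl` and the currently active context. The image tag and CLI are assumptions; model names are sanitized into valid lowercase Job names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# start_k8s_jobs: create one Kubernetes Job per model, using the locally
# installed kubectl and whatever context is currently active.
start_k8s_jobs() {
    local image="${IMAGE:-ghcr.io/symflower/eval-dev-quality:latest}"  # assumed tag
    local model name
    for model in "$@"; do
        # Job names must be lowercase DNS labels, so sanitize the model name.
        name="eval-$(printf '%s' "$model" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9-' '-')"
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            echo "kubectl create job $name --image=$image -- eval-dev-quality evaluate --model $model"
        else
            kubectl create job "$name" --image="$image" -- eval-dev-quality evaluate --model "$model"
        fi
    done
}
```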
Iteration 4:
- Fetch data from the `kubernetes` runtime automatically (optional parameter)
- Merge the results from the `docker` and `kubernetes` runtimes back into a summary (this will be solved by #205)
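Until #205 lands, merging could look roughly like the sketch below, assuming each runtime writes a CSV result file with an identical header (the file layout is an assumption):

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_results SUMMARY FILE...: concatenate per-runtime CSV results into
# one summary file, keeping the header only once.
merge_results() {
    local out="$1"
    shift
    head -n 1 "$1" > "$out"        # header from the first file
    local f
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"  # data rows from every runtime
    done
}
```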
Iteration 5:
- Support Ollama in a container
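One possible shape, sketched as a compose file written from a bash script: run Ollama as a sidecar container next to the evaluation container. Service names and the eval image tag are assumptions; `OLLAMA_HOST` and port 11434 are Ollama's documented defaults:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a compose file that runs Ollama next to the evaluation container.
# Service names and the eval image tag are assumptions of this sketch.
cat > docker-compose.eval.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
  eval:
    image: ghcr.io/symflower/eval-dev-quality:latest
    depends_on:
      - ollama
    environment:
      # Point the evaluation at the sidecar; 11434 is Ollama's default port.
      - OLLAMA_HOST=http://ollama:11434
volumes:
  ollama:
EOF
```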