[FR]: Re-write the docker job container code
hacktobeer opened this issue
What would this feature improve or what problem would it solve?
This would:
- isolate Jobs in their own docker containers, minimizing dependency problems
- minimize the worker image
- improve maintainability and the release process
- minimize plaso version issues with Timesketch
- allow easy rollback of Job container versions when issues arise
- allow contributors to update Job code easily (they can just update their container)
What is the feature you are proposing?
Re-write the existing docker job code, which assumes a shared docker host. We currently run in K8s and docker compose, so the architecture needs to change slightly (e.g. to work with a sidecar).
Re-write the docker code so that it no longer depends on container images having been pulled beforehand, but pulls them on an ad-hoc basis when needed. This will make it easier to update to new versions of job containers without having to update/restart the whole Turbinia setup.
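To make the ad-hoc pull idea concrete, here is a minimal sketch, not the actual Turbinia code. It assumes a client object shaped like the one returned by `docker.from_env()` in the Docker SDK for Python; the helper names `ensure_image` and `run_job` are hypothetical.

```python
def ensure_image(client, image):
    """Pull `image` only if the docker daemon does not already have it.

    `client` is assumed to behave like a Docker SDK for Python client.
    Returns True if a pull was performed, False if the image was present.
    """
    # images.list(name=...) returns an empty list when nothing matches.
    if client.images.list(name=image):
        return False
    # Pass an explicit tag ("repo:tag") so only that one tag is pulled.
    client.images.pull(image)
    return True


def run_job(client, image, command):
    """Pull the job image on demand, then run the command in a container."""
    ensure_image(client, image)
    # With the default detach=False the SDK returns the container output;
    # remove=True deletes the container once it has exited.
    return client.containers.run(image, command, remove=True)
```

With the real SDK this would be called as `run_job(docker.from_env(), "<image>:<tag>", [...])`. Because the pull happens at job time, a new job container version is picked up by changing the tag, without restarting the worker.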
What alternatives have you considered?
None.
Observations from the docker code paths so far:
- the docker code only checks whether an image is already present; it does not pull it
- the return object of the run/start container API call has changed, so the code no longer works
- the worker pre-checks verify that the docker image actually has the program available; this would pull images in a phase where they are not needed. These checks should be removed when images are loaded ad hoc
- artifact.py uses image_export.py, which is included with the plaso install bundle. If we want to get rid of these dependencies we need to rewrite the FileArtifactExtractionTask to be docker enabled
More observations:
- utils.py uses image_export directly in the export_file/export_artifact functions
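As a rough illustration of what a docker-enabled export could look like, the sketch below only builds the keyword arguments for `client.containers.run(**kwargs)` in the Docker SDK for Python. The function name, the mount points `/evidence` and `/output`, and the assumption that the chosen image exposes `image_export.py` as a runnable command are all hypothetical; only the `--write` and `--artifact_filters` flags are taken from image_export.py itself.

```python
import posixpath


def build_export_run_args(image, evidence_path, output_dir, artifact_name):
    """Build kwargs for docker SDK `client.containers.run(**kwargs)`.

    `evidence_path` and `output_dir` are absolute paths on the docker host.
    The in-container mount points below are illustrative assumptions.
    """
    evidence_dir, evidence_name = posixpath.split(evidence_path)
    command = [
        "image_export.py",  # assumes the image has this on its PATH
        "--write", "/output",
        "--artifact_filters", artifact_name,
        posixpath.join("/evidence", evidence_name),
    ]
    return {
        "image": image,
        "command": command,
        "volumes": {
            # Evidence is mounted read-only so the job cannot modify it.
            evidence_dir: {"bind": "/evidence", "mode": "ro"},
            output_dir: {"bind": "/output", "mode": "rw"},
        },
        "remove": True,
    }
```

Keeping the argument construction separate from the actual `run` call would let artifact.py and utils.py share one code path and makes the command easy to unit test without a docker daemon.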