google / turbinia

Automation and Scaling of Digital Forensics Tools

[FR]: Re-write the docker job container code

hacktobeer opened this issue

What would this feature improve or what problem would it solve?

This would:

  • isolate Jobs in their own docker containers, minimizing dependency problems
  • shrink the worker image
  • improve maintainability and the release process
  • reduce plaso version issues with Timesketch
  • allow easy rollback of Job container versions when issues arise
  • make it easy for contributors to update Job code (they can just update their container)

What is the feature you are proposing?

Re-write the existing docker job code, which assumes a shared docker host. We currently run in K8s and docker compose, so the architecture needs to change slightly (e.g. working with a sidecar).

Re-write the docker code so that it does not depend on already pulled docker images but pulls them on an ad-hoc basis when needed. This will make it easier to update to new versions of job containers without having to update/restart the whole Turbinia setup.
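A minimal sketch of the pull-on-demand idea, assuming the docker-py SDK (`client.images.get`, `client.images.pull`, `docker.errors.ImageNotFound`). The function name `ensure_image` and its `not_found` parameter are hypothetical and are not part of the existing Turbinia code:

```python
from typing import Any, Optional, Type


def ensure_image(client: Any, image: str,
                 not_found: Optional[Type[Exception]] = None) -> Any:
    """Return the local image, pulling it first if it is missing.

    `client` is assumed to behave like docker.DockerClient, for example the
    result of docker.from_env(). `not_found` is a hypothetical hook that lets
    the function be exercised without a Docker daemon; it defaults to
    docker.errors.ImageNotFound.
    """
    if not_found is None:
        # Imported lazily so the sketch can be tested without docker-py.
        from docker.errors import ImageNotFound
        not_found = ImageNotFound
    try:
        # Fast path: the image is already present on this host.
        return client.images.get(image)
    except not_found:
        # Ad-hoc pull: only happens when a job needs an image that is missing.
        return client.images.pull(image)
```

With a real daemon this could be called as `ensure_image(docker.from_env(), "example.org/some-job:1.0")`, where the image name is a placeholder. Pinning the tag per Job is what would make rollback a matter of changing one string.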

What alternatives have you considered?

None.

Observations from the docker code paths so far:

  • the docker code only checks whether an image is already present; it does not pull it
  • the return object of the run/start container API call has changed, so the code no longer works
  • pre-checks for the worker verify that the docker image actually has the program available; this would pull images in a phase where they are not needed. Remove these checks when loading images ad hoc
  • artifact.py uses image_export.py, which is included with the plaso install bundle. If we want to get rid of these dependencies we need to rewrite the FileArtifactExtractionTask to be docker enabled
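One possible way to avoid depending on what a non-detached run call returns is to always start the container detached and read the exit code and logs from the container object. This is only a sketch assuming the docker-py SDK (`containers.run(..., detach=True)`, `Container.wait`, `Container.logs`, `Container.remove`); `run_job_container` is a hypothetical helper, not the existing Turbinia function:

```python
from typing import Any, List, Tuple


def run_job_container(client: Any, image: str,
                      command: List[str]) -> Tuple[int, str]:
    """Run `command` in `image` and return (exit_code, combined output).

    `client` is assumed to behave like docker.DockerClient.
    """
    # detach=True makes run() return a Container object rather than raw logs.
    container = client.containers.run(image, command, detach=True)
    try:
        # wait() blocks until the container exits and returns a dict
        # that carries the exit status under "StatusCode".
        exit_code = container.wait()["StatusCode"]
        raw = container.logs(stdout=True, stderr=True)
        output = raw.decode("utf-8", "replace")
    finally:
        # Clean up so finished job containers do not pile up on the host.
        container.remove()
    return exit_code, output
```

Because the program is only looked up when the container actually runs, a missing binary would surface as a non-zero exit code at task time instead of requiring a worker pre-check that forces an early pull.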

More observations:

  • utils.py uses image_export directly in the export_file/export_artifact functions