Parent Image Detection
A set of tools for collecting parent-image data and detecting the parent image of a given Docker image. Detection succeeds only if the parent image was previously scanned and added to the image index.
In the following text, "parent image" and "base image" are used interchangeably.
Prerequisites
The tools are based on Scribe's valint tool, which generates an SBOM for a given image. See the valint documentation for its full capabilities.
Downloading valint:
curl -sSfL https://get.scribesecurity.com/install.sh | sh -s -- -t valint
Quick Start
Generate an SBOM, for example:
valint bom kibana:8.6.2 --output-file kibana-sbom.json
Get the basic information about the base image of the image described in the SBOM:
python3 get_base_image.py --sbom kibana-sbom.json --image_index small_image_index.json
Result:
INFO:my-logger:Request:kibana-sbom.json
Result:{'repo': 'ubuntu', 'image_tag': 'focal-20230126'}
To get more detailed information about the parent image, extract the image_index.tgz file:
tar -xzf image_index.tgz
And then run:
python3 get_base_image.py --sbom kibana-sbom.json
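The Quick Start steps can also be scripted. The following is a minimal sketch that only assembles the two commands shown above as argument lists and runs them in order. The function names quick_start_commands and run_quick_start are hypothetical and not part of this project; the flags are the ones used in the examples above.

```python
import subprocess
from typing import List, Optional


def quick_start_commands(image: str, sbom_path: str,
                         image_index: Optional[str] = None) -> List[List[str]]:
    """Build the two Quick Start commands as argument lists (hypothetical helper)."""
    # Step 1: generate the SBOM with valint.
    bom_cmd = ["valint", "bom", image, "--output-file", sbom_path]
    # Step 2: look up the base image described by that SBOM.
    detect_cmd = ["python3", "get_base_image.py", "--sbom", sbom_path]
    if image_index is not None:
        # When omitted, the script uses its default, image_index.json.
        detect_cmd += ["--image_index", image_index]
    return [bom_cmd, detect_cmd]


def run_quick_start(image: str, sbom_path: str,
                    image_index: Optional[str] = None) -> None:
    """Run both commands in order, stopping at the first failure."""
    for cmd in quick_start_commands(image, sbom_path, image_index):
        subprocess.run(cmd, check=True)
```

For example, run_quick_start("kibana:8.6.2", "kibana-sbom.json", "small_image_index.json") would reproduce the first lookup above, assuming valint is installed and get_base_image.py is in the current directory.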
Base Image Detection
To find the base image of a given image, create an SBOM using valint and then run:
python3 get_base_image.py [options]
options:
--sbom valint-generated SBOM filename. If no SBOM is given, the script runs as a service
--output_file filename for saving the base-image metadata, defaults to base_image.json
--image_index image index filename, defaults to image_index.json
When running as a service, the following endpoints are supported:
/base_image returns a JSON object describing the base image, or an object with an error.
Receives as input an ordered list of image layer hashes (as strings)
/healthcheck returns the JSON object {"Status": "OK"}
/test returns the object in the data field of a GET request, intended for debug
You can check the service by trying:
curl -X GET -H "Content-Type: application/json" -d '["layer1_hash", "layer2_hash", ... ]' http://127.0.0.1:5000/base_image
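The same request can be made from Python. This is a minimal sketch using only the standard library; it mirrors the curl call above (a GET request with a JSON body). The helper names build_base_image_request and query_base_image are hypothetical, and the default address is the one from the curl example.

```python
import json
import urllib.request
from typing import List


def build_base_image_request(layer_hashes: List[str],
                             base_url: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build a GET request to /base_image carrying the layer hashes as a JSON list."""
    body = json.dumps(layer_hashes).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/base_image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # same method as the curl example, even though a body is sent
    )


def query_base_image(layer_hashes: List[str],
                     base_url: str = "http://127.0.0.1:5000",
                     timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply (base image or error object)."""
    req = build_base_image_request(layer_hashes, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running service.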
Base Image Database Population
The parent-image population process downloads all images of each repository in a set of parent-image repositories and creates an index file that maps a base-image-id to base-image metadata. The base-image-id is the concatenation of the base image's layer hashes, ordered from the lowest layer up.
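To make the base-image-id concrete, here is a minimal sketch of how such an id could be built and used for a lookup. It rests on two assumptions that are not confirmed by this project: the hashes are joined without a separator, and detection picks the longest indexed prefix of the image's layer list. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def base_image_id(layer_hashes: List[str]) -> str:
    """Concatenate layer hashes, lowest layer first (assumed: no separator)."""
    return "".join(layer_hashes)


def find_base_image(layer_hashes: List[str],
                    index: Dict[str, dict]) -> Optional[dict]:
    """Return metadata for the longest indexed prefix of the layer list.

    Assumed strategy: try the full layer list first, then drop the top
    layer one at a time. An image that is itself indexed matches itself.
    """
    for n in range(len(layer_hashes), 0, -1):
        candidate = base_image_id(layer_hashes[:n])
        if candidate in index:
            return index[candidate]
    return None  # no indexed image shares these lower layers
```

Trying the longest prefix first means that when several indexed images share lower layers, the closest ancestor is returned.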
The base images to download are defined in a product list file (product_list.json by default), in the following format:
[
{"repo":"ubuntu", "path":"library", "arch":"arm64", "refresh":"index"},
{"repo":"ubuntu", "path":"library", "arch":"amd64", "refresh":"all"},
{"repo":"alpine", "path":"library", "arch":"arm64"},
{"repo":"single-base-layer-test", "path":"scribesecurity", "arch":"arm64"}
]
The path parameter is part of the Docker API URL; for DockerHub approved or recommended images it is library, and for others it is the DockerHub username.
The refresh parameter forces the image_index entries to be recalculated: the "index" option recalculates the image_index entries from existing SBOMs, and the "all" option also re-creates the SBOMs. The "index" option is much faster since it does not require downloading the images.
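A product list can be checked before starting a long population run. The sketch below assumes that repo, path and arch are required in every entry (all examples above include them) and that refresh is optional and limited to "index" or "all". These rules are inferred from the description, not taken from the script itself, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("repo", "path", "arch")   # assumed to be mandatory
REFRESH_VALUES = ("index", "all")          # the two options described above


def validate_product_list(entries: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, entry in enumerate(entries):
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"entry {i}: missing or empty '{key}'")
        refresh = entry.get("refresh")
        if refresh is not None and refresh not in REFRESH_VALUES:
            problems.append(f"entry {i}: unknown refresh value {refresh!r}")
    return problems


def load_product_list(path: str) -> List[Dict[str, Any]]:
    """Load a product list file and raise ValueError if any entry is invalid."""
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    problems = validate_product_list(entries)
    if problems:
        raise ValueError("; ".join(problems))
    return entries
```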
To populate the image index, run:
python3 get_bimage_index.py [options]
options:
--product_list product_list filename, defaults to product_list.json
--image_index image_index filename, defaults to image_index.json
--erase_index flag; if present, erases the image_index file at the beginning of the run
The script creates a folder for each product in the product list. Each folder contains the SBOMs and a layer synopsis for every version of the base image, plus files that track which images have been downloaded and which are pending. This allows re-running the script without re-downloading everything.
Another option is to run via Docker. Note that this runs Docker in Docker (this example assumes the image name is scribesecurity/base-image-tool):
docker run -v "$(pwd)":/ -v /var/run/docker.sock:/var/run/docker.sock scribesecurity/base-image-tool
The script creates (or updates) the image_index.json file, which is a dictionary mapping base-image-ids to metadata about the base image.
For a condensed version (a map from the hash concatenation to the image tags only), run:
python3 ImgIndexCleaner.py [options]
options:
--image_index image index filename, defaults to image_index.json
--outfile filename for output file, defaults to small_image_index.json
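The condensing step can be pictured as follows. This is a sketch only, not the code of ImgIndexCleaner.py: it assumes each full-index entry is a dictionary that includes the repo and image_tag keys seen in the Quick Start result, and it keeps just those. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def condense_index(full_index: Dict[str, Dict[str, Any]],
                   keep: Iterable[str] = ("repo", "image_tag")) -> Dict[str, Dict[str, Any]]:
    """Return a new index holding only the selected metadata keys per image id."""
    keep = tuple(keep)
    return {
        image_id: {key: meta[key] for key in keep if key in meta}
        for image_id, meta in full_index.items()
    }


def condense_file(src: str = "image_index.json",
                  dst: str = "small_image_index.json") -> None:
    """Read the full index from src and write the condensed index to dst."""
    with open(src, encoding="utf-8") as fh:
        full_index = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(condense_index(full_index), fh)
```

The input dictionary is left unchanged, so the full index can still be used afterwards.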
License
This project is licensed under the AGPL License. The full license can be found in the LICENSE file.