okigan / awscurl

curl-like access to AWS resources with AWS Signature Version 4 request signing.

Compile to binary

speller opened this issue

It would be nice to be able to compile awscurl to a binary to minimize disk space usage, especially in Docker containers. It's hard to pull in all the dependencies the program needs to work. In particular, I'm building a CI/CD image that will have awscurl installed.

@speller please add more info about which dependency is causing issues.

I'm not proficient in Python, so I can't say what's missing, and I don't know anything about compiling Python programs to binaries. But having a binary published to GitHub releases would be super useful. It would also be super nice if it worked under Alpine Linux.

What's the binary path then? As I understand it, the entrypoint runs awscurl as a Python script, not as a binary: ENTRYPOINT ["python", "-m", "awscurl.awscurl"]

Could you let me know which files I should copy from the awscurl Docker image to make it work locally in another Alpine-based image?

I've managed to compile it myself. Here is my Dockerfile that builds awscli and awscurl. The awscurl part shares almost everything with the awscli setup process; I had no time to strip out the awscli part and leave only awscurl. The difficulty is getting a PyInstaller bootloader binary for Alpine, which doesn't exist by default; that's why all these workarounds were needed (aws-cli v2 doesn't have an official Alpine image).

# AWS CLI installation based on https://github.com/aws/aws-cli/issues/4685#issuecomment-829600284
ARG PYTHON_VERSION
ARG ALPINE_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION}-alpine${ALPINE_VERSION} AS installer

RUN apk add --no-cache \
    curl \
    unzip \
    gcc \
    git \
    libc-dev \
    libffi-dev \
    openssl-dev \
    py3-pip \
    zlib-dev \
    make \
    cmake

ARG AWSCLI_VERSION
RUN git clone --recursive  --depth 1 --branch ${AWSCLI_VERSION} --single-branch https://github.com/aws/aws-cli.git \
    && cd /aws-cli \
    # Follow https://github.com/six8/pyinstaller-alpine to install pyinstaller on alpine
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir pycrypto \
    && git clone --depth 1 --single-branch --branch v$(grep PyInstaller requirements-build.txt | cut -d'=' -f3) https://github.com/pyinstaller/pyinstaller.git /tmp/pyinstaller \
    && cd /tmp/pyinstaller/bootloader \
    && CFLAGS="-Wno-stringop-overflow -Wno-stringop-truncation" python ./waf configure --no-lsb all \
    && pip install .. \
    && rm -Rf /tmp/pyinstaller \
    && cd - \
    && boto_ver=$(grep botocore setup.cfg | cut -d'=' -f3) \
    && git clone --single-branch --branch v2 https://github.com/boto/botocore /tmp/botocore \
    && cd /tmp/botocore \
    && git checkout $(git log --grep $boto_ver --pretty=format:"%h") \
    && pip install . \
    && rm -Rf /tmp/botocore  \
    && cd - \
    && sed -i '/botocore/d' requirements.txt \
    && scripts/installers/make-exe \
    && unzip dist/awscli-exe.zip \
    && ./aws/install --bin-dir /aws-cli-bin

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl
...

FROM docker:${DOCKER_VERSION}
...
COPY --from=installer /usr/local/aws-cli/ /usr/local/aws-cli/
COPY --from=installer /aws-cli-bin/ /usr/local/bin/
COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCLI_VERSION=2.2.32
AWSCURL_VERSION=0.24
PYTHON_VERSION=3.9.7
ALPINE_VERSION=3.14

The new entrypoint file cli.py is pretty standard:

from awscurl.__main__ import main

if __name__ == "__main__":
    main()

I guess you will need the requirements-build.txt file from awscli, just for setup purposes, if you make an awscurl-only Dockerfile.
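PyInstaller's `--hidden-import` option names modules to bundle that it cannot discover from the source on its own. A misspelled name is easy to miss in a long build log, so one option is to check the names in the build stage before invoking `pyinstaller`. The following is a minimal sketch, not part of awscurl; the helper name `missing_modules` is hypothetical, and the module list is an assumption taken from the `pip install` lines in the Dockerfile above.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found as importable modules."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package or a malformed name counts as missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: these are the modules passed to --hidden-import.
    hidden_imports = ["configargparse", "requests"]
    print("not importable:", missing_modules(hidden_imports))
```

An empty list means every name resolves in the build environment; any name that is printed would be worth correcting before building.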

The following Dockerfile compiles the binary under Python 3.9 on Alpine 3.16:

ARG PYTHON_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION} AS installer

RUN set -ex; \
    apk add --no-cache \
    git \
    unzip \
    groff \
    curl \
    build-base \
    libffi-dev \
    cmake

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN set -eux \
    && cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && pip install pyinstaller==4.10 \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl



FROM docker:${DOCKER_VERSION}

COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCURL_VERSION=0.26
PYTHON_VERSION=3.9-alpine3.16
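Since the final image contains no Python, a quick check that the one-file executable actually starts is easiest to run in the installer stage, against `/awscurl/dist/awscurl`. Here is a minimal sketch of such a smoke test. The helper name `binary_runs` is hypothetical, and it assumes the binary accepts `--help` and exits with status 0, as argparse-style command-line tools usually do.

```python
import subprocess


def binary_runs(path, args=("--help",), timeout=30):
    """Return True if the executable starts and exits with status 0."""
    try:
        result = subprocess.run(
            [path, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, an unusable loader/architecture, or a hang.
        return False
    return result.returncode == 0
```

For example, `binary_runs("/awscurl/dist/awscurl")` could be called right after the `pyinstaller` step, failing the build when it returns `False`.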

This lets me add only the binary to my image without pulling in Python and the raw sources. @okigan, would you consider shipping only the binary in the Docker build instead of the sources? It doesn't make sense to pull Python when only awscurl is required, and it would also simplify adding awscurl to custom Docker images. Keeping images as small as possible matters in deployment pipelines where many images are downloaded often and bigger images slow down the whole process.

First of all, thank you for looking into this!

I have not used pyinstaller before, so I looked at the relevant docs. Some of the internal caveats make me concerned that this may trip up some users.

Also, if awscurl were "compiled to an executable", I would like more context on how it would be distributed and consumed. (Feel free to respond here or grab some time at https://calendly.com/okigan/30min)

@okigan My use cases:

  1. Use the awscurl Docker image in a CI/CD environment when the job is not heavy and I need to perform some tasks with AWS. In this case, the size of the image matters: the smaller the image, the faster the job, and the faster the pipeline.

  2. A complex job in a pipeline. In this case, I build a custom Docker image with the set of tools I need preinstalled, instead of downloading each tool as a Docker image or installing it some other way. Here the size of the tools and the ease of installation matter. For awscurl, if there is an image with the binary, I only need to add one line to my Dockerfile:

COPY --from=okigan/awscurl /usr/local/bin/awscurl /usr/local/bin/awscurl

Otherwise, without the binary, I have to install the sources and Python to make it work, which significantly increases the resulting image size. You can see in my latest example that I use a multi-stage build to compile the binary and then copy only the binary into my image, dropping Python, the sources, and all dependencies. I add many tools to my image, so, again, size is important. I build my custom image on top of the Docker base image (which is based on Alpine). If you make an image with only the binary, you will most probably use the plain Alpine base image.

Does this clarify the context of the binary usage?

You may also redistribute the tool as precompiled binaries for different platforms if you wish. I install some tools in my images by downloading binaries. This also helps to save size and time.

So I think your flow creates an "uber" Docker image with all the necessary tools, and precompiled binaries are a way to avoid conflicts between the different tools.

In the pyinstaller step, the binaries are compiled for your specific version of the (Alpine) OS. If these binaries were published, I think we'd need to keep them updated per OS version in the worst case, which seems like a lot of ongoing work.

If your image and the awscurl Docker image are based on the same Alpine base image, the extra download should be rather small (i.e. Docker does the diff for you).

Maybe we could make the base image more reusable, i.e. by adjusting this line:
https://github.com/okigan/awscurl/blob/master/Dockerfile#L1
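If binaries were published per platform, each artifact would need a name that records what it was built for. The sketch below shows one way that could look. The naming scheme and both helper names are hypothetical, not something the project uses, and the musl check is a heuristic that assumes musl-based systems such as Alpine ship a loader at `/lib/ld-musl-<arch>.so.1`.

```python
import glob
import platform


def detect_libc():
    """Best-effort guess of the C library: 'glibc', 'musl', or 'unknown'."""
    lib, _version = platform.libc_ver()
    if lib == "glibc":
        return "glibc"
    # Heuristic (assumption): musl systems ship /lib/ld-musl-<arch>.so.1.
    if glob.glob("/lib/ld-musl-*.so.1"):
        return "musl"
    return "unknown"


def artifact_name(system, machine, libc, tool="awscurl"):
    """Build a release file name such as 'awscurl-linux-x86_64-musl'."""
    return "-".join([tool, system.lower(), machine.lower(), libc])


if __name__ == "__main__":
    print(artifact_name(platform.system(), platform.machine(), detect_libc()))
```

Run inside the build container, this prints a name for the binary that container produces, which makes it visible how many distinct artifacts a release would have to cover.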

From my experience, the majority of Linux binaries work well under Alpine if they're compiled without external dependencies. In some cases, binaries compiled under Alpine (any version) could be required.

"Maybe we could make the base image more reusable"

Yes, if it contains a binary, that would solve my issue.

Any updates?

So this is still not officially supported. I've created a repo that builds a standalone awscurl (mostly based on what you've figured out above); the additional image size seems within expectations [see snapshot]. There is also a Makefile to build the standalone awscurl and run/test that it works.
