Compile to binary #128

Open
speller opened this issue Sep 3, 2021 · 14 comments

Comments

@speller

speller commented Sep 3, 2021

It would be nice to have a possibility to compile awscurl to a binary for optimal disk space usage especially in Docker containers. It's hard to pull all the dependencies required for the program to work. Particularly, I'm building an image for CI/CD that will have awscurl installed.

@okigan
Owner

okigan commented Sep 5, 2021

@speller please add more info what/which dependency is causing issues.

@speller
Author

speller commented Sep 5, 2021

> @speller please add more info what/which dependency is causing issues.

I'm not proficient in Python, so I can't say what's missing, and I don't know anything about compiling Python programs to binaries. But having a binary published in GitHub releases would be super useful. It would also be really nice if it worked under Alpine Linux.

@okigan
Owner

okigan commented Sep 5, 2021 via email

@speller
Author

speller commented Sep 5, 2021

What's the binary path then? As I understand it, the entrypoint runs awscurl as a Python script, not as a binary: ENTRYPOINT ["python", "-m", "awscurl.awscurl"]

@speller
Author

speller commented Sep 6, 2021

Could you let me know which files I should copy from the awscurl Docker image to make it work locally in another Alpine-based image?

@speller
Author

speller commented Sep 6, 2021

I've managed to compile it myself. Here is the Dockerfile code that builds both awscli and awscurl. The awscurl part shares almost everything with the awscli setup process, and I had no time to strip out the awscli part and leave only awscurl. The difficulty is getting a PyInstaller bootloader binary for Alpine, which doesn't exist by default; that's why all these workarounds are needed (aws-cli v2 doesn't have an official Alpine image).

# AWS CLI installation based on https://github.com/aws/aws-cli/issues/4685#issuecomment-829600284
ARG PYTHON_VERSION
ARG ALPINE_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION}-alpine${ALPINE_VERSION} AS installer

RUN apk add --no-cache \
    curl \
    unzip \
    gcc \
    git \
    libc-dev \
    libffi-dev \
    openssl-dev \
    py3-pip \
    zlib-dev \
    make \
    cmake

ARG AWSCLI_VERSION
RUN git clone --recursive  --depth 1 --branch ${AWSCLI_VERSION} --single-branch https://github.com/aws/aws-cli.git \
    && cd /aws-cli \
    # Follow https://github.com/six8/pyinstaller-alpine to install pyinstaller on alpine
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir pycrypto \
    && git clone --depth 1 --single-branch --branch v$(grep PyInstaller requirements-build.txt | cut -d'=' -f3) https://github.com/pyinstaller/pyinstaller.git /tmp/pyinstaller \
    && cd /tmp/pyinstaller/bootloader \
    && CFLAGS="-Wno-stringop-overflow -Wno-stringop-truncation" python ./waf configure --no-lsb all \
    && pip install .. \
    && rm -Rf /tmp/pyinstaller \
    && cd - \
    && boto_ver=$(grep botocore setup.cfg | cut -d'=' -f3) \
    && git clone --single-branch --branch v2 https://github.com/boto/botocore /tmp/botocore \
    && cd /tmp/botocore \
    && git checkout $(git log --grep $boto_ver --pretty=format:"%h") \
    && pip install . \
    && rm -Rf /tmp/botocore  \
    && cd - \
    && sed -i '/botocore/d' requirements.txt \
    && scripts/installers/make-exe \
    && unzip dist/awscli-exe.zip \
    && ./aws/install --bin-dir /aws-cli-bin

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl
...

FROM docker:${DOCKER_VERSION}
...
COPY --from=installer /usr/local/aws-cli/ /usr/local/aws-cli/
COPY --from=installer /aws-cli-bin/ /usr/local/bin/
COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCLI_VERSION=2.2.32
AWSCURL_VERSION=0.24
PYTHON_VERSION=3.9.7
ALPINE_VERSION=3.14

The new entrypoint file cli.py is pretty standard:

from awscurl.__main__ import main

if __name__ == "__main__":
    main()

I guess you will still need the requirements-build.txt file from awscli, just for setup purposes, if you make an awscurl-only Dockerfile (the PyInstaller version to check out is read from it).
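
For anyone who would rather drive the build from Python than from the shell, here is a minimal sketch of the same PyInstaller step. It assumes PyInstaller, configargparse, and requests are already installed and that cli.py (the entry point above) sits in the current directory. The function names `pyinstaller_args` and `build` are made up for illustration, and the hidden import is spelled `configargparse`, which is the module's actual name.

```python
def pyinstaller_args(entry_point="cli.py", name="awscurl"):
    """Return PyInstaller options mirroring the command line in the Dockerfile."""
    return [
        entry_point,
        "--onefile",                        # bundle everything into one executable
        "--hidden-import=configargparse",   # imports PyInstaller may not detect itself
        "--hidden-import=requests",
        "--name", name,                     # output file: dist/<name>
    ]


def build(entry_point="cli.py", name="awscurl"):
    """Run PyInstaller in-process; the result is written to dist/<name>."""
    # Imported lazily so this module can be loaded without PyInstaller installed.
    import PyInstaller.__main__

    PyInstaller.__main__.run(pyinstaller_args(entry_point, name))
```

Calling `build()` from the directory that contains cli.py should be equivalent to running the `pyinstaller` command by hand, provided the same packages are installed.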

@speller
Author

speller commented Aug 24, 2022

The following Dockerfile code compiles the binary under Python 3.9 on Alpine 3.16:

ARG PYTHON_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION} AS installer

RUN set -ex; \
    apk add --no-cache \
    git \
    unzip \
    groff \
    curl \
    build-base \
    libffi-dev \
    cmake

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN set -eux \
    && cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && pip install pyinstaller==4.10 \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl



FROM docker:${DOCKER_VERSION}

COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCURL_VERSION=0.26
PYTHON_VERSION=3.9-alpine3.16

This allows me to add only the binary to my image without pulling in Python and the raw sources. @okigan, would you consider shipping only a binary in the Docker build instead of the sources? It doesn't make sense to pull Python when only awscurl is required, and it would also simplify adding awscurl to custom Docker images. Keeping images as small as possible matters in deployment pipelines, where many images are downloaded often and bigger images slow down the whole process.

@okigan
Owner

okigan commented Aug 24, 2022

First of all, thank you for looking into this!

I have not used pyinstaller before, so I looked at the relevant docs. Some of the internal caveats make me concerned that this may trip up some users.

Also, if awscurl "was compiled to executable", I would like more context on how that would be distributed/consumed. (Feel free to respond here or grab some time at https://calendly.com/okigan/30min)

@speller
Author

speller commented Aug 24, 2022

@okigan My use cases:

  1. Use the awscurl Docker image in a CI/CD environment when the job is not heavy and I need to perform some tasks with AWS. In this case, the size of the image matters: the smaller the image, the faster the job, and the faster the pipeline.

  2. A complex job in a pipeline. In this case, I build a custom Docker image with a preinstalled set of the tools I need, instead of downloading each tool as a Docker image or installing it in some other way. Here both the size of the tools and the ease of installation matter. As for awscurl, if there is an image with the binary, I only need to add one line to my Dockerfile:

COPY --from=okigan/awscurl /usr/local/bin/awscurl /usr/local/bin/awscurl

Otherwise, without the binary, I have to install the sources and Python to make it work, which increases the resulting image size significantly. You can see in my latest example that I use a multi-stage build to compile the binary and then copy only the binary into my image, dropping Python, the sources, and all dependencies. I add many tools to my image, so, again, size is important. I build my custom image on top of the Docker base image (which is based on Alpine). If you make an image with only the binary, you will most probably use the plain Alpine base image.

Does this clarify the context of the binary usage?

@speller
Author

speller commented Aug 24, 2022

You may also redistribute the tool as precompiled binaries for different platforms if you wish. I install some tools in my images by downloading binaries. This also helps to save size and time.

@okigan
Owner

okigan commented Aug 25, 2022

So I think your flow creates an "uber" Docker image with all the necessary tools, and precompiled binaries are a way to avoid conflicts between the different tools.

In the pyinstaller step, the binaries are compiled for your specific version of the (Alpine) OS. If these binaries are published, I think we'd need to keep them updated per OS version in the worst case, which seems like a lot of ongoing work.

If your image and the awscurl Docker image are based on the same Alpine base image, the extra download should be rather small (i.e. Docker does the diff for you).

Maybe we could address the issue by making the base image more reusable, i.e. adjusting this line:
https://github.com/okigan/awscurl/blob/master/Dockerfile#L1
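
To make the per-OS concern concrete, here is a rough sketch of how a published binary could be labeled by the platform it was built on. It only illustrates the size of the build matrix and is not something the project does today. The `artifact_name` helper and its naming scheme are hypothetical. `platform.libc_ver()` is a best-effort guess: on glibc systems it usually reports `glibc`, while on musl-based systems such as Alpine it may return an empty name.

```python
import platform


def artifact_name(tool="awscurl", version="0.26"):
    """Build a release file name that records the platform of the build host."""
    system = platform.system().lower()    # e.g. "linux"
    machine = platform.machine().lower()  # e.g. "x86_64" or "aarch64"
    libc, _libc_version = platform.libc_ver()
    # An empty libc name means detection failed (which can happen on musl).
    libc = libc or "unknown-libc"
    return f"{tool}-{version}-{system}-{machine}-{libc}"
```

Every distinct combination of system, architecture, and libc would need its own build, which is the ongoing work mentioned above.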

@speller
Author

speller commented Aug 26, 2022

In my experience, the majority of Linux binaries work well under Alpine if they're compiled without external dependencies. In some cases, binaries compiled under some version of Alpine may be required.

> Maybe the issue we could make the base image more reusable

Yes, if it contained a binary, that would solve my issue.

@speller
Author

speller commented Mar 24, 2023

Any updates?

@okigan
Owner

okigan commented Mar 24, 2023

So this is still not officially supported, but I've created a repo that builds a standalone awscurl (mostly based on what you've figured out above). The additional image size seems to be within expectations [see snapshot]. There is also a Makefile to build the standalone awscurl and to run/test that it works.

image
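
For a quick check that a standalone binary actually starts inside a given image, a small smoke test along these lines may be enough. This is a sketch, not the Makefile from the repo mentioned above. The `smoke_test` and `main` names are made up, and it assumes that `awscurl --help` exits with status 0.

```python
import shutil
import subprocess
import sys


def smoke_test(binary="awscurl", args=("--help",), timeout=30):
    """Return True if the binary is found on PATH and exits with status 0."""
    path = shutil.which(binary)
    if path is None:
        return False  # not installed or not on PATH
    result = subprocess.run(
        [path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode == 0


def main():
    """Exit non-zero when the check fails, so a CI step can fail on it."""
    sys.exit(0 if smoke_test() else 1)
```

Running `main()` as a step in the final image would fail the pipeline early if the copied binary is missing or cannot start, for example because of a libc mismatch.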
