Docker image variants #804

juhoinkinen · 2024-09-26T07:15:27Z

Installing the Pecos library for the proposed Xtransformer backend (PR #798) increases the Docker image size considerably:

Especially if we want to include this in the Docker images, the huge size could become a problem.

I built a Docker image from this branch, and its size is 7.21 GB, which is considerably larger than the Annif 1.1 image at 2.07 GB.

Probably not all users and use cases will need Xtransformer or the other optional dependencies, so we could build different variants of the image and push them to quay.io (just by setting different build args in the GitHub Actions build step and tagging the images appropriately). But that can be done in a separate PR; I'll create an issue for this now.

Originally posted by @juhoinkinen in #798 (comment)

The question is which names to use when tagging the images for the different variants, and how to incorporate them into the current tag naming scheme. Probably the best approach is to add postfixes to the currently used tags (<major>.<minor>[.<patch>[-<YYYYMMDD>]]).
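To make the postfix idea concrete, here is a minimal sketch of how a variant postfix could be appended to a tag in the current scheme. The function name `compose_tag`, the variant keys, and the choice to place the postfix after the optional date part are all assumptions for illustration, not a decided convention.

```python
import re

# Hypothetical variant names mapped to the postfixes proposed in this issue.
VARIANT_POSTFIXES = {"slim": "-slim", "default": "", "full": "-full"}

# Current scheme: <major>.<minor>[.<patch>[-<YYYYMMDD>]]
BASE_TAG_RE = re.compile(r"\d+\.\d+(\.\d+(-\d{8})?)?")


def compose_tag(base_tag: str, variant: str = "default") -> str:
    """Append the variant postfix to a tag that follows the current scheme."""
    if not BASE_TAG_RE.fullmatch(base_tag):
        raise ValueError(f"tag does not follow the current scheme: {base_tag!r}")
    if variant not in VARIANT_POSTFIXES:
        raise ValueError(f"unknown variant: {variant!r}")
    return base_tag + VARIANT_POSTFIXES[variant]


if __name__ == "__main__":
    for variant in VARIANT_POSTFIXES:
        print(compose_tag("1.1.0-20240926", variant))
```

Keeping the default variant's postfix empty means the existing tags keep their current meaning, so users pulling them would not notice a change.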

Dependency variants

I guess one option is these postfixes:

-slim or -minimal: no optional dependencies
  - size: 0.7 GB
(no extra postfix): the currently used optional dependencies (voikko, fasttext, nn, omikuji, yake, spacy, stwfsa)
  - size: 2.1 GB
-full: the above plus pecos
  - size: 7.2 GB
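The variants listed above could be generated from a single mapping of postfix to optional dependencies. The sketch below only assembles `docker build` command lines and does not run them. The build arg name `optional_dependencies`, the image name `quay.io/natlibfi/annif`, and the use of `pecos` as the name of the extra are assumptions; check them against the actual Dockerfile and PR #798 before using anything like this.

```python
import shlex

# Optional dependencies currently included in the image, as listed in this issue.
DEFAULT_EXTRAS = ["voikko", "fasttext", "nn", "omikuji", "yake", "spacy", "stwfsa"]

# Postfix -> optional dependencies to install (proposal from this issue).
VARIANTS = {
    "-slim": [],
    "": DEFAULT_EXTRAS,
    "-full": DEFAULT_EXTRAS + ["pecos"],  # extra name "pecos" is an assumption
}


def build_command(
    base_tag: str,
    postfix: str,
    image: str = "quay.io/natlibfi/annif",      # assumed image name
    build_arg: str = "optional_dependencies",  # assumed build arg name
) -> list[str]:
    """Return the docker build argv for one image variant."""
    extras = ",".join(VARIANTS[postfix])
    return [
        "docker", "build",
        "--build-arg", f"{build_arg}={extras}",
        "--tag", f"{image}:{base_tag}{postfix}",
        ".",
    ]


if __name__ == "__main__":
    for postfix in VARIANTS:
        print(shlex.join(build_command("1.1", postfix)))
```

In a GitHub Actions workflow the same mapping could be expressed as a build matrix, with one entry per postfix.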

Whatever naming convention is chosen, it is important not to change it (at least not too often).

Architecture variants

I think this may be specific to pecos only: should there be variants for different architectures (specific CUDA versions for specific architectures, or something like that)? On PyPI there are wheels with names including x86-64 and ARM64.

But I'm not sure if this is at all necessary.

Inference-only variants

Another variant could be images that support only inference, not training. These could have the ONNX runtime installed instead of TensorFlow, PyTorch or even scikit-learn.

Using ONNX could both reduce image size and improve inference performance. An improvement like this, that is, by 99% for a TensorFlow model, is most probably not going to happen, but we could test and see what the effect is.

juhoinkinen added the docker label Sep 26, 2024
