containerized code: setting and running containerized code (docker, sarus, singularity, conda) #5507
@ltalirz It would be nice if you could run a more complex docker local code test example. Unfortunately, I don't have the use case and plugins for this type of code.
Thanks @unkcpz . I think having support for containerized codes is great, and so we should push this through. However, I think the negative consequences of the original design of the I have been wanting to do that for a long time, but was always hesitant given that it is such a fundamental part of |
Thanks a lot for getting this PR ready @unkcpz and sorry for the late reply - is there any particular docker feature you would like me to test? Comments from my side:
@ltalirz thanks for the suggestion. Seb and I will meet tomorrow to settle the issues and I'll then try to finish this PR, maybe without implementing local code for containerized code, in order to keep the first introduction of the containerized code concept simple and useful.
9a2bde6
to
89efeac
Compare
@sphuber I reimplemented this with the new code; it really simplifies the implementation a lot. But there are still some open questions about the actual use of this containerized code.
About this PR, if @sphuber is good with the current code structure, I'll go ahead with adding CI tests and documentation, and with doing some production runs on my pseudopotential generator code wrapped up in a container. Also pinging @giovannipizzi for comment.
Another thought about the |
Thanks @unkcpz . Code is looking good, but I have some minor changes/simplifications. It would indeed be good to have some examples of how to set up and run these with docker, singularity and sarus.
I think the verdi code test question can be left for later. This was only recently added and only really does something for InstalledCode. Don't think we need it for the containerized codes now, anyway they are a new experimental feature that will require some testing.
aiida/engine/daemon/execmanager.py
Outdated
try:
    handle.write(code.base.repository.get_object_content(filename, mode='rb'))
except:
    # raise TypeError('directory not supperted.')
    pass
Instead of doing this, I think it would maybe be nicer to use:
from tempfile import NamedTemporaryFile
from aiida.repository import FileType
# filter() takes the predicate first and the iterable second
for obj in filter(lambda o: o.file_type == FileType.FILE, code.base.repository.list_objects()):
    with NamedTemporaryFile(mode='wb+') as handle:
        handle.write(code.base.repository.get_object_content(obj.name, mode='rb'))
Actually, thinking about this, this code is wrong. It only copies top-level files, but doesn't recurse into directories. Probably there is no test that actually tests this case. We should actually use code.base.repository.walk and iterate over all the files and copy those. Maybe I will quickly fix this in a separate PR.
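The difference between copying only top-level files and walking the whole tree can be sketched with the standard library. This is a stand-in that works on a plain directory with pathlib, not on the AiiDA repository API; copy_tree_files and demo are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path


def copy_tree_files(source: Path, target: Path) -> list:
    """Copy every file below ``source`` into ``target``, keeping relative paths."""
    copied = []
    for path in sorted(source.rglob('*')):
        if not path.is_file():
            continue  # directories are created on demand below
        relative = path.relative_to(source)
        destination = target / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_bytes(path.read_bytes())
        copied.append(relative.as_posix())
    return copied


def demo() -> list:
    """Build a small tree with one nested file and copy it recursively."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        source, target = Path(src), Path(dst)
        (source / 'run.sh').write_text('echo top-level\n')
        (source / 'bin').mkdir()
        (source / 'bin' / 'tool.py').write_text('print("nested")\n')
        return copy_tree_files(source, target)
```

A top-level listing would only have found run.sh here; the recursive walk also picks up bin/tool.py, which is the case a relative filepath_executable inside a subfolder would need.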
Yes, I put the pass here to take a further look at it later and forgot about it. I think what I expected here is, as you said, to walk through the directory and copy everything inside. This will then also support the case where the filepath_executable of a portable code is a real relative path to the code inside a subfolder.
aiida/engine/daemon/execmanager.py
Outdated
@@ -174,14 +175,18 @@ def upload_calculation(
         # Still, beware! The code file itself could be overwritten...
         # But I checked for this earlier.
         for code in input_codes:
-            if isinstance(code, PortableCode):
+            if isinstance(code, (PortableCode, PortableContainerizedCode)):
- if isinstance(code, (PortableCode, PortableContainerizedCode)):
+ if isinstance(code, PortableCode):
Since PortableContainerizedCode is actually a subclass, you don't have to specifically add it.
@@ -611,7 +620,8 @@ def presubmit(self, folder: Folder) -> CalcInfo:
                 )
             )

-        if isinstance(code, PortableCode) and str(code.filepath_executable) in folder.get_content_list():
+        if isinstance(code, (PortableCode, PortableContainerizedCode)) and str(code.filepath_executable
- if isinstance(code, (PortableCode, PortableContainerizedCode)) and str(code.filepath_executable
+ if isinstance(code, PortableCode) and str(code.filepath_executable
this_code = load_node(
    code_info.code_uuid,
    sub_classes=(Code, InstalledCode, PortableCode, InstalledContainerizedCode, PortableContainerizedCode)
)
- this_code = load_node(
-     code_info.code_uuid,
-     sub_classes=(Code, InstalledCode, PortableCode, InstalledContainerizedCode, PortableContainerizedCode)
- )
+ this_code = load_code(code_info.code_uuid)
@@ -715,10 +728,20 @@ def presubmit(self, folder: Folder) -> CalcInfo:
             else:
                 prepend_cmdline_params = []

             escape_exec_line = False
             if isinstance(this_code, (InstalledContainerizedCode, PortableContainerizedCode)):
- if isinstance(this_code, (InstalledContainerizedCode, PortableContainerizedCode)):
+ if isinstance(this_code, ContainerizedCode):
Since they share this base class, why not use that?
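The two isinstance simplifications suggested in this review rely on ordinary Python inheritance. A minimal sketch with empty stand-in classes follows; the exact hierarchy is an assumption inferred from the comments (the real AiiDA classes may be arranged differently), and needs_upload and runs_in_container are hypothetical helper names.

```python
class AbstractCode:
    """Stand-in for the common base of all code plugins."""


class InstalledCode(AbstractCode):
    """Stand-in: executable already present on the computer."""


class PortableCode(AbstractCode):
    """Stand-in: executable stored in the database and uploaded at run time."""


class ContainerizedCode(AbstractCode):
    """Stand-in for the base class shared by both containerized plugins (assumed)."""


class InstalledContainerizedCode(ContainerizedCode, InstalledCode):
    """Stand-in: executable present inside the container image."""


class PortableContainerizedCode(ContainerizedCode, PortableCode):
    """Stand-in: executable uploaded and then run inside the container."""


def needs_upload(code: AbstractCode) -> bool:
    # Checking the parent class also matches every subclass,
    # so PortableContainerizedCode does not need to be listed.
    return isinstance(code, PortableCode)


def runs_in_container(code: AbstractCode) -> bool:
    # One check against the shared base replaces a tuple of concrete classes.
    return isinstance(code, ContainerizedCode)
```

With this layout, a new containerized plugin that derives from ContainerizedCode is picked up by both checks without touching the call sites.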
'engine_command': {
    'required': True,
    'prompt': 'Engine command',
    'help': 'The command to run container must contain {image} for image.',
- 'help': 'The command to run container must contain {image} for image.',
+ 'help': 'The command to run the container. It must contain the placeholder {image} that will be replaced with the `image_name`.',
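The placeholder rule in this help text can be checked before the command is ever used. The sketch below is not the validation code from the PR; placeholders and build_engine_command are hypothetical helpers, and the docker command in the usage note is only an illustrative value.

```python
import string


def placeholders(template: str) -> set:
    """Return the names of all ``{name}`` fields found in ``template``."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def build_engine_command(engine_command: str, image_name: str) -> str:
    """Replace ``{image}`` in ``engine_command`` with ``image_name``.

    Raises ``ValueError`` when the placeholder is missing, so that a
    misconfigured code is caught at setup time rather than at submission.
    """
    if 'image' not in placeholders(engine_command):
        raise ValueError('the engine command must contain the placeholder {image}')
    return engine_command.format(image=image_name)
```

For example, build_engine_command('docker run -v $PWD:/workdir:rw -w /workdir {image} sh -c', 'ubuntu') substitutes only the image name and leaves $PWD for the shell to expand. A template containing other literal braces, such as ${PWD}, would need those braces doubled for str.format.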
    'help': 'The command to run container must contain {image} for image.',
    'type': click.STRING,
},
'image': {
- 'image': {
+ 'image_name': {
    'required': True,
    'type': click.STRING,
    'prompt': 'Image',
    'help': 'Image of the container to run executable.',
- 'help': 'Image of the container to run executable.',
+ 'help': 'Name of the image container in which to run the executable.',
"""Data plugin representing an executable code on a remote computer.

This plugin should be used if an executable is pre-installed on a computer. The ``InstalledCode`` represents the code by
storing the absolute filepath of the relevant executable and the computer on which it is installed. The computer is
represented by an instance of :class:`aiida.orm.computers.Computer`. Each time a :class:`aiida.engine.CalcJob` is run
using an ``InstalledCode``, it will run its executable on the associated computer.
"""
- """Data plugin representing an executable code on a remote computer.
- This plugin should be used if an executable is pre-installed on a computer. The ``InstalledCode`` represents the code by
- storing the absolute filepath of the relevant executable and the computer on which it is installed. The computer is
- represented by an instance of :class:`aiida.orm.computers.Computer`. Each time a :class:`aiida.engine.CalcJob` is run
- using an ``InstalledCode``, it will run its executable on the associated computer.
- """
+ """Data plugins representing an executable code to be run in a container.
+ These plugins are directly analogous to the ``InstalledCode`` and ``PortableCode`` plugins, except that the executable
+ is present inside of a container. For the ``InstalledContainerizedCode`` the executable is expected to already be
+ present inside a container that is available on the target computer. With the ``PortableContainerizedCode`` plugin, the
+ target executable will be stored in AiiDA's storage, just as with the ``PortableCode`` and when launched, the code will
+ be copied inside the container on the target computer and run inside the container.
+ """
Thanks a lot! I forgot about this part.
To fix the class ContainerizedCode(AbstractCode)
1d9aa46
to
c242647
Compare
I was swinging between the name @sphuber thanks for reviewing. I will not re-request review at the moment. I will keep adding CI tests and documentation.
9ac7e18
to
b3a79b1
Compare
Just a comment on this:
I think we can (for now) assume that the user will have already fetched the image on the computer (a bit like we assume the code has been already compiled). This, of course, needs to be documented. |
I agree. Documenting how to use the Code's |
b5f8191
to
862f68e
Compare
@ltalirz thanks for the advice, but I think what I need here is in github CI I want to run
@giovannipizzi sorry for the delay. Yes, I agree. It is true this also a problem for installed code and validate by |
@chrisjsewell thanks for the comment and sorry for the late reply, I was on vacation last week.
Yes, I was considering this but then in the coding week, we decided to start from simple without changing too much of the original design. The other reason is that for the
Sadly, it is not supported in docker either. I think I mentioned somewhere in the docs that only serial executables are supported for docker. I will look into whether this can be done with the extra placeholders you suggested.
Yep that's fair, although I would like to make sure that this use case etc will be possible going forward, without any painful deprecations 😅 As, once it's released and people start using it, obviously it's difficult to retroactively change
thanks 😄 |
070de20
to
9d59d03
Compare
Containerized codes can be set up through the command line and used in calculations. The container engines supported and tested are docker, Sarus and Singularity. It is shown below how to configure the code and run a calculation. I will then move the example below into the documentation.
Test command line options for containerized code
docstring added
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
review
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
test docker run
3cefdd6
to
22051c0
Compare
pytest fix
Add more docs in data_types
Use localhost to test add-docker container
22051c0
to
63e7a2b
Compare
I added two optional parameters for the containerized code setting, the
engine_command = """conda run --name {image}"""
override_mpirun_command = 'inner_mpirun -np {tot_num_mpiprocs}'
containerized_code = orm.InstalledContainerizedCode(
    default_calc_job_plugin='core.arithmetic.add',
    filepath_executable='/bin/bash',
    engine_command=engine_command,
    image='myenv',
    inner_mpi=True,
    mpi_args=override_mpirun_command,
    computer=aiida_localhost,
    escape_exec_line=False,
).store()
Will generate the run line as follows:
It is not hard to add such a thing when more features are needed, but as @chrisjsewell said lots of code coupled at For the implementation itself, I think this PR is ready to have another review. If @chrisjsewell compromise with such nasty workaround for mpirun inside conda/docker, I'll add the cmdline options |
Superseded by #5667 |
This is the last part of the implementation of #5250.
In this PR, containerized codes can be set up through the command line and used in calculations. The container engines supported and tested are docker, Sarus and Singularity. It is shown below how to configure the code and run a calculation. I will then move the example below into the documentation.
Although all basic features are implemented, this PR still needs to be polished, but I think it is better to get a review before I move on.
Since running code in a container always requires mapping the current directory, where the input files are located, to the working directory in the container, the current directory is specified in the job script by $PWD. Therefore, we need to use double quotes to escape the command line parameters. This is done by setting use_double_quotes in the computer setup.
The computer setup must set use_double_quotes to true, since the engine_command escape is controlled by that and $VAR will not be evaluated otherwise.
docker
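The reason the double-quote setting matters for $PWD can be seen from how the two quoting styles render the same argument. This is a minimal sketch using only the standard library; single_quoted and double_quoted are hypothetical helpers and not the escaping functions AiiDA uses internally.

```python
import shlex


def single_quoted(argument: str) -> str:
    """Quote for a POSIX shell with single quotes: nothing inside is expanded."""
    return shlex.quote(argument)


def double_quoted(argument: str) -> str:
    """Quote with double quotes, deliberately leaving ``$`` unescaped.

    Backslashes, double quotes and backticks are escaped; ``$VAR`` is left
    as it is so that the shell substitutes it when the job script runs.
    """
    escaped = argument.replace('\\', '\\\\').replace('"', '\\"').replace('`', '\\`')
    return f'"{escaped}"'


mapping = '$PWD:/workdir'
print(single_quoted(mapping))  # the shell would pass the literal text $PWD
print(double_quoted(mapping))  # the shell would expand $PWD to the current directory
```

With single quotes the container engine would receive the literal string $PWD as the host path, which is why the volume mapping only works when the computer is configured to use double quotes.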
The docker engine supports not only code that is installed in the container, but also code stored in the DB, which is uploaded and run in the container, where the container provides the needed libraries.
remote code
The typical code setup for it is shown below. The option escape_exec_line is mandatory for running commands in a docker container. It puts the cmdline_params and the redirect parameters in quotes so that the whole command is recognized and run inside the container.
Then you can launch the calculation with the script:
local code (store in db)
The code can be set up by specifying the executable file on the local machine, which is then uploaded to whichever computer is set up in the AiiDA database. This is very useful, for example, when you have a Python script that has special dependencies. You can create an image that contains the libraries and then run the code on any machine that has docker installed.
The code setup config example is:
where the executable eval_sh.py is a dummy Python script that executes bash < aiida.in > aiida.out, for demo purposes only.
Run the code with:
Just specify the computer; there is no need to worry about the dependencies.
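The effect of wrapping the command line parameters and the redirects in quotes, as described for the docker engine, can be illustrated with a small string-building sketch. It does not reproduce the job script that AiiDA writes; build_run_line and its arguments are hypothetical, and the engine command is only an example value.

```python
import shlex


def build_run_line(engine_command, executable, cmdline_params, stdin_name, stdout_name, wrap):
    """Assemble a run line, optionally wrapping the inner command in quotes.

    With ``wrap=True`` the executable, its parameters and the redirects become
    one quoted argument, so they are interpreted by the shell started inside
    the container. With ``wrap=False`` the redirects are handled by the host
    shell that runs the job script.
    """
    inner = ' '.join([executable, *cmdline_params, f'< {stdin_name}', f'> {stdout_name}'])
    if wrap:
        return f'{engine_command} {shlex.quote(inner)}'
    return f'{engine_command} {inner}'


engine = 'docker run ubuntu sh -c'  # illustrative engine command only
print(build_run_line(engine, '/bin/bash', [], 'aiida.in', 'aiida.out', wrap=True))
print(build_run_line(engine, '/bin/bash', [], 'aiida.in', 'aiida.out', wrap=False))
```

In the wrapped form, sh -c receives the whole inner command as a single string, which is what lets the redirection happen inside the container.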
Sarus and Singularity
Sarus and Singularity share the same logic; the only difference comes from the details of running the containerized code, which can be specified by engine_command when setting up the code.
I created an image, jusong/qe-mpich314:v01, with q-e 6.8 compiled with MPICH, which is able to run pw calculations in the container with full parallelization capability.
code setup config files
For Singularity, the image is not just an image name but the path of the sif image file. Sarus also downloads and stores images, but in a specific directory, so only the image name is needed.
conda
First, you need to have a conda environment with the executable and MPI installed.
Here I'll show an example of using conda to run a pw.x calculation from Quantum ESPRESSO.
Create a new conda environment with conda create -n container-run and install Quantum ESPRESSO from conda-forge with conda install -c conda-forge qe.
The QE executables can be found in <ENV_PATH>/bin.
Then configure the code with the following config yaml file. Notice that the conda container environment is similar to docker in that the MPI command should run from inside the container rather than mapping to the host MPI libraries, so inner_mpi is set to True.
The stdin and stdout redirection is also done fully from inside the container (env), which needs to be called by bash -c and inside single quotes, with escape_exec_line set to True.
code config:
Launch the calculation with the code set up above.
The typical inputs for the process are the same as for a regular code; you only need to make sure that the special MPI setting is specified for the image.