containerized code: setting and running containerized code (docker, sarus, singularity, conda) #5507

Closed
wants to merge 6 commits

Conversation


@unkcpz unkcpz commented Apr 29, 2022

This is the last part of the implementation of #5250.

In this PR, containerized codes can be set up through the command line and used in calculations. The container engines supported and tested are Docker, Sarus and Singularity. Below I show how to configure such a code and run a calculation; I will later move these examples into the documentation.

Although all basic features are implemented, this PR still needs to be polished, but I think it is better to get a review before I move on.

  • cmdline test for the new code type.
  • daemon CI test that runs a real calculation on Docker (could you tell me where it would be best to add it? @sphuber).
  • documentation, including the example below.
  • local code setup should also work with Sarus and Singularity on HPC, but this is not fully tested yet.
  • conda support (edited)

Since running code in a container always requires mapping the current directory, where the input files are located, onto the working directory inside the container, the current directory is referenced in the job script via $PWD. Therefore we need to use double quotes when escaping the command-line parameters, which is done by setting use_double_quotes in the computer setup.

The computer setup must set use_double_quotes to true, since the escaping of the engine_command is controlled by that setting and $VAR would not be evaluated otherwise.
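To illustrate the quoting issue, here is a small standalone Python demo (not AiiDA code): the shell expands $PWD inside double quotes but passes it through literally inside single quotes.

import subprocess

# Double quotes: the shell expands $PWD to the current working directory.
print(subprocess.run('echo "$PWD"', shell=True, capture_output=True, text=True).stdout)

# Single quotes: $PWD is passed through literally, so the container would never
# see the actual path.
print(subprocess.run("echo '$PWD'", shell=True, capture_output=True, text=True).stdout)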

docker

The Docker engine supports not only codes that are installed inside the container, but also codes stored in the AiiDA database, which are uploaded and run inside the container, with the container providing the required libraries.

remote code

A typical code setup is shown below. The option escape_exec_line is mandatory for running commands in a Docker container: it puts the cmdline_params and the redirection parameters inside quotes, so that the whole command is recognized and run inside the container (a sketch after the config below illustrates the resulting run line).

---
label: add-docker
description: add docker
default_calc_job_plugin: core.arithmetic.add
on_computer: true
computer: localhost
filepath_executable: /bin/bash
image: ubuntu
engine_command: docker run -v $PWD:/workdir:rw -w /workdir {image} sh -c
escape_exec_line: true
prepend_text: ' '
append_text: ' '
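For illustration, here is a rough sketch in plain Python of how the fields above combine into the run line when escape_exec_line is true. This is not AiiDA's actual implementation (which assembles the line in CalcJob.presubmit); the variable names are made up, and the file names are the defaults of the core.arithmetic.add plugin.

# Illustrative only: variable names are invented for this sketch.
engine_command = 'docker run -v $PWD:/workdir:rw -w /workdir {image} sh -c'
image = 'ubuntu'
executable = '/bin/bash'
redirections = '< "aiida.in" > "aiida.out"'  # default filenames of core.arithmetic.add

# With escape_exec_line, the executable and the redirections are wrapped in quotes
# so that the whole command is interpreted inside the container by `sh -c`.
inner = f'{executable} {redirections}'
run_line = f"{engine_command.format(image=image)} '{inner}'"
print(run_line)
# docker run -v $PWD:/workdir:rw -w /workdir ubuntu sh -c '/bin/bash < "aiida.in" > "aiida.out"'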

Then you can launch the calculation with the following script:

from aiida import orm
from aiida.engine import run_get_node
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add') 

inputs = {
    'code': orm.load_code('add-docker@localhost'),
    'x': orm.Int(4),
    'y': orm.Int(6),
    'metadata': {
        # 'dry_run': True,
        'options': {
            'resources': {
                'num_machines': 1,
                'num_mpiprocs_per_machine': 1
            }
        }
    }
}

_, node = run_get_node(ArithmeticAddCalculation, **inputs)

local code (stored in the DB)

The code can be set up by specifying an executable file on the local machine, which is then uploaded to whatever computer is configured in the AiiDA database. This is very useful, for example, when you have a Python script with special dependencies: you can create an image that contains the libraries and then run the code on any machine that only has Docker installed.

The code setup config example is:

label: "docker-python-add"
description: "doing python add"
input_plugin: "core.arithmetic.add"
on_container: true
on_computer: false
image: "python:3.9.12-buster"
engine_command: "docker run -v $PWD:/workdir:rw -w /workdir {image} sh -c"
code_folder: "/home/jyu/Projects/WP-aiida/docker_python_demo/"
code_rel_path: "eval_sh.py"
use_double_quotes: true
prepend_text: " "
append_text: " "

where the executable eval_sh.py is a dummy Python script that executes bash < aiida.in > aiida.out, only for demo purposes.

#!/usr/bin/env python
"""Dummy executable that pipes ``aiida.in`` through bash into ``aiida.out``."""
import os

if __name__ == '__main__':
    # Run the input script with bash and redirect the output, purely for demo purposes.
    os.system('/bin/bash < aiida.in > aiida.out')

Run the code with:

from aiida import orm
from aiida.engine import run_get_node
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add') 

inputs = {
    'code': orm.load_code('docker-python-add'),
    'x': orm.Int(4),
    'y': orm.Int(6),
    'metadata': {
        # 'dry_run': True,
        'computer': orm.load_computer('localhost'),
        'options': {
            'resources': {
                'num_machines': 1,
                'num_mpiprocs_per_machine': 1
            }
        }
    }
}

_, node = run_get_node(ArithmeticAddCalculation, **inputs)

Just specify the computer; there is no need to worry about the dependencies.

Sarus and Singularity

Sarus and Singularity share the same logic; the only difference comes from the details of how the containerized code is run, which can be specified through engine_command when setting up the code.

I created an image jusong/qe-mpich314:v01 with Quantum ESPRESSO 6.8 compiled against MPICH, which is able to run pw.x calculations in the container with full parallelization capability.

code setup config files

  • Sarus
label: "sarus-pw-7.0"
description: "running pw.x in sarus containerized"
input_plugin: "quantumespresso.pw"
on_container: true
on_computer: true
image: "containers4hpc/qe-mpich314:0.1.0"
engine_command: "sarus run --mount=src=$PWD,dst=/workdir,type=bind --workdir=/rundir {image}"
escape_exec_line: False
filepath_executable: "/usr/local/bin/pw.x"
computer: "daint-mc-mr0"
use_double_quotes: true
prepend_text: " "
append_text: " "

For Singularity the image is not just an image name but the path to the .sif image file. Sarus also downloads and stores the image, but in a specific directory, so only the image name is needed.

---
label: "singularity-pw-7.0"
description: "pw.x in singularity container"
default_calc_job_plugin: "quantumespresso.pw"
on_container: true
on_computer: true
image: "/tmp/singtest/qe-mpich314_0.1.0.sif"
engine_command: "singularity exec --bind $PWD:$PWD {image}"
escape_exec_line: False
inner_mpi: False
filepath_executable: "/usr/local/bin/pw.x"
computer: "localhost"
use_double_quotes: true
prepend_text: " "
append_text: " "

conda

First, you need a conda environment with the executable and MPI installed.
Here I'll show an example of using conda to run a pw.x calculation from Quantum ESPRESSO.
Create a new conda environment with conda create -n container-run and install Quantum ESPRESSO from conda-forge with conda install -c conda-forge qe.
The executables of QE can then be found in <ENV_PATH>/bin.
Then configure the code with the following config YAML file. Note that the conda "container" environment is similar to Docker in that the MPI command should be run from inside the container rather than mapped to the host MPI libraries, so inner_mpi is set to true.
The stdin and stdout are also redirected entirely from inside the container (environment), which requires the command to be called through bash -c and placed inside single quotes, i.e. escape_exec_line set to true.

code config:

---
label: "conda-pw-7.0"
description: "pw.x in canda container"
default_calc_job_plugin: "quantumespresso.pw"
on_container: true
on_computer: true
image: "container-run"
engine_command: "conda run --name {image} bash -c"
escape_exec_line: true
inner_mpi: true
filepath_executable: "/home/jyu/miniconda3/envs/container-run/bin/pw.x"
computer: "localhost"
use_double_quotes: false
prepend_text: " "
append_text: " "

Launch the calculation with the code that was just set up.

The typical inputs for the process are all the same as for a regular code; you only need to make sure the special MPI settings are specified for the image.

from aiida import orm, plugins
from aiida.engine import submit

code = orm.load_code('<containerized_code_name>@<computer>')
builder = code.get_builder()

structure = orm.load_node(<structure_data_node>)
builder.structure = structure
pseudo_family = orm.load_group('SSSP/1.1/PBE/efficiency')
pseudos = pseudo_family.get_pseudos(structure=structure)
builder.pseudos = pseudos
parameters = {
  'CONTROL': {
    'calculation': 'scf',  # self-consistent field
  },
  'SYSTEM': {
    'ecutwfc': 30.,  # wave function cutoff in Ry
    'ecutrho': 240.,  # density cutoff in Ry
  },
}
builder.parameters = orm.Dict(dict=parameters)

KpointsData = plugins.DataFactory('array.kpoints')
kpoints = KpointsData()
kpoints.set_kpoints_mesh([4,4,4])
builder.kpoints = kpoints

builder.metadata.options.resources = {'num_machines': 1, 'num_mpiprocs_per_machine': 2}

calcjob_node = submit(builder)

@unkcpz unkcpz requested review from ltalirz and sphuber April 29, 2022 13:24

unkcpz commented Apr 29, 2022

@ltalirz It would be nice if you could run a more complex Docker local-code test example. Unfortunately, I don't have a use case or plugins for this type of code.


sphuber commented May 2, 2022

Thanks @unkcpz . I think having support for containerized codes is great, and so we should push this through. However, I think the negative consequences of the original design of the Code plugin are getting exacerbated by adding yet another subclass. The CLI is becoming very complex with all the various options and branches to follow. I think a redesign of Code would significantly simplify all of this.

I have been wanting to do that for a long time, but was always hesitant given that it is such a fundamental part of aiida-core that we cannot break. Now I have taken the leap and made an attempt to improve things. See #5509 for a description of the problem and #5510 for the first implementation. I think that this new approach should make adding your ContainerizedCode a breeze. Maybe we can meet sometime to go through it and see how your implementation would look after my refactoring, to see if there are further changes that are needed.


unkcpz commented May 2, 2022

@sphuber thanks for the quick refactoring of the Code class. I agree it is the right direction for this PR; I have to say I struggled a lot with the confusing local/remote code distinction. I will have a look at #5510 to see if it makes this one simpler.


ltalirz commented May 4, 2022

Thanks a lot for getting this PR ready @unkcpz and sorry for the late reply - is there any particular docker feature you would like me to test?

Comments from my side:

  • I personally don't care about the local codes. We once did a survey on the AiiDA mailing list about who used these and nobody replied. I have never encountered anyone who used them. In my view they create more confusion than benefit and we should drop support for them (and you don't need to implement support for this feature in containerized codes).
  • As for where to run a docker test, I would suggest adding it as a "system test" (a docker variant is sketched after this list). See e.g.

    def test_leak_ssh_calcjob():
        """Test whether running a CalcJob over SSH leaks memory.
        Note: This relies on the 'slurm-ssh' computer being set up.
        """
        code = orm.Code(
            input_plugin_name='core.arithmetic.add', remote_computer_exec=[orm.load_computer('slurm-ssh'), '/bin/bash']
        )
        inputs = {'x': orm.Int(1), 'y': orm.Int(2), 'code': code}
        run_finished_ok(ArithmeticAddCalculation, **inputs)
        # check that no reference to the process is left in memory
        # some delay is necessary in order to allow for all callbacks to finish
        process_instances = get_instances(processes.Process, delay=0.2)
        assert not process_instances, f'Memory leak: process instances remain in memory: {process_instances}'

    for a test that runs a job on a slurm docker container created by GitHub Actions (you would not need that container; your test would create its own container).
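For concreteness, a minimal sketch of what such a system test could look like for the Docker code above; the code label add-docker@localhost and the run_finished_ok helper are taken from the examples in this thread, so treat the names as assumptions rather than a final implementation.

def test_containerized_docker_calcjob():
    """Sketch: run the arithmetic add job through the containerized Docker code.

    Assumes a containerized code 'add-docker' configured on 'localhost' as in the
    YAML example above, and reuses the ``run_finished_ok`` helper of the existing
    system tests.
    """
    code = orm.load_code('add-docker@localhost')
    inputs = {'x': orm.Int(1), 'y': orm.Int(2), 'code': code}
    run_finished_ok(ArithmeticAddCalculation, **inputs)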


unkcpz commented May 4, 2022

I would suggest adding it as a "system test".

@ltalirz thanks for the suggestion.

Seb and I will meet tomorrow to settle the open issues and I'll then try to finish this PR, maybe without implementing local codes for containerized codes, in order to keep the first introduction of the containerized code concept simple and useful.

@sphuber sphuber mentioned this pull request May 19, 2022
@unkcpz unkcpz force-pushed the design/containerized-code branch 3 times, most recently from 9a2bde6 to 89efeac Compare May 24, 2022 15:45

unkcpz commented May 24, 2022

@sphuber I reimplemented this on top of the new code classes, and it really simplifies the implementation a lot. But there are still some open questions about the actual use of containerized codes.

  1. Although I started this task with lots of tests on Sarus and Singularity, after using Singularity on my local machine and some other deployments of Singularity, I find it is not easy to unify all the different engine commands. The good thing is that in the current implementation the user has full flexibility to set the engine command.
  2. The executables of docker, sarus and singularity are hard to validate as the filepath_executable of an installed code using verdi code test, since information like module load lines is not easy to separate from the prepend_text, and the container executable itself is part of the engine command. I could add an extra option for it, but I think that would increase the burden on the code user.
  3. It takes time to download the image the first time docker run is called, and Singularity and Sarus will fail if the image is not available, which requires the user to fetch it manually. A possible solution is to add a command that does this on the remote machine, since the commands to fetch an image are all similar.

About this PR: if @sphuber is happy with the current code structure, I'll go ahead and add the CI test and documentation, and run some production calculations with my wrapped pseudopotential generator code in a container.

Also pinging @giovannipizzi for comments.


unkcpz commented May 24, 2022

Another thought about verdi code create: it accepts the entry points of the codes, core.code.installed and core.code.portable. Would it be simpler to hide the duplicated core.code prefix, since it must be a code anyway?
I can imagine one reason to keep it: for code types implemented outside of aiida-core, the full entry point will be needed.


@sphuber sphuber left a comment


Thanks @unkcpz. The code is looking good, but I have some minor changes/simplifications. It would indeed be good to have some examples of how to set up and run these with docker, singularity and sarus.

I think the verdi code test question can be left for later. This was only recently added and only really does something for InstalledCode. Don't think we need it for the containerized codes now, anyway they are a new experimental feature that will require some testing.

Comment on lines 185 to 189
try:
    handle.write(code.base.repository.get_object_content(filename, mode='rb'))
except:
    # raise TypeError('directory not supperted.')
    pass

Instead of doing this, I think it would maybe be nicer to use:

from aiida.repository import FileType
for obj in filter(lambda o: o.file_type == FileType.FILE, code.base.repository.list_objects()):
    with NamedTemporaryFile(mode='wb+') as handle:
        handle.write(code.base.repository.get_object_content(obj.name, mode='rb'))

Actually, thinking about this, this code is wrong. It only copies top-level files, but doesn't recurse into directories. Probably there is no test that actually tests this case. We should actually use code.base.repository.walk and iterate over all the files and copy those. Maybe I will quickly fix this in a separate PR.
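A rough sketch of that recursive copy (illustrative only: it assumes walk yields (dirpath, dirnames, filenames) tuples analogous to os.walk and that the target folder behaves like aiida.common.folders.Folder; the helper name is made up and this is not the final fix):

import io

def copy_code_repository(code, folder):
    # Walk the whole code repository and copy every file, recursing into directories.
    for dirpath, _dirnames, filenames in code.base.repository.walk():
        for filename in filenames:
            filepath = dirpath / filename
            content = code.base.repository.get_object_content(str(filepath), mode='rb')
            # Recreate the file at the same relative path inside the sandbox folder.
            folder.create_file_from_filelike(io.BytesIO(content), str(filepath), mode='wb')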

Reply from @unkcpz (Member Author):

Yes, I put the pass here to take a further look later and then forgot about it. What I intended here is, as you said, to walk through the directory and copy everything inside. This would then also support the case where the filepath_executable of a portable code is a relative path pointing to a file inside a subfolder.

@@ -174,14 +175,18 @@ def upload_calculation(
     # Still, beware! The code file itself could be overwritten...
     # But I checked for this earlier.
     for code in input_codes:
-        if isinstance(code, PortableCode):
+        if isinstance(code, (PortableCode, PortableContainerizedCode)):

Suggested change
if isinstance(code, (PortableCode, PortableContainerizedCode)):
if isinstance(code, PortableCode):

Since PortableContainerizedCode is actually a subclass, you don't have to specifically add it.

@@ -611,7 +620,8 @@ def presubmit(self, folder: Folder) -> CalcInfo:
)
)

if isinstance(code, PortableCode) and str(code.filepath_executable) in folder.get_content_list():
if isinstance(code, (PortableCode, PortableContainerizedCode)) and str(code.filepath_executable

Suggested change
if isinstance(code, (PortableCode, PortableContainerizedCode)) and str(code.filepath_executable
if isinstance(code, PortableCode) and str(code.filepath_executable

Comment on lines 709 to 735
this_code = load_node(
    code_info.code_uuid,
    sub_classes=(Code, InstalledCode, PortableCode, InstalledContainerizedCode, PortableContainerizedCode)
)

Suggested change
this_code = load_node(
    code_info.code_uuid,
    sub_classes=(Code, InstalledCode, PortableCode, InstalledContainerizedCode, PortableContainerizedCode)
)
this_code = load_code(code_info.code_uuid)

@@ -715,10 +728,20 @@ def presubmit(self, folder: Folder) -> CalcInfo:
else:
prepend_cmdline_params = []

escape_exec_line = False
if isinstance(this_code, (InstalledContainerizedCode, PortableContainerizedCode)):

Suggested change
if isinstance(this_code, (InstalledContainerizedCode, PortableContainerizedCode)):
if isinstance(this_code, ContainerizedCode):

Since they share this base class, why not use that?

'engine_command': {
'required': True,
'prompt': 'Engine command',
'help': 'The command to run container must contain {image} for image.',

Suggested change
'help': 'The command to run container must contain {image} for image.',
'help': 'The command to run the container. It must contain the placeholder {image} that will be replaced with the `image_name`.',

'help': 'The command to run container must contain {image} for image.',
'type': click.STRING,
},
'image': {

Suggested change
'image': {
'image_name': {

'required': True,
'type': click.STRING,
'prompt': 'Image',
'help': 'Image of the container to run executable.',

Suggested change
'help': 'Image of the container to run executable.',
'help': 'Name of the container image in which to run the executable.',

aiida/orm/nodes/data/code/containerized.py (outdated comment, resolved)
Comment on lines 10 to 16
"""Data plugin representing an executable code on a remote computer.

This plugin should be used if an executable is pre-installed on a computer. The ``InstalledCode`` represents the code by
storing the absolute filepath of the relevant executable and the computer on which it is installed. The computer is
represented by an instance of :class:`aiida.orm.computers.Computer`. Each time a :class:`aiida.engine.CalcJob` is run
using an ``InstalledCode``, it will run its executable on the associated computer.
"""

Suggested change
"""Data plugin representing an executable code on a remote computer.
This plugin should be used if an executable is pre-installed on a computer. The ``InstalledCode`` represents the code by
storing the absolute filepath of the relevant executable and the computer on which it is installed. The computer is
represented by an instance of :class:`aiida.orm.computers.Computer`. Each time a :class:`aiida.engine.CalcJob` is run
using an ``InstalledCode``, it will run its executable on the associated computer.
"""
"""Data plugins representing an executable code to be run in a container.
These plugins are directly analogous to the ``InstalledCode`` and ``PortableCode`` plugins, except that the executable
is present inside of a container. For the ``InstalledContainerizedCode`` the executable is expected to already be
present inside a container that is available on the target computer. With the ``PortableContainerizedCode`` plugin, the
target executable will be stored in AiiDA's storage, just as with the ``PortableCode`` and when launched, the code will
be copied inside the container on the target computer and run inside the container.
"""

Reply from @unkcpz (Member Author):

Thanks a lot! I forgot about this part.


sphuber commented May 25, 2022

To fix the aiida/orm/nodes/data/code/containerized.py:48:15: E1101: Instance of 'Containerized' has no 'base' member (no-member) pylint messages, maybe the best fix is to have Containerized subclass AbstractCode and rename it ContainerizedCode, i.e.:

class ContainerizedCode(AbstractCode)

@unkcpz unkcpz force-pushed the design/containerized-code branch 3 times, most recently from 1d9aa46 to c242647 Compare May 25, 2022 15:57

unkcpz commented May 25, 2022

maybe the best fix is to have Containerized subclass AbstractCode and rename it ContainerizedCode, i.e.:

I was torn between the names Containerized and ContainerizedCode, since this class exists only for multiple inheritance and is not actually used as a code type on its own.

@sphuber thanks for reviewing. I will not re-request a review at the moment; I will keep working on adding the CI tests and documentation.

@unkcpz unkcpz force-pushed the design/containerized-code branch from 9ac7e18 to b3a79b1 Compare May 27, 2022 10:15
@giovannipizzi

Just a comment on this:

It takes time to download the image the first time docker run is called, and Singularity and Sarus will fail if the image is not available, which requires the user to fetch it manually

I think we can (for now) assume that the user will have already fetched the image on the computer (a bit like we assume the code has already been compiled). This, of course, needs to be documented.
Of course we can add utility functions to fetch/pull the container (e.g. verdi code pull xx@yy), even from the Python interface, or think about optimising automatic pulling later (e.g. in the engine, before submitting, if the code is containerised it will first check and pull if needed). But I don't think this feature is needed in a first implementation, what do you think?


ltalirz commented May 30, 2022

But I don't think this feature is needed in a first implementation, what do you think?

I agree. Documenting how to use the Code's prepend_text will address the use case.

@unkcpz unkcpz force-pushed the design/containerized-code branch 5 times, most recently from b5f8191 to 862f68e Compare May 31, 2022 07:28

unkcpz commented May 31, 2022

As for where to run a docker test, I would suggest adding it as a "system test".

@ltalirz thanks for the advice, but what I need here is to run docker run ... from a bash script in the GitHub CI, and the ubuntu-latest runner used in GitHub Actions doesn't have Docker installed. I found an action to set up Docker, https://github.com/marketplace/actions/setup-docker, but it somehow breaks the PostgreSQL we set up as a service; the error message is here: https://github.com/aiidateam/aiida-core/runs/6666097452?check_suite_focus=true.

But I don't think this feature is needed in a first implementation, what do you think?
I agree. Documenting how to use the Code's prepend_text will address the use case.

@giovannipizzi sorry for the delay. Yes, I agree. It is true that this is also a problem for installed codes, which can at least be validated by verdi code test; with just the information currently available in the containerized code setup it is not easy to do such a check.


unkcpz commented Jul 25, 2022

@chrisjsewell thanks for the comment and sorry for the late reply, I was on vacation last week.

because e.g. you can't prefix anything before the MPI commands

Yes, I was considering this, but during the coding week we decided to start simple without changing too much of the original design. The other reason is that Singularity and Sarus, which are the "recommended" container technologies for HPC, are compatible with calling MPI from the host libraries. The quickest way I can imagine to realize what you mention in this PR is to add one more code setup flag that swaps the order and injects the MPI command right in front of the executable inside the container, or to add new options as new 'placeholders' to control the command. However, I think it is better to have a new design for the related code in CalcJob.presubmit, as you said.

I am also wondering what happens if you try to use a docker container, but also have it set up for more than one MPI process? Is it going to try to do mpirun -np 4 docker run ... which is surely not right?
Would you not want to make it possible to do e.g. docker run ... mpirun -np {tot_num_mpiprocs}?

Sadly, it is not supported for Docker either. I think I mentioned somewhere in the docs that only serial executables are supported for Docker.

I will take a look at whether it is possible to get this done with the extra placeholders you suggested.

@chrisjsewell

Yes, I was considering this, but during the coding week we decided to start simple without changing too much of the original design.

Yep, that's fair, although I would like to make sure that this use case etc. will be possible going forward, without any painful deprecations 😅 As, once it's released and people start using it, it is obviously difficult to change retroactively.

will take a look at whether it is possible to get this done with the extra placeholders you suggested.

thanks 😄

@unkcpz unkcpz changed the title containerized code: setting and running containerized code (docker, sarus, singularity) containerized code: setting and running containerized code (docker, sarus, singularity, conda) Aug 31, 2022
@unkcpz unkcpz force-pushed the design/containerized-code branch 2 times, most recently from 070de20 to 9d59d03 Compare August 31, 2022 10:01
  • The containerized code can be set up through the command line and used in calculations. The container engines supported and tested are Docker, Sarus and Singularity. It is shown below how to configure the code and run a calculation. I will then move the example below into the documentation.
  • Test command line options for containerized code
  • docstring added
  • [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  • review
  • [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
@unkcpz unkcpz force-pushed the design/containerized-code branch 2 times, most recently from 3cefdd6 to 22051c0 Compare August 31, 2022 10:18
  • pytest fix
  • Add more docs in data_types
  • Use localhost to test add-docker container

unkcpz commented Aug 31, 2022

I added two optional parameters for the containerized code setup: inner_mpi, a bool which sets whether to put the MPI arguments right before the executable command inside the container, and mpi_args which, if not empty, overrides the MPI arguments set in the computer setup (otherwise the computer's setting is used as the default). Here is an example (added as a unit test):

    engine_command = """conda run --name {image}"""
    override_mpirun_command = 'inner_mpirun -np {tot_num_mpiprocs}'
    containerized_code = orm.InstalledContainerizedCode(
        default_calc_job_plugin='core.arithmetic.add',
        filepath_executable='/bin/bash',
        engine_command=engine_command,
        image='myenv',
        inner_mpi=True,
        mpi_args=override_mpirun_command,
        computer=aiida_localhost,
        escape_exec_line=False,
    ).store()

This will generate the following run line:

"conda" "run" "--name" "myenv" 'inner_mpirun' '-np' '1' '/bin/bash' '--version' '-c' < "aiida.in" > "aiida.out" 2> "aiida.err"

It is not hard to add such a thing when more features are needed, but as @chrisjsewell said, a lot of code is coupled in presubmit. I think it requires refactoring this part of the code, and it could be a good idea to use a jinja template for the job script. I'll investigate and open a discussion or arrange a meeting to discuss it if needed.
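To make the jinja idea concrete, here is a toy sketch (all template variables are invented for illustration; this is not how aiida-core currently builds the job script):

from jinja2 import Template

# Toy template for the run line only; in a real refactor the whole job script
# would be a template, filled in by the scheduler and code plugins.
template = Template(
    '{{ engine_command }} {{ mpirun }} {{ executable }} {{ params }} '
    '< "{{ stdin }}" > "{{ stdout }}" 2> "{{ stderr }}"'
)
print(template.render(
    engine_command='conda run --name myenv',
    mpirun='inner_mpirun -np 1',
    executable='/bin/bash',
    params='--version -c',
    stdin='aiida.in',
    stdout='aiida.out',
    stderr='aiida.err',
))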

For the implementation itself, I think this PR is ready for another review. If @chrisjsewell can live with such a nasty workaround for mpirun inside conda/docker, I'll add the cmdline options inner_mpi and mpi_args to the code setup.

@unkcpz unkcpz mentioned this pull request Sep 5, 2022
3 tasks
@unkcpz unkcpz linked an issue Sep 5, 2022 that may be closed by this pull request
3 tasks

sphuber commented Oct 18, 2022

Superseded by #5667

Successfully merging this pull request may close these issues.

Support Containerized codes