diff --git a/manual/main/en/html/_images/task_view.png b/manual/main/en/html/_images/task_view.png
index 0ca4be0..44d038a 100644
Binary files a/manual/main/en/html/_images/task_view.png and b/manual/main/en/html/_images/task_view.png differ
diff --git a/manual/main/en/html/_sources/moller/command/index.rst.txt b/manual/main/en/html/_sources/moller/command/index.rst.txt
index 1b846c6..7069b0f 100644
--- a/manual/main/en/html/_sources/moller/command/index.rst.txt
+++ b/manual/main/en/html/_sources/moller/command/index.rst.txt
@@ -57,7 +57,7 @@ DESCRIPTION:

   - ``list_file``

-    specifies the file that contains list of job directories. If this file is not specified, the list will be obtained from the logfile of the batch job ``log_{task}.dat``.
+    specifies the file that contains the list of job directories. If this file is not specified, the list is obtained from the log files of the batch job, ``stat_{task}.dat``.

   - ``-o``, ``--output`` ``output_file``

@@ -91,5 +91,5 @@ DESCRIPTION:

 FILES:

-  When the programs are executed concurrently using the job script generated by ``moller``, the status of the tasks are written in log files ``log_{task}.dat``. ``moller_status`` reads these log files and makes a summary.
+  When the programs are executed concurrently using the job script generated by ``moller``, the status of the tasks is written to the log files ``stat_{task}.dat``. ``moller_status`` reads these log files and produces a summary.
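A minimal usage sketch of ``moller_status``, based only on the options documented in the hunk above; treating ``list_file`` as a trailing positional argument and the output name ``status_summary.txt`` are illustrative assumptions, so the exact invocation should be checked against the moller manual:

   # summarize the task status recorded in the stat_{task}.dat log files
   $ moller_status -o status_summary.txt list.dat
   # with -o/--output omitted, the summary goes to standard output
   $ moller_status list.dat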
diff --git a/manual/main/en/html/_sources/moller/tutorial/basic.rst.txt b/manual/main/en/html/_sources/moller/tutorial/basic.rst.txt
index d1bd8b4..c2e0f70 100644
--- a/manual/main/en/html/_sources/moller/tutorial/basic.rst.txt
+++ b/manual/main/en/html/_sources/moller/tutorial/basic.rst.txt
@@ -78,7 +78,7 @@ A list of jobs is to be created. ``moller`` is designed so that each job is exec

 .. code-block:: bash

-   $ /usr/bin/ls -1d > list.dat
+   $ /usr/bin/ls -1d * > list.dat

 In this tutorial, an utility script ``make_inputs.sh`` is enclosed which generates datasets and a list file.

@@ -128,3 +128,34 @@ An example of the output is shown below:

 where "o" corresponds to a task that has been completed successfully, "x" corresponds to a failed task, "-" corresponds to a skipped task because the previous task has been terminated with errors, and "." corresponds to a task yet unexecuted.
 In the above example, the all tasks have been completed successfully.
+
+
+Rerun failed tasks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a task fails, the subsequent tasks within that job will not be executed.
+The following is an example of the job status in which each task fails with a probability of 10%.
+
+.. literalinclude:: ../../../../tutorial/moller/reference/status_failed.txt
+
+Here, the jobs of dataset_0003 and dataset_0004 failed at task1, so the subsequent task2 and task3 were not executed. The other jobs succeeded at task1 and proceeded to task2.
+In this way, each job is executed independently of the other jobs.
+
+Users can rerun the failed tasks by resubmitting the batch job with the retry option.
+For the SLURM job scheduler (e.g. used on ISSP System B), resubmit the job as follows:
+
+.. code-block:: bash
+
+   $ sbatch job.sh --retry list.dat
+
+For the PBS job scheduler (e.g. used on ISSP System C), edit the job script so that the line ``retry=0`` reads ``retry=1``, and resubmit the job.
+
+.. literalinclude:: ../../../../tutorial/moller/reference/status_retry.txt
+
+The tasks that failed (and the tasks skipped after them) are executed in the second run.
+In the above example, task1 for dataset_0003 succeeded this time, but task2 then failed.
+For dataset_0004, task1, task2, and task3 all completed successfully.
+For datasets whose tasks have already finished successfully, the second run does nothing.
+
+N.B. The list file must not be modified for the rerun. The jobs are managed according to the order of the entries in the list file; if the order is changed, the jobs will not be executed properly.
+
diff --git a/manual/main/en/html/_static/task_view.pdf b/manual/main/en/html/_static/task_view.pdf
index e85cbb6..3af0603 100644
Binary files a/manual/main/en/html/_static/task_view.pdf and b/manual/main/en/html/_static/task_view.pdf differ
diff --git a/manual/main/en/html/_static/task_view.png b/manual/main/en/html/_static/task_view.png
index 0ca4be0..44d038a 100644
Binary files a/manual/main/en/html/_static/task_view.png and b/manual/main/en/html/_static/task_view.png differ
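A convenience note on the PBS procedure in the tutorial hunk above: the retry flag can be flipped from the command line instead of an editor. This is a generic sed one-liner, not part of moller itself, and it assumes the flag appears as a plain ``retry=0`` line in ``job.sh``:

   # switch the generated job script to retry mode, then resubmit
   $ sed -i 's/^retry=0/retry=1/' job.sh
   $ qsub job.sh   # or whatever submission command the PBS site prescribes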

[Generated HTML output, condensed: genindex.html, index.html, and several other built pages change only in the footer, where the theme version moves from Alabaster 0.7.15 to Alabaster 0.7.16. The built "4.2. moller_status" page is regenerated so that its text refers to ``stat_{task}.dat`` instead of ``log_{task}.dat``, mirroring the source change above.]

    Create list file

    A list of jobs is to be created. moller is designed so that each job is executed within a directory prepared for the job with the job name. The job list can be created, for example, by the following command:

    -
    $ /usr/bin/ls -1d > list.dat
    +
    $ /usr/bin/ls -1d * > list.dat
     

    In this tutorial, an utility script make_inputs.sh is enclosed which generates datasets and a list file.
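For orientation, the list file simply holds one job directory per line; with the tutorial's datasets it would look roughly as follows (an illustration only: the directory names follow the status listings below, and the exact set is produced by ``make_inputs.sh``):

   $ head -n 3 list.dat
   dataset_0001
   dataset_0002
   dataset_0003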

[Built tutorial page, new section "Rerun failed tasks", condensed: the rendered text mirrors the source addition above; the two included status listings render as follows.]

Job status after the first run (``status_failed.txt``), where each task failed with a probability of 10%:

| job          | task1   | task2   | task3   |
|--------------|---------|---------|---------|
| dataset_0001 | o       | o       | o       |
| dataset_0002 | o       | x       | -       |
| dataset_0003 | x       | -       | -       |
| dataset_0004 | x       | -       | -       |
| dataset_0005 | o       | o       | o       |
| dataset_0006 | o       | o       | o       |
| dataset_0007 | o       | x       | -       |
| dataset_0008 | o       | o       | o       |
| dataset_0009 | o       | o       | x       |
| dataset_0010 | o       | o       | o       |
| dataset_0011 | o       | o       | o       |
| dataset_0012 | o       | o       | o       |
| dataset_0013 | o       | x       | -       |
| dataset_0014 | o       | o       | o       |
| dataset_0015 | o       | o       | o       |
| dataset_0016 | o       | o       | o       |
| dataset_0017 | o       | o       | o       |
| dataset_0018 | o       | o       | o       |
| dataset_0019 | o       | o       | o       |
| dataset_0020 | o       | o       | o       |

Job status after resubmitting with the retry option (``status_retry.txt``):

| job          | task1   | task2   | task3   |
|--------------|---------|---------|---------|
| dataset_0001 | o       | o       | o       |
| dataset_0002 | o       | o       | x       |
| dataset_0003 | o       | x       | -       |
| dataset_0004 | o       | o       | o       |
| dataset_0005 | o       | o       | o       |
| dataset_0006 | o       | o       | o       |
| dataset_0007 | o       | o       | o       |
| dataset_0008 | o       | o       | o       |
| dataset_0009 | o       | o       | o       |
| dataset_0010 | o       | o       | o       |
| dataset_0011 | o       | o       | o       |
| dataset_0012 | o       | o       | o       |
| dataset_0013 | o       | o       | o       |
| dataset_0014 | o       | o       | o       |
| dataset_0015 | o       | o       | o       |
| dataset_0016 | o       | o       | o       |
| dataset_0017 | o       | o       | o       |
| dataset_0018 | o       | o       | o       |
| dataset_0019 | o       | o       | o       |
| dataset_0020 | o       | o       | o       |
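The two listings above correspond to the first run and the rerun. On a SLURM machine, the cycle sketched by this tutorial is therefore roughly the following; only the ``--retry`` resubmission is spelled out verbatim in the manual text, while the plain first submission and the status check are the standard steps it implies:

   $ sbatch job.sh                    # first run over all entries in list.dat
   $ moller_status list.dat           # summarize task status (o / x / - / .)
   $ sbatch job.sh --retry list.dat   # rerun the failed and skipped tasks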

[Remaining generated pages, condensed: the tutorial navigation (including the Japanese-language counterpart) gains a "Rerun failed tasks" entry, and footers are likewise updated to Alabaster 0.7.16.]