From 045182afb802e1c62bf6ff9c9c9622f0a14cac1d Mon Sep 17 00:00:00 2001 From: astro-friedel Date: Tue, 10 Dec 2024 08:38:14 -0600 Subject: [PATCH 1/2] docs cleanup --- CONTRIBUTING.rst | 156 ++++++----- README.rst | 34 ++- docs/devguide/packaging.rst | 4 +- docs/devguide/roadmap.rst | 83 ++++-- docs/faq.rst | 170 ++++++------ docs/historical/performance.rst | 88 +++---- docs/index.rst | 32 ++- docs/quickstart.rst | 147 +++++------ docs/userguide/apps.rst | 153 ++++++----- docs/userguide/checkpoints.rst | 196 +++++++------- docs/userguide/configuring.rst | 414 ++++++++++++++++-------------- docs/userguide/data.rst | 279 ++++++++++---------- docs/userguide/exceptions.rst | 113 ++++---- docs/userguide/execution.rst | 346 ++++++++++++------------- docs/userguide/futures.rst | 147 ++++++----- docs/userguide/glossary.rst | 147 ++++++++--- docs/userguide/joins.rst | 149 +++++------ docs/userguide/lifted_ops.rst | 46 ++-- docs/userguide/modularizing.rst | 26 +- docs/userguide/monitoring.rst | 74 +++--- docs/userguide/mpi_apps.rst | 65 +++-- docs/userguide/overview.rst | 340 +++++++++++------------- docs/userguide/parsl_perf.rst | 25 +- docs/userguide/plugins.rst | 105 ++++---- docs/userguide/usage_tracking.rst | 66 +++-- docs/userguide/workflow.rst | 88 ++++--- 26 files changed, 1803 insertions(+), 1690 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 52888d0e2e..746ac9bbdf 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -1,15 +1,20 @@ Where to start -------------- -We eagerly welcome contributions of any type (e.g., bug fixes, new features, reporting issues, documentation, etc). If you're looking for a good place to get started you might like to peruse our current Git issues (those marked with `help wanted `_ are a good place to start). +We eagerly welcome contributions of any type (e.g., bug fixes, new features, reporting issues, +documentation, etc). 
If you're looking for a good place to get started you might like to peruse our +current Git issues (those marked with +`help wanted `_ are a good place to start). -Please be aware of `Parsl's Code of Conduct `_. +Please be aware of `Parsl's Code of Conduct `_. -If you are not familiar with GitHub pull requests, the main mechanism to contribute changes to our code, there is `documentation available `_. +If you are not familiar with GitHub pull requests, the main mechanism to contribute changes to our +code, there is `documentation available `_. The best places to ask questions or discuss development activities are: -* in our Slack's `#parsl-hackers channel `_. You can `join our Slack here `_. +* in our Slack's `#parsl-hackers channel `_. +You can `join our Slack here `_. * using `GitHub issues `_. @@ -20,111 +25,119 @@ Coding conventions Formatting conventions ====================== -Parsl code should adhere to Python `PEP-8 `_. This is enforced in CI (with some exceptions). You can also run this test yourself using ``make flake8``. +Parsl code should adhere to Python `PEP-8 `_. This is enforced in +CI (with some exceptions). You can also run this test yourself using ``make flake8``. + Naming conventions ================== -The following convention should be followed: ClassName, ExceptionName, GLOBAL_CONSTANT_NAME, and lowercase_with_underscores for everything else. +The following convention should be followed: ClassName, ExceptionName, GLOBAL_CONSTANT_NAME, and +lowercase_with_underscores for everything else. + Version increments ================== -Parsl follows the `calendar versioning scheme `_ with ``YYYY.MM.DD`` numbering scheme for versions. -This scheme was chosen following a switch from ad-hoc versioning and manual release processes to an automated weekly process. -Releases are pushed from github actions to PyPI and will be picked up automatically by Conda. 
-Manual packaging instructions are included in the +Parsl follows the `calendar versioning scheme `_ with ``YYYY.MM.DD`` +numbering scheme for versions. This scheme was chosen following a switch from ad-hoc versioning and +manual release processes to an automated weekly process.Releases are pushed from github actions to +PyPI and will be picked up automatically by Conda. Manual packaging instructions are included in the `packaging docs `_ + Documentation -================== +============= -Classes should be documented following the `NumPy/SciPy `_ -style. A concise summary is available `here `_. User and developer documentation is auto-generated and made available on +Classes should be documented following the +`NumPy/SciPy `_ +style. A concise summary is available +`here `_. User and +developer documentation is auto-generated and made available on `ReadTheDocs `_. + Testing ======= -Parsl uses ``pytest`` to run most tests. All tests should be placed in -the ``parsl/tests`` directory. +Parsl uses ``pytest`` to run most tests. All tests should be placed in the ``parsl/tests`` directory. -There are two broad groups of tests: those which must run with a -specific configuration, and those which should work with any -configuration. +There are two broad groups of tests: those which must run with a specific configuration, and those +which should work with any configuration. -Tests which should run with with any configuration live under -themed directories ``parsl/tests/test*/`` and should be named ``test*.py``. -They can be run with any configuration, by specifying ``--config CONFIGPATH`` -where CONFIGPATH is a path to a ``.py`` file exporting a parsl configuration -object named ``config``. The parsl-specific test fixtures will ensure -a suitable DFK is loaded with that configuration for each test. +Tests which should run with with any configuration live under themed directories +``parsl/tests/test*/`` and should be named ``test*.py``. 
They can be run with any configuration, by +specifying ``--config CONFIGPATH`` where CONFIGPATH is a path to a ``.py`` file exporting a parsl +configuration object named ``config``. The parsl-specific test fixtures will ensure a suitable DFK +is loaded with that configuration for each test. -Tests which require their own specially configured DFK, or no DFK at all, -should be labelled with ``@pytest.mark.local`` and can be run with -``--config local``. -Provide the special configuration creating a ``local_config`` function -that returns the required configuration in that test file. -Or, provide both a ``local_setup`` function that loads the proper configuration -and ``local_teardown`` that stops parsl. +Tests which require their own specially configured DFK, or no DFK at all, should be labelled with +``@pytest.mark.local`` and can be run with ``--config local``. Provide the special configuration +creating a ``local_config`` function that returns the required configuration in that test file. Or, +provide both a ``local_setup`` function that loads the proper configuration and ``local_teardown`` +that stops parsl. -There is more fine-grained enabling and disabling of tests within the -above categories: +There is more fine-grained enabling and disabling of tests within the above categories: -A pytest marker of ``cleannet`` (for clean network) can be used to select -or deselect tests which need a very clean network (for example, for tests -making FTP transfers). When the test environment (github actions) does not -provide a sufficiently clean network, run all tests with ``-k "not cleannet"`` to -disable those tests. +A pytest marker of ``cleannet`` (for clean network) can be used to select or deselect tests which +need a very clean network (for example, for tests making FTP transfers). When the test environment +(github actions) does not provide a sufficiently clean network, run all tests with +``-k "not cleannet"`` to disable those tests. 
-Some other markers are available but unused in testing; -see ``pytest --markers parsl/tests/`` for more details. +Some other markers are available but unused in testing; see ``pytest --markers parsl/tests/`` for +more details. A specific test in a specific file can be run like this::: $ pytest test_python_apps/test_basic.py::test_simple -A timeout can be added to test runs using a pytest parameter such as -``--timeout=60`` +A timeout can be added to test runs using a pytest parameter such as ``--timeout=60`` -Many tests are marked with ``@pytest.mark.skip`` for reasons usually -specified directly in the annotation - generally because they are broken -in one way or another. +Many tests are marked with ``@pytest.mark.skip`` for reasons usually specified directly in the +annotation - generally because they are broken in one way or another. Coverage testing ================ -There is also some coverage testing available. The CI by default records -coverage for most of the tests that it runs and outputs a brief report -at the end of each CI run. This is purely informational and a Lack of -coverage won't produce a CI failure. +There is also some coverage testing available. The CI by default records coverage for most of the +tests that it runs and outputs a brief report at the end of each CI run. This is purely +informational and a Lack of coverage won't produce a CI failure. + +It is possible to produce a more detailed coverage report on your development machine: make sure you +have no `.coverage` file, run the test commands as shown in `.github/workflows/ci.yaml`, and then +run `coverage report` to produce the summary as seen in CI, or run `coverage html` to produce +annotated source code in the `htmlcov/` subdirectory. This will show, line by line, if each line of +parsl source code was executed during the coverage test. 
-It is possible to produce a more detailed coverage report on your -development machine: make sure you have no `.coverage` file, run the -test commands as shown in `.github/workflows/ci.yaml`, and then run -`coverage report` to produce the summary as seen in CI, or run -`coverage html` to produce annotated source code in the `htmlcov/` -subdirectory. This will show, line by line, if each line of parsl -source code was executed during the coverage test. Development Process ------------------- -If you are a contributor to Parsl at large, we recommend forking the repository and submitting pull requests from your fork. -The `Parsl development team `_ has the additional privilege of creating development branches on the repository. -Parsl development follows a common pull request-based workflow similar to `GitHub flow `_. That is: +If you are a contributor to Parsl at large, we recommend forking the repository and submitting pull +requests from your fork. The `Parsl development team `_ has the +additional privilege of creating development branches on the repository. Parsl development follows a +common pull request-based workflow similar to `GitHub flow `_. 
+That is: -* every development activity (except very minor changes, which can be discussed in the PR) should have a related GitHub issue -* all development occurs in branches (named with a short descriptive name, for example, `add-globus-transfer-#1`) +* every development activity (except very minor changes, which can be discussed in the PR) should + have a related GitHub issue +* all development occurs in branches (named with a short descriptive name, for example, + `add-globus-transfer-#1`) * the master branch is always stable * development branches should include tests for added features -* development branches should be tested after being brought up-to-date with the master (in this way, what is being tested is what is actually going into the code; otherwise unexpected issues from merging may come up) +* development branches should be tested after being brought up-to-date with the master (in this way, + what is being tested is what is actually going into the code; otherwise unexpected issues from + merging may come up) * branches what have been successfully tested are merged via pull requests (PRs) * PRs should be used for review and discussion -* PRs should be reviewed in a timely manner, to reduce effort keeping them synced with other changes happening on the master branch +* PRs should be reviewed in a timely manner, to reduce effort keeping them synced with other changes + happening on the master branch + +Git commit messages should include a single summary sentence followed by a more explanatory +paragraph. Note: all commit messages should reference the GitHub issue to which they relate. A nice +discussion on the topic can be found `here `_. -Git commit messages should include a single summary sentence followed by a more explanatory paragraph. Note: all commit messages should reference the GitHub issue to which they relate. A nice discussion on the topic can be found `here `_. 
:: Implemented Globus data staging support @@ -134,22 +147,31 @@ Git commit messages should include a single summary sentence followed by a more destination Parsl will use the Globus transfer service to move data to the compute host. Fixes #-1. + Git hooks --------- -Developers may find it useful to setup a pre-commit git hook to automatically lint and run tests. This is a script which is run before each commit. For example:: +Developers may find it useful to setup a pre-commit git hook to automatically lint and run tests. +This is a script which is run before each commit. For example:: $ cat ~/parsl/.git/hooks/pre-commit #!/bin/sh make lint flake8 mypy local_thread_test + Project documentation --------------------- -All project documentation is written in reStructuredText. `Sphinx `_ is used to generate the HTML documentation from the rst documentation and structured docstrings in Parsl code. Project documentation is built automatically and added to the `Parsl documentation `_. +All project documentation is written in reStructuredText. `Sphinx `_ is used +to generate the HTML documentation from the rst documentation and structured docstrings in Parsl +code. Project documentation is built automatically and added to the +`Parsl documentation `_. + Credit and Contributions ------------------------ -Parsl wants to make sure that all contributors get credit for their contributions. When you make your first contribution, it should include updating the codemeta.json file to include yourself as a contributor to the project. +Parsl wants to make sure that all contributors get credit for their contributions. When you make +your first contribution, it should include updating the codemeta.json file to include yourself as a +contributor to the project. diff --git a/README.rst b/README.rst index a8254e2e40..9a43c39a9b 100644 --- a/README.rst +++ b/README.rst @@ -6,9 +6,9 @@ Parsl extends parallelism in Python beyond a single computer. 
You can use Parsl `just like Python's parallel executors `_ -but across *multiple cores and nodes*. -However, the real power of Parsl is in expressing multi-step workflows of functions. -Parsl lets you chain functions together and will launch each function as inputs and computing resources are available. +but across *multiple cores and nodes*. However, the real power of Parsl is in expressing multi-step +workflows of functions. Parsl lets you chain functions together and will launch each function as +inputs and computing resources are available. .. code-block:: python @@ -37,8 +37,10 @@ Parsl lets you chain functions together and will launch each function as inputs assert future.result() == 7 -Start with the `configuration quickstart `_ to learn how to tell Parsl how to use your computing resource, -then explore the `parallel computing patterns `_ to determine how to use parallelism best in your application. +Start with the `configuration quickstart `_ +to learn how to tell Parsl how to use your computing resource, then explore the +`parallel computing patterns `_ to +determine how to use parallelism best in your application. .. |licence| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg :target: https://github.com/Parsl/parsl/blob/master/LICENSE @@ -65,7 +67,7 @@ then explore the `parallel computing patterns `_ +Detailed information about setting up Jupyter with Python is available +`here `_ -Note: Parsl uses an opt-in model to collect usage statistics for reporting and improvement purposes. To understand what stats are collected and enable collection please refer to the `usage tracking guide `__ +Note: Parsl uses an opt-in model to collect usage statistics for reporting and improvement purposes. +To understand what stats are collected and enable collection please refer to the +`usage tracking guide `__ Documentation ============= The complete parsl documentation is hosted `here `_. 
-The Parsl tutorial is hosted on live Jupyter notebooks `here `_ +The Parsl tutorial is hosted on live Jupyter notebooks +`here `_ For Developers @@ -117,14 +123,18 @@ For Developers Requirements ============ -Parsl is supported in Python 3.9+. Requirements can be found `here `_. Requirements for running tests can be found `here `_. +Parsl is supported in Python 3.9+. Requirements can be found `here `_. +Requirements for running tests can be found `here `_. Code of Conduct =============== -Parsl seeks to foster an open and welcoming environment - Please see the `Parsl Code of Conduct `_ for more details. +Parsl seeks to foster an open and welcoming environment - Please see the +`Parsl Code of Conduct `_ for +more details. Contributing ============ -We welcome contributions from the community. Please see our `contributing guide `_. +We welcome contributions from the community. Please see our +`contributing guide `_. diff --git a/docs/devguide/packaging.rst b/docs/devguide/packaging.rst index 9cd37c0aa8..5f385a1460 100644 --- a/docs/devguide/packaging.rst +++ b/docs/devguide/packaging.rst @@ -13,8 +13,8 @@ Steps to release * ``parsl/README.rst`` 3. Commit and push the changes to github -4. Run the ``tag_and_release.sh`` script. This script will verify that - version number matches the version specified. +4. Run the ``tag_and_release.sh`` script. This script will verify that version number matches the + version specified. .. code:: bash diff --git a/docs/devguide/roadmap.rst b/docs/devguide/roadmap.rst index a1fe8e44e0..c1fbc3ada2 100644 --- a/docs/devguide/roadmap.rst +++ b/docs/devguide/roadmap.rst @@ -3,49 +3,82 @@ Roadmap **OVERVIEW** -While we follow best practices in software development processes (e.g., CI, flake8, code review), there are opportunities to make our code more maintainable and accessible. 
This roadmap, written in the fall of 2023, covers our major activities planned through 2025 to increase efficiency, productivity, user experience, and community building. +While we follow best practices in software development processes (e.g., CI, flake8, code review), +there are opportunities to make our code more maintainable and accessible. This roadmap, written +in the fall of 2023, covers our major activities planned through 2025 to increase efficiency, +productivity, user experience, and community building. -Features and improvements are documented via GitHub -`issues `_ and `pull requests `_. +Features and improvements are documented via GitHub `issues `_ +and `pull requests `_. Code Maintenance ---------------- -* **Type Annotations and Static Type Checking**: Add static type annotations throughout the codebase and add typeguard checks. -* **Release Process**: `Improve the overall release process `_ to synchronize docs and code releases, automatically produce changelog documentation. -* **Components Maturity Model**: Defines the `component maturity model `_ and tags components with their appropriate maturity level. -* **Define and Document Interfaces**: Identify and document interfaces via which `external components `_ can augment the Parsl ecosystem. -* **Distributed Testing Process**: All tests should be run against all possible schedulers, using different executors, on a variety of remote systems. Explore the use of containerized schedulers and remote testing on real systems. +* **Type Annotations and Static Type Checking**: Add static type annotations throughout the codebase + and add typeguard checks. +* **Release Process**: `Improve the overall release process `_ + to synchronize docs and code releases, automatically produce changelog documentation. +* **Components Maturity Model**: Defines the `component maturity model `_ + and tags components with their appropriate maturity level. 
+* **Define and Document Interfaces**: Identify and document interfaces via which + `external components `_ can augment + the Parsl ecosystem. +* **Distributed Testing Process**: All tests should be run against all possible schedulers, using + different executors, on a variety of remote systems. Explore the use of containerized schedulers + and remote testing on real systems. + New Features and Integrations ----------------------------- -* **Enhanced MPI Support**: Extend Parsl’s MPI model with MPI apps and runtime support capable of running MPI apps in different environments (MPI flavor and launcher). -* **Serialization Configuration**: Enable users to select what serialization methods are used and enable users to supply their own serializer. +* **Enhanced MPI Support**: Extend Parsl’s MPI model with MPI apps and runtime support capable of + running MPI apps in different environments (MPI flavor and launcher). +* **Serialization Configuration**: Enable users to select what serialization methods are used and + enable users to supply their own serializer. * **PSI/J integration**: Integrate PSI/J as a common interface for schedulers. -* **Internal Concurrency Model**: Revisit and rearchitect the concurrency model to reduce areas that are not well understood and reduce the likelihood of errors. +* **Internal Concurrency Model**: Revisit and rearchitect the concurrency model to reduce areas that + are not well understood and reduce the likelihood of errors. * **Common Model for Errors**: Make Parsl errors self-describing and understandable by users. -* **Plug-in Model for External Components**: Extend Parsl to implement interfaces defined above. -* **User Configuration Validation Tool**: Provide tooling to help users configure Parsl and diagnose and resolve errors. -* **Anonymized Usage Tracking**: Usage tracking is crucial for our data-oriented approach to understand the adoption of Parsl, which components are used, and where errors occur. 
This allows us to prioritize investment in components, progress components through the maturity levels, and identify bugs. Revisit prior usage tracking and develop a service that enables users to control tracking information. -* **Support for Globus Compute**: Enable execution of Parsl tasks using Globus Compute as an executor. -* **Update Globus Data Management**: Update Globus integration to use the new Globus Connect v5 model (i.e., needing specific scopes for individual endpoints). +* **Plug-in Model for External Components**: Extend Parsl to implement interfaces defined above. +* **User Configuration Validation Tool**: Provide tooling to help users configure Parsl and diagnose + and resolve errors. +* **Anonymized Usage Tracking**: Usage tracking is crucial for our data-oriented approach to + understand the adoption of Parsl, which components are used, and where errors occur. This allows + us to prioritize investment in components, progress components through the maturity levels, and + identify bugs. Revisit prior usage tracking and develop a service that enables users to control + tracking information. +* **Support for Globus Compute**: Enable execution of Parsl tasks using Globus Compute as an + executor. +* **Update Globus Data Management**: Update Globus integration to use the new Globus Connect v5 + model (i.e., needing specific scopes for individual endpoints). * **Performance Measurement**: Improve ability to measure performance metrics and report to users. -* **Enhanced Debugging**: Application-level `logging `_ to understand app execution. +* **Enhanced Debugging**: Application-level `logging `_ + to understand app execution. + Tutorials, Training, and User Support ------------------------------------- -* **Configuration and Debugging**: Tutorials showing how to configure Parsl for different resources and debug execution. -* **Functional Serialization 101**: Tutorial describing how serialization works and how you can integrate custom serializers. 
-* **ProxyStore Data Management**: Tutorial showing how you can use ProxyStore to manage data for both inter and intra-site scenarios. -* **Open Dev Calls on Zoom**: The internal core team holds an open dev call/office hours every other Thursday to help users troubleshoot issues, present and share their work, connect with each other, and provide community updates. -* **Project Documentation**: is maintained and updated in `Read the Docs `_. +* **Configuration and Debugging**: Tutorials showing how to configure Parsl for different resources + and debug execution. +* **Functional Serialization 101**: Tutorial describing how serialization works and how you can + integrate custom serializers. +* **ProxyStore Data Management**: Tutorial showing how you can use ProxyStore to manage data for + both inter and intra-site scenarios. +* **Open Dev Calls on Zoom**: The internal core team holds an open dev call/office hours every other + Thursday to help users troubleshoot issues, present and share their work, connect with each other, + and provide community updates. +* **Project Documentation**: is maintained and updated in + `Read the Docs `_. + Longer-term Objectives ---------------------- -* **Globus Compute Integration**: Once Globus Compute supports multi-tenancy, Parsl will be able to use it to run remote tasks on initially one and then later multiple resources. -* **Multi-System Optimization**: Once Globus Compute integration is complete, it is best to use multiple systems for multiple tasks as part of a single workflow. -* **HPC Checkpointing and Job Migration**: As new resources become available, HPC tasks will be able to be checkpointed and moved to the system with more resources. +* **Globus Compute Integration**: Once Globus Compute supports multi-tenancy, Parsl will be able to + use it to run remote tasks on initially one and then later multiple resources. 
+* **Multi-System Optimization**: Once Globus Compute integration is complete, it is best to use + multiple systems for multiple tasks as part of a single workflow. +* **HPC Checkpointing and Job Migration**: As new resources become available, HPC tasks will be able + to be checkpointed and moved to the system with more resources. diff --git a/docs/faq.rst b/docs/faq.rst index f58d2639e7..59305cd146 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -4,10 +4,10 @@ FAQ How can I debug a Parsl script? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Parsl interfaces with the Python logger and automatically logs Parsl-related messages a ``runinfo`` directory. -The ``runinfo`` directory will be created in the folder from which you run the Parsl script -and it will contain a series of subfolders for each time you run the code. -Your latest run will be the largest number. +Parsl interfaces with the Python logger and automatically logs Parsl-related messages a ``runinfo`` +directory. The ``runinfo`` directory will be created in the folder from which you run the Parsl +script and it will contain a series of subfolders for each time you run the code. Your latest run +will be the largest number. Alternatively, you can configure the file logger to write to an output file. @@ -41,24 +41,23 @@ Parsl apps include keyword arguments for capturing stderr and stdout in files. # When hello() runs the STDOUT will be written to 'hello.txt' hello('Hello world', stdout='hello.txt') + How can I make an App dependent on multiple inputs? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -You can pass any number of futures in to a single App either as positional arguments -or as a list of futures via the special keyword ``inputs=()``. -The App will wait for all inputs to be satisfied before execution. +You can pass any number of futures in to a single App either as positional arguments or as a list of +futures via the special keyword ``inputs=()``. 
The App will wait for all inputs to be satisfied +before execution. Can I pass any Python object between apps? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This depends on the executor in use. The :py:class:`parsl.executors.threads.ThreadPoolExecutor` -can receive and return any Python object. Other executors will serialize their -parameters and return values, so only objects which Parsl knows how to -serialize can be passed. +can receive and return any Python object. Other executors will serialize their parameters and return +values, so only objects which Parsl knows how to serialize can be passed. -Parsl knows how to serialize objects using the Pickle and Dill -libraries. +Parsl knows how to serialize objects using the Pickle and Dill libraries. Pickle provides a list of objects that it knows how to serialize: `What can be pickled and unpickled? `_. @@ -66,14 +65,15 @@ Pickle provides a list of objects that it knows how to serialize: Dill can serialize much more than Pickle, documented in the `dill documentation `_. -For objects that can't be pickled, use object specific methods -to write the object into a file and use files to communicate between apps. +For objects that can't be pickled, use object specific methods to write the object into a file and +use files to communicate between apps. + How do I specify where apps should be run? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Parsl's multi-executor support allows you to define the executor (including local threads) -on which an App should be executed. For example: +Parsl's multi-executor support allows you to define the executor (including local threads) on which +an App should be executed. For example: .. code-block:: python @@ -85,18 +85,16 @@ on which an App should be executed. For example: def Visualize (...) ... 
+ Workers do not connect back to Parsl ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -If you are running via ssh to a remote system from your local machine, or from the -login node of a cluster/supercomputer, it is necessary to have a public IP to which -the workers can connect back. While our remote execution systems can identify the -IP address automatically in certain cases, it is safer to specify the address explicitly. -Parsl provides a few heuristic based address resolution methods that could be useful, -however with complex networks some trial and error might be necessary to find the -right address or network interface to use. - - +If you are running via ssh to a remote system from your local machine, or from the login node of a +cluster/supercomputer, it is necessary to have a public IP to which the workers can connect back. +While our remote execution systems can identify the IP address automatically in certain cases, it +is safer to specify the address explicitly. Parsl provides a few heuristic based address resolution +methods that could be useful, however with complex networks some trial and error might be necessary +to find the right address or network interface to use. For `parsl.executors.HighThroughputExecutor` the address is specified in the :class:`~parsl.config.Config` as shown below : @@ -121,8 +119,8 @@ as shown below : .. note:: - Another possibility that can cause workers not to connect back to Parsl is an incompatibility between - the system and the pre-compiled bindings used for pyzmq. As a last resort, you can try: + Another possibility that can cause workers not to connect back to Parsl is an incompatibility + between the system and the pre-compiled bindings used for pyzmq. As a last resort, you can try: ``pip install --upgrade --no-binary pyzmq pyzmq``, which forces re-compilation. For the `parsl.executors.HighThroughputExecutor`, ``address`` is a keyword argument @@ -149,20 +147,23 @@ taken at initialization. 
Here is an example for the `parsl.executors.HighThrough .. note:: - On certain systems such as the Midway RCC cluster at UChicago, some network interfaces have an active - intrusion detection system that drops connections that persist beyond a specific duration (~20s). - If you get repeated ``ManagerLost`` exceptions, it would warrant taking a closer look at networking. + On certain systems such as the Midway RCC cluster at UChicago, some network interfaces have an + active intrusion detection system that drops connections that persist beyond a specific duration + (~20s). If you get repeated ``ManagerLost`` exceptions, it would warrant taking a closer look at + networking. .. _pyversion: + parsl.errors.ConfigurationError ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The Parsl configuration model underwent a major and non-backward compatible change in the transition to v0.6.0. -Prior to v0.6.0 the configuration object was a python dictionary with nested dictionaries and lists. -The switch to a class based configuration allowed for well-defined options for each specific component being -configured as well as transparency on configuration defaults. The following traceback indicates that the old -style configuration was passed to Parsl v0.6.0+ and requires an upgrade to the configuration. +The Parsl configuration model underwent a major and non-backward compatible change in the transition +to v0.6.0. Prior to v0.6.0 the configuration object was a python dictionary with nested dictionaries +and lists. The switch to a class based configuration allowed for well-defined options for each +specific component being configured as well as transparency on configuration defaults. The following +traceback indicates that the old style configuration was passed to Parsl v0.6.0+ and requires an +upgrade to the configuration. .. 
code-block:: @@ -174,19 +175,19 @@ style configuration was passed to Parsl v0.6.0+ and requires an upgrade to the c For more information on how to update your configuration script, please refer to: :ref:`configuration-section`. - + Remote execution fails with SystemError(unknown opcode) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -When running with Ipyparallel workers, it is important to ensure that the Python version -on the client side matches that on the side of the workers. If there's a mismatch, -the apps sent to the workers will fail with the following error: +When running with Ipyparallel workers, it is important to ensure that the Python version on the +client side matches that on the side of the workers. If there's a mismatch, the apps sent to the +workers will fail with the following error: ``ipyparallel.error.RemoteError: SystemError(unknown opcode)`` .. caution:: - It is **required** that both the parsl script and all workers are set to use python - with the same Major.Minor version numbers. For example, use Python3.5.X on both local - and worker side. + It is **required** that both the parsl script and all workers are set to use python with the same + Major.Minor version numbers. For example, use Python3.5.X on both local and worker side. + Parsl complains about missing packages ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -201,7 +202,8 @@ You should usually install parsl using a package managment tool such as ``pip`` ``conda``, ideally in a restricted environment such a virtualenv or a conda environment. -For instance, with conda, follow this `cheatsheet `_ to create a virtual environment: +For instance, with conda, follow this `cheatsheet `_ +to create a virtual environment: .. code-block:: bash @@ -215,11 +217,10 @@ For instance, with conda, follow this `cheatsheet `_ in more detail. 
+ If an ``app`` takes a list as an ``input`` argument and the future returned is added to that + list, it creates a circular dependency that cannot be resolved. This situation is described in + `issue 59 `_ in more detail. -2. Workers requested are unable to contact the Parsl client due to one or - more issues listed below: +2. Workers requested are unable to contact the Parsl client due to one or more issues listed below: - * Parsl client does not have a public IP (e.g. laptop on wifi). - If your network does not provide public IPs, the simple solution is to - ssh over to a machine that is public facing. Machines provisioned from - cloud-vendors setup with public IPs are another option. + * Parsl client does not have a public IP (e.g. laptop on wifi). If your network does not provide + public IPs, the simple solution is to ssh over to a machine that is public facing. Machines + provisioned from cloud-vendors setup with public IPs are another option. - * Parsl hasn't autodetected the public IP. See `Workers do not connect back to Parsl`_ for more details. + * Parsl hasn't autodetected the public IP. See `Workers do not connect back to Parsl`_ for more + details. - * Firewall restrictions that block certain port ranges. - If there is a certain port range that is **not** blocked, you may specify - that via configuration: + * Firewall restrictions that block certain port ranges. If there is a certain port range that is + **not** blocked, you may specify that via configuration: .. code-block:: python @@ -282,7 +282,7 @@ Run jupyter notebook --no-browser --ip=`/sbin/ip route get 8.8.8.8 | awk '{print $NF;exit}'` -for a Jupyter notebook, or +for a Jupyter notebook, or .. code-block:: bash @@ -290,6 +290,7 @@ for a Jupyter notebook, or for Jupyter lab (recommended). If that doesn't work, see `these instructions `_. + How can I sync my conda environment and Jupyter environment? 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -297,25 +298,29 @@ Run:: conda install nb_conda -Now all available conda environments (for example, one created by following the instructions `in the quickstart guide `_) will automatically be added to the list of kernels. +Now all available conda environments (for example, one created by following the instructions +`in the quickstart guide `_) will automatically be added to +the list of kernels. .. _label_serialization_error: Addressing SerializationError ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -As of v1.0.0, Parsl will raise a `SerializationError` when it encounters an object that Parsl cannot serialize. -This applies to objects passed as arguments to an app, as well as objects returned from the app. +As of v1.0.0, Parsl will raise a `SerializationError` when it encounters an object that Parsl cannot +serialize. This applies to objects passed as arguments to an app, as well as objects returned from +the app. -Parsl uses dill and pickle to serialize Python objects -to/from functions. Therefore, Python apps can only use input and output objects that can be serialized by -dill or pickle. For example the following data types are known to have issues with serializability : +Parsl uses dill and pickle to serialize Python objects to/from functions. Therefore, Python apps can +only use input and output objects that can be serialized by dill or pickle. For example the +following data types are known to have issues with serializability : * Closures * Objects of complex classes with no ``__dict__`` or ``__getstate__`` methods defined * System objects such as file descriptors, sockets and locks (e.g threading.Lock) -If Parsl raises a `SerializationError`, first identify what objects are problematic with a quick test: +If Parsl raises a `SerializationError`, first identify what objects are problematic with a quick +test: .. 
code-block:: python @@ -323,11 +328,11 @@ If Parsl raises a `SerializationError`, first identify what objects are problema # If non-serializable you will get a TypeError pickle.dumps(YOUR_DATA_OBJECT) -If the data object simply is complex, please refer `here `_ for more details +If the data object simply is complex, please refer +`here `_ for more details on adding custom mechanisms for supporting serialization. - How do I cite Parsl? ^^^^^^^^^^^^^^^^^^^^ @@ -347,9 +352,9 @@ or Clifford, Ben and Kumar, Rohan and Lacinski, Lukasz and - Chard, Ryan and + Chard, Ryan and Wozniak, Justin and - Foster, Ian and + Foster, Ian and Wilde, Mike and Chard, Kyle}, title = {Parsl: Pervasive Parallel Programming in Python}, @@ -363,18 +368,17 @@ or How can my tasks survive ``WorkerLost`` and ``ManagerLost`` at the end of a batch job? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -When a batch job ends, pilot workers will be terminated by the batch system, -and any tasks running there will fail. With `HighThroughputExecutor`, -this failure will be reported as a `parsl.executors.high_throughput.errors.WorkerLost` or +When a batch job ends, pilot workers will be terminated by the batch system, and any tasks running +there will fail. With `HighThroughputExecutor`, this failure will be reported as a +`parsl.executors.high_throughput.errors.WorkerLost` or `parsl.executors.high_throughput.interchange.ManagerLost` in the task future. To mitigate against this: * use retries by setting ``retries=`` in `parsl.config.Config`. -* if you only want to retry on certain errors such as `WorkerLost` and `ManagerLost`, - use ``retry_handler`` in `parsl.config.Config` to implement that policy. -* avoid sending tasks to batch jobs that will expire soon. With `HighThroughputExecutor`, - set drain_period to a little longer than you expect your tasks to take. 
- With `WorkQueueExecutor`, you can configure individual expected task duration using - a ``parsl_resource_specification`` and specify a worker ``--wall-time`` using the - ``worker_options`` parameter to the `WorkQueueExecutor`. +* if you only want to retry on certain errors such as `WorkerLost` and `ManagerLost`, use + ``retry_handler`` in `parsl.config.Config` to implement that policy. +* avoid sending tasks to batch jobs that will expire soon. With `HighThroughputExecutor`, set + drain_period to a little longer than you expect your tasks to take. With `WorkQueueExecutor`, you + can configure individual expected task duration using a ``parsl_resource_specification`` and + specify a worker ``--wall-time`` using the ``worker_options`` parameter to the `WorkQueueExecutor`. diff --git a/docs/historical/performance.rst b/docs/historical/performance.rst index 4f9f8bad39..bd3cf7ddbe 100644 --- a/docs/historical/performance.rst +++ b/docs/historical/performance.rst @@ -4,82 +4,70 @@ Historical: Performance and Scalability ======================================= .. note:: - This scalability review summarises results in a paper, Parsl: Pervasive - Parallel Programming in Python, which was published in 2019. The results - have not been updated since then. For that reason, this section is marked - as historical. + This scalability review summarises results in a paper, Parsl: Pervasive Parallel Programming in + Python, which was published in 2019. The results have not been updated since then. For that reason, + this section is marked as historical. Parsl is designed to scale from small to large systems . Scalability ----------- -We studied strong and weak scaling on the Blue Waters supercomputer. -In strong scaling, the total problem size is fixed; in weak scaling, the problem -size per CPU core is fixed. In both cases, we measure completion -time as a function of number of CPU cores. 
An ideal framework -should scale linearly, which for strong scaling means that speedup -scales with the number of cores, and for weak scaling means that -completion time remains constant as the number of cores increases. - -To measure the strong and weak scaling of Parsl executors, we -created Parsl programs to run tasks with different durations, ranging from a -"no-op"--a Python function that exits immediately---to -tasks that sleep for 10, 100, and 1,000 ms. For each executor we -deployed a worker per core on each node. - -While we compare here with IPP, Fireworks, and Dask Distributed, -we note that these systems are not necessarily designed for -Parsl-like workloads or scale. - -Further results are presented in our +We studied strong and weak scaling on the Blue Waters supercomputer. In strong scaling, the total +problem size is fixed; in weak scaling, the problem size per CPU core is fixed. In both cases, we +measure completion time as a function of number of CPU cores. An ideal framework should scale +linearly, which for strong scaling means that speedup scales with the number of cores, and for weak +scaling means that completion time remains constant as the number of cores increases. + +To measure the strong and weak scaling of Parsl executors, we created Parsl programs to run tasks +with different durations, ranging from a "no-op"--a Python function that exits immediately---to +tasks that sleep for 10, 100, and 1,000 ms. For each executor we deployed a worker per core on each +node. + +While we compare here with IPP, Fireworks, and Dask Distributed, we note that these systems are not +necessarily designed for Parsl-like workloads or scale. + +Further results are presented in our `HPDC paper `_. + Strong scaling ^^^^^^^^^^^^^^ -The figures below show the strong scaling results for 5,000 1-second -sleep tasks. 
HTEX -provides good performance in all cases, slightly exceeding what is -possible with EXEX, while EXEX scales to significantly more workers -than the other executors and frameworks. Both -HTEX and EXEX remain nearly constant, indicating that they likely -will continue to perform well at larger scales. +The figures below show the strong scaling results for 5,000 1-second sleep tasks. HTEX provides good +performance in all cases, slightly exceeding what is possible with EXEX, while EXEX scales to +significantly more workers than the other executors and frameworks. Both HTEX and EXEX remain nearly +constant, indicating that they likely will continue to perform well at larger scales. .. image:: ../images/performance/strong-scaling.png Weak scaling ^^^^^^^^^^^^ -Here, we launched 10 tasks per worker, while -increasing the number of workers. (We limited experiments to 10 -tasks per worker, as on 3,125 nodes, that represents 3,125 -nodes × 32 workers/node × 10 tasks/worker, or 1M tasks.) The -figure below shows our results. We observe that HTEX -and EXEX outperform other executors and frameworks with more -than 4,096 workers (128 nodes). All frameworks exhibit similar -trends, with completion time remaining close to constant initially -and increasing rapidly as the number of workers increases. +Here, we launched 10 tasks per worker, while increasing the number of workers. (We limited +experiments to 10 tasks per worker, as on 3,125 nodes, that represents 3,125 nodes × 32 workers/node +× 10 tasks/worker, or 1M tasks.) The figure below shows our results. We observe that HTEX and EXEX +outperform other executors and frameworks with more than 4,096 workers (128 nodes). All frameworks +exhibit similar trends, with completion time remaining close to constant initially and increasing +rapidly as the number of workers increases. .. 
image:: ../images/performance/weak-scaling.png Throughput ---------- -We measured the maximum throughput of all the Parsl executors, -on the UChicago Research Computing Center's Midway Cluster. -To do so, we ran 50,000 “no-op" tasks on a varying number of -workers and recorded the completion times. The throughout is -computed as the number of tasks divided by the completion time. -HTEX, and EXEX achieved maximum throughputs of 1,181 and 1,176 -tasks/s, respectively. +We measured the maximum throughput of all the Parsl executors, on the UChicago Research Computing +Center's Midway Cluster. To do so, we ran 50,000 "no-op" tasks on a varying number of workers and +recorded the completion times. The throughput is computed as the number of tasks divided by the +completion time. HTEX and EXEX achieved maximum throughputs of 1,181 and 1,176 tasks/s, +respectively. + Summary ------- -The table below summarizes the scale at which we have tested Parsl executors. -The maximum number of nodes and workers for HTEX and EXEX is limited -by the size of allocation available during testing on Blue Waters. -The throughput results are collected on Midway. +The table below summarizes the scale at which we have tested Parsl executors. The maximum number of +nodes and workers for HTEX and EXEX is limited by the size of allocation available during testing on +Blue Waters. The throughput results are collected on Midway. +-----------+------------------+-------------+------------------+ | Executor | Max # workers | Max # nodes | Max tasks/second | diff --git a/docs/index.rst b/docs/index.rst index c5057c0899..ef9d58f56f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -5,9 +5,9 @@ Parsl extends parallelism in Python beyond a single computer. You can use Parsl `just like Python's parallel executors `_ -but across *multiple cores and nodes*. -However, the real power of Parsl is in expressing multi-step workflows of functions. 
-Parsl lets you chain functions together and will launch each function as inputs and computing resources are available. +but across *multiple cores and nodes*. However, the real power of Parsl is in expressing multi-step +workflows of functions. Parsl lets you chain functions together and will launch each function as +inputs and computing resources are available. .. code-block:: python @@ -36,9 +36,11 @@ Parsl lets you chain functions together and will launch each function as inputs assert future.result() == 7 -Start with the `configuration quickstart `_ to learn how to tell Parsl how to use your computing resource, -see if `a template configuration for your supercomputer `_ is already available, -then explore the `parallel computing patterns `_ to determine how to use parallelism best in your application. +Start with the `configuration quickstart `_ to learn how to tell +Parsl how to use your computing resource, see if +`a template configuration for your supercomputer `_ is already available, +then explore the `parallel computing patterns `_ to determine how to use +parallelism best in your application. Parsl is an open-source code, and available on GitHub: https://github.com/parsl/parsl/ @@ -49,8 +51,9 @@ Parsl is Python --------------- *Everything about a Parsl program is written in Python.* -Parsl follows Python's native parallelization approach and functions, -how they combine into workflows, and where they run are all described in Python. +Parsl follows Python's native parallelization approach and functions, how they combine into +workflows, and where they run are all described in Python. + Parsl works everywhere ---------------------- @@ -59,6 +62,7 @@ Parsl works everywhere Scaling from laptop to supercomputer is often as simple as changing the resource configuration. Parsl is tested `on many of the top supercomputers `_. 
+ Parsl is flexible ----------------- @@ -75,6 +79,7 @@ Parsl handles data *Parsl has first-class support for workflows involving files.* Data will be automatically moved between workers, even if they reside on different filesystems. + Parsl is fast ------------- @@ -87,17 +92,18 @@ Parsl is a community *Parsl is part of a large, experienced community.* -The Parsl Project was launched by researchers with decades of experience in workflows -as part of a National Science Foundation project to create sustainable research software. +The Parsl Project was launched by researchers with decades of experience in workflows as part of a +National Science Foundation project to create sustainable research software. The Parsl team is guided by the community through its GitHub, conversations on `Slack `_, -Bi-Weekly developer calls, -and engagement with the `Workflows Community Initiative `_. +Bi-Weekly developer calls, and engagement with the +`Workflows Community Initiative `_. .. I was going to work in how we integrate with other tools, but that seemed too-detailed in this draft + Table of Contents +++++++++++++++++ @@ -107,7 +113,7 @@ Table of Contents quickstart 1-parsl-introduction.ipynb userguide/index - userguide/glossary + userguide/glossary faq reference devguide/index diff --git a/docs/quickstart.rst b/docs/quickstart.rst index d54763ee58..6ec2e2f9d0 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -1,14 +1,14 @@ Quickstart ========== -To try Parsl now (without installing any code locally), experiment with our +To try Parsl now (without installing any code locally), experiment with our `hosted tutorial notebooks on Binder `_. Installation ------------ -Parsl is available on `PyPI `_ and `conda-forge `_. +Parsl is available on `PyPI `_ and `conda-forge `_. Parsl requires Python3.9+ and has been tested on Linux. @@ -16,8 +16,8 @@ Parsl requires Python3.9+ and has been tested on Linux. 
Installation using Pip ^^^^^^^^^^^^^^^^^^^^^^ -While ``pip`` can be used to install Parsl, we suggest the following approach -for reliable installation when many Python environments are available. +While ``pip`` can be used to install Parsl, we suggest the following approach for reliable +installation when many Python environments are available. 1. Install Parsl:: @@ -44,7 +44,9 @@ Installation using Conda $ conda install parsl -The conda documentation provides `instructions `_ for installing conda on macOS and Linux. +The conda documentation provides `instructions `_ +for installing conda on macOS and Linux. + Getting started --------------- @@ -54,39 +56,36 @@ Getting started :align: center -Parsl has much in common with Python's native concurrency library, -but unlocking Parsl's potential requires understanding a few major concepts. +Parsl has much in common with Python's native concurrency library, but unlocking Parsl's potential +requires understanding a few major concepts. -A Parsl program submits tasks to run on Workers distributed across remote computers. -The instructions for these tasks are contained within `"apps" <#application-types>`_ -that users define using Python functions. -Each remote computer (e.g., a node on a supercomputer) has a single `"Executor" <#executors>`_ -which manages the workers. -Remote resources available to Parsl are acquired by a `"Provider" <#execution-providers>`_, -which places the executor on a system with a `"Launcher" <#launchers>`_. -Task execution is brokered by a `"Data Flow Kernel" <#benefits-of-a-data-flow-kernel>`_ that runs on your local system. +A Parsl program submits tasks to run on Workers distributed across remote computers. The +instructions for these tasks are contained within `"apps" <#application-types>`_ that users define +using Python functions. Each remote computer (e.g., a node on a supercomputer) has a single +`"Executor" <#executors>`_ which manages the workers. 
Remote resources available to Parsl are +acquired by a `"Provider" <#execution-providers>`_, which places the executor on a system with a +`"Launcher" <#launchers>`_. Task execution is brokered by a +`"Data Flow Kernel" <#benefits-of-a-data-flow-kernel>`_ that runs on your local system. -We describe these components briefly here, and link to more details in the `User Guide `_. +We describe these components briefly here, and link to more details in the +`User Guide `_. .. note:: Parsl's documentation includes `templates for many supercomputers `_. - Even though you may not need to write a configuration from a blank slate, - understanding the basic terminology below will be very useful. + Even though you may not need to write a configuration from a blank slate, understanding the + basic terminology below will be very useful. Application Types ^^^^^^^^^^^^^^^^^ -Parsl enables concurrent execution of Python functions (``python_app``) -or external applications (``bash_app``). -The logic for both are described by Python functions marked with Parsl decorators. -When decorated functions are invoked, they run asynchronously on other resources. -The result of a call to a Parsl app is an :class:`~parsl.app.futures.AppFuture`, -which behaves like a Python Future. +Parsl enables concurrent execution of Python functions (``python_app``) or external applications +(``bash_app``). The logic for both is described by Python functions marked with Parsl decorators. +When decorated functions are invoked, they run asynchronously on other resources. The result of a +call to a Parsl app is an :class:`~parsl.app.futures.AppFuture`, which behaves like a Python Future. -The following example shows how to write a simple Parsl program -with hello world Python and Bash apps. +The following example shows how to write a simple Parsl program with hello world Python and Bash apps. .. code-block:: python @@ -101,7 +100,7 @@ with hello world Python and Bash apps. 
def hello_bash(message, stdout='hello-stdout'): return 'echo "Hello %s"' % message - + with parsl.load(): # invoke the Python app and print the result print(hello_python('World (Python)').result()) @@ -114,42 +113,43 @@ with hello world Python and Bash apps. Learn more about the types of Apps and their options `here `__. + Executors ^^^^^^^^^ -Executors define how Parsl deploys work on a computer. -Many types are available, each with different advantages. +Executors define how Parsl deploys work on a computer. Many types are available, each with different +advantages. -The :class:`~parsl.executors.high_throughput.executor.HighThroughputExecutor`, -like Python's ``ProcessPoolExecutor``, creates workers that are separate Python processes. -However, you have much more control over how the work is deployed. -You can dynamically set the number of workers based on available memory and -pin each worker to specific GPUs or CPU cores -among other powerful features. +The :class:`~parsl.executors.high_throughput.executor.HighThroughputExecutor`, like Python's +``ProcessPoolExecutor``, creates workers that are separate Python processes. However, you have much +more control over how the work is deployed. You can dynamically set the number of workers based on +available memory and pin each worker to specific GPUs or CPU cores among other powerful features. Learn more about Executors `here `__. + Execution Providers ^^^^^^^^^^^^^^^^^^^ -Resource providers allow Parsl to gain access to computing power. -For supercomputers, gaining resources often requires requesting them from a scheduler (e.g., Slurm). -Parsl Providers write the requests to requisition **"Blocks"** (e.g., supercomputer nodes) on your behalf. -Parsl comes pre-packaged with Providers compatible with most supercomputers and some cloud computing services. +Resource providers allow Parsl to gain access to computing power. 
For supercomputers, gaining +resources often requires requesting them from a scheduler (e.g., Slurm). Parsl Providers write the +requests to requisition **"Blocks"** (e.g., supercomputer nodes) on your behalf. Parsl comes +pre-packaged with Providers compatible with most supercomputers and some cloud computing services. -Another key role of Providers is defining how to start an Executor on a remote computer. -Often, this simply involves specifying the correct Python environment and -(described below) how to launch the Executor on each acquired computers. +Another key role of Providers is defining how to start an Executor on a remote computer. Often, +this simply involves specifying the correct Python environment and (described below) how to launch +the Executor on each acquired computer. Learn more about Providers `here `__. + Launchers ^^^^^^^^^ -The Launcher defines how to spread workers across all nodes available in a Block. -A common example is an :class:`~parsl.launchers.launchers.MPILauncher`, which uses MPI's mechanism -for starting a single program on multiple computing nodes. -Like Providers, Parsl comes packaged with Launchers for most supercomputers and clouds. +The Launcher defines how to spread workers across all nodes available in a Block. A common example +is an :class:`~parsl.launchers.launchers.MPILauncher`, which uses MPI's mechanism for starting a +single program on multiple computing nodes. Like Providers, Parsl comes packaged with Launchers for +most supercomputers and clouds. Learn more about Launchers `here `__. @@ -157,26 +157,27 @@ Learn more about Launchers `here `__. Benefits of a Data-Flow Kernel ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The Data-Flow Kernel (DFK) is the behind-the-scenes engine behind Parsl. -The DFK determines when tasks can be started and sends them to open resources, -receives results, restarts failed tasks, propagates errors to dependent tasks, -and performs the many other functions needed to execute complex workflows. 
-The flexibility and performance of the DFK enables applications with -intricate dependencies between tasks to execute on thousands of parallel workers. +The Data-Flow Kernel (DFK) is the behind-the-scenes engine behind Parsl. The DFK determines when +tasks can be started and sends them to open resources, receives results, restarts failed tasks, +propagates errors to dependent tasks, and performs the many other functions needed to execute +complex workflows. The flexibility and performance of the DFK enables applications with intricate +dependencies between tasks to execute on thousands of parallel workers. + +Start with the Tutorial or the `parallel patterns `_ to see the complex +types of workflows you can make with Parsl. -Start with the Tutorial or the `parallel patterns `_ -to see the complex types of workflows you can make with Parsl. Starting Parsl ^^^^^^^^^^^^^^ -A Parsl script must contain the function definitions, resource configuration, and a call to ``parsl.load`` -before launching tasks. -This script runs on a system that must stay on-line until all of your tasks complete but need not have -much computing power, such as the login node for a supercomputer. +A Parsl script must contain the function definitions, resource configuration, and a call to +``parsl.load`` before launching tasks. This script runs on a system that must stay on-line until all +of your tasks complete but need not have much computing power, such as the login node for a +supercomputer. -The :class:`~parsl.config.Config` object holds definitions of Executors and the Providers and Launchers they rely on. -An example which launches 4 workers on 1 node of the Polaris supercomputer looks like +The :class:`~parsl.config.Config` object holds definitions of Executors and the Providers and +Launchers they rely on. An example which launches 4 workers on 1 node of the Polaris supercomputer +looks like .. 
code-block:: python @@ -220,28 +221,30 @@ The next step is to load the configuration You are then ready to use 10 PFLOPS of computing power through Python! + Tutorial -------- The best way to learn more about Parsl is by reviewing the Parsl tutorials. -There are several options for following the tutorial: +There are several options for following the tutorial: -1. Use `Binder `_ to follow the tutorial online without installing or writing any code locally. -2. Clone the `Parsl tutorial repository `_ using a local Parsl installation. +1. Use `Binder `_ to follow the tutorial + online without installing or writing any code locally. +2. Clone the `Parsl tutorial repository `_ using a local + Parsl installation. 3. Read through the online `tutorial documentation <1-parsl-introduction.html>`_. Usage Tracking -------------- -To help support the Parsl project, we ask that users opt-in to anonymized usage tracking -whenever possible. Usage tracking allows us to measure usage, identify bugs, and improve -usability, reliability, and performance. Only aggregate usage statistics will be used -for reporting purposes. +To help support the Parsl project, we ask that users opt-in to anonymized usage tracking whenever +possible. Usage tracking allows us to measure usage, identify bugs, and improve usability, +reliability, and performance. Only aggregate usage statistics will be used for reporting purposes. -As an NSF-funded project, our ability to track usage metrics is important for continued funding. +As an NSF-funded project, our ability to track usage metrics is important for continued funding. -You can opt-in by setting ``usage_tracking=3`` in the configuration object (`parsl.config.Config`). +You can opt-in by setting ``usage_tracking=3`` in the configuration object (`parsl.config.Config`). To read more about what information is collected and how it is used see :ref:`label-usage-tracking`. 
@@ -249,9 +252,9 @@ To read more about what information is collected and how it is used see :ref:`la For Developers -------------- -Parsl is an open source community that encourages contributions from users -and developers. A guide for `contributing `_ -to Parsl is available in the `Parsl GitHub repository `_. +Parsl is an open source community that encourages contributions from users and developers. A guide +for `contributing `_ to Parsl is +available in the `Parsl GitHub repository `_. The following instructions outline how to set up Parsl from source. diff --git a/docs/userguide/apps.rst b/docs/userguide/apps.rst index 1ef105b4fe..e7950e0630 100644 --- a/docs/userguide/apps.rst +++ b/docs/userguide/apps.rst @@ -3,10 +3,9 @@ Apps ==== -An **App** defines a computation that will be executed asynchronously by Parsl. -Apps are Python functions marked with a decorator which -designates that the function will run asynchronously and cause it to return -a :class:`~concurrent.futures.Future` instead of the result. +An **App** defines a computation that will be executed asynchronously by Parsl. Apps are Python +functions marked with a decorator which designates that the function will run asynchronously and +cause it to return a :class:`~concurrent.futures.Future` instead of the result. Apps can be one of three types of functions, each with their own type of decorator @@ -17,6 +16,7 @@ Apps can be one of three types of functions, each with their own type of decorat The intricacies of Python and Bash apps are documented below. Join apps are documented in a later section (see :ref:`label-joinapp`). + Python Apps ----------- @@ -32,23 +32,22 @@ Python Apps Python Apps run Python functions. The code inside a function marked by ``@python_app`` is what will be executed either locally or on a remote system. -Most functions can run without modification. -Limitations on the content of the functions and their inputs/outputs are described below. 
+Most functions can run without modification. Limitations on the content of the functions and their +inputs/outputs are described below. -Rules for Function Contents -^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _function-rules: -Parsl apps have access to less information from the script that defined them -than functions run via Python's native multiprocessing libraries. -The reason is that functions are executed on workers that -lack access to the global variables in the script that defined them. -Practically, this means +Rules for Function Contents +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Parsl apps have access to less information from the script that defined them than functions run via +Python's native multiprocessing libraries. The reason is that functions are executed on workers that +lack access to the global variables in the script that defined them. Practically, this means 1. *Functions may need to re-import libraries.* - Place the import statements that define functions or classes inside the function. - Type annotations should not use libraries defined in the function. + Place the import statements that define functions or classes inside the function. Type + annotations should not use libraries defined in the function. .. code-block:: python @@ -68,8 +67,8 @@ Practically, this means 2. *Global variables are inaccessible*. - Functions should not use variables defined outside the function. - Likewise, do not assume that variables created inside the function are visible elsewhere. + Functions should not use variables defined outside the function. Likewise, do not assume that + variables created inside the function are visible elsewhere. .. code-block:: python @@ -114,9 +113,9 @@ Practically, this means Functions from Modules ++++++++++++++++++++++ -The above rules assume that the user is running the example code from a standalone script or Jupyter Notebook. 
-Functions that are defined in an installed Python module do not need to abide by these guidelines, -as they are sent to workers differently than functions defined locally within a script. +The above rules assume that the user is running the example code from a standalone script or Jupyter +Notebook. Functions that are defined in an installed Python module do not need to abide by these +guidelines, as they are sent to workers differently than functions defined locally within a script. Directly convert a function from a library to a Python App by passing it as an argument to ``python_app``: @@ -128,8 +127,8 @@ Directly convert a function from a library to a Python App by passing it as an a ``function_app`` will act as Parsl App function of ``function``. It is also possible to create wrapped versions of functions, such as ones with pinned arguments. -Parsl just requires first calling :meth:`~functools.update_wrapped` with the wrapped function -to include attributes from the original function (e.g., its name). +Parsl just requires first calling :meth:`~functools.update_wrapper` with the wrapped function to +include attributes from the original function (e.g., its name). .. code-block:: python @@ -151,16 +150,16 @@ The above example is equivalent to creating a new function (as below) Inputs and Outputs ^^^^^^^^^^^^^^^^^^ -Python apps may be passed any Python type as an input and return any Python type, with a few exceptions. -There are several classes of allowed types, each with different rules. +Python apps may be passed any Python type as an input and return any Python type, with a few +exceptions. There are several classes of allowed types, each with different rules. - *Python Objects*: Any Python object that can be saved with `pickle `_ or `dill `_ - can be used as an import or output. - All primitive types (e.g., floats, strings) are valid as are many complex types (e.g., numpy arrays).
-- *Files*: Pass files as inputs as a :py:class:`~parsl.data_provider.files.File` object. - Parsl can transfer them to a remote system and update the ``File`` object with a new path. - Access the new path with ``File.filepath`` attribute. + can be used as an input or output. All primitive types (e.g., floats, strings) are valid as are + many complex types (e.g., numpy arrays). +- *Files*: Pass files as inputs as a :py:class:`~parsl.data_provider.files.File` object. Parsl can + transfer them to a remote system and update the ``File`` object with a new path. Access the new + path with the ``File.filepath`` attribute. .. code-block:: python @@ -170,9 +169,9 @@ There are several classes of allowed types, each with different rules. return fp.readline() Files can also be outputs of a function, but only through the ``outputs`` kwargs (described below). -- *Parsl Futures*. Functions can receive results from other Apps as Parsl ``Future`` objects. - Parsl will establish a dependency on the App(s) which created the Future(s) - and start executing as soon as the preceding ones complete. +- *Parsl Futures*. Functions can receive results from other Apps as Parsl ``Future`` objects. Parsl + will establish a dependency on the App(s) which created the Future(s) and start executing as soon + as the preceding ones complete. .. code-block:: python @@ -191,18 +190,18 @@ There are several classes of allowed types, each with different rules. Learn more about the types of data allowed in `the data section `_. .. note:: - Any changes to mutable input arguments will be ignored. + Special Keyword Arguments +++++++++++++++++++++++++ Some keyword arguments to the Python function are treated differently by Parsl -1. inputs: (list) This keyword argument defines a list of input :ref:`label-futures` or files. - Parsl will wait for the results of any listed :ref:`label-futures` to be resolved before executing the app.
- The ``inputs`` argument is useful both for passing files as arguments - and when one wishes to pass in an arbitrary number of futures at call time. +1. inputs: (list) This keyword argument defines a list of input :ref:`label-futures` or files. Parsl + will wait for the results of any listed :ref:`label-futures` to be resolved before executing the + app. The ``inputs`` argument is useful both for passing files as arguments and when one wishes to + pass in an arbitrary number of futures at call time. .. code-block:: python @@ -219,10 +218,9 @@ Some keyword arguments to the Python function are treated differently by Parsl print(reduce_future.result()) # 0 + 1 * 2 + 2 * 2 = 6 -2. outputs: (list) This keyword argument defines a list of files that - will be produced by the app. For each file thus listed, Parsl will create a future, - track the file, and ensure that it is correctly created. The future - can then be passed to other apps as an input argument. +2. outputs: (list) This keyword argument defines a list of files that will be produced by the app. + For each file thus listed, Parsl will create a future, track the file, and ensure that it is + correctly created. The future can then be passed to other apps as an input argument. .. code-block:: python @@ -242,23 +240,25 @@ Some keyword arguments to the Python function are treated differently by Parsl with open(path) as fp: assert fp.read() == 'Hello!\n' -3. walltime: (int) This keyword argument places a limit on the app's - runtime in seconds. If the walltime is exceed, Parsl will raise an `parsl.app.errors.AppTimeout` exception. +3. walltime: (int) This keyword argument places a limit on the app's runtime in seconds. If the + walltime is exceeded, Parsl will raise a `parsl.app.errors.AppTimeout` exception. + Outputs +++++++ -A Python app returns an AppFuture (see :ref:`label-futures`) as a proxy for the results that will be returned by the -app once it is executed.
This future can be inspected to obtain task status; -and it can be used to wait for the result, and when complete, present the output Python object(s) returned by the app. -In case of an error or app failure, the future holds the exception raised by the app. +A Python app returns an AppFuture (see :ref:`label-futures`) as a proxy for the results that will be +returned by the app once it is executed. This future can be inspected to obtain task status; and it +can be used to wait for the result, and when complete, present the output Python object(s) returned +by the app. In case of an error or app failure, the future holds the exception raised by the app. + Options for Python Apps ^^^^^^^^^^^^^^^^^^^^^^^ -The :meth:`~parsl.app.app.python_app` decorator has a few options which controls how Parsl executes all tasks -run with that application. -For example, you can ensure that Parsl caches the results of the function and executes tasks on specific sites. +The :meth:`~parsl.app.app.python_app` decorator has a few options which control how Parsl executes +all tasks run with that application. For example, you can ensure that Parsl caches the results of +the function and executes tasks on specific sites. .. code-block:: python @@ -269,14 +269,18 @@ For example, you can ensure that Parsl caches the results of the function and ex See the Parsl documentation for full details. + Limitations ^^^^^^^^^^^ To summarize, any Python function can be made a Python App with a few restrictions -1. Functions should act only on defined input arguments. That is, they should not use script-level or global variables. -2. Functions must explicitly import any required modules if they are defined in script which starts Parsl. -3. Parsl uses dill and pickle to serialize Python objects to/from apps. Therefore, Parsl require that all input and output objects can be serialized by dill or pickle. See :ref:`label_serialization_error`. +1. Functions should act only on defined input arguments.
That is, they should not use script-level + or global variables. +2. Functions must explicitly import any required modules if they are defined in the script that starts + Parsl. +3. Parsl uses dill and pickle to serialize Python objects to/from apps. Therefore, Parsl requires + that all input and output objects can be serialized by dill or pickle. See :ref:`label_serialization_error`. 4. STDOUT and STDERR produced by Python apps remotely are not captured. @@ -299,39 +303,50 @@ Bash Apps print(f.read()) -A Parsl Bash app executes an external application by making a command-line execution. -Parsl will execute the string returned by the function as a command-line script on a remote worker. +A Parsl Bash app executes an external application by making a command-line execution. Parsl will +execute the string returned by the function as a command-line script on a remote worker. + Rules for Function Contents ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Bash Apps follow the same rules :ref:`as Python Apps `. -For example, imports may need to be inside functions and global variables will be inaccessible. +Bash Apps follow the same rules :ref:`as Python Apps `. For example, imports may +need to be inside functions and global variables will be inaccessible. + Inputs and Outputs ^^^^^^^^^^^^^^^^^^ Bash Apps can use the same kinds of inputs as Python Apps, but only communicate results with Files. -The Bash Apps, unlike Python Apps, can also return the content printed to the Standard Output and Error. +The Bash Apps, unlike Python Apps, can also return the content printed to the Standard Output and +Error. + Special Keywords Arguments ++++++++++++++++++++++++++ -In addition to the ``inputs``, ``outputs``, and ``walltime`` keyword arguments -described above, a Bash app can accept the following keywords: +In addition to the ``inputs``, ``outputs``, and ``walltime`` keyword arguments described above, a +Bash app can accept the following keywords: -1.
stdout: (string, tuple or ``parsl.AUTO_LOGNAME``) The path to a file to which standard output should be redirected. If set to ``parsl.AUTO_LOGNAME``, the log will be automatically named according to task id and saved under ``task_logs`` in the run directory. If set to a tuple ``(filename, mode)``, standard output will be redirected to the named file, opened with the specified mode as used by the Python `open `_ function. +1. stdout: (string, tuple or ``parsl.AUTO_LOGNAME``) The path to a file to which standard output + should be redirected. If set to ``parsl.AUTO_LOGNAME``, the log will be automatically named + according to task id and saved under ``task_logs`` in the run directory. If set to a tuple + ``(filename, mode)``, standard output will be redirected to the named file, opened with the + specified mode as used by the Python `open `_ + function. 2. stderr: (string or ``parsl.AUTO_LOGNAME``) Like stdout, but for the standard error stream. -3. label: (string) If the app is invoked with ``stdout=parsl.AUTO_LOGNAME`` or ``stderr=parsl.AUTO_LOGNAME``, this argument will be appended to the log name. +3. label: (string) If the app is invoked with ``stdout=parsl.AUTO_LOGNAME`` or + ``stderr=parsl.AUTO_LOGNAME``, this argument will be appended to the log name. + Outputs +++++++ -If the Bash app exits with Unix exit code 0, then the AppFuture will complete. If the Bash app -exits with any other code, Parsl will treat this as a failure, and the AppFuture will instead -contain an `BashExitFailure` exception. The Unix exit code can be accessed through the -``exitcode`` attribute of that `BashExitFailure`. +If the Bash app exits with Unix exit code 0, then the AppFuture will complete. If the Bash app exits +with any other code, Parsl will treat this as a failure, and the AppFuture will instead contain an +`BashExitFailure` exception. The Unix exit code can be accessed through the ``exitcode`` attribute +of that `BashExitFailure`. 
Execution Options @@ -342,6 +357,6 @@ Bash Apps have the same execution options (e.g., pinning to specific sites) as t MPI Apps ^^^^^^^^ -Applications which employ MPI to span multiple nodes are a special case of Bash apps, -and require special modification of Parsl's `execution environment `_ to function. -Support for MPI applications is described `in a later section `_. +Applications which employ MPI to span multiple nodes are a special case of Bash apps, and require +special modification of Parsl's `execution environment `_ to function. Support for +MPI applications is described `in a later section `_. diff --git a/docs/userguide/checkpoints.rst b/docs/userguide/checkpoints.rst index 8867107b7a..8984599feb 100644 --- a/docs/userguide/checkpoints.rst +++ b/docs/userguide/checkpoints.rst @@ -3,8 +3,8 @@ Memoization and checkpointing ----------------------------- -When an app is invoked several times with the same parameters, Parsl can -reuse the result from the first invocation without executing the app again. +When an app is invoked several times with the same parameters, Parsl can reuse the result from the +first invocation without executing the app again. This can save time and computational resources. @@ -12,60 +12,53 @@ This is done in two ways: * Firstly, *app caching* will allow reuse of results within the same run. -* Building on top of that, *checkpointing* will store results on the filesystem - and reuse those results in later runs. +* Building on top of that, *checkpointing* will store results on the filesystem and reuse those +results in later runs. + .. _label-appcaching: App caching =========== +There are many situations in which a program may be re-executed over time. Often, large fragments of +the program will not have changed and therefore, re-execution of apps will waste valuable time and +computation resources. Parsl's app caching solves this problem by storing results from apps that +have successfully completed so that they can be re-used. 
-There are many situations in which a program may be re-executed -over time. Often, large fragments of the program will not have changed -and therefore, re-execution of apps will waste valuable time and -computation resources. Parsl's app caching solves this problem by -storing results from apps that have successfully completed -so that they can be re-used. - -App caching is enabled by setting the ``cache`` -argument in the :func:`~parsl.app.app.python_app` or :func:`~parsl.app.app.bash_app` -decorator to ``True`` (by default it is ``False``). +App caching is enabled by setting the ``cache`` argument in the :func:`~parsl.app.app.python_app` or +:func:`~parsl.app.app.bash_app` decorator to ``True`` (by default it is ``False``). .. code-block:: python @bash_app(cache=True) def hello (msg, stdout=None): return 'echo {}'.format(msg) - -App caching can be globally disabled by setting ``app_cache=False`` -in the :class:`~parsl.config.Config`. -App caching can be particularly useful when developing interactive programs such as when -using a Jupyter notebook. In this case, cells containing apps are often re-executed -during development. Using app caching will ensure that only modified apps are re-executed. +App caching can be globally disabled by setting ``app_cache=False`` in the :class:`~parsl.config.Config`. +App caching can be particularly useful when developing interactive programs such as when using a +Jupyter notebook. In this case, cells containing apps are often re-executed during development. +Using app caching will ensure that only modified apps are re-executed. -App equivalence + +App equivalence ^^^^^^^^^^^^^^^ -Parsl determines app equivalence using the name of the app function: -if two apps have the same name, then they are equivalent under this -relation. +Parsl determines app equivalence using the name of the app function: if two apps have the same name, +then they are equivalent under this relation. 
-Changes inside the app, or by functions called by an app will not invalidate -cached values. +Changes inside the app, or by functions called by an app will not invalidate cached values. -There are lots of other ways functions might be compared for equivalence, -and `parsl.dataflow.memoization.id_for_memo` provides a hook to plug in -alternate application-specific implementations. +There are lots of other ways functions might be compared for equivalence, and +`parsl.dataflow.memoization.id_for_memo` provides a hook to plug in alternate application-specific +implementations. -Invocation equivalence +Invocation equivalence ^^^^^^^^^^^^^^^^^^^^^^ -Two app invocations are determined to be equivalent if their -input arguments are identical. +Two app invocations are determined to be equivalent if their input arguments are identical. In simple cases, this follows obvious rules: @@ -80,31 +73,28 @@ In simple cases, this follows obvious rules: f(y).result() -Internally, equivalence is determined by hashing the input arguments, and -comparing the hash to hashes from previous app executions. +Internally, equivalence is determined by hashing the input arguments, and comparing the hash to +hashes from previous app executions. + +This approach can only be applied to data types for which a deterministic hash can be computed. -This approach can only be applied to data types for which a deterministic hash -can be computed. +By default Parsl can compute sensible hashes for basic data types: str, int, float, None, as well as +some more complex types: functions, and dictionaries and lists containing hashable types. -By default Parsl can compute sensible hashes for basic data types: -str, int, float, None, as well as more some complex types: -functions, and dictionaries and lists containing hashable types. +Attempting to cache apps invoked with other, non-hashable, data types will lead to an exception at +invocation.
-Attempting to cache apps invoked with other, non-hashable, data types will -lead to an exception at invocation. +In that case, mechanisms to hash new types can be registered by a program by implementing the +`parsl.dataflow.memoization.id_for_memo` function for the new type. -In that case, mechanisms to hash new types can be registered by a program by -implementing the `parsl.dataflow.memoization.id_for_memo` function for -the new type. Ignoring arguments ^^^^^^^^^^^^^^^^^^ -On occasion one may wish to ignore particular arguments when determining -app invocation equivalence - for example, when generating log file -names automatically based on time or run information. -Parsl allows developers to list the arguments to be ignored -in the ``ignore_for_cache`` app decorator parameter: +On occasion one may wish to ignore particular arguments when determining app invocation equivalence - +for example, when generating log file names automatically based on time or run information. Parsl +allows developers to list the arguments to be ignored in the ``ignore_for_cache`` app decorator +parameter: .. code-block:: python @@ -118,59 +108,52 @@ Caveats It is important to consider several important issues when using app caching: -- Determinism: App caching is generally useful only when the apps are deterministic. - If the outputs may be different for identical inputs, app caching will obscure - this non-deterministic behavior. For instance, caching an app that returns - a random number will result in every invocation returning the same result. +- Determinism: App caching is generally useful only when the apps are deterministic. If the outputs + may be different for identical inputs, app caching will obscure this non-deterministic behavior. + For instance, caching an app that returns a random number will result in every invocation + returning the same result. 
-- Timing: If several identical calls to an app are made concurrently having - not yet cached a result, many instances of the app will be launched. - Once one invocation completes and the result is cached +- Timing: If several identical calls to an app are made concurrently having not yet cached a result, + many instances of the app will be launched. Once one invocation completes and the result is cached all subsequent calls will return immediately with the cached result. -- Performance: If app caching is enabled, there may be some performance - overhead especially if a large number of short duration tasks are launched rapidly. - This overhead has not been quantified. - +- Performance: If app caching is enabled, there may be some performance overhead especially if a + large number of short duration tasks are launched rapidly. This overhead has not been quantified. + + .. _label-checkpointing: Checkpointing ============= -Large-scale Parsl programs are likely to encounter errors due to node failures, -application or environment errors, and myriad other issues. Parsl offers an -application-level checkpointing model to improve resilience, fault tolerance, and -efficiency. +Large-scale Parsl programs are likely to encounter errors due to node failures, application or +environment errors, and myriad other issues. Parsl offers an application-level checkpointing model +to improve resilience, fault tolerance, and efficiency. .. note:: - Checkpointing builds on top of app caching, and so app caching must be - enabled. If app caching is disabled in the config ``Config.app_cache``, checkpointing will - not work. + Checkpointing builds on top of app caching, and so app caching must be enabled. If app caching is +disabled in the config ``Config.app_cache``, checkpointing will not work. -Parsl follows an incremental checkpointing model, where each checkpoint file contains -all results that have been updated since the last checkpoint. 
+Parsl follows an incremental checkpointing model, where each checkpoint file contains all results +that have been updated since the last checkpoint. -When a Parsl program loads a checkpoint file and is executed, it will use -checkpointed results for any apps that have been previously executed. -Like app caching, checkpoints -use the hash of the app and the invocation input parameters to identify previously computed -results. If multiple checkpoints exist for an app (with the same hash) -the most recent entry will be used. +When a Parsl program loads a checkpoint file and is executed, it will use checkpointed results for +any apps that have been previously executed. Like app caching, checkpoints use the hash of the app +and the invocation input parameters to identify previously computed results. If multiple checkpoints +exist for an app (with the same hash) the most recent entry will be used. Parsl provides four checkpointing modes: -1. ``task_exit``: a checkpoint is created each time an app completes or fails - (after retries if enabled). This mode minimizes the risk of losing information - from completed tasks. +1. ``task_exit``: a checkpoint is created each time an app completes or fails (after retries if + enabled). This mode minimizes the risk of losing information from completed tasks. .. code-block:: python from parsl.configs.local_threads import config config.checkpoint_mode = 'task_exit' -2. ``periodic``: a checkpoint is created periodically using a user-specified - checkpointing interval. Results will be saved to the checkpoint file for - all tasks that have completed during this period. +2. ``periodic``: a checkpoint is created periodically using a user-specified checkpointing interval. + Results will be saved to the checkpoint file for all tasks that have completed during this period. .. code-block:: python @@ -178,20 +161,18 @@ Parsl provides four checkpointing modes: config.checkpoint_mode = 'periodic' config.checkpoint_period = "01:00:00" -3. 
``dfk_exit``: checkpoints are created when Parsl is - about to exit. This reduces the risk of losing results due to - premature program termination from exceptions, terminate signals, etc. However - it is still possible that information might be lost if the program is - terminated abruptly (machine failure, SIGKILL, etc.) +3. ``dfk_exit``: checkpoints are created when Parsl is about to exit. This reduces the risk of + losing results due to premature program termination from exceptions, terminate signals, etc. + However, it is still possible that information might be lost if the program is terminated + abruptly (machine failure, SIGKILL, etc.) .. code-block:: python from parsl.configs.local_threads import config config.checkpoint_mode = 'dfk_exit' -4. ``manual``: in addition to these automated checkpointing modes, it is also possible - to manually initiate a checkpoint by calling ``DataFlowKernel.checkpoint()`` in the - Parsl program code. +4. ``manual``: in addition to these automated checkpointing modes, it is also possible to manually + initiate a checkpoint by calling ``DataFlowKernel.checkpoint()`` in the Parsl program code. .. code-block:: python @@ -203,17 +184,18 @@ Parsl provides four checkpointing modes: In all cases the checkpoint file is written out to the ``runinfo/RUN_ID/checkpoint/`` directory. -.. Note:: Checkpoint modes ``periodic``, ``dfk_exit``, and ``manual`` can interfere with garbage collection. - In these modes task information will be retained after completion, until checkpointing events are triggered. +.. note:: + Checkpoint modes ``periodic``, ``dfk_exit``, and ``manual`` can interfere with garbage collection. + In these modes task information will be retained after completion, until checkpointing events are triggered. Creating a checkpoint ^^^^^^^^^^^^^^^^^^^^^ -Automated checkpointing must be explicitly enabled in the Parsl configuration. -There is no need to modify a Parsl program as checkpointing will occur transparently. 
-In the following example, checkpointing is enabled at task exit. The results of -each invocation of the ``slow_double`` app will be stored in the checkpoint file. +Automated checkpointing must be explicitly enabled in the Parsl configuration. There is no need to +modify a Parsl program as checkpointing will occur transparently. In the following example, +checkpointing is enabled at task exit. The results of each invocation of the ``slow_double`` app +will be stored in the checkpoint file. .. code-block:: python @@ -237,10 +219,10 @@ each invocation of the ``slow_double`` app will be stored in the checkpoint file print([d[i].result() for i in range(5)]) -Alternatively, manual checkpointing can be used to explictly specify when the checkpoint -file should be saved. The following example shows how manual checkpointing can be used. -Here, the ``dfk.checkpoint()`` function will save the results of the prior invocations -of the ``slow_double`` app. +Alternatively, manual checkpointing can be used to explicitly specify when the checkpoint file should +be saved. The following example shows how manual checkpointing can be used. Here, the +``dfk.checkpoint()`` function will save the results of the prior invocations of the ``slow_double`` +app. .. code-block:: python @@ -271,14 +253,12 @@ of the ``slow_double`` app. Resuming from a checkpoint ^^^^^^^^^^^^^^^^^^^^^^^^^^ -When resuming a program from a checkpoint Parsl allows the user to select -which checkpoint file(s) to use. -Checkpoint files are stored in the ``runinfo/RUNID/checkpoint`` directory. +When resuming a program from a checkpoint Parsl allows the user to select which checkpoint file(s) +to use. Checkpoint files are stored in the ``runinfo/RUNID/checkpoint`` directory. -The example below shows how to resume using all available checkpoints. 
-Here, the program re-executes the same calls to the ``slow_double`` app -as above and instead of waiting for results to be computed, the values -from the checkpoint file are are immediately returned. +The example below shows how to resume using all available checkpoints. Here, the program re-executes +the same calls to the ``slow_double`` app as above and instead of waiting for results to be computed, +the values from the checkpoint file are immediately returned. .. code-block:: python @@ -289,8 +269,8 @@ from the checkpoint file are are immediately returned. config.checkpoint_files = get_all_checkpoints() parsl.load(config) - - # Rerun the same workflow + + # Rerun the same workflow d = [] for i in range(5): d.append(slow_double(i)) diff --git a/docs/userguide/configuring.rst b/docs/userguide/configuring.rst index 88d4456a26..7eb7345c17 100644 --- a/docs/userguide/configuring.rst +++ b/docs/userguide/configuring.rst @@ -3,22 +3,19 @@ Configuration ============= -Parsl separates program logic from execution configuration, enabling -programs to be developed entirely independently from their execution -environment. Configuration is described by a Python object (:class:`~parsl.config.Config`) -so that developers can -introspect permissible options, validate settings, and retrieve/edit -configurations dynamically during execution. A configuration object specifies -details of the provider, executors, allocation size, -queues, durations, and data management options. - -The following example shows a basic configuration object (:class:`~parsl.config.Config`) for the Frontera -supercomputer at TACC. -This config uses the `parsl.executors.HighThroughputExecutor` to submit -tasks from a login node. It requests an allocation of -128 nodes, deploying 1 worker for each of the 56 cores per node, from the normal partition.
-To limit network connections to just the internal network the config specifies the address -used by the infiniband interface with ``address_by_interface('ib0')`` +Parsl separates program logic from execution configuration, enabling programs to be developed +entirely independently from their execution environment. Configuration is described by a Python +object (:class:`~parsl.config.Config`) so that developers can introspect permissible options, +validate settings, and retrieve/edit configurations dynamically during execution. A configuration +object specifies details of the provider, executors, allocation size, queues, durations, and data +management options. + +The following example shows a basic configuration object (:class:`~parsl.config.Config`) for the +Frontera supercomputer at TACC. This config uses the `parsl.executors.HighThroughputExecutor` to +submit tasks from a login node. It requests an allocation of 128 nodes, deploying 1 worker for each +of the 56 cores per node, from the normal partition. To limit network connections to just the +internal network the config specifies the address used by the infiniband interface with +``address_by_interface('ib0')`` .. code-block:: python @@ -37,7 +34,7 @@ used by the infiniband interface with ``address_by_interface('ib0')`` provider=SlurmProvider( nodes_per_block=128, init_blocks=1, - partition='normal', + partition='normal', launcher=SrunLauncher(), ), ) @@ -50,8 +47,9 @@ used by the infiniband interface with ``address_by_interface('ib0')`` Creating and Using Config Objects --------------------------------- -:class:`~parsl.config.Config` objects are loaded to define the "Data Flow Kernel" (DFK) that will manage tasks. -All Parsl applications start by creating or importing a configuration then calling the load function. +:class:`~parsl.config.Config` objects are loaded to define the "Data Flow Kernel" (DFK) that will +manage tasks. 
All Parsl applications start by creating or importing a configuration, then calling the +load function. .. code-block:: python @@ -61,11 +59,11 @@ All Parsl applications start by creating or importing a configuration then calli with parsl.load(config): The ``load`` statement can happen after Apps are defined but must occur before tasks are started. -Loading the Config object within context manager like ``with`` is recommended -for implicit cleaning of DFK on exiting the context manager +Loading the Config object within a context manager (a ``with`` block) is recommended so that the DFK +is cleaned up implicitly when the context manager exits. -The :class:`~parsl.config.Config` object may not be used again after loaded. -Consider a configuration function if the application will shut down and re-launch the DFK. +The :class:`~parsl.config.Config` object may not be reused after it has been loaded. Consider a +configuration function if the application will shut down and re-launch the DFK. .. code-block:: python @@ -86,19 +84,17 @@ How to Configure ---------------- .. note:: - All configuration examples below must be customized for the user's - allocation, Python environment, file system, etc. + All configuration examples below must be customized for the user's allocation, Python environment, + file system, etc. -The configuration specifies what, and how, resources are to be used for executing -the Parsl program and its apps. -It is important to carefully consider the needs of the Parsl program and its apps, -and the characteristics of the compute resources, to determine an ideal configuration. -Aspects to consider include: -1) where the Parsl apps will execute; -2) how many nodes will be used to execute the apps, and how long the apps will run; -3) should Parsl request multiple nodes in an individual scheduler job; and -4) where will the main Parsl program run and how will it communicate with the apps. 
+The configuration specifies what, and how, resources are to be used for executing the Parsl program +and its apps. It is important to carefully consider the needs of the Parsl program and its apps, and +the characteristics of the compute resources, to determine an ideal configuration. Aspects to +consider include: 1) where the Parsl apps will execute; 2) how many nodes will be used to execute +the apps, and how long the apps will run; 3) should Parsl request multiple nodes in an individual +scheduler job; and 4) where will the main Parsl program run and how will it communicate with the +apps. Stepping through the following question should help formulate a suitable configuration object. @@ -134,7 +130,8 @@ Stepping through the following question should help formulate a suitable configu +---------------------+-----------------------------------------------+----------------------------------------+ -2. How many nodes will be used to execute the apps? What task durations are necessary to achieve good performance? +2. How many nodes will be used to execute the apps? What task durations are necessary to achieve + good performance? +--------------------------------------------+----------------------+-------------------------------------+ @@ -151,17 +148,17 @@ Stepping through the following question should help formulate a suitable configu +--------------------------------------------+----------------------+-------------------------------------+ -.. [*] Assuming 32 workers per node. If there are fewer workers launched - per node, a larger number of nodes could be supported. +.. [*] Assuming 32 workers per node. If there are fewer workers launched per node, a larger number + of nodes could be supported. -.. [*] The maximum number of nodes tested for the `parsl.executors.WorkQueueExecutor` is 10,000 GPU cores and - 20,000 CPU cores. +.. [*] The maximum number of nodes tested for the `parsl.executors.WorkQueueExecutor` is 10,000 GPU + cores and 20,000 CPU cores. -.. 
[*] The maximum number of nodes tested for the `parsl.executors.taskvine.TaskVineExecutor` is - 10,000 GPU cores and 20,000 CPU cores. +.. [*] The maximum number of nodes tested for the `parsl.executors.taskvine.TaskVineExecutor` is + 10,000 GPU cores and 20,000 CPU cores. -3. Should Parsl request multiple nodes in an individual scheduler job? -(Here the term block is equivalent to a single scheduler job.) +3. Should Parsl request multiple nodes in an individual scheduler job? (Here the term block is + equivalent to a single scheduler job.) +--------------------------------------------------------------------------------------------+ | ``nodes_per_block = 1`` | @@ -186,29 +183,32 @@ Stepping through the following question should help formulate a suitable configu | | | * `parsl.launchers.AprunLauncher`, otherwise | +-------------------------------------+--------------------------+----------------------------------------------------+ -.. note:: If using a Cray system, you most likely need to use the `parsl.launchers.AprunLauncher` to launch workers unless you - are on a **native Slurm** system like :ref:`configuring_nersc_cori` +.. note:: + If using a Cray system, you most likely need to use the `parsl.launchers.AprunLauncher` to launch +workers unless you are on a **native Slurm** system like :ref:`configuring_nersc_cori` Heterogeneous Resources ----------------------- -In some cases, it can be difficult to specify the resource requirements for running a workflow. -For example, if the compute nodes a site provides are not uniform, there is no "correct" resource configuration; -the amount of parallelism depends on which node (large or small) each job runs on. -In addition, the software and filesystem setup can vary from node to node. -A Condor cluster may not provide shared filesystem access at all, -and may include nodes with a variety of Python versions and available libraries. 
- -The `parsl.executors.WorkQueueExecutor` provides several features to work with heterogeneous resources. -By default, Parsl only runs one app at a time on each worker node. -However, it is possible to specify the requirements for a particular app, -and Work Queue will automatically run as many parallel instances as possible on each node. -Work Queue automatically detects the amount of cores, memory, and other resources available on each execution node. -To activate this feature, add a resource specification to your apps. A resource specification is a dictionary with -the following three keys: ``cores`` (an integer corresponding to the number of cores required by the task), -``memory`` (an integer corresponding to the task's memory requirement in MB), and ``disk`` (an integer corresponding to -the task's disk requirement in MB), passed to an app via the special keyword argument ``parsl_resource_specification``. The specification can be set for all app invocations via a default, for example: +In some cases, it can be difficult to specify the resource requirements for running a workflow. For +example, if the compute nodes a site provides are not uniform, there is no "correct" resource +configuration; the amount of parallelism depends on which node (large or small) each job runs on. In +addition, the software and filesystem setup can vary from node to node. A Condor cluster may not +provide shared filesystem access at all, and may include nodes with a variety of Python versions and +available libraries. + +The `parsl.executors.WorkQueueExecutor` provides several features to work with heterogeneous +resources. By default, Parsl only runs one app at a time on each worker node. However, it is +possible to specify the requirements for a particular app, and Work Queue will automatically run as +many parallel instances as possible on each node. Work Queue automatically detects the amount of +cores, memory, and other resources available on each execution node. 
To activate this feature, add a +resource specification to your apps. A resource specification is a dictionary with the following +three keys: ``cores`` (an integer corresponding to the number of cores required by the task), +``memory`` (an integer corresponding to the task's memory requirement in MB), and ``disk`` (an +integer corresponding to the task's disk requirement in MB), passed to an app via the special +keyword argument ``parsl_resource_specification``. The specification can be set for all app +invocations via a default, for example: .. code-block:: python @@ -224,14 +224,15 @@ or updated when the app is invoked: spec = {'cores': 1, 'memory': 500, 'disk': 500} future = compute(x, parsl_resource_specification=spec) -This ``parsl_resource_specification`` special keyword argument will inform Work Queue about the resources this app requires. -When placing instances of ``compute(x)``, Work Queue will run as many parallel instances as possible based on each worker node's available resources. +This ``parsl_resource_specification`` special keyword argument will inform Work Queue about the +resources this app requires. When placing instances of ``compute(x)``, Work Queue will run as many +parallel instances as possible based on each worker node's available resources. -If an app's resource requirements are not known in advance, -Work Queue has an auto-labeling feature that measures the actual resource usage of your apps and automatically chooses resource labels for you. -With auto-labeling, it is not necessary to provide ``parsl_resource_specification``; -Work Queue collects stats in the background and updates resource labels as your workflow runs. -To activate this feature, add the following flags to your executor config: +If an app's resource requirements are not known in advance, Work Queue has an auto-labeling feature +that measures the actual resource usage of your apps and automatically chooses resource labels for +you. 
With auto-labeling, it is not necessary to provide ``parsl_resource_specification``; Work Queue +collects stats in the background and updates resource labels as your workflow runs. To activate this +feature, add the following flags to your executor config: .. code-block:: python @@ -245,26 +246,24 @@ To activate this feature, add the following flags to your executor config: ] ) -The ``autolabel`` flag tells Work Queue to automatically generate resource labels. -By default, these labels are shared across all apps in your workflow. -The ``autocategory`` flag puts each app into a different category, -so that Work Queue will choose separate resource requirements for each app. -This is important if e.g. some of your apps use a single core and some apps require multiple cores. -Unless you know that all apps have uniform resource requirements, -you should turn on ``autocategory`` when using ``autolabel``. - -The Work Queue executor can also help deal with sites that have non-uniform software environments across nodes. -Parsl assumes that the Parsl program and the compute nodes all use the same Python version. -In addition, any packages your apps import must be available on compute nodes. -If no shared filesystem is available or if node configuration varies, -this can lead to difficult-to-trace execution problems. - -If your Parsl program is running in a Conda environment, -the Work Queue executor can automatically scan the imports in your apps, -create a self-contained software package, -transfer the software package to worker nodes, -and run your code inside the packaged and uniform environment. -First, make sure that the Conda environment is active and you have the required packages installed (via either ``pip`` or ``conda``): +The ``autolabel`` flag tells Work Queue to automatically generate resource labels. By default, these +labels are shared across all apps in your workflow. 
The ``autocategory`` flag puts each app into a +different category, so that Work Queue will choose separate resource requirements for each app. This +is important if e.g. some of your apps use a single core and some apps require multiple cores. +Unless you know that all apps have uniform resource requirements, you should turn on ``autocategory`` +when using ``autolabel``. + +The Work Queue executor can also help deal with sites that have non-uniform software environments +across nodes. Parsl assumes that the Parsl program and the compute nodes all use the same Python +version. In addition, any packages your apps import must be available on compute nodes. If no shared +filesystem is available or if node configuration varies, this can lead to difficult-to-trace +execution problems. + +If your Parsl program is running in a Conda environment, the Work Queue executor can automatically +scan the imports in your apps, create a self-contained software package, transfer the software +package to worker nodes, and run your code inside the packaged and uniform environment. First, make +sure that the Conda environment is active and you have the required packages installed (via either +``pip`` or ``conda``): - ``python`` - ``parsl`` @@ -285,29 +284,31 @@ Then add the following to your config: ) .. note:: - There will be a noticeable delay the first time Work Queue sees an app; - it is creating and packaging a complete Python environment. - This packaged environment is cached, so subsequent app invocations should be much faster. + There will be a noticeable delay the first time Work Queue sees an app; it is creating and +packaging a complete Python environment. This packaged environment is cached, so subsequent app +invocations should be much faster. -Using this approach, it is possible to run Parsl applications on nodes that don't have Python available at all. -The packaged environment includes a Python interpreter, -and Work Queue does not require Python to run. 
+Using this approach, it is possible to run Parsl applications on nodes that don't have Python +available at all. The packaged environment includes a Python interpreter, and Work Queue does not +require Python to run. .. note:: The automatic packaging feature only supports packages installed via ``pip`` or ``conda``. - Importing from other locations (e.g. via ``$PYTHONPATH``) or importing other modules in the same directory is not supported. + Importing from other locations (e.g. via ``$PYTHONPATH``) or importing other modules in the same + directory is not supported. Accelerators ------------ -Many modern clusters provide multiple accelerators per compute note, yet many applications are best suited to using a -single accelerator per task. Parsl supports pinning each worker to different accelerators using -``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`. Provide either the number of -executors (Parsl will assume they are named in integers starting from zero) or a list of the names of the accelerators -available on the node. Parsl will limit the number of workers it launches to the number of accelerators specified, -in other words, you cannot have more workers per node than there are accelerators. By default, Parsl will launch -as many workers as the accelerators specified via ``available_accelerators``. +Many modern clusters provide multiple accelerators per compute node, yet many applications are best +suited to using a single accelerator per task. Parsl supports pinning each worker to different +accelerators using the ``available_accelerators`` option of the :class:`~parsl.executors.HighThroughputExecutor`. +Provide either the number of accelerators (Parsl will assume they are named with integers starting from +zero) or a list of the names of the accelerators available on the node. 
Parsl will limit the number +of workers it launches to the number of accelerators specified; in other words, you cannot have more +workers per node than there are accelerators. By default, Parsl will launch as many workers as the +accelerators specified via ``available_accelerators``. .. code-block:: python @@ -326,9 +327,10 @@ as many workers as the accelerators specified via ``available_accelerators``. strategy='none', ) -It is possible to bind multiple/specific accelerators to each worker by specifying a list of comma separated strings -each specifying accelerators. In the context of binding to NVIDIA GPUs, this works by setting ``CUDA_VISIBLE_DEVICES`` -on each worker to a specific string in the list supplied to ``available_accelerators``. +It is possible to bind multiple/specific accelerators to each worker by specifying a list of +comma-separated strings each specifying accelerators. In the context of binding to NVIDIA GPUs, this works +by setting ``CUDA_VISIBLE_DEVICES`` on each worker to a specific string in the list supplied to +``available_accelerators``. Here's an example: @@ -347,32 +349,38 @@ Here's an example: ], ) + GPU Oversubscription """""""""""""""""""" -For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUS. This is intended to -make use of Nvidia's `Multi-Process Service (MPS) `_ available on many of their -GPUs that allows users to run multiple concurrent processes on a single GPU. The user needs to set in the -``worker_init`` commands to start MPS on every node in the block (this is machine dependent). The -``available_accelerators`` option should then be set to the total number of GPU partitions run on a single node in the -block. For example, for a node with 4 Nvidia GPUs, to create 8 workers per GPU, set ``available_accelerators=32``. -GPUs will be assigned to workers in ascending order in contiguous blocks. 
In the example, workers 0-7 will be placed -on GPU 0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3. - +For hardware that uses Nvidia devices, Parsl allows for the oversubscription of workers to GPUs. +This is intended to make use of Nvidia's `Multi-Process Service (MPS) `_ +available on many of their GPUs that allows users to run multiple concurrent processes on a single +GPU. The user needs to include commands in ``worker_init`` that start MPS on every node in the block +(this is machine dependent). The ``available_accelerators`` option should then be set to the total +number of GPU partitions run on a single node in the block. For example, for a node with 4 Nvidia +GPUs, to create 8 workers per GPU, set ``available_accelerators=32``. GPUs will be assigned to +workers in ascending order in contiguous blocks. In the example, workers 0-7 will be placed on GPU +0, workers 8-15 on GPU 1, workers 16-23 on GPU 2, and workers 24-31 on GPU 3. + + Multi-Threaded Applications --------------------------- -Workflows which launch multiple workers on a single node which perform multi-threaded tasks (e.g., NumPy, Tensorflow operations) may run into thread contention issues. -Each worker may try to use the same hardware threads, which leads to performance penalties. -Use the ``cpu_affinity`` feature of the :class:`~parsl.executors.HighThroughputExecutor` to assign workers to specific threads. Users can pin threads to -workers either with a strategy method or an explicit list. 
-The strategy methods will auto assign all detected hardware threads to workers. -Allowed strategies that can be assigned to ``cpu_affinity`` are ``block``, ``block-reverse``, and ``alternating``. -The ``block`` method pins threads to workers in sequential order (ex: 4 threads are grouped (0, 1) and (2, 3) on two workers); -``block-reverse`` pins threads in reverse sequential order (ex: (3, 2) and (1, 0)); and ``alternating`` alternates threads among workers (ex: (0, 2) and (1, 3)). +The strategy methods will automatically assign all detected hardware threads to workers. Allowed strategies +that can be assigned to ``cpu_affinity`` are ``block``, ``block-reverse``, and ``alternating``. The +``block`` method pins threads to workers in sequential order (e.g., 4 threads are grouped (0, 1) and +(2, 3) on two workers); ``block-reverse`` pins threads in reverse sequential order (e.g., (3, 2) and +(1, 0)); and ``alternating`` alternates threads among workers (e.g., (0, 2) and (1, 3)). -Select the best blocking strategy for processor's cache hierarchy (choose ``alternating`` if in doubt) to ensure workers to not compete for cores. +Select the best blocking strategy for the processor's cache hierarchy (choose ``alternating`` if in +doubt) to ensure workers do not compete for cores. .. code-block:: python @@ -391,8 +399,8 @@ Select the best blocking strategy for processor's cache hierarchy (choose ``alte strategy='none', ) -Users can also use ``cpu_affinity`` to assign explicitly threads to workers with a string that has the format of -``cpu_affinity="list:::"``. +Users can also use ``cpu_affinity`` to explicitly assign threads to workers with a string that has +the format of ``cpu_affinity="list:::"``. 
Each worker's threads can be specified as a comma separated list or a hyphenated range: ``thread1,thread2,thread3`` @@ -405,30 +413,29 @@ An example for 12 workers on a node with 208 threads is: cpu_affinity="list:0-7,104-111:8-15,112-119:16-23,120-127:24-31,128-135:32-39,136-143:40-47,144-151:52-59,156-163:60-67,164-171:68-75,172-179:76-83,180-187:84-91,188-195:92-99,196-203" -This example assigns 16 threads each to 12 workers. Note that in this example there are threads that are skipped. -If a thread is not explicitly assigned to a worker, it will be left idle. -The number of thread "ranks" (colon separated thread lists/ranges) must match the total number of workers on the node; otherwise an exception will be raised. +This example assigns 16 threads each to 12 workers. Note that in this example there are threads that +are skipped. If a thread is not explicitly assigned to a worker, it will be left idle. The number of +thread "ranks" (colon-separated thread lists/ranges) must match the total number of workers on the +node; otherwise an exception will be raised. +Thread affinity is accomplished in two ways. Each worker first sets the affinity for the Python +process using `the affinity mask `_, +which may not be available on all operating systems. It then sets environment variables to control +`OpenMP thread affinity `_ so that +any subprocesses launched by a worker which use OpenMP know which processors are valid. These +include ``OMP_NUM_THREADS``, ``GOMP_CPU_AFFINITY``, and ``KMP_AFFINITY``. -Thread affinity is accomplished in two ways. -Each worker first sets the affinity for the Python process using `the affinity mask `_, -which may not be available on all operating systems. -It then sets environment variables to control -`OpenMP thread affinity `_ -so that any subprocesses launched by a worker which use OpenMP know which processors are valid. -These include ``OMP_NUM_THREADS``, ``GOMP_COMP_AFFINITY``, and ``KMP_THREAD_AFFINITY``. 
- Ad-Hoc Clusters --------------- -Parsl's support of ad-hoc clusters of compute nodes without a scheduler -is deprecated. +Parsl's support of ad-hoc clusters of compute nodes without a scheduler is deprecated. See `issue #3515 `_ for further discussion. + Amazon Web Services ------------------- @@ -437,10 +444,12 @@ Amazon Web Services .. note:: To use AWS with Parsl, install Parsl with AWS dependencies via ``python3 -m pip install 'parsl[aws]'`` -Amazon Web Services is a commercial cloud service which allows users to rent a range of computers and other computing services. -The following snippet shows how Parsl can be configured to provision nodes from the Elastic Compute Cloud (EC2) service. -The first time this configuration is used, Parsl will configure a Virtual Private Cloud and other networking and security infrastructure that will be -re-used in subsequent executions. The configuration uses the `parsl.providers.AWSProvider` to connect to AWS. +Amazon Web Services is a commercial cloud service which allows users to rent a range of computers +and other computing services. The following snippet shows how Parsl can be configured to provision +nodes from the Elastic Compute Cloud (EC2) service. The first time this configuration is used, Parsl +will configure a Virtual Private Cloud and other networking and security infrastructure that will be +re-used in subsequent executions. The configuration uses the `parsl.providers.AWSProvider` to +connect to AWS. .. literalinclude:: ../../parsl/configs/ec2.py @@ -450,45 +459,48 @@ ASPIRE 1 (NSCC) .. image:: https://www.nscc.sg/wp-content/uploads/2017/04/ASPIRE1Img.png -The following snippet shows an example configuration for accessing NSCC's **ASPIRE 1** supercomputer. This example uses the `parsl.executors.HighThroughputExecutor` executor and connects to ASPIRE1's PBSPro scheduler. It also shows how ``scheduler_options`` parameter could be used for scheduling array jobs in PBSPro. 
+The following snippet shows an example configuration for accessing NSCC's **ASPIRE 1** supercomputer. +This example uses the `parsl.executors.HighThroughputExecutor` executor and connects to ASPIRE1's +PBSPro scheduler. It also shows how ``scheduler_options`` parameter could be used for scheduling +array jobs in PBSPro. .. literalinclude:: ../../parsl/configs/ASPIRE1.py - - Illinois Campus Cluster (UIUC) ------------------------------ .. image:: https://campuscluster.illinois.edu/wp-content/uploads/2018/02/ND2_3633-sm.jpg The following snippet shows an example configuration for executing on the Illinois Campus Cluster. -The configuration assumes the user is running on a login node and uses the `parsl.providers.SlurmProvider` to interface -with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. +The configuration assumes the user is running on a login node and uses the `parsl.providers.SlurmProvider` +to interface with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. .. literalinclude:: ../../parsl/configs/illinoiscluster.py + Bridges (PSC) ------------- .. image:: https://insidehpc.com/wp-content/uploads/2016/08/Bridges_FB1b.jpg -The following snippet shows an example configuration for executing on the Bridges supercomputer at the Pittsburgh Supercomputing Center. -The configuration assumes the user is running on a login node and uses the `parsl.providers.SlurmProvider` to interface -with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. +The following snippet shows an example configuration for executing on the Bridges supercomputer at +the Pittsburgh Supercomputing Center. The configuration assumes the user is running on a login node +and uses the `parsl.providers.SlurmProvider` to interface with the scheduler, and uses the +`parsl.launchers.SrunLauncher` to launch workers. .. literalinclude:: ../../parsl/configs/bridges.py - CC-IN2P3 -------- .. 
image:: https://cc.in2p3.fr/wp-content/uploads/2017/03/bandeau_accueil.jpg -The snippet below shows an example configuration for executing from a login node on IN2P3's Computing Centre. -The configuration uses the `parsl.providers.LocalProvider` to run on a login node primarily to avoid GSISSH, which Parsl does not support. -This system uses Grid Engine which Parsl interfaces with using the `parsl.providers.GridEngineProvider`. +The snippet below shows an example configuration for executing from a login node on IN2P3's +Computing Centre. The configuration uses the `parsl.providers.LocalProvider` to run on a login node +primarily to avoid GSISSH, which Parsl does not support. This system uses Grid Engine which Parsl +interfaces with using the `parsl.providers.GridEngineProvider`. .. literalinclude:: ../../parsl/configs/cc_in2p3.py @@ -498,8 +510,9 @@ CCL (Notre Dame, TaskVine) .. image:: https://ccl.cse.nd.edu/software/taskvine/taskvine-logo.png -To utilize TaskVine with Parsl, please install the full CCTools software package within an appropriate Anaconda or Miniconda environment -(instructions for installing Miniconda can be found `in the Conda install guide `_): +To utilize TaskVine with Parsl, please install the full CCTools software package within an +appropriate Anaconda or Miniconda environment (instructions for installing Miniconda can be found +`in the Conda install guide `_): .. code-block:: bash @@ -507,26 +520,30 @@ To utilize TaskVine with Parsl, please install the full CCTools software package $ conda activate $ conda install -y -c conda-forge ndcctools parsl -This creates a Conda environment on your machine with all the necessary tools and setup needed to utilize TaskVine with the Parsl library. +This creates a Conda environment on your machine with all the necessary tools and setup needed to +utilize TaskVine with the Parsl library. 
-The following snippet shows an example configuration for using the Parsl/TaskVine executor to run applications on the local machine. -This examples uses the `parsl.executors.taskvine.TaskVineExecutor` to schedule tasks, and a local worker will be started automatically. -For more information on using TaskVine, including configurations for remote execution, visit the +The following snippet shows an example configuration for using the Parsl/TaskVine executor to run +applications on the local machine. This example uses the `parsl.executors.taskvine.TaskVineExecutor` +to schedule tasks, and a local worker will be started automatically. For more information on using +TaskVine, including configurations for remote execution, visit the `TaskVine/Parsl documentation online `_. .. literalinclude:: ../../parsl/configs/vineex_local.py -TaskVine's predecessor, WorkQueue, may continue to be used with Parsl. -For more information on using WorkQueue visit the `CCTools documentation online `_. +TaskVine's predecessor, WorkQueue, may continue to be used with Parsl. For more information on using +WorkQueue visit the `CCTools documentation online `_. + Expanse (SDSC) -------------- .. image:: https://www.hpcwire.com/wp-content/uploads/2019/07/SDSC-Expanse-graphic-cropped.jpg -The following snippet shows an example configuration for executing remotely on San Diego Supercomputer -Center's **Expanse** supercomputer. The example is designed to be executed on the login nodes, using the -`parsl.providers.SlurmProvider` to interface with the Slurm scheduler used by Comet and the `parsl.launchers.SrunLauncher` to launch workers. +The following snippet shows an example configuration for executing remotely on San Diego +Supercomputer Center's **Expanse** supercomputer. The example is designed to be executed on the +login nodes, using the `parsl.providers.SlurmProvider` to interface with the Slurm scheduler used by +Expanse and the `parsl.launchers.SrunLauncher` to launch workers. .. 
literalinclude:: ../../parsl/configs/expanse.py @@ -536,21 +553,22 @@ Improv (Argonne LCRC) .. image:: https://www.lcrc.anl.gov/sites/default/files/styles/965_wide/public/2023-12/20231214_114057.jpg?itok=A-Rz5pP9 -**Improv** is a PBS Pro based supercomputer at Argonne's Laboratory Computing Resource -Center (LCRC). The following snippet is an example configuration that uses `parsl.providers.PBSProProvider` -and `parsl.launchers.MpiRunLauncher` to run on multinode jobs. +**Improv** is a PBS Pro based supercomputer at Argonne's Laboratory Computing Resource Center (LCRC). +The following snippet is an example configuration that uses `parsl.providers.PBSProProvider` and +`parsl.launchers.MpiRunLauncher` to run on multinode jobs. .. literalinclude:: ../../parsl/configs/improv.py .. _configuring_nersc_cori: + Perlmutter (NERSC) ------------------ NERSC provides documentation on `how to use Parsl on Perlmutter `_. -Perlmutter is a Slurm based HPC system and parsl uses `parsl.providers.SlurmProvider` with `parsl.launchers.SrunLauncher` -to launch tasks onto this machine. +Perlmutter is a Slurm based HPC system and parsl uses `parsl.providers.SlurmProvider` with +`parsl.launchers.SrunLauncher` to launch tasks onto this machine. Frontera (TACC) @@ -558,9 +576,11 @@ Frontera (TACC) .. image:: https://frontera-portal.tacc.utexas.edu/media/filer_public/2c/fb/2cfbf6ab-818d-42c8-b4d5-9b39eb9d0a05/frontera-banner-home.jpg -Deployed in June 2019, Frontera is the 5th most powerful supercomputer in the world. Frontera replaces the NSF Blue Waters system at NCSA -and is the first deployment in the National Science Foundation's petascale computing program. The configuration below assumes that the user is -running on a login node and uses the `parsl.providers.SlurmProvider` to interface with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. +Deployed in June 2019, Frontera is the 5th most powerful supercomputer in the world. 
Frontera +replaces the NSF Blue Waters system at NCSA and is the first deployment in the National Science +Foundation's petascale computing program. The configuration below assumes that the user is running +on a login node and uses the `parsl.providers.SlurmProvider` to interface with the scheduler, and +uses the `parsl.launchers.SrunLauncher` to launch workers. .. literalinclude:: ../../parsl/configs/frontera.py @@ -570,9 +590,10 @@ Kubernetes Clusters .. image:: https://d1.awsstatic.com/PAC/kuberneteslogo.eabc6359f48c8e30b7a138c18177f3fd39338e05.png -Kubernetes is an open-source system for container management, such as automating deployment and scaling of containers. -The snippet below shows an example configuration for deploying pods as workers on a Kubernetes cluster. -The KubernetesProvider exploits the Python Kubernetes API, which assumes that you have kube config in ``~/.kube/config``. +Kubernetes is an open-source system for container management, such as automating deployment and +scaling of containers. The snippet below shows an example configuration for deploying pods as +workers on a Kubernetes cluster. The KubernetesProvider exploits the Python Kubernetes API, which +assumes that you have kube config in ``~/.kube/config``. .. literalinclude:: ../../parsl/configs/kubernetes.py @@ -582,10 +603,10 @@ Midway (RCC, UChicago) .. image:: https://rcc.uchicago.edu/sites/rcc.uchicago.edu/files/styles/slideshow-image/public/uploads/images/slideshows/20140430_RCC_8978.jpg?itok=BmRuJ-wq -This Midway cluster is a campus cluster hosted by the Research Computing Center at the University of Chicago. -The snippet below shows an example configuration for executing remotely on Midway. -The configuration assumes the user is running on a login node and uses the `parsl.providers.SlurmProvider` to interface -with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. 
+This Midway cluster is a campus cluster hosted by the Research Computing Center at the University +of Chicago. The snippet below shows an example configuration for executing remotely on Midway. The +configuration assumes the user is running on a login node and uses the `parsl.providers.SlurmProvider` +to interface with the scheduler, and uses the `parsl.launchers.SrunLauncher` to launch workers. .. literalinclude:: ../../parsl/configs/midway.py @@ -595,9 +616,10 @@ Open Science Grid .. image:: https://www.renci.org/wp-content/uploads/2008/10/osg_logo.png -The Open Science Grid (OSG) is a national, distributed computing Grid spanning over 100 individual sites to provide tens of thousands of CPU cores. -The snippet below shows an example configuration for executing remotely on OSG. You will need to have a valid project name on the OSG. -The configuration uses the `parsl.providers.CondorProvider` to interface with the scheduler. +The Open Science Grid (OSG) is a national, distributed computing Grid spanning over 100 individual +sites to provide tens of thousands of CPU cores. The snippet below shows an example configuration +for executing remotely on OSG. You will need to have a valid project name on the OSG. The +configuration uses the `parsl.providers.CondorProvider` to interface with the scheduler. .. literalinclude:: ../../parsl/configs/osg.py @@ -609,8 +631,8 @@ Polaris (ALCF) :width: 75% ALCF provides documentation on `how to use Parsl on Polaris `_. -Polaris uses `parsl.providers.PBSProProvider` and `parsl.launchers.MpiExecLauncher` to launch tasks onto the HPC system. - +Polaris uses `parsl.providers.PBSProProvider` and `parsl.launchers.MpiExecLauncher` to launch tasks +onto the HPC system. Stampede2 (TACC) @@ -618,7 +640,9 @@ Stampede2 (TACC) .. image:: https://www.tacc.utexas.edu/documents/1084364/1413880/stampede2-0717.jpg/ -The following snippet shows an example configuration for accessing TACC's **Stampede2** supercomputer. 
This example uses theHighThroughput executor and connects to Stampede2's Slurm scheduler. +The following snippet shows an example configuration for accessing TACC's **Stampede2** +supercomputer. This example uses the HighThroughput executor and connects to Stampede2's Slurm +scheduler. .. literalinclude:: ../../parsl/configs/stampede2.py @@ -628,8 +652,10 @@ Summit (ORNL) .. image:: https://www.olcf.ornl.gov/wp-content/uploads/2018/06/Summit_Exaop-1500x844.jpg -The following snippet shows an example configuration for executing from the login node on Summit, the leadership class supercomputer hosted at the Oak Ridge National Laboratory. -The example uses the `parsl.providers.LSFProvider` to provision compute nodes from the LSF cluster scheduler and the `parsl.launchers.JsrunLauncher` to launch workers across the compute nodes. +The following snippet shows an example configuration for executing from the login node on Summit, +the leadership class supercomputer hosted at the Oak Ridge National Laboratory. The example uses the +`parsl.providers.LSFProvider` to provision compute nodes from the LSF cluster scheduler and the +`parsl.launchers.JsrunLauncher` to launch workers across the compute nodes. .. literalinclude:: ../../parsl/configs/summit.py @@ -640,9 +666,10 @@ TOSS3 (LLNL) .. image:: https://hpc.llnl.gov/sites/default/files/Magma--2020-LLNL.jpg The following snippet shows an example configuration for executing on one of LLNL's **TOSS3** -machines, such as Quartz, Ruby, Topaz, Jade, or Magma. This example uses the `parsl.executors.FluxExecutor` -and connects to Slurm using the `parsl.providers.SlurmProvider`. This configuration assumes that the script -is being executed on the login nodes of one of the machines. +machines, such as Quartz, Ruby, Topaz, Jade, or Magma. This example uses the +`parsl.executors.FluxExecutor` and connects to Slurm using the `parsl.providers.SlurmProvider`.
+This configuration assumes that the script is being executed on the login nodes of one of the +machines. .. literalinclude:: ../../parsl/configs/toss3_llnl.py @@ -650,7 +677,10 @@ is being executed on the login nodes of one of the machines. Further help ------------ -For help constructing a configuration, you can click on class names such as :class:`~parsl.config.Config` or :class:`~parsl.executors.HighThroughputExecutor` to see the associated class documentation. The same documentation can be accessed interactively at the python command line via, for example: +For help constructing a configuration, you can click on class names such as +:class:`~parsl.config.Config` or :class:`~parsl.executors.HighThroughputExecutor` to see the +associated class documentation. The same documentation can be accessed interactively at the python +command line via, for example: .. code-block:: python diff --git a/docs/userguide/data.rst b/docs/userguide/data.rst index 9350a6d96f..c752e0f3ca 100644 --- a/docs/userguide/data.rst +++ b/docs/userguide/data.rst @@ -3,85 +3,76 @@ Passing Python objects ====================== -Parsl apps can communicate via standard Python function parameter passing -and return statements. The following example shows how a Python string -can be passed to, and returned from, a Parsl app. +Parsl apps can communicate via standard Python function parameter passing and return statements. The +following example shows how a Python string can be passed to, and returned from, a Parsl app. .. code-block:: python @python_app def example(name): return 'hello {0}'.format(name) - + r = example('bob') print(r.result()) -Parsl uses the dill and pickle libraries to serialize Python objects -into a sequence of bytes that can be passed over a network from the submitting -machine to executing workers. +Parsl uses the dill and pickle libraries to serialize Python objects into a sequence of bytes that +can be passed over a network from the submitting machine to executing workers. 
-Thus, Parsl apps can receive and return standard Python data types -such as booleans, integers, tuples, lists, and dictionaries. However, not -all objects can be serialized with these methods (e.g., closures, generators, -and system objects), and so those objects cannot be used with all executors. +Thus, Parsl apps can receive and return standard Python data types such as booleans, integers, +tuples, lists, and dictionaries. However, not all objects can be serialized with these methods +(e.g., closures, generators, and system objects), and so those objects cannot be used with all +executors. -Parsl will raise a `SerializationError` if it encounters an object that it cannot -serialize. This applies to objects passed as arguments to an app, as well as objects -returned from an app. See :ref:`label_serialization_error`. +Parsl will raise a `SerializationError` if it encounters an object that it cannot serialize. This +applies to objects passed as arguments to an app, as well as objects returned from an app. See +:ref:`label_serialization_error`. Staging data files ================== -Parsl apps can take and return data files. A file may be passed as an input -argument to an app, or returned from an app after execution. Parsl -provides support to automatically transfer (stage) files between -the main Parsl program, worker nodes, and external data storage systems. +Parsl apps can take and return data files. A file may be passed as an input argument to an app, or +returned from an app after execution. Parsl provides support to automatically transfer (stage) files +between the main Parsl program, worker nodes, and external data storage systems. -Input files can be passed as regular arguments, or a list of them may be -specified in the special ``inputs`` keyword argument to an app invocation. +Input files can be passed as regular arguments, or a list of them may be specified in the special +``inputs`` keyword argument to an app invocation. 
-Inside an app, the ``filepath`` attribute of a `File` can be read to determine -where on the execution-side file system the input file has been placed. +Inside an app, the ``filepath`` attribute of a `File` can be read to determine where on the +execution-side file system the input file has been placed. -Output `File` objects must also be passed in at app invocation, through the -outputs parameter. In this case, the `File` object specifies where Parsl -should place output after execution. +Output `File` objects must also be passed in at app invocation, through the outputs parameter. In +this case, the `File` object specifies where Parsl should place output after execution. -Inside an app, the ``filepath`` attribute of an output -`File` provides the path at which the corresponding output file should be -placed so that Parsl can find it after execution. +Inside an app, the ``filepath`` attribute of an output `File` provides the path at which the +corresponding output file should be placed so that Parsl can find it after execution. -If the output from an app is to be used as the input to a subsequent app, -then a `DataFuture` that represents whether the output file has been created -must be extracted from the first app's AppFuture, and that must be passed -to the second app. This causes app -executions to be properly ordered, in the same way that passing AppFutures -to subsequent apps causes execution ordering based on an app returning. +If the output from an app is to be used as the input to a subsequent app, then a `DataFuture` that +represents whether the output file has been created must be extracted from the first app's AppFuture, +and that must be passed to the second app. This causes app executions to be properly ordered, in the +same way that passing AppFutures to subsequent apps causes execution ordering based on an app +returning. 
-In a Parsl program, file handling is split into two pieces: files are named in an -execution-location independent manner using :py:class:`~parsl.data_provider.files.File` -objects, and executors are configured to stage those files in to and out of -execution locations using instances of the :py:class:`~parsl.data_provider.staging.Staging` -interface. +In a Parsl program, file handling is split into two pieces: files are named in an execution-location +independent manner using :py:class:`~parsl.data_provider.files.File` objects, and executors are +configured to stage those files in to and out of execution locations using instances of the +:py:class:`~parsl.data_provider.staging.Staging` interface. Parsl files ----------- -Parsl uses a custom :py:class:`~parsl.data_provider.files.File` to provide a -location-independent way of referencing and accessing files. -Parsl files are defined by specifying the URL *scheme* and a path to the file. -Thus a file may represent an absolute path on the submit-side file system -or a URL to an external file. +Parsl uses a custom :py:class:`~parsl.data_provider.files.File` to provide a location-independent +way of referencing and accessing files. Parsl files are defined by specifying the URL *scheme* and a +path to the file. Thus a file may represent an absolute path on the submit-side file system or a URL +to an external file. -The scheme defines the protocol via which the file may be accessed. -Parsl supports the following schemes: file, ftp, http, https, and globus. -If no scheme is specified Parsl will default to the file scheme. +The scheme defines the protocol via which the file may be accessed. Parsl supports the following +schemes: file, ftp, http, https, and globus. If no scheme is specified Parsl will default to the +file scheme. -The following example shows creation of two files with different -schemes: a locally-accessible data.txt file and an HTTPS-accessible -README file. 
+The following example shows creation of two files with different schemes: a locally-accessible +data.txt file and an HTTPS-accessible README file. .. code-block:: python @@ -89,10 +80,9 @@ README file. File('https://github.com/Parsl/parsl/blob/master/README.rst') -Parsl automatically translates the file's location relative to the -environment in which it is accessed (e.g., the Parsl program or an app). -The following example shows how a file can be accessed in the app -irrespective of where that app executes. +Parsl automatically translates the file's location relative to the environment in which it is +accessed (e.g., the Parsl program or an app). The following example shows how a file can be accessed +in the app irrespective of where that app executes. .. code-block:: python @@ -109,27 +99,23 @@ irrespective of where that app executes. r = print_file(inputs=[f]) r.result() -As described below, the method by which this files are transferred -depends on the scheme and the staging providers specified in the Parsl -configuration. +As described below, the method by which these files are transferred depends on the scheme and the +staging providers specified in the Parsl configuration. + Staging providers ----------------- -Parsl is able to transparently stage files between at-rest locations and -execution locations by specifying a list of -:py:class:`~parsl.data_provider.staging.Staging` instances for an executor. -These staging instances define how to transfer files in and out of an execution -location. This list should be supplied as the ``storage_access`` -parameter to an executor when it is constructed. - -Parsl includes several staging providers for moving files using the -schemes defined above. By default, Parsl executors are created with -three common staging providers: -the NoOpFileStaging provider for local and shared file systems -and the HTTP(S) and FTP staging providers for transferring -files to and from remote storage locations.
The following -example shows how to explicitly set the default staging providers. +Parsl is able to transparently stage files between at-rest locations and execution locations by +specifying a list of :py:class:`~parsl.data_provider.staging.Staging` instances for an executor. +These staging instances define how to transfer files in and out of an execution location. This list +should be supplied as the ``storage_access`` parameter to an executor when it is constructed. + +Parsl includes several staging providers for moving files using the schemes defined above. By +default, Parsl executors are created with three common staging providers: the NoOpFileStaging +provider for local and shared file systems and the HTTP(S) and FTP staging providers for +transferring files to and from remote storage locations. The following example shows how to +explicitly set the default staging providers. .. code-block:: python @@ -146,30 +132,24 @@ example shows how to explicitly set the default staging providers. ) ] ) - - -Parsl further differentiates when staging occurs relative to -the app invocation that requires or produces files. -Staging either occurs with the executing task (*in-task staging*) -or as a separate task (*separate task staging*) before app execution. -In-task staging -uses a wrapper that is executed around the Parsl task and thus -occurs on the resource on which the task is executed. Separate -task staging inserts a new Parsl task in the graph and associates -a dependency between the staging task and the task that depends -on that file. Separate task staging may occur on either the submit-side + + +Parsl further differentiates when staging occurs relative to the app invocation that requires or +produces files. Staging either occurs with the executing task (*in-task staging*) or as a separate +task (*separate task staging*) before app execution. 
In-task staging uses a wrapper that is executed +around the Parsl task and thus occurs on the resource on which the task is executed. Separate task +staging inserts a new Parsl task in the graph and associates a dependency between the staging task +and the task that depends on that file. Separate task staging may occur on either the submit-side (e.g., when using Globus) or on the execution-side (e.g., HTTPS, FTP). NoOpFileStaging for Local/Shared File Systems ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The NoOpFileStaging provider assumes that files specified either -with a path or with the ``file`` URL scheme are available both -on the submit and execution side. This occurs, for example, when there is a -shared file system. In this case, files will not moved, and the -File object simply presents the same file path to the Parsl program -and any executing tasks. +The NoOpFileStaging provider assumes that files specified either with a path or with the ``file`` +URL scheme are available both on the submit and execution side. This occurs, for example, when there +is a shared file system. In this case, files will not be moved, and the File object simply presents the +same file path to the Parsl program and any executing tasks. Files defined as follows will be handled by the NoOpFileStaging provider. @@ -179,9 +159,8 @@ Files defined as follows will be handled by the NoOpFileStaging provider. File('/home/parsl/data.txt') -The NoOpFileStaging provider is enabled by default on all -executors. It can be explicitly set as the only -staging provider as follows. +The NoOpFileStaging provider is enabled by default on all executors. It can be explicitly set as the +only staging provider as follows. .. code-block:: python @@ -201,20 +180,20 @@ staging provider as follows. FTP, HTTP, HTTPS: separate task staging ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Files named with the ``ftp``, ``http`` or ``https`` URL scheme will be -staged in using HTTP GET or anonymous FTP commands.
These commands -will be executed as a separate -Parsl task that will complete before the corresponding app -executes. These providers cannot be used to stage out output files. +Files named with the ``ftp``, ``http`` or ``https`` URL scheme will be staged in using HTTP GET or +anonymous FTP commands. These commands will be executed as a separate Parsl task that will complete +before the corresponding app executes. These providers cannot be used to stage out output files. -The following example defines a file accessible on a remote FTP server. +The following example defines a file accessible on a remote FTP server. .. code-block:: python File('ftp://www.iana.org/pub/mirror/rirstats/arin/ARIN-STATS-FORMAT-CHANGE.txt') -When such a file object is passed as an input to an app, Parsl will download the file to whatever location is selected for the app to execute. -The following example illustrates how the remote file is implicitly downloaded from an FTP server and then converted. Note that the app does not need to know the location of the downloaded file on the remote computer, as Parsl abstracts this translation. +When such a file object is passed as an input to an app, Parsl will download the file to whatever +location is selected for the app to execute. The following example illustrates how the remote file +is implicitly downloaded from an FTP server and then converted. Note that the app does not need to +know the location of the downloaded file on the remote computer, as Parsl abstracts this translation. .. code-block:: python @@ -234,8 +213,8 @@ The following example illustrates how the remote file is implicitly downloaded f # call the convert app with the Parsl file f = convert(inputs=[inp], outputs=[out]) f.result() - -HTTP and FTP separate task staging providers can be configured as follows. + +HTTP and FTP separate task staging providers can be configured as follows. .. 
code-block:: python @@ -243,8 +222,8 @@ HTTP and FTP separate task staging providers can be configured as follows. from parsl.executors import HighThroughputExecutor from parsl.data_provider.http import HTTPSeparateTaskStaging from parsl.data_provider.ftp import FTPSeparateTaskStaging - - config = Config( + + config = Config( executors=[ HighThroughputExecutor( storage_access=[HTTPSeparateTaskStaging(), FTPSeparateTaskStaging()] @@ -252,21 +231,21 @@ HTTP and FTP separate task staging providers can be configured as follows. ] ) + FTP, HTTP, HTTPS: in-task staging ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -These staging providers are intended for use on executors that do not have -a file system shared between each executor node. +These staging providers are intended for use on executors that do not have a file system shared +between each executor node. -These providers will use the same HTTP GET/anonymous FTP as the separate -task staging providers described above, but will do so in a wrapper around -individual app invocations, which guarantees that they will stage files to -a file system visible to the app. +These providers will use the same HTTP GET/anonymous FTP as the separate task staging providers +described above, but will do so in a wrapper around individual app invocations, which guarantees +that they will stage files to a file system visible to the app. -A downside of this staging approach is that the staging tasks are less visible -to Parsl, as they are not performed as separate Parsl tasks. +A downside of this staging approach is that the staging tasks are less visible to Parsl, as they +are not performed as separate Parsl tasks. -In-task staging providers can be configured as follows. +In-task staging providers can be configured as follows. .. code-block:: python @@ -287,9 +266,8 @@ In-task staging providers can be configured as follows. Globus ^^^^^^ -The ``Globus`` staging provider is used to transfer files that can be accessed -using Globus. 
A guide to using Globus is available `here -`_). +The ``Globus`` staging provider is used to transfer files that can be accessed using Globus. A guide +to using Globus is available `here `_. A file using the Globus scheme must specify the UUID of the Globus endpoint and a path to the file on the endpoint, for example: @@ -298,16 +276,19 @@ endpoint and a path to the file on the endpoint, for example: File('globus://037f054a-15cf-11e8-b611-0ac6873fc732/unsorted.txt') -Note: a Globus endpoint's UUID can be found in the Globus `Manage Endpoints `_ page. +Note: a Globus endpoint's UUID can be found in the Globus +`Manage Endpoints `_ page. + +There must also be a Globus endpoint available with access to an execute-side file system, because +Globus file transfers happen between two Globus endpoints. -There must also be a Globus endpoint available with access to a -execute-side file system, because Globus file transfers happen -between two Globus endpoints. Globus Configuration """""""""""""""""""" -In order to manage where files are staged, users must configure the default ``working_dir`` on a remote location. This information is specified in the :class:`~parsl.executors.base.ParslExecutor` via the ``working_dir`` parameter in the :class:`~parsl.config.Config` instance. For example: +In order to manage where files are staged, users must configure the default ``working_dir`` on a +remote location. This information is specified in the :class:`~parsl.executors.base.ParslExecutor` +via the ``working_dir`` parameter in the :class:`~parsl.config.Config` instance. For example: .. code-block:: python @@ -322,9 +303,16 @@ In order to manage where files are staged, users must configure the default ``wo ] ) -Parsl requires knowledge of the Globus endpoint that is associated with an executor. This is done by specifying the ``endpoint_name`` (the UUID of the Globus endpoint that is associated with the system) in the configuration.
+Parsl requires knowledge of the Globus endpoint that is associated with an executor. This is done by +specifying the ``endpoint_name`` (the UUID of the Globus endpoint that is associated with the system) +in the configuration. -In some cases, for example when using a Globus `shared endpoint `_ or when a Globus endpoint is mounted on a supercomputer, the path seen by Globus is not the same as the local path seen by Parsl. In this case the configuration may optionally specify a mapping between the ``endpoint_path`` (the common root path seen in Globus), and the ``local_path`` (the common root path on the local file system), as in the following. In most cases, ``endpoint_path`` and ``local_path`` are the same and do not need to be specified. +In some cases, for example when using a Globus `shared endpoint `_ +or when a Globus endpoint is mounted on a supercomputer, the path seen by Globus is not the same as +the local path seen by Parsl. In this case the configuration may optionally specify a mapping +between the ``endpoint_path`` (the common root path seen in Globus), and the ``local_path`` (the +common root path on the local file system), as in the following. In most cases, ``endpoint_path`` +and ``local_path`` are the same and do not need to be specified. .. code-block:: python @@ -345,17 +333,16 @@ In some cases, for example when using a Globus `shared endpoint 0), Parsl will automatically -re-launch tasks that have failed until the retry limit is reached. -By default, retries are disabled and exceptions will be communicated -to the Parsl program. +Often errors in distributed/parallel environments are transient. In these cases, retrying failed +tasks can be a simple way of overcoming transient (e.g., machine failure, network failure) and +intermittent failures. When ``retries`` are enabled (and set to an integer > 0), Parsl will +automatically re-launch tasks that have failed until the retry limit is reached. 
By default, retries +are disabled and exceptions will be communicated to the Parsl program. The following example shows how the number of retries can be set to 2: @@ -73,27 +66,24 @@ The following example shows how the number of retries can be set to 2: import parsl from parsl.configs.htex_local import config - + config.retries = 2 parsl.load(config) -More specific retry handling can be specified using retry handlers, documented -below. +More specific retry handling can be specified using retry handlers, documented below. Lazy fail --------- -Parsl implements a lazy failure model through which a workload will continue -to execute in the case that some tasks fail. That is, the program will not -halt as soon as it encounters a failure, rather it will continue to execute -unaffected apps. +Parsl implements a lazy failure model through which a workload will continue to execute in the case +that some tasks fail. That is, the program will not halt as soon as it encounters a failure, rather +it will continue to execute unaffected apps. -The following example shows how lazy failures affect execution. In this -case, task C fails and therefore tasks E and F that depend on results from -C cannot be executed; however, Parsl will continue to execute tasks B and D -as they are unaffected by task C's failure. +The following example shows how lazy failures affect execution. In this case, task C fails and +therefore tasks E and F that depend on results from C cannot be executed; however, Parsl will +continue to execute tasks B and D as they are unaffected by task C's failure. .. code-block:: @@ -117,26 +107,21 @@ as they are unaffected by task C's failure. Retry handlers -------------- -The basic parsl retry mechanism keeps a count of the number of times a task -has been (re)tried, and will continue retrying that task until the configured -retry limit is reached. 
+The basic parsl retry mechanism keeps a count of the number of times a task has been (re)tried, and +will continue retrying that task until the configured retry limit is reached. -Retry handlers generalize this to allow more expressive retry handling: -parsl keeps a retry cost for a task, and the task will be retried until the -configured retry limit is reached. Instead of the cost being 1 for each -failure, user-supplied code can examine the failure and compute a custom -cost. +Retry handlers generalize this to allow more expressive retry handling: parsl keeps a retry cost for +a task, and the task will be retried until the configured retry limit is reached. Instead of the +cost being 1 for each failure, user-supplied code can examine the failure and compute a custom cost. -This allows user knowledge about failures to influence the retry mechanism: -an exception which is almost definitely a non-recoverable failure (for example, -due to bad parameters) can be given a high retry cost (so that it will not -be retried many times, or at all), and exceptions which are likely to be -transient (for example, where a worker node has died) can be given a low -retry cost so they will be retried many times. +This allows user knowledge about failures to influence the retry mechanism: an exception which is +almost definitely a non-recoverable failure (for example, due to bad parameters) can be given a high +retry cost (so that it will not be retried many times, or at all), and exceptions which are likely +to be transient (for example, where a worker node has died) can be given a low retry cost so they +will be retried many times. A retry handler can be specified in the parsl configuration like this: - .. 
code-block:: python Config( @@ -144,16 +129,13 @@ A retry handler can be specified in the parsl configuration like this: retry_handler=example_retry_handler ) +``example_retry_handler`` should be a function defined by the user that will compute the retry cost +for a particular failure, given some information about the failure. -``example_retry_handler`` should be a function defined by the user that will -compute the retry cost for a particular failure, given some information about -the failure. - -For example, the following handler will give a cost of 1 to all exceptions, -except when a bash app exits with unix exitcode 9, in which case the cost will -be 100. This will have the effect that retries will happen as normal for most -errors, but the bash app can indicate that there is little point in retrying -by exiting with exitcode 9. +For example, the following handler will give a cost of 1 to all exceptions, except when a bash app +exits with unix exitcode 9, in which case the cost will be 100. This will have the effect that +retries will happen as normal for most errors, but the bash app can indicate that there is little +point in retrying by exiting with exitcode 9. .. code-block:: python @@ -163,9 +145,8 @@ by exiting with exitcode 9. else return 1 -The retry handler is given two parameters: the exception from execution, and -the parsl internal task_record. The task record contains details such as the -app name, parameters and executor. +The retry handler is given two parameters: the exception from execution, and the parsl internal +task_record. The task record contains details such as the app name, parameters and executor. -If a retry handler raises an exception itself, then the task will be aborted -and no further tries will be attempted. +If a retry handler raises an exception itself, then the task will be aborted and no further tries +will be attempted. 
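Reviewer note on the retry handler text in ``exceptions.rst``: the cost-based handler described there can be sketched as a small standalone function. This is only an illustrative sketch. ``ExitFailure`` is a hypothetical stand-in for whatever exception type carries the bash app's exit code, so that the snippet runs without Parsl installed, and the empty dict passed as ``task_record`` is likewise a placeholder for the internal task record.

```python
class ExitFailure(Exception):
    """Hypothetical stand-in for an exception that carries a Unix exit code."""

    def __init__(self, exitcode):
        super().__init__(f"app exited with code {exitcode}")
        self.exitcode = exitcode


def example_retry_handler(exception, task_record):
    """Return the retry cost for one failure.

    Exit code 9 is treated as a signal from the app that retrying is
    pointless, so it gets a high cost; every other failure costs 1.
    """
    if isinstance(exception, ExitFailure) and exception.exitcode == 9:
        return 100
    return 1


# Placeholder task record; a real record holds the app name, parameters, etc.
placeholder_record = {}

cost_fatal = example_retry_handler(ExitFailure(9), placeholder_record)
cost_other_exit = example_retry_handler(ExitFailure(1), placeholder_record)
cost_generic = example_retry_handler(ValueError("bad input"), placeholder_record)
```

With a retry limit of 2, as in the earlier ``config.retries = 2`` example, a cost of 100 uses up the whole budget on the first failure, while failures costing 1 leave room for further attempts.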
diff --git a/docs/userguide/execution.rst b/docs/userguide/execution.rst index 832985c164..7f1e17c607 100644 --- a/docs/userguide/execution.rst +++ b/docs/userguide/execution.rst @@ -1,91 +1,104 @@ .. _label-execution: - Execution ========= -Contemporary computing environments may include a wide range of computational platforms or **execution providers**, from laptops and PCs to various clusters, supercomputers, and cloud computing platforms. Different execution providers may require or allow for the use of different **execution models**, such as threads (for efficient parallel execution on a multicore processor), processes, and pilot jobs for running many small tasks on a large parallel system. +Contemporary computing environments may include a wide range of computational platforms or +**execution providers**, from laptops and PCs to various clusters, supercomputers, and cloud +computing platforms. Different execution providers may require or allow for the use of different +**execution models**, such as threads (for efficient parallel execution on a multicore processor), +processes, and pilot jobs for running many small tasks on a large parallel system. -Parsl is designed to abstract these low-level details so that an identical Parsl program can run unchanged on different platforms or across multiple platforms. -To this end, Parsl uses a configuration file to specify which execution provider(s) and execution model(s) to use. -Parsl provides a high level abstraction, called a *block*, for providing a uniform description of a compute resource irrespective of the specific execution provider. +Parsl is designed to abstract these low-level details so that an identical Parsl program can run +unchanged on different platforms or across multiple platforms. To this end, Parsl uses a +configuration file to specify which execution provider(s) and execution model(s) to use. 
Parsl +provides a high level abstraction, called a *block*, for providing a uniform description of a +compute resource irrespective of the specific execution provider. .. note:: - Refer to :ref:`configuration-section` for information on how to configure the various components described - below for specific scenarios. + Refer to :ref:`configuration-section` for information on how to configure the various components + described below for specific scenarios. + Execution providers ------------------- -Clouds, supercomputers, and local PCs offer vastly different modes of access. -To overcome these differences, and present a single uniform interface, -Parsl implements a simple provider abstraction. This -abstraction is key to Parsl's ability to enable scripts to be moved -between resources. The provider interface exposes three core actions: submit a -job for execution (e.g., sbatch for the Slurm resource manager), -retrieve the status of an allocation (e.g., squeue), and cancel a running -job (e.g., scancel). Parsl implements providers for local execution -(fork), for various cloud platforms using cloud-specific APIs, and -for clusters and supercomputers that use a Local Resource Manager -(LRM) to manage access to resources, such as Slurm and HTCondor. - -Each provider implementation may allow users to specify additional parameters for further configuration. Parameters are generally mapped to LRM submission script or cloud API options. -Examples of LRM-specific options are partition, wall clock time, -scheduler options (e.g., #SBATCH arguments for Slurm), and worker -initialization commands (e.g., loading a conda environment). Cloud +Clouds, supercomputers, and local PCs offer vastly different modes of access. To overcome these +differences, and present a single uniform interface, Parsl implements a simple provider abstraction. +This abstraction is key to Parsl's ability to enable scripts to be moved between resources. 
The +provider interface exposes three core actions: submit a job for execution (e.g., sbatch for the +Slurm resource manager), retrieve the status of an allocation (e.g., squeue), and cancel a running +job (e.g., scancel). Parsl implements providers for local execution (fork), for various cloud +platforms using cloud-specific APIs, and for clusters and supercomputers that use a Local Resource +Manager (LRM) to manage access to resources, such as Slurm and HTCondor. + +Each provider implementation may allow users to specify additional parameters for further +configuration. Parameters are generally mapped to LRM submission script or cloud API options. +Examples of LRM-specific options are partition, wall clock time, scheduler options (e.g., #SBATCH +arguments for Slurm), and worker initialization commands (e.g., loading a conda environment). Cloud parameters include access keys, instance type, and spot bid price Parsl currently supports the following providers: -1. `parsl.providers.LocalProvider`: The provider allows you to run locally on your laptop or workstation. -2. `parsl.providers.SlurmProvider`: This provider allows you to schedule resources via the Slurm scheduler. -3. `parsl.providers.CondorProvider`: This provider allows you to schedule resources via the Condor scheduler. -4. `parsl.providers.GridEngineProvider`: This provider allows you to schedule resources via the GridEngine scheduler. -5. `parsl.providers.TorqueProvider`: This provider allows you to schedule resources via the Torque scheduler. -6. `parsl.providers.AWSProvider`: This provider allows you to provision and manage cloud nodes from Amazon Web Services. -7. `parsl.providers.GoogleCloudProvider`: This provider allows you to provision and manage cloud nodes from Google Cloud. -8. `parsl.providers.KubernetesProvider`: This provider allows you to provision and manage containers on a Kubernetes cluster. -9. 
`parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF scheduler. - +1. `parsl.providers.LocalProvider`: The provider allows you to run locally on your laptop or + workstation. +2. `parsl.providers.SlurmProvider`: This provider allows you to schedule resources via the Slurm + scheduler. +3. `parsl.providers.CondorProvider`: This provider allows you to schedule resources via the Condor + scheduler. +4. `parsl.providers.GridEngineProvider`: This provider allows you to schedule resources via the + GridEngine scheduler. +5. `parsl.providers.TorqueProvider`: This provider allows you to schedule resources via the Torque + scheduler. +6. `parsl.providers.AWSProvider`: This provider allows you to provision and manage cloud nodes from + Amazon Web Services. +7. `parsl.providers.GoogleCloudProvider`: This provider allows you to provision and manage cloud + nodes from Google Cloud. +8. `parsl.providers.KubernetesProvider`: This provider allows you to provision and manage containers + on a Kubernetes cluster. +9. `parsl.providers.LSFProvider`: This provider allows you to schedule resources via IBM's LSF + scheduler. Executors --------- -Parsl programs vary widely in terms of their -execution requirements. Individual Apps may run for milliseconds -or days, and available parallelism can vary between none for -sequential programs to millions for "pleasingly parallel" programs. -Parsl executors, as the name suggests, execute Apps on one or more -target execution resources such as multi-core workstations, clouds, -or supercomputers. As it appears infeasible to implement a single -execution strategy that will meet so many diverse requirements on -such varied platforms, Parsl provides a modular executor interface -and a collection of executors that are tuned for common execution -patterns. 
- -Parsl executors extend the Executor class offered by Python's -concurrent.futures library, which allows Parsl to use -existing solutions in the Python Standard Library (e.g., ThreadPoolExecutor) -and from other packages such as Work Queue. Parsl -extends the concurrent.futures executor interface to support -additional capabilities such as automatic scaling of execution resources, -monitoring, deferred initialization, and methods to set working -directories. -All executors share a common execution kernel that is responsible -for deserializing the task (i.e., the App and its input arguments) -and executing the task in a sandboxed Python environment. +Parsl programs vary widely in terms of their execution requirements. Individual Apps may run for +milliseconds or days, and available parallelism can vary between none for sequential programs to +millions for "pleasingly parallel" programs. Parsl executors, as the name suggests, execute Apps on +one or more target execution resources such as multi-core workstations, clouds, or supercomputers. +As it appears infeasible to implement a single execution strategy that will meet so many diverse +requirements on such varied platforms, Parsl provides a modular executor interface and a collection +of executors that are tuned for common execution patterns. + +Parsl executors extend the Executor class offered by Python's concurrent.futures library, which +allows Parsl to use existing solutions in the Python Standard Library (e.g., ThreadPoolExecutor) and +from other packages such as Work Queue. Parsl extends the concurrent.futures executor interface to +support additional capabilities such as automatic scaling of execution resources, monitoring, +deferred initialization, and methods to set working directories. All executors share a common +execution kernel that is responsible for deserializing the task (i.e., the App and its input +arguments) and executing the task in a sandboxed Python environment. 
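The submit-and-future pattern that Parsl executors build on can be seen with the standard library alone. The sketch below uses Python's own ``concurrent.futures.ThreadPoolExecutor`` (not the Parsl executor of the same name), and the ``square`` function is a hypothetical task used only for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(x):
    # Hypothetical task body, used only for illustration.
    return x * x


# submit() returns immediately with a Future; result() blocks until the task is done.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(square, n) for n in range(4)]
    results = [f.result() for f in futures]

print(results)
```

Parsl's additions (scaling, monitoring, working directories) sit on top of this interface; none of them are shown here.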
Parsl currently supports the following executors: -1. `parsl.executors.ThreadPoolExecutor`: This executor supports multi-thread execution on local resources. - -2. `parsl.executors.HighThroughputExecutor`: This executor implements hierarchical scheduling and batching using a pilot job model to deliver high throughput task execution on up to 4000 Nodes. - -3. `parsl.executors.WorkQueueExecutor`: This executor integrates `Work Queue `_ as an execution backend. Work Queue scales to tens of thousands of cores and implements reliable execution of tasks with dynamic resource sizing. - -4. `parsl.executors.taskvine.TaskVineExecutor`: This executor uses `TaskVine `_ as the execution backend. TaskVine scales up to tens of thousands of cores and actively uses local storage on compute nodes to offer a diverse array of performance-oriented features, including: smart caching and sharing common large files between tasks and compute nodes, reliable execution of tasks, dynamic resource sizing, automatic Python environment detection and sharing. -These executors cover a broad range of execution requirements. As with other Parsl components, there is a standard interface (ParslExecutor) that can be implemented to add support for other executors. +1. `parsl.executors.ThreadPoolExecutor`: This executor supports multi-thread execution on local + resources. +2. `parsl.executors.HighThroughputExecutor`: This executor implements hierarchical scheduling and + batching using a pilot job model to deliver high throughput task execution on up to 4000 Nodes. +3. `parsl.executors.WorkQueueExecutor`: This executor integrates + `Work Queue `_ as an execution backend. Work Queue + scales to tens of thousands of cores and implements reliable execution of tasks with dynamic + resource sizing. +4. `parsl.executors.taskvine.TaskVineExecutor`: This executor uses + `TaskVine `_ as the execution backend. 
TaskVine scales + up to tens of thousands of cores and actively uses local storage on compute nodes to offer a + diverse array of performance-oriented features, including: smart caching and sharing common large + files between tasks and compute nodes, reliable execution of tasks, dynamic resource sizing, + automatic Python environment detection and sharing. + +These executors cover a broad range of execution requirements. As with other Parsl components, +there is a standard interface (ParslExecutor) that can be implemented to add support for other +executors. .. note:: Refer to :ref:`configuration-section` for information on how to configure these executors. @@ -94,35 +107,33 @@ These executors cover a broad range of execution requirements. As with other Par Launchers --------- -Many LRMs offer mechanisms for spawning applications across nodes -inside a single job and for specifying the -resources and task placement information needed to execute that -application at launch time. Common mechanisms include -`srun `_ (for Slurm), -`aprun `_ (for Crays), and `mpirun `_ (for MPI). -Thus, to run Parsl programs on such systems, we typically want first to -request a large number of nodes and then to *launch* "pilot job" or -**worker** processes using the system launchers. -Parsl's Launcher abstraction enables Parsl programs -to use these system-specific launcher systems to start workers across -cores and nodes. +Many LRMs offer mechanisms for spawning applications across nodes inside a single job and for +specifying the resources and task placement information needed to execute that application at launch +time. Common mechanisms include `srun `_ (for Slurm), +`aprun `_ +(for Crays), and `mpirun `_ (for MPI). Thus, to +run Parsl programs on such systems, we typically want first to request a large number of nodes and +then to *launch* "pilot job" or **worker** processes using the system launchers. 
Parsl's Launcher +abstraction enables Parsl programs to use these system-specific launcher systems to start workers +across cores and nodes. Parsl currently supports the following set of launchers: 1. `parsl.launchers.SrunLauncher`: Srun based launcher for Slurm based systems. 2. `parsl.launchers.AprunLauncher`: Aprun based launcher for Crays. 3. `parsl.launchers.SrunMPILauncher`: Launcher for launching MPI applications with Srun. -4. `parsl.launchers.GnuParallelLauncher`: Launcher using GNU parallel to launch workers across nodes and cores. +4. `parsl.launchers.GnuParallelLauncher`: Launcher using GNU parallel to launch workers across nodes + and cores. 5. `parsl.launchers.MpiExecLauncher`: Uses Mpiexec to launch. 6. `parsl.launchers.SimpleLauncher`: The launcher default to a single worker launch. -7. `parsl.launchers.SingleNodeLauncher`: This launcher launches ``workers_per_node`` count workers on a single node. +7. `parsl.launchers.SingleNodeLauncher`: This launcher launches ``workers_per_node`` count workers + on a single node. -Additionally, the launcher interface can be used to implement specialized behaviors -in custom environments (for example, to -launch node processes inside containers with customized environments). -For example, the following launcher uses Srun to launch ``worker-wrapper``, passing the -command to be run as parameters to ``worker-wrapper``. It is the responsibility of ``worker-wrapper`` -to launch the command it is given inside the appropriate environment. +Additionally, the launcher interface can be used to implement specialized behaviors in custom +environments (for example, to launch node processes inside containers with customized environments). +For example, the following launcher uses Srun to launch ``worker-wrapper``, passing the command to +be run as parameters to ``worker-wrapper``. It is the responsibility of ``worker-wrapper`` to launch +the command it is given inside the appropriate environment. .. 
code:: python @@ -134,82 +145,70 @@ to launch the command it is given inside the appropriate environment. new_command="worker-wrapper {}".format(command) return self.srun_launcher(new_command, tasks_per_node, nodes_per_block) + Blocks ------ -One challenge when making use of heterogeneous -execution resource types is the need to provide a uniform representation of -resources. Consider that single requests on clouds return individual -nodes, clusters and supercomputers provide batches of nodes, grids -provide cores, and workstations provide a single multicore node - -Parsl defines a resource abstraction called a *block* as the most basic unit -of resources to be acquired from a provider. A block contains one -or more nodes and maps to the different provider abstractions. In -a cluster, a block corresponds to a single allocation request to a -scheduler. In a cloud, a block corresponds to a single API request -for one or more instances. -Parsl can then execute *tasks* (instances of apps) -within and across (e.g., for MPI jobs) nodes within a block. -Blocks are also used as the basis for -elasticity on batch scheduling systems (see Elasticity below). -Three different examples of block configurations are shown below. +One challenge when making use of heterogeneous execution resource types is the need to provide a +uniform representation of resources. Consider that single requests on clouds return individual +nodes, clusters and supercomputers provide batches of nodes, grids provide cores, and workstations +provide a single multicore node. Parsl defines a resource abstraction called a *block* as the most +basic unit of resources to be acquired from a provider. A block contains one or more nodes and maps +to the different provider abstractions. In a cluster, a block corresponds to a single allocation +request to a scheduler. In a cloud, a block corresponds to a single API request for one or more +instances. 
Parsl can then execute *tasks* (instances of apps) within and across (e.g., for MPI jobs) +nodes within a block. Blocks are also used as the basis for elasticity on batch scheduling systems +(see Elasticity below). Three different examples of block configurations are shown below. 1. A single block comprised of a node executing one task: .. image:: ../images/N1_T1.png :scale: 75% -2. A single block with one node executing several tasks. This configuration is - most suitable for single threaded apps running on multicore target systems. - The number of tasks executed concurrently is proportional to the number of cores available on the system. +2. A single block with one node executing several tasks. This configuration is most suitable for + single threaded apps running on multicore target systems. The number of tasks executed + concurrently is proportional to the number of cores available on the system. .. image:: ../images/N1_T4.png :scale: 75% -3. A block comprised of several nodes and executing several tasks, where a task can span multiple nodes. This configuration - is generally used by MPI applications. Starting a task requires using a specific - MPI launcher that is supported on the target system (e.g., aprun, srun, mpirun, mpiexec). +3. A block comprised of several nodes and executing several tasks, where a task can span multiple + nodes. This configuration is generally used by MPI applications. Starting a task requires using a + specific MPI launcher that is supported on the target system (e.g., aprun, srun, mpirun, mpiexec). The `MPI Apps `_ documentation page describes how to configure Parsl for this case. .. image:: ../images/N4_T2.png The configuration options for specifying the shape of each block are: -1. ``workers_per_node``: Number of workers started per node, which corresponds to the number of tasks that can execute concurrently on a node. +1. 
``workers_per_node``: Number of workers started per node, which corresponds to the number of + tasks that can execute concurrently on a node. 2. ``nodes_per_block``: Number of nodes requested per block. + .. _label-elasticity: Elasticity ---------- -Workload resource requirements often vary over time. -For example, in the map-reduce paradigm the map phase may require more -resources than the reduce phase. In general, reserving sufficient -resources for the widest parallelism will result in underutilization -during periods of lower load; conversely, reserving minimal resources -for the thinnest parallelism will lead to optimal utilization -but also extended execution time. -Even simple bag-of-task applications may have tasks of different durations, leading to trailing -tasks with a thin workload. - -To address dynamic workload requirements, -Parsl implements a cloud-like elasticity model in which resource -blocks are provisioned/deprovisioned in response to workload pressure. -Given the general nature of the implementation, -Parsl can provide elastic execution on clouds, clusters, -and supercomputers. Of course, in an HPC setting, elasticity may -be complicated by queue delays. - -Parsl's elasticity model includes a flow control system -that monitors outstanding tasks and available compute capacity. -This flow control monitor determines when to trigger scaling (in or out) +Workload resource requirements often vary over time. For example, in the map-reduce paradigm the map +phase may require more resources than the reduce phase. In general, reserving sufficient resources +for the widest parallelism will result in underutilization during periods of lower load; conversely, +reserving minimal resources for the thinnest parallelism will lead to optimal utilization but also +extended execution time. Even simple bag-of-task applications may have tasks of different durations, +leading to trailing tasks with a thin workload. 
+ +To address dynamic workload requirements, Parsl implements a cloud-like elasticity model in which +resource blocks are provisioned/deprovisioned in response to workload pressure. Given the general +nature of the implementation, Parsl can provide elastic execution on clouds, clusters, and +supercomputers. Of course, in an HPC setting, elasticity may be complicated by queue delays. + +Parsl's elasticity model includes a flow control system that monitors outstanding tasks and +available compute capacity. This flow control monitor determines when to trigger scaling (in or out) events to match workload needs. -The animated diagram below shows how blocks are elastically -managed within an executor. The Parsl configuration for an executor -defines the minimum, maximum, and initial number of blocks to be used. +The animated diagram below shows how blocks are elastically managed within an executor. The Parsl +configuration for an executor defines the minimum, maximum, and initial number of blocks to be used. .. image:: parsl_scaling.gif @@ -220,24 +219,23 @@ The configuration options for specifying elasticity bounds are: 3. ``max_blocks``: Maximum number of blocks that can be active per executor. - Parallelism ^^^^^^^^^^^ -Parsl provides a user-managed model for controlling elasticity. -In addition to setting the minimum -and maximum number of blocks to be provisioned, users can also define -the desired level of parallelism by setting a parameter (*p*). Parallelism -is expressed as the ratio of task execution capacity to the sum of running tasks -and available tasks (tasks with their dependencies met, but waiting for execution). -A parallelism value of 1 represents aggressive scaling where the maximum resources -needed are used (i.e., max_blocks); parallelism close to 0 represents the opposite situation in which -as few resources as possible (i.e., min_blocks) are used. By selecting a fraction between 0 and 1, -the provisioning aggressiveness can be controlled. 
+Parsl provides a user-managed model for controlling elasticity. In addition to setting the minimum +and maximum number of blocks to be provisioned, users can also define the desired level of +parallelism by setting a parameter (*p*). Parallelism is expressed as the ratio of task execution +capacity to the sum of running tasks and available tasks (tasks with their dependencies met, but +waiting for execution). A parallelism value of 1 represents aggressive scaling where the maximum +resources needed are used (i.e., max_blocks); parallelism close to 0 represents the opposite +situation in which as few resources as possible (i.e., min_blocks) are used. By selecting a fraction +between 0 and 1, the provisioning aggressiveness can be controlled. For example: -- When p = 0: Use the fewest resources possible. If there is no workload then no blocks will be provisioned, otherwise the fewest blocks specified (e.g., min_blocks, or 1 if min_blocks is set to 0) will be provisioned. +- When p = 0: Use the fewest resources possible. If there is no workload then no blocks will be + provisioned, otherwise the fewest blocks specified (e.g., min_blocks, or 1 if min_blocks is set to + 0) will be provisioned. .. code:: python @@ -246,7 +244,8 @@ For example: else: blocks = max(min_blocks, 1) -- When p = 1: Use as many resources as possible. Provision sufficient nodes to execute all running and available tasks concurrently up to the max_blocks specified. +- When p = 1: Use as many resources as possible. Provision sufficient nodes to execute all running + and available tasks concurrently up to the max_blocks specified. .. code-block:: python @@ -259,11 +258,11 @@ For example: Configuration ^^^^^^^^^^^^^ -The example below shows how elasticity and parallelism can be configured. Here, a `parsl.executors.HighThroughputExecutor` -is used with a minimum of 1 block and a maximum of 2 blocks, where each block may host -up to 2 workers per node. 
Thus this setup is capable of servicing 2 tasks concurrently. -Parallelism of 0.5 means that when more than 2 * the total task capacity (i.e., 4 tasks) are queued a new -block will be requested. An example :class:`~parsl.config.Config` is: +The example below shows how elasticity and parallelism can be configured. Here, a +`parsl.executors.HighThroughputExecutor` is used with a minimum of 1 block and a maximum of 2 blocks, +where each block may host up to 2 workers per node. Thus this setup is capable of servicing 2 tasks +concurrently. Parallelism of 0.5 means that when more than 2 * the total task capacity (i.e., 4 +tasks) are queued a new block will be requested. An example :class:`~parsl.config.Config` is: .. code:: python @@ -287,11 +286,9 @@ block will be requested. An example :class:`~parsl.config.Config` is: ] ) -The animated diagram below illustrates the behavior of this executor. -In the diagram, the tasks are allocated to the first block, until -5 tasks are submitted. At this stage, as more than double the available -task capacity is used, Parsl provisions a new block for executing the remaining -tasks. +The animated diagram below illustrates the behavior of this executor. In the diagram, the tasks are +allocated to the first block, until 5 tasks are submitted. At this stage, as more than double the +available task capacity is used, Parsl provisions a new block for executing the remaining tasks. .. image:: parsl_parallelism.gif @@ -299,26 +296,22 @@ tasks. Multi-executor -------------- -Parsl supports the use of one or more executors as specified in the configuration. -In this situation, individual apps may indicate which executors they are able to use. +Parsl supports the use of one or more executors as specified in the configuration. In this situation, +individual apps may indicate which executors they are able to use. 
The common scenarios for this feature are: -* A workflow has an initial simulation stage that runs on the compute heavy - nodes of an HPC system followed by an analysis and visualization stage that - is better suited for GPU nodes. -* A workflow follows a repeated fan-out, fan-in model where the long running - fan-out tasks are computed on a cluster and the quick fan-in computation is - better suited for execution using threads on a login node. -* A workflow includes apps that wait and evaluate the results of a - computation to determine whether the app should be relaunched. - Only apps running on threads may launch other apps. Often, simulations - have stochastic behavior and may terminate before completion. - In such cases, having a wrapper app that checks the exit code - and determines whether or not the app has completed successfully can - be used to automatically re-execute the app (possibly from a - checkpoint) until successful completion. - +* A workflow has an initial simulation stage that runs on the compute heavy nodes of an HPC system + followed by an analysis and visualization stage that is better suited for GPU nodes. +* A workflow follows a repeated fan-out, fan-in model where the long running fan-out tasks are + computed on a cluster and the quick fan-in computation is better suited for execution using + threads on a login node. +* A workflow includes apps that wait and evaluate the results of a computation to determine whether + the app should be relaunched. Only apps running on threads may launch other apps. Often, + simulations have stochastic behavior and may terminate before completion. In such cases, having a + wrapper app that checks the exit code and determines whether or not the app has completed + successfully can be used to automatically re-execute the app (possibly from a checkpoint) until + successful completion. The following code snippet shows how apps can specify suitable executors in the app decorator. 
@@ -366,13 +359,14 @@ For example, Under the hood, we use `CurveZMQ `_ to encrypt all communication channels between the executor and related nodes. + Encryption performance ^^^^^^^^^^^^^^^^^^^^^^ -CurveZMQ depends on `libzmq `_ and `libsodium `_, -which `pyzmq `_ (a Parsl dependency) includes as part of its -installation via ``pip``. This installation path should work on most systems, but users have -reported significant performance degradation as a result. +CurveZMQ depends on `libzmq `_ and +`libsodium `_, which `pyzmq `_ +(a Parsl dependency) includes as part of its installation via ``pip``. This installation path should +work on most systems, but users have reported significant performance degradation as a result. If you experience a significant performance hit after enabling encryption, we recommend installing ``pyzmq`` with conda: diff --git a/docs/userguide/futures.rst b/docs/userguide/futures.rst index 13d22a211b..084e77443b 100644 --- a/docs/userguide/futures.rst +++ b/docs/userguide/futures.rst @@ -3,65 +3,61 @@ Futures ======= -When an ordinary Python function is invoked in a Python program, the Python interpreter waits for the function to complete execution -before proceeding to the next statement. -But if a function is expected to execute for a long period of time, it may be preferable not to wait for -its completion but instead to proceed immediately with executing subsequent statements. -The function can then execute concurrently with that other computation. +When an ordinary Python function is invoked in a Python program, the Python interpreter waits for +the function to complete execution before proceeding to the next statement. But if a function is +expected to execute for a long period of time, it may be preferable not to wait for its completion +but instead to proceed immediately with executing subsequent statements. The function can then +execute concurrently with that other computation. 
-Concurrency can be used to enhance performance when independent activities -can execute on different cores or nodes in parallel. The following -code fragment demonstrates this idea, showing that overall execution time -may be reduced if the two function calls are executed concurrently. +Concurrency can be used to enhance performance when independent activities can execute on different +cores or nodes in parallel. The following code fragment demonstrates this idea, showing that overall +execution time may be reduced if the two function calls are executed concurrently. .. code-block:: python v1 = expensive_function(1) v2 = expensive_function(2) result = v1 + v2 - -However, concurrency also introduces a need for **synchronization**. -In the example, it is not possible to compute the sum of ``v1`` and ``v2`` -until both function calls have completed. -Synchronization provides a way of blocking execution of one activity -(here, the statement ``result = v1 + v2``) until other activities -(here, the two calls to ``expensive_function()``) have completed. - -Parsl supports concurrency and synchronization as follows. -Whenever a Parsl program calls a Parsl app (a function annotated with a Parsl -app decorator, see :ref:`apps`), -Parsl will create a new ``task`` and immediately return a -`future `_ in lieu of that function's result(s). -The program will then continue immediately to the next statement in the program. -At some point, for example when the task's dependencies are met and there -is available computing capacity, Parsl will execute the task. Upon -completion, Parsl will set the value of the future to contain the task's -output. - -A future can be used to track the status of an asynchronous task. -For example, after creation, the future may be interrogated to determine -the task's status (e.g., running, failed, completed), access results, -and capture exceptions. 
Further, futures may be used for synchronization, -enabling the calling Python program to block until the future -has completed execution. - -Parsl provides two types of futures: `AppFuture` and `DataFuture`. -While related, they enable subtly different parallel patterns. + +However, concurrency also introduces a need for **synchronization**. In the example, it is not +possible to compute the sum of ``v1`` and ``v2`` until both function calls have completed. +Synchronization provides a way of blocking execution of one activity (here, the statement +``result = v1 + v2``) until other activities (here, the two calls to ``expensive_function()``) have +completed. + +Parsl supports concurrency and synchronization as follows. Whenever a Parsl program calls a Parsl +app (a function annotated with a Parsl app decorator, see :ref:`apps`), Parsl will create a new +``task`` and immediately return a `future `_ in +lieu of that function's result(s). The program will then continue immediately to the next statement +in the program. At some point, for example when the task's dependencies are met and there is +available computing capacity, Parsl will execute the task. Upon completion, Parsl will set the value +of the future to contain the task's output. + +A future can be used to track the status of an asynchronous task. For example, after creation, the +future may be interrogated to determine the task's status (e.g., running, failed, completed), access +results, and capture exceptions. Further, futures may be used for synchronization, enabling the +calling Python program to block until the future has completed execution. + +Parsl provides two types of futures: `AppFuture` and `DataFuture`. While related, they enable subtly +different parallel patterns. + AppFutures ---------- -AppFutures are the basic building block upon which Parsl programs are built. Every invocation of a Parsl app returns an AppFuture that may be used to monitor and manage the task's execution. 
-AppFutures are inherited from Python's `concurrent library `_. -They provide three key capabilities: +AppFutures are the basic building block upon which Parsl programs are built. Every invocation of a +Parsl app returns an AppFuture that may be used to monitor and manage the task's execution. +AppFutures are inherited from Python's +`concurrent library `_. They provide +three key capabilities: -1. An AppFuture's ``result()`` function can be used to wait for an app to complete, and then access any result(s). -This function is blocking: it returns only when the app completes or fails. -The following code fragment implements an example similar to the ``expensive_function()`` example above. -Here, the ``sleep_double`` app simply doubles the input value. The program invokes -the ``sleep_double`` app twice, and returns futures in place of results. The example -shows how the future's ``result()`` function can be used to wait for the results from the -two ``sleep_double`` app invocations to be computed. +1. An AppFuture's ``result()`` function can be used to wait for an app to complete, and then access + any result(s). This function is blocking: it returns only when the app completes or fails. The + following code fragment implements an example similar to the ``expensive_function()`` example + above. Here, the ``sleep_double`` app simply doubles the input value. The program invokes the + ``sleep_double`` app twice, and returns futures in place of results. The example shows how the + future's ``result()`` function can be used to wait for the results from the two ``sleep_double`` + app invocations to be computed. .. code-block:: python @@ -79,7 +75,8 @@ two ``sleep_double`` app invocations to be computed. print(doubled_x1.result() + doubled_x2.result()) 2. An AppFuture's ``done()`` function can be used to check the status of an app, *without blocking*. 
-The following example shows that calling the future's ``done()`` function will not stop execution of the main Python program. + The following example shows that calling the future's ``done()`` function will not stop execution + of the main Python program. .. code-block:: python @@ -94,8 +91,8 @@ The following example shows that calling the future's ``done()`` function will n print(doubled_x.done()) 3. An AppFuture provides a safe way to handle exceptions and errors while asynchronously executing -apps. The example shows how exceptions can be captured in the same way as a standard Python program -when calling the future's ``result()`` function. + apps. The example shows how exceptions can be captured in the same way as a standard Python + program when calling the future's ``result()`` function. .. code-block:: python @@ -115,36 +112,36 @@ when calling the future's ``result()`` function. print('Oops! Something really bad happened') -In addition to being able to capture exceptions raised by a specific app, Parsl also raises ``DependencyErrors`` when apps are unable to execute due to failures in prior dependent apps. -That is, an app that is dependent upon the successful completion of another app will fail with a dependency error if any of the apps on which it depends fail. +In addition to being able to capture exceptions raised by a specific app, Parsl also raises +``DependencyErrors`` when apps are unable to execute due to failures in prior dependent apps. That +is, an app that is dependent upon the successful completion of another app will fail with a +dependency error if any of the apps on which it depends fail. DataFutures ----------- -While an AppFuture represents the execution of an asynchronous app, -a DataFuture represents a file to be produced by that app. -Parsl's dataflow model requires such a construct so that it can determine -when dependent apps, apps that that are to consume a file produced by another app, -can start execution. 
- -When calling an app that produces files as outputs, Parsl requires that a list of output files be specified (as a list of `File` objects passed in via the ``outputs`` keyword argument). Parsl will return a DataFuture for each output file as part AppFuture when the app is executed. -These DataFutures are accessible in the AppFuture's ``outputs`` attribute. - -Each DataFuture will complete when the App has finished executing, -and the corresponding file has been created (and if specified, staged out). - -When a DataFuture is passed as an argument to a subsequent app invocation, -that subsequent app will not begin execution until the DataFuture is -completed. The input argument will then be replaced with an appropriate -File object. - -The following code snippet shows how DataFutures are used. In this -example, the call to the echo Bash app specifies that the results -should be written to an output file ("hello1.txt"). The main -program inspects the status of the output file (via the future's -``outputs`` attribute) and then blocks waiting for the file to -be created (``hello.outputs[0].result()``). +While an AppFuture represents the execution of an asynchronous app, a DataFuture represents a file +to be produced by that app. Parsl's dataflow model requires such a construct so that it can +determine when dependent apps, apps that are to consume a file produced by another app, can +start execution. + +When calling an app that produces files as outputs, Parsl requires that a list of output files be +specified (as a list of `File` objects passed in via the ``outputs`` keyword argument). Parsl will +return a DataFuture for each output file as part of the AppFuture when the app is executed. These +DataFutures are accessible in the AppFuture's ``outputs`` attribute. + +Each DataFuture will complete when the App has finished executing, and the corresponding file has +been created (and if specified, staged out).
+ +When a DataFuture is passed as an argument to a subsequent app invocation, that subsequent app will +not begin execution until the DataFuture is completed. The input argument will then be replaced with +an appropriate File object. + +The following code snippet shows how DataFutures are used. In this example, the call to the echo +Bash app specifies that the results should be written to an output file ("hello1.txt"). The main +program inspects the status of the output file (via the future's ``outputs`` attribute) and then +blocks waiting for the file to be created (``hello.outputs[0].result()``). .. code-block:: python diff --git a/docs/userguide/glossary.rst b/docs/userguide/glossary.rst index a1a773b6a8..93db20730b 100644 --- a/docs/userguide/glossary.rst +++ b/docs/userguide/glossary.rst @@ -1,219 +1,282 @@ Glossary of Parsl Terms ======================= -This glossary defines terms based on their usage within Parsl. By defining our terminology, we hope to create understanding across our community and reduce confusion. When asking for or providing support to fellow Parsl users, please use these terms as defined. +This glossary defines terms based on their usage within Parsl. By defining our terminology, we hope +to create understanding across our community and reduce confusion. When asking for or providing +support to fellow Parsl users, please use these terms as defined. -Our glossary is organized alphabetically in English. Feel free to contribute terms and definitions to this list that will benefit Parsl users. +Our glossary is organized alphabetically in English. Feel free to contribute terms and definitions +to this list that will benefit Parsl users. .. _glossary: .. _appglossary: - + **App:** ---------- -In Parsl, an app is a small, self-contained program that does a specific job. It's a piece of code, such as a Python function or a Bash script, that can run separately from your main program. Think of it as a mini-tool within your larger toolbox. 
+In Parsl, an app is a small, self-contained program that does a specific job. It's a piece of code, +such as a Python function or a Bash script, that can run separately from your main program. Think of +it as a mini-tool within your larger toolbox. .. _appfutureglossary: **AppFuture:** ----------------- -An AppFuture is a placeholder for the result of an app that runs in the background. It's like a ticket you get when you order food at a restaurant – you get the ticket right away, but you have to wait for the food to be ready. Similarly, you get an AppFuture immediately when you start an app, but you have to wait for the app to finish before you can see the results. +An AppFuture is a placeholder for the result of an app that runs in the background. It's like a +ticket you get when you order food at a restaurant – you get the ticket right away, but you have to +wait for the food to be ready. Similarly, you get an AppFuture immediately when you start an app, +but you have to wait for the app to finish before you can see the results. .. _bashappglossary: **Bash App:** --------------- - -A Bash app is a special kind of app in Parsl that lets you run commands from your computer's terminal (like the ones you type in the command prompt or shell). It's a way to use Parsl to automate tasks that you would normally do manually in the terminal. + +A Bash app is a special kind of app in Parsl that lets you run commands from your computer's +terminal (like the ones you type in the command prompt or shell). It's a way to use Parsl to +automate tasks that you would normally do manually in the terminal. .. _blockglossary: **Block:** ------------ -A block is a group of resources, such as nodes or computational units, allocated for executing tasks. Parsl manages the distribution of work across these resources to expedite task completion. +A block is a group of resources, such as nodes or computational units, allocated for executing tasks. 
+Parsl manages the distribution of work across these resources to expedite task completion. .. _checkpointingglossary: **Checkpointing:** --------------------- -Checkpointing is like saving your progress in a video game. If something goes wrong, you can restart from the last saved point instead of starting over. In Parsl, checkpointing saves the state of your work so you can resume it later if interrupted. +Checkpointing is like saving your progress in a video game. If something goes wrong, you can restart +from the last saved point instead of starting over. In Parsl, checkpointing saves the state of your +work so you can resume it later if interrupted. .. _concurrencyglossary: **Concurrency:** ------------------- -Concurrency means doing multiple things at the same time. In Parsl, it enables your apps to run in parallel across different resources, significantly speeding up program execution. It's like a chef preparing multiple dishes in a single kitchen, switching between all of them quickly. +Concurrency means doing multiple things at the same time. In Parsl, it enables your apps to run in +parallel across different resources, significantly speeding up program execution. It's like a chef +preparing multiple dishes in a single kitchen, switching between all of them quickly. .. _configurationglossary: **Configuration:** --------------------- -Configuration sets up the rules for how Parsl should work. It's like adjusting the settings on your phone – you can choose how you want things to look and behave. In Parsl, you can configure things like how many resources to use, where to store data, and how to handle errors. +Configuration sets up the rules for how Parsl should work. It's like adjusting the settings on your +phone – you can choose how you want things to look and behave. In Parsl, you can configure things +like how many resources to use, where to store data, and how to handle errors. .. 
_datafutureglossary: **DataFuture:** ------------------ -A DataFuture is a placeholder for a file that an app is creating. It's like a receipt for a package you're expecting – you get the receipt right away, but you have to wait for the package to arrive. Similarly, you get a DataFuture immediately when an app starts creating a file, but you have to wait for the file to be finished before you can use it. +A DataFuture is a placeholder for a file that an app is creating. It's like a receipt for a package +you're expecting – you get the receipt right away, but you have to wait for the package to arrive. +Similarly, you get a DataFuture immediately when an app starts creating a file, but you have to wait +for the file to be finished before you can use it. .. _dfkglossary: **DataFlowKernel (DFK):** ------------------------------ -The DataFlowKernel is like the brain of Parsl. It's the part that controls how your apps run and how they share information. It's like the conductor of an orchestra, making sure that all the musicians play together in harmony. +The DataFlowKernel is like the brain of Parsl. It's the part that controls how your apps run and how +they share information. It's like the conductor of an orchestra, making sure that all the musicians +play together in harmony. .. _elasticityglossary: **Elasticity:** ----------------- -Elasticity refers to the ability to scale resources up or down as needed. In Parsl, it allows you to add or remove blocks of computational resources based on workload demands. +Elasticity refers to the ability to scale resources up or down as needed. In Parsl, it allows you to +add or remove blocks of computational resources based on workload demands. .. _executionproviderglossary: **Execution Provider:** -------------------------- -An execution provider acts as a bridge between Parsl and the resources you want to use, such as your laptop, a cluster, or a cloud service. It handles communication with these resources to execute tasks. 
+An execution provider acts as a bridge between Parsl and the resources you want to use, such as your +laptop, a cluster, or a cloud service. It handles communication with these resources to execute +tasks. .. _executorglossary: **Executor:** ---------------- -An executor is a manager that determines which app runs on which resource and when. It directs the flow of apps to ensure efficient task execution. It's like a traffic controller, directing the flow of apps to make sure they all get where they need to go. +An executor is a manager that determines which app runs on which resource and when. It directs the +flow of apps to ensure efficient task execution. It's like a traffic controller, directing the flow +of apps to make sure they all get where they need to go. .. _futureglossary: **Future:** ------------- -A future is a placeholder for the result of a task that hasn't finished yet. Both AppFuture and DataFuture are types of Futures. You can use the ``.result()`` method to get the actual result when it's ready. +A future is a placeholder for the result of a task that hasn't finished yet. Both AppFuture and +DataFuture are types of Futures. You can use the ``.result()`` method to get the actual result when +it's ready. .. _jobglossary: **Job:** --------- -A job in Parsl is a unit of work submitted to an execution environment (such as a cluster or cloud) for processing. It can consist of one or more apps executed on computational resources. +A job in Parsl is a unit of work submitted to an execution environment (such as a cluster or cloud) +for processing. It can consist of one or more apps executed on computational resources. .. _launcherglossary: **Launcher:** ---------------- -A launcher in Parsl is responsible for placing the workers onto each computer, preparing them to run the apps. It’s like a bus driver who brings the players to the stadium, ensuring they are ready to start, but not directly involved in telling them what to do once they arrive. 
+A launcher in Parsl is responsible for placing the workers onto each computer, preparing them to run +the apps. It’s like a bus driver who brings the players to the stadium, ensuring they are ready to +start, but not directly involved in telling them what to do once they arrive. .. _managerglossary: **Manager:** -------------- -A manager in Parsl is responsible for overseeing the execution of tasks on specific compute resources. It's like a supervisor who ensures that all workers (or workers within a block) are carrying out their tasks correctly and efficiently. +A manager in Parsl is responsible for overseeing the execution of tasks on specific compute +resources. It's like a supervisor who ensures that all workers (or workers within a block) are +carrying out their tasks correctly and efficiently. .. _memoizationglossary: **Memoization:** ------------------- -Memoization is like remembering something so you don't have to do it again. In Parsl, if you are using memoization and you run an app with the same inputs multiple times, Parsl will remember the result from the first time and give it to you again instead of running the app again. This can save a lot of time. +Memoization is like remembering something so you don't have to do it again. In Parsl, if you are +using memoization and you run an app with the same inputs multiple times, Parsl will remember the +result from the first time and give it to you again instead of running the app again. This can save +a lot of time. -.. _mpiappglossary: +.. _mpiappglossary: **MPI App:** --------------- -An MPI app is a specialized app that uses the Message Passing Interface (MPI) for communication, which can occur both across nodes and within a single node. MPI enables different parts of the app to communicate and coordinate their activities, similar to how a walkie-talkie allows different teams to stay in sync. 
+An MPI app is a specialized app that uses the Message Passing Interface (MPI) for communication, +which can occur both across nodes and within a single node. MPI enables different parts of the app +to communicate and coordinate their activities, similar to how a walkie-talkie allows different +teams to stay in sync. .. _nodeglossary: **Node:** ------------ -A node in Parsl is like a workstation in a factory. It's a physical or virtual machine that provides the computational power needed to run tasks. Each node can host several workers that execute tasks. +A node in Parsl is like a workstation in a factory. It's a physical or virtual machine that provides +the computational power needed to run tasks. Each node can host several workers that execute tasks. .. _parallelismglossary: **Parallelism:** ------------------- -Parallelism means doing multiple things at the same time but not necessarily in the same location or using the same resources. In Parsl, it involves running apps simultaneously across different nodes or computational resources, accelerating program execution. Unlike concurrency which is like a chef preparing multiple dishes in a single kitchen, parallelism is like multiple chefs preparing different dishes in separate kitchens, at the same time. +Parallelism means doing multiple things at the same time but not necessarily in the same location or +using the same resources. In Parsl, it involves running apps simultaneously across different nodes +or computational resources, accelerating program execution. Unlike concurrency which is like a chef +preparing multiple dishes in a single kitchen, parallelism is like multiple chefs preparing +different dishes in separate kitchens, at the same time. -.. _parslscriptglossary: +.. _parslscriptglossary: **Parsl Script:** --------------------- -A Parsl script is a Python program that uses the Parsl library to define and run apps in parallel. 
It's like a recipe that tells you what ingredients to use and how to combine them. +A Parsl script is a Python program that uses the Parsl library to define and run apps in parallel. +It's like a recipe that tells you what ingredients to use and how to combine them. .. _pluginglossary: **Plugin:** --------------- -A plugin is an add-on for Parsl. It's a piece of code that you can add to Parsl to give it new features or change how it works. It's like an extra tool that you can add to your toolbox. +A plugin is an add-on for Parsl. It's a piece of code that you can add to Parsl to give it new +features or change how it works. It's like an extra tool that you can add to your toolbox. -.. _pythonappglossary: +.. _pythonappglossary: **Python App:** ------------------ -A Python app is a special kind of app in Parsl that's written as a Python function. It's a way to use Parsl to run your Python code in parallel. +A Python app is a special kind of app in Parsl that's written as a Python function. It's a way to +use Parsl to run your Python code in parallel. .. _resourceglossary: **Resource:** --------------- -A resource in Parsl refers to any computational asset that can be used to execute tasks, such as CPU cores, memory, or entire nodes. It's like the tools and materials you need to get a job done. Resources, often grouped in nodes or clusters, are essential for processing workloads. +A resource in Parsl refers to any computational asset that can be used to execute tasks, such as CPU +cores, memory, or entire nodes. It's like the tools and materials you need to get a job done. +Resources, often grouped in nodes or clusters, are essential for processing workloads. -.. _serializationglossary: +.. _serializationglossary: **Serialization:** -------------------- -Serialization is like packing your belongings into a suitcase so you can take them on a trip. In Parsl, it means converting your data into a format that can be sent over a network to another computer. 
+Serialization is like packing your belongings into a suitcase so you can take them on a trip. In +Parsl, it means converting your data into a format that can be sent over a network to another +computer. -.. _stagingglossary: +.. _stagingglossary: **Staging:** --------------- -Staging in Parsl involves moving data to the appropriate location before an app starts running and can also include moving data back after the app finishes. This process ensures that all necessary data is available where it needs to be for the app to execute properly and that the output data is returned to a specified location once the execution is complete. +Staging in Parsl involves moving data to the appropriate location before an app starts running and +can also include moving data back after the app finishes. This process ensures that all necessary +data is available where it needs to be for the app to execute properly and that the output data is +returned to a specified location once the execution is complete. .. _taskglossary: **Task:** ------------ -A task in Parsl is the execution of an app, it is the smallest unit of work that can be executed. It's like a single step in a larger process, where each task is part of a broader workflow or job. +A task in Parsl is the execution of an app; it is the smallest unit of work that can be executed. +It's like a single step in a larger process, where each task is part of a broader workflow or job. -.. _threadglossary: +.. _threadglossary: **Thread:** ------------- -A thread is like a smaller part of a program that can run independently. It's like a worker in a factory who can do their job at the same time as other workers. Threads are commonly used for parallelism within a single node. +A thread is like a smaller part of a program that can run independently. It's like a worker in a +factory who can do their job at the same time as other workers. Threads are commonly used for +parallelism within a single node. ..
_workerglossary: **Worker:** ------------- -A worker in Parsl is an independent process that runs on a node to execute tasks. Unlike threads, which share resources within a single process, workers operate as separate entities, each potentially handling different tasks on the same or different nodes. +A worker in Parsl is an independent process that runs on a node to execute tasks. Unlike threads, +which share resources within a single process, workers operate as separate entities, each +potentially handling different tasks on the same or different nodes. -.. _workflowglossary: +.. _workflowglossary: **Workflow:** ---------------- -A workflow is like a series of steps that you follow to complete a task. In Parsl, it's a way to describe how your apps should run and how they depend on each other, like a flowchart that shows you the order in which things need to happen. A workflow is typically expressed in a Parsl script, which is a Python program that leverages the Parsl library to orchestrate these tasks in a structured manner. - +A workflow is like a series of steps that you follow to complete a task. In Parsl, it's a way to +describe how your apps should run and how they depend on each other, like a flowchart that shows you +the order in which things need to happen. A workflow is typically expressed in a Parsl script, which +is a Python program that leverages the Parsl library to orchestrate these tasks in a structured +manner. diff --git a/docs/userguide/joins.rst b/docs/userguide/joins.rst index defb0ad012..07977cd160 100644 --- a/docs/userguide/joins.rst +++ b/docs/userguide/joins.rst @@ -3,51 +3,49 @@ Join Apps ========= -Join apps, defined with the ``@join_app`` decorator, are a form of app that can -launch other pieces of a workflow: for example a Parsl sub-workflow, or a task -that runs in some other system. 
+Join apps, defined with the ``@join_app`` decorator, are a form of app that can launch other pieces +of a workflow: for example a Parsl sub-workflow, or a task that runs in some other system. + Parsl sub-workflows ------------------- -One reason for launching Parsl apps from inside a join app, rather than -directly in the main workflow code, is because the definitions of those tasks -are not known well enough at the start of the workflow. +One reason for launching Parsl apps from inside a join app, rather than directly in the main +workflow code, is because the definitions of those tasks are not known well enough at the start of +the workflow. -For example, a workflow might run an expensive step to detect some objects -in an image, and then on each object, run a further expensive step. Because -the number of objects is not known at the start of the workflow, but instead -only after an expensive step has completed, the subsequent tasks cannot be -defined until after that step has completed. +For example, a workflow might run an expensive step to detect some objects in an image, and then on +each object, run a further expensive step. Because the number of objects is not known at the start +of the workflow, but instead only after an expensive step has completed, the subsequent tasks cannot +be defined until after that step has completed. -In simple cases, the main workflow script can be stopped using -``Future.result()`` and join apps are not necessary, but in more complicated -cases, that approach can severely limit concurrency. +In simple cases, the main workflow script can be stopped using ``Future.result()`` and join apps are +not necessary, but in more complicated cases, that approach can severely limit concurrency. 
-Join apps allow more naunced dependencies to be expressed that can help with: +Join apps allow more nuanced dependencies to be expressed that can help with: * increased concurrency - helping with strong scaling * more focused error propagation - allowing more of an ultimately failing workflow to complete * more useful monitoring information + Using Futures from other components ----------------------------------- -Sometimes, a workflow might need to incorporate tasks from other systems that -run asynchronously but do not need a Parsl worker allocated for their entire -run. An example of this is delegating some work into Globus Compute: work can -be given to Globus Compute, but Parsl does not need to keep a worker allocated -to that task while it runs. Instead, Parsl can be told to wait for the ``Future`` +Sometimes, a workflow might need to incorporate tasks from other systems that run asynchronously but +do not need a Parsl worker allocated for their entire run. An example of this is delegating some +work into Globus Compute: work can be given to Globus Compute, but Parsl does not need to keep a +worker allocated to that task while it runs. Instead, Parsl can be told to wait for the ``Future`` returned by Globus Compute to complete. + Usage ----- -A `join_app` looks quite like a `python_app`, but should return one or more -``Future`` objects, rather than a value. Once the Python code has run, the -app will wait for those Futures to complete without occuping a Parsl worker, -and when those Futures complete, their contents will be the return value -of the `join_app`. +A `join_app` looks quite like a `python_app`, but should return one or more ``Future`` objects, +rather than a value. Once the Python code has run, the app will wait for those Futures to complete +without occupying a Parsl worker, and when those Futures complete, their contents will be the return +value of the `join_app`. 
For example: @@ -64,12 +62,13 @@ For example: assert example.result() == 3 + Example of a Parsl sub-workflow ------------------------------- -This example workflow shows a preprocessing step, followed by -a middle stage that is chosen by the result of the pre-processing step -(either option 1 or option 2) followed by a know post-processing step. +This example workflow shows a preprocessing step, followed by a middle stage that is chosen by the +result of the pre-processing step (either option 1 or option 2) followed by a known post-processing +step. .. code-block:: python @@ -100,31 +99,27 @@ a middle stage that is chosen by the result of the pre-processing step * Why can't process be a regular python function? -``process`` needs to inspect the value of ``x`` to make a decision about -what app to launch. So it needs to defer execution until after the -pre-processing stage has completed. In Parsl, the way to defer that is -using apps: even though ``process`` is invoked at the start of the workflow, -it will execute later on, when the Future returned by ``pre_process`` has a -value. +``process`` needs to inspect the value of ``x`` to make a decision about what app to launch. So it +needs to defer execution until after the pre-processing stage has completed. In Parsl, the way to +defer that is using apps: even though ``process`` is invoked at the start of the workflow, it will +execute later on, when the Future returned by ``pre_process`` has a value. * Why can't process be a @python_app? -A Python app, if run in a `parsl.executors.ThreadPoolExecutor`, can launch -more parsl apps; so a ``python_app`` implementation of process() would be able -to inspect x and choose and invoke the appropriate ``option_{one, two}``. +A Python app, if run in a `parsl.executors.ThreadPoolExecutor`, can launch more parsl apps; so a +``python_app`` implementation of process() would be able to inspect x and choose and invoke the +appropriate ``option_{one, two}``.
-From launching the ``option_{one, two}`` app, the app body python code would -get a ``Future[int]`` - a ``Future`` that will eventually contain ``int``. +From launching the ``option_{one, two}`` app, the app body python code would get a ``Future[int]``, +a ``Future`` that will eventually contain ``int``. -But, we want to invoke ``post_process`` at submission time near the start of -workflow so that Parsl knows about as many tasks as possible. But we don't -want it to execute until the value of the chosen ``option_{one, two}`` app -is known. +But, we want to invoke ``post_process`` at submission time near the start of workflow so that Parsl +knows about as many tasks as possible. But we don't want it to execute until the value of the chosen +``option_{one, two}`` app is known. If we don't have join apps, how can we do this? -We could make process wait for ``option_{one, two}`` to complete, before -returning, like this: +We could make process wait for ``option_{one, two}`` to complete, before returning, like this: .. code-block:: python @@ -136,10 +131,10 @@ returning, like this: f = option_two(x) return f.result() -but this will block the worker running ``process`` until ``option_{one, two}`` -has completed. If there aren't enough workers to run ``option_{one, two}`` this -can even deadlock. (principle: apps should not wait on completion of other -apps and should always allow parsl to handle this through dependencies) +but this will block the worker running ``process`` until ``option_{one, two}`` has completed. If +there aren't enough workers to run ``option_{one, two}`` this can even deadlock. 
(principle: apps +should not wait on completion of other apps and should always allow parsl to handle this through +dependencies) We could make process return the ``Future`` to the main workflow thread: @@ -156,13 +151,13 @@ We could make process return the ``Future`` to the main workflow thread: # process(3) is a Future[Future[int]] -What comes out of invoking ``process(x)`` now is a nested ``Future[Future[int]]`` -- it's a promise that eventually process will give you a promise (from -``option_one, two}``) that will eventually give you an int. +What comes out of invoking ``process(x)`` now is a nested ``Future[Future[int]]`` - it's a promise +that eventually process will give you a promise (from ``option_{one, two}``) that will eventually +give you an int. -We can't pass that future into post_process... because post_process wants the -final int, and that future will complete before the int is ready, and that -(outer) future will have as its value the inner future (which won't be complete yet). +We can't pass that future into post_process... because post_process wants the final int, and that +future will complete before the int is ready, and that (outer) future will have as its value the +inner future (which won't be complete yet). So we could wait for the result in the main workflow thread: @@ -173,9 +168,8 @@ So we could wait for the result in the main workflow thread: result = post_process(f_inner) # result == "6" -But this now blocks the main workflow thread. If we really only need to run -these three lines, that's fine, but what about if we are in a for loop that -sets up 1000 parametrised iterations: +But this now blocks the main workflow thread. If we really only need to run these three lines, +that's fine, but what about if we are in a for loop that sets up 1000 parametrised iterations: ..
code-block:: python @@ -184,15 +178,13 @@ sets up 1000 parametrised iterations: f_inner = f_outer.result() # Future[int] result = post_process(f_inner) -The ``for`` loop can only iterate after pre_processing is done for each -iteration - it is unnecessarily serialised by the ``.result()`` call, -so that pre_processing cannot run in parallel. +The ``for`` loop can only iterate after pre_processing is done for each iteration - it is +unnecessarily serialised by the ``.result()`` call, so that pre_processing cannot run in parallel. -So, the rule about not calling ``.result()`` applies in the main workflow thread -too. +So, the rule about not calling ``.result()`` applies in the main workflow thread too. -What join apps add is the ability for parsl to unwrap that Future[Future[int]] into a -Future[int] in a "sensible" way (eg it doesn't need to block a worker). +What join apps add is the ability for parsl to unwrap that Future[Future[int]] into a Future[int] in +a "sensible" way (eg it doesn't need to block a worker). .. _label-join-globus-compute: @@ -200,23 +192,20 @@ Future[int] in a "sensible" way (eg it doesn't need to block a worker). Example of invoking a Futures-driven task from another system ------------------------------------------------------------- +This example shows launching some activity in another system, without occupying a Parsl worker while +that activity happens: in this example, work is delegated to Globus Compute, which performs the work +elsewhere. When the work is completed, Globus Compute will put the result into the future that it +returns, and then (because the Parsl app is a ``@join_app``), that result will be used as the result +of the Parsl app. -This example shows launching some activity in another system, without -occupying a Parsl worker while that activity happens: in this example, work is -delegated to Globus Compute, which performs the work elsewhere. 
When the work -is completed, Globus Compute will put the result into the future that it -returns, and then (because the Parsl app is a ``@join_app``), that result will -be used as the result of the Parsl app. - -As above, the motivation for doing this inside an app, rather than in the -top level is that sufficient information to launch the Globus Compute task -might not be available at start of the workflow. +As above, the motivation for doing this inside an app, rather than in the top level is that +sufficient information to launch the Globus Compute task might not be available at start of the +workflow. -This workflow will run a first stage, ``const_five``, on a Parsl worker, -then using the result of that stage, pass the result as a parameter to a -Globus Compute task, getting a ``Future`` from that submission. Then, the -results of the Globus Compute task will be passed onto a second Parsl -local task, ``times_two``. +This workflow will run a first stage, ``const_five``, on a Parsl worker, then using the result of +that stage, pass the result as a parameter to a Globus Compute task, getting a ``Future`` from that +submission. Then, the results of the Globus Compute task will be passed onto a second Parsl local +task, ``times_two``. .. code-block:: python diff --git a/docs/userguide/lifted_ops.rst b/docs/userguide/lifted_ops.rst index 6e258b9b62..392fe64e2f 100644 --- a/docs/userguide/lifted_ops.rst +++ b/docs/userguide/lifted_ops.rst @@ -3,21 +3,20 @@ Lifted operators ================ -Parsl allows some operators (``[]`` and ``.``) to be used on an AppFuture in -a way that makes sense with those operators on the eventually returned -result. +Parsl allows some operators (``[]`` and ``.``) to be used on an AppFuture in a way that makes sense +with those operators on the eventually returned result. 
+ Lifted [] operator ------------------ -When an app returns a complex structure such as a ``dict`` or a ``list``, -it is sometimes useful to pass an element of that structure to a subsequent -task, without waiting for that subsequent task to complete. +When an app returns a complex structure such as a ``dict`` or a ``list``, it is sometimes useful to +pass an element of that structure to a subsequent task, without waiting for that subsequent task to +complete. -To help with this, Parsl allows the ``[]`` operator to be used on an -`AppFuture`. This operator will return another `AppFuture` that will -complete after the initial future, with the result of ``[]`` on the value -of the initial future. +To help with this, Parsl allows the ``[]`` operator to be used on an `AppFuture`. This operator will +return another `AppFuture` that will complete after the initial future, with the result of ``[]`` on +the value of the initial future. The end result is that this assertion will hold: @@ -26,19 +25,17 @@ The end result is that this assertion will hold: fut = my_app() assert fut['x'].result() == fut.result()[x] -but more concurrency will be available, as execution of the main workflow -code will not stop to wait for ``result()`` to complete on the initial -future. +but more concurrency will be available, as execution of the main workflow code will not stop to wait +for ``result()`` to complete on the initial future. + +`AppFuture` does not implement other methods commonly associated with dicts and lists, such as +``len``, because those methods should return a specific type of result immediately, and that is not +possible when the results are not available until the future. -`AppFuture` does not implement other methods commonly associated with -dicts and lists, such as ``len``, because those methods should return a -specific type of result immediately, and that is not possible when the -results are not available until the future. 
+If a key does not exist in the returned result, then the exception will appear in the Future +returned by ``[]``, rather than at the point that the ``[]`` operator is applied. This is because +the valid values that can be used are not known until the underlying result is available. -If a key does not exist in the returned result, then the exception will -appear in the Future returned by ``[]``, rather than at the point that -the ``[]`` operator is applied. This is because the valid values that can -be used are not known until the underlying result is available. Lifted . operator ----------------- @@ -50,7 +47,6 @@ The ``.`` operator works similarly to ``[]`` described above: fut = my_app assert fut.x.result() == fut.result().x -Attributes beginning with ``_`` are not lifted as this usually indicates an -attribute that is used for internal purposes, and to try to avoid mixing -protocols (such as iteration in for loops) defined on AppFutures vs protocols -defined on the underlying result object. +Attributes beginning with ``_`` are not lifted as this usually indicates an attribute that is used +for internal purposes, and to try to avoid mixing protocols (such as iteration in for loops) defined +on AppFutures vs protocols defined on the underlying result object. diff --git a/docs/userguide/modularizing.rst b/docs/userguide/modularizing.rst index 93b23575b9..6fbab5da81 100644 --- a/docs/userguide/modularizing.rst +++ b/docs/userguide/modularizing.rst @@ -3,10 +3,10 @@ Structuring Parsl programs -------------------------- -Parsl programs can be developed in many ways. When developing a simple program it is -often convenient to include the app definitions and control logic in a single script. -However, as a program inevitably grows and changes, like any code, there are significant -benefits to be obtained by modularizing the program, including: +Parsl programs can be developed in many ways. 
When developing a simple program it is often +convenient to include the app definitions and control logic in a single script. However, as a +program inevitably grows and changes, like any code, there are significant benefits to be obtained +by modularizing the program, including: 1. Better readability 2. Logical separation of components (e.g., apps, config, and control logic) @@ -14,22 +14,19 @@ benefits to be obtained by modularizing the program, including: The following example illustrates how a Parsl project can be organized into modules. -The configuration(s) can be defined in a module or file (e.g., ``config.py``) -which can be imported into the control script depending on which execution resources -should be used. +The configuration(s) can be defined in a module or file (e.g., ``config.py``) which can be imported +into the control script depending on which execution resources should be used. .. literalinclude:: examples/config.py -Parsl apps can be defined in separate file(s) or module(s) (e.g., ``library.py``) -grouped by functionality. - +Parsl apps can be defined in separate file(s) or module(s) (e.g., ``library.py``) grouped by +functionality. .. literalinclude:: examples/library.py -Finally, the control logic for the Parsl program can then be implemented in a -separate file (e.g., ``run_increment.py``). This file must the import the -configuration from ``config.py`` before calling the ``increment`` app from -``library.py``: +Finally, the control logic for the Parsl program can then be implemented in a separate file (e.g., +``run_increment.py``). This file must import the configuration from ``config.py`` before calling +the ``increment`` app from ``library.py``: ..
literalinclude:: examples/run_increment.py @@ -40,3 +37,4 @@ Which produces the following output:: 2 + 1 = 3 3 + 1 = 4 4 + 1 = 5 + diff --git a/docs/userguide/monitoring.rst b/docs/userguide/monitoring.rst index 02b3177ca7..a11848e11f 100644 --- a/docs/userguide/monitoring.rst +++ b/docs/userguide/monitoring.rst @@ -1,25 +1,23 @@ Monitoring ========== -Parsl includes a monitoring system to capture task state as well as resource -usage over time. The Parsl monitoring system aims to provide detailed -information and diagnostic capabilities to help track the state of your -programs, down to the individual apps that are executed on remote machines. +Parsl includes a monitoring system to capture task state as well as resource usage over time. The +Parsl monitoring system aims to provide detailed information and diagnostic capabilities to help +track the state of your programs, down to the individual apps that are executed on remote machines. -The monitoring system records information to an SQLite database while a -workflow runs. This information can then be visualised in a web dashboard -using the ``parsl-visualize`` tool, or queried using SQL using regular -SQLite tools. +The monitoring system records information to an SQLite database while a workflow runs. This +information can then be visualised in a web dashboard using the ``parsl-visualize`` tool, or queried +using SQL using regular SQLite tools. Monitoring configuration ------------------------ -Parsl monitoring is only supported with the `parsl.executors.HighThroughputExecutor`. +Parsl monitoring is only supported with the `parsl.executors.HighThroughputExecutor`. -The following example shows how to enable monitoring in the Parsl -configuration. Here the `parsl.monitoring.MonitoringHub` is specified to use port -55055 to receive monitoring messages from workers every 10 seconds. +The following example shows how to enable monitoring in the Parsl configuration. 
Here the +`parsl.monitoring.MonitoringHub` is specified to use port 55055 to receive monitoring messages from +workers every 10 seconds. .. code-block:: python @@ -52,31 +50,34 @@ configuration. Here the `parsl.monitoring.MonitoringHub` is specified to use por Visualization ------------- -To run the web dashboard utility ``parsl-visualize`` you first need to install -its dependencies: +To run the web dashboard utility ``parsl-visualize`` you first need to install its dependencies:: $ pip install 'parsl[monitoring,visualization]' -To view the web dashboard while or after a Parsl program has executed, run -the ``parsl-visualize`` utility:: +To view the web dashboard while or after a Parsl program has executed, run the ``parsl-visualize`` +utility:: $ parsl-visualize -By default, this command expects that the default ``monitoring.db`` database is used -in the runinfo directory. Other databases can be loaded by passing -the database URI on the command line. For example, if the full path -to the database is ``/tmp/my_monitoring.db``, run:: +By default, this command expects that the default ``monitoring.db`` database is used in the runinfo +directory. Other databases can be loaded by passing the database URI on the command line. For +example, if the full path to the database is ``/tmp/my_monitoring.db``, run:: $ parsl-visualize sqlite:////tmp/my_monitoring.db -By default, the visualization web server listens on ``127.0.0.1:8080``. If the web server is deployed on a machine with a web browser, the dashboard can be accessed in the browser at ``127.0.0.1:8080``. If the web server is deployed on a remote machine, such as the login node of a cluster, you will need to use an ssh tunnel from your local machine to the cluster:: +By default, the visualization web server listens on ``127.0.0.1:8080``. If the web server is +deployed on a machine with a web browser, the dashboard can be accessed in the browser at +``127.0.0.1:8080``.
If the web server is deployed on a remote machine, such as the login node of a +cluster, you will need to use an ssh tunnel from your local machine to the cluster:: $ ssh -L 50000:127.0.0.1:8080 username@cluster_address This command will bind your local machine's port 50000 to the remote cluster's port 8080. -The dashboard can then be accessed via the local machine's browser at ``127.0.0.1:50000``. +The dashboard can then be accessed via the local machine's browser at ``127.0.0.1:50000``. -.. warning:: Alternatively you can deploy the visualization server on a public interface. However, first check that this is allowed by the cluster's security policy. The following example shows how to deploy the web server on a public port (i.e., open to Internet via ``public_IP:55555``):: +.. warning:: Alternatively you can deploy the visualization server on a public interface. However, +first check that this is allowed by the cluster's security policy. The following example shows how +to deploy the web server on a public port (i.e., open to Internet via ``public_IP:55555``):: $ parsl-visualize --listen 0.0.0.0 --port 55555 @@ -84,38 +85,39 @@ The dashboard can then be accessed via the local machine's browser at ``127.0.0. Workflows Page ^^^^^^^^^^^^^^ -The workflows page lists all Parsl workflows that have been executed with monitoring enabled -with the selected database. -It provides a high level summary of workflow state as shown below: +The workflows page lists all Parsl workflows that have been executed with monitoring enabled with +the selected database. It provides a high level summary of workflow state as shown below: .. image:: ../images/mon_workflows_page.png -Throughout the dashboard, all blue elements are clickable. For example, clicking a specific worklow +Throughout the dashboard, all blue elements are clickable. For example, clicking a specific workflow name from the table takes you to the Workflow Summary page described in the next section. 
+ Workflow Summary ^^^^^^^^^^^^^^^^ -The workflow summary page captures the run level details of a workflow, including start and end times -as well as task summary statistics. The workflow summary section is followed by the *App Summary* that lists -the various apps and invocation count for each. +The workflow summary page captures the run level details of a workflow, including start and end +times as well as task summary statistics. The workflow summary section is followed by the +*App Summary* that lists the various apps and the invocation count for each. .. image:: ../images/mon_workflow_summary.png The workflow summary also presents three different views of the workflow: -* Workflow DAG - with apps differentiated by colors: This visualization is useful to visually inspect the dependency - structure of the workflow. Hovering over the nodes in the DAG shows a tooltip for the app represented by the node and it's task ID. +* Workflow DAG - with apps differentiated by colors: This visualization is useful to visually +inspect the dependency structure of the workflow. Hovering over the nodes in the DAG shows a tooltip +for the app represented by the node and its task ID. .. image:: ../images/mon_task_app_grouping.png -* Workflow DAG - with task states differentiated by colors: This visualization is useful to identify what tasks have been completed, failed, or are currently pending. +* Workflow DAG - with task states differentiated by colors: This visualization is useful to identify +what tasks have been completed, failed, or are currently pending. .. image:: ../images/mon_task_state_grouping.png -* Workflow resource usage: This visualization provides resource usage information at the workflow level. - For example, cumulative CPU/Memory utilization across workers over time. +* Workflow resource usage: This visualization provides resource usage information at the workflow +level. For example, cumulative CPU/Memory utilization across workers over time. ..
image:: ../images/mon_resource_summary.png - diff --git a/docs/userguide/mpi_apps.rst b/docs/userguide/mpi_apps.rst index 82123123b6..939049feae 100644 --- a/docs/userguide/mpi_apps.rst +++ b/docs/userguide/mpi_apps.rst @@ -1,40 +1,44 @@ MPI and Multi-node Apps ======================= -The :class:`~parsl.executors.MPIExecutor` supports running MPI applications or other computations which can -run on multiple compute nodes. +The :class:`~parsl.executors.MPIExecutor` supports running MPI applications or other computations +which can run on multiple compute nodes. + Background ---------- -MPI applications run multiple copies of a program that complete a single task by -coordinating using messages passed within or across nodes. +MPI applications run multiple copies of a program that complete a single task by coordinating using +messages passed within or across nodes. -Starting MPI application requires invoking a "launcher" code (e.g., ``mpiexec``) -with options that define how the copies of a program should be distributed. +Starting an MPI application requires invoking a "launcher" code (e.g., ``mpiexec``) with options that +define how the copies of a program should be distributed. -The launcher includes options that control how copies of the program are distributed -across the nodes (e.g., how many copies per node) and -how each copy is configured (e.g., which CPU cores it can use). +The launcher includes options that control how copies of the program are distributed across the +nodes (e.g., how many copies per node) and how each copy is configured (e.g., which CPU cores it can +use). The options for launchers vary between MPI implementations and compute clusters. + Configuring ``MPIExecutor`` --------------------------- The :class:`~parsl.executors.MPIExecutor` is a wrapper over -:class:`~parsl.executors.high_throughput.executor.HighThroughputExecutor` -which eliminates options that are irrelevant for MPI applications.
+:class:`~parsl.executors.high_throughput.executor.HighThroughputExecutor` which eliminates options +that are irrelevant for MPI applications. Define a configuration for :class:`~parsl.executors.MPIExecutor` by -1. Setting ``max_workers_per_block`` to the maximum number of tasks to run per block of compute nodes. - This value is typically the number of nodes per block divided by the number of nodes per task. +1. Setting ``max_workers_per_block`` to the maximum number of tasks to run per block of compute + nodes. This value is typically the number of nodes per block divided by the number of nodes per + task. 2. Setting ``mpi_launcher`` to the launcher used for your application. 3. Specifying a provider that matches your cluster and use the :class:`~parsl.launchers.SimpleLauncher`, which will ensure that no Parsl processes are placed on the compute nodes. -An example for ALCF's Polaris supercomputer that will run 3 MPI tasks of 2 nodes each at the same time: +An example for ALCF's Polaris supercomputer that will run 3 MPI tasks of 2 nodes each at the same +time: .. code-block:: python @@ -63,17 +67,20 @@ An example for ALCF's Polaris supercomputer that will run 3 MPI tasks of 2 nodes .. warning:: Please note that ``Provider`` options that specify per-task or per-node resources, for example, ``SlurmProvider(cores_per_node=N, ...)`` should not be used with :class:`~parsl.executors.high_throughput.MPIExecutor`. - Parsl primarily uses a pilot job model and assumptions from that context do not translate to the MPI context. For - more info refer to : + Parsl primarily uses a pilot job model and assumptions from that context do not translate to the + MPI context. For more info refer to: `github issue #3006 `_ + Writing an MPI App ------------------ -:class:`~parsl.executors.high_throughput.MPIExecutor` can execute both Python or Bash Apps which invoke an MPI application. 
+:class:`~parsl.executors.high_throughput.MPIExecutor` can execute both Python and Bash Apps which +invoke an MPI application. -Create the app by first defining a function which includes ``parsl_resource_specification`` keyword argument. -The resource specification is a dictionary which defines the number of nodes and ranks used by the application: +Create the app by first defining a function which includes a ``parsl_resource_specification`` keyword +argument. The resource specification is a dictionary which defines the number of nodes and ranks +used by the application: .. code-block:: python @@ -83,10 +90,10 @@ The resource specification is a dictionary which defines the number of nodes and 'num_ranks': , # Number of ranks in total } -Then, replace the call to the MPI launcher with ``$PARSL_MPI_PREFIX``. -``$PARSL_MPI_PREFIX`` references an environmental variable which will be replaced with -the correct MPI launcher configured for the resource list provided when calling the function -and with options that map the task to nodes which Parsl knows to be available. +Then, replace the call to the MPI launcher with ``$PARSL_MPI_PREFIX``. ``$PARSL_MPI_PREFIX`` +references an environment variable which will be replaced with the correct MPI launcher configured +for the resource list provided when calling the function and with options that map the task to nodes +which Parsl knows to be available. The function can be a Bash app @@ -110,7 +117,8 @@ or a Python app: return proc.returncode -Run either App by calling with its arguments and a resource specification which defines how to execute it +Run either App by calling it with its arguments and a resource specification which defines how to +execute it: ..
code-block:: python @@ -123,6 +131,7 @@ Run either App by calling with its arguments and a resource specification which } future = lammps_mpi_application(File('in.file'), parsl_resource_specification=resource_spec) + Advanced: More Environment Variables ++++++++++++++++++++++++++++++++++++ @@ -131,7 +140,8 @@ can make their own MPI invocation using other environment variables. These other variables include versions of the launch command for different launchers -- ``PARSL_MPIEXEC_PREFIX``: mpiexec launch command which works for a large number of batch systems especially PBS systems +- ``PARSL_MPIEXEC_PREFIX``: mpiexec launch command which works for a large number of batch systems + especially PBS systems - ``PARSL_SRUN_PREFIX``: srun launch command for Slurm based clusters - ``PARSL_APRUN_PREFIX``: aprun launch command prefix for some Cray machines @@ -142,11 +152,12 @@ And the information used by Parsl when assembling the launcher commands: - ``PARSL_MPI_NODELIST``: List of assigned nodes separated by commas (Eg, NODE1,NODE2) - ``PARSL_RANKS_PER_NODE``: Number of ranks per node + Limitations +++++++++++ -Support for MPI tasks in HTEX is limited. It is designed for running many multi-node MPI applications within a single -batch job. +Support for MPI tasks in HTEX is limited. It is designed for running many multi-node MPI +applications within a single batch job. #. MPI tasks may not span across nodes from more than one block. #. Parsl does not correctly determine the number of execution slots per block (`Issue #1647 `_) diff --git a/docs/userguide/overview.rst b/docs/userguide/overview.rst index 073cc202e6..cd7eb2b6ee 100644 --- a/docs/userguide/overview.rst +++ b/docs/userguide/overview.rst @@ -1,199 +1,178 @@ Overview ======== -Parsl is designed to enable straightforward parallelism and orchestration of asynchronous -tasks into dataflow-based workflows, in Python. 
Parsl manages the concurrent execution of -these tasks across various computation resources, from laptops to supercomputers, -scheduling each task only when its dependencies (e.g., input data dependencies) are met. +Parsl is designed to enable straightforward parallelism and orchestration of asynchronous tasks into +dataflow-based workflows, in Python. Parsl manages the concurrent execution of these tasks across +various computation resources, from laptops to supercomputers, scheduling each task only when its +dependencies (e.g., input data dependencies) are met. Developing a Parsl program is a two-step process: -1. Define Parsl apps by annotating Python functions to indicate that they can be executed concurrently. -2. Use standard Python code to invoke Parsl apps, creating asynchronous tasks and adhering to dependencies defined between apps. +1. Define Parsl apps by annotating Python functions to indicate that they can be executed + concurrently. +2. Use standard Python code to invoke Parsl apps, creating asynchronous tasks and adhering to + dependencies defined between apps. -We aim in this section to provide a mental model of how Parsl programs behave. -We discuss how Parsl programs create concurrent tasks, how tasks communicate, -and the nature of the environment on which Parsl programs can perform -operations. In each case, we compare and contrast the behavior of Python -programs that use Parsl constructs with those of conventional Python -programs. +We aim in this section to provide a mental model of how Parsl programs behave. We discuss how Parsl +programs create concurrent tasks, how tasks communicate, and the nature of the environment on which +Parsl programs can perform operations. In each case, we compare and contrast the behavior of Python +programs that use Parsl constructs with those of conventional Python programs. .. note:: - The behavior of a Parsl program can vary in minor respects depending on the - Executor used (see :ref:`label-execution`). 
We focus here on the behavior seen when - using the recommended `parsl.executors.HighThroughputExecutor` (HTEX). + The behavior of a Parsl program can vary in minor respects depending on the Executor used (see + :ref:`label-execution`). We focus here on the behavior seen when using the recommended + `parsl.executors.HighThroughputExecutor` (HTEX). + Parsl and Concurrency --------------------- -Any call to a Parsl app creates a new task that executes concurrently with the -main program and any other task(s) that are currently executing. Different -tasks may execute on the same nodes or on different nodes, and on the same or -different computers. - -The Parsl execution model thus differs from the Python native execution model, -which is inherently sequential. A Python program that does not contain Parsl -constructs, or make use of other concurrency mechanisms, executes statements -one at a time, in the order that they appear in the program. This behavior is -illustrated in the following figure, which shows a Python program on the left -and, on the right, the statements executed over time when that program is run, -from top to bottom. Each time that the program calls a function, control passes -from the main program (in black) to the function (in red). Execution of the -main program resumes only after the function returns. +Any call to a Parsl app creates a new task that executes concurrently with the main program and any +other task(s) that are currently executing. Different tasks may execute on the same nodes or on +different nodes, and on the same or different computers. + +The Parsl execution model thus differs from the Python native execution model, which is inherently +sequential. A Python program that does not contain Parsl constructs, or make use of other +concurrency mechanisms, executes statements one at a time, in the order that they appear in the +program. 
This behavior is illustrated in the following figure, which shows a Python program on the +left and, on the right, the statements executed over time when that program is run, from top to +bottom. Each time that the program calls a function, control passes from the main program (in black) +to the function (in red). Execution of the main program resumes only after the function returns. .. image:: ../images/overview/python-concurrency.png :scale: 70 - :align: center - -In contrast, the Parsl execution model is inherently concurrent. Whenever a -program calls an app, a separate thread of execution is created, and the main -program continues without pausing. Thus in the example shown in the figure -below. There is initially a single task: the main program (black). The first -call to ``double`` creates a second task (red) and the second call to ``double`` -creates a third task (orange). The second and third task terminate as the -function that they execute returns. (The dashed lines represent the start and -finish of the tasks). The calling program will only block (wait) when it is -explicitly told to do so (in this case by calling ``result()``) + :align: center + +In contrast, the Parsl execution model is inherently concurrent. Whenever a program calls an app, a +separate thread of execution is created, and the main program continues without pausing. Thus, in the +example shown in the figure below, there is initially a single task: the main program (black). The +first call to ``double`` creates a second task (red) and the second call to ``double`` creates a +third task (orange). The second and third tasks terminate as the function that they execute returns. +(The dashed lines represent the start and finish of the tasks.) The calling program will only block +(wait) when it is explicitly told to do so (in this case by calling ``result()``). .. image:: ../images/overview/parsl-concurrency.png .. 
note:: - Note: We talk here about concurrency rather than parallelism for a reason. - Two activities are concurrent if they can execute at the same time. Two - activities occur in parallel if they do run at the same time. If a Parsl - program creates more tasks that there are available processors, not all - concurrent activities may run in parallel. + Note: We talk here about concurrency rather than parallelism for a reason. Two activities are + concurrent if they can execute at the same time. Two activities occur in parallel if they do run + at the same time. If a Parsl program creates more tasks than there are available processors, not + all concurrent activities may run in parallel. Parsl and Execution ------------------- -We have now seen that Parsl tasks are executed concurrently alongside the main -Python program and other Parsl tasks. We now turn to the question of how and -where are those tasks executed. Given the range of computers on which parallel -programs may be executed, Parsl allows tasks to be executed using different -executors (:py:class:`parsl.executors`). Executors are responsible for taking a queue of tasks and executing -them on local or remote resources. - -We briefly describe two of Parsl's most commonly used executors. -Other executors are described in :ref:`label-execution`. - -The `parsl.executors.HighThroughputExecutor` (HTEX) implements a *pilot job model* that enables -fine-grain task execution using across one or more provisioned nodes. -HTEX can be used on a single node (e.g., a laptop) and will make use of -multiple processes for concurrent execution. -As shown in the following figure, HTEX uses Parsl's provider abstraction (:py:class:`parsl.providers`) to -communicate with a resource manager (e.g., batch scheduler or cloud API) to -provision a set of nodes (e.g., Parsl will use Slurm’s sbatch command to request -nodes on a Slurm cluster) for the duration of execution. 
-HTEX deploys a lightweight worker agent on the nodes which subsequently connects -back to the main Parsl process. Parsl tasks are then sent from the main program -to the connected workers for execution and the results are sent back via the -same mechanism. This approach has a number of advantages over other methods: -it avoids long job scheduler queue delays by acquiring one set of resources -for the entire program and it allows for scheduling of many tasks on individual -nodes. +We have now seen that Parsl tasks are executed concurrently alongside the main Python program and +other Parsl tasks. We now turn to the question of how and where those tasks are executed. Given the +range of computers on which parallel programs may be executed, Parsl allows tasks to be executed +using different executors (:py:class:`parsl.executors`). Executors are responsible for taking a +queue of tasks and executing them on local or remote resources. + +We briefly describe two of Parsl's most commonly used executors. Other executors are described in +:ref:`label-execution`. + +The `parsl.executors.HighThroughputExecutor` (HTEX) implements a *pilot job model* that enables +fine-grain task execution across one or more provisioned nodes. HTEX can be used on a single +node (e.g., a laptop) and will make use of multiple processes for concurrent execution. As shown in +the following figure, HTEX uses Parsl's provider abstraction (:py:class:`parsl.providers`) to +communicate with a resource manager (e.g., batch scheduler or cloud API) to provision a set of nodes +(e.g., Parsl will use Slurm’s sbatch command to request nodes on a Slurm cluster) for the duration +of execution. HTEX deploys a lightweight worker agent on the nodes which subsequently connects back +to the main Parsl process. Parsl tasks are then sent from the main program to the connected workers +for execution and the results are sent back via the same mechanism. 
This approach has a number of +advantages over other methods: it avoids long job scheduler queue delays by acquiring one set of +resources for the entire program and it allows for scheduling of many tasks on individual nodes. .. image:: ../images/overview/htex-model.png -.. Note: - Note: when deploying HTEX, or any pilot job model such as the - WorkQueueExecutor, it is important that the worker nodes be able to connect - back to the main Parsl process. Thus, you should verify that there is network - connectivity between the workers and the Parsl process and ensure that the - correct network address is used by the workers. Parsl provides a helper - function to automatically detect network addresses - (`parsl.addresses.address_by_query`). - +.. note:: + Note: when deploying HTEX, or any pilot job model such as the WorkQueueExecutor, it is important + that the worker nodes be able to connect back to the main Parsl process. Thus, you should verify + that there is network connectivity between the workers and the Parsl process and ensure that the + correct network address is used by the workers. Parsl provides a helper function to automatically + detect network addresses (`parsl.addresses.address_by_query`). -The `parsl.executors.ThreadPoolExecutor` allows tasks to be executed on a pool of locally -accessible threads. As execution occurs on the same computer, on a pool of -threads forked from the main program, the tasks share memory with one another -(this is discussed further in the following sections). +The `parsl.executors.ThreadPoolExecutor` allows tasks to be executed on a pool of locally accessible +threads. As execution occurs on the same computer, on a pool of threads forked from the main program, +the tasks share memory with one another (this is discussed further in the following sections). Parsl and Communication ----------------------- -Parsl tasks typically need to communicate in order to perform useful work. 
-Parsl provides for two forms of communication: by parameter passing -and by file passing. -As described in the next section, Parsl programs may also communicate by -interacting with shared filesystems and services its environment. +Parsl tasks typically need to communicate in order to perform useful work. Parsl provides for two +forms of communication: by parameter passing and by file passing. As described in the next section, +Parsl programs may also communicate by interacting with shared filesystems and services in its +environment. + Parameter Passing ^^^^^^^^^^^^^^^^^ -The figure above illustrates communication via parameter passing. -The call ``double(3)`` to the app ``double`` in the main program creates a new task -and passes the parameter value, 3, to that new task. When the task completes -execution, its return value, 6, is returned to the main program. Similarly, the -second task is passed the value 5 and returns the value 10. In this case, the -parameters passed are simple primitive types (i.e., integers); however, complex -objects (e.g., Numpy Arrays, Pandas DataFrames, custom objects) can also be -passed to/from tasks. +The figure above illustrates communication via parameter passing. The call ``double(3)`` to the app +``double`` in the main program creates a new task and passes the parameter value, 3, to that new +task. When the task completes execution, its return value, 6, is returned to the main program. +Similarly, the second task is passed the value 5 and returns the value 10. In this case, the +parameters passed are simple primitive types (i.e., integers); however, complex objects (e.g., NumPy +arrays, Pandas DataFrames, custom objects) can also be passed to/from tasks. + File Passing ^^^^^^^^^^^^ -Parsl supports communication via files in both Bash apps and Python apps. 
-Files may be used in place of parameter passing for many reasons, such as for -apps are designed to support files, when data to be exchanged are large, -or when data cannot be easily serialized into Python objects. -As Parsl tasks may be executed on remote nodes, without shared file systems, -Parsl offers a Parsl :py:class:`parsl.data_provider.files.File` construct for location-independent reference -to files. Parsl will translate file objects to worker-accessible paths -when executing dependent apps. -Parsl is also able to transfer files in, out, and between Parsl -apps using one of several methods (e.g., FTP, HTTP(S), Globus and rsync). -To accommodate the asynchronous nature of file transfer, Parsl treats -data movement like a Parsl app, adding a dependency to the execution graph -and waiting for transfers to complete before executing dependent apps. -More information is provided in :ref:`label-data`). +Parsl supports communication via files in both Bash apps and Python apps. Files may be used in place +of parameter passing for many reasons, such as when apps are designed to support files, when data to +be exchanged are large, or when data cannot be easily serialized into Python objects. As Parsl tasks +may be executed on remote nodes, without shared file systems, Parsl offers a +:py:class:`parsl.data_provider.files.File` construct for location-independent reference to files. +Parsl will translate file objects to worker-accessible paths when executing dependent apps. Parsl is +also able to transfer files in, out, and between Parsl apps using one of several methods (e.g., FTP, +HTTP(S), Globus and rsync). To accommodate the asynchronous nature of file transfer, Parsl treats +data movement like a Parsl app, adding a dependency to the execution graph and waiting for transfers +to complete before executing dependent apps. More information is provided in :ref:`label-data`. 
+ Futures ^^^^^^^ -Communication via parameter and file passing also serves a second purpose, namely -synchronization. As we discuss in more detail in :ref:`label-futures`, a call to an -app returns a special object called a future that has a special unassigned -state until such time as the app returns, at which time it takes the return -value. (In the example program, two futures are thus created, d1 and d2.) The -AppFuture function result() blocks until the future to which it is applied takes -a value. Thus the print statement in the main program blocks until both child -tasks created by the calls to the double app return. The following figure -captures this behavior, with time going from left to right rather than top to -bottom as in the preceding figure. Task 1 is initially active as it starts -Tasks 2 and 3, then blocks as a result of calls to d1.result() and d2.result(), -and when those values are available, is active again. +Communication via parameter and file passing also serves a second purpose, namely synchronization. +As we discuss in more detail in :ref:`label-futures`, a call to an app returns a special object +called a future that has a special unassigned state until such time as the app returns, at which +time it takes the return value. (In the example program, two futures are thus created, d1 and d2.) +The AppFuture function result() blocks until the future to which it is applied takes a value. Thus +the print statement in the main program blocks until both child tasks created by the calls to the +double app return. The following figure captures this behavior, with time going from left to right +rather than top to bottom as in the preceding figure. Task 1 is initially active as it starts Tasks +2 and 3, then blocks as a result of calls to d1.result() and d2.result(), and when those values are +available, is active again. .. 
image:: ../images/overview/communication.png + The Parsl Environment --------------------- -Regular Python and Parsl-enhanced Python differ in terms of the environment in -which code executes. We use the term *environment* here to refer to the -variables and modules (the *memory environment*), the file system(s) -(the *file system environment*), and the services (the *service environment*) +Regular Python and Parsl-enhanced Python differ in terms of the environment in which code executes. +We use the term *environment* here to refer to the variables and modules (the *memory environment*), +the file system(s) (the *file system environment*), and the services (the *service environment*) that are accessible to a function. -An important question when it comes to understanding the behavior of Parsl -programs is the environment in which this new task executes: does it have the -same or different memory, file system, or service environment as its parent -task or any other task? The answer, depends on the executor used, and (in the -case of the file system environment) where the task executes. -Below we describe behavior for the most commonly used `parsl.executors.HighThroughputExecutor` -which is representative of all Parsl executors except the `parsl.executors.ThreadPoolExecutor`. +An important question when it comes to understanding the behavior of Parsl programs is the +environment in which this new task executes: does it have the same or different memory, file system, +or service environment as its parent task or any other task? The answer depends on the executor +used, and (in the case of the file system environment) where the task executes. Below we describe +behavior for the most commonly used `parsl.executors.HighThroughputExecutor`, which is representative +of all Parsl executors except the `parsl.executors.ThreadPoolExecutor`. + +.. 
warning:: + The `parsl.executors.ThreadPoolExecutor` behaves differently than other Parsl executors as it + allows tasks to share memory. -.. Warning: - The `parsl.executors.ThreadPoolExecutor` behaves differently than other Parsl executors as - it allows tasks to share memory. Memory environment -^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^ -In Python, the variables and modules that are accessible to a function are defined -by Python scoping rules, by which a function has access to both variables defined -within the function (*local* variables) and those defined outside the function -(*global* variables). Thus in the following code, the print statement in the -print_answer function accesses the global variable "answer", and we see as output -"the answer is 42." +In Python, the variables and modules that are accessible to a function are defined by Python scoping +rules, by which a function has access to both variables defined within the function (*local* +variables) and those defined outside the function (*global* variables). Thus in the following code, +the print statement in the print_answer function accesses the global variable "answer", and we see +as output "the answer is 42." .. code-block:: python @@ -204,20 +183,17 @@ print_answer function accesses the global variable "answer", and we see as outpu print_answer() +In Parsl (except when using the `parsl.executors.ThreadPoolExecutor`) a Parsl app is executed in a +distinct environment that only has access to local variables associated with the app function. Thus, +if the program above is executed with, say, the `parsl.executors.HighThroughputExecutor`, it will print +"the answer is 0" rather than "the answer is 42," because the print statement in print_answer does +not have access to the global variable that has been assigned the value 42. The program will run +without errors when using the `parsl.executors.ThreadPoolExecutor`. 
-In Parsl (except when using the `parsl.executors.ThreadPoolExecutor`) a Parsl app is executed -in a distinct environment that only has access to local variables associated -with the app function. Thus, if the program above is executed with say the -`parsl.executors.HighThroughputExecutor`, will print "the answer is 0" rather than "the answer -is 42," because the print statement in provide_answer does not have access to -the global variable that has been assigned the value 42. The program will -run without errors when using the `parsl.executors.ThreadPoolExecutor`. - -Similarly, the same scoping rules apply to import statements, and thus -the following program will run without errors with the `parsl.executors.ThreadPoolExecutor`, -but raise errors when run with any other executor, because the return statement -in ``ambiguous_double`` refers to a variable (factor) and a module (random) that are -not known to the function. +Similarly, the same scoping rules apply to import statements, and thus the following program will +run without errors with the `parsl.executors.ThreadPoolExecutor`, but raise errors when run with any +other executor, because the return statement in ``ambiguous_double`` refers to a variable (factor) +and a module (random) that are not known to the function. .. code-block:: python @@ -229,11 +205,9 @@ not known to the function. return x * random.random() * factor print(ambiguous_double(42)) - -To allow this program to run correctly with all Parsl executors, the random -library must be imported within the app, and the factor variable must be -passed as an argument, as follows. +To allow this program to run correctly with all Parsl executors, the random library must be imported +within the app, and the factor variable must be passed as an argument, as follows. .. code-block:: python @@ -248,15 +222,13 @@ passed as an argument, as follows. 
print(good_double(factor, 42)) -File system environment +File system environment ^^^^^^^^^^^^^^^^^^^^^^^ -In a regular Python program the environment that is accessible to a Python -program also includes the file system(s) of the computer on which it is -executing. -Thus in the following code, a value written to a file "answer.txt" in the -current directory can be retrieved by reading the same file, and the print -statement outputs "the answer is 42." +In a regular Python program the environment that is accessible to a Python program also includes the +file system(s) of the computer on which it is executing. Thus in the following code, a value written +to a file "answer.txt" in the current directory can be retrieved by reading the same file, and the +print statement outputs "the answer is 42." .. code-block:: python @@ -271,38 +243,38 @@ statement outputs "the answer is 42." print_answer_file() -The question of which file system environment is accessible to a Parsl app -depends on where the app executes. If two tasks run on nodes that share a -file system, then those tasks (e.g., tasks A and B in the figure below, -but not task C) share a file system environment. Thus the program above will -output "the answer is 42" if the parent task and the child task run on -nodes 1 and 2, but not if they run on nodes 2 and 3. +The question of which file system environment is accessible to a Parsl app depends on where the app +executes. If two tasks run on nodes that share a file system, then those tasks (e.g., tasks A and B +in the figure below, but not task C) share a file system environment. Thus the program above will +output "the answer is 42" if the parent task and the child task run on nodes 1 and 2, but not if +they run on nodes 2 and 3. .. 
image:: ../images/overview/filesystem.png :scale: 70 - :align: center + :align: center + Service Environment ^^^^^^^^^^^^^^^^^^^ -We use the term service environment to refer to network services that may be -accessible to a Parsl program, such as a Redis server or Globus data management -service. These services are accessible to any task. +We use the term service environment to refer to network services that may be accessible to a Parsl +program, such as a Redis server or Globus data management service. These services are accessible to +any task. + Environment Summary ^^^^^^^^^^^^^^^^^^^ -As we summarize in the table, if tasks execute with the `parsl.executors.ThreadPoolExecutor`, -they share the memory and file system environment of the parent task. If they -execute with any other executor, they have a separate memory environment, and -may or may not share their file system environment with other tasks, depending -on where they are placed. All tasks typically have access to the same network -services. +As we summarize in the table, if tasks execute with the `parsl.executors.ThreadPoolExecutor`, they +share the memory and file system environment of the parent task. If they execute with any other +executor, they have a separate memory environment, and may or may not share their file system +environment with other tasks, depending on where they are placed. All tasks typically have access to +the same network services. 
+--------------------+--------------------+--------------------+---------------------------+------------------+ | | Share memory | Share file system | Share file system | Share service | | | environment with | environment with | environment with other | environment | -| | parent/other tasks | parent | tasks | with other tasks | +| | parent/other tasks | parent | tasks | with other tasks | +====================+====================+====================+===========================+==================+ +--------------------+--------------------+--------------------+---------------------------+------------------+ | Python | Yes | Yes | N/A | N/A | diff --git a/docs/userguide/parsl_perf.rst b/docs/userguide/parsl_perf.rst index 2ea1adb00f..47e5fefeee 100644 --- a/docs/userguide/parsl_perf.rst +++ b/docs/userguide/parsl_perf.rst @@ -3,25 +3,20 @@ Measuring performance with parsl-perf ===================================== -``parsl-perf`` is tool for making basic performance measurements of Parsl -configurations. +``parsl-perf`` is a tool for making basic performance measurements of Parsl configurations. -It runs increasingly large numbers of no-op apps until a batch takes -(by default) 120 seconds, giving a measurement of tasks per second. +It runs increasingly large numbers of no-op apps until a batch takes (by default) 120 seconds, +giving a measurement of tasks per second. -This can give a basic measurement of some of the overheads in task -execution. +This can give a basic measurement of some of the overheads in task execution. -``parsl-perf`` must be invoked with a configuration file, which is a Python -file containing a variable ``config`` which contains a `Config` object, or -a function ``fresh_config`` which returns a `Config` object. The -``fresh_config`` format is the same as used with the pytest test suite. 
+``parsl-perf`` must be invoked with a configuration file, which is a Python file containing a +variable ``config`` which contains a `Config` object, or a function ``fresh_config`` which returns a +`Config` object. The ``fresh_config`` format is the same as used with the pytest test suite. -To specify a ``parsl_resource_specification`` for tasks, add a ``--resources`` -argument. +To specify a ``parsl_resource_specification`` for tasks, add a ``--resources`` argument. -To change the target runtime from the default of 120 seconds, add a -``--time`` parameter. +To change the target runtime from the default of 120 seconds, add a ``--time`` parameter. For example: @@ -50,4 +45,4 @@ For example: Tasks per second: 158.184 Cleaning up DFK The end - + diff --git a/docs/userguide/plugins.rst b/docs/userguide/plugins.rst index cd9244960c..8cad7c34f2 100644 --- a/docs/userguide/plugins.rst +++ b/docs/userguide/plugins.rst @@ -1,106 +1,101 @@ Plugins ======= -Parsl has several places where code can be plugged in. Parsl usually provides -several implementations that use each plugin point. +Parsl has several places where code can be plugged in. Parsl usually provides several +implementations that use each plugin point. + +This page gives a brief summary of those places and why you might want to use them, with links to +the API guide. -This page gives a brief summary of those places and why you might want -to use them, with links to the API guide. Executors --------- -When the parsl dataflow kernel is ready for a task to run, it passes that -task to an `ParslExecutor`. The executor is then responsible for running the task's -Python code and returning the result. This is the abstraction that allows one -executor to run code on the local submitting host, while another executor can -run the same code on a large supercomputer. + +When the parsl dataflow kernel is ready for a task to run, it passes that task to a `ParslExecutor`. 
+The executor is then responsible for running the task's Python code and returning the result. This +is the abstraction that allows one executor to run code on the local submitting host, while another +executor can run the same code on a large supercomputer. Providers and Launchers ----------------------- -Some executors are based on blocks of workers (for example the -`parsl.executors.HighThroughputExecutor`: the submit side requires a -batch system (eg slurm, kubernetes) to start worker processes, which then +Some executors are based on blocks of workers (for example the `parsl.executors.HighThroughputExecutor`): +the submit side requires a batch system (e.g., Slurm, Kubernetes) to start worker processes, which then execute tasks. -The particular way in which a system makes those workers start is implemented -by providers and launchers. +The particular way in which a system makes those workers start is implemented by providers and +launchers. -An `ExecutionProvider` allows a command line to be submitted as a request to the -underlying batch system to be run inside an allocation of nodes. +An `ExecutionProvider` allows a command line to be submitted as a request to the underlying batch +system to be run inside an allocation of nodes. + +A `Launcher` modifies that command line when run inside the allocation to add on any wrappers that +are needed to launch the command (e.g., srun inside Slurm). Providers and launchers are usually paired +together for a particular system type. -A `Launcher` modifies that command line when run inside the allocation to -add on any wrappers that are needed to launch the command (eg srun inside -slurm). Providers and launchers are usually paired together for a particular -system type. File staging ------------ -Parsl can copy input files from an arbitrary URL into a task's working -environment, and copy output files from a task's working environment to -an arbitrary URL. 
A small set of data staging providers is installed by default, -for ``file://`` ``http://`` and ``ftp://`` URLs. More data staging providers can -be added in the workflow configuration, in the ``storage`` parameter of the -relevant `ParslExecutor`. Each provider should subclass the `Staging` class. + +Parsl can copy input files from an arbitrary URL into a task's working environment, and copy output +files from a task's working environment to an arbitrary URL. A small set of data staging providers +is installed by default, for ``file://``, ``http://``, and ``ftp://`` URLs. More data staging +providers can be added in the workflow configuration, in the ``storage`` parameter of the relevant +`ParslExecutor`. Each provider should subclass the `Staging` class. Default stdout/stderr name generation ------------------------------------- -Parsl can choose names for your bash apps stdout and stderr streams -automatically, with the parsl.AUTO_LOGNAME parameter. The choice of path is -made by a function which can be configured with the ``std_autopath`` -parameter of Parsl `Config`. By default, ``DataFlowKernel.default_std_autopath`` + +Parsl can choose names for your bash apps' stdout and stderr streams automatically, with the +``parsl.AUTO_LOGNAME`` parameter. The choice of path is made by a function which can be configured with +the ``std_autopath`` parameter of Parsl `Config`. By default, ``DataFlowKernel.default_std_autopath`` will be used. Memoization/checkpointing ------------------------- -When parsl memoizes/checkpoints an app parameter, it does so by computing a -hash of that parameter that should be the same if that parameter is the same -on subsequent invocations. 
This isn't straightforward to do for arbitrary 
This isn't straightforward to do for arbitrary -objects, so parsl implements a checkpointing hash function for a few common -types, and raises an exception on unknown types: +When parsl memoizes/checkpoints an app parameter, it does so by computing a hash of that parameter +that should be the same if that parameter is the same on subsequent invocations. This isn't +straightforward to do for arbitrary objects, so parsl implements a checkpointing hash function for a +few common types, and raises an exception on unknown types: .. code-block:: ValueError("unknown type for memoization ...") -You can plug in your own type-specific hash code for additional types that -you need and understand using `id_for_memo`. +You can plug in your own type-specific hash code for additional types that you need and understand +using `id_for_memo`. Invoking other asynchronous components -------------------------------------- -Parsl code can invoke other asynchronous components which return Futures, and -integrate those Futures into the task graph: Parsl apps can be given any -`concurrent.futures.Future` as a dependency, even if those futures do not come -from invoking a Parsl app. This includes as the return value of a +Parsl code can invoke other asynchronous components which return Futures, and integrate those +Futures into the task graph: Parsl apps can be given any `concurrent.futures.Future` as a dependency, +even if those futures do not come from invoking a Parsl app. This includes as the return value of a ``join_app``. -An specific example of this is integrating Globus Compute tasks into a Parsl -task graph. See :ref:`label-join-globus-compute` +An specific example of this is integrating Globus Compute tasks into a Parsl task graph. See :ref:`label-join-globus-compute` + Dependency resolution --------------------- -When Parsl examines the arguments to an app, it uses a `DependencyResolver`. 
-The default `DependencyResolver` will cause Parsl to wait for -``concurrent.futures.Future`` instances (including `AppFuture` and -`DataFuture`), and pass through other arguments without waiting. +When Parsl examines the arguments to an app, it uses a `DependencyResolver`. The default +`DependencyResolver` will cause Parsl to wait for ``concurrent.futures.Future`` instances (including +`AppFuture` and `DataFuture`), and pass through other arguments without waiting. + +This behaviour is pluggable: Parsl comes with another dependency resolver, `DEEP_DEPENDENCY_RESOLVER`, +which knows about futures contained within structures such as tuples, lists, sets and dicts. -This behaviour is pluggable: Parsl comes with another dependency resolver, -`DEEP_DEPENDENCY_RESOLVER` which knows about futures contained with structures -such as tuples, lists, sets and dicts. +This plugin interface might be used to interface other task-like or future-like objects to the Parsl +dependency mechanism, by describing how they can be interpreted as a Future. -This plugin interface might be used to interface other task-like or future-like -objects to the Parsl dependency mechanism, by describing how they can be -interpreted as a Future. Removed interfaces ------------------ Parsl had a deprecated ``Channel`` abstraction. See -`issue 3515 `_ -for further discussion on its removal. +`issue 3515 `_ for further discussion on its removal. 
diff --git a/docs/userguide/usage_tracking.rst b/docs/userguide/usage_tracking.rst index da8ac9b79d..b835fe0d34 100644 --- a/docs/userguide/usage_tracking.rst +++ b/docs/userguide/usage_tracking.rst @@ -3,29 +3,42 @@ Usage Statistics Collection =========================== -Parsl uses an **Opt-in** model for usage tracking, allowing users to decide if they wish to participate. Usage statistics are crucial for improving software reliability and help focus development and maintenance efforts on the most used components of Parsl. 
The collected data is used solely for enhancements and reporting and is not shared in its raw form outside of the Parsl team. +Parsl uses an **Opt-in** model for usage tracking, allowing users to decide if they wish to +participate. Usage statistics are crucial for improving software reliability and help focus +development and maintenance efforts on the most used components of Parsl. The collected data is used +solely for enhancements and reporting and is not shared in its raw form outside of the Parsl team. + Why are we doing this? ---------------------- -The Parsl development team relies on funding from government agencies. To sustain this funding and advocate for continued support, it is essential to show that the research community benefits from these investments. +The Parsl development team relies on funding from government agencies. To sustain this funding and +advocate for continued support, it is essential to show that the research community benefits from +these investments. + +By opting in to share usage data, you actively support the ongoing development and maintenance of +Parsl. (See :ref:`What is sent? ` below). -By opting in to share usage data, you actively support the ongoing development and maintenance of Parsl. (See:ref:`What is sent? ` below). Opt-In Model ------------ -We use an **opt-in model** for usage tracking to respect user privacy and provide full control over shared information. We hope that developers and researchers will choose to send us this information. The reason is that we need this data - it is a requirement for funding. +We use an **opt-in model** for usage tracking to respect user privacy and provide full control over +shared information. We hope that developers and researchers will choose to send us this information. +The reason is that we need this data - it is a requirement for funding. Choose the data you share with Usage Tracking Levels.
**Usage Tracking Levels:** -* **Level 1:** Only basic information such as Python version, Parsl version, and platform name (Linux, MacOS, etc.) -* **Level 2:** Level 1 information and configuration information including provider, executor, and launcher names. -* **Level 3:** Level 2 information and workflow execution details, including the number of applications run, failures, and execution time. +* **Level 1:** Only basic information such as Python version, Parsl version, and platform name + (Linux, MacOS, etc.) +* **Level 2:** Level 1 information and configuration information including provider, executor, and + launcher names. +* **Level 3:** Level 2 information and workflow execution details, including the number of + applications run, failures, and execution time. -By enabling usage tracking, you support Parsl's development. +By enabling usage tracking, you support Parsl's development. **To opt-in, set** ``usage_tracking`` **to the desired level (1, 2, or 3) in the configuration object** (``parsl.config.Config``) **.** @@ -42,6 +55,7 @@ Example: usage_tracking=3 ) + .. _what-is-sent: What is sent? @@ -49,9 +63,12 @@ What is sent? The data collected depends on the tracking level selected: -* **Level 1:** Only basic information such as Python version, Parsl version, and platform name (Linux, MacOS, etc.) -* **Level 2:** Level 1 information and configuration information including provider, executor, and launcher names. -* **Level 3:** Level 2 information and workflow execution details, including the number of applications run, failures, and execution time. +* **Level 1:** Only basic information such as Python version, Parsl version, and platform name + (Linux, MacOS, etc.) +* **Level 2:** Level 1 information and configuration information including provider, executor, and + launcher names. +* **Level 3:** Level 2 information and workflow execution details, including the number of + applications run, failures, and execution time. 
**Example Messages:** @@ -101,12 +118,17 @@ The data collected depends on the tracking level selected: **All messages sent are logged in the** ``parsl.log`` **file, ensuring complete transparency.** + How is the data sent? --------------------- -Data is sent using **UDP** to minimize the impact on workflow performance. While this may result in some data loss, it significantly reduces the chances of usage tracking affecting the software's operation. +Data is sent using **UDP** to minimize the impact on workflow performance. While this may result in +some data loss, it significantly reduces the chances of usage tracking affecting the software's +operation. + +The data is processed through AWS CloudWatch to generate a monitoring dashboard, providing valuable +insights into usage patterns. -The data is processed through AWS CloudWatch to generate a monitoring dashboard, providing valuable insights into usage patterns. When is the data sent? ---------------------- @@ -116,10 +138,12 @@ Data is sent twice per run: 1. At the start of the script. 2. Upon script completion (for Tracking Level 3). + What will the data be used for? ------------------------------- -The data will help the Parsl team understand Parsl usage and make development and maintenance decisions, including: +The data will help the Parsl team understand Parsl usage and make development and maintenance +decisions, including: * Focus development and maintenance on the most-used components of Parsl. * Determine which Python versions to continue supporting. @@ -127,10 +151,13 @@ The data will help the Parsl team understand Parsl usage and make development an * Assess how long it takes for most users to adopt new changes. * Track usage statistics to report to funders. + Usage Statistics Dashboard -------------------------- -The collected data is aggregated and displayed on a publicly accessible dashboard. 
This dashboard provides an overview of how Parsl is being used across different environments and includes metrics such as: +The collected data is aggregated and displayed on a publicly accessible dashboard. This dashboard +provides an overview of how Parsl is being used across different environments and includes metrics +such as: * Total workflows executed over time * Most-used Python and Parsl versions @@ -138,6 +165,7 @@ The collected data is aggregated and displayed on a publicly accessible dashboar `Find the dashboard here `_ + Leaderboard ----------- @@ -160,12 +188,14 @@ Example: project_name="my-test-project" ) -Every run of parsl with usage tracking **Level 1** or **Level 2** earns you **1 point**. And every run with usage tracking **Level 3**, earns you **2 points**. - +Every run of Parsl with usage tracking **Level 1** or **Level 2** earns you **1 point**, and every +run with usage tracking **Level 3** earns you **2 points**. + + Feedback -------- -Please send us your feedback at parsl@googlegroups.com. Feedback from our user communities will be +Please send us your feedback at parsl@googlegroups.com. Feedback from our user communities will be useful in determining our path forward with usage tracking in the future. **Please consider turning on usage tracking to support the continued development of Parsl.** diff --git a/docs/userguide/workflow.rst b/docs/userguide/workflow.rst index 2a0a2c8c28..011222970e 100644 --- a/docs/userguide/workflow.rst +++ b/docs/userguide/workflow.rst @@ -4,28 +4,26 @@ Example parallel patterns ========================= Parsl can be used to implement a wide range of parallel programming patterns, from bag of tasks -through to nested workflows.
Parsl implicitly assembles a dataflow dependency graph based on the +data shared between apps. The flexibility of this model allows for the implementation of a wide +range of parallel programming and workflow patterns. -Parsl is also designed to address broad execution requirements, from programs -that run many short tasks to those that run a few long tasks. +Parsl is also designed to address broad execution requirements, from programs that run many short +tasks to those that run a few long tasks. -Below we illustrate a range of parallel programming and workflow patterns. It is important -to note that this set of examples is by no means comprehensive. +Below we illustrate a range of parallel programming and workflow patterns. It is important to note +that this set of examples is by no means comprehensive. Bag of Tasks ------------ -Parsl can be used to execute a large bag of tasks. In this case, Parsl -assembles the set of tasks (represented as Parsl apps) and manages their concurrent -execution on available resources. +Parsl can be used to execute a large bag of tasks. In this case, Parsl assembles the set of tasks +(represented as Parsl apps) and manages their concurrent execution on available resources. .. code-block:: python from parsl import python_app - + parsl.load() # Map function that returns double the input integer @@ -39,21 +37,24 @@ execution on available resources. x = app_random() results.append(x) - for r in results: + for r in results: print(r.result()) Sequential workflows -------------------- -Sequential workflows can be created by passing an AppFuture from one task to another. For example, in the following program the ``generate`` app (a Python app) generates a random number that is consumed by the ``save`` app (a Bash app), which writes it to a file. Because ``save`` cannot execute until it receives the ``message`` produced by ``generate``, the two apps execute in sequence. 
+Sequential workflows can be created by passing an AppFuture from one task to another. For example, +in the following program the ``generate`` app (a Python app) generates a random number that is +consumed by the ``save`` app (a Bash app), which writes it to a file. Because ``save`` cannot +execute until it receives the ``message`` produced by ``generate``, the two apps execute in sequence. .. code-block:: python from parsl import python_app - + parsl.load() - + # Generate a random number @python_app def generate(limit): @@ -77,14 +78,18 @@ Sequential workflows can be created by passing an AppFuture from one task to ano Parallel workflows ------------------ -Parallel execution occurs automatically in Parsl, respecting dependencies among app executions. In the following example, three instances of the ``wait_sleep_double`` app are created. The first two execute concurrently, as they have no dependencies; the third must wait until the first two complete and thus the ``doubled_x`` and ``doubled_y`` futures have values. Note that this sequencing occurs even though ``wait_sleep_double`` does not in fact use its second and third arguments. +Parallel execution occurs automatically in Parsl, respecting dependencies among app executions. In +the following example, three instances of the ``wait_sleep_double`` app are created. The first two +execute concurrently, as they have no dependencies; the third must wait until the first two complete +and thus the ``doubled_x`` and ``doubled_y`` futures have values. Note that this sequencing occurs +even though ``wait_sleep_double`` does not in fact use its second and third arguments. .. 
code-block:: python - + from parsl import python_app parsl.load() - + @python_app def wait_sleep_double(x, foo_1, foo_2): import time @@ -109,14 +114,15 @@ Parallel execution occurs automatically in Parsl, respecting dependencies among Parallel workflows with loops ----------------------------- -A common approach to executing Parsl apps in parallel is via loops. The following example uses a loop to create many random numbers in parallel. +A common approach to executing Parsl apps in parallel is via loops. The following example uses a +loop to create many random numbers in parallel. .. code-block:: python from parsl import python_app - + parsl.load() - + @python_app def generate(limit): """Generate a random integer and return it""" @@ -149,15 +155,18 @@ The :class:`~parsl.concurrent.ParslPoolExecutor` simplifies this pattern using t outputs = pool.map(generate, range(1, 5)) -In the preceding example, the execution of different tasks is coordinated by passing Python objects from producers to consumers. -In other cases, it can be convenient to pass data in files, as in the following reformulation. Here, a set of files, each with a random number, is created by the ``generate`` app. These files are then concatenated into a single file, which is subsequently used to compute the sum of all numbers. +In the preceding example, the execution of different tasks is coordinated by passing Python objects +from producers to consumers. In other cases, it can be convenient to pass data in files, as in the +following reformulation. Here, a set of files, each with a random number, is created by the +``generate`` app. These files are then concatenated into a single file, which is subsequently used +to compute the sum of all numbers. .. 
code-block:: python from parsl import python_app, bash_app - + parsl.load() - + @bash_app def generate(outputs=()): return 'echo $(( RANDOM % (10 - 5 + 1 ) + 5 )) &> {}'.format(outputs[0]) @@ -190,16 +199,15 @@ In other cases, it can be convenient to pass data in files, as in the following MapReduce --------- -MapReduce is a common pattern used in data analytics. It is composed of a map phase -that filters values and a reduce phase that aggregates values. -The following example demonstrates how Parsl can be used to specify a MapReduce computation -in which the map phase doubles a set of input integers and the reduce phase computes -the sum of those results. +MapReduce is a common pattern used in data analytics. It is composed of a map phase that filters +values and a reduce phase that aggregates values. The following example demonstrates how Parsl can +be used to specify a MapReduce computation in which the map phase doubles a set of input integers +and the reduce phase computes the sum of those results. .. code-block:: python from parsl import python_app - + parsl.load() # Map function that returns double the input integer @@ -226,18 +234,20 @@ the sum of those results. print(total.result()) -The program first defines two Parsl apps, ``app_double`` and ``app_sum``. -It then makes calls to the ``app_double`` app with a set of input -values. It then passes the results from ``app_double`` to the ``app_sum`` app -to aggregate values into a single result. -These tasks execute concurrently, synchronized by the ``mapped_results`` variable. -The following figure shows the resulting task graph. +The program first defines two Parsl apps, ``app_double`` and ``app_sum``. It then makes calls to the +``app_double`` app with a set of input values. It then passes the results from ``app_double`` to the +``app_sum`` app to aggregate values into a single result. These tasks execute concurrently, +synchronized by the ``mapped_results`` variable. 
The following figure shows the resulting task +graph. .. image:: ../images/MapReduce.png + Caching expensive initialisation between tasks ---------------------------------------------- -Many tasks in workflows require a expensive "initialization" steps that, once performed, can be used across successive invocations for that task. For example, you may want to reuse a machine learning model for multiple interface tasks and avoid loading it onto GPUs more than once. +Many tasks in workflows require expensive "initialization" steps that, once performed, can be used +across successive invocations for that task. For example, you may want to reuse a machine learning +model for multiple inference tasks and avoid loading it onto GPUs more than once. `This ExaWorks tutorial `_ gives examples of how to do this. From c03a33ccf8dcdff739d987d0792f37076828ec0d Mon Sep 17 00:00:00 2001 From: astro-friedel Date: Tue, 10 Dec 2024 09:15:37 -0600 Subject: [PATCH 2/2] bug fixes --- .gitignore | 4 ++++ docs/Makefile | 4 ++-- docs/userguide/configuring.rst | 26 ++++++++++++-------------- docs/userguide/monitoring.rst | 16 ++++++++-------- 4 files changed, 26 insertions(+), 24 deletions(-) diff --git a/.gitignore b/.gitignore index 8811016b83..22752d0cdb 100644 --- a/.gitignore +++ b/.gitignore @@ -121,3 +121,7 @@ ENV/ # emacs buffers \#* + +docs/stubs/ + +docs/1-parsl-introduction.ipynb diff --git a/docs/Makefile b/docs/Makefile index 6e79b1697f..54ef828d16 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -28,8 +28,8 @@ help: @echo " epub to make an epub" @echo " epub3 to make an epub3" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" - @echo " latexpdf to make LaTeX files and run them through pdflatex" - @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" + @echo " latexpdf to make LaTeX files and run them through pdflatex (currently does not work)" + @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx
(currently does not work)" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" diff --git a/docs/userguide/configuring.rst b/docs/userguide/configuring.rst index 7eb7345c17..3a58ce94c1 100644 --- a/docs/userguide/configuring.rst +++ b/docs/userguide/configuring.rst @@ -85,7 +85,7 @@ How to Configure .. note:: All configuration examples below must be customized for the user's allocation, Python environment, -file system, etc. + file system, etc. The configuration specifies what, and how, resources are to be used for executing the Parsl program @@ -185,7 +185,7 @@ Stepping through the following question should help formulate a suitable configu .. note:: If using a Cray system, you most likely need to use the `parsl.launchers.AprunLauncher` to launch -workers unless you are on a **native Slurm** system like :ref:`configuring_nersc_cori` + workers unless you are on a **native Slurm** system like :ref:`configuring_nersc_cori` Heterogeneous Resources @@ -285,8 +285,8 @@ Then add the following to your config: .. note:: There will be a noticeable delay the first time Work Queue sees an app; it is creating and -packaging a complete Python environment. This packaged environment is cached, so subsequent app -invocations should be much faster. + packaging a complete Python environment. This packaged environment is cached, so subsequent app + invocations should be much faster. Using this approach, it is possible to run Parsl applications on nodes that don't have Python available at all. The packaged environment includes a Python interpreter, and Work Queue does not @@ -294,8 +294,8 @@ require Python to run. .. note:: The automatic packaging feature only supports packages installed via ``pip`` or ``conda``. -Importing from other locations (e.g. via ``$PYTHONPATH``) or importing other modules in the same -directory is not supported. + Importing from other locations (e.g. 
via ``$PYTHONPATH``) or importing other modules in the same + directory is not supported. Accelerators @@ -454,10 +454,8 @@ connect to AWS. .. literalinclude:: ../../parsl/configs/ec2.py -ASPIRE 1 (NSCC) ---------------- - -.. image:: https://www.nscc.sg/wp-content/uploads/2017/04/ASPIRE1Img.png +ASPIRE 1 (NSCC) (Decommissioned) +-------------------------------- The following snippet shows an example configuration for accessing NSCC's **ASPIRE 1** supercomputer. This example uses the `parsl.executors.HighThroughputExecutor` executor and connects to ASPIRE1's @@ -635,13 +633,13 @@ Polaris uses `parsl.providers.PBSProProvider` and `parsl.launchers.MpiExecLaunch onto the HPC system. -Stampede2 (TACC) ----------------- +Stampede2 (TACC) (Decommissioned) +--------------------------------- -.. image:: https://www.tacc.utexas.edu/documents/1084364/1413880/stampede2-0717.jpg/ +.. image:: https://tacc.utexas.edu/media/filer_public_thumbnails/filer_public/5d/7c/5d7cd2e7-b2a0-461c-9b91-ecb608e85884/stampede2.jpg__992x992_q85_subsampling-2.jpg The following snippet shows an example configuration for accessing TACC's **Stampede2** -supercomputer. This example uses theHighThroughput executor and connects to Stampede2's Slurm +supercomputer. This example uses the HighThroughput executor and connects to Stampede2's Slurm scheduler. .. literalinclude:: ../../parsl/configs/stampede2.py diff --git a/docs/userguide/monitoring.rst b/docs/userguide/monitoring.rst index a11848e11f..1a98e6d741 100644 --- a/docs/userguide/monitoring.rst +++ b/docs/userguide/monitoring.rst @@ -66,8 +66,8 @@ example, if the full path to the database is ``/tmp/my_monitoring.db``, run:: $ parsl-visualize sqlite:////tmp/my_monitoring.db By default, the visualization web server listens on ``127.0.0.1:8080``. If the web server is -deployed on a machine with a web browser, the dashboard can be accessed in the browser at ` -`127.0.0.1:8080``. 
If the web server is deployed on a remote machine, such as the login node of a +deployed on a machine with a web browser, the dashboard can be accessed in the browser at +``127.0.0.1:8080``. If the web server is deployed on a remote machine, such as the login node of a cluster, you will need to use an ssh tunnel from your local machine to the cluster:: $ ssh -L 50000:127.0.0.1:8080 username@cluster_address @@ -76,8 +76,8 @@ This command will bind your local machine's port 50000 to the remote cluster's p The dashboard can then be accessed via the local machine's browser at ``127.0.0.1:50000``. .. warning:: Alternatively you can deploy the visualization server on a public interface. However, -first check that this is allowed by the cluster's security policy. The following example shows how -to deploy the web server on a public port (i.e., open to Internet via ``public_IP:55555``):: + first check that this is allowed by the cluster's security policy. The following example shows how + to deploy the web server on a public port (i.e., open to Internet via ``public_IP:55555``):: $ parsl-visualize --listen 0.0.0.0 --port 55555 @@ -107,17 +107,17 @@ times as well as task summary statistics. The workflow summary section is follow The workflow summary also presents three different views of the workflow: * Workflow DAG - with apps differentiated by colors: This visualization is useful to visually -inspect the dependency structure of the workflow. Hovering over the nodes in the DAG shows a tooltip -for the app represented by the node and it's task ID. + inspect the dependency structure of the workflow. Hovering over the nodes in the DAG shows a tooltip + for the app represented by the node and its task ID. .. image:: ../images/mon_task_app_grouping.png * Workflow DAG - with task states differentiated by colors: This visualization is useful to identify -what tasks have been completed, failed, or are currently pending.
+ what tasks have been completed, failed, or are currently pending. .. image:: ../images/mon_task_state_grouping.png * Workflow resource usage: This visualization provides resource usage information at the workflow -level. For example, cumulative CPU/Memory utilization across workers over time. + level. For example, cumulative CPU/Memory utilization across workers over time. .. image:: ../images/mon_resource_summary.png