From 5befcd5dbba82c33323a940f8397a322913b6597 Mon Sep 17 00:00:00 2001
From: Greg Caporaso
Date: Fri, 23 Aug 2024 12:47:56 -0700
Subject: [PATCH] shuffle location of parallel try/except note

---
 book/_toc.yml                                 |  1 +
 .../how-to-guides/parallel-configuration.md   | 38 +------------------
 ...handle-exceptions-in-parallel-pipelines.md | 32 ++++++++++++++++
 .../tutorials/add-parallel-pipeline.md        |  7 +---
 4 files changed, 36 insertions(+), 42 deletions(-)
 create mode 100644 book/plugins/how-to-guides/handle-exceptions-in-parallel-pipelines.md

diff --git a/book/_toc.yml b/book/_toc.yml
index a9dd239e..a69cd58a 100644
--- a/book/_toc.yml
+++ b/book/_toc.yml
@@ -40,6 +40,7 @@ parts:
       - file: plugins/how-to-guides/test-plugins
       - file: plugins/how-to-guides/usage-examples
       - file: plugins/how-to-guides/format-validation-levels
+      - file: plugins/how-to-guides/handle-exceptions-in-parallel-pipelines
     - file: plugins/explanations/intro
       sections:
       - file: plugins/explanations/actions
diff --git a/book/framework/how-to-guides/parallel-configuration.md b/book/framework/how-to-guides/parallel-configuration.md
index fbaac2d3..19e4701b 100644
--- a/book/framework/how-to-guides/parallel-configuration.md
+++ b/book/framework/how-to-guides/parallel-configuration.md
@@ -3,7 +3,7 @@
 
 ```{note}
 This is more of an advanced user or system administrator usage document.
-[This is slated to move](https://github.com/caporaso-lab/developing-with-qiime2/issues/29) to the new general-purpose user documentation.
+[This is slated to move](https://github.com/caporaso-lab/developing-with-qiime2/issues/29) to the new general-purpose user documentation.
 ```
 
 QIIME 2 supports parallelization of pipelines through [Parsl](https://parsl.readthedocs.io/en/stable/1-parsl-introduction.html>).
@@ -68,7 +68,7 @@ The [Parsl documentation](https://parsl.readthedocs.io/en/stable/) provides full
 Briefly, we create a [`ThreadPoolExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.ThreadPoolExecutor.html?highlight=Threadpoolexecutor) that parallelizes jobs across multiple threads in a process.
 We also create a [`HighThroughputExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.HighThroughputExecutor.html?highlight=HighThroughputExecutor) that parallelizes jobs across multiple processes.
 
-```{note}
+```{note}
 Your config MUST contain an executor with the label default.
 This is the executor that QIIME 2 will dispatch your jobs to if you do not specify an executor to use.
 The default executor in the default config is the ThreadPoolExecutor meaning that unless you specify otherwise all jobs that use the default config will run on the ThreadPoolExecutor.
@@ -288,37 +288,3 @@ with ParallelConfig(parallel_config=config, action_executor_mapping=mapping):
     # Make sure to call _result inside of the context manager
     result = future._result()
 ```
-
-````{admonition} Note for parallel Pipeline developers
-:class: warning
-If you have something like this in a pipeline:
-
-```python
-try:
-    result1, result2 = some_action(*args)
-except SomeException:
-    do.something()
-```
-
-You must call `_result()` on the return value from `some_action` in the try/except block.
-This is necessary to allow users to run your pipeline in parallel.
-If you do not do this, and a user attempts to run your pipeline in parallel, it will most likely fail.
-
-```python
-try:
-    results = some_action(*args)
-    result1, result2 = results._result()
-except SomeException:
-    do.something()
-```
-
-The reason this needs to be done is a bit technical.
-Basically, if the pipeline is being executed in parallel, the return value from the action will be a future that will eventually resolve into your results when the parallel thread returns.
-Calling `._result()` blocks the main thread and waits for results before proceeding.
-
-If you do not call `_result()` in the try/except, the future will most likely resolve into results after the main Python thread has exited the try/except block.
-This will lead to the exception not being caught because it is now actually being raised outside of the try/except.
-
-This is a bit confusing, as parallelism often is.
-We tried hard to ensure that developers wouldn't need to change anything about their pipelines to parallelize them, but we did need to make this one concession.
-````
diff --git a/book/plugins/how-to-guides/handle-exceptions-in-parallel-pipelines.md b/book/plugins/how-to-guides/handle-exceptions-in-parallel-pipelines.md
new file mode 100644
index 00000000..b5ef30ec
--- /dev/null
+++ b/book/plugins/how-to-guides/handle-exceptions-in-parallel-pipelines.md
@@ -0,0 +1,32 @@
+# Handling exceptions in parallel Pipelines
+
+In developing parallel computing support in QIIME 2, we tried to minimize the edits that are required to existing Pipelines to enable them to run in parallel.
+In the code we developed in the [plugin tutorial](plugin-tutorial-parallel-pipeline), for example, the modifications we made were primarily to support the splitting and combining steps - we didn't add anything to explicitly integrate parallel computing.
+There is one minor exception to this though.
+
+If you have code that looks like the following in a Pipeline that you want to run in parallel:
+
+```python
+try:
+    result1, result2 = some_action(*args)
+except SomeException:
+    do.something()
+```
+
+You must call `_result()` on the return value from `some_action` in the try/except block:
+
+```python
+try:
+    results = some_action(*args)
+    result1, result2 = results._result()
+except SomeException:
+    do.something()
+```
+
+If you do not do this, a parallel run of your Pipeline will most likely crash if `SomeException` is raised.
+
+The reason for this is that when the Pipeline is run in parallel, the return value from `some_action` will be a [Future](https://parsl.readthedocs.io/en/stable/userguide/futures.html) that will eventually resolve into your actual results when the parallel processes complete.
+Calling `._result()` blocks the main thread and waits for results before proceeding from the try/except block.
+
+If you do not call `_result()` in the try block, the Future will most likely resolve into results after the main Python thread has exited the try/except block.
+This will lead to the exception not being caught, because it is now actually being raised outside of the try/except.
diff --git a/book/plugins/tutorials/add-parallel-pipeline.md b/book/plugins/tutorials/add-parallel-pipeline.md
index 55fe9297..46ca4d7e 100644
--- a/book/plugins/tutorials/add-parallel-pipeline.md
+++ b/book/plugins/tutorials/add-parallel-pipeline.md
@@ -1,3 +1,4 @@
+(plugin-tutorial-parallel-pipeline)=
 # Add a Pipeline with parallel computing support
 
 In this chapter we'll add a second {term}`Pipeline` to our plugin, and then we'll add parallel computing support to that `Pipeline`.
@@ -429,9 +430,3 @@ When I do this, I observe the following run times associated with the data prove
 
 :::
 ::::
-
-
-
-
-
-
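
The behavior described in the new how-to guide can be reproduced outside QIIME 2. The sketch below is only an illustration under stated assumptions: it uses Python's standard-library `concurrent.futures` as a stand-in for QIIME 2's Parsl-backed futures, so `.result()` here plays the role that `_result()` plays in the guide. `some_action` and `SomeException` reuse the guide's placeholder names, and `resolve_inside_try`, `resolve_outside_try`, and `demo` are hypothetical helpers added for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class SomeException(Exception):
    """Hypothetical exception, named after the placeholder in the guide."""


def some_action():
    # Stand-in for a Pipeline action that fails while running in a worker.
    raise SomeException("failed in worker")


def resolve_inside_try(executor):
    try:
        future = executor.submit(some_action)
        # Blocks until the worker finishes; its exception is re-raised here,
        # while execution is still inside the try block.
        future.result()
    except SomeException:
        return "caught"
    return "no error"


def resolve_outside_try(executor):
    try:
        # submit() returns a future immediately; nothing is raised here.
        future = executor.submit(some_action)
    except SomeException:
        return "caught"
    # The exception only surfaces now, after the try/except has been exited.
    future.result()
    return "no error"


def demo():
    with ThreadPoolExecutor(max_workers=1) as executor:
        inside = resolve_inside_try(executor)
        try:
            outside = resolve_outside_try(executor)
        except SomeException:
            outside = "escaped"
    return inside, outside


if __name__ == "__main__":
    print(demo())
```

Resolving the future inside the `try` block is what lets the handler see the worker's exception. Resolving it later raises the same exception where no handler is in scope, which is the failure mode the guide warns about.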