shuffle location of parallel try/except note #121

Merged
merged 1 commit into from
Aug 23, 2024
1 change: 1 addition & 0 deletions book/_toc.yml
@@ -40,6 +40,7 @@ parts:
- file: plugins/how-to-guides/test-plugins
- file: plugins/how-to-guides/usage-examples
- file: plugins/how-to-guides/format-validation-levels
- file: plugins/how-to-guides/handle-exceptions-in-parallel-pipelines
- file: plugins/explanations/intro
sections:
- file: plugins/explanations/actions
38 changes: 2 additions & 36 deletions book/framework/how-to-guides/parallel-configuration.md
@@ -3,7 +3,7 @@

```{note}
This is more of an advanced user or system administrator usage document.
[This is slated to move](https://github.com/caporaso-lab/developing-with-qiime2/issues/29) to the new general-purpose user documentation.
[This is slated to move](https://github.com/caporaso-lab/developing-with-qiime2/issues/29) to the new general-purpose user documentation.
```

QIIME 2 supports parallelization of pipelines through [Parsl](https://parsl.readthedocs.io/en/stable/1-parsl-introduction.html).
@@ -68,7 +68,7 @@ The [Parsl documentation](https://parsl.readthedocs.io/en/stable/) provides full
Briefly, we create a [`ThreadPoolExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.ThreadPoolExecutor.html?highlight=Threadpoolexecutor) that parallelizes jobs across multiple threads in a process.
We also create a [`HighThroughputExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.HighThroughputExecutor.html?highlight=HighThroughputExecutor) that parallelizes jobs across multiple processes.

```{note}
```{note}
Your config MUST contain an executor with the label `default`.
This is the executor that QIIME 2 will dispatch your jobs to if you do not specify an executor to use.
The default executor in the default config is the `ThreadPoolExecutor`, meaning that, unless you specify otherwise, all jobs that use the default config will run on the `ThreadPoolExecutor`.
@@ -288,37 +288,3 @@ with ParallelConfig(parallel_config=config, action_executor_mapping=mapping):
# Make sure to call _result inside of the context manager
result = future._result()
```

````{admonition} Note for parallel Pipeline developers
:class: warning
If you have something like this in a pipeline:

```python
try:
    result1, result2 = some_action(*args)
except SomeException:
    do.something()
```

You must call `_result()` on the return value from `some_action` in the try/except block.
This is necessary to allow users to run your pipeline in parallel.
If you do not do this, and a user attempts to run your pipeline in parallel, it will most likely fail.

```python
try:
    results = some_action(*args)
    result1, result2 = results._result()
except SomeException:
    do.something()
```

The reason this needs to be done is a bit technical.
Basically, if the pipeline is being executed in parallel, the return value from the action will be a future that will eventually resolve into your results when the parallel thread returns.
Calling `._result()` blocks the main thread and waits for results before proceeding.

If you do not call `_result()` in the try/except, the future will most likely resolve into results after the main Python thread has exited the try/except block.
This will lead to the exception not being caught because it is now actually being raised outside of the try/except.

This is a bit confusing, as parallelism often is.
We tried hard to ensure that developers wouldn't need to change anything about their pipelines to parallelize them, but we did need to make this one concession.
````
@@ -0,0 +1,32 @@
# Handling exceptions in parallel Pipelines

In developing parallel computing support in QIIME 2, we tried to minimize the edits required to enable existing Pipelines to run in parallel.
In the code we developed in the [plugin tutorial](plugin-tutorial-parallel-pipeline), for example, the modifications we made were primarily to support the splitting and combining steps; we didn't add anything to explicitly integrate parallel computing.
There is one minor exception to this, though.

If you have code that looks like the following in a Pipeline that you want to run in parallel:

```python
try:
    result1, result2 = some_action(*args)
except SomeException:
    do.something()
```

You must call `_result()` on the return value from `some_action` in the try/except block:

```python
try:
    results = some_action(*args)
    result1, result2 = results._result()
except SomeException:
    do.something()
```

If you do not do this, a parallel run of your Pipeline will most likely crash if `SomeException` is raised.

The reason for this is that when the Pipeline is run in parallel, the return value from `some_action` will be a [Future](https://parsl.readthedocs.io/en/stable/userguide/futures.html) that will eventually resolve into your actual results when the parallel processes complete.
Calling `_result()` blocks the main thread until the results are available, so any exception raised by `some_action` surfaces while execution is still inside the try block.

If you do not call `_result()` in the try block, the Future will most likely resolve after the main Python thread has exited the try/except block.
The exception is then raised outside of the try/except, so it is not caught.
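
The timing issue described above can be reproduced outside of QIIME 2. The following is a minimal sketch that uses only Python's standard-library `concurrent.futures` as an analogy; it is not QIIME 2 or Parsl code. Here `future.result()` plays the role that `_result()` plays in a Pipeline, and `SomeException`, `some_action`, `resolve_outside_try`, and `resolve_inside_try` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class SomeException(Exception):
    """Hypothetical exception standing in for the one in the text."""


def some_action():
    # Hypothetical action that fails while running on a worker thread.
    raise SomeException("failed in worker")


def resolve_outside_try(pool):
    """Anti-pattern: the future is resolved after the try block has exited."""
    try:
        # submit() returns a future immediately; nothing is raised here.
        future = pool.submit(some_action)
    except SomeException:
        return "caught"
    # The exception only surfaces here, outside of the try/except.
    return future.result()


def resolve_inside_try(pool):
    """Fix: block on the result while still inside the try block."""
    try:
        future = pool.submit(some_action)
        # result() waits for the worker and re-raises SomeException here.
        return future.result()
    except SomeException:
        return "caught"


with ThreadPoolExecutor(max_workers=1) as pool:
    try:
        resolve_outside_try(pool)
        escaped = False
    except SomeException:
        escaped = True
    handled = resolve_inside_try(pool)

print(escaped, handled)
```

In this sketch the first function lets the exception escape to its caller, while the second handles it, which mirrors the difference between the two Pipeline snippets above.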
7 changes: 1 addition & 6 deletions book/plugins/tutorials/add-parallel-pipeline.md
@@ -1,3 +1,4 @@
(plugin-tutorial-parallel-pipeline)=
# Add a Pipeline with parallel computing support

In this chapter we'll add a second {term}`Pipeline` to our plugin, and then we'll add parallel computing support to that `Pipeline`.
@@ -429,9 +430,3 @@ When I do this, I observe the following run times associated with the data prove
:::

::::





