Upgrade prefect 1.* to 2.13 #374

Merged
merged 38 commits into from
Nov 30, 2023
Changes from all commits
38 commits
a3575bf
Hide python flow files before converting
annshress Oct 10, 2023
7806a3e
Update common files for prefect2 refactor
annshress Oct 11, 2023
8d6feba
Refactor conftest and config
annshress Oct 11, 2023
3d141a1
Refactor brt workflow for prefect2
annshress Oct 12, 2023
f78112e
Refactor lrg workflow for prefect2
annshress Oct 12, 2023
2d5f10a
Refactor sem workflow for prefect2
annshress Oct 12, 2023
5dc276b
Refactor dm workflow for prefect2
annshress Oct 12, 2023
184c1f5
Refactor czi workflow for prefect2
annshress Oct 12, 2023
0417b2f
Fix tests
annshress Oct 12, 2023
83d0b6e
Fix state change hooks and add test
annshress Oct 13, 2023
9dff0c1
Add state success/flow hooks for remaining flows
annshress Oct 13, 2023
6c768a4
Update prefect2 related docs
annshress Oct 27, 2023
e8a1725
Update helper-scripts for prefect2
annshress Nov 2, 2023
e6d1163
Update docs
annshress Oct 31, 2023
1a83086
Update tests
annshress Nov 6, 2023
0a64f6e
Update listener service
annshress Nov 7, 2023
a3815e3
Copy workdir files to assets
annshress Nov 7, 2023
14970d4
Minor update to pytest
annshress Nov 13, 2023
be7ea15
Update config for worker startup
annshress Nov 13, 2023
8f676f2
Refactor dm workflow
annshress Nov 16, 2023
bcfa29c
Update pytools for in-memory rechunk
annshress Nov 20, 2023
a72488e
Add subflow for czi workflows; Add copy dependencies on cleanup;
annshress Nov 15, 2023
4db0ec9
Make rechunk and copy zarr assets as separate tasks
annshress Nov 15, 2023
60874c4
Turn save and cleanup into a generic function
annshress Nov 16, 2023
831a18a
Add default and high slurm cluster specs
annshress Nov 17, 2023
950d1e1
Add copy workdir logs to assets
annshress Nov 20, 2023
551e410
Add test for copy working dir log files to assets dir
annshress Nov 20, 2023
d289aaf
Remove redundant flow.py
annshress Nov 21, 2023
073ebc2
Remove filter callback function
annshress Nov 21, 2023
7b84983
Fix future.result() call for callback
annshress Nov 21, 2023
310f8b8
Fixes issues with dm test, large 2d test, brt, cleanup, adds workflow…
philipmac Nov 28, 2023
7fed8dc
fixing sem test
philipmac Nov 29, 2023
9037078
Update tests for czi and utils.py
annshress Nov 29, 2023
ad40b42
Adding Optionals to czi flow
philipmac Nov 29, 2023
3e7b163
Add flow run name generator
annshress Nov 29, 2023
e4492a9
Update gh-pages destination from main to dev
annshress Nov 29, 2023
f340aeb
Changes benchmarked inc speed ~1.5hrs -> 5mins
philipmac Nov 29, 2023
ca52c88
Convert flow name dm conversion to Small 2d
annshress Nov 30, 2023
10 changes: 5 additions & 5 deletions .github/workflows/main.yml
@@ -3,12 +3,12 @@ name: Python Test and Package
on:
push:
branches:
- main
- dev
tags:
- "v*"
pull_request:
branches:
- main
- dev

env:
HEDWIG_ENV: dev
@@ -39,7 +39,7 @@ jobs:
name: sphinx-docs
path: docs/_build/html
- name: Update gh-pages
if: github.ref == 'refs/heads/main'
if: github.ref == 'refs/heads/dev'
run: |
rm docs/_build/html/.buildinfo
touch docs/_build/html/.nojekyll
@@ -102,7 +102,7 @@ jobs:
export PATH=$BIO2R_DIR/bioformats2raw-${BIO2R_V}/bin:$PATH
wget https://github.com/glencoesoftware/bioformats2raw/releases/download/v${BIO2R_V}/${BIO2R_Z} && sudo unzip ${BIO2R_Z} -d ${BIO2R_DIR} && rm -f ${BIO2R_Z}
python -m pip install --upgrade pip
pip install -e . -r requirements.txt
pip install -e . -r requirements-dev.txt

- name: Directory setup
run: |
@@ -115,7 +115,7 @@ jobs:
export BIO2R_DIR=/usr/local/BIO2R/bioformats2raw-${BIO2R_V}
export IMOD_DIR=/usr/local/IMOD
export PATH=$BIO2R_DIR/bin:$IMOD_DIR/bin:$PATH
python -m pytest -v --log-cli-level=INFO -m "not (slow or localdata)" --cov-report term --cov=.
python -m pytest -v --log-cli-level=INFO -m "not (slow or localdata)"

- name: Coverage Badge
uses: tj-actions/coverage-badge-py@v2
9 changes: 9 additions & 0 deletions .gitignore
@@ -144,3 +144,12 @@ dmypy.json

# tags
tags

# prefect artifacts
.prefectignore

# test assets
test/input_files/*/Assets/*

# Dask
dask-worker-space/
17 changes: 4 additions & 13 deletions README.rst
@@ -1,15 +1,6 @@
Workflows related to project previously referred to as "Hedwig".

Please refer to Spinx Docs below for _Installation_, _Local Set-up_, _Sending Pull Requests_, and _Testing_.

_Sphinx Docs_: https://niaid.github.io/image_portal_workflows/
[![Test](https://github.com/niaid/image_portal_workflows/actions/workflows/main.yml/badge.svg)](https://github.com/niaid/image_portal_workflows/actions/workflows/main.yml)
[![GH Pages](https://github.com/niaid/image_portal_workflows/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/niaid/image_portal_workflows/actions/workflows/pages/pages-build-deployment)

Build/Test:

.. image:: https://github.com/mbopfNIH/image_portal_workflows/actions/workflows/main.yml/badge.svg?branch=main
:target: https://github.com/mbopfNIH/image_portal_workflows/actions/workflows/main.yml/badge.svg?branch=main
:alt: GitHub Action

Test Coverage:
Workflows related to project previously referred to as "Hedwig".

.. image:: ../../coverage.svg
Please refer to `Sphinx Docs <https://niaid.github.io/image_portal_workflows/>`_ for Installation, Local Setup, Sending Pull Requests, and Testing.
8 changes: 8 additions & 0 deletions docs/Makefile
@@ -7,6 +7,9 @@ SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = _build
SPHINXAUTOBUILD = sphinx-autobuild

ALLSPHINXLIVEOPTS = $(ALLSPHINXOPTS) -q --port 0 --host 0.0.0.0 --open-browser --delay 1 --ignore "*.swp" --ignore "*.pdf" --ignore "*.log" --ignore "*.out" --ignore "*.toc" --ignore "*.aux" --ignore "*.idx" --ignore "*.ind" --ignore "*.ilg" --ignore "*.tex" --watch source source

# Put it first so that "make" without argument is like "make help".
help:
@@ -18,3 +21,8 @@ help:
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

livehtml:
@$(SPHINXAUTOBUILD) -b html $(ALLSPHINXLIVEOPTS) $(BUILDDIR)
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)."
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,3 +1,3 @@
sphinx
sphinx-rtd-theme
-r ../requirements.txt
-r ../requirements-dev.txt
27 changes: 11 additions & 16 deletions docs/source/development.rst
@@ -8,22 +8,20 @@ The docs related to development and testing are contained in this section.
Prerequisites
*************

Please read this Development section before cloning. The `generate_venv.sh` script referenced below
automates much of the process. Current system assumes that you are locally running either a Linux
or Mac OS.
Please read this Development section before cloning.
The current system assumes you are running locally on either Linux or macOS.

Github.com
==========

The repository is located in the NIAID Github.com enterprise organization. A github.com account
that is a member of the NIAID organization is required.

pip install -e <path_to_clone>
git+https://github.com/niaid/tomojs-pytools.git@master in requirements.txt

Git LFS
=======

*We are aiming to use s3 (with rsync capabilities) over git-lfs for test data storage*

Git `Large File Storage <https://git-lfs.github.com>`_ (LFS) is used to store larger files in the repository such as
test images, trained models, and other data ( i.e. not text based code ). Before the repository is cloned, git lfs must
be installed on the system and set up on the users account. The `tool's documentation <https://git-lfs.github.com>`_
@@ -65,15 +63,17 @@ All ``pytest`` files reside in the `test` directory:
- ``test_dm``: 2D end-to-end pipeline test
- ``test_sem``: end-to-end test of FIBSEM pipeline
- ``test_lrg_2d``: Large 2d pipeline test
- ``test_czi``: Spatial Omics (IF - immunofluorescence) pipeline test
- ``test_utils``: unit tests of utils/utils.py module

There is test data for `test_dm` in the Git repo, but not for the others. These files need to be
downloaded from the HPC machines. The following script will copy them:

`test/copy_test_data.sh`

These files are quite large, so you may want to run each line separately at the command line. Some unit tests also
require the results of previous ``test_brt`` runs, specifically in the Assets directory. So you must run ''test_brt''
These files are quite large, so you may want to run each line separately at the command line.

Some unit tests also require the results of previous ``test_brt`` runs, specifically in the Assets directory, so you must run ``test_brt``
before the complete test suite will work.
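As a sketch of that ordering (the marker flags mirror the CI invocation in ``main.yml``; the exact test-file path is an assumption):

.. code-block:: sh

    # run the BRT end-to-end test first so its Assets outputs exist
    python -m pytest -v test/test_brt.py

    # then run the rest of the suite, skipping slow/local-data tests
    python -m pytest -v --log-cli-level=INFO -m "not (slow or localdata)"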

To run the entire test suite, in the portal_image_workflows directory, run::
@@ -162,12 +162,13 @@ Note that we are setting a USER environment variable here. This is because `clas

Once you are in the container, you can run the commands you want to. For example: `pytest`.

*************
HPC Set up
==========
*************

**NOTE, THIS IS ONLY relevant for HPC. Added for completeness.**

NOTE: *Similar to the HPC setup, you can locally set up `dev` and `qa` virtual envs. This step can be skipped for testing purposes.*
**NOTE: generate_venv.sh is not used as of 08/11/2023; setup is documented in** :doc:`hpc`.

Workflows are currently run on RML HPC ("BigSky").

@@ -188,12 +189,6 @@ They were set up as follows:
conda deactivate


A script exists to help set up `dev`, `qa`, or `prod` environments in
`$HOME/code/hedwig/<HEDWIG_ENV>`
Insure `$HOME/code/hedwig` exists. Runs on Linux.


**Note**: generate_vevn.sh is not used as of 08/11/2023, setup is documented in `hpc.rst`.

.. code-block:: sh

46 changes: 9 additions & 37 deletions docs/source/hpc.rst
@@ -46,6 +46,8 @@ It's a good idea to test the ``ExecStart`` command can be run, eg:

``image_portal_workflows/helper_scripts/hedwig_reg_listen.sh listen``

By default, the daemon polls the Prefect server for the given work pool and picks up flow run details whenever something
has been submitted to the server.
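If the listener is managed as a systemd service, its health can be checked with the standard tools (a sketch; the unit name below is hypothetical):

.. code-block:: sh

    # confirm the listener daemon is active
    sudo systemctl status hedwig-listener.service

    # follow its logs to see flow runs being picked up
    sudo journalctl -u hedwig-listener.service -f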

To update:
----------
@@ -59,50 +61,20 @@ Upon promotion into HPC env do:
git checkout <label>
python -m pip install -e .

When there are changes in the workflows (e.g., a new task is added, a task function signature has changed, etc.), you should
redeploy the workflows. This can be done as follows:

Finally register the worlfows with the helper script.

``./helper_scripts/hedwig_reg_listen.sh register``
.. code-block::

cd ~/image_portal_workflows
./helper_scripts/hedwig_reg_listen.sh register


Spatialomics file layout.
-------------------------

Normally the dir structure is : ``$lab/$pi/$project/$session/$sample``

For Spatialomics this is not the case, the $sample is not really a sample, it's grouping of ROIs,
PreROIs, etc from each of the slides.
These grouping directories sit one down from session, in the place a ``$sample`` normally would be.

For a single unit of work (eg 8 slides) the set up looks like:

.. code-block::

$lab/$pi/$project/$session/Pre_ROI_Selection
$lab/$pi/$project/$session/Heatmaps
$lab/$pi/$project/$session/HQ_Images
$lab/$pi/$project/$session/ROI_Images


and eg:

.. code-block::

ls $lab/$pi/$project/$session/HQ_Images/
slide_1.czi, slide_2.czi, ...

and

.. code-block::

ls $lab/$pi/$project/$session/Pre_ROI_Selection
slide_1_Pre_ROI_a.png, slide_2_Pre_ROI_a.png, ...


Note:
Different outputs from different slides are split across different dirs.
The pipeline will not parse out or otherwise link slides together in any way.
Similarly there is no linking from from heatmaps to any specific slide.
For Spatialomics this is not the case: the $sample is not really a sample, it's a grouping of ROIs, PreROIs, etc. from each of the slides.

Other directories may also exist at this level; they will be ignored.
More details can be found in :ref:`ref-workflow-spatial-omics`.
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -13,6 +13,7 @@ Image Portal Workflows's documentation
:caption: Contents:

development
workflows
workflows/index.rst
usage
api
hpc
2 changes: 0 additions & 2 deletions docs/source/neuroglancer.rst
@@ -5,5 +5,3 @@ neuroglancer module
:members:
:undoc-members:
:show-inheritance:

.. autofunction:: gen_zarr(fp_in: FilePath, width: int, height: int, depth: int = None) -> Dict:
30 changes: 30 additions & 0 deletions docs/source/usage.rst
@@ -0,0 +1,30 @@
==================
Submitting Jobs
==================

NOTE: You need to be on the NIH VPN to submit jobs for your workflow runs.

Workflow servers are deployed in `development <https://prefect2.hedwig-workflow-api.niaiddev.net>`_, `qa <https://prefect2.hedwig-workflow-api.niaidqa.net>`_, and `production <https://prefect2.hedwig-workflow-api.niaidprod.net>`_ environments separately.
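Before submitting, it can be useful to confirm the server is reachable from your VPN session. A minimal sketch (assuming the standard Prefect 2 health endpoint is exposed):

.. code-block:: sh

    # should return "true" if the dev workflow server is reachable
    curl https://prefect2.hedwig-workflow-api.niaiddev.net/api/health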

Manual Submission
-----------------

Depending on the environment, go to the deployments section. Click the `Kebab menu` icon next to your desired deployment and choose `Custom Run`. There you can fill in appropriate values for the fields and submit your job. For manual submission, do not forget to add the `no_api: true` value.

CLI/SDK Submission
------------------

You can also use `curl` to submit a job. The data body you post is the same as for manual submission. However, you also need the `Deployment ID`, which you can get from the workflow deployment's detail page.
Once you have the deployment ID and the relevant job information, you can submit a job as follows:

.. code-block::

~ curl -X 'POST' \
'https://prefect2.hedwig-workflow-api.niaiddev.net/api/deployments/<DEPLOYMENT-ID>/create_flow_run' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"parameters": {"key": value, "key2": value2, ...}
}'

Just as with `curl`, you can use any other HTTP client or SDK to submit a job. For example, you can use `axios` to `send a POST request <https://axios-http.com/docs/post_example>`_ to the workflow server.
11 changes: 3 additions & 8 deletions docs/source/workflows.rst → docs/source/workflows/brt.rst
@@ -1,6 +1,6 @@
=========
Workflows
=========
************
BRT Workflow
************

Overview:

@@ -12,11 +12,6 @@ A number of parameters can be passed to the "BRT" workflow. There are two types:
Inputs/Outputs
--------------

Within HPC, each environment (dev,qa,prod) has its own mount point. These are:
/mnt/ai-fas12/RMLEMHedwigDev/
/mnt/ai-fas12/RMLEMHedwigQA/
/mnt/ai-fas12/RMLEMHedwigProd/

There are inputs and outputs. We never write to the inputs directory, only to outputs.
The output dir is defined as input directory with s/Projects/Projects/. For example:

13 changes: 13 additions & 0 deletions docs/source/workflows/index.rst
@@ -0,0 +1,13 @@
=========
Workflows
=========

Within HPC, each environment (dev, qa, prod) has its own mount point::

    /mnt/ai-fas12/RMLEMHedwigDev/
    /mnt/ai-fas12/RMLEMHedwigQA/
    /mnt/ai-fas12/RMLEMHedwigProd/

.. include:: brt.rst

.. _ref-workflow-spatial-omics:
.. include:: spatialomics.rst
41 changes: 41 additions & 0 deletions docs/source/workflows/spatialomics.rst
@@ -0,0 +1,41 @@
*********************
Spatialomics Workflow
*********************

Normally the dir structure is : ``$lab/$pi/$project/$session/$sample``

For Spatialomics this is not the case: the $sample is not really a sample, it's a grouping of ROIs, PreROIs, etc. from each of the slides.
These grouping directories sit one level down from the session, in the place a ``$sample`` normally would be.


For a single unit of work (e.g., 8 slides) the setup looks like:

.. code-block::

$lab/$pi/$project/$session/Pre_ROI_Selection
$lab/$pi/$project/$session/Heatmaps
$lab/$pi/$project/$session/HQ_Images
$lab/$pi/$project/$session/ROI_Images


and, for example:

.. code-block::

ls $lab/$pi/$project/$session/HQ_Images/
slide_1.czi, slide_2.czi, ...

and

.. code-block::

ls $lab/$pi/$project/$session/Pre_ROI_Selection
slide_1_Pre_ROI_a.png, slide_2_Pre_ROI_a.png, ...


Note:
Different outputs from different slides are split across different dirs.
The pipeline will not parse out or otherwise link slides together in any way.
Similarly there is no linking from heatmaps to any specific slide.

Other directories may also exist at this level; they will be ignored.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.