From 1c6b4ba73c5d22ba37fcd81570383a0f7396bead Mon Sep 17 00:00:00 2001 From: cmarshak Date: Thu, 23 Mar 2023 12:37:30 -0700 Subject: [PATCH 1/5] contribution instructions --- README.md | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 54b19e94..cf47db16 100644 --- a/README.md +++ b/README.md @@ -154,7 +154,58 @@ The easiest way to see what was the *last* working build, check out the docker [ 2. Load the image and get into interactive mode e.g. `docker run --entrypoint /usr/bin/bash -it --rm ghcr.io/access-cloud-based-insar/dockerizedtopsapp:0.2.2.dev136_ga2d5389 -l` 3. Check the packages `conda list | grep xarray` -## FAQ +# Contribution Instructions + +This is an open-source plugin and we welcome contributions. Because we use this plugin for producing publicly available datasets, there is some additional tests and requirements for any new features to be integrated. + +## Installation for Development + + +1. Clone or fork this repo. If you are a member of ACCESS or working on new features of this plugin, ask to become a member of this Github organization. The integration tests will be easier to run if you are pushing to branches of this repository as opposed to a fork (require organization secrets). This will ensure new features are more quickly integrated particularly into hyp3. +2. Navigate with your terminal to the repo. +3. Create a new environment and install requirements using `conda env update --file environment.yml` +4. Activate the environment: `conda activate topsapp_env` +5. Install the package from cloned repo using `python -m pip install -e .` +6. Create a new branch with your feature. + +## Tests + +There are some integration tests. They are to make sure the wrapping part of the plugin, e.g. downloading metadata, a simulated CMR handshake, and packaging, occur as expected. However, the plugin takes about 1.5 to 3 hours (depending on the number of corrections requested) to generate a final GUNW. Therefore, these integration are *not* sufficient to permit merging into main. Until we have a complete end-to-end test of the workflow, any new feature cannot be integrated. As a first step, it is imperative to share the output of a new feature (i.e. the GUNW netcdf file). This is important because the integration tests do not check that ISCE2 successfully ran nor did the complex processing scripts. There are a lot of large libraries that need to be built and run successfully before this can be integrated. The CI/CD below ensures correct building of the python enviroment and Docker image. However, there can be subsequent errors in say ISCE2 even if the environment was built correctly. + +## Hyp3 and Cloud Submission + +There are two branches: `dev` and `main`. The former is the "test" branch and the later is the "production" branch. When submitting jobs to Hyp3 there are two job types related to `INSAR_ISCE_TEST` and `INSAR_ISCE`. The former corresponds to the `dev` branch and the latter to `main`. This allows us to test the plugin *in the cloud* incrementally before transitioning new features to production. This is a design feature of Hyp3 (thanks, Joseph Kennedy!). Since end-to-end tests are required as noted above, this ensures that our production branch is always well tested. + +## Versions + +There are several versions to keep track of: + +1. *Dataset Version*: this is the GUNW version and incremented manually only when the GUNW has changed with relation to the end-user. This is in the GUNW name, i.e. `S1-GUNW-D-R-042-tops-20220429_20210528-140902-00123W_00034N-PP-e677-v2_0_5` has version 2.0.5. It is manually set [here](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/0b219faac0c4e8c11ef610d3b35e7a8cb9a30511/isce2_topsapp/packaging.py#L17) +2. *Software Version*: Automatically tracked and incremented in CI/CD on merges to the production branch (`main`) +3. *Container Version*: Automatically tracked and incremented in CI/D on merges to `dev` (with `test` tag) and to `main` (with `latest` tag). The various images can be downloaded fromt he Packages sidebar ([link](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/pkgs/container/dockerizedtopsapp)). + + +## CI/CD + +The CI/CD allows us to be more efficient both in terms of tracking the releases and pushing container images to the cloud for actual processing. Each merge to `main` or `dev` creates a new docker image that is published through the github registry. The `test` tag is the latest image built to `dev`. The `latest` tag is the most up-to-date `main` release for production. + +The plugin's *software* version is governed by `main`. Any merge into `dev` is seen as a test release so until `dev` is merged into `main`, the release number will not increment. ASF CI/CD requires any merge into `dev` and `main` to update the changelog. The changelog version is described via (Semantic Versioning)[https://semver.org/] (i.e. Major.Minor.Patch). Our CI/CD uses labels to automatically detect whether changelog needs to be updated. If you use `bumpless` and only CI/CD workflows or documentation has been changed, the software version remains the same. Otherwise, use `major`, `minor`, or `patch` + +## PR Requirements + +1. If the software changes, you must use: `major`, `minor`, or `patch` labels on a PR. For PRs to `dev` this is not strictly necessary. However, for PRs to `main`, this is **required** and determines how the software is incremented. For example, if the current release is `0.1.0`, a tag of `patch` will increment the next release to `0.1.1` and a tag of `major` will increment the release to `1.0.0`. The fact that the changelog is updated is checked, but has to be manually checked upon PRs to main to ensure proper versioning. + +2. Any PR to `main` requires the consistent `vXX.XX.XX` in the PR title. Using regular expressions in the github actions, the release number in the title is extracted and tagged to the relevant branch for the software release. + +In summary, for PRs to main, the PR label (`major`/`minor`/`patch`), the changelog, and the PR title should all reflect the upcoming release version correctly. If not, there are some manual fixes. + +### Possible Fixes + +1. Forget to correct the changelog? Go through the PR process to update it correctly +2. Forget to title a PR to main with the release version? See this [comment](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/pull/114#issuecomment-1478362852). The comment is in a PR to `main` without the release title. + + +# FAQ 1. The docker build is taking a long time. From 2dfd936a8ae25f551cbed651691e192f50aba3b0 Mon Sep 17 00:00:00 2001 From: cmarshak Date: Thu, 23 Mar 2023 12:43:18 -0700 Subject: [PATCH 2/5] ignore readme --- .trufflehog.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/.trufflehog.txt b/.trufflehog.txt index d0b503c0..f82a6a86 100644 --- a/.trufflehog.txt +++ b/.trufflehog.txt @@ -5,3 +5,4 @@ Dockerfile tests/test_data/sec_manifest.xml tests/test_data/ref_manifest.xml tests/test_data/ +README.md From 5bfdfb965f0df0b70d2ad9a1c98e73232deeb618 Mon Sep 17 00:00:00 2001 From: Charlie Marshak Date: Tue, 30 Jan 2024 13:55:39 -0800 Subject: [PATCH 3/5] link to cicd notes and shorten --- README.md | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 45613a4c..09551f4d 100644 --- a/README.md +++ b/README.md @@ -190,13 +190,20 @@ This is an open-source plugin and we welcome contributions. Because we use this ## Installation for Development -1. Clone or fork this repo. If you are a member of ACCESS or working on new features of this plugin, ask to become a member of this Github organization. The integration tests will be easier to run if you are pushing to branches of this repository as opposed to a fork (require organization secrets). This will ensure new features are more quickly integrated particularly into hyp3. +1. Clone or fork this repo. If you are a member of ACCESS or working on new features of this plugin, ask to become a member of this Github organization. The integration tests will be easier to run if you are pushing to branches of this repository as opposed to a fork (require organization secrets). This will ensure new features are more quickly integrated particularly into hyp3. Otherwise, please add secrets to your repository as indicate below. 2. Navigate with your terminal to the repo. 3. Create a new environment and install requirements using `conda env update --file environment.yml` 4. Activate the environment: `conda activate topsapp_env` 5. Install the package from cloned repo using `python -m pip install -e .` 6. Create a new branch with your feature. +**Note**: because forked repositories do not have access to the repositories secrets, you will have to add secrets to your repository. You will need to add your Earthdata username and password to the secrets as: + +``` +EARTHDATA_USERNAME=... +EARTHDATA_PASSWORD=... +``` + ## Tests There are some integration tests. They are to make sure the wrapping part of the plugin, e.g. downloading metadata, a simulated CMR handshake, and packaging, occur as expected. However, the plugin takes about 1.5 to 3 hours (depending on the number of corrections requested) to generate a final GUNW. Therefore, these integration are *not* sufficient to permit merging into main. Until we have a complete end-to-end test of the workflow, any new feature cannot be integrated. As a first step, it is imperative to share the output of a new feature (i.e. the GUNW netcdf file). This is important because the integration tests do not check that ISCE2 successfully ran nor did the complex processing scripts. There are a lot of large libraries that need to be built and run successfully before this can be integrated. The CI/CD below ensures correct building of the python enviroment and Docker image. However, there can be subsequent errors in say ISCE2 even if the environment was built correctly. @@ -214,24 +221,22 @@ There are several versions to keep track of: 3. *Container Version*: Automatically tracked and incremented in CI/D on merges to `dev` (with `test` tag) and to `main` (with `latest` tag). The various images can be downloaded fromt he Packages sidebar ([link](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/pkgs/container/dockerizedtopsapp)). -## CI/CD +## Releases (Docker containers and software) -The CI/CD allows us to be more efficient both in terms of tracking the releases and pushing container images to the cloud for actual processing. Each merge to `main` or `dev` creates a new docker image that is published through the github registry. The `test` tag is the latest image built to `dev`. The `latest` tag is the most up-to-date `main` release for production. +The release workflows within the (ASF) CI/CD allows us to be more efficient both in terms of tracking the releases and pushing container images for cloud processing. Each merge to `main` or `dev` creates a new docker image that is published through the github registry. The `test` tag is the latest image built to `dev`. The `latest` tag is the most up-to-date `main` release for production. We can view various containers [here](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/pkgs/container/dockerizedtopsapp/versions). -The plugin's *software* version is governed by `main`. Any merge into `dev` is seen as a test release so until `dev` is merged into `main`, the release number will not increment. ASF CI/CD requires any merge into `dev` and `main` to update the changelog. The changelog version is described via (Semantic Versioning)[https://semver.org/] (i.e. Major.Minor.Patch). Our CI/CD uses labels to automatically detect whether changelog needs to be updated. If you use `bumpless` and only CI/CD workflows or documentation has been changed, the software version remains the same. Otherwise, use `major`, `minor`, or `patch` +The plugin's *software* version is governed by git tags and semantic versioning. The software version is incremented by merges to `main` (according to `major`/`minor`/`patch` labels). Any merge into `dev` is a (patch) test release so until `dev` is merged into `main`, the release number will correctly not increment. Software version can be seen via `import isce2_topsapp; isce2_topsapp.__version__`. -## PR Requirements - -1. If the software changes, you must use: `major`, `minor`, or `patch` labels on a PR. For PRs to `dev` this is not strictly necessary. However, for PRs to `main`, this is **required** and determines how the software is incremented. For example, if the current release is `0.1.0`, a tag of `patch` will increment the next release to `0.1.1` and a tag of `major` will increment the release to `1.0.0`. The fact that the changelog is updated is checked, but has to be manually checked upon PRs to main to ensure proper versioning. +When releasing a new version i.e. merging `dev` into `main`, ASF CI/CD requires an update the changelog with correct version that will be captured by the `major`/`minor`/`patch` bump. If you use the label `bumpless` and only CI/CD workflows or documentation has been changed, the software version remains the same. Otherwise, use `major`, `minor`, or `patch`. For many more details see these [notes](https://github.com/ACCESS-Cloud-Based-InSAR/CICD-notes). -2. Any PR to `main` requires the consistent `vXX.XX.XX` in the PR title. Using regular expressions in the github actions, the release number in the title is extracted and tagged to the relevant branch for the software release. +## PR Requirements -In summary, for PRs to main, the PR label (`major`/`minor`/`patch`), the changelog, and the PR title should all reflect the upcoming release version correctly. If not, there are some manual fixes. +All new features will first be merged into `dev` before being released (and merged into `main`). -### Possible Fixes +1. If the software changes, please use: `major`, `minor`, or `patch` labels on a PR for maintainers to track the type of changes you are making +2. Update the changelog so that the bump (depending on `major`/`minor`/`patch`) is consistent with what the next release will hold (you may be including your changes in a larger release) -1. Forget to correct the changelog? Go through the PR process to update it correctly -2. Forget to title a PR to main with the release version? See this [comment](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/pull/114#issuecomment-1478362852). The comment is in a PR to `main` without the release title. +Here are more detailed [notes](https://github.com/ACCESS-Cloud-Based-InSAR/CICD-notes). # FAQ @@ -240,7 +245,7 @@ In summary, for PRs to main, the PR label (`major`/`minor`/`patch`), the changel *Answer*: Make sure the time is spent with `conda/mamba` not copying data files. The `.dockerignore` file should ignore ISCE2 data files (if you are running some examples within this repo directory, there will be GBs of intermediate files). It's crucial you don't include unnecessary ISCE2 intermediate files into the Docker image as this will bloat it. -2. Need to install additional packages such as vim? +2. Need to install additional packages such as vim in your container? *Answer*: Login as root user to the container and install the additional packages. From b5b0a8564b88d31312cf9ef341f01048a20ede99 Mon Sep 17 00:00:00 2001 From: Charlie Marshak Date: Tue, 30 Jan 2024 14:03:59 -0800 Subject: [PATCH 4/5] manual checks --- README.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/README.md b/README.md index 09551f4d..c3b52551 100644 --- a/README.md +++ b/README.md @@ -229,15 +229,25 @@ The plugin's *software* version is governed by git tags and semantic versioning. When releasing a new version i.e. merging `dev` into `main`, ASF CI/CD requires an update the changelog with correct version that will be captured by the `major`/`minor`/`patch` bump. If you use the label `bumpless` and only CI/CD workflows or documentation has been changed, the software version remains the same. Otherwise, use `major`, `minor`, or `patch`. For many more details see these [notes](https://github.com/ACCESS-Cloud-Based-InSAR/CICD-notes). + ## PR Requirements All new features will first be merged into `dev` before being released (and merged into `main`). 1. If the software changes, please use: `major`, `minor`, or `patch` labels on a PR for maintainers to track the type of changes you are making 2. Update the changelog so that the bump (depending on `major`/`minor`/`patch`) is consistent with what the next release will hold (you may be including your changes in a larger release) +3. Please share a sample GUNW (and the command to generate it) so that the team can understand how the final product has been changed - unfortunately the test suite does not ensure the workflow will complete end-to-end. Here are more detailed [notes](https://github.com/ACCESS-Cloud-Based-InSAR/CICD-notes). +### Additional (manual) checks + +Even though there are some integration tests and unit tests in our test suites, this CPU intensive workflow cannot be tested end-to-end using github actions (there is simply not enough memory and CPU on a github runner). +Therefore, even if all the tests pass, there is still a nontrivial chance a GUNW is not successfully generated (e.g. the `topo` step of topsapp fizzles out because `numpy` API was not successfully tracked in the latest ISCE2 release). +We request a sample GUNW be shared in a PR. +Even if we go through careful accounting, once a PR is merged into `dev`, we will use `hyp3` to further inspect the new plugin e.g. through this [notebook](https://github.com/ACCESS-Cloud-Based-InSAR/s1_frame_enumerator/blob/dev/notebooks/Submitting_to_Hyp3.ipynb) using a few sites. +It is important to use `INSAR_ISCE_TEST` job to ensure the features from the `dev` branch are used. +Only after these **manual** checks, will we continue with a release of the plugin. # FAQ From 58e68c08c49f79949f39393c4be78ebaf77e78d2 Mon Sep 17 00:00:00 2001 From: Charlie Marshak Date: Tue, 30 Jan 2024 14:20:47 -0800 Subject: [PATCH 5/5] link to sample notebooks --- README.md | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c3b52551..bf35575c 100644 --- a/README.md +++ b/README.md @@ -204,9 +204,28 @@ EARTHDATA_USERNAME=... EARTHDATA_PASSWORD=... ``` +## Developing portions of the workflow + +The main entrypoint is in the `__main__.py` file [here](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp/blob/dev/isce2_topsapp/__main__.py). +The whole standard `++slc` workflow that generates the `ARIA-S1-GUNW` product is summarized in this file. +The workflow takes 1.5 - 3 hours to complete. +For developing a new feature, we need to modify a very small parts of this long running workflow e.g. the packaging of ISCE2 outputs into a netcdf file, staging/preparation of auxiliary data, etc. +Thus, for development it is not necessary to rerun the workflow each time to test a new feature, but utilize the intermediate ISCE data files and fix relevant parts of the code. +We have some sample [notebooks](https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsapp-Debugging-NBs) to load the relevant metadata to make this "jumping" into the code slightly easier. +We note that `ISCE2` generates *a lot* of interemdiate files. +For our workflow, this can be between 100-150 GBs of disk required. +So be warned! + ## Tests -There are some integration tests. They are to make sure the wrapping part of the plugin, e.g. downloading metadata, a simulated CMR handshake, and packaging, occur as expected. However, the plugin takes about 1.5 to 3 hours (depending on the number of corrections requested) to generate a final GUNW. Therefore, these integration are *not* sufficient to permit merging into main. Until we have a complete end-to-end test of the workflow, any new feature cannot be integrated. As a first step, it is imperative to share the output of a new feature (i.e. the GUNW netcdf file). This is important because the integration tests do not check that ISCE2 successfully ran nor did the complex processing scripts. There are a lot of large libraries that need to be built and run successfully before this can be integrated. The CI/CD below ensures correct building of the python enviroment and Docker image. However, there can be subsequent errors in say ISCE2 even if the environment was built correctly. +We have a test suite, but it is far from complete. +The passing of the test suite *do not guarantee successful generation of a GUNW!*. +The test suite helps make sure portions of the cloud workflow, e.g. downloading metadata, a simulated CMR handshake, and packaging, occur as expected. +However, the plugin takes about 1.5 to 3 hours (depending on the number of corrections requested) to generate a final GUNW. +Therefore, these integration are *not* sufficient to permit a new release (see more instructions below). +Until we have a complete end-to-end test of the workflow (via Hyp3), any new feature cannot be integrated into official production (i.e. the `main` branch). +As a first step, it is imperative to share the output of a new feature (i.e. the GUNW file and the command to generate it). + ## Hyp3 and Cloud Submission