linz · amfage · Aug 30, 2023 · Aug 30, 2023 · Aug 31, 2023 · Aug 31, 2023
@@ -18,17 +18,134 @@ This script sets up for the automated processing of numerous elevation datasets
 
 ```bash
 cd ./tools
+python3 -m venv .venv
+. .venv/bin/activate
+pip install pyyaml
 python3 generate-argo-cli-commands-elevation.py
 ```
 
 **Output:**
 
 - **region-year-datatype-scale.yaml:** workflow parameters for this dataset
-- **standardise-publish.sh:** bash script to 'deploy' argo workflows  
+- **standardise-publish.sh** bash script to 'deploy' argo workflows  
    **nb: the commented lines at the end of this file detail the datasets not run due to known issues.**
 
 **Submitting:**
 
 ```bash
 sh standardise-publish.sh
 ```
+
+## generate-argo-cli-commands-nz-imagery-publish-copy.py
+
+**Date:** 04/09/2023
+
+**Related Jira Tickets:** [TDE-801](https://toitutewhenua.atlassian.net/browse/TDE-801)
+
+**Description:**  
+This script sets up the commands for copying validly named imagery from `linz-imagery` to `nz-imagery` S3 buckets.
+
+**Instructions:**
+
+1. Run:
+
+```bash
+cd ./tools
+python3 -m venv .venv
+. .venv/bin/activate
+pip install pyyaml
+python3 generate-argo-cli-commands-nz-imagery-publish-copy.py
+```
+
+NB: Uncomment the following lines and log into the LINZ imagery account if you need the source STAC files:
+
+```_run_command(["git", "clone", """git@github.com:linz/imagery""", "./data/imagery/"], None)```
+```_run_command(["s5cmd", "cp", "s3://linz-imagery/catalog.json", "./data/imagery/stac/"], None)```
+
+**Output:**
+
+- **publish-region-year-datatype-scale.yaml:** workflow parameters for this dataset
+- **publish-copy.sh** bash script to submit argo workflows
+
+**Submitting:**
+
+```bash
+sh publish-copy.sh
+```
+
+## generate-argo-cli-commands-nz-imagery-restandardise.py
+
+**Date:** 04/09/2023
+
+**Related Jira Tickets:** [TDE-804](https://toitutewhenua.atlassian.net/browse/TDE-804)
+
+**Description:**  
+This script sets up the commands for the automated re-processing of already standardised imagery datasets using the argo cli.
+As is, this script will copy the output to the `linz-workflow-artifacts` S3 bucket.
+
+**Instructions:**
+
+1. Run:
+
+```bash
+cd ./tools
+python3 -m venv .venv
+. .venv/bin/activate
+pip install pyyaml
+python3 generate-argo-cli-commands-nz-imagery-restandardise.py
+```
+
+NB: Uncomment the following lines and log into the LINZ imagery account if you need the source STAC files:
+
+```_run_command(["git", "clone", """git@github.com:linz/imagery""", "./data/imagery/"], None)```
+```_run_command(["s5cmd", "cp", "s3://linz-imagery/catalog.json", "./data/imagery/stac/"], None)```
+
+**Output:**
+
+- **region-year-datatype-scale.yaml:** workflow parameters for this dataset
+- **standardise-publish.sh** bash script to submit argo workflows
+- **_errors_region-year-datatype-scale.yaml.txt** imagery sets that have issues and cannot be re-standardised as is, e.g. invalid scale
+
+**Submitting:**
+
+```bash
+sh standardise-publish.sh
+```
+
+## generate-argo-cli-commands-nz-imagery-publish-after-restandardise.py
+
+**Date:** 04/09/2023
+
+**Related Jira Tickets:** [TDE-804](https://toitutewhenua.atlassian.net/browse/TDE-804)
+
+**Description:**  
+This script sets up the commands for copying imagery from `linz-workflow-artifacts` to `nz-imagery` S3 buckets.
+This is intended to be run after re-standardising imagery.
+
+**Instructions:**
+
+1. Run:
+
+```bash
+cd ./tools
+python3 -m venv .venv
+. .venv/bin/activate
+pip install pyyaml
+python3 generate-argo-cli-commands-nz-imagery-publish-after-restandardise.py
+```
+
+NB: Uncomment the following lines and log into the LINZ imagery account if you need the source STAC files:
+
+```_run_command(["git", "clone", """git@github.com:linz/imagery""", "./data/imagery/"], None)```
+```_run_command(["s5cmd", "cp", "s3://linz-imagery/catalog.json", "./data/imagery/stac/"], None)```
+
+**Output:**
+
+- **publish-region-year-datatype-scale.yaml:** workflow parameters for this dataset
+- **publish.sh** bash script to 'deploy' argo workflows  
+
+**Submitting:**
+
+```bash
+sh publish.sh
+```
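As a rough illustration of how the generated file names relate to the S3 paths, the sketch below mirrors the `target.split("/")[-4:-2]` logic used by the publish scripts in this change. The function name `parameter_file_stem` and the sample path are hypothetical; the real paths come from the STAC catalog.

```python
def parameter_file_stem(target: str) -> str:
    """Derive the parameter file stem from a target directory path.

    Assumes the target ends with a trailing slash, e.g.
    s3://nz-imagery/<region>/<dataset>/<product>/<crs>/
    """
    # With a trailing slash the last split element is "", so [-4:-2]
    # selects the dataset directory and the product directory.
    dataset, product = target.split("/")[-4:-2]
    return f"{dataset}-{product}".replace("_", "-").replace(".", "-")


# Hypothetical target path, for illustration only.
sample_target = "s3://nz-imagery/auckland/auckland_2010_0.5m/rgb/2193/"
stem = parameter_file_stem(sample_target)
parameter_file = f"./publish-{stem}.yaml"
print(parameter_file)
```

Underscores and dots are replaced with hyphens, presumably so the same string can be reused for the workflow's `--generate-name` prefix.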
@@ -0,0 +1,87 @@
+import json
+import os
+import subprocess
+import yaml
+from typing import Dict, List, Union
+
+
+CATALOG_FILE = "./data/imagery/stac/catalog.json"
+COMMAND = "argo submit --from wftmpl/publish-copy -n argo -f ./publish-{0}.yaml --generate-name {1}-\n"
+
+
+def _run_command(command: List[str], cwd: Union[str, None]) -> "subprocess.CompletedProcess[bytes]":
+    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
+    return subprocess.run(
+        command,
+        cwd=cwd,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+        check=True,
+    )
+
+
+def _write_params(params: Dict[str, str], file: str) -> None:
+    with open(f"./publish-{file}.yaml", "w", encoding="utf-8") as output:
+        yaml.dump(
+            params,
+            output,
+            default_flow_style=False,
+            default_style='"',
+            sort_keys=False,
+            allow_unicode=True,
+            width=1000,
+        )
+
+def _tmp_source_edit(source: str) -> str:
+    if "2193/rgb" in source:
+        source = source.replace("2193/rgb", "rgb/2193")
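+    # str.strip/rstrip remove character sets, not substrings; trim the exact prefix and suffix (Python 3.9+).
+    source = os.path.join("s3://linz-workflow-artifacts/nz-imagery/", source.removeprefix("./")).removesuffix("collection.json")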
+    return source
+
+def _tmp_target_edit(target: str) -> str:
+    if "2193/rgb" in target:
+        target = target.replace("2193/rgb", "rgb/2193")
+    return target.replace("s3://linz-workflow-artifacts/nz-imagery/", "s3://nz-imagery/")
+
+## Uncomment if you need to retrieve the STAC files
+# _run_command(["git", "clone", """git@github.com:linz/imagery""", "./data/imagery/"], None)
+## Need to be logged into imagery account to get the catalog.json file
+# _run_command(["s5cmd", "cp", "s3://linz-imagery/catalog.json", "./data/imagery/stac/"], None)
+
+
+with open(CATALOG_FILE, encoding="utf-8") as catalog:
+    catalog_json = json.load(catalog)
+
+parameter_list = []
+
+for link in catalog_json["links"]:
+    if link["rel"] == "child":
+        data_errors = []
+        collection_link = os.path.abspath("./data/imagery/stac/" + link["href"])
+        with open(collection_link, encoding="utf-8") as collection:
+            collection_json = json.loads(collection.read())
+            source = _tmp_source_edit(link["href"])
+            target = _tmp_target_edit(source)
+
+            params = {
+                "source": source,
+                "target": target,
+                "include": ".tiff?$|.json$",
+                "group": "1000",
+                "group-size": "100Gi",
+            }
+
+            file_name = target.split("/")[-4:-2]
+            file_name = f"{file_name[0]}-{file_name[1]}"
+            formatted_file_name = file_name.replace("_", "-").replace(".", "-")
+
+            parameter_list.append(COMMAND.format(formatted_file_name, formatted_file_name))
+
+            _write_params(params, formatted_file_name)
+
+# Write the submit script once, after all collections have been processed.
+with open("./publish.sh", "w", encoding="utf-8") as script:
+    script.write("#!/bin/bash\n\n")
+    script.writelines(parameter_list)
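Trimming the `./` prefix and the `collection.json` suffix from a STAC href is easy to get subtly wrong, because `str.strip` and `str.rstrip` treat their argument as a set of characters and not as a substring. The sketch below shows the pitfall and one explicit alternative that does not depend on the Python version. The helper name `collection_dir` and the sample href are assumptions made for illustration.

```python
def collection_dir(href: str, bucket_prefix: str) -> str:
    """Turn a relative collection href into the S3 directory that holds it."""
    # Drop a leading "./" only if it is literally present.
    relative = href[2:] if href.startswith("./") else href
    # Drop the file name only if the href really ends with it.
    suffix = "collection.json"
    if relative.endswith(suffix):
        relative = relative[: -len(suffix)]
    return bucket_prefix + relative


# The pitfall: rstrip removes any trailing characters from the given set,
# so more than the intended suffix can disappear.
pitfall = "items.json".rstrip(".json")

# Hypothetical href, for illustration only.
sample = collection_dir(
    "./auckland/auckland_2010_0.5m/rgb/2193/collection.json",
    "s3://linz-imagery/",
)
print(pitfall, sample)
```

For hrefs of the form `./<path>/collection.json` both approaches give the same result, because the `/` before the file name stops `rstrip`; the explicit version simply removes the reliance on that coincidence.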
@@ -0,0 +1,116 @@
+import json
+import os
+import subprocess
+import yaml
+from typing import Dict, List, Set, Union
+
+CATALOG_FILE = "./data/imagery/stac/catalog.json"
+COMMAND = "argo submit --from wftmpl/publish-odr -n argo -f ./publish-{0}.yaml --generate-name odr-{1}-\n"
+VALID_SCALES: Set[str] = {"500", "1000", "2000", "5000", "10000", "50000"}
+
+
+def _run_command(command: List[str], cwd: Union[str, None]) -> "subprocess.CompletedProcess[bytes]":
+    try:
+        proc = subprocess.run(
+            command,
+            cwd=cwd,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            check=True,
+        )
+    except subprocess.CalledProcessError as cpe:
+        raise cpe
+    return proc
+
+def _is_valid_scale(links: List[Dict[str, str]]) -> bool:
+    scales: List[str] = []
+    for link in links:
+        if link["rel"] == "item":
+            try:
+                scale = os.path.splitext(link["href"].split("_")[1])[0]
+                if scale in VALID_SCALES:
+                    if scale not in scales:
+                        scales.append(scale)
+                else:
+                    return False
+            except (IndexError, KeyError):
+                # href is missing or has no "_<scale>" component
+                return False
+    return len(scales) == 1
+
+def _write_params(params: Dict[str, str], file: str) -> None:
+    with open(f"./publish-{file}.yaml", "w", encoding="utf-8") as output:
+        yaml.dump(
+            params,
+            output,
+            default_flow_style=False,
+            default_style='"',
+            sort_keys=False,
+            allow_unicode=True,
+            width=1000,
+        )
+
+def _tmp_target_edit(target: str) -> str:
+    if "_0.10m/" in target:
+        target = target.replace("_0.10m/", "_0.1m/")
+    if "tauranga-city_2022_0.1m" in target:
+        target = target.replace("tauranga-city_2022_0.1m", "tauranga_2022_0.1m")
+    if "tauranga_winter_2022_0.1m" in target:
+        target = target.replace("tauranga_winter_2022_0.1m", "tauranga-winter_2022_0.1m")
+    if "christchurch-post-earthquake_24-february-2011_0.1m" in target:
+        target = target.replace("christchurch-post-earthquake_24-february-2011_0.1m", "christchurch-earthquake_2011_0.1m")
+    if "2193/rgb" in target:
+        target = target.replace("2193/rgb", "rgb/2193")
+    return target.replace("s3://linz-imagery/", "s3://nz-imagery/")
+
+## Uncomment if you need to retrieve the STAC files
+# _run_command(["git", "clone", """git@github.com:linz/imagery""", "./data/imagery/"], None)
+## Need to be logged into imagery account to get the catalog.json file
+# _run_command(["s5cmd", "cp", "s3://linz-imagery/catalog.json", "./data/imagery/stac/"], None)
+
+
+with open(CATALOG_FILE, encoding="utf-8") as catalog:
+    catalog_json = json.load(catalog)
+
+parameter_list = []
+
+for link in catalog_json["links"]:
+    if link["rel"] == "child":
+        data_errors = []
+        collection_link = os.path.abspath("./data/imagery/stac/" + link["href"])
+        with open(collection_link, encoding="utf-8") as collection:
+            collection_json = json.loads(collection.read())
+            # TDE-854
+            # north-island_20221122_10m ← Sentinel-2 Cyclone Gabrielle imagery (can we reformat date in dirname?)
+            # north-island_20230220_10m ← Sentinel-2 Cyclone Gabrielle imagery (can we reformat date in dirname?)
+            # north-island_2023_0-5m ← don’t want to rename
+            # north-island_2023_10m ← do not copy
+            if "north-island_2023_10m" in link["href"]:
+                continue
+            elif "north-island" in link["href"]:
+                pass
+            elif not _is_valid_scale(collection_json["links"]):
+                continue
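+            # str.strip/rstrip remove character sets, not substrings; trim the exact prefix and suffix (Python 3.9+).
+            source = os.path.join("s3://linz-imagery/", link["href"].removeprefix("./")).removesuffix("collection.json")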
+            target = _tmp_target_edit(source)
+
+            params = {
+                "source": source,
+                "target": target,
+                "include": ".tiff?$|.json$",
+                "group": "2000",
+                "group-size": "200Gi",
+            }
+
+            file_name = target.split("/")[-4:-2]
+            file_name = f"{file_name[0]}-{file_name[1]}"
+            formatted_file_name = file_name.replace("_", "-").replace(".", "-")
+
+            parameter_list.append(COMMAND.format(formatted_file_name, formatted_file_name))
+
+            _write_params(params, formatted_file_name)
+
+# Write the submit script once, after all collections have been processed.
+with open("./publish-copy.sh", "w", encoding="utf-8") as script:
+    script.write("#!/bin/bash\n\n")
+    script.writelines(parameter_list)
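The scale check in `_is_valid_scale` can be exercised in isolation. The following is a minimal sketch of the same idea, not the script's code: a collection passes only if every item href carries a scale from `VALID_SCALES` and all items share one scale. The function name `has_single_valid_scale` and the sample hrefs are hypothetical.

```python
import os
from typing import Dict, List, Set

VALID_SCALES: Set[str] = {"500", "1000", "2000", "5000", "10000", "50000"}


def has_single_valid_scale(links: List[Dict[str, str]]) -> bool:
    """Return True only if all item links share exactly one valid scale."""
    scales: Set[str] = set()
    for link in links:
        if link.get("rel") != "item":
            continue
        # Item hrefs are assumed to look like "./<sheet>_<scale>_<tile>.json".
        parts = link.get("href", "").split("_")
        if len(parts) < 2:
            return False
        scale = os.path.splitext(parts[1])[0]
        if scale not in VALID_SCALES:
            return False
        scales.add(scale)
    return len(scales) == 1


# Hypothetical item links, for illustration only.
consistent = [
    {"rel": "self", "href": "./collection.json"},
    {"rel": "item", "href": "./BQ31_5000_0101.json"},
    {"rel": "item", "href": "./BQ31_5000_0102.json"},
]
mixed = consistent + [{"rel": "item", "href": "./BQ31_1000_0101.json"}]
print(has_single_valid_scale(consistent), has_single_valid_scale(mixed))
```

Using a set keeps the uniqueness check implicit, and returning a plain `bool` on every path matches the function's type annotation.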