Merge remote-tracking branch 'origin/master' into benc-stdstreams-bugs

Parsl · Oct 22, 2024 · 6dc3a8c
2 parents 6dde5a4 + 410d2cb
commit 6dc3a8c

Showing 37 changed files with 666 additions and 199 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -10,7 +10,7 @@ jobs:
   main-test-suite:
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     runs-on: ubuntu-20.04
     timeout-minutes: 60
 
@@ -60,7 +60,7 @@ jobs:
         export PARSL_TEST_PRESERVE_NUM_RUNS=7
 
         make test
-        ln -s .pytest/parsltest-current test_runinfo
+        ln -s pytest-parsl/parsltest-current test_runinfo
 
     - name: Documentation checks
       run: |
@@ -80,11 +80,11 @@ jobs:
         # database manager log file or monitoring router log file. It would be better if
         # the tests themselves failed immediately when there was a monitoring error, but
         # in the absence of that, this is a dirty way to check.
-        bash -c '! grep ERROR .pytest/parsltest-current/runinfo*/*/database_manager.log'
-        bash -c '! grep ERROR .pytest/parsltest-current/runinfo*/*/monitoring_router.log'
+        bash -c '! grep ERROR pytest-parsl/parsltest-current/runinfo*/*/database_manager.log'
+        bash -c '! grep ERROR pytest-parsl/parsltest-current/runinfo*/*/monitoring_router.log'
 
         # temporary; until test-matrixification
-        rm -f .pytest/parsltest-current test_runinfo
+        rm -f pytest-parsl/parsltest-current test_runinfo
 
     - name: Checking parsl-visualize
       run: |
@@ -105,6 +105,6 @@ jobs:
         name: runinfo-${{ matrix.python-version }}-${{ steps.job-info.outputs.as-ascii }}-${{ github.sha }}
         path: |
           runinfo/
-          .pytest/
+          pytest-parsl/
           ci_job_info.txt
         compression-level: 9
diff --git a/.gitignore b/.gitignore
@@ -63,6 +63,7 @@ coverage.xml
 *.cover
 .hypothesis/
 /.pytest/
+/pytest-parsl/
 
 # Translations
 *.mo

diff --git a/Makefile b/Makefile
@@ -84,7 +84,7 @@ radical_local_test:
 
 .PHONY: config_local_test
 config_local_test: $(CCTOOLS_INSTALL)
-	pip3 install ".[monitoring,visualization,proxystore]"
+	pip3 install ".[monitoring,visualization,proxystore,kubernetes]"
 	PYTHONPATH=/tmp/cctools/lib/python3.8/site-packages pytest parsl/tests/ -k "not cleannet" --config local --random-order --durations 10
 
 .PHONY: site_test

diff --git a/README.rst b/README.rst
@@ -1,6 +1,6 @@
 Parsl - Parallel Scripting Library
 ==================================
-|licence| |build-status| |docs| |NSF-1550588| |NSF-1550476| |NSF-1550562| |NSF-1550528| |CZI-EOSS|
+|licence| |docs| |NSF-1550588| |NSF-1550476| |NSF-1550562| |NSF-1550528| |CZI-EOSS|
 
 Parsl extends parallelism in Python beyond a single computer.
 
@@ -43,9 +43,6 @@ then explore the `parallel computing patterns <https://parsl.readthedocs.io/en/s
 .. |licence| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
    :target: https://github.com/Parsl/parsl/blob/master/LICENSE
    :alt: Apache Licence V2.0
-.. |build-status| image:: https://github.com/Parsl/parsl/actions/workflows/ci.yaml/badge.svg
-   :target: https://github.com/Parsl/parsl/actions/workflows/ci.yaml
-   :alt: Build status
 .. |docs| image:: https://readthedocs.org/projects/parsl/badge/?version=stable
    :target: http://parsl.readthedocs.io/en/stable/?badge=stable
    :alt: Documentation Status
@@ -120,7 +117,7 @@ For Developers
 Requirements
 ============
 
-Parsl is supported in Python 3.8+. Requirements can be found `here <requirements.txt>`_. Requirements for running tests can be found `here <test-requirements.txt>`_.
+Parsl is supported in Python 3.9+. Requirements can be found `here <requirements.txt>`_. Requirements for running tests can be found `here <test-requirements.txt>`_.
 
 Code of Conduct
 ===============

diff --git a/codemeta.json b/codemeta.json
@@ -191,8 +191,8 @@
         "name": "The Python Package Index",
         "url": "https://pypi.org"
     },
-    "runtimePlatform": "Python 3.8",
+    "runtimePlatform": "Python 3.9",
     "url": "https://github.com/Parsl/parsl",
     "developmentStatus": "active",
-    "programmingLanguage": "Python :: 3.8"
+    "programmingLanguage": "Python :: 3.9"
 }
diff --git a/docs/faq.rst b/docs/faq.rst
@@ -209,7 +209,7 @@ For instance, with conda, follow this `cheatsheet <https://conda.io/docs/_downlo
    source activate <my_env>
 
    # Install packages:
-   conda install <ipyparallel, dill, boto3...>
+   conda install <dill, boto3...>
 
 
 How do I run code that uses Python2.X?

diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -10,7 +10,7 @@ Installation
 
 Parsl is available on `PyPI <https://pypi.org/project/parsl/>`_ and `conda-forge <https://anaconda.org/conda-forge/parsl>`_. 
 
-Parsl requires Python3.8+ and has been tested on Linux and macOS.
+Parsl requires Python3.9+ and has been tested on Linux.
 
 
 Installation using Pip
@@ -31,7 +31,7 @@ Installation using Conda
 
 1. Create and activate a new conda environment::
 
-     $ conda create --name parsl_py38 python=3.8
+     $ conda create --name parsl_py38 python=3.9
      $ source activate parsl_py38
 
 2. Install Parsl::
@@ -236,7 +236,7 @@ for reporting purposes.
 
 As an NSF-funded project, our ability to track usage metrics is important for continued funding. 
 
-You can opt-in by setting ``usage_tracking=True`` in the configuration object (`parsl.config.Config`). 
+You can opt-in by setting ``usage_tracking=3`` in the configuration object (`parsl.config.Config`). 
 
 To read more about what information is collected and how it is used see :ref:`label-usage-tracking`.
 

diff --git a/docs/reference.rst b/docs/reference.rst
@@ -78,6 +78,16 @@ Executors
     parsl.executors.FluxExecutor
     parsl.executors.radical.RadicalPilotExecutor
 
+Manager Selectors
+=================
+
+.. autosummary::
+    :toctree: stubs
+    :nosignatures:
+
+    parsl.executors.high_throughput.manager_selector.RandomManagerSelector
+    parsl.executors.high_throughput.manager_selector.BlockIdManagerSelector
+
 Launchers
 =========
 

diff --git a/docs/userguide/usage_tracking.rst b/docs/userguide/usage_tracking.rst
@@ -1,82 +1,171 @@
 .. _label-usage-tracking:
 
-Usage statistics collection
+Usage Statistics Collection
 ===========================
 
-Parsl uses an **Opt-in** model to send usage statistics back to the Parsl development team to
-measure worldwide usage and improve reliability and usability. The usage statistics are used only for
-improvements and reporting. They are not shared in raw form outside of the Parsl team.
-
+Parsl uses an **Opt-in** model for usage tracking, allowing users to decide if they wish to participate. Usage statistics are crucial for improving software reliability and help focus development and maintenance efforts on the most used components of Parsl. The collected data is used solely for enhancements and reporting and is not shared in its raw form outside of the Parsl team.
 
 Why are we doing this?
 ----------------------
 
-The Parsl development team receives support from government funding agencies. For the team to continue to
-receive such funding, and for the agencies themselves to argue for funding, both the team and the agencies
-must be able to demonstrate that the scientific community is benefiting from these investments. To this end,
-it is important that we provide aggregate usage data about such things as the following:
+The Parsl development team relies on funding from government agencies. To sustain this funding and advocate for continued support, it is essential to show that the research community benefits from these investments.
+
+By opting in to share usage data, you actively support the ongoing development and maintenance of Parsl. (See:ref:`What is sent? <what-is-sent>` below).
+
+Opt-In Model
+------------
+
+We use an **opt-in model** for usage tracking to respect user privacy and provide full control over shared information. We hope that developers and researchers will choose to send us this information. The reason is that we need this data - it is a requirement for funding.
 
-* How many people use Parsl
-* Average job length
-* Parsl exit codes
+Choose the data you share with Usage Tracking Levels.
 
-By participating in this project, you help justify continuing support for the software on which you rely.
-(see :ref:`What is sent? <what-is-sent>` below).
+**Usage Tracking Levels:**
 
-Opt-In
-------
+* **Level 1:** Only basic information such as Python version, Parsl version, and platform name (Linux, MacOS, etc.)
+* **Level 2:** Level 1 information and configuration information including provider, executor, and launcher names.
+* **Level 3:** Level 2 information and workflow execution details, including the number of applications run, failures, and execution time.
 
-We have chosen opt-in collection rather than opt-out with the hope that developers and researchers
-will choose to send us this information. The reason is that we need this data - it is a requirement for funding.
+By enabling usage tracking, you support Parsl's development. 
 
-By opting-in, and allowing these statistics to be reported back, you are explicitly supporting the
-further development of Parsl.
+**To opt-in, set** ``usage_tracking`` **to the desired level (1, 2, or 3) in the configuration object** (``parsl.config.Config``) **.**
 
-If you wish to opt in to usage reporting, set ``usage_tracking=True`` in the configuration object (`parsl.config.Config`).
+Example:
 
+.. code-block:: python3
+
+    config = Config(
+        executors=[
+            HighThroughputExecutor(
+                ...
+            )
+        ],
+        usage_tracking=3
+    )
 
 .. _what-is-sent:
 
 What is sent?
 -------------
 
-* IP address
-* Run UUID
-* Start and end times
-* Number of executors used
-* Number of failures
-* Parsl and Python version
-* OS and OS version
-
+The data collected depends on the tracking level selected:
+
+* **Level 1:** Only basic information such as Python version, Parsl version, and platform name (Linux, MacOS, etc.)
+* **Level 2:** Level 1 information and configuration information including provider, executor, and launcher names.
+* **Level 3:** Level 2 information and workflow execution details, including the number of applications run, failures, and execution time.
+
+**Example Messages:**
+
+- At launch:
+
+  .. code-block:: json
+
+    {
+       "correlator":"6bc7484e-5693-48b2-b6c0-5889a73f7f4e",
+       "parsl_v":"1.3.0-dev",
+       "python_v":"3.12.2",
+       "platform.system":"Darwin",
+       "tracking_level":3,
+       "components":[
+          {
+             "c":"parsl.config.Config",
+             "executors_len":1,
+             "dependency_resolver":false
+          },
+          "parsl.executors.threads.ThreadPoolExecutor"
+       ],
+       "start":1727156153
+    }
+
+- On closure (Tracking Level 3 only):
+
+  .. code-block:: json
+
+    {
+       "correlator":"6bc7484e-5693-48b2-b6c0-5889a73f7f4e",
+       "execution_time":31,
+       "components":[
+          {
+             "c":"parsl.dataflow.dflow.DataFlowKernel",
+             "app_count":3,
+             "app_fails":0
+          },
+          {
+             "c":"parsl.config.Config",
+             "executors_len":1,
+             "dependency_resolver":false
+          },
+          "parsl.executors.threads.ThreadPoolExecutor"
+       ],
+       "end":1727156156
+    }
+
+**All messages sent are logged in the** ``parsl.log`` **file, ensuring complete transparency.**
 
 How is the data sent?
 ---------------------
 
-The data is sent via UDP. While this may cause us to lose some data, it drastically reduces the possibility
-that the usage statistics reporting will adversely affect the operation of the software.
+Data is sent using **UDP** to minimize the impact on workflow performance. While this may result in some data loss, it significantly reduces the chances of usage tracking affecting the software's operation.
 
+The data is processed through AWS CloudWatch to generate a monitoring dashboard, providing valuable insights into usage patterns.
 
 When is the data sent?
 ----------------------
 
-The data is sent twice per run, once when Parsl starts a script, and once when the script is completed.
+Data is sent twice per run:
 
+1. At the start of the script.
+2. Upon script completion (for Tracking Level 3).
 
 What will the data be used for?
 -------------------------------
 
-The data will be used for reporting purposes to answer questions such as:
+The data will help the Parsl team understand Parsl usage and make development and maintenance decisions, including:
+
+* Focus development and maintenance on the most-used components of Parsl.
+* Determine which Python versions to continue supporting.
+* Track the age of Parsl installations.
+* Assess how long it takes for most users to adopt new changes.
+* Track usage statistics to report to funders.
+
+Usage Statistics Dashboard
+--------------------------
 
-* How many unique users are using Parsl?
-* To determine patterns of usage - is activity increasing or decreasing?
+The collected data is aggregated and displayed on a publicly accessible dashboard. This dashboard provides an overview of how Parsl is being used across different environments and includes metrics such as:
 
-We will also use this information to improve Parsl by identifying software faults.
+* Total workflows executed over time
+* Most-used Python and Parsl versions
+* Most common platforms and executors and more
 
-* What percentage of tasks complete successfully?
-* Of the tasks that fail, what is the most common fault code returned?
+`Find the dashboard here <https://cloudwatch.amazonaws.com/dashboard.html?dashboard=Parsl-Usage-Tracking-Stats&context=eyJSIjoidXMtZWFzdC0xIiwiRCI6ImN3LWRiLTA0Njc5ODQ4MjQwNiIsIlUiOiJ1cy1lYXN0LTFfNW41R1BwYVd0IiwiQyI6IjN2bzJmbzAxYnI1dm92YjY2dGEwcmo2dmNkIiwiSSI6InVzLWVhc3QtMTplMjYyZGZkMy05NjI2LTQ4YTMtYjBkOC1jYWYwYWU1NzA4M2EiLCJPIjoiYXJuOmF3czppYW06OjA0Njc5ODQ4MjQwNjpyb2xlL3NlcnZpY2Utcm9sZS9DV0RCU2hhcmluZy1QdWJsaWNSZWFkT25seUFjY2Vzcy1UTlBOMk5COSIsIk0iOiJQdWJsaWMifQ==&start=PT3H&end=null>`_
 
+Leaderboard
+-----------
+
+**Opting in to usage tracking also allows you to participate in the Parsl Leaderboard.
+To participate in the leaderboard, you can deanonymize yourself using the** ``project_name`` **parameter in the parsl configuration object** (``parsl.config.Config``) **.**
+
+`Find the Parsl Leaderboard here <https://cloudwatch.amazonaws.com/dashboard.html?dashboard=Parsl-Usage-Tracking-Stats&context=eyJSIjoidXMtZWFzdC0xIiwiRCI6ImN3LWRiLTA0Njc5ODQ4MjQwNiIsIlUiOiJ1cy1lYXN0LTFfNW41R1BwYVd0IiwiQyI6IjN2bzJmbzAxYnI1dm92YjY2dGEwcmo2dmNkIiwiSSI6InVzLWVhc3QtMTplMjYyZGZkMy05NjI2LTQ4YTMtYjBkOC1jYWYwYWU1NzA4M2EiLCJPIjoiYXJuOmF3czppYW06OjA0Njc5ODQ4MjQwNjpyb2xlL3NlcnZpY2Utcm9sZS9DV0RCU2hhcmluZy1QdWJsaWNSZWFkT25seUFjY2Vzcy1UTlBOMk5COSIsIk0iOiJQdWJsaWMifQ==&start=PT3H&end=null>`_
+
+Example:
+
+.. code-block:: python3
+
+    config = Config(
+        executors=[
+            HighThroughputExecutor(
+                ...
+            )
+        ],
+        usage_tracking=3,
+        project_name="my-test-project"
+    )
+
+Every run of parsl with usage tracking **Level 1** or **Level 2** earns you **1 point**. And every run with usage tracking **Level 3**, earns you **2 points**.
+
 Feedback
 --------
 
-Please send us your feedback at [email protected]. Feedback from our user communities will be
+Please send us your feedback at [email protected]. Feedback from our user communities will be 
 useful in determining our path forward with usage tracking in the future.
+
+**Please consider turning on usage tracking to support the continued development of Parsl.**
diff --git a/mypy.ini b/mypy.ini
@@ -137,12 +137,6 @@ ignore_missing_imports = True
 [mypy-copy_reg.*]
 ignore_missing_imports = True
 
-[mypy-ipyparallel.*]
-ignore_missing_imports = True
-
-[mypy-ipython_genutils.*]
-ignore_missing_imports = True
-
 [mypy-cmreslogging.handlers.*]
 ignore_missing_imports = True
 

diff --git a/parsl/config.py b/parsl/config.py
@@ -83,6 +83,9 @@ class Config(RepresentationMixin, UsageInformation):
         Setting this field to 0 will disable usage tracking. Default (this field is not set): usage tracking is not enabled.
         Parsl only collects minimal, non personally-identifiable,
         information used for reporting to our funding agencies.
+    project_name: str, optional
+        Option to deanonymize usage tracking data.
+        If set, this value will be used as the project name in the usage tracking data and placed on the leaderboard.
     initialize_logging : bool, optional
         Make DFK optionally not initialize any logging. Log messages
         will still be passed into the python logging system under the
@@ -118,6 +121,7 @@ def __init__(self,
                  max_idletime: float = 120.0,
                  monitoring: Optional[MonitoringHub] = None,
                  usage_tracking: int = 0,
+                 project_name: Optional[str] = None,
                  initialize_logging: bool = True) -> None:
 
         executors = tuple(executors or [])
@@ -154,6 +158,7 @@ def __init__(self,
         self.max_idletime = max_idletime
         self.validate_usage_tracking(usage_tracking)
         self.usage_tracking = usage_tracking
+        self.project_name = project_name
         self.initialize_logging = initialize_logging
         self.monitoring = monitoring
         self.std_autopath: Optional[Callable] = std_autopath