take any nvidia-smi exception as not having gpu #4611

Merged
adamnovak merged 2 commits into DataBiosphere:master on Oct 5, 2023

Conversation

glennhickey
Contributor

Toil runs nvidia-smi to count the available GPUs, and if the command fails, Toil catches the error and returns 0. This is fine, but it only works when the command fails with an exception that Toil expects. This came up once before in ComparativeGenomicsToolkit/cactus#937, where a permission error wasn't caught, which completely blocked the user from running Cactus. This got fixed in #4391 by adding a check for PermissionError.

But now somebody's run into what looks like the exact same issue again: ComparativeGenomicsToolkit/cactus#1185. This time it looks like they're getting a NotADirectoryError when running nvidia-smi. They don't want to use GPUs but the failing exception completely breaks Cactus...

Anyway, this PR just makes the check return gpus=0 (and not crash) on any exception, which, as a user, I think is more desirable.
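
As a rough illustration of the behavior described above, here is a minimal sketch of a GPU count that treats any failure as "no GPUs". It is not Toil's actual code: the function name `count_gpus_or_zero`, the use of `nvidia-smi --list-gpus`, and the line-based parsing are all assumptions made for the example.

```python
import logging
import subprocess
from typing import Sequence

logger = logging.getLogger(__name__)


def count_gpus_or_zero(
    command: Sequence[str] = ("nvidia-smi", "--list-gpus"),
) -> int:
    """Return the number of GPUs the command reports, or 0 on any failure.

    Hypothetical helper: the command and the parsing are assumptions for
    this sketch, not what Toil itself runs.
    """
    try:
        output = subprocess.check_output(list(command), text=True)
        # Assumes one line per device, each starting with "GPU ".
        return sum(1 for line in output.splitlines() if line.startswith("GPU "))
    except Exception as e:
        # Covers PermissionError, NotADirectoryError, a missing binary,
        # a nonzero exit status, and anything else unexpected.
        logger.debug("Could not count GPUs with %s: %s", list(command), e)
        return 0
```

With this shape, a broken or missing nvidia-smi only costs a debug log line, and a user who does not want GPUs can still run the workflow.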

Changelog Entry

To be copied to the draft changelog by merger:

  • More robust nvidia-smi check

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Member

@adamnovak adamnovak left a comment

We'll have to see what not catching misspelled variables does for us in terms of surprise problems. Hopefully it causes fewer than nvidia-smi finding new and innovative ways to not work.
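
To make the trade-off concrete, here is a small sketch (hypothetical function names, not code from this PR) of how a broad `except Exception` also hides a misspelled variable, whereas a narrow handler lets the resulting NameError surface:

```python
def count_with_broad_except() -> int:
    """Broad handler: the typo below is silently reported as 0 GPUs."""
    try:
        gpu_count = 2
        return gpu_cuont  # deliberate typo, raises NameError at runtime
    except Exception:
        return 0


def count_with_narrow_except() -> int:
    """Narrow handler: the same typo escapes as a NameError."""
    try:
        gpu_count = 2
        return gpu_cuont  # deliberate typo, raises NameError at runtime
    except OSError:
        return 0
```

The broad version never crashes, but a programming mistake inside the `try` block looks exactly like a machine without GPUs.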

@adamnovak
Member

I've pulled this into issues/4611-tolerate-broken-nvidia-smi for testing.

@adamnovak
Member

I'm going to merge this without waiting for a re-test because I don't see how it could fight with the config file work.

@adamnovak adamnovak merged commit ef2b923 into DataBiosphere:master Oct 5, 2023
2 checks passed
misterbrandonwalker pushed a commit to misterbrandonwalker/toil that referenced this pull request Oct 12, 2023
michael-kotliar added a commit to michael-kotliar/toil that referenced this pull request May 14, 2024
* Update docs to hide Mesos (#4413)

* Update docs to hide Mesos

* address review comments

* remove invisible characters?

* replace mesos in more places

* Document Kubernetes-managed autoscaling, with in-workflow Mesos autoscaling as deprecated

* Reword some documentation and messages

* Chase out more Mesoses

* Don't insist on processes actually running promptly in parallel

* Ask for a compatible set of Sphinx packages

* Keep back astroid

We can't use astroid 3 until sphinx-autoapi releases a fix for
https://github.com/readthedocs/sphinx-autoapi/issues/392

---------

Co-authored-by: Adam Novak <[email protected]>

* Avoid concurrent modification in cluster scaler tests (#4600)

This will fix #4599 by making the mock leader thread safe.

* Add String to File functionality into toil-wdl-runner (#4589)

* monkeypatch coerce for workflow related nodes

* Fix task inputs string coerce

* Disable kubernetes

* Comment out cwl kubernetes

* Maybe markers are wrong and comment out cactus-on-kubernetes

* Add docstrings to changed functions + change input list to dict

* Deal with nonetype

---------

Co-authored-by: Adam Novak <[email protected]>

* Separate out integration tests to run on a schedule (#4612)

* Reorganize tests and move integration tests to scheduled pipeline runs

* Also handle tags

* Add config file support (#4569)

* Centralize defaults

* Add requirements

* Grab logLevel

grabbed logLevel used to be the default in Config(), so grab effective
logLevel that is set

* Satisfy mypy

mypy might still complain about missing stubs for configargparser
though

* Fix wrong default

* add config tool

* temp fix

config sets defaults but so does argparse, runs twice in workflows but
deals with tests

* Fix create_config for tests instead

* Fix setting of config defaults

* Go back to previous method, create defaults at init

* Fix default cli options set

* Centralize, config util, and read properly

* Fix type hinting to support 3.9

* mypy

* Fix cwl edge case

* Fix tests

* fix typos, always generate config, fix some tests

* Remove subprocess as maybe tests are flaky on CI with it?

* just run quick_test_offline

* make CI print stuff

* Harden default config creation against races

* Cleanup and argument renaming

* Fix bad yaml and toil status bug

* Fix mypy

* Change behavior of --stats and --clean

* Change test behavior as options namespace and config now have the same
behavior

* Put forgotten line

ouch

* Batchsystem, requirements, fixes for tests

* Mypy conformance

* Mypy conformance

* Fix retryCount argument and kubernetesPodTimeout type

* Only run batchsystem and slurm_test tests on CI

* Whoops, this implementation never worked

* Add pyyaml to requirements for slurm to pass

* Add rest of gitlab CI back and run all tests

* Update stub file to be compatible with updated mypy

* Fix environment CLI option

* Update provisioner test to use configargparse

* Code cleanup and add jobstore_as_flag to DefaultArgumentParser etc

* Fix toil config test

* Add suggestions

* Deprecate options, add underscore CLI options only for newly deprecated options

* Update docs/argparse help and fix bug with deprecated options
also make most generic arg as default for runLocalJobsOnWorkers

* Add config file section to docs

* Remove upper bound for ruamel requirements

* Remove redundancies and improve disableCaching's destination name

* Update src/toil/batchSystems/kubernetes.py

Co-authored-by: Adam Novak <[email protected]>

* Remove redundant line in status util

* Remove comments in configargparse stub

* Workaround to get action=append instead of nargs and get proper backwards compatibility
Fix wrong name for link_imports and move_exports, remove new unused functions

* Import SYS_MAX_SIZE from common rather than duplicating it

* Mypy and syntax errors

* Move config options back to the old naming syntax

* Change names for link_imports and move_exports to camelCase options

* Fix formatting

* Bring back old --restart and --clean functionality where they collide and raise an error

* Make debug less spammy and remove unused types

* Disable kubernetes temporarily

* Revert changes to --restart and --clean collision

* Typo in tests

* Change some comments and add member fields to config

* Fix pickling error when jobstate file doesn't exist and fix threading error when lock file exists then disappears (#4575)

Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Adam Novak <[email protected]>

* Reduce the number of assert statements (#4590)

* Change all asserts to raising errors for central toil files

Co-authored-by: Adam Novak <[email protected]>

* Fix mypy and update docs to match options in common

* Update src/toil/common.py

Co-authored-by: Adam Novak <[email protected]>

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>

* take any nvidia-smi exception as not having gpu (#4611)

Co-authored-by: Adam Novak <[email protected]>

* Make WDLOutputJob collect all task outputs (#4602)

Co-authored-by: Adam Novak <[email protected]>

* Ensure sibling files in toil-wdl-runner (#4610)

* Ensure sibling files stay sibling files when downloaded

* Fix incorrect argument order

* Fix directory collisions with sibling files

* Make sure the `--batchLogsDir` exists if it is set (#4635)

* Make sure the batch logs dir exists if it is set

* Test Slurm with nonexistent --batchLogsDir

* Upgrade cwltool to avoid broken galaxy-tool-util release. (#4639)

Fixes: https://github.com/DataBiosphere/toil/issues/4638

* cwl: use the latest commit from the proposed CWL v1.2.1 branch (#4565)

* Report errors in WDL using MiniWDL's error location printer (#4637)

* Report errors in WDL using MiniWDL's error location printer

* Decorate actual tasks with fancy WDL error reporting

* Slap WDL error reporting on main

* Remove banned ignore comment

* Support Python3.11 and drop Python 3.7 (#4646)

* Remove python 3.7 and add python 3.11 and make python3.11 the main python package

* Move main python package back to 3.9

* Include python3.11 in docker

* Test 3.11 in CI

* Add python3.11 to CI dockerfile

* Add 3.11 to setup.py and debugging statements

* Python 3.7 backwards compatibility

* Update to py 3.12 and run 3.12 on gitlab CI

* Comment out fstring and try importlib

* Debug lint

* Ensure mypy is using python3.12

* Print python version before mypy

* Fix virtualenv, pip for python3.12

* Get rid of mesos tests/builds

* 3.12

* Revert debug change

* Go back to 3.11 and update docker package to make requests work again

* use an available htcondor package closest to 3.10 version

* update htcondor for all

* get pip for all python versions

* get virtualenv for all python versions

* needs specific ordering

* Separate mesos tests

* remove 3.7 from CI image

* Remove debug statement from makefile

* Fix configargparse in CWL (#4618)

* Parse config file separately from rest of args

* Mypy

* update configargparse stub

* Don't try to eat cwl arguments

* Use simpler workaround

* Revert to just CWL

* Change REMAINDER to "*", add help statements and test command line inputs

* Remove extradockergroup name

* Declare type

* Add proper relative path to cwl file

* Remove unnecessary test

---------

Co-authored-by: Adam Novak <[email protected]>

* Update ruamel-yaml requirement from <0.17.33,>=0.15 to >=0.15,<0.18.4 (#4659)

Updates the requirements on ruamel-yaml to permit the latest version.

---
updated-dependencies:
- dependency-name: ruamel-yaml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix CI Appliance Builds (#4655)

* Properly build 3.11, fix dependencies and move aws stubs/mock into dev

* only keep htcondor installs in appliance builds

* Remove unused import

* Fix extras_require syntax

* Fix #3867 and try to explain but not crash when bad things happen to our mutex file (#4656)

* Bump mypy from 1.5.1 to 1.6.1 (#4660)

* Bump mypy from 1.5.1 to 1.6.1

Bumps [mypy](https://github.com/python/mypy) from 1.5.1 to 1.6.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.5.1...v1.6.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* type fix

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Move around reqs and move aws dev libraries to aws (#4664)

* Turn batch system tests back on (#4649)

This should fix #4648 by turning on the batch system tests again. The Mesos-specific ones are already moved elsewhere.

Co-authored-by: Lon Blauvelt <[email protected]>

* Bump miniwdl from 1.10.0 to 1.11.1 (#4669)

Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.10.0 to 1.11.1.
- [Release notes](https://github.com/chanzuckerberg/miniwdl/releases)
- [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.10.0...v1.11.1)

---
updated-dependencies:
- dependency-name: miniwdl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Move TES batch system to a plugin (#4650)

* Implement new batch system finding API and plugin scan

* Satisfy MyPy

* Implement deprecation for the old constants

* Get plugin loader to actually load, and drop TES

* Remove TES Kubernetes setup we don't use

* Stop asking for needs_tes

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* skip unwanted networkx version (#4450)

* skip unwanted networkx version

* Limit to released major versions of networkx

---------

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* CWL Pipefish compatibility (#4636)

* Add a bunch of value resolving logging

* Quiet debugging a bit

* Move default setting for workflows so it works on subworkflows

* Remember to keep making a ToilFsAccess on the leader

* Satisfy MyPy

* Stop giving CWL containers directories full of broken symlinks

* Update test to expect no symlinks

* Move CWL integration tests for bioconda/biocontainers to integration test runs

* Wrap mkdtemp to fix #4644

* Sort imports in example scripts

* Use absolute-ized paths for work and coordination directories

* Bump cwltool from 3.1.20231020140205 to 3.1.20231114134824 (#4685)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231020140205 to 3.1.20231114134824.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231020140205...3.1.20231114134824)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump mypy from 1.6.1 to 1.7.0 (#4684)

* Bump mypy from 1.6.1 to 1.7.0

Bumps [mypy](https://github.com/python/mypy) from 1.6.1 to 1.7.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.6.1...v1.7.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* mypy 1.7.0 type updates

* format modified files

* remove unused imports

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Remove the parasol batch system. (#4678)

Co-authored-by: Adam Novak <[email protected]>

* Reenable Cactus on Kubernetes CI test (#4604)

* Reenable kubernetes tests that don't require a local cluster, eg CWL on
ARM and Cactus integration on kubernetes

* Disable CWL kubernetes

* enable cactus tests

* Add to scheduled integration tests

* Add forgotten file

* Remove print statements

* Remove unnecessary env var and move file

* Run test when updated

Co-authored-by: Adam Novak <[email protected]>

* update gitlab

* Fix typo in path

* Add virtualenv and prepare build to gitlab CI to run tests properly

* add gitlab setup scripts

* add gitlab setup scripts

---------

Co-authored-by: Adam Novak <[email protected]>

* Only count output file usage when using the file store (#4692)

* Bump mypy from 1.7.0 to 1.7.1 (#4697)

Bumps [mypy](https://github.com/python/mypy) from 1.7.0 to 1.7.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.7.0...v1.7.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* AWS jobStoreTest: re-use delete_s3_bucket from toil.lib.aws (#4700)

Ignore errors when cleaning up the FileJobStoreTest

* Make sure cwltool always knows we have an outdir to fix #4698 (#4699)

* remove usage of the deprecated pkg_resources (#4701)

setup.py: make clear that Python 3.7 is no longer supported

Co-authored-by: Lon Blauvelt <[email protected]>

* more resiliency (#4395)

* Support CWL 1.2.1 (#4682)

* cwl: use the latest commit from the proposed CWL v1.2.1 branch
* Double default CWL conformance test timeout
* Support abs path for directory outputs
* Better comment for why local paths are permitted
* add relax-path-checks to CI tests

---------

Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Remove the WDL compiler. (#4679)

* Remove the WDL compiler.

* Linting.

* Update WDL stand-alone.

* Weird linting error?

* Cut compiler docs

* Stop trying to run removed WDL compiler tests

---------

Co-authored-by: Adam Novak <[email protected]>

* Allow working with remote files in CWL and WDL workflows (#4690)

* Start implementing real ToilFsAccess URL operations

* Implement URL opening for CWL

* Implement other ToilFsAccess operations without local copies

* Remove getSize spelling and pass mypy

* Add missing import

* Remove check for extremely old setuptools

* Add --reference-inputs option to toil-cwl-runner

* Allow files to be gotten by URI on the nodes

* Add some tests to exercise URL references

* Implement URI access and import logic in WDL interpreter

* Remove duplicated test

* Fix some merge problems

* Satisfy MyPy

* Spell default correctly

* Actually hook up import bypass flag

* Actually pass self test when using URLs

* Make file job store volunteer for non-schemed URIs

* Revert "Make file job store volunteer for non-schemed URIs"

This reverts commit 3d1e8f6761bd29f5bfedfd055f025943ab6ed1b8.

* Handle size requests for bare filenames

* Handle polling for URL existence

* Add a make test_debug target for getting test logs

* Add more logging to CWL streaming tests

* Contemplate multi-threaded access to the CachingFileStore from user code

* Allow downloading URLs in structures, and poll AWS directory existence right

* Update tests to a Debian with ARM Docker images

* Undo permission changes

* Add missing import

---------

Co-authored-by: Michael R. Crusoe <[email protected]>

* upgrade to cwltool 3.1.20231207110929 (#4707)

Co-authored-by: Michael R. Crusoe <[email protected]>

* Update docker requirement from <7,>=3.7.2 to >=3.7.2,<8 (#4713)

Updates the requirements on [docker](https://github.com/docker/docker-py) to permit the latest version.
- [Release notes](https://github.com/docker/docker-py/releases)
- [Commits](https://github.com/docker/docker-py/compare/3.7.2...7.0.0)

---
updated-dependencies:
- dependency-name: docker
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Implement a better config file system for CWL/WDL options (#4666)

* Strip leading whitespace from WDL commands (#4720)

* Strip leading whitespace from WDL commands

* Work around MiniWDL's wrong type

* Add __init__.py to options folder (#4723)

* Make cwl mutually exclusive groups exist only when cwl is not suppressed (#4725)

* Point CI at the new public URLs for stuff we host

* Bump mypy from 1.7.1 to 1.8.0 (#4731)

Bumps [mypy](https://github.com/python/mypy) from 1.7.1 to 1.8.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.7.1...v1.8.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Tolerate a failed AMI polling attempt (#4727)

* Tolerate a failed AMI polling attempt

* Start marking Internet-related tests to keep them out of the offline step

* Update flake8 requirement from <7,>=3.8.4 to >=3.8.4,<8 (#4738)

Updates the requirements on [flake8](https://github.com/pycqa/flake8) to permit the latest version.
- [Commits](https://github.com/pycqa/flake8/compare/3.8.4...7.0.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix --printJobInfo (#4709)

* Add a test for --printJobInfo

* Move file name listing into the FileJobStore so it can sort of work again

* Fix Toil subcommand usage to include the subcommand

* Satisfy MyPy

* Fix =True syntax and find files even when their jobs are gone or they are no-job

* Add a test for actually rerunning a job

* Make the test for running a job alone pass

* Address review comments

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* remove extraneous dependency on old 'mock' (#4739)

'mock' has been integrated in the standard library as 'unittest.mock'

* Improve WDL documentation (#4732)

* Fix code block boundary

* Make the CWL quickstart the main one

* Talk about Python workflows instead of user scripts

* Chase away all the Sphinx warnings so we know the docs should look right

* Fail the docs build if the docstrings don't parse cleanly

* Encourage installing with cwl and wdl extras

* Qualify Python development

* Reorganize docs to plug the workflow languages more

* Talk a bit about WDL

* Add conformance test and install info

* Stop trying to draw inheritance diagrams since RtD doesn't give us a dot anyway

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Fix scheduled CI tests (#4742)

* Actually filter to Mesos tests

Also run Mesos tests if we touch Mesos.

It looks like https://github.com/DataBiosphere/toil/pull/4646 added a
bunch of Mesos test run steps but didn't include tests= so they just run
all tests, even if the dependencies aren't there.

* Don't import boto when it may not be installed

* Stop pinning very old setuptools and pyyaml

This basically reverts 60096d89eb7233b2791000da87a9754399fcb9c4 and
should let us use a setuptools that is new enough for the Python
versions we are using.

* Run all tests on -fix-ci branches

* Put Mesos AWS tests in the Mesos step

* Indent docstring to fix doc build failure

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Update EC2 instances and EC2 update script. (#4745)

* Update EC2 instances and EC2 update script.

* Minor details.

* Clean up.

* Linting.

* Ignore a perfectly good import.

---------

Co-authored-by: Adam Novak <[email protected]>

* Log more usefully for CWL workflows (#4736)

* Log files going in and out and the various CWL workflow phases

* Log CWL job executions to the leader just as text; replace logToMaster

* Log runtime context name

* Revise other logging messages to improve CWL logs

* Fix test to allow trailing newline

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Don't mark inputs (or outputs) executable for no reason (#4728)

* Be explicit about executable representation

* Add testing to make sure outputs aren't unexpectedly executable

* Let js expressions in the scatters take a long time to start Node

---------

Co-authored-by: Lon Blauvelt <[email protected]>

* Bump cwltool from 3.1.20231207110929 to 3.1.20240112164112 (#4751)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20231207110929 to 3.1.20240112164112.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20231207110929...3.1.20240112164112)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update flake8-bugbear requirement from <24,>=20.11.1 to >=20.11.1,<25 (#4752)

Updates the requirements on [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) to permit the latest version.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/20.11.1...24.1.15)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add pure Python fallback for getDirSizeRecursively() (#4753)

* add pure Python fallback for getDirSizeRecursively()

* Fix spelling

---------

Co-authored-by: Adam Novak <[email protected]>

* Update version_template.py for release

* Store chaining information just once (#4737)

* Keep around old names of chained jobs

* Get rid of chainedJobs

* Just pull log names from jobDesc

* Use an accessor to just get the whole chain together

* Improve comment and formatting

* Fix wrong name in import

* Stop marking HTTP registry as insecure (#4757)

This should fix #4756 and hopefully the intermittent test failures where buildkit tries to speak HTTPS to our Docker cache.

* CWL: don't clear out user-provided values for the --default-container (#4730)

* CWL: don't clear out user-provided values for the --default-container

Fixes https://stackoverflow.com/questions/77684785/toil-cwl-runner-not-using-default-container-option-with-singularity-option

* mypy --strict for the CWL tests

* soften cap on ruamel.yaml dependency

* remove ruamel.yaml.string dependency for a simpler solution (#4760)

* Try to mitigate filling up the coordination directory (#4749)

* Complain more usefully about a bad coordination directory

* Don't pick tiny filesystems for coordination, and organize everything in toilwf- directories

* Put cleanup arena so it shares a prefix with but isn't in the directory it protects

* Fix variable name

* Don't catch any old thing, which doesn't work anymore anyway

* Allow toil-wdl-runner to run on Kubernetes and Mesos (#4754)

* Change docker security rules, remove --containall on singularity, add tzdata as dependency

* remove link for tzdata and add integration test

* Add test to gitlab and remove provisioner option

---------

Co-authored-by: Adam Novak <[email protected]>

* Ship User Logs to Leader (#4755)

* Document the stats and logging design as it stands

* Plug WDL task stdout and stderr into the --writeLogs system as new user streams

* Log CWL and WDL output and error logs that aren't captured by the workflow itself

* Name CWL and WDL log files usefully

This goes back to using displayName for stats and logging.

It also adds a WDL "task path" which is like the namespace but includes
numbers for scatters, and uses that to name the log files.

* Log more to illustrate https://github.com/moby/buildkit/issues/4458

* Document the user log system architecture

* Satisfy mypy

* Go back to using displayName for stats again

* Clarify CWL output handling

* Revise test to allow new '_'

* Update pytest requirement from <8,>=6.2.1 to >=6.2.1,<9 (#4772)

Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/6.2.1...8.0.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add workflow to automatically update PRs when other PRs merge (#4774)

* Stop complaining about XDG_RUNTIME_DIR (#4769)

* Update setuptools requirement from <69,>=65.5.1 to >=65.5.1,<70 (#4693)

Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v65.5.1...v69.0.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Stop failing auto-update workflow on every merge conflict

* read the docs: enable generating graphs like inheritance trees. (#4734)

* read the docs: enable generating graphs like inheritance trees.

* Add Graphviz to CI image

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Docs: Always show Python execution using `python3` (#4764)

In case a virtualenv is not used

Co-authored-by: Andreas Tille <[email protected]>
Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Make formatting do all the code (#4777)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* avoid unnecessary boto{,3} imports (#4763)

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* remove use of distutils by copying in strtobool() (#4765)

* remove use of distutils by copying in strtobool()

Copied code is MIT licensed

https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/distutils/util.py#L340
https://github.com/pypa/distutils/blob/fb5c5704962cd3f40c69955437da9a88f4b28567/LICENSE

* Add type hints and replace distutils code with our own

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>
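
For readers unfamiliar with the helper being copied in, here is a minimal sketch of a `strtobool()` replacement. It accepts the same truth strings as the old `distutils.util.strtobool`. Returning a `bool` rather than `1`/`0`, and the exact error message, are assumptions of this sketch and may differ from the code that was actually merged.

```python
def strtobool(val: str) -> bool:
    """Convert a string representation of truth to a bool.

    Accepts the same values as the old distutils.util.strtobool,
    case-insensitively. Raises ValueError for anything else.
    """
    normalized = val.lower()
    if normalized in ("y", "yes", "t", "true", "on", "1"):
        return True
    if normalized in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"invalid truth value {val!r}")
```

A typical use is parsing boolean-like environment variables, for example `strtobool(os.environ.get("SOME_FLAG", "false"))`, where `SOME_FLAG` is a hypothetical variable name.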

* Revert --disableProgress to old flag-style behavior (#4778)

* Change default Singularity cache paths to be global (#4762)

* Change default cache paths to piggyback off of singularity and miniwdl defaults + set cache paths on cloud to /var/lib/toil

* Improve documentation

* Revert block quote and bold instead

* Change singularity cache directory to the right default directory

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* CPU count fallback (#4780)

* Fall back to 1 core when # CPUs unavailable

* Apply all limits and then fall back to 1

---------

Co-authored-by: Theodore Ni <[email protected]>
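
The "apply all limits and then fall back to 1" idea can be sketched roughly as below. The function name and the `limits` parameter are hypothetical and only illustrate the ordering: collect every known limit first, and use 1 only when nothing usable is reported.

```python
import os
from typing import Iterable, Optional


def cpu_count_with_fallback(limits: Iterable[Optional[int]] = ()) -> int:
    """Return the smallest known CPU limit, or 1 if none is available.

    `limits` stands in for extra caps (for example a container quota or
    an affinity mask size); None or non-positive entries mean "unknown".
    """
    # os.cpu_count() itself may return None on some platforms.
    candidates = [os.cpu_count(), *limits]
    known = [c for c in candidates if c is not None and c > 0]
    # Only fall back to a single core after every limit has been considered.
    return min(known) if known else 1
```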

* Fix special characters in filenames with the FileJobStore (#4781)

* Remove extraneous unquote

* Log task standard error to the worker log if it fails and MiniWDL hasn't already logged it

* Hack around having to dedent the command at the wrong time by keying on the first line

* Remove extra logging and cross-checks

* Add back missing line end

* Work around boto stubs regression in https://github.com/python/typeshed/issues/11381

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update sphinx-autodoc-typehints requirement (#4784)

Updates the requirements on [sphinx-autodoc-typehints](https://github.com/tox-dev/sphinx-autodoc-typehints) to permit the latest version.
- [Release notes](https://github.com/tox-dev/sphinx-autodoc-typehints/releases)
- [Changelog](https://github.com/tox-dev/sphinx-autodoc-typehints/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tox-dev/sphinx-autodoc-typehints/compare/1.24.0...2.0.0)

---
updated-dependencies:
- dependency-name: sphinx-autodoc-typehints
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Use a default log limit of 100MiB (#4788)

* Use a default log limit of 100MiB

* Update documented default

* Require a new enough Docker to fix #4794 (#4795)

* Log CWL command output inline on failure, and to logging system whether it succeeds or not (#4793)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Unify devirtualization to fix output name collisions (#4792)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow setting WDL container engine with --container (#4787)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Request and handle Slurm timeout signal (#4804)

* Add a Slurm termination signal for timeouts

* Use SIGINT for Slurm timeouts instead of SIGTERM

* Make the interrupt signal actually get to the worker process

* Run worker orderly cleanup even if asked to stop

* Preserve exit code from user code

* Enforce failure when Slurm jobs time out (#4802)

* Don't let 0 exit codes out of the Slurm batch system if the job isn't completed.

* Add missing import

* Teach Slurm and part of LSF to use the Toil exit reason system

* Report unavailable exit status better

* Make sure exit reasons come out as readable strings when logged on Python 3.11+

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix caching being accidentally set to True instead of None (#4805)

* Better stats for WDL workflows (#4770)

* Split up WDL input evaluation and command execution

* Rename task parts to inputs and command

* Deduplicate across scatters for stats

* Report CPU wait accurately with multiple cores, and improve titles

Fixes #4768

* Fix memory units in stats and on Mac

* Move job disk usage tracking and warning to AbstractFileStore

* Save disk to stats

* Fix imports and variable name

* Remove duplicated stat printing code

* Unify stat computation

* Use the category metadata globals to drive everything and sync the width and print code

* Stop coming up with negative wait when jobs don't report cores

* Allow setting WDL container engine with --container

* Use a default log limit of 100MiB

* Update documented default

* Require a new enough Docker to fix #4794

* Add a unit notion to stats

* Be consistent about printing units in toil stats

* Rename functions to snake_case

* Improve error reporting and split cluster and normal utils

* Start documenting the parts of the stats

* Swap over to a stats example that is more illustrative

* Fix counting the jobs per worker

* Explain all the job columns and the sorting

* Fix typing of jobs list

* Fix documentation build

* Fix white-box stats test

* Move the cluster utils out of the cloud providers ToC section

* Update worker.py

---------

Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update EC2 instance list. (#4808)

* Bump version.

* Update README.rst

Couple of small doc changes.

* Respect job local-ness when chaining (#4809)

* Add test to make sure local jobs don't chain to nonlocal ones

* Implement chaining block for local to nonlocal

* Scale down stats tutorial test to fit on small CI runners

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix Python 3.8 support (#4823)

* Add all the supported Python versions as scheduled tests

* Don't let the Docker build succeed when Toil can't run at all

* Use 3.8-compatible type hints

* Fix missing description on PyPI (#4820)

* setuptools: Include README in the package metadata.

Currently https://pypi.org/project/toil/#description is
> The author of this package has not provided a project description

* Makefile: use isolated builds, add dist target (sdist+wheel) and deprecate the sdist target.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Install build (#4826)

* Use a sentinel location instead of an unmodified location to mark missing files (#4818)

* Use a sentinel location instead of an unmodified location to mark missing files

* Fix spelling

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump mypy from 1.8.0 to 1.9.0 (#4830)

Bumps [mypy](https://github.com/python/mypy) from 1.8.0 to 1.9.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.8.0...1.9.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Make sure output directory exists before using it (#4832)

* Pass through statusCode to prevent infinite loop (#4829)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add tests for environment pickling (#4837)

* Add a test for the environment coming from environment.pickle over top of anything on the leader

* Make sure test works with slow job stores like AWS

* Bump sphinxcontrib-autoprogram from 0.1.8 to 0.1.9 (#4838)

Bumps [sphinxcontrib-autoprogram](https://github.com/sphinx-contrib/autoprogram) from 0.1.8 to 0.1.9.
- [Release notes](https://github.com/sphinx-contrib/autoprogram/releases)
- [Changelog](https://github.com/sphinx-contrib/autoprogram/blob/master/doc/changelog.rst)
- [Commits](https://github.com/sphinx-contrib/autoprogram/compare/0.1.8...0.1.9)

---
updated-dependencies:
- dependency-name: sphinxcontrib-autoprogram
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add colored logging (#4828)

* Add coloredlogs

* type ignore

* Fix test to get around how coloredlogs deals with handlers

* Fix option, functionname, license, formatting, and colors

* Remove excess datetime

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove unused CI test (#4843)

* Measure CPU and memory usage in WDL Docker containers (#4819)

* Inject code into the container like MiniWDL to get Docker CPU and memory usage

* Remove not a real ref

* Keep resource monitoring state in a class

* Fix lingering old import

* Get import name right

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Allow debugging jobs by name (and status improvements) (#4840)

* Report tag parsing errors better in case you mix up type and tag

* Fix toil status per-job status report to be per-job

* Shorten toil status option names

* Report completely failed jobs

* Rearrange per-job stats to make it easier to find runnable and failed jobs

* Add printing failed jobs specifically

* Stop making a config just to get status

* Implement search for job by name in debug-job by cribbing from status

* Document the toil status flags a bit

* Write up some debug-job examples

* Explain names more and drop distracting log line

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Improve exception handling to not output tracebacks (#4839)

* Improve exception handling, don't output tracebacks when possible

* Remove excess code in test

* Fix test to use subprocess to accommodate for changed exception handling

* Reword check_initialized()

Co-authored-by: Adam Novak <[email protected]>

* Move comments and make LocatorException take a prefix instead

* Change config to options as it no longer exists

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Update pytest-cov requirement from <5,>=2.12.1 to >=2.12.1,<6 (#4851)

Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v2.12.1...v5.0.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update docutils requirement from <0.21,>=0.16 to >=0.16,<0.22 (#4866)

Updates the requirements on [docutils](https://docutils.sourceforge.io) to permit the latest version.

---
updated-dependencies:
- dependency-name: docutils
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update galaxy-util requirement from <23 to <25 (#4862)

Updates the requirements on [galaxy-util](https://github.com/galaxyproject/galaxy) to permit the latest version.
- [Release notes](https://github.com/galaxyproject/galaxy/releases)
- [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-util-19.9.0...v24.0)

---
updated-dependencies:
- dependency-name: galaxy-util
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update galaxy-tool-util requirement from <23 to <25 (#4861)

Updates the requirements on [galaxy-tool-util](https://github.com/galaxyproject/galaxy) to permit the latest version.
- [Release notes](https://github.com/galaxyproject/galaxy/releases)
- [Commits](https://github.com/galaxyproject/galaxy/compare/galaxy-tool-util-19.9.0...v24.0)

---
updated-dependencies:
- dependency-name: galaxy-tool-util
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>

* Bump cwltool from 3.1.20240112164112 to 3.1.20240404144621 (#4870)

Bumps [cwltool](https://github.com/common-workflow-language/cwltool) from 3.1.20240112164112 to 3.1.20240404144621.
- [Release notes](https://github.com/common-workflow-language/cwltool/releases)
- [Commits](https://github.com/common-workflow-language/cwltool/compare/3.1.20240112164112...3.1.20240404144621)

---
updated-dependencies:
- dependency-name: cwltool
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump gunicorn from 21.2.0 to 22.0.0 (#4871)

Bumps [gunicorn](https://github.com/benoitc/gunicorn) from 21.2.0 to 22.0.0.
- [Release notes](https://github.com/benoitc/gunicorn/releases)
- [Commits](https://github.com/benoitc/gunicorn/compare/21.2.0...22.0.0)

---
updated-dependencies:
- dependency-name: gunicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Retry Slurm interactions more (#4869)

* Hook up grid engine batch systems to the normal retry system and add --statePollingTimeout

* Remove extra word

* Insist on understanding the Slurm states and stop if we don't

* Change how we think of REVOKED and SPECIAL_EXIT

* Add missing argument

* Import missing exception type

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Replace use of boto with boto3 for `awsProvisioner.py` (#4859)

* Take out boto2 from awsProvisioner.py

* Add mypy stub file for s3

* Lazy import aws to avoid dependency if extra is not installed yet

* Also lazy import in tests

* Separate out wdl kubernetes test to avoid missing dependency

* Add unittest main

* Fix wdl CI to run separated tests

* Fix typo in lookup

* Update moto and remove leftover line in node.py

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Apply fixes

* Abstract AWS ErrorCondition server errors into a constant instance

* Move AWSServiceErrors declaration to a better place

* Prevent aliasing from confusing sphinx and remove cached autoapi in clean

* Update src/toil/lib/aws/__init__.py

Co-authored-by: Adam Novak <[email protected]>

* Change retry loop

* Replace assert with raise

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow fetching job inputs for debugging (#4848)

* Reformat worker

* Actually change kwarg name

* Enable stopping WDL (and probably CWL) jobs after files are downloaded

* Make sure WDL commands get logged before we stop

* Add type hints

* Add debug flag accessor

* Make debug-job default to debug logging

* Build fake container environments for CWL and WDL jobs when debugging them

* Add an example of dumping job files to the docs

* Add tests for the file retrieval and container faking

* Add missing imports

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Make leader wait for expected updates to be visible in the job store, or fail the job (#4811)

* Implement expecting version bumps and fail src/toil/test/batchSystems/batchSystemTest.py::MaxCoresSingleMachineBatchSystemTest::testServices

* Actually turn on debug logging for service test

* Refer to jobs for space usage accounting by stringified job description and not body file

* Use exponential backoff when polling for job updates

* Fix comparison direction

* Plug the new CLI option

* Include version writers in warnings

* Make return type annotation correct

* Don't wait for new versions of failed jobs because then we're too slow to pass the badWorker tests

* Scale down stats tutorial test to fit on small CI runners

* Work out that command overrides aren't being removed

* Stop having an overloaded command field on JobDescriptions

* Fix typos and update architecture to lean less on command

* Fix calling the checkpoint restore

* Handle None vs. empty successors in tests

* Handle places that didn't expect nextSuccessors() to ever be None

* Remove extra the

* Fix handling jobs that had no bodies, and consolidate warning logic

* Always actually do a reset even if no new version is ready.

* Use has_body accessor more

* Rename loadJob variables

* Rename _body_spec and use more has_body()

* Use a NamedTuple instead of a command-style string to point to the body

* Improve JobDescription docstring and fix typoed argument name

* Remove worker command from JobDescription

* Eliminate references to get_worker_command/set_worker_command

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
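
As an illustration of the "exponential backoff when polling for job updates" step above, here is a generic sketch. It is not the leader's actual code: `wait_for_update`, its parameters, and the default delays are all hypothetical.

```python
import time
from typing import Callable


def wait_for_update(
    check: Callable[[], bool],
    initial_delay: float = 0.1,
    max_delay: float = 5.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the timeout expires.

    The wait between polls doubles each time, capped at max_delay.
    Returns True if the update became visible, False on timeout.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

When this returns False, the caller can treat the expected update as missing and fail the job instead of waiting forever.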

* Enable FUSE for privileged Toil clusters (#4824)

* Add option for privileged clusters and enable privileges for toil-managed clusters

* Fix syntax error and add back namespace rules

* packages might be broken

* Dependencies

* Move apt clean

* Create test image

* Create test image 2

* Try just creating the base docker image

* test image creation, typo

* Try focal debian package

* Try the last docker build command

* remove nontoil makefile dependencies to test

* Successfully build docker images at least for amd64

* Remove unprivileged fuse mount code

* Bring back rest of docker builds

* Remove unnecessary env var in dockerfile

* Fix setuptools and virtualenv to some version and revert whitespace

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Move SINGULARITY_CACHEDIR comment

* Formatting and move strtobool

* Reflect moved functions for imports

* Remove debug_mute flag and print debugging statement outside instead

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Detect if the GridEngine worker thread has crashed to prevent hanging the workflow (#4873)

* Debug envvar

* add error to message

* Add logic for unexpected background thread failure

* Set block back to true

* Don't duplicate thread exception message and print at end

* Revert "Debug envvar"

This reverts commit 13392858db352da75c8ddfe3b4d13b5d88eccf14.

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>
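
One common way to notice a crashed background thread, instead of blocking on it forever, is to record its exception and have the main loop check for it. The sketch below shows that general pattern only; the class and method names are hypothetical and are not taken from the GridEngine batch system code.

```python
import threading
from typing import Callable, Optional


class WatchedWorker(threading.Thread):
    """A thread that remembers the exception that killed it, if any."""

    def __init__(self, work: Callable[[], None]) -> None:
        super().__init__(daemon=True)
        self._work = work
        self.exception: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self._work()
        except BaseException as e:
            # Save the failure so the main thread can see it.
            self.exception = e

    def check(self) -> None:
        """Raise in the caller's thread if the worker has crashed."""
        if self.exception is not None:
            raise RuntimeError("background thread failed") from self.exception
```

The main thread would call `check()` each time it polls for results, so a dead worker surfaces as an error rather than a hang.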

* Bump mypy from 1.9.0 to 1.10.0 (#4878)

Bumps [mypy](https://github.com/python/mypy) from 1.9.0 to 1.10.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* remove SLURM caching override to support caching (#4884)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Add more debug logging for when the job is attempted and the worker is started (#4881)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update WDL conformance tests on CI (#4876)

* Update wdltoil_test.py

* Fix typo

* Fix version for integration tests

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Replace all usage of boto2 with boto3 (#4868)

* Take out boto2 from awsProvisioner.py

* Add mypy stub file for s3

* Lazy import aws to avoid dependency if extra is not installed yet

* Also lazy import in tests

* Separate out wdl kubernetes test to avoid missing dependency

* Add unittest main

* Fix wdl CI to run separated tests

* Fix typo in lookup

* Update moto and remove leftover line in node.py

* Remove all instances of boto

* Fix issues with boto return types and grab attributes before deleting

* Remove some unnecessary abstraction

* Fix improper types in ec2.py

* Ensure UUID is a string for boto3

* No more boto

* Remove comments

* Move attribute initialization

* Properly delete all attributes of the item

* Move out pager and use pager for select to get around output limits

* Turn getter into method

* Remove comment in setup.py

* Remove commented dead import

* Remove stray boto import

* Apply suggestions from code review

Co-authored-by: Adam Novak <[email protected]>

* Rename, rearrange some code

* Revert not passing Value's to attributes when deleting attributes in SDB

* Fix missed changed var names

* Change ordering of jobstorexists exception to fix improper output on exception

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Revert ensurepip to get-pip (#4900)

* docs cleanup (#4889)

* fix incorrect file extensions.

* fix typos

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Bump to a new major version (#4885)

Since #4811 made the batch systems take the command as an argument, we now have to bump the major version to signal incompatibility with any old batch system plugins.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Warn user. (#4893)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow symlinks to inputs as WDL outputs (#4883)

* Detect missing files at the offending step and announce the problem conspicuously

* Log the offending expression

* Resolve symlinks against container mounts during file virtualization

* Try and forward along original virtualized filenames

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* bye pytz (#4890)

* pytz is not needed in Python 3.9+, or with the zoneinfo backport

* make diff_mypy: quieter and target the correct branch

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: DailyDreaming <[email protected]>

* Stop suggesting infinity when validating half-open intervals (#4887)

This should fix #4886 by not suggesting to the user that "infinity" is an option value that can be used.

It also explains the option intervals in words instead of interval notation, which people might not be expecting.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
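
A rough sketch of describing a half-open interval in words, without ever offering "infinity" as a value, might look like this. The function name and the wording of the messages are hypothetical and do not reproduce Toil's actual option validation.

```python
import math


def describe_range(low: float, high: float) -> str:
    """Describe the half-open interval [low, high) in plain words."""
    if math.isinf(high):
        # No upper bound: do not mention infinity to the user at all.
        return f"at least {low}"
    return f"at least {low} and less than {high}"
```

An error message can then read "value must be at least 1" instead of showing interval notation ending in infinity.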

* Fix WDL option spelling and tolerate Cromwell-isms (#4906)

* Fix WDL option spelling and tolerate Cromwell-isms

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: DailyDreaming <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove wrapped CWL doc example. (#4892)

* Remove wrapped CWL doc example.

* Patch missing links.

* Remove AWS-dependent import/test from cwlTest.py.

* Missing @slow.

* Missing import.

* Make SimpleDB retry on EndpointConnectionError

* Linting.

* Satisfy MyPy more (new MyPy?)

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adam Novak <[email protected]>

* Add retries to DockerCheckTest.testBadGoogleRepo (#4909)

* Add retries to flaky test

* get rid of extra import

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix 3.8 backport.timezone import (#4908)

* Fix 3.8 import and remove dead comment in requirements.txt

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lon Blauvelt <[email protected]>

* Update to Python 3.12 (#4901)

* Add Python 3.12 to CI

* Update sphinx-autoapi and astroid to deal with crash

https://github.com/pylint-dev/pylint/issues/8782

* Remove dead comment

* Add rules to 3.11 build

* update htcondor

* Update use of HTcondor in appliance build

* Ensure tests are instanced and don't jumble relative paths + debug logging

* oops, update utilsTest too

* is this a pytest issue?

* Add some more log messages

* Fix time.sleep

* Remove the debug statement in docker

* Bump flask-cors from 4.0.0 to 4.0.1 (#4916)

Bumps [flask-cors](https://github.com/corydolphin/flask-cors) from 4.0.0 to 4.0.1.
- [Release notes](https://github.com/corydolphin/flask-cors/releases)
- [Changelog](https://github.com/corydolphin/flask-cors/blob/main/CHANGELOG.md)
- [Commits](https://github.com/corydolphin/flask-cors/compare/4.0.0...4.0.1)

---
updated-dependencies:
- dependency-name: flask-cors
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Try /tmp before the workdir (#4914)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* biocontainer tests: use version corresponding to v2 Docker Image Format (#4912)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Revert "Update to Python 3.12 (#4901)" (#4917)

This reverts commit 460846d7ded3820acc505cccb9c866ea9a7a940a.

* Bump miniwdl from 1.11.1 to 1.12.0 (#4920)

Bumps [miniwdl](https://github.com/chanzuckerberg/miniwdl) from 1.11.1 to 1.12.0.
- [Release notes](https://github.com/chanzuckerberg/miniwdl/releases)
- [Commits](https://github.com/chanzuckerberg/miniwdl/compare/v1.11.1...v1.12.0)

---
updated-dependencies:
- dependency-name: miniwdl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Support Python 3.12 (#4919)

* Add Python 3.12 to CI

* Update sphinx-autoapi and astroid to deal with crash

https://github.com/pylint-dev/pylint/issues/8782

* Remove dead comment

* Add rules to 3.11 build

* update htcondor

* Update use of HTcondor in appliance build

* Ensure tests are instanced and don't jumble relative paths + debug logging

* oops, update utilsTest too

* is this a pytest issue?

* Add some more log messages

* Fix time.sleep

* Remove the debug statement in docker

* remove logger print statements in utilsTest.py and pin pytest

* Up the timeout on some tests (possibly a timing issue)

* Up the timeout on more tests

* Up the pytest version again

* Add documentation for installing batch system plugins (#4926)

Co-authored-by: Adam Novak <[email protected]>

* Update Werkzeug to appease the Github security police (#4925)

It looks like if you give away your debugger PIN, people can use your Werkzeug debugger. This is somehow a security issue and was apparently never fixed on Werkzeug 2.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove unused comment

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: William Gao <[email protected]>
Co-authored-by: Adam Novak <[email protected]>
Co-authored-by: stxue1 <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Brandon Walker <[email protected]>
Co-authored-by: Glenn Hickey <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: Lon Blauvelt <[email protected]>
Co-authored-by: Alexandre Detiste <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andreas Tille <[email protected]>
Co-authored-by: Theodore Ni <[email protected]>
Co-authored-by: Benedict Paten <[email protected]>