Merge branch 'master' into issues/4538-config-file
adamnovak authored Oct 3, 2023
2 parents 95aeb67 + 9b09d62 commit 49cc0ad
Showing 12 changed files with 414 additions and 306 deletions.
7 changes: 4 additions & 3 deletions contrib/admin/cleanup_aws_resources.py
@@ -45,7 +45,7 @@ def contains_uuid(string):
Determines if a string contains a pattern like: '28064c76-a491-43e7-9b50-da424f920354',
which toil uses in its test generated bucket names.
"""
return bool(re.compile('[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}').findall(string))
return bool(re.compile('[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{8,12}').findall(string))


def contains_uuid_with_underscores(string):
@@ -61,15 +61,16 @@ def contains_num_only_uuid(string):
Determines if a string contains a pattern like: '13614-31311-31347',
which toil uses in its test generated sdb domain names.
"""
return bool(re.compile('[0-9]{5}-[0-9]{5}-[0-9]{5}').findall(string))
return bool(re.compile('[0-9]{4,5}-[0-9]{4,5}-[0-9]{4,5}').findall(string))


def contains_toil_test_patterns(string):
return contains_uuid(string) or contains_num_only_uuid(string) or contains_uuid_with_underscores(string)


def matches(resource_name):
if resource_name.endswith('--files') or resource_name.endswith('--jobs') or resource_name.endswith('_toil'):
if (resource_name.endswith('--files') or resource_name.endswith('--jobs') or resource_name.endswith('_toil')
or resource_name.endswith('--internal') or resource_name.startswith('toil-s3test-')):
if contains_toil_test_patterns(resource_name):
return resource_name
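
As a rough illustration of what the relaxed patterns accept, the sketch below re-implements the new side of this hunk in self-contained form. It is not the script itself: ``contains_uuid_with_underscores`` is left out because its body is not shown in the diff, ``search`` stands in for ``findall``, and the resource names in the usage note are made up for the example.

```python
import re

# Patterns copied from the new side of the hunk above.
UUID_RE = re.compile(
    '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{8,12}')
NUM_ONLY_RE = re.compile('[0-9]{4,5}-[0-9]{4,5}-[0-9]{4,5}')

# Affixes checked by the updated matches() in the diff.
SUFFIXES = ('--files', '--jobs', '_toil', '--internal')
PREFIX = 'toil-s3test-'


def contains_uuid(string):
    """True if the string contains a UUID whose last group has 8 to 12 hex digits."""
    return bool(UUID_RE.search(string))


def contains_num_only_uuid(string):
    """True if the string contains three dash-separated groups of 4 or 5 digits."""
    return bool(NUM_ONLY_RE.search(string))


def matches(resource_name):
    """Return the name if it looks like a Toil test resource, else None."""
    has_known_affix = (resource_name.endswith(SUFFIXES)
                       or resource_name.startswith(PREFIX))
    if has_known_affix and (contains_uuid(resource_name)
                            or contains_num_only_uuid(resource_name)):
        return resource_name
    return None
```

With these definitions, a name such as ``28064c76-a491-43e7-9b50-da424f92--files`` (final UUID group cut to 8 digits) is now matched, while a name with a known suffix but no UUID-like pattern, such as ``my-production-bucket--files``, is still skipped.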

45 changes: 23 additions & 22 deletions docs/appendices/deploy.rst
@@ -31,27 +31,27 @@ From here, you can install a project and its dependencies::
$ tree
.
├── util
   ├── __init__.py
   └── sort
   ├── __init__.py
   └── quick.py
├── __init__.py
└── sort
├── __init__.py
└── quick.py
└── workflow
├── __init__.py
└── main.py

3 directories, 5 files
$ pip install matplotlib
$ cp -R workflow util venv/lib/python2.7/site-packages
$ cp -R workflow util venv/lib/python3.9/site-packages

Ideally, your project would have a ``setup.py`` file (see `setuptools`_) which streamlines the installation process::

$ tree
.
├── util
   ├── __init__.py
   └── sort
   ├── __init__.py
   └── quick.py
├── __init__.py
└── sort
├── __init__.py
└── quick.py
├── workflow
│ ├── __init__.py
│ └── main.py
@@ -70,7 +70,7 @@ both Python and Toil are assumed to be present on the leader and all worker node

We can now run our workflow::

$ python main.py --batchSystem=mesos
$ python main.py --batchSystem=kubernetes

.. important::

@@ -101,13 +101,13 @@ This scenario applies if the user script imports modules that are its siblings::
$ cd my_project
$ ls
userScript.py utilities.py
$ ./userScript.py --batchSystem=mesos
$ ./userScript.py --batchSystem=kubernetes

Here ``userScript.py`` imports additional functionality from ``utilities.py``.
Toil detects that ``userScript.py`` has sibling modules and copies them to the
workers, alongside the user script. Note that sibling modules will be
auto-deployed regardless of whether they are actually imported by the user
scriptall .py files residing in the same directory as the user script will
script: all .py files residing in the same directory as the user script will
automatically be auto-deployed.

Sibling modules are a suitable method of organizing the source code of
@@ -134,16 +134,16 @@ The following shell session illustrates this::
$ tree
.
├── utils
   ├── __init__.py
   └── sort
   ├── __init__.py
   └── quick.py
├── __init__.py
└── sort
├── __init__.py
└── quick.py
└── workflow
├── __init__.py
└── main.py

3 directories, 5 files
$ python -m workflow.main --batchSystem=mesos
$ python -m workflow.main --batchSystem=kubernetes

.. _package: https://docs.python.org/2/tutorial/modules.html#packages

@@ -168,7 +168,7 @@ could do this::
$ cd my_project
$ export PYTHONPATH="$PWD"
$ cd /some/other/dir
$ python -m workflow.main --batchSystem=mesos
$ python -m workflow.main --batchSystem=kubernetes

Also note that the root directory itself must not be a package, i.e. must not
contain an ``__init__.py``.
@@ -193,7 +193,8 @@ replicates ``PYTHONPATH`` from the leader to every worker.
Toil Appliance
--------------

The term Toil Appliance refers to the Mesos Docker image that Toil uses to simulate the machines in the virtual mesos
cluster. It's easily deployed, only needs Docker, and allows for workflows to be run in single-machine mode and for
clusters of VMs to be provisioned. To specify a different image, see the Toil :ref:`envars` section. For more
information on the Toil Appliance, see the :ref:`runningAWS` section.
The term Toil Appliance refers to the Ubuntu-based Docker image that Toil uses
for the machines in the cluster. It's easily deployed, only needs Docker, and
allows a consistent environment on all Toil clusters. To specify a different
image, see the Toil :ref:`envars` section. For more information on the Toil
Appliance, see the :ref:`runningAWS` section.
8 changes: 4 additions & 4 deletions docs/developingWorkflows/toilAPIBatchsystem.rst
@@ -6,12 +6,12 @@ Batch System API
================

The batch system interface is used by Toil to abstract over different ways of running
batches of jobs, for example Slurm, GridEngine, Mesos, Parasol and a single node. The
batches of jobs, for example on Slurm clusters, Kubernetes clusters, or a single node. The
:class:`toil.batchSystems.abstractBatchSystem.AbstractBatchSystem` API is implemented to
run jobs using a given job management system, e.g. Mesos.
run jobs using a given job management system.

Batch System Enivronmental Variables
------------------------------------
Batch System Environment Variables
----------------------------------

Environment variables allow passing of scheduler-specific parameters.

40 changes: 17 additions & 23 deletions docs/gettingStarted/quickStart.rst
@@ -32,14 +32,14 @@ Toil uses batch systems to manage the jobs it creates.

The ``singleMachine`` batch system is primarily used to prepare and debug workflows on a
local machine. Once validated, try running them on a full-fledged batch system (see :ref:`batchsysteminterface`).
Toil supports many different batch systems such as `Apache Mesos`_ and Grid Engine; its versatility makes it
Toil supports many different batch systems such as `Kubernetes`_ and Grid Engine; its versatility makes it
easy to run your workflow in all kinds of places.

Toil is totally customizable! Run ``python helloWorld.py --help`` to see a complete list of available options.

For something beyond a "Hello, world!" example, refer to :ref:`runningDetail`.

.. _Apache Mesos: https://mesos.apache.org/getting-started/
.. _Kubernetes: https://kubernetes.io/

.. _cwlquickstart:

@@ -279,7 +279,7 @@ workflow there is always one leader process, and potentially many worker process

When using the single-machine batch system (the default), the worker processes will be running
on the same machine as the leader process. With full-fledged batch systems like
Mesos the worker processes will typically be started on separate machines. The
Kubernetes the worker processes will typically be started on separate machines. The
boilerplate ensures that the pipeline is only started once---on the leader---but
not when its job functions are imported and executed on the individual workers.
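
The guard that this paragraph calls boilerplate is the ordinary Python ``if __name__ == "__main__":`` idiom. A minimal sketch follows, using a plain function as a stand-in for a Toil job function. No Toil API is used here, and the names ``hello_world`` and ``main`` are illustrative only.

```python
def hello_world(message):
    """Stand-in for a job function that workers import and execute."""
    return f"Hello, world! Here's a message: {message}"


def main():
    """Stand-in for the code that builds and starts the workflow."""
    print(hello_world("started from the entry point"))


# Runs only when this file is executed as a script (as on the leader).
# When the module is merely imported, as a worker would do to reach
# hello_world, __name__ is the module name and main() is not called.
if __name__ == "__main__":
    main()
```

Without the guard, every import of the module would call ``main()`` again, which is the situation the paragraph above warns about.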

@@ -394,8 +394,10 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro
#. Launch a cluster in AWS using the :ref:`launchCluster` command::

(venv) $ toil launch-cluster <cluster-name> \
--clusterType kubernetes \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-2a

The arguments ``keyPairName``, ``leaderNodeType``, and ``zone`` are required to launch a cluster.
@@ -448,8 +450,10 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro
#. First launch a node in AWS using the :ref:`launchCluster` command::

(venv) $ toil launch-cluster <cluster-name> \
--clusterType kubernetes \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-2a

#. Copy ``example.cwl`` and ``example-job.yaml`` from the :ref:`CWL example <cwlquickstart>` to the node using
@@ -462,24 +466,25 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro

(venv) $ toil ssh-cluster --zone us-west-2a <cluster-name>

#. Once on the leader node, it's a good idea to update and install the following::
#. Once on the leader node, command line tools such as ``kubectl`` will be available to you. It's also a good idea to
update and install the following::

sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y dist-upgrade
sudo apt-get -y install git
sudo pip install mesos.cli

#. Now create a new ``virtualenv`` with the ``--system-site-packages`` option and activate::

virtualenv --system-site-packages venv
source venv/bin/activate

#. Now run the CWL workflow::
#. Now run the CWL workflow with the Kubernetes batch system::

(venv) $ toil-cwl-runner \
--provisioner aws \
--jobStore aws:us-west-2a:any-name \
--batchSystem kubernetes \
--jobStore aws:us-west-2:any-name \
/tmp/example.cwl /tmp/example-job.yaml
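
The job store locator on the new side of this hunk names an AWS region (``us-west-2``), whereas ``--zone`` in the earlier steps takes an availability zone (``us-west-2a``). For standard availability zones the zone name is the region name plus a one-letter suffix, so the region can be derived as in the sketch below. The helper names are hypothetical and not part of Toil; the ``aws:<region>:<name>`` shape is taken from the command above.

```python
import re

# Standard AWS availability zones look like "<region><letter>", e.g. "us-west-2a".
# Local Zones and Wavelength Zones use longer names and are not handled here.
_ZONE_RE = re.compile(r'^([a-z]+(?:-[a-z]+)+-[0-9]+)[a-z]$')


def zone_to_region(zone):
    """Strip the one-letter suffix from a standard availability zone name."""
    match = _ZONE_RE.match(zone)
    if match is None:
        raise ValueError(f"not a standard availability zone name: {zone!r}")
    return match.group(1)


def aws_job_store_locator(zone, name):
    """Build a locator of the form aws:<region>:<name> from a zone and a name."""
    return f"aws:{zone_to_region(zone)}:{name}"


print(aws_job_store_locator("us-west-2a", "any-name"))
```

Passing a region such as ``us-west-2`` to ``zone_to_region`` raises ``ValueError`` rather than guessing, which keeps a mistyped zone from silently producing a wrong locator.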

.. tip::
@@ -528,12 +533,14 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro

#. Download :download:`pestis.tar.gz <../../src/toil/test/cactus/pestis.tar.gz>`

#. Launch a leader node using the :ref:`launchCluster` command::
#. Launch a cluster using the :ref:`launchCluster` command::

(venv) $ toil launch-cluster <cluster-name> \
--provisioner <aws, gce> \
--keyPairName <key-pair-name> \
--leaderNodeType <type> \
--nodeType <type> \
-w 1-2 \
--zone <zone>


@@ -579,13 +586,9 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro

#. Run `Cactus <https://github.com/ComparativeGenomicsToolkit/cactus>`__ as an autoscaling workflow::

(cact_venv) $ TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:3.14.0 cactus \
--provisioner <aws, gce> \
--nodeType <type> \
--maxNodes 2 \
--minNodes 0 \
(cact_venv) $ cactus \
--retry 10 \
--batchSystem mesos \
--batchSystem kubernetes \
--logDebug \
--logFile /logFile_pestis3 \
--configFile \
@@ -597,15 +600,6 @@ Also! Remember to use the :ref:`destroyCluster` command when finished to destro

**Pieces of the Puzzle**:

``TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:3.14.0`` --- specifies the version of Toil being used, 3.14.0;
if the latest one is desired, please eliminate.

``--nodeType`` --- determines the instance type used for worker nodes. The instance type specified here must be on
the same cloud provider as the one specified with ``--leaderNodeType``

``--maxNodes 2`` --- creates up to two instances of the type specified with ``--nodeType`` and
launches Mesos worker containers inside them.

``--logDebug`` --- equivalent to ``--logLevel DEBUG``.

``--logFile /logFile_pestis3`` --- writes logs in a file named `logFile_pestis3` under ``/`` folder.