CAWS: Carbon Aware Workflow Scheduling

CAWS is a Python library/executor/service for running workflows across multiple sites in a carbon- and energy-aware fashion. The ultimate goal is to make the environmental impact of computing jobs transparent to the user, and to provide incentives and automation to reduce that footprint.

Setup

This repo can be used with the latest version of globus-compute-sdk, and it can schedule to any existing endpoint. However, to enable energy monitoring, the endpoints need to be deployed using the forked version of globus-compute-endpoint as well as the forked version of parsl. In the configuration of the endpoints, you will need to enable monitoring, enable energy monitoring, and point the monitoring database to a location accessible from your personal computer (i.e., wherever you are running the scheduler from).

Endpoints

First you have to set up a Globus Compute endpoint with the correct forks of all the repositories. The general Globus Compute documentation is here, but below are modified instructions that show how to configure an endpoint for use with CAWS.

On a system where you want to run a compute endpoint, use the following commands to install globus-compute-endpoint and create an endpoint:

git clone git@github.com:AK2000/funcX.git
cd funcX
git checkout power_monitoring_new
cd compute_endpoint
pip install . # Install globus compute endpoint

globus-compute-endpoint configure <ENDPOINT_NAME>

Then you have to replace the config.yaml file at ~/.globus_compute/<ENDPOINT_NAME>/config.yaml with an appropriate config.py file to correctly configure the endpoint. In the configuration, be sure to use the GlobusComputeEngine and specify an energy_monitor. This is a system-specific class that tells the endpoint how it can read the total energy being used by a node.
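
For illustration, a config.py for such an endpoint might look roughly like the sketch below. The module paths and the energy monitor name used here are assumptions that depend on your system and on the forked packages; see docs/sample_config.json for the authoritative example.

# Sketch of ~/.globus_compute/<ENDPOINT_NAME>/config.py. Module paths and the
# energy monitor name are assumptions; adapt them to the forked packages and
# to your system.
from globus_compute_endpoint.endpoint.config import Config
from globus_compute_endpoint.engines import GlobusComputeEngine
from parsl.providers import LocalProvider

config = Config(
    executors=[
        GlobusComputeEngine(
            provider=LocalProvider(init_blocks=1, min_blocks=0, max_blocks=1),
            # System-specific monitor that reports total node energy use,
            # e.g. via RAPL counters; "RaplEnergyMonitor" is illustrative only.
            energy_monitor="RaplEnergyMonitor",
        )
    ],
)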

Monitoring must also be enabled in the endpoint configuration, using the monitoring infrastructure derived from parsl:

from parsl.monitoring import MonitoringHub
...
config = Config(...,
                monitoring_hub=MonitoringHub(
                        hub_address="localhost",
                        hub_port=55055,
                        monitoring_debug=True,
                        resource_monitoring_interval=1,
                        logging_endpoint="postgresql://<user>:<password>@<address>/monitoring"),
)

The logging_endpoint is a relational database that stores resource and task monitoring information and serves as a source for the CAWS client. Currently, you must host this database yourself.
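
As a quick sanity check that this database is reachable from the machine where you will run the scheduler, you can try opening a connection to it, for example with SQLAlchemy (the URI below is a placeholder):

# Minimal connectivity check for the monitoring database (sketch).
# Use the same URI you configured as logging_endpoint on the endpoint.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://<user>:<password>@<address>/monitoring")
with engine.connect() as conn:
    # Trivial query; succeeds only if the database is reachable.
    print(conn.execute(text("SELECT 1")).scalar())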

A sample configuration with both monitoring enabled and the correct executor configuration is in docs/sample_config.json.

After the Globus Compute Endpoint is properly configured, start the endpoint:

globus-compute-endpoint start <ENDPOINT_NAME>

Host

First clone the repository. Make sure to also download the SeBS data submodule:

git clone git@github.com:AK2000/caws.git
cd caws
git submodule update --init --recursive

Finally, to install this repo with its dependencies and the experiments, run:

$ pip install .

Be sure to also set the environment variable below to the monitoring database URI. Alternatively, you can pass it as an argument to every executor that you create.

export ENDPOINT_MONITOR_DEFAULT=<DATABASE_URI>
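
For example, using the library might look roughly like the following sketch. The class and parameter names here (caws.Endpoint, caws.CawsExecutor, monitoring_url) are illustrative assumptions rather than the confirmed API; check the package's docstrings and tests for the exact usage.

# Illustrative sketch only -- Endpoint, CawsExecutor, and monitoring_url are
# assumed names for the CAWS API, not confirmed usage.
import caws

def add(a, b):
    return a + b

# Wrap an existing Globus Compute endpoint. The monitoring database URI can be
# passed explicitly instead of relying on ENDPOINT_MONITOR_DEFAULT.
endpoint = caws.Endpoint("my-endpoint", "<COMPUTE_ENDPOINT_ID>",
                         monitoring_url="<DATABASE_URI>")

with caws.CawsExecutor([endpoint]) as executor:
    future = executor.submit(add, 1, 2)
    print(future.result())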

Testing

To run the test suite use:

$ pytest --endpoint_id <COMPUTE_ID>

Experiments

The experiments require some additional packages to be installed on the endpoint. To install those packages, run the following on the endpoint:

export ENV_DIR=<PATH_TO_ENV>
wget https://raw.githubusercontent.com/AK2000/caws/master/scripts/requirements.txt
conda install -p ${ENV_DIR} --yes --file requirements.txt

wget -q https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar -xf ffmpeg-release-amd64-static.tar.xz
rm *.tar.xz
mv ffmpeg-* ffmpeg
rm ffmpeg/ffprobe
# make the binary executable
chmod 755 ffmpeg/ffmpeg
# move the binary onto the environment path
mv ffmpeg/ffmpeg ${ENV_DIR}/bin/
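
To confirm the environment was set up correctly, a small check like the following (run with the endpoint's Python environment active) verifies that ffmpeg is visible on the path:

# Sketch: verify that the ffmpeg binary installed above is on PATH.
import shutil
import subprocess

assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
print(out.stdout.splitlines()[0])  # e.g. "ffmpeg version ..."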
