From 197c802402bf4dff914f58b411cb31dc9203a7c0 Mon Sep 17 00:00:00 2001 From: Thomas Roeblitz Date: Tue, 26 Sep 2023 05:00:45 +0200 Subject: [PATCH 1/3] polishing pass over README.md + adding missing information --- README.md | 337 ++++++++++++++++++++++++++---------------------------- 1 file changed, 164 insertions(+), 173 deletions(-) diff --git a/README.md b/README.md index 897af4d6..5b0f0462 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,12 @@ -A bot to help with requests to add software installations to the [EESSI software layer](https://github.com/EESSI/software-layer) +> [!NOTE] +> In the future the installation and configuration of the bot will be moved +> to the EESSI docs, likely under [Build-test-deploy bot](https://www.eessi.io/docs/bot/). -GitHub App implemented in ``eessi_bot_event_handler.py`` - -Script to start app: ``event_handler.sh`` - -Requires: - -* Python 3 -* **PyGitHub**: Python library to use GitHub API v3 - * https://github.com/PyGithub/PyGithub - * API: https://pygithub.readthedocs.io/en/latest/reference.html -* **Waitress**: production-quality pure-Python WSGI server - * https://docs.pylonsproject.org/projects/waitress/en/stable/ -* **PyGHee**: Python library to facilitate creating a GitHub App implemented in Python - * https://github.com/boegel/PyGHee - -``` -pip3 install --user -r requirements.txt -``` +The bot helps automating tasks to build, to test and to deploy components of the +EESSI layers ([compatibility](https://github.com/EESSI/compatibility-layer) and +[software](https://github.com/EESSI/software-layer)). In the future, the bot may +be used with any repository that provides some scripts for building, testing and +deployment. 
# Instructions to set up the EESSI bot components @@ -33,57 +22,18 @@ two main components provided in this repository: - GitHub account(s) (two needed for a development scenario), referring to them as `YOU_1` and `YOU_2` below - A fork, say `YOU_1/software-layer`, of [EESSI/software-layer](https://github.com/EESSI/software-layer) and a fork, say `YOU_2/software-layer` of your first fork if you want to emulate the bot's behaviour but not change EESSI's repository. The EESSI bot will act on events triggered for the first fork (`YOU_1/software-layer`). - Access to a frontend/login node/service node of a Slurm cluster where the EESSI bot components shall run. For the sake of brevity, we call this node simply `bot machine`. -- `singularity` with version 3.6 or newer on the compute nodes of the Slurm cluster. -- The EESSI bot components and the (build) jobs will frequently access the Internet. Hence, worker nodes and `bot machine` of the Slurm cluster need access to the Internet. +- `singularity` with version 3.6 or newer _OR_ `apptainer` with version 1.0 or newer on the compute nodes of the Slurm cluster. +- The EESSI bot components and the (build) jobs will frequently access the + Internet. Hence, worker nodes and the `bot machine` of the Slurm cluster need +access to the Internet (either directly or via an HTTP proxy). ## Step 1: Smee.io channel and smee client We use smee.io as a service to relay events from GitHub to the EESSI bot. To do so, create a new channel on the page https://smee.io and note the URL, e.g., https://smee.io/CHANNEL-ID -On the `bot machine` we need a tool which receives events relayed from https://smee.io/CHANNEL-ID and forwards it to the EESSI bot. We use the Smee client for this. 
The Smee client can be installed globally with - -``` -npm install -g smee-client -``` - -or per user - -``` -npm install smee-client -``` - -If you don't have `npm` on your system and don't have sudo access to easily install it, you may use a container as follows - -``` -mkdir smee -cd smee -singularity pull docker://node -singularity exec node_latest.sif npm install smee-client -cat << 'EOF' > smee -#!/usr/bin/env bash - -BASEDIR=$(dirname "$0") - -singularity exec $BASEDIR/node_latest.sif $BASEDIR/node_modules/smee-client/bin/smee.js "$@" -EOF - -chmod 700 smee -export PATH=$PATH:$PWD -``` - -Finally, run the Smee client as follows - -``` -smee --url https://smee.io/CHANNEL-ID -``` - -If the event handler (see [Step 6.1](#step6.1)) receives events on a port different than the default (3000), you need to specify the port via the parameter `--port PORTNUMBER`, for example, - -``` -smee --url https://smee.io/CHANNEL-ID --port 3030 -``` - -Alternatively, you may use a container providing the smee client. For example, +On the `bot machine` we need a tool which receives events relayed from +https://smee.io/CHANNEL-ID and forwards it to the EESSI bot. We use the Smee +client for this. The Smee client can be run via a container as follows ``` singularity pull docker://deltaprojects/smee-client @@ -112,11 +62,11 @@ At the [app settings page](https://github.com/settings/apps) click "New GitHub A python3 -c 'import secrets; print(secrets.token_hex(64))' ``` - Permissions: assign permissions to the app it needs (e.g., read access to commits, issues, pull requests); - - Make sure to assign read and write access to the Pull request in Repository permissions section; These permisions can be changed later on; - - Make sure to accept the new permissions from the install app section. Select Install App option from the menu on the left hand side. 
- - Then select the wheel right next to your installed app or use the link https://github.com/settings/installations/INSTALLATION_ID - - Once the page open you’ll be able to accept the new permissions there. - - Some permissions (e.g., metadata) will be selected automatically because of others you have chosen. + - Make sure to assign read and write access to the Pull request in Repository permissions section; These permisions can be changed later on; + - Make sure to accept the new permissions from the install app section. Select Install App option from the menu on the left hand side. + - Then select the wheel right next to your installed app or use the link https://github.com/settings/installations/INSTALLATION_ID + - Once the page open you'll be able to accept the new permissions there. + - Some permissions (e.g., metadata) will be selected automatically because of others you have chosen. - Events: subscribe the app to events it shall react on (e.g., related to pull requests) - Select that the app can only be installed by this (your) GitHub account @@ -145,7 +95,9 @@ Determine full path to bot directory cd eessi-bot-software-layer pwd ``` -Note the output of `pwd`. This will be used to replace `PATH_TO_EESSI_BOT` in the configuration file `app.cfg` (see [Step 5.4](#step5.4)). +Note the output of `pwd`. This will be used to replace `PATH_TO_EESSI_BOT` in the +configuration file `app.cfg` (see [Step 5.4](#step5.4)). In the remainder of this +page we will refer to this directory as `PATH_TO_EESSI_BOT`. If you want to develop the EESSI bot, it is recommended that you fork the repository and use the fork on the `bot machine`. @@ -172,36 +124,11 @@ pip install -r requirements.txt Note, before you can start the bot components (see below), you have to activate the virtual environment with `source venv_bot_p37/bin/activate`. You can deactivate it simply by running `deactivate`. 
-**Troubles installing some of the requirements or their dependencies?** -You may try to upgrade `pip` first with -``` -python3 -m pip install --user --upgrade pip -``` -Then try to install the requirements with -``` -pip3 install --user -r requirements.txt -``` - -Alternatively, you may try to install some of the dependencies by fixing their version. For example, on the [EESSI CitC cluster](https://github.com/EESSI/hackathons/tree/main/2021-12/citc) installing PyGithub failed due to some problem installing its dependency PyNaCl. Apparently, PyGithub only required version 1.4.0 of PyNaCl but the most recent version 1.5.0 failed to install. Hence, when installing PyNaCl version 1.4.0 first, then PyGithub could be installed. Example commands - -``` -pip3 install --user PyNaCl==1.4.0 -pip3 install --user -r requirements.txt -``` - -### Step 4.1 Using a development version/branch of PyGHee - -The above command `pip3 install --user -r requirements.txt` installs the latest release of the PyGHee library. If you want to use a development version/branch, i.e., what is available from GitHub or your own local copy, you have to set `$PYTHONPATH` correctly. Assuming the library's main directory is `SOME_PATH/PyGHee` do the following in the terminal/shell/script where you run the bot: - -``` -export PYTHONPATH=SOME_PATH/PyGHee:$PYTHONPATH -``` - -### Step 4.2: Installing tools to access S3 bucket +### Step 4.1: Installing tools to access S3 bucket The script `scripts/eessi-upload-to-staging` uploads a tarball and an associated metadata file to an S3 bucket. It needs two tools for this, `aws` to actually upload the files and `jq` to create the metadata file. This section describes how these tools are installed and configured on the `bot machine`. -Create a new directory, say `BOT_ROOT/tools` and change into the directory. +Create a new directory, say `PATH_TO_EESSI_BOT/tools` and change into the directory. 
For installing the AWS Command Line Interface, including the tool `aws`, please follow the instructions at the @@ -264,11 +191,12 @@ The private key is needed to let the app authenticate when updating information Open the page https://github.com/settings/apps and then click on the icon left to the name of the GitHub App for the EESSI bot or the "Edit" button for the app. Near the end of the page you will find a section "Private keys" where you can create a private key by clicking on the button "Generate a private key". The private key should be automatically downloaded to your local computer. Copy it to the `bot machine` and note the full path to it (`PATH_TO_PRIVATE_KEY`). -For example: the private key is in the LOCAL computer. To copy it to the bot machine +For example: the private key is on your LOCAL computer. To transfer it to the +`bot machine` run ``` scp PATH_TO_PRIVATE_KEY_FILE_LOCAL_COMPUTER REMOTE_USERNAME@TARGET_HOST:TARGET/PATH ``` -the `TARGET/PATH` of the bot machine should be noted for PATH_TO_PRIVATE_KEY. +the `TARGET/PATH` of the bot machine should be noted for `PATH_TO_PRIVATE_KEY`. ### Step 5.4: Create the configuration file `app.cfg` @@ -324,7 +252,7 @@ The CVMFS configuration could be commented out unless there is a need to customi http_proxy = http://PROXY_DNS:3128/ https_proxy = http://PROXY_DNS:3128/ ``` -If compute nodes have no internet connection, we need to set `http(s)_proxy` +If compute nodes have no direct internet connection, we need to set `http(s)_proxy` or commands such as `pip3` and `eb` (EasyBuild) cannot download software from package repositories. Typically these settings are set in the prologue of a Slurm job. However, when entering the Gentoo Prefix, most environment settings @@ -344,11 +272,16 @@ read by the bot and handed over to `build_job_script` via the parameter ``` local_tmp = /tmp/$USER/EESSI ``` -This is the path to a temporary directory on the node building the stack, i.e., on a compute/worker node. 
You may have to change this if temporary storage under '/tmp' does not exist or is too small. This setting will be used for the environment variable `EESSI_TMPDIR`. Variables in the value may be esaped with '\' to delay their expansion to the start of the build_job_script. This can be used for referencing environment variables that are only set inside a Slurm job. +This is the path to a temporary directory on the node building the stack, i.e., +on a compute/worker node. You may have to change this if temporary storage under +`/tmp` does not exist or is too small. This setting will be used for the +environment variable `EESSI_TMPDIR`. The value is expanded only inside a running +job. Thus, typical job environment variables may be used to isolate jobs running +simultaneously on the same compute node. ``` slurm_params = "--hold" ``` -This defines additional parameters for submitting batch jobs. "--hold" should be kept or the bot might not work as intended (the release step would be circumvented). Additional parameters, for example, to specify an account, a partition or any other parameters supported by `sbatch`, may be added to customize the submission to your environment. +This defines additional parameters for submitting batch jobs. `"--hold"` should be kept or the bot might not work as intended (the release step would be circumvented). Additional parameters, for example, to specify an account, a partition or any other parameters supported by `sbatch`, may be added to customize the submission to your environment. ``` submit_command = /usr/bin/sbatch ``` @@ -368,7 +301,7 @@ commands. command_response_fmt = FORMAT_MARKDOWN_AND_HTML ``` This allows to customize the format of the comments about the handling of bot -commands. The format needs to include `{sender}`, `{comment_response}` and +commands. The format needs to include `{app_name}`, `{comment_response}` and `{comment_result}`. `{app_name}` is replaced with the name of the bot instance. 
`{comment_response}` is replaced with information about parsing the comment for commands before any command is run. `{comment_result}` is replaced with @@ -410,18 +343,64 @@ The option `deploy_permission` defines which GitHub accounts can trigger the deployment procedure. The value can be empty (any GH account can trigger the deployment) or a space delimited list of GH accounts. +``` +no_deploy_permission_comment = Label `bot:deploy` has been set by user `{deploy_labeler}`, but this person does not have permission to trigger deployments +``` +This defines a message that is added to the status table in a PR comment +corresponding to a job whose tarball should have been uploaded (e.g., after +setting the `bot:deploy` label). + ### Section `[architecturetargets]` The section `[architecturetargets]` defines for which targets (OS/SUBDIR), e.g., `linux/amd/zen2` the EESSI bot should submit jobs and what additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack. ``` arch_target_map = { "linux/x86_64/generic" : "--constraint shape=c4.2xlarge", "linux/x86_64/amd/zen2" : "--constraint shape=c5a.2xlarge" } ``` -The map has one to many entries of the format `OS/SUBDIR : ADDITIONAL_SBATCH_PARAMETERS`. For your cluster, you will have to figure out which microarchitectures (`SUBDIR`) are available (as `OS` only `linux` is currently supported) and how to instruct Slurm to request them (`ADDITIONAL_SBATCH_PARAMETERS`). +The map has one to many entries of the format `OS/SUBDIR : +ADDITIONAL_SBATCH_PARAMETERS`. For your cluster, you will have to figure out +which microarchitectures (`SUBDIR`) are available (as `OS` only `linux` is +currently supported) and how to instruct Slurm to allocate nodes with that +architecture to a job (`ADDITIONAL_SBATCH_PARAMETERS`). 
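As a rough illustration of how `submit_command`, `slurm_params` and an entry of `arch_target_map` could be combined into one `sbatch` invocation, consider the sketch below. It is not the bot's actual code: the helper `build_submit_command` and the script name `job.sh` are hypothetical, and the map is assumed to be valid JSON as in the example above.

```python
import json
import shlex

# Values copied from the examples in this README.
arch_target_map = json.loads(
    '{ "linux/x86_64/generic" : "--constraint shape=c4.2xlarge",'
    ' "linux/x86_64/amd/zen2" : "--constraint shape=c5a.2xlarge" }'
)
slurm_params = "--hold"
submit_command = "/usr/bin/sbatch"


def build_submit_command(arch_target, job_script):
    """Assemble an sbatch command line for one target (hypothetical helper)."""
    # Additional sbatch parameters configured for this OS/SUBDIR target
    extra = arch_target_map[arch_target]
    # shlex.split turns each parameter string into separate arguments
    return [submit_command, *shlex.split(slurm_params), *shlex.split(extra), job_script]


# "job.sh" is a made-up script name used only for illustration
print(build_submit_command("linux/x86_64/amd/zen2", "job.sh"))
```

An empty parameter string contributes no extra arguments, so a target without special requirements yields just the submit command, `--hold` and the script.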
Note, if you do not have to specify additional parameters to `sbatch` to request a compute node with a specific microarchitecture, you can just write something like ``` arch_target_map = { "linux/x86_64/generic" : "" } ``` +### Section `[repo_targets]` +This section defines for what repositories and architectures the bot can run job. +Repositories are referenced by IDs (or `repo_id`). Architectures are identified +by `OS/SUBDIR` which correspond to settings in the `arch_target_map`. + +``` +repo_target_map = { + "OS_SUBDIR_1" : ["REPO_ID_1_1","REPO_ID_1_2"], + "OS_SUBDIR_2" : ["REPO_ID_2_1","REPO_ID_2_2"] } +``` +For each `OS/SUBDIR` combination a list of available repository IDs can be +provided. + +The repository IDs are defined in a separate file, say `repos.cfg` which is +stored in the directory defined via +``` +repos_cfg_dir = PATH_TO_SHARED_DIRECTORY/cfg_bundles +``` +The `repos.cfg` file also uses the `ini` format as follows +``` +[eessi-2023.06] +repo_name = pilot.eessi-hpc.org +repo_version = 2023.06 +config_bundle = eessi-hpc.org-cfg_files.tgz +config_map = { "eessi-hpc.org/cvmfs-config.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/cvmfs-config.eessi-hpc.org.pub", "eessi-hpc.org/ci.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/ci.eessi-hpc.org.pub", "eessi-hpc.org/pilot.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/pilot.eessi-hpc.org.pub", "default.local":"/etc/cvmfs/default.local", "eessi-hpc.org.conf":"/etc/cvmfs/domain.d/eessi-hpc.org.conf"} +container = docker://ghcr.io/eessi/build-node:debian11 +``` +The repository id is given in brackets. Then the name of the repository and the +version are defined. Next a tarball containing configuration files for CernVM-FS +is provided. The `config_map` maps entries of that tarball to locations inside +the file system of the container which is used when running the job. Finally, the +container to be used is given. + +The `repos.cfg` file may contain multiple definitions of repositories. 
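To make the structure of `repos.cfg` more concrete, the following sketch reads a definition like the one above with Python's standard `configparser` module and decodes `config_map` as JSON. This is only an assumption about how such a file could be consumed, not the bot's implementation; the helper `load_repos` is hypothetical and the `config_map` value is shortened to two entries for readability.

```python
import configparser
import json

# Shortened copy of the repos.cfg example (config_map reduced to two entries)
REPOS_CFG = """
[eessi-2023.06]
repo_name = pilot.eessi-hpc.org
repo_version = 2023.06
config_bundle = eessi-hpc.org-cfg_files.tgz
config_map = { "default.local":"/etc/cvmfs/default.local", "eessi-hpc.org.conf":"/etc/cvmfs/domain.d/eessi-hpc.org.conf" }
container = docker://ghcr.io/eessi/build-node:debian11
"""


def load_repos(cfg_text):
    """Return a dict mapping each repository id to its settings (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    repos = {}
    for repo_id in parser.sections():
        entry = dict(parser[repo_id])
        # config_map maps files in the bundle to paths inside the container
        entry["config_map"] = json.loads(entry["config_map"])
        repos[repo_id] = entry
    return repos


repos = load_repos(REPOS_CFG)
for repo_id, settings in repos.items():
    print(repo_id, settings["repo_name"], settings["container"])
```

Each section name becomes a repository id that could then be listed in `repo_target_map`.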
+ ### Section `[event_handler]` The section contains information needed by the event handler ``` @@ -454,6 +433,81 @@ scontrol_command = /usr/bin/scontrol ``` This is the full path to the Slurm command used for manipulating existing jobs. You may want to verify if `scontrol` is provided at that path or determine its actual location (`which scontrol`). +### Section `[submitted_job_comments]` +Sets templates for messages about newly submitted jobs. +``` +initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}` +``` +Is used to create a comment to a PR when a new job has been created. + +``` +awaits_release = job id `{job_id}` awaits release by job manager +``` +Is used to provide a status update of a job (shown as a row in the job's status +table). + +### Section `[new_job_comments]` +Sets templates for messages about jobs whose `hold` flag was released. +``` +awaits_launch = job awaits launch by Slurm scheduler +``` +Status update that is used when the `hold` flag of a job has been removed. + +### Section `[running_job_comments]` +Sets templates for messages about jobs that are running. +``` +running_job = job `{job_id}` is running +``` +Status update for a job that started running. + +### Section `[finished_job_comments]` +Sets templates for messages about finished jobs. +``` +success = :grin: SUCCESS tarball `{tarball_name}` ({tarball_size} GiB) in job dir +``` +Message for a successful job that produced a tarball. + +``` +failure = :cry: FAILURE +``` +Message for a failed job. + +``` +no_slurm_out = No slurm output `{slurm_out}` in job dir +``` +Message for missing Slurm output file. + +``` +slurm_out = Found slurm output `{slurm_out}` in job dir +``` +Message for found Slurm output file. + +``` +missing_modules = Slurm output lacks message "No missing modules!". +``` +Template concerning the lack of a message signaling that all modules were built. 
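The templates above use `{name}` placeholders. Assuming they are filled in the style of Python's `str.format` (an assumption, the substitution mechanism is not shown here), a template could be rendered as in this sketch. The helper `render` and all values passed to it are made up for illustration.

```python
# Templates copied from the configuration examples above
awaits_release = "job id `{job_id}` awaits release by job manager"
success = ":grin: SUCCESS tarball `{tarball_name}` ({tarball_size} GiB) in job dir"


def render(template, **values):
    """Fill the placeholders of a template (hypothetical helper)."""
    return template.format(**values)


# The job id, tarball name and size below are invented example values
print(render(awaits_release, job_id=4242))
print(render(success, tarball_name="example.tar.gz", tarball_size=0.5))
```

A placeholder that is not supplied raises `KeyError` with `str.format`, which is one reason to keep the placeholder names in a customised template unchanged.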
+ +``` +no_tarball_message = Slurm output lacks message about created tarball. +``` +Template concerning the lack of a message about a created tarball. + +``` +no_matching_tarball = No tarball matching `{tarball_pattern}` found in job dir. +``` +Template about a missing tarball. + +``` +multiple_tarballs = Found {num_tarballs} tarballs in job dir - only 1 matching `{tarball_pattern}` expected. +``` +Template to report that multiple tarballs have been found. + +``` +job_result_unknown_fmt =
:shrug: UNKNOWN _(click triangle for details)_
+``` +Template to be used in case no result file (produced by `bot/check-build.sh` +provided by target repository) was found. + # Instructions to run the bot components The bot consists of three components, the Smee client, the event handler and the job manager. Running the Smee client was explained in [Step 1](#step1). @@ -474,7 +528,8 @@ If multiple instances on the `bot machine` are being executed, you may need to r ``` See [Step 1](#step1) for telling the Smee client on which port the event handler receives events. -The event handler writes log information to the file `pyghee.log`. +The event handler writes log information to the files `pyghee.log` and +`eessi_bot_event_handler.log`. Note, if you run the bot on a frontend of a cluster with multiple frontends make sure that both the Smee client and the event handler run on the same machine. @@ -489,7 +544,7 @@ The job manager is provided by the Python script `eessi_bot_job_manager_layer.py It will run in an infinite loop monitoring jobs and acting on their state changes. -If you want to control how the job manager works, you can add two parameters: +If you want to limit the execution of the job manager, you can add two parameters: |Option|Argument| |------|--------| |`-i` / `--max-manager-iterations`|Any number _z_: _z_ < 0 - run the main loop indefinitely, _z_ == 0 - don't run the main loop, _z_ > 0 - run the main loop _z_ times| @@ -508,70 +563,6 @@ The job manager can run on a different machine than the event handler as long as # Example pull request on software-layer -Now that the bot is running on your cluster, we want to provide a little demo about how to use it to add a new package to the software layer. We assume that you have forked [EESSI/software-layer](https://github.com/EESSI/software-layer) to `YOUR_GITHUB_ACCOUNT/software-layer` Following methods can be used to test the bot. 
-Method 1: - - open the link https://github.com/YOUR_GITHUB_ACCOUNT/software-layer/compare/main...EESSI:software-layer:add-CaDiCaL-9.3.0?expand=1 - - create the label bot:build if it's not there. - - Create the pull request. - - Don’t merge the Pull request. It is important to close the pull request or delete the bot:build label after testing it. It can be added again for the other test. -If the above method is followed then there will be no need to create another Github account for the test which is shown in the following Method 2. - -Method 2: -Forked `YOU_1/software-layer` to `YOU_2/software-layer`. - -Clone into the second fork and create a new branch: - -``` -git clone https://github.com/YOU_2/software-layer -cd software-layer -git branch add-CaDiCaL-9.3.0 -git checkout add-CaDiCaL-9.3.0 -``` - -Open `EESSI-pilot-install-software.sh` and add the section - -``` -export CaDiCaL_EC="CaDiCaL-1.3.0-GCC-9.3.0.eb" -echo ">> Installing ${CaDiCaL_EC}..." -ok_msg="${CaDiCaL_EC} installed, let's solve some problems!" -fail_msg="Installation of ${CaDiCaL_EC} failed, that's a pity..." -$EB ${CaDiCaL_EC} --robot -check_exit_code $? "${ok_msg}" "${fail_msg}" -``` - -just before the line -``` -echo ">> Creating/updating Lmod cache..." -``` - -Open `eessi-2021.12.yml` and append the section - -``` - CaDiCaL: - toolchains: - GCC-9.3.0: - versions: ['1.3.0'] -``` - -Commit the changes and push them to `YOU_1/software-layer`. Create the pull request by opening the link shown by `git push`. Make sure that you request to merge into `YOU_1/software-layer` - your bot receives events for this repository only (and while you experiment you may not wish to create too much noise on EESSI's software-layer repository). - -At first, the page for the pull request will look like normal pull request. The event handler will already have received an event, but it will wait until the label `bot:build` is set for the pull request. - -Add the label `bot:build`. 
Now, the event handler will submit jobs - one for each target architecture. For each submitted job it will add a comment such as - -IMAGE-SCREENSHOT - -The jobs are submitted with the parameter `--hold`. They will not start immediately, but rather are required to be released explicitly by the job manager. This can be very useful to control the processing of jobs, for example, when developing the EESSI bot components. If you want to control the execution, the job manager shall not run in an endless loop. - -Next the job manager notes the submitted job(s), releases them and updates the comments corresponding to the released jobs. An example update could look like this - -IMAGE-SCREENSHOT - -When the job has finished, the job manager analyses the result of job (checking if no missing modules were found and if a tarball was generated) and updates the job's comment in the PR. An example update could look like (in case of success) - -IMAGE-SCREENSHOT - -or in case of failure - -IMAGE-SCREENSHOT +For information on how to make pull requests and let the bot build software, see +[build-test-deploy bot](https://www.eessi.io/docs/bot/). From af76f3eb4d36e3e73ae429149d628c20e8749d1e Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Wed, 27 Sep 2023 21:35:04 +0200 Subject: [PATCH 2/3] minor tweaks to README --- README.md | 384 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 233 insertions(+), 151 deletions(-) diff --git a/README.md b/README.md index 5b0f0462..e3609ac1 100644 --- a/README.md +++ b/README.md @@ -10,18 +10,17 @@ deployment. # Instructions to set up the EESSI bot components -The following sections describe and illustrate the steps necessary -to set up the EESSI bot for the software layer. The bot consists of -two main components provided in this repository: +The following sections describe and illustrate the steps necessary to set up the EESSI bot. 
+The bot consists of two main components provided in this repository: -- An event handler `eessi_bot_event_handler.py` which receives events from a GitHub repository and acts on them. -- A job manager `eessi_bot_job_manager.py` which monitors a Slurm job queue and acts on state changes of jobs submitted by the event handler. +- An event handler [`eessi_bot_event_handler.py`](eessi_bot_event_handler.py) which receives events from a GitHub repository and acts on them. +- A job manager [`eessi_bot_job_manager.py`](eessi_bot_job_manager.py) which monitors the Slurm job queue and acts on state changes of jobs submitted by the event handler. ## Prerequisites - GitHub account(s) (two needed for a development scenario), referring to them as `YOU_1` and `YOU_2` below -- A fork, say `YOU_1/software-layer`, of [EESSI/software-layer](https://github.com/EESSI/software-layer) and a fork, say `YOU_2/software-layer` of your first fork if you want to emulate the bot's behaviour but not change EESSI's repository. The EESSI bot will act on events triggered for the first fork (`YOU_1/software-layer`). -- Access to a frontend/login node/service node of a Slurm cluster where the EESSI bot components shall run. For the sake of brevity, we call this node simply `bot machine`. +- A fork, say `YOU_1/software-layer`, of [EESSI/software-layer](https://github.com/EESSI/software-layer) and a fork, say `YOU_2/software-layer` of your first fork if you want to emulate the bot's behaviour but not change EESSI's repository. The EESSI bot will act on events triggered for the target repository (in this context, either `EESSI/software-layer` or `YOU_1/software-layer`). +- Access to a frontend/login node/service node of a Slurm cluster where the EESSI bot components will run. For the sake of brevity, we call this node simply `bot machine`. - `singularity` with version 3.6 or newer _OR_ `apptainer` with version 1.0 or newer on the compute nodes of the Slurm cluster. 
- The EESSI bot components and the (build) jobs will frequently access the Internet. Hence, worker nodes and the `bot machine` of the Slurm cluster need @@ -29,10 +28,10 @@ access to the Internet (either directly or via an HTTP proxy). ## Step 1: Smee.io channel and smee client -We use smee.io as a service to relay events from GitHub to the EESSI bot. To do so, create a new channel on the page https://smee.io and note the URL, e.g., https://smee.io/CHANNEL-ID +We use [smee.io](https://smee.io) as a service to relay events from GitHub to the EESSI bot. To do so, create a new channel via https://smee.io and note the URL, e.g., `https://smee.io/CHANNEL-ID`. On the `bot machine` we need a tool which receives events relayed from -https://smee.io/CHANNEL-ID and forwards it to the EESSI bot. We use the Smee +`https://smee.io/CHANNEL-ID` and forwards it to the EESSI bot. We use the Smee client for this. The Smee client can be run via a container as follows ``` @@ -44,42 +43,52 @@ or ``` singularity pull docker://deltaprojects/smee-client -singularity run smee-client_latest.sif --url https://smee.io/CHANNEL-ID --port 3030 +singularity run smee-client_latest.sif --port 3030 --url https://smee.io/CHANNEL-ID ``` for specifying a different port than the default (3000). ## Step 2: Registering GitHub App -We need to register a GitHub App, link it to the Smee.io channel, set a secret token to verify the webhook sender, set some permissions for the app, subscribe it to selected events and define that this app should only be installed in your account. +We need to: +* register a GitHub App; +* link it to the `smee.io` channel; +* set a secret token to verify the webhook sender; +* set some permissions for the GitHub app; +* subscribe the GitHub app to selected events; +* define that this GitHub app should only be installed in your GitHub account (or organisation). 
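The secret token in the list above is what lets a receiver verify the webhook sender: GitHub signs each delivery with HMAC-SHA256 using the secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below shows the general technique only. It is not taken from the bot's code, and the function name `signature_is_valid` is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret, payload, signature_header):
    """Check a webhook signature against the raw request body (hypothetical helper)."""
    # Recompute the HMAC-SHA256 of the raw payload bytes with the shared secret
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest performs a constant-time comparison
    return hmac.compare_digest(expected, signature_header)
```

The check has to run on the raw request body exactly as received, since re-serialising the JSON would change the bytes and therefore the digest.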
-At the [app settings page](https://github.com/settings/apps) click "New GitHub App" and fill in the page, particular the following fields +At the [app settings page](https://github.com/settings/apps) click "`New GitHub App`" and fill in the page, in particular the following fields: - GitHub App name: give the app a name of your choice -- Homepage URL: use the Smee.io channel (https://smee.io/CHANNEL-ID) created in [Step 1](#step1) -- Webhook URL: use the Smee.io channel (https://smee.io/CHANNEL-ID) created in [Step 1](#step1) -- Webhook secret: create a secret token which is used to verify the webhook sender. For example: +- Homepage URL: use the Smee.io channel (`https://smee.io/CHANNEL-ID`) created in [Step 1](#step1) +- Webhook URL: use the Smee.io channel (`https://smee.io/CHANNEL-ID`) created in [Step 1](#step1) +- Webhook secret: create a secret token which is used to verify the webhook sender, for example using: ```shell python3 -c 'import secrets; print(secrets.token_hex(64))' ``` -- Permissions: assign permissions to the app it needs (e.g., read access to commits, issues, pull requests); - - Make sure to assign read and write access to the Pull request in Repository permissions section; These permisions can be changed later on; - - Make sure to accept the new permissions from the install app section. Select Install App option from the menu on the left hand side. - - Then select the wheel right next to your installed app or use the link https://github.com/settings/installations/INSTALLATION_ID - - Once the page open you'll be able to accept the new permissions there. +- Permissions: assign the required permissions to the app (e.g., read access to commits, issues, pull requests); + - Make sure to assign read and write access to pull requests in the "Repository permissions" section; these permissions can be changed later on; + - Make sure to accept the new permissions from the "Install App" section that you can reach via the menu on the left hand side. 
+ - Then select the wheel right next to your installed app, or use the link `https://github.com/settings/installations/INSTALLATION_ID` + - Once the page is open you will be able to accept the new permissions there. - Some permissions (e.g., metadata) will be selected automatically because of others you have chosen. - Events: subscribe the app to events it shall react on (e.g., related to pull requests) -- Select that the app can only be installed by this (your) GitHub account +- Select that the app can only be installed by this (your) GitHub account or organisation. -Click on "Create GitHub App" +Click on "`Create GitHub App`" to complete this step. ## Step 3: Installing GitHub App -_Note, this will trigger the first event (`installation`). While the EESSI bot is not running yet, you can inspect this via the webpage for your Smee channel. Just open https://smee.io/CHANNEL-ID in a browser and browse through the information included in the event. Naturally, some of the information will be different for other types of events._ +_Note, this will trigger the first event (`installation`). While the EESSI bot is not running yet, you can inspect this via the webpage for your Smee channel. Just open `https://smee.io/CHANNEL-ID` in a browser, and browse through the information included in the event. Naturally, some of the information will be different for other types of events._ -You need to install the GitHub App -- essentially telling GitHub to link the app to an account and one, several or all repositories on whose events the app then should act upon. +You also need to *install* the GitHub App -- essentially telling GitHub to link the app to an account and one, several, or all repositories on whose events the app then should act upon. -Go to the page https://github.com/settings/apps and select the app you want to install by clicking on the icon left to the app's name or on the "Edit" button right to the name of the app. 
On the next page you should see the menu item "Install App" on the left-hand side. When you click on this you should see a page with a list of accounts you can install the app on. Choose one and click on the "Install" button next to it. This leads to a page where you can select the repositories on whose the app should react to. Here, for the sake of simplicity, choose just `YOU_1/software-layer` as described in [Prerequisites](#prerequisites). Select one, multiple or all and click on the "Install" button. +Go to https://github.com/settings/apps and select the app you want to install by clicking on the icon to the left of the app's name or on the "`Edit`" button right next to the name of the app. + +On the next page you should see the menu item "`Install App`" on the left-hand side. When you click on this you should see a page with a list of accounts and organisations you can install the app on. Choose one and click on the "`Install`" button next to it. + +This leads to a page where you can select the repositories whose events the app should react to. Here, for the sake of simplicity, choose just `YOU_1/software-layer` as described in the [prerequisites](#prerequisites). Select one, multiple, or all and click on the "`Install`" button. ## Step 4: Installing the EESSI bot on a `bot machine` @@ -90,7 +99,7 @@ Get the EESSI bot _installed_ onto the `bot machine` by running something like ``` git clone https://github.com/EESSI/eessi-bot-software-layer.git ``` -Determine full path to bot directory +Determine the full path to the bot directory: ``` cd eessi-bot-software-layer pwd @@ -99,44 +108,62 @@ Note the output of `pwd`. This will be used to replace `PATH_TO_EESSI_BOT` in th configuration file `app.cfg` (see [Step 5.4](#step5.4)). In the remainder of this page we will refer to this directory as `PATH_TO_EESSI_BOT`. -If you want to develop the EESSI bot, it is recommended that you fork the repository and use the fork on the `bot machine`.
+If you want to develop the EESSI bot, it is recommended that you fork the [EESSI/eessi-bot-software-layer](https://github.com/EESSI/eessi-bot-software-layer) repository and use the fork on the `bot machine`. -If you want to work with a specific pull request, say number 24, you obtain its contents with the following commands: +If you want to work with a specific pull request for the bot, say number 42, you can obtain the corresponding code with the following commands: ``` git clone https://github.com/EESSI/eessi-bot-software-layer.git cd eessi-bot-software-layer pwd -git fetch origin pull/24/head:PR24 -git checkout PR24 +git fetch origin pull/42/head:PR42 +git checkout PR42 ``` -The EESSI bot requires some Python packages to be installed. It is recommended to install these in a virtual environment based on Python 3.7 or newer. See the below sequence for an example on how to set up the environment, to activate it and to install the requirements for the EESSI bot. The sequence assumes that you are in the directory containing the bot's script: +The EESSI bot requires some Python packages to be installed, which are specified in the [`requirements.txt`](https://github.com/EESSI/eessi-bot-software-layer/tree/main/requirements.txt) file. It is recommended to install these in a virtual environment based on Python 3.7 or newer. See the commands below for an example on how to set up the virtual environment, activate it, and install the requirements for the EESSI bot. These commands assume that you are in the `eessi-bot-software-layer` directory: ``` +# assumption here is that you start from *within* the eessi-bot-software-layer directory cd .. 
-python3.7 -m venv venv_bot_p37 -source venv_bot_p37/bin/activate -python --version # output should match 'Python 3.7.*$' -which python # output should match '*/venv_bot_p37/bin/python$' +python3.7 -m venv venv_eessi_bot_p37 +source venv_eessi_bot_p37/bin/activate +python --version # output should match 'Python 3.7.*' +which python # output should match '*/venv_eessi_bot_p37/bin/python' python -m pip install --upgrade pip cd eessi-bot-software-layer pip install -r requirements.txt ``` -Note, before you can start the bot components (see below), you have to activate the virtual environment with `source venv_bot_p37/bin/activate`. You can deactivate it simply by running `deactivate`. +Note, before you can start the bot components (see below), you have to activate the virtual environment with `source venv_eessi_bot_p37/bin/activate`. + +You can exit the virtual environment simply by running `deactivate`. ### Step 4.1: Installing tools to access S3 bucket -The script `scripts/eessi-upload-to-staging` uploads a tarball and an associated metadata file to an S3 bucket. It needs two tools for this, `aws` to actually upload the files and `jq` to create the metadata file. This section describes how these tools are installed and configured on the `bot machine`. +The [`scripts/eessi-upload-to-staging`](https://github.com/EESSI/eessi-bot-software-layer/blob/main/scripts/eessi-upload-to-staging) script uploads a tarball and an associated metadata file to an S3 bucket. + +It needs two tools for this: +* the `aws` command to actually upload the files; +* the `jq` command to create the metadata file. + +This section describes how these tools are installed and configured on the `bot machine`. + +#### Create a home for the `aws` and `jq` commands + +Create a new directory, say `PATH_TO_EESSI_BOT/tools` and change into it. -Create a new directory, say `PATH_TO_EESSI_BOT/tools` and change into the directory. 
+``` +mkdir PATH_TO_EESSI_BOT/tools +cd PATH_TO_EESSI_BOT/tools +``` -For installing the AWS Command Line Interface, including the tool `aws`, -please follow the instructions at the -[AWS Command Line Interface guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) +#### Install `aws` command -Add the directory that contains `aws` to the `PATH` environment variable. -Make sure that the `PATH` is set correctly for newly spawned shells, e.g., -it should be exported in files such as `$HOME/.bash_profile`. +For installing the AWS Command Line Interface, which provides the `aws` command, +follow the instructions at the +[AWS Command Line Interface guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html). + +Add the directory that contains `aws` to the `$PATH` environment variable. +Make sure that `$PATH` is set correctly for newly spawned shells, e.g., +it should be exported in a startup file such as `$HOME/.bash_profile`. Verify that `aws` executes by running `aws --version`. Then, run `aws configure` to set credentials for accessing the S3 bucket. @@ -145,11 +172,14 @@ for detailed setup instructions. If you are using a non AWS S3 bucket you will likely only have to provide the `Access Key ID` and the `Secret Access Key`. +#### Install `jq` command + Next, install the tool `jq` into the same directory into which -`aws` was installed in. First, run `cd $(dirname $(which aws))`. -Then, download `jq` from `https://github.com/stedolan/jq/releases` -by running, for example, +`aws` was installed (for example `PATH_TO_EESSI_BOT/tools`).
+Download `jq` from `https://github.com/stedolan/jq/releases` +into that directory by running, for example, ``` +cd PATH_TO_EESSI_BOT/tools curl https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64 -o jq-linux64 ``` You may check if there are newer releases and choose a different @@ -164,39 +194,60 @@ Finally, create a symbolic link for `jq` by running ln -s jq-linux64 jq ``` +Check that the `jq` command works by running `jq --version`. + ## Step 5: Configuring the EESSI bot on the `bot machine` -For the event handler, you need to set up two environment variables: `GITHUB_TOKEN` ([Step 5.1](#step5.1)) and `GITHUB_APP_SECRET_TOKEN` ([Step 5.2](#step5.2)). For both the event handler and the job manager you need a private key ([Step 5.3](#step5.3)). +For the event handler, you need to set up two environment variables: +* `$GITHUB_TOKEN` (see [Step 5.1](#step5.1)) +* `$GITHUB_APP_SECRET_TOKEN` (see [Step 5.2](#step5.2)). + +For both the event handler and the job manager you need a private key (see [Step 5.3](#step5.3)). ### Step 5.1: GitHub Personal Access Token (PAT) -Create a Personal Access Token (PAT) for your GitHub account via the page https://github.com/settings/tokens where you find a button "Generate new token". -Give it meaningful name (field titled "Note") and set the expiration date. Then select the scopes this PAT will be used for. Then click "Generate token". On the result page, take note/copy the resulting token string -- it will only be shown once. -On the `bot machine` set the environment variable `GITHUB_TOKEN`, e.g. +Create a Personal Access Token (PAT) for your GitHub account via the page https://github.com/settings/tokens where you find a button "`Generate new token`". + +Give it a meaningful name (field titled "`Note`"), and set the expiration date. Then select the scopes this PAT will be used for. Then click "`Generate token`". + +On the result page, take note/copy the resulting token string -- it will only be shown once.
+ +On the `bot machine` set the environment variable `$GITHUB_TOKEN`: ``` -export GITHUB_TOKEN='THE_TOKEN_STRING' +export GITHUB_TOKEN='THE_TOKEN_STRING' ``` +in which you replace `THE_TOKEN_STRING` with the actual token. + ### Step 5.2: GitHub App Secret Token + The GitHub App Secret Token is used to verify the webhook sender. You should have created one already when registering a new GitHub App in [Step 2](#step2). -On the `bot machine` set the environment variable `GITHUB_APP_SECTRET_TOKEN`, e.g. +On the `bot machine` set the environment variable `$GITHUB_APP_SECRET_TOKEN`: ``` export GITHUB_APP_SECRET_TOKEN='THE_SECRET_TOKEN_STRING' ``` -Note, depending on the characters used in the string you will likely have to use single quotes when setting the value of the environment variable. + +in which you replace `THE_SECRET_TOKEN_STRING` with the actual token. + +Note that depending on the characters used in the string you will likely have to use *single quotes* (`'...'`) when setting the value of the environment variable. ### Step 5.3: Create a private key and store it on the `bot machine` + The private key is needed to let the app authenticate when updating information at the repository such as commenting on PRs, adding labels, etc. You can create the key at the page of the GitHub App you have registered in [Step 2](#step2). -Open the page https://github.com/settings/apps and then click on the icon left to the name of the GitHub App for the EESSI bot or the "Edit" button for the app. +Open the page https://github.com/settings/apps and then click on the icon to the left of the name of the GitHub App for the EESSI bot or the "`Edit`" button for the app.
+ +Near the end of the page you will find a section "`Private keys`" where you can create a private key by clicking on the button "`Generate a private key`". + +The private key should be automatically downloaded to your system. Copy it to the `bot machine` and note the full path to it (`PATH_TO_PRIVATE_KEY`). For example: the private key is on your LOCAL computer. To transfer it to the -`bot machine` run +`bot machine` use the `scp` command for example: ``` scp PATH_TO_PRIVATE_KEY_FILE_LOCAL_COMPUTER REMOTE_USERNAME@TARGET_HOST:TARGET/PATH ``` -the `TARGET/PATH` of the bot machine should be noted for `PATH_TO_PRIVATE_KEY`. +The location to where the private key is copied on the bot machine (`TARGET/PATH`) should be noted for `PATH_TO_PRIVATE_KEY`. ### Step 5.4: Create the configuration file `app.cfg` @@ -208,46 +259,57 @@ cp -i app.cfg.example app.cfg The example file (`app.cfg.example`) includes notes on what you have to adjust to run the bot in your environment. -### Section `[github]` + +#### `[github]` section + The section `[github]` contains information for connecting to GitHub: ``` app_id = 123456 ``` -Replace '123456' with the id of your GitHub App. You find the id of your GitHub App via the page [GitHub Apps](https://github.com/settings/apps). On this page, select the app you have registered in [Step 2](#step2). On the opened page you will find the `app_id` in the section headed "About" listed as 'App ID'. +Replace '`123456`' with the id of your GitHub App. You can find the id of your GitHub App via the page [GitHub Apps](https://github.com/settings/apps). On this page, select the app you have registered in [Step 2](#step2). On the opened page you will find the `app_id` in the section headed "`About`" listed as "`App ID`". ``` app_name = 'MY-bot' ``` -Is a short name representing your bot. It will appear in comments to a pull request. 
For example, it could include the name of the cluster where the bot runs and a label representing the user that runs the bot: `CitC-TR`. *NOTE avoid putting an actual username here as it will be visible on potentially publicly accessible GitHub pages.* +The `app_name` specifies a short name for your bot. It will appear in comments to a pull request. For example, it could include the name of the cluster where the bot runs and a label representing the user that runs the bot, like `hal9000-bot`. + +*Note: avoid putting an actual username here as it will be visible on potentially publicly accessible GitHub pages.* + ``` installation_id = 12345678 ``` -Replace '12345678' with the id of the installation of your GitHub App (installed in [Step 3](#step3)). You find the id of your GitHub App via the page [GitHub Apps](https://github.com/settings/apps). On this page, select the app you have registered in [Step 2](#step2). For determining the `installation_id` select "Install App" in the menu on the left-hand side. Then click on the gearwheel button of the installation (to the right of the "Installed" label). The URL of the resulting page contains the `installation_id` -- the number after the last "/". The `installation_id` is also provided in the payload of every event within the top-level record named "installation". You can see the events and their payload on the webpage of your Smee.io channel (https://smee.io/CHANNEL-ID). Alternatively, you can see the events in the "Advanced" section of your GitHub App: Open the page [GitHub Apps](https://github.com/settings/apps), then select the app you have registered in [Step 2](#step2), and choose "Advanced" in the menu on the left-hand side. +Replace '`12345678`' with the id of the *installation* of your GitHub App (see [Step 3](#step3)). + +You can find the installation id of your GitHub App via the page [GitHub Apps](https://github.com/settings/apps). On this page, select the app you have registered in [Step 2](#step2).
For determining the `installation_id` select "`Install App`" in the menu on the left-hand side. Then click on the gearwheel button of the installation (to the right of the "`Installed`" label). The URL of the resulting page contains the `installation_id` -- the number after the last "/". + +The `installation_id` is also provided in the payload of every event within the top-level record named "`installation`". You can see the events and their payload on the webpage of your Smee.io channel (`https://smee.io/CHANNEL-ID`). Alternatively, you can see the events in the "`Advanced`" section of your GitHub App: open the [GitHub Apps](https://github.com/settings/apps) page, select the app you have registered in [Step 2](#step2), and choose "`Advanced`" in the menu on the left-hand side. ``` private_key = PATH_TO_PRIVATE_KEY ``` Replace `PATH_TO_PRIVATE_KEY` with the path you have noted in [Step 5.3](#step5.3). -### Section `[buildenv]` -The section `[buildenv]` contains information about the build environment. + +#### `[buildenv]` section + +The `[buildenv]` section contains information about the build environment. ``` build_job_script = PATH_TO_EESSI_BOT/scripts/bot-build.slurm ``` -This points to the job script which will be submitted by the event handler. +`build_job_script` points to the job script which will be submitted by the bot event handler. ``` container_cachedir = PATH_TO_SHARED_DIRECTORY ``` -The `container_cachedir` may be used to reuse downloaded container image files -across jobs. Thus, jobs can more quickly launch containers. +`container_cachedir` may be used to reuse downloaded container image files across jobs, so jobs can launch containers more quickly. ``` cvmfs_customizations = { "/etc/cvmfs/default.local": "CVMFS_HTTP_PROXY=\"http://PROXY_DNS_NAME:3128|http://PROXY_IP_ADDRESS:3128\"" } ``` -It may happen that we need to customize the CVMFS configuration for the build -job. 
The value of cvmfs_customizations is a dictionary which maps a file name +It may happen that we need to customize the [CernVM-FS](https://cernvm.cern.ch/fs/) configuration for the build +job. The value of `cvmfs_customizations` is a dictionary which maps a file name to an entry that needs to be appended to that file. In the example line above, the configuration of `CVMFS_HTTP_PROXY` is appended to the file `/etc/cvmfs/default.local`. -The CVMFS configuration could be commented out unless there is a need to customize the CVMFS configuration. +The `cvmfs_customizations` setting can be commented out, unless there is a need to customize the CernVM-FS configuration. + ``` http_proxy = http://PROXY_DNS:3128/ https_proxy = http://PROXY_DNS:3128/ @@ -255,52 +317,55 @@ https_proxy = http://PROXY_DNS:3128/ If compute nodes have no direct internet connection, we need to set `http(s)_proxy` or commands such as `pip3` and `eb` (EasyBuild) cannot download software from package repositories. Typically these settings are set in the prologue of a -Slurm job. However, when entering the Gentoo Prefix, most environment settings -are cleared. Hence, they need to be set again at a late stage (done in the -script `EESSI-pilot-install-software.sh`). +Slurm job. However, when entering the [EESSI compatibility layer](https://www.eessi.io/docs/compatibility_layer), +most environment settings are cleared. Hence, they need to be set again at a later stage. + ``` -jobs_base_dir = $HOME/jobs +jobs_base_dir = PATH_TO_JOBS_BASE_DIR ``` -Replace `$HOME/jobs` with absolute filepath `/home/USER/jobs`. Per job the directory structure under `jobs_base_dir` is `YYYY.MM/pr_PR_NUMBER/event_EVENT_ID/run_RUN_NUMBER/OS+SUBDIR`. The base directory will contain symlinks using the job ids pointing to the job's working directory `YYYY.MM/...`. +Replace `PATH_TO_JOBS_BASE_DIR` with an absolute filepath like `/home/YOUR_USER_NAME/jobs` (or another path of your choice).
Per job the directory structure under `jobs_base_dir` is `YYYY.MM/pr_PR_NUMBER/event_EVENT_ID/run_RUN_NUMBER/OS+SUBDIR`. The base directory will contain symlinks using the job ids pointing to the job's working directory `YYYY.MM/...`. + ``` load_modules = MODULE1/VERSION1,MODULE2/VERSION2,... ``` -This setting provides a means to load modules in the `build_job_script`. +`load_modules` provides a means to load modules in the `build_job_script`. None to several modules can be provided in a comma-separated list. It is -read by the bot and handed over to `build_job_script` via the parameter -`--load-modules`. +read by the bot and handed over to `build_job_script` via the `--load-modules` option. + ``` local_tmp = /tmp/$USER/EESSI ``` -This is the path to a temporary directory on the node building the stack, i.e., +`local_tmp` specifies the path to a temporary directory on the node building the software, i.e., on a compute/worker node. You may have to change this if temporary storage under `/tmp` does not exist or is too small. This setting will be used for the -environment variable `EESSI_TMPDIR`. The value is expanded only inside a running -job. Thus, typical job environment variables may be used to isolate jobs running +environment variable `$EESSI_TMPDIR`. The value is expanded only inside a running +job. Thus, typical job environment variables (like `$USER` or `$SLURM_JOB_ID`) may be used to isolate jobs running simultaneously on the same compute node. ``` slurm_params = "--hold" ``` -This defines additional parameters for submitting batch jobs. `"--hold"` should be kept or the bot might not work as intended (the release step would be circumvented). Additional parameters, for example, to specify an account, a partition or any other parameters supported by `sbatch`, may be added to customize the submission to your environment. + +`slurm_params` defines additional parameters for submitting batch jobs. 
`"--hold"` should be kept or the bot might not work as intended (the release step done by the job manager component of the bot would be circumvented). Additional parameters, for example, to specify an account, a partition, or any other parameters supported by the [`sbatch` command](https://slurm.schedmd.com/sbatch.html), may be added to customize the job submission. ``` submit_command = /usr/bin/sbatch ``` -This is the full path to the Slurm command used for submitting batch jobs. You may want to verify if `sbatch` is provided at that path or determine its actual location (`which sbatch`). +`submit_command` is the full path to the Slurm job submission command used for submitting batch jobs. You may want to verify if `sbatch` is provided at that path or determine its actual location (using `which sbatch`). + +#### `[bot_control]` section -### Section `[bot_control]` -The section `[bot_control]` contains settings for configuring the feature to +The `[bot_control]` section contains settings for configuring the feature to send commands to the bot. ``` command_permission = GH_ACCOUNT_1 GH_ACCOUNT_2 ... ``` -The option `command_permission` defines which GitHub accounts can send commands -to the bot (via new PR comments). If the value is empty NO account can send +The `command_permission` setting defines which GitHub accounts can send commands +to the bot (via new PR comments). If the value is empty *no* GitHub account can send commands. ``` command_response_fmt = FORMAT_MARKDOWN_AND_HTML ``` -This allows to customize the format of the comments about the handling of bot +`command_response_fmt` allows to customize the format of the comments about the handling of bot commands. The format needs to include `{app_name}`, `{comment_response}` and `{comment_result}`. `{app_name}` is replaced with the name of the bot instance. `{comment_response}` is replaced with information about parsing the comment @@ -308,29 +373,32 @@ for commands before any command is run. 
`{comment_result}` is replaced with information about the result of the command that was run (can be empty). -### Section `[deploycfg]` -The section `[deploycfg]` defines settings for uploading built artefacts (tarballs). +#### `[deploycfg]` section + +The `[deploycfg]` section defines settings for uploading built artefacts (tarballs). ``` tarball_upload_script = PATH_TO_EESSI_BOT/scripts/eessi-upload-to-staging ``` -Provides the location for the script used for uploading built software packages to an S3 bucket. +`tarball_upload_script` provides the location for the script used for uploading built software packages to an S3 bucket. ``` endpoint_url = URL_TO_S3_SERVER ``` -Provides an endpoint (URL) to a server hosting an S3 bucket. The server could be hosted by a public Cloud provider or running in a private environment, for example, using Minio. The bot uploads tarballs to the bucket which will be periodically scanned by the ingestion procedure at the Stratum 0 server. +`endpoint_url` provides an endpoint (URL) to a server hosting an S3 bucket. The server could be hosted by a commercial cloud provider like AWS or Azure, or running in a private environment, for example, using Minio. The bot uploads tarballs to the bucket which will be periodically scanned by the ingestion procedure at the Stratum 0 server. ``` bucket_name = eessi-staging ``` -Name of the bucket used for uploading of tarballs. The bucket must be available on the default server (`https://${bucket_name}.s3.amazonaws.com`) or the one provided via `endpoint_url`. +`bucket_name` is the name of the bucket used for uploading of tarballs. The bucket must be available on the default server (`https://${bucket_name}.s3.amazonaws.com`), or the one provided via `endpoint_url`. ``` upload_policy = once ``` + The `upload_policy` defines what policy is used for uploading built artefacts to an S3 bucket. 
+ +|`upload_policy` value|Policy| |:--------|:--------------------------------| -|Value|Policy| |`all`|Upload all artefacts (mulitple uploads of the same artefact possible).| |`latest`|For each build target (prefix in tarball name `eessi-VERSION-{software,init,compat}-OS-ARCH)` only upload the latest built artefact.| |`once`|Only once upload any built artefact for the build target.| @@ -339,9 +407,9 @@ The `upload_policy` defines what policy is used for uploading built artefacts to ``` deploy_permission = GH_ACCOUNT_1 GH_ACCOUNT_2 ... ``` -The option `deploy_permission` defines which GitHub accounts can trigger the -deployment procedure. The value can be empty (any GH account can trigger the -deployment) or a space delimited list of GH accounts. +The `deploy_permission` setting defines which GitHub accounts can trigger the +deployment procedure. The value can be empty (*no* GitHub account can trigger the +deployment), or a space-delimited list of GitHub accounts. ``` no_deploy_permission_comment = Label `bot:deploy` has been set by user `{deploy_labeler}`, but this person does not have permission to trigger deployments @@ -350,24 +418,26 @@ This defines a message that is added to the status table in a PR comment corresponding to a job whose tarball should have been uploaded (e.g., after setting the `bot:deploy` label). -### Section `[architecturetargets]` -The section `[architecturetargets]` defines for which targets (OS/SUBDIR), e.g., `linux/amd/zen2` the EESSI bot should submit jobs and what additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack. +#### `[architecturetargets]` section + +The `[architecturetargets]` section defines for which targets (`OS/SUBDIR`, for example `linux/x86_64/amd/zen2`) the EESSI bot should submit jobs, and which additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack.
``` arch_target_map = { "linux/x86_64/generic" : "--constraint shape=c4.2xlarge", "linux/x86_64/amd/zen2" : "--constraint shape=c5a.2xlarge" } ``` -The map has one to many entries of the format `OS/SUBDIR : +The map has one or more entries of the format `OS/SUBDIR : ADDITIONAL_SBATCH_PARAMETERS`. For your cluster, you will have to figure out which microarchitectures (`SUBDIR`) are available (as `OS` only `linux` is currently supported) and how to instruct Slurm to allocate nodes with that architecture to a job (`ADDITIONAL_SBATCH_PARAMETERS`). -Note, if you do not have to specify additional parameters to `sbatch` to request a compute node with a specific microarchitecture, you can just write something like +Note, if you do not have to specify additional parameters to `sbatch` to request a compute node with a specific microarchitecture, you can just write something like: ``` arch_target_map = { "linux/x86_64/generic" : "" } ``` -### Section `[repo_targets]` -This section defines for what repositories and architectures the bot can run job. +#### `[repo_targets]` section + +The `[repo_targets]` section defines for which repositories and architectures the bot can run jobs. Repositories are referenced by IDs (or `repo_id`). Architectures are identified by `OS/SUBDIR` which correspond to settings in the `arch_target_map`. @@ -380,12 +450,12 @@ For each `OS/SUBDIR` combination a list of available repository IDs can be provided.
The repository IDs are defined in a separate file, say `repos.cfg` which is -stored in the directory defined via +stored in the directory defined via `repos_cfg_dir`: ``` repos_cfg_dir = PATH_TO_SHARED_DIRECTORY/cfg_bundles ``` The `repos.cfg` file also uses the `ini` format as follows -``` +```ini [eessi-2023.06] repo_name = pilot.eessi-hpc.org repo_version = 2023.06 @@ -393,132 +463,145 @@ config_bundle = eessi-hpc.org-cfg_files.tgz config_map = { "eessi-hpc.org/cvmfs-config.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/cvmfs-config.eessi-hpc.org.pub", "eessi-hpc.org/ci.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/ci.eessi-hpc.org.pub", "eessi-hpc.org/pilot.eessi-hpc.org.pub":"/etc/cvmfs/keys/eessi-hpc.org/pilot.eessi-hpc.org.pub", "default.local":"/etc/cvmfs/default.local", "eessi-hpc.org.conf":"/etc/cvmfs/domain.d/eessi-hpc.org.conf"} container = docker://ghcr.io/eessi/build-node:debian11 ``` -The repository id is given in brackets. Then the name of the repository and the -version are defined. Next a tarball containing configuration files for CernVM-FS -is provided. The `config_map` maps entries of that tarball to locations inside +The repository id is given in brackets (`[eessi-2023.06]`). Then the name of the repository (`repo_name`) and the +version (`repo_version`) are defined. Next, a tarball containing configuration files for CernVM-FS +is specified (`config_bundle`). The `config_map` setting maps entries of that tarball to locations inside the file system of the container which is used when running the job. Finally, the -container to be used is given. +container to be used is given (`container`). The `repos.cfg` file may contain multiple definitions of repositories. -### Section `[event_handler]` -The section contains information needed by the event handler +#### `[event_handler]` section + +The `[event_handler]` section contains information required by the bot event handler component. 
``` log_path = /path/to/eessi_bot_event_handler.log ``` -Path to the event handler log. +`log_path` specifies the path to the event handler log. -### Section `[job_manager]` -The section `[job_manager]` contains information needed by the job manager. +#### `[job_manager]` section + +The `[job_manager]` section contains information needed by the job manager. ``` log_path = /path/to/eessi_bot_job_manager.log ``` -Path to the log file to log messages for job manager +`log_path` specifies the path to the job manager log. ``` job_ids_dir = /home/USER/jobs/ids ``` -Path to where the job manager stores information about jobs to be tracked. Under this directory it will store information about submitted/running jobs under `submitted` and about finished jobs under `finished`. +`job_ids_dir` specifies where the job manager should store information about jobs being tracked. Under this directory it will store information about submitted/running jobs under a subdirectory named '`submitted`', and about finished jobs under a subdirectory named '`finished`'. ``` poll_command = /usr/bin/squeue ``` -This is the full path to the Slurm command used for checking which jobs exist. You may want to verify if `squeue` is provided at that path or determine its actual location (`which squeue`). +`poll_command` is the full path to the Slurm command that can be used for checking which jobs exist. You may want to verify if `squeue` is provided at that path or determine its actual location (via `which squeue`). ``` poll_interval = 60 ``` -This defines how often the job manager checks the status of the jobs. The unit of the value is seconds. +`poll_interval` defines how often the job manager checks the status of the jobs. The unit of the value is seconds. ``` scontrol_command = /usr/bin/scontrol ``` -This is the full path to the Slurm command used for manipulating existing jobs. You may want to verify if `scontrol` is provided at that path or determine its actual location (`which scontrol`). 
+`scontrol_command` is the full path to the Slurm command used for manipulating existing jobs. You may want to verify if `scontrol` is provided at that path or determine its actual location (via `which scontrol`). + +#### `[submitted_job_comments]` section -### Section `[submitted_job_comments]` -Sets templates for messages about newly submitted jobs. +The `[submitted_job_comments]` section specifies templates for messages about newly submitted jobs. ``` initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}` ``` -Is used to create a comment to a PR when a new job has been created. +`initial_comment` is used to create a comment to a PR when a new job has been created. ``` awaits_release = job id `{job_id}` awaits release by job manager ``` -Is used to provide a status update of a job (shown as a row in the job's status +`awaits_release` is used to provide a status update of a job (shown as a row in the job's status table). -### Section `[new_job_comments]` -Sets templates for messages about jobs whose `hold` flag was released. +#### `[new_job_comments]` section + +The `[new_job_comments]` section sets templates for messages about jobs whose `hold` flag was released. ``` awaits_launch = job awaits launch by Slurm scheduler ``` -Status update that is used when the `hold` flag of a job has been removed. +`awaits_launch` specifies the status update that is used when the `hold` flag of a job has been removed. -### Section `[running_job_comments]` -Sets templates for messages about jobs that are running. +#### `[running_job_comments]` section + +The `[running_job_comments]` section sets templates for messages about jobs that are running. ``` running_job = job `{job_id}` is running ``` -Status update for a job that started running. +`running_job` specifies the status update for a job that started running. 
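The comment templates in these sections contain `{placeholder}` fields such as `{app_name}` or `{job_id}`. As a minimal sketch, assuming the placeholders are filled with Python's standard `str.format` (the bot's actual implementation may differ), a template could be rendered as follows. The helper name `render` and all example values are hypothetical.

```python
# Templates copied from the `[submitted_job_comments]` and
# `[running_job_comments]` examples in the README.
INITIAL_COMMENT = (
    "New job on instance `{app_name}` for architecture `{arch_name}` "
    "for repository `{repo_id}` in job dir `{symlink}`"
)
RUNNING_JOB = "job `{job_id}` is running"


def render(template: str, **values: str) -> str:
    """Fill the {placeholder} fields of a template.

    Assumption: plain str.format semantics; a missing value raises KeyError.
    """
    return template.format(**values)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(render(
        INITIAL_COMMENT,
        app_name="hal9000-bot",
        arch_name="x86_64/generic",
        repo_id="eessi-2023.06",
        symlink="/path/to/job_dir",
    ))
    print(render(RUNNING_JOB, job_id="1234"))
```

With `str.format`, a literal brace in a template would have to be written as `{{` or `}}`, which is worth keeping in mind when customizing the messages.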
+ +#### `[finished_job_comments]` section -### Section `[finished_job_comments]` -Sets templates for messages about finished jobs. +The `[finished_job_comments]` section sets templates for messages about finished jobs. ``` success = :grin: SUCCESS tarball `{tarball_name}` ({tarball_size} GiB) in job dir ``` -Message for a successful job that produced a tarball. +`success` specifies the message for a successful job that produced a tarball. ``` failure = :cry: FAILURE ``` -Message for a failed job. +`failure` specifies the message for a failed job. ``` no_slurm_out = No slurm output `{slurm_out}` in job dir ``` -Message for missing Slurm output file. +`no_slurm_out` specifies the message for missing Slurm output file. ``` slurm_out = Found slurm output `{slurm_out}` in job dir ``` -Message for found Slurm output file. +`slurm_out` specifies the message for found Slurm output file. ``` missing_modules = Slurm output lacks message "No missing modules!". ``` -Template concerning the lack of a message signaling that all modules were built. +`missing_modules` is used to signal the lack of a message that all modules were built. ``` no_tarball_message = Slurm output lacks message about created tarball. ``` -Template concerning the lack of a message about a created tarball. +`no_tarball_message` is used to signal the lack of a message about a created tarball. ``` no_matching_tarball = No tarball matching `{tarball_pattern}` found in job dir. ``` -Template about a missing tarball. +`no_matching_tarball` is used to signal a missing tarball. ``` multiple_tarballs = Found {num_tarballs} tarballs in job dir - only 1 matching `{tarball_pattern}` expected. ``` -Template to report that multiple tarballs have been found. +`multiple_tarballs` is used to report that multiple tarballs have been found. ``` job_result_unknown_fmt =
:shrug: UNKNOWN _(click triangle for details)_
  • Job results file `{filename}` does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
``` -Template to be used in case no result file (produced by `bot/check-build.sh` +`job_result_unknown_fmt` is used in case no result file (produced by `bot/check-build.sh` provided by target repository) was found. # Instructions to run the bot components -The bot consists of three components, the Smee client, the event handler and the job manager. Running the Smee client was explained in [Step 1](#step1). +The bot consists of three components: +* the Smee client; +* the event handler; +* the job manager. + +Running the Smee client was explained in [Step 1](#step1). ## Step 6.1: Running the event handler As the event handler may run for a long time, it is advised to run it in a `screen` or `tmux` session. -The event handler is provided by the Python script `eessi_bot_event_handler.py`. +The event handler is provided by the [`eessi_bot_event_handler.py`](https://github.com/EESSI/eessi-bot-software-layer/blob/main/eessi_bot_event_handler.py) Python script. + Change directory to `eessi-bot-software-layer` (which was created by cloning the -repository in [Step 4](#step4) - either the original one from EESSI or your fork). -Then, simply run the event handler by executing +repository in [Step 4](#step4) - either the original one from EESSI, or your fork). + +Then, simply run the event handler script: ``` ./event_handler.sh ``` @@ -531,12 +614,12 @@ See [Step 1](#step1) for telling the Smee client on which port the event handler The event handler writes log information to the files `pyghee.log` and `eessi_bot_event_handler.log`. -Note, if you run the bot on a frontend of a cluster with multiple frontends make sure that both the Smee client and the event handler run on the same machine. +Note, if you run the bot on a frontend of a cluster with multiple frontends make sure that both the Smee client and the event handler run on the same system! 
## Step 6.2: Running the job manager As the job manager may run for a long time, it is advised to run it in a `screen` or `tmux` session. -The job manager is provided by the Python script `eessi_bot_job_manager_layer.py`. You can run the job manager from the directory `eessi-bot-software-layer` simply by +The job manager is provided by the [`eessi_bot_job_manager_layer.py`](https://github.com/EESSI/eessi-bot-software-layer/blob/main/eessi_bot_job_manager.py) Python script. You can run the job manager from the directory `eessi-bot-software-layer` simply by: ``` ./job_manager.sh @@ -544,7 +627,7 @@ The job manager is provided by the Python script `eessi_bot_job_manager_layer.py It will run in an infinite loop monitoring jobs and acting on their state changes. -If you want to limit the execution of the job manager, you can add two parameters: +If you want to limit the execution of the job manager, you can use these options: |Option|Argument| |------|--------| |`-i` / `--max-manager-iterations`|Any number _z_: _z_ < 0 - run the main loop indefinitely, _z_ == 0 - don't run the main loop, _z_ > 0 - run the main loop _z_ times| @@ -553,16 +636,15 @@ If you want to limit the execution of the job manager, you can add two parameter An example command would be ``` -./job_manager.sh -i 1 -j 2222 +./job_manager.sh -i 1 -j 1234 ``` -to run the main loop exactly once for job `2222`. +to run the main loop exactly once for the job with ID `1234`. The job manager writes log information to the file `eessi_bot_job_manager.log`. -The job manager can run on a different machine than the event handler as long as both have access to the same shared filesystem. +The job manager can run on a different machine than the event handler, as long as both have access to the same shared filesystem. # Example pull request on software-layer For information on how to make pull requests and let the bot build software, see -[build-test-deploy bot](https://www.eessi.io/docs/bot/).
- +[the bot section of the EESSI documentation](https://www.eessi.io/docs/bot/). From 4c38814e3c07f63dd2f1ef445f76d7109a193d3f Mon Sep 17 00:00:00 2001 From: Thomas Roeblitz Date: Thu, 28 Sep 2023 14:46:51 +0200 Subject: [PATCH 3/3] micro polishing --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index e3609ac1..5d8e6a2b 100644 --- a/README.md +++ b/README.md @@ -163,7 +163,7 @@ follow the instructions at the Add the directory that contains `aws` to the `$PATH` environment variable. Make sure that `$PATH` is set correctly for newly spawned shells, e.g., -it should be exported in startup file such as `$HOME/.bash_profile`. +it should be exported in a startup file such as `$HOME/.bash_profile`. Verify that `aws` executes by running `aws --version`. Then, run `aws configure` to set credentials for accessing the S3 bucket. @@ -214,7 +214,7 @@ On the result page, take note/copy the resulting token string -- it will only be On the `bot machine` set the environment variable `$GITHUB_TOKEN`: ``` -export $GITHUB_TOKEN='THE_TOKEN_STRING' +export GITHUB_TOKEN='THE_TOKEN_STRING' ``` in which you replace `THE_TOKEN_STRING` with the actual token. @@ -270,7 +270,7 @@ Replace '`123456`' with the id of your GitHub App. You can find the id of your G ``` app_name = 'MY-bot' ``` -The `app_name` specified a short name for your bot. It will appear in comments to a pull request. For example, it could include the name of the cluster where the bot runs and a label representing the user that runs the bot, like `hal9000-bot`. +The `app_name` specifies a short name for your bot. It will appear in comments to a pull request. For example, it could include the name of the cluster where the bot runs and a label representing the user that runs the bot, like `hal9000-bot`. 
*Note: avoid putting an actual username here as it will be visible on potentially publicly accessible GitHub pages.* @@ -308,7 +308,7 @@ It may happen that we need to customize the [CernVM-FS](https://cernvm.cern.ch/f job. The value of `cvmfs_customizations` is a dictionary which maps a file name to an entry that needs to be appended to that file. In the example line above, the configuration of `CVMFS_HTTP_PROXY` is appended to the file `/etc/cvmfs/default.local`. -The CernVM-FS configuration can commented out, unless there is a need to customize the CernVM-FS configuration. +The CernVM-FS configuration can be commented out, unless there is a need to customize the CernVM-FS configuration. ``` http_proxy = http://PROXY_DNS:3128/ @@ -437,7 +437,7 @@ arch_target_map = { "linux/x86_64/generic" : "" } #### `[repo_targets]` section -The `[repo_targets]` section defines for which repositories and architectures the bot can run job. +The `[repo_targets]` section defines for which repositories and architectures the bot can run a job. Repositories are referenced by IDs (or `repo_id`). Architectures are identified by `OS/SUBDIR` which correspond to settings in the `arch_target_map`.
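To illustrate how an `OS/SUBDIR` identifier relates to the keys of `arch_target_map`, here is a small sketch. It assumes the value of `arch_target_map` is valid JSON and that the part before the first `/` is the OS while the remainder is the SUBDIR. Both are interpretations of the example shown above, not confirmed behaviour of the bot, and the function name `split_arch_keys` is hypothetical.

```python
import json

# Value of `arch_target_map` as shown in the README example.
ARCH_TARGET_MAP = '{ "linux/x86_64/generic" : "" }'


def split_arch_keys(raw_value):
    """Parse the map and split each key into an (OS, SUBDIR) pair.

    Assumption: the key is split at the first "/" only, so the SUBDIR
    part may itself contain slashes (e.g. "x86_64/generic").
    """
    mapping = json.loads(raw_value)
    pairs = []
    for key in mapping:
        os_name, _, subdir = key.partition("/")
        pairs.append((os_name, subdir))
    return pairs


if __name__ == "__main__":
    for os_name, subdir in split_arch_keys(ARCH_TARGET_MAP):
        print(f"OS={os_name} SUBDIR={subdir}")
```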