Testing bot PR263 by building and uploading CaDiCaL #73

Open
wants to merge 2 commits into
base: nessi.no-2023.06
from

Conversation

trz42
Owner

@trz42 trz42 commented Mar 20, 2024

PR to test new deploy code relying on result files ... see EESSI/eessi-bot-software-layer#263

Software to be installed:

1 out of 3 required modules missing:

* CaDiCaL/1.3.0-GCC-10.3.0 (CaDiCaL-1.3.0-GCC-10.3.0.eb)

Test scenarios:

  1. Build with some successes and some failures, set bot:deploy label and check what gets uploaded and where (which S3 buckets). Document upload_policy, _prefix settings for upload directories, bucket specs, ...

@eessi-bot-devel-trz42

Instance dev-PR254 is configured to build:

  • arch x86_64/amd/zen2 for repo nessi-2022.11-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-cl
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb10
  • arch x86_64/amd/zen2 for repo nessi-2023.06-swl-deb11
  • arch aarch64/generic for repo nessi-2022.11-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-cl
  • arch aarch64/generic for repo nessi-2023.06-swl-deb10
  • arch aarch64/generic for repo nessi-2023.06-swl-deb11
  • arch aarch64/thunderx2 for repo nessi-2022.11-swl-deb10
  • arch aarch64/thunderx2 for repo nessi-2023.06-cl
  • arch aarch64/thunderx2 for repo nessi-2023.06-swl-deb10
  • arch aarch64/thunderx2 for repo nessi-2023.06-swl-deb11

@trz42
Owner Author

trz42 commented Mar 20, 2024

bot: build repo:swl-deb10 arch:zen2
bot: build repo:swl-deb10 arch:generic

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

Updates by the bot instance dev-PR254 (click for details)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334

date job status comment
Mar 20 10:40:56 AM UTC 2024 submitted job id 154334 awaits release by job manager
Mar 20 10:43:16 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:29 AM UTC 2024 running job 154334 is running
Mar 20 10:53:10 AM UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-154334.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2022.11-software-linux-x86_64-amd-zen2-1710931953.tar.gz
size: 34 MiB (35942302 bytes)
entries: 61050
modules under 2022.11/software/linux/x86_64/amd/zen2/modules/all
EasyBuild/4.7.2.lua
EasyBuild/4.9.0.lua
software under 2022.11/software/linux/x86_64/amd/zen2/software
EasyBuild/4.7.2
EasyBuild/4.9.0
other under 2022.11/software/linux/x86_64/amd/zen2
2022.11/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2022.11/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
2022.11/scripts/utils.sh
Mar 20 10:53:10 AM UTC 2024 test result (no tests yet)
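
The check list in the bot comment above (presence or absence of certain messages in the Slurm output file) can be illustrated with a small sketch. This is only an approximation under stated assumptions: the function name `classify_job_output` and the plain substring matching are hypothetical and not taken from the bot or the build scripts; only the five message patterns come from the comment itself.

```python
def classify_job_output(output):
    """Hypothetical sketch: derive a status from the five checks listed in the bot comment."""
    checks = {
        # These messages must NOT appear in the job output.
        "no message matching ERROR:": "ERROR:" not in output,
        "no message matching FAILED:": "FAILED:" not in output,
        "no message matching required modules missing:": "required modules missing:" not in output,
        # These messages MUST appear in the job output.
        "found message matching No missing installations": "No missing installations" in output,
        "found message matching .tar.gz created!": ".tar.gz created!" in output,
    }
    # Assumption: the job counts as SUCCESS only if every single check passes.
    status = "SUCCESS" if all(checks.values()) else "FAILURE"
    return status, checks


good_output = "No missing installations\nexample.tar.gz created!\n"
bad_output = "ERROR: build failed\nexample.tar.gz created!\n"

good_status, _ = classify_job_output(good_output)
bad_status, bad_checks = classify_job_output(bad_output)
print(good_status, bad_status)
```

With `bad_output`, the two failing checks are the same two marked with ❌ for job 154334: an `ERROR:` message was found and `No missing installations` was not.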

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335

date job status comment
Mar 20 10:40:57 AM UTC 2024 submitted job id 154335 awaits release by job manager
Mar 20 10:43:13 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:26 AM UTC 2024 running job 154335 is running
Mar 20 10:55:14 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-154335.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1710932055.tar.gz
size: 1 MiB (1360913 bytes)
entries: 25
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 20 10:55:14 AM UTC 2024 test result (no tests yet)
Mar 20 02:35:14 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1710932055.tar.gz to S3 bucket succeeded

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336

date job status comment
Mar 20 10:41:04 AM UTC 2024 submitted job id 154336 awaits release by job manager
Mar 20 10:43:07 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:44:23 AM UTC 2024 running job 154336 is running
Mar 20 10:46:41 AM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job154336.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 20 10:46:41 AM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job154336.test does not exist in job directory, or parsing it failed.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 20, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337

date job status comment
Mar 20 10:41:05 AM UTC 2024 submitted job id 154337 awaits release by job manager
Mar 20 10:43:10 AM UTC 2024 released job awaits launch by Slurm scheduler
Mar 20 10:43:19 AM UTC 2024 running job 154337 is running
Mar 20 10:52:05 AM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-154337.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1710931902.tar.gz
size: 1 MiB (1273440 bytes)
entries: 25
modules under 2023.06/software/linux/aarch64/generic/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/aarch64/generic
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 20 10:52:05 AM UTC 2024 test result (no tests yet)
Mar 20 02:35:31 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1710931902.tar.gz to S3 bucket succeeded

@trz42
Owner Author

trz42 commented Mar 20, 2024

All 4 build jobs launched via (#73 (comment)) have finished. Job 154336 was cancelled on purpose to produce a job with UNKNOWN status; in particular, its job directory does not contain the results file _bot_job154336.result (checked on eX3 whether the file doesn't exist or couldn't be read).

The bot's configuration when processing the bot:deploy label:

bucket_name = {
    "nessi-2022.11-swl-deb10": "dev-pr254-3",
    "nessi-2023.06-swl-deb10": "dev-pr254-2",
    "nessi-2023.06-swl-deb11": "dev-pr254"}

upload_policy = latest

metadata_prefix = {
    "nessi-2022.11-swl-deb10": "new11-10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb10": "new06-10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb11": "new06-11/'${github_repository}'/'${pull_request_number}'"}

tarball_prefix = {
    "nessi-2022.11-swl-deb10": "tb22.11/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb10": "tb23.06-deb10/'${github_repository}'/'${pull_request_number}'",
    "nessi-2023.06-swl-deb11": "tb23.06-deb11/'${github_repository}'/'${pull_request_number}'"}

@trz42 trz42 added the bot:deploy Instruct bot to deploy built artefacts to Stratum 0 label Mar 20, 2024
@trz42
Owner Author

trz42 commented Mar 20, 2024

None of the successful jobs was correctly identified as SUCCESS. Part of the log (pyghee.log):

[20240320-T12:01:37] deploy_built_artefacts(): job_dirs = /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336,/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334/_bot_job154334.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154334'
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335/_bot_job154335.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154335'
[20240320-T12:01:37] No metadata file found at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result.
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.metadata
[20240320-T12:01:37] No metadata file found at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result.
[20240320-T12:01:37] check_job_status(): no result file '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336/_bot_job154336.result' or reading it failed

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154336'
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.metadata
[20240320-T12:01:37] Found metadata file at /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result
[20240320-T12:01:37] check_job_status(): found status 'FAILURE' from '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337/_bot_job154337.result'

[20240320-T12:01:37] determine_successful_jobs(): FAILED job in '/home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/154337'
[20240320-T12:01:37] determine_artefacts_to_deploy(): num successful jobs 0

@trz42
Owner Author

trz42 commented Mar 20, 2024

Added a bit more logging output.

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 20, 2024
@trz42
Owner Author

trz42 commented Mar 20, 2024

The string values seem ok. Changed the comparison operator too.

diff --git a/tasks/deploy.py b/tasks/deploy.py
index 7855416..7092545 100644
--- a/tasks/deploy.py
+++ b/tasks/deploy.py
@@ -189,7 +189,9 @@ def check_job_status(job_dir):
         log(f"{fn}(): no result file '{job_result_file_path}' or reading it failed\n")
         return False

-    if job_status is job_metadata.JOB_RESULT_SUCCESS:
+    log(f"{fn}(): job status is {job_status} (compare against {job_metadata.JOB_RESULT_SUCCESS})\n")
+
+    if job_status == job_metadata.JOB_RESULT_SUCCESS:
         # case (2): result file && status = SUCCESS --> return True
         log(f"{fn}(): found status 'SUCCESS' from '{job_result_file_path}'\n")
         return True
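
For context on the operator change in the diff above: in Python, `is` tests object identity while `==` tests value equality, so a status string parsed at runtime can be equal to a constant without being the same object. The snippet below is a minimal sketch of that difference; `JOB_RESULT_SUCCESS` here is a local stand-in for `job_metadata.JOB_RESULT_SUCCESS`, and `is_success` and `status_from_file` are hypothetical names used only for illustration.

```python
# Local stand-in for job_metadata.JOB_RESULT_SUCCESS (assumption for this sketch).
JOB_RESULT_SUCCESS = "SUCCESS"


def is_success(job_status):
    """Compare by value, as the patched check_job_status() does."""
    return job_status == JOB_RESULT_SUCCESS


# Build the string at runtime to mimic a value parsed from a result file.
status_from_file = "".join(["SUC", "CESS"])

# Value comparison: the two strings hold the same characters.
print(status_from_file == JOB_RESULT_SUCCESS)

# Identity comparison: whether both names refer to the very same object.
# The outcome depends on the interpreter and on how the string was created,
# which is why it should not be used to compare status values.
print(status_from_file is JOB_RESULT_SUCCESS)

print(is_success(status_from_file))
```

This would be consistent with the earlier log, where jobs whose result files reported a successful status were still counted as failed while the check used `is`.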

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 20, 2024
@trz42
Owner Author

trz42 commented Mar 26, 2024

Rerun builds after updates to bot PR (also removed unused settings in app.cfg)

bot: build repo:swl-deb10 arch:zen2
bot: build repo:swl-deb10 arch:generic

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

Updates by the bot instance dev-PR254 (click for details)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159202

date job status comment
Mar 26 04:53:19 PM UTC 2024 submitted job id 159202 awaits release by job manager
Mar 26 04:53:31 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:45 PM UTC 2024 running job 159202 is running
Mar 26 05:08:20 PM UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-159202.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2022.11-software-linux-x86_64-amd-zen2-1711472853.tar.gz
size: 34 MiB (35982437 bytes)
entries: 61050
modules under 2022.11/software/linux/x86_64/amd/zen2/modules/all
EasyBuild/4.7.2.lua
EasyBuild/4.9.0.lua
software under 2022.11/software/linux/x86_64/amd/zen2/software
EasyBuild/4.7.2
EasyBuild/4.9.0
other under 2022.11/software/linux/x86_64/amd/zen2
2022.11/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
2022.11/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
2022.11/scripts/utils.sh
Mar 26 05:08:20 PM UTC 2024 test result (no tests yet)

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture x86_64-amd-zen2 for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159203

date job status comment
Mar 26 04:53:21 PM UTC 2024 submitted job id 159203 awaits release by job manager
Mar 26 04:53:28 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:42 PM UTC 2024 running job 159203 is running
Mar 26 04:59:05 PM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-159203.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1711472289.tar.gz
size: 1 MiB (1360881 bytes)
entries: 25
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 26 04:59:05 PM UTC 2024 test result (no tests yet)
Mar 26 05:18:58 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1711472289.tar.gz to S3 bucket succeeded

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2022.11-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159204

date job status comment
Mar 26 04:53:27 PM UTC 2024 submitted job id 159204 awaits release by job manager
Mar 26 04:54:34 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:55:51 PM UTC 2024 finished
🤷 UNKNOWN (click triangle for details)
  • Job results file _bot_job159204.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 26 04:55:51 PM UTC 2024 test result
🤷 UNKNOWN (click triangle for details)
  • Job test file _bot_job159204.test does not exist in job directory, or parsing it failed.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Mar 26, 2024

New job on instance dev-PR254 for architecture aarch64-generic for repository nessi-2023.06-swl-deb10 in job dir /home/thomarob/bot-devel/test_sync_feb24/jobs/2024.03/pr_73/159205

date job status comment
Mar 26 04:53:29 PM UTC 2024 submitted job id 159205 awaits release by job manager
Mar 26 04:54:37 PM UTC 2024 released job awaits launch by Slurm scheduler
Mar 26 04:54:39 PM UTC 2024 running job 159205 is running
Mar 26 04:58:01 PM UTC 2024 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-159205.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1711472211.tar.gz
size: 1 MiB (1273914 bytes)
entries: 25
modules under 2023.06/software/linux/aarch64/generic/modules/all
CaDiCaL/1.3.0-GCC-10.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
CaDiCaL/1.3.0-GCC-10.3.0
other under 2023.06/software/linux/aarch64/generic
.lmod/cache/spiderT.lua
.lmod/cache/spiderT.luac_5.1
.lmod/cache/timestamp
Mar 26 04:58:01 PM UTC 2024 test result (no tests yet)
Mar 26 05:21:47 PM UTC 2024 uploaded transfer of eessi-2023.06-software-linux-aarch64-generic-1711472211.tar.gz to S3 bucket succeeded

@trz42
Owner Author

trz42 commented Mar 26, 2024

Same build result as before. Re-setting bot:deploy label to verify if deployment code still works as intended.

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 26, 2024
@trz42
Owner Author

trz42 commented Mar 26, 2024

Some config settings weren't updated, which led to the error:

[20240326-T18:10:40] WARNING: A crash occurred!
Traceback (most recent call last):
  File "/home/thomarob/bot-devel/test_sync_feb24/venv_bot_p310/lib/python3.10/site-packages/pyghee/lib.py", line 170, in process_event
    self.handle_event(event_info, log_file=log_file)
  File "/home/thomarob/bot-devel/test_sync_feb24/venv_bot_p310/lib/python3.10/site-packages/pyghee/lib.py", line 102, in handle_event
    handler(event_info, log_file=log_file)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/eessi_bot_event_handler.py", line 382, in handle_pull_request_event
    handler(event_info, pr)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/eessi_bot_event_handler.py", line 314, in handle_pull_request_labeled_event
    deploy_built_artefacts(pr, event_info)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/tasks/deploy.py", line 589, in deploy_built_artefacts
    upload_artefact(job_dir, payload, timestamp, repo_name, pr.number, pr_comment_id)
  File "/home/thomarob/bot-devel/test_sync_feb24/eessi-bot-software-layer/tasks/deploy.py", line 291, in upload_artefact
    if artefact_prefix.lstrip().startswith('{'):
AttributeError: 'NoneType' object has no attribute 'lstrip'
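
The traceback above shows `.lstrip()` being called on a prefix setting that was `None`, i.e. the setting was missing from the configuration. One possible way to guard against that is sketched below. This is not the fix applied in the bot: `resolve_prefix` is a hypothetical helper, and treating a value that starts with `{` as a JSON mapping from repository name to prefix is an assumption based on the dictionary-style settings shown earlier in this conversation.

```python
import json


def resolve_prefix(raw_value, repo_name):
    """Hypothetical helper: return the prefix for repo_name, or None if not configured."""
    # A missing setting arrives as None; bail out before calling string methods.
    if raw_value is None:
        return None

    text = raw_value.strip()
    if not text:
        return None

    if text.startswith("{"):
        # Assumption: a value starting with '{' is a JSON mapping
        # from repository name to prefix.
        mapping = json.loads(text)
        return mapping.get(repo_name)

    # Otherwise the same prefix applies to every repository.
    return text


per_repo = '{"nessi-2023.06-swl-deb10": "tb23.06-deb10", "nessi-2022.11-swl-deb10": "tb22.11"}'
print(resolve_prefix(None, "nessi-2023.06-swl-deb10"))
print(resolve_prefix(per_repo, "nessi-2023.06-swl-deb10"))
print(resolve_prefix("common-prefix", "nessi-2023.06-swl-deb10"))
```

A caller could then decide explicitly what to do when the helper returns `None`, for example upload without a prefix or log that the setting is missing, instead of crashing inside the event handler.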

@trz42 trz42 added bot:deploy Instruct bot to deploy built artefacts to Stratum 0 and removed bot:deploy Instruct bot to deploy built artefacts to Stratum 0 labels Mar 26, 2024
Labels
bot:deploy Instruct bot to deploy built artefacts to Stratum 0 development
2 participants