Unify HTCondor secondary with HTCondor and cleanup HTCondor secondary configs #1141

Merged
86 changes: 0 additions & 86 deletions group_vars/htcondor-secondary-submit-host.yml

This file was deleted.

12 changes: 0 additions & 12 deletions group_vars/htcondor-secondary-submit.yml

This file was deleted.

8 changes: 0 additions & 8 deletions group_vars/htcondor-secondary/vars.yml

This file was deleted.

10 changes: 0 additions & 10 deletions group_vars/htcondor-secondary/vault.yml

This file was deleted.

13 changes: 13 additions & 0 deletions group_vars/htcondor-submit.yml
@@ -2,6 +2,19 @@
---
htcondor_role_submit: true

# Role: hxr.postgres-connection
postgres_user: galaxy
postgres_host: sn05.galaxyproject.eu
postgres_port: 5432

# MISC
galaxy_root: /opt/galaxy
galaxy_venv_dir: "{{ galaxy_root }}/venv"
galaxy_server_dir: "{{ galaxy_root }}/server"
galaxy_config_dir: "{{ galaxy_root }}/config"
galaxy_config_file: "{{ galaxy_config_dir }}/galaxy.yml"
galaxy_mutable_config_dir: "{{ galaxy_root }}/mutable-config"
galaxy_log_dir: "/var/log/galaxy"
galaxy_config:
  galaxy:
    job_working_directory: /data/jwd04/main
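The path variables added to `group_vars/htcondor-submit.yml` are all derived from `galaxy_root` through Jinja2 references. As a rough illustration of how they expand, here is a minimal Python sketch. It is a simplified model that only handles bare `{{ name }}` lookups (no filters or expressions) and is not Ansible's templating engine; the `resolve` helper is hypothetical.

```python
import re

# Hypothetical, simplified stand-in for Ansible's Jinja2 templating:
# only bare "{{ name }}" references are expanded (no filters, tests or expressions).
_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(variables: dict, name: str, _seen: frozenset = frozenset()) -> str:
    """Expand every {{ var }} reference in variables[name], recursively."""
    if name in _seen:
        raise ValueError(f"circular reference involving {name!r}")
    value = variables[name]
    return _REF.sub(lambda m: resolve(variables, m.group(1), _seen | {name}), value)


# Path variables copied from group_vars/htcondor-submit.yml.
submit_vars = {
    "galaxy_root": "/opt/galaxy",
    "galaxy_venv_dir": "{{ galaxy_root }}/venv",
    "galaxy_server_dir": "{{ galaxy_root }}/server",
    "galaxy_config_dir": "{{ galaxy_root }}/config",
    "galaxy_config_file": "{{ galaxy_config_dir }}/galaxy.yml",
    "galaxy_mutable_config_dir": "{{ galaxy_root }}/mutable-config",
}

if __name__ == "__main__":
    for key in submit_vars:
        print(f"{key}: {resolve(submit_vars, key)}")
```

Under this model `galaxy_config_file` expands in two steps (via `galaxy_config_dir`) to `/opt/galaxy/config/galaxy.yml`.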
30 changes: 4 additions & 26 deletions group_vars/htcondor/vars.yml
@@ -1,11 +1,12 @@
# Configure nodes in the HTCondor cluster.
---
htcondor_server: "condor-cm.galaxyproject.eu"
htcondor_server: "build.galaxyproject.eu"
htcondor_domain: bi.uni-freiburg.de
htcondor_port: 9618
htcondor_server_port: 9628
htcondor_shared_port: 9628
htcondor_version: 23.0
htcondor_channel: 23.0
htcondor_firewall_condor: "{{ true if htcondor_port == 9618 else false }}"
htcondor_firewall_condor: false
htcondor_firewall_nfs: false
htcondor_role_execute: false
htcondor_role_manager: false
@@ -30,29 +31,6 @@ htcondor_job_start_delay: 0
htcondor_claim_worklife: 120
htcondor_negotiator_post_job_rank: "isUndefined(RemoteOwner) * (10000 - TotalLoadAvg)"

# Settings specific to the `usegalaxy_eu.htcondor` role (to be replaced with
# `grycap.htcondor`).
condor_host: "{{ htcondor_server }}"
condor_fs_domain: "{{ htcondor_domain }}"
condor_uid_domain: "{{ htcondor_domain }}"
condor_allow_write: "{{ htcondor_allow_write }}"
# condor_daemons -> Defined per-host in host_vars.
condor_allow_negotiator: "{{ htcondor_allow_negotiator }}"
condor_allow_administrator: "{{ htcondor_allow_administrator }}"
condor_system_periodic_hold: "{{ htcondor_system_periodic_hold }}"
condor_system_periodic_remove: "{{ htcondor_system_periodic_remove }}"
condor_network_interface: "{{ htcondor_network_interface }}"
condor_extra: |
  MASTER_UPDATE_INTERVAL = {{ htcondor_master_update_interval }}
  CLASSAD_LIFETIME = {{ htcondor_classad_lifetime }}
  NEGOTIATOR_INTERVAL = {{ htcondor_negotiator_interval }}
  NEGOTIATOR_UPDATE_INTERVAL = {{ htcondor_negotiator_update_interval }}
  SCHEDD_INTERVAL = {{ htcondor_schedd_interval }}
  JOB_START_COUNT = {{ htcondor_job_start_count }}
  JOB_START_DELAY = {{ htcondor_job_start_delay }}
  CLAIM_WORKLIFE = {{ htcondor_claim_worklife }}
  NEGOTIATOR_POST_JOB_RANK = {{ htcondor_negotiator_post_job_rank }}

# Configuration of `usegalaxy_eu.handy.os_setup`.
enable_create_user: true
enable_remap_user: true
18 changes: 9 additions & 9 deletions group_vars/htcondor/vault.yml
@@ -1,10 +1,10 @@
$ANSIBLE_VAULT;1.1;AES256

GitHub Actions / Lint warning on line 1 of group_vars/htcondor/vault.yml: 1:1 [document-start] missing document start "---"
36336166336332656436376537343036353234366164616236393139313932343538313133373639
3064333637333539353566396361666362666539353231360a646430356366343632633637326462
39333232646363656438316533666664613935353336313064323038313564383734373433656330
3161396636623764660a636332303565396630666134626235636363636434623537333933653537
37383165643433633630353961623930653139653132303235306539613332346662323764356563
65303062333738616266383339366165643264633038323533306365623034656563333731393465
66386263353433303832363936323138386637636366663338336263323835663730616639393831
32333161633131323534306565626530616364386261646439336436303834386265396161333133
3130
31353533313831356632376636636564653732313930623263376437313362386632623732306136
3465326632326138646330353164336363653764396237370a393562613834343765313835656362
66633030353534663831323939386335316130343137396139633038366438613731376130663564
6635643366613463390a663637643834366632643730666131323737633966393335343734663731
63346138623034333265633465376633313537313062633633353261623934333037646532303132
63643364633136613265333461623036313964383932336335623236623462316437303964346163
32386236303765353936333563303934323964383039626233613333396431383936326530343931
33636531343831663864373365613036333964343534616664356462383066623238326138373435
3566
2 changes: 0 additions & 2 deletions host_vars/nspawn-htcondor.sn06.galaxyproject.eu.yml

This file was deleted.

12 changes: 6 additions & 6 deletions host_vars/sn06.galaxyproject.eu.yml
@@ -1,9 +1,9 @@
---
htcondor_network_interface: ens802f0.223

# Settings specific to the `usegalaxy_eu.htcondor` role.
condor_daemons:
- COLLECTOR
- NEGOTIATOR
- MASTER
- SCHEDD
# 15/03/2024: On sn06, the HTCondor configuration was manually adjusted to use
# port 9618 because the HTCondor container was using 9628. Changing this now
# requires a restart of the HTCondor service on sn06, so it has to be combined
# with a maintenance window in the future. The rest of the schedulers, including
# the manager, use 9628.
# The value is set in the host_vars of the dedicated host sn06 so that it takes
# precedence over the group_vars value.
htcondor_shared_port: 9618

Review comment (Member): We can stop all job submissions on the Galaxy side via the Admin panel. So no official downtime is needed? Or could we simply stop all job handlers?

Reply (Member Author): We can stop the handlers and restart the htcondor systemd service.
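The sn06 override relies on host_vars outranking group_vars. A minimal Python sketch of that rule, assuming a simplified two-level model (group values merged first, parent group before child, host values last) and using only the port values that appear in this diff. The `effective_vars` helper and the dictionaries are hypothetical and are not Ansible APIs.

```python
# Hypothetical, simplified model of Ansible variable precedence:
# group_vars are merged first (parent group before child group) and host_vars
# last, so a host-level value overrides any group-level value for the same key.
GROUP_VARS = {
    "htcondor": {"htcondor_port": 9618, "htcondor_shared_port": 9628},
    "htcondor-submit": {"htcondor_role_submit": True},
}
HOST_VARS = {
    "sn06.galaxyproject.eu": {"htcondor_shared_port": 9618},
}
# Group membership per host, parent group listed first.
HOST_GROUPS = {
    "sn06.galaxyproject.eu": ["htcondor", "htcondor-submit"],
    "maintenance.galaxyproject.eu": ["htcondor", "htcondor-submit"],
}


def effective_vars(host: str) -> dict:
    """Return the variables a host ends up with under this simplified model."""
    merged: dict = {}
    for group in HOST_GROUPS[host]:
        merged.update(GROUP_VARS.get(group, {}))
    merged.update(HOST_VARS.get(host, {}))  # host_vars are applied last and win
    return merged


if __name__ == "__main__":
    for host in HOST_GROUPS:
        print(host, effective_vars(host)["htcondor_shared_port"])
```

Under this model sn06 resolves `htcondor_shared_port` to 9618, while maintenance.galaxyproject.eu keeps the group default of 9628.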
38 changes: 3 additions & 35 deletions hosts
@@ -79,48 +79,16 @@ ansible_ssh_user=centos
[htcondor:children]
htcondor-manager
htcondor-submit
htcondor-secondary

[htcondor-manager]
sn06.galaxyproject.eu

[htcondor-manager:children]
htcondor-secondary-manager
build.galaxyproject.eu ansible_ssh_user=root

[htcondor-manager:vars]
ansible_group_priority=2
ansible_group_priority=4

[htcondor-submit]
maintenance.galaxyproject.eu
sn06.galaxyproject.eu

[htcondor-submit:children]
htcondor-secondary-submit

[htcondor-submit:vars]
ansible_group_priority=2

[htcondor-secondary:children]
htcondor-secondary-manager
htcondor-secondary-submit

[htcondor-secondary:vars]
ansible_group_priority=3

[htcondor-secondary-manager]
build.galaxyproject.eu ansible_ssh_user=root

[htcondor-secondary-manager:vars]
ansible_group_priority=4

[htcondor-secondary-submit]
nspawn-htcondor.sn06.galaxyproject.eu ansible_host=127.0.0.1 ansible_port=2222 ansible_ssh_user=root ansible_ssh_common_args='-o HostKeyAlias=nspawn-htcondor.sn06.galaxyproject.eu -o ProxyCommand="ssh -W %h:%p -q [email protected]"'
maintenance.galaxyproject.eu

[htcondor-secondary-submit:vars]
ansible_group_priority=4

[htcondor-secondary-submit-host]
sn06.galaxyproject.eu

[htcondor-secondary-submit-host:vars]
ansible_group_priority=2
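The inventory now gives `htcondor-manager` an `ansible_group_priority` of 4, while `htcondor-submit` keeps 2. Ansible merges group variables by parent/child depth first, then by `ansible_group_priority` (higher values merge later and win), then alphabetically by group name. The Python sketch below models only that ordering rule for a host that belongs to both groups. The `Group` class and `merge_order` helper are illustrative, not Ansible's API, and the `htcondor` parent group is assumed to keep the default priority of 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Illustrative stand-in for an inventory group (not Ansible's class)."""
    name: str
    depth: int         # distance from the implicit "all" group
    priority: int = 1  # ansible_group_priority; assumed default of 1


def merge_order(groups: list) -> list:
    """Group names in the order their vars are merged; later ones win conflicts.

    Sort key: depth first, then priority, then name alphabetically.
    """
    ordered = sorted(groups, key=lambda g: (g.depth, g.priority, g.name))
    return [g.name for g in ordered]


htcondor = Group("htcondor", depth=1)

# Both child groups at priority 2: falls back to alphabetical order.
equal = [htcondor, Group("htcondor-manager", 2, 2), Group("htcondor-submit", 2, 2)]
# Priorities from this inventory: manager (4) now merges after submit (2).
weighted = [htcondor, Group("htcondor-manager", 2, 4), Group("htcondor-submit", 2, 2)]

if __name__ == "__main__":
    print(merge_order(equal))
    print(merge_order(weighted))
```

With equal priorities the order is `htcondor`, `htcondor-manager`, `htcondor-submit`, so values from the submit group would win a conflict. With priority 4 on the manager group, `htcondor-manager` is merged last and its values win instead.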