- Launch a m2.xlarge machine for the head node in a suitable zone.
  - Make it a Slurm-only cluster
  - Transient storage
  - No GVL utilities
  - Use the cloudman key
  - Leave everything else at the defaults
-
Check it all out.
- Make sure that /mnt is attached.
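  A minimal sanity check, assuming the transient storage was attached by CloudMan as usual:

  ```sh
  # /mnt should show the transient storage volume, not the root filesystem
  df -h /mnt
  ```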
- Make the directory structure for Pulsar:

  ```sh
  sudo mkdir /mnt/pulsar
  sudo chown -R ubuntu:ubuntu /mnt/pulsar
  mkdir /mnt/pulsar/config
  mkdir -p /mnt/pulsar/files/staging
  ```
- Create Pulsar's virtual env and activate it:

  ```sh
  cd /mnt/pulsar
  virtualenv venv
  source venv/bin/activate
  ```
- Clone the Pulsar git repo:

  ```sh
  cd /mnt/pulsar
  git clone https://github.com/galaxyproject/pulsar server
  ```
- Install the requirements.
  - First, edit the requirements.txt file in the server directory and uncomment the line about drmaa.
  - Then install them:

    ```sh
    cd server
    vim requirements.txt   # uncomment the drmaa line
    pip install -r requirements.txt
    ```
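  - Optionally, sanity-check that the drmaa Python package can find the Slurm DRMAA library. This is only a sketch; it assumes slurm-drmaa is already installed on the CloudMan image at the path used later in local_env.sh:

    ```sh
    # should exit silently; an error here means the DRMAA library path is wrong
    DRMAA_LIBRARY_PATH=/usr/lib/slurm-drmaa/lib/libdrmaa.so python -c "import drmaa"
    ```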
- Copy the config files to the config dir.
  - The config files we need are:
    - app.yml
    - local_env.sh
    - server.ini
    - dependency_resolvers_conf.xml
    - job_metrics_conf.xml
  - Copy the sample file for each one from the server directory and remove the .sample from the end (see the loop sketched below).
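  A sketch of the copy step, assuming each of the files listed above has a matching .sample file in the server directory (copy any that are named differently by hand):

  ```sh
  cd /mnt/pulsar/server
  for f in app.yml local_env.sh server.ini dependency_resolvers_conf.xml job_metrics_conf.xml; do
      cp "$f.sample" "/mnt/pulsar/config/$f"
  done
  ```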
- Edit the configs.
  - app.yml - just change the following lines.
    - Make a really long private token. I normally use lastpass to produce one.

    ```yaml
    staging_directory: /mnt/pulsar/files/staging
    managers:
      _default_:
        type: queued_drmaa
    private_token: <your private token here>
    persistence_directory: files/persisted_data
    tool_dependency_dir: dependencies
    dependency_resolvers_config_file: /mnt/pulsar/config/dependency_resolvers_conf.xml
    ```
  - dependency_resolvers_conf.xml

    ```xml
    <dependency_resolvers>
      <conda auto_install="True" auto_init="True"/>
      <galaxy_packages versionless="true" />
      <conda versionless="true" />
    </dependency_resolvers>
    ```
  - local_env.sh

    ```sh
    ## Place local configuration variables used by the LWR and run.sh in here. For example

    ## If using the drmaa queue manager, you will need to set the DRMAA_LIBRARY_PATH variable,
    ## you may also need to update LD_LIBRARY_PATH for underlying library as well.
    export DRMAA_LIBRARY_PATH=/usr/lib/slurm-drmaa/lib/libdrmaa.so

    ## If you wish to use a variety of Galaxy tools that depend on galaxy.eggs being defined,
    ## set GALAXY_HOME to point to a copy of Galaxy.

    ## Uncomment to verify GALAXY_HOME is set properly before starting the LWR.
    #export TEST_GALAXY_LIBS=1

    ## If using a manager that runs jobs as real users, be sure to load your Python
    ## environment in here as well.
    # . .venv/bin/activate
    ```
  - server.ini

    ```ini
    [server:main]
    use = egg:Paste#http
    port = 8913
    host = localhost
    ssl_pem = /etc/ssl/certs/host.pem

    [app:main]
    paste.app_factory = pulsar.web.wsgi:app_factory
    app_config=/mnt/pulsar/config/app.yml

    [uwsgi]
    master = True
    paste-logger = True
    http = localhost:8913
    processes = 1
    enable-threads = True

    [watcher:web]
    cmd = chaussette --fd $(circus.sockets.web) paste:server.ini
    use_sockets = True
    numprocesses = 1

    [socket:web]
    host = localhost
    port = 8913

    ...
    ```
  - job_metrics_conf.xml

    ```xml
    <?xml version="1.0"?>
    <job_metrics>
      <core />
      <cpuinfo verbose="true" />
      <meminfo />
      <uname />
    </job_metrics>
    ```
- Install some final dependencies:

  ```sh
  sudo apt-get install libffi-dev python-dev libssl-dev
  pip install pyOpenSSL
  ```
- Make a certificate and add it to the certs folder:

  ```sh
  openssl genrsa 1024 > host.key
  chmod 400 host.key
  openssl req -new -x509 -nodes -sha1 -days 365 -key host.key > host.cert
  cat host.cert host.key > host.pem
  chmod 400 host.pem
  sudo cp host.pem /etc/ssl/certs/
  ```
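  To confirm the certificate landed where server.ini expects it (a quick optional check):

  ```sh
  # print the subject and validity dates of the self-signed cert
  sudo openssl x509 -in /etc/ssl/certs/host.pem -noout -subject -dates
  ```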
- Start it up just so it initialises conda etc.
  - Make sure the venv is still activated!

  ```sh
  cd /mnt/pulsar/server
  ./run.sh -c /mnt/pulsar/config -m paster --daemon
  ```
- Look at the /mnt/pulsar/server/paster.log logfile to make sure it's running correctly.
  - Wait for the conda installation to finish!
  - Then shut Pulsar down:

    ```sh
    ./run.sh -c /mnt/pulsar/config -m paster --stop-daemon
    ```
- Give your server an entry in DNS, e.g. pulsar-xxx.genome.edu.au.
- Install letsencrypt:

  ```sh
  sudo add-apt-repository ppa:certbot/certbot
  sudo apt-get update
  sudo apt-get install python-certbot-nginx
  ```
- Change the nginx defaults:

  ```sh
  sudo vim /etc/nginx/sites-enabled/default.server
  ```

  - Add the line `server_name pulsar-xxx.genome.edu.au;` just after the listen directives in the https server section (see the sketch below).
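  A rough sketch of where the line goes; the actual listen and ssl directives in the CloudMan-supplied default.server will differ, so only add the server_name line:

  ```nginx
  server {
      listen 443 ssl default_server;          # existing directives, leave as-is
      server_name pulsar-xxx.genome.edu.au;   # <-- the line to add
      # ... rest of the https server block unchanged ...
  }
  ```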
- Restart nginx:

  ```sh
  sudo systemctl restart nginx
  ```
- Obtain a certificate!

  ```sh
  sudo certbot --nginx -d pulsar-xxx.genome.edu.au
  ```
- Restart nginx again:

  ```sh
  sudo systemctl restart nginx
  ```

  Note: you may need to check that the server is accessible from the internet, as sometimes it doesn't get the correct security groups. If it isn't reachable, add the cloudlaunch and cloudman security groups to the server in the Nectar Dashboard.
- Add a file pulsar.locations to /etc/nginx/sites-enabled with the following contents:

  ```nginx
  # This file is maintained by CloudMan.
  # Changes will be overwritten!
  location /pulsar {
      rewrite ^/pulsar/(.*) /$1 break;
      proxy_pass https://127.0.0.1:8913/;
      proxy_set_header X-Forwarded-Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
  ```
- Restart nginx yet again!

  ```sh
  sudo systemctl restart nginx
  ```
- Check to see if nginx restarted correctly with all the new stuff!

  ```sh
  sudo systemctl status nginx
  ```
- Start Pulsar back up:

  ```sh
  cd /mnt/pulsar/server
  source ../venv/bin/activate
  ./run.sh -c /mnt/pulsar/config -m paster --daemon
  ```
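  As a rough sanity check (the exact response body doesn't matter; the point is that you don't get a connection error), you can hit Pulsar directly and through the nginx proxy:

  ```sh
  curl -k https://localhost:8913/                     # Pulsar itself
  curl -k https://pulsar-xxx.genome.edu.au/pulsar/    # via the nginx proxy
  ```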
- Modify a Galaxy server's job_conf.xml so that it sends jobs to the new server (a sketch follows this list).
  - Make sure that it starts loading the deps via conda.
  - See if it transfers the files.
  - Monitor the Pulsar log, htop, etc.
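  A minimal sketch of the Galaxy-side job_conf.xml, assuming a REST Pulsar runner and that every job should go to the new destination (in practice you would usually map only selected tools); the destination id and token value are placeholders:

  ```xml
  <?xml version="1.0"?>
  <job_conf>
      <plugins>
          <plugin id="pulsar_rest" type="runner"
                  load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
      </plugins>
      <destinations default="pulsar_xxx">
          <destination id="pulsar_xxx" runner="pulsar_rest">
              <!-- the nginx-proxied Pulsar endpoint set up above -->
              <param id="url">https://pulsar-xxx.genome.edu.au/pulsar</param>
              <!-- must match private_token in /mnt/pulsar/config/app.yml -->
              <param id="private_token">your-private-token-here</param>
              <!-- resolve tool dependencies on the Pulsar side (conda) -->
              <param id="dependency_resolution">remote</param>
          </destination>
      </destinations>
  </job_conf>
  ```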
With any luck, it should all be done!
There are a few issues with worker nodes in this setup at the moment: they don't see the NFS share /mnt/pulsar automatically. In fact, you need to do some extra work to get the head node to share it and the workers to mount it.
- On the head node, add the following line to /etc/exports:

  ```
  /mnt/pulsar *(rw,sync,no_root_squash,no_subtree_check)
  ```
- Then reload the export table with:

  ```sh
  sudo exportfs -a
  ```
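  To confirm the new export is active (optional check):

  ```sh
  # /mnt/pulsar should now appear in the export list
  sudo exportfs -v
  ```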
- On each worker node you now need to mount the NFS share to the correct location:

  ```sh
  ssh w1   # or whatever worker node you are changing
  sudo mkdir /mnt/pulsar
  sudo mount pulsar-xxx.genome.edu.au:/mnt/pulsar /mnt/pulsar
  ```
- Add it to the fstab so it will be mounted automatically in future without having to mount it by hand. Add the following line to the end of /etc/fstab:

  ```
  pulsar-xxx.genome.edu.au:/mnt/pulsar /mnt/pulsar nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
  ```
If you need to change settings on a worker or play with it in any way that could make it unstable for running jobs, it's better to empty it of jobs first. You can set its state to drain, wait for the jobs running on it to finish, alter it in whatever way you need, and then set its state to resume.

Set to drain:

```sh
sudo scontrol update node=w1 state=drain reason=maintenance
```

Resume:

```sh
sudo scontrol update node=w1 state=resume
```
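To watch the drain take effect before doing any maintenance (optional; `w1` is just the example node name used above):

```sh
sinfo -N -l      # node state should show drain/drng once set
squeue -w w1     # wait until no jobs are listed for the node
```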