Evaluate and test the SDR deployment document #133

Open
llivermore opened this issue Dec 5, 2022 · 21 comments

@llivermore
Contributor

Naturalis will set up their own instance of the SDR using the current documentation. We will record (and respond to) any questions or feedback here.

@TomDijkema

Hi Oliver/Laurence,

I am currently trying to run the SDR locally to test the implementation before setting it up on a remote server. So far I have run into a few issues that the documentation does not directly address but that I was able to fix; I can list those if you like. However, I am now stuck at the last step: deploying the Ansible playbook (I skipped the SSL portion for my local installation).

To host the SDR locally I tried several things, starting with the suggested 'hosts' file, where I defined the localhost IP as the host:

127.0.0.1 ansible_connection=local

This keeps giving me the following result:

ansible-playbook deploy-galaxy.yml
[DEPRECATION WARNING]: include is deprecated, use include_tasks/import_tasks instead. See 
https://docs.ansible.com/ansible-core/2.14/user_guide/playbooks_reuse_includes.html for details. This feature will be
 removed in version 2.16. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Could not match supplied host pattern, ignoring: galaxyservers
[WARNING]: Could not match supplied host pattern, ignoring: remoteservers

PLAY [galaxyservers,remoteservers] ***********************************************************************************
skipping: no hosts matched

PLAY RECAP ***********************************************************************************************************

I interpret this as the host not being valid, or as me doing something wrong in how to deploy on localhost compared to a remote server.
I also tried some other methods, like the ones suggested in https://gist.github.com/alces/caa3e7e5f46f9595f715f0f55eef65c1, and tried changing the hosts variable in the deploy-galaxy.yml file to localhost, but with the same result.

Could you give me a hint on what I am doing wrong? Ansible and Galaxy are completely new to me, so I am probably overlooking something.

Best regards, Tom

@OliverWoolland
Collaborator

Hello @TomDijkema! Thanks for trying out the process :) I have a few responses!

  1. I wouldn't personally try to use the Ansible scripts to deploy the SDR locally; the playbooks were not written with that in mind and it would be hard to undo the changes if any problems are found. I'd suggest setting up a virtual machine and deploying to that.
  2. I suspect your specific issue here is a missing [remoteservers] tag in your hosts file (see here).
  3. I have recently updated the deployment documentation in response to some feedback! It might be worth giving it another skim
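
For illustration, a minimal Python sketch of the check implied by point 2: it scans an INI-style inventory for the group headers that the play in the log above targets (galaxyservers and remoteservers). The helper names and the sdr.example.org host are hypothetical placeholders, not taken from the SDR repository or its documentation, and the sketch only looks at group headers; it does not replicate Ansible's full inventory parsing.

```python
import re

# Matches an INI-style group header such as "[galaxyservers]" or "[web:children]".
GROUP_HEADER = re.compile(r"^\s*\[([^\]:]+)(?::[^\]]+)?\]\s*$")


def inventory_groups(inventory_text):
    """Return the set of group names declared in an INI-style inventory."""
    groups = set()
    for line in inventory_text.splitlines():
        match = GROUP_HEADER.match(line)
        if match:
            groups.add(match.group(1).strip())
    return groups


def missing_groups(inventory_text, required):
    """Return the required group names the inventory never declares, sorted."""
    return sorted(set(required) - inventory_groups(inventory_text))


# The play header in the log above targets these two groups.
REQUIRED = ["galaxyservers", "remoteservers"]

# Inventory as first posted: a bare host line with no group header.
bare = "127.0.0.1 ansible_connection=local\n"

# Hypothetical inventory declaring both groups (sdr.example.org is a placeholder).
grouped = "[galaxyservers]\nsdr.example.org\n\n[remoteservers]\nsdr.example.org\n"

print(missing_groups(bare, REQUIRED))     # ['galaxyservers', 'remoteservers']
print(missing_groups(grouped, REQUIRED))  # []
```

A bare host line with no group header leaves both patterns unmatched, which is consistent with the "Could not match supplied host pattern" warnings above.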

@OliverWoolland OliverWoolland self-assigned this Dec 8, 2022
@TomDijkema

TomDijkema commented Dec 9, 2022

Hi @OliverWoolland,

Thanks, we have set up a remote server and this part now indeed works as described in the manual.
I am, however, encountering a new error, which seems to originate in the Python code itself, when executing the second playbook for the enhanced SDR features (the first playbook finished successfully after some attempts :) ).
Could you please have a look at the error message for me and let me know if this is something wrong in the code or if I need to change the configuration files?
I'll leave you with the error message and configuration files.

Error:

TASK [Create bootstrap admin] ***************************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd: |-
    python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email '[email protected]' --admin-user 'ubuntu' --server 'localhost'
  delta: '0:00:00.481564'
  end: '2022-12-09 14:12:34.882872'
  msg: non-zero return code
  rc: 1
  start: '2022-12-09 14:12:34.401308'
  stderr: |-
    Traceback (most recent call last):
      File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 20, in <module>
        gi = GalaxyInstance(url=args.server, key=args.master_api_key)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/__init__.py", line 83, in __init__
        super().__init__(url, key, email, password, verify=verify)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxyclient.py", line 67, in __init__
        raise ValueError(f"Missing scheme in url {url}")
    ValueError: Missing scheme in url localhost
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP **********************************************************************************
ip                : ok=14   changed=11   unreachable=0    failed=1    skipped=0    rescued=0    ignored=0    

Values in the encrypted sdr-secret.yml file:

api key: xiqnaejull
admin email: [email protected]
admin user: <user>
teklia:
bardecode:
ansible_ssh_pass: <server user password>
ansible_become_pass: <become pass>
vault_id_secret: <the vault password present in .vault-password.txt>

@OliverWoolland
Collaborator

Hi @TomDijkema - excellent news that it has gone as planned. Please be a little careful posting IP addresses and passwords here!

Could you please double check you are working with the latest version of this repository? I would have expected this commit to solve your issue.

If you do have the latest version and the problem persists, I will have to look into it!
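
As a side note on the traceback above: bioblend rejects `localhost` because the URL has no scheme. Below is a minimal Python sketch of a guard a script could apply to the `--server` value before building the client. `ensure_scheme` is a hypothetical helper, not part of bioblend or the SDR scripts, and this is not a claim about what the referenced commit changed.

```python
def ensure_scheme(url, default_scheme="http"):
    """Prefix url with default_scheme unless it already carries a scheme."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("localhost"))         # http://localhost
print(ensure_scheme("http://localhost"))  # http://localhost
```

The result could then be passed as the `url` argument of `GalaxyInstance`, as in the script line shown in the traceback.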

@TomDijkema

Apologies, forgot to mute them, edited previous comment.
I will check the version I pulled.

@TomDijkema

Ok, I pulled the latest version and retried. It now hits a 502 Bad Gateway, originating from the domain itself. If you navigate to https://sdr.dissco.tech (our domain for the SDR) you can see the error code. This is the nginx result after running the first playbook, so maybe something went wrong in there. I checked our server config: ports 80 and 443 should be accessible from anywhere, the SSL certificates are on the server (named foo with the correct extension), and the connection is verified.

I am not sure whether this is a problem with our server config or with the SDR; the nginx master appears to be listening on ports 80 and 443 just fine. If so, I will check this with Sam next week.

Error in playbook:

TASK [Create bootstrap admin] ********************************************************************************************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd: |-
    python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email '[email protected]' --admin-user '<user>' --server 'http://localhost'
  delta: '<delta>'
  end: '2022-12-09 15:18:47.793975'
  msg: non-zero return code
  rc: 1
  start: '2022-12-09 15:18:47.477817'
  stderr: |-
    Traceback (most recent call last):
      File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 27, in <module>
        for existing_user in gi.users.get_users():
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/users/__init__.py", line 74, in get_users
        return self._get(deleted=deleted, params=params)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/client.py", line 134, in _get
        raise ConnectionError(
    bioblend.ConnectionError: GET: error 502: b'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.18.0 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n', 0 attempts left: <html>
    <head><title>502 Bad Gateway</title></head>
    <body>
    <center><h1>502 Bad Gateway</h1></center>
    <hr><center>nginx/1.18.0 (Ubuntu)</center>
    </body>
    </html>
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP ***************************************************************************************************************************************************
ip                : ok=13   changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0 

@OliverWoolland
Collaborator

Ok thanks for trying that, did you rerun only deploy-sdr.yml or did you run deploy-galaxy.yml as well?

If only deploy-sdr.yml, can I suggest rerunning deploy-galaxy.yml first and seeing if that helps?

@TomDijkema

I think I did rerun it, but let me try again.

@OliverWoolland
Collaborator

OliverWoolland commented Dec 9, 2022

Sadly the 502 is rather hard to debug. If needed, a procedure I have used is:

  • ssh to the remote machine
  • stop the galaxy service: sudo systemctl stop galaxy
  • become the galaxy user: sudo su galaxy
  • move to the galaxy server folder: cd /srv/galaxy/
  • activate the galaxy virtual env: source venv/bin/activate
  • launch the server manually and watch for any error output: /srv/galaxy/venv/bin/galaxyctl start --foreground

@TomDijkema

TomDijkema commented Dec 9, 2022

Ok, when I manually start the Galaxy server as you described it gives the following logs:

supervisord is not running
supervisord is not running
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/gunicorn.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery-beat.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_0.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_1.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_2.log' for reading: No such file or directory
/usr/bin/tail: no files remaining
2022-12-09 16:09:42,001 WARN No file matches via include "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/*.conf"
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery-beat_celery-beat.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery_celery.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_gunicorn_gunicorn.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_0.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_1.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_2.conf" during parsing
Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.
For help, use /srv/galaxy/venv/bin/supervisord -h

I guess some other service is already listening on the port supervisord wants to use. Which port should it use according to your configuration?
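
For illustration, a minimal Python sketch for checking whether anything is already accepting TCP connections on a given local port. The candidate ports listed are assumptions to be replaced with those from your own configuration, and the sketch only covers TCP ports; it would not detect a conflict on a UNIX socket.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Candidate ports are assumptions; substitute those from your configuration.
    for port in (80, 443, 8080):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```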

@OliverWoolland
Collaborator

Thanks for trying that :) I agree that looks like a problem! The setup uses standard ports, so I would expect 80 or 443.

@TomDijkema

Hmm, strange. The server reports nginx listening on those ports, but that seems intentional, no?

tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      23194/nginx: master 
tcp6       0      0 :::80                   :::*                    LISTEN      23194/nginx: master 
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      23194/nginx: master 

@OliverWoolland
Collaborator

For that to be a problem seems very strange to me.

Maybe bring nginx down too (sudo systemctl stop nginx) and then restart galaxyctl?

@OliverWoolland
Collaborator

You could also dump the nginx config to check that the galaxy configuration is there: sudo nginx -T

@TomDijkema

Galaxy is present in the nginx.conf, in the last portion of the file; it mentions that this section is maintained by Ansible.

@TomDijkema

Anyway, thanks for your help so far! Will continue on Monday.

@TomDijkema

Hi @OliverWoolland ,

After some inspection we think we have narrowed the problem down to a missing Gunicorn instance, which already produces errors while running the first playbook (the Galaxy one). On Friday, when I ran the playbook twice, it somehow ignored this error, which probably did not help the second playbook. Galaxy appears to run on Gunicorn, so it cannot be missing. One of my previous comments, where I started the Galaxy instance by itself, showed a similar issue: it reported that supervisord was not running.

Here is the error the first playbook gives:

RUNNING HANDLER [galaxyproject.galaxy : galaxy gravity restart] *************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd:
  - /srv/galaxy/venv/bin/galaxyctl
  - graceful
  delta: '0:00:00.535848'
  end: '2022-12-12 08:13:37.619923'
  msg: non-zero return code
  rc: 1
  start: '2022-12-12 08:13:37.084075'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    gunicorn: ERROR (not running)
    gunicorn: ERROR (no such file)
  stdout_lines: <omitted>

PLAY RECAP ******************************************************************************************************
ip                : ok=112  changed=6    unreachable=0    failed=1    skipped=63   rescued=0    ignored=0 

@OliverWoolland
Collaborator

Hi @TomDijkema, I hope you had a nice weekend. The playbook should handle the creation and setup of the Gunicorn instance. I wonder if this could be linked to the playbook having run with an old version first.

If you've not tried it already, could you try running both playbooks again on a fresh (Ubuntu 20.04) VM? After the first playbook runs you should be able to find a (blank) instance of Galaxy running at your URL.

@TomDijkema

The weekend was great! Hopefully yours was too.

Ok, I shall try and set up a new VM to install the SDR to ensure we have a clean instance.

@TomDijkema

Just to be sure: the ansible_ssh_pass variable in the secrets file needs to contain the remote machine's SSH user password. What is your exact definition of this? I am not 100% sure.

@TomDijkema

Hi @OliverWoolland,

Here are the notes I took during the set up of the SDR. Tried to summarize them, so let me know if I need to further clarify something.

Overview
The setup of the SDR is in principle not very difficult, but it does require attention to detail and some advanced knowledge of server management, using bash, and setting up SSL certificates. The biggest hurdle we came across was wanting to go too fast, which led to some complications in the deployment. After contact with Oliver Woolland, who was very responsive and helpful, we managed to fix the issues pretty quickly and get it to work.

Notes
The current documentation on the setup of the SDR is short but touches on all the necessary topics. Some improvements could be made. We state our suggestions in the bullet points below:

  • A bit more context about the roles of the host and remote machine, and how Ansible is used to deploy the SDR from a local system to a remote one (reference to the Ansible for the SDR page). For example, at the start we were not sure whether to pull the repository onto the local system or onto the server (due to our lack of Ansible knowledge).

  • Probably best to list the requirements of the host machine as bullet points as well. At first we overlooked these requirements (Ansible (>= 2.12), sshpass, pip installed).

  • We would recommend listing the commands for the secret parameters one at a time and concluding with the full example. Currently they are stated above the explanation, which can lead the user to simply execute all the commands at once. It would be nice to say, for example: now insert the required parameters listed below (list of secret parameters) by opening the file with this command (nano command).

  • Would also move the reference to the secret parameters to the top of the paragraph.

  • The secret parameters reference is good, but we think it could include a bit more context per parameter: what Teklia and Bardecode are (reference to the SDR tools technical page), and a somewhat broader description of ansible_ssh_pass and ansible_become_pass.

  • The vault id refers to the random string generated with the first command (openssl rand -base64 24 > .vault-password.txt). It is probably best to add to the description that the value from the .vault-password.txt file should be copied to the vault id in the secrets file. The vault id is also referred to as both a secret and a password (more or less the same thing), which may be confusing.

  • Please state where to create the hosts file (in the root directory).

  • At the creation of the hosts file it is stated that an IP is acceptable to define the remote server. This will, however, result in a nonfunctional data upload function, because the SSL certificate cannot be tied to the IP and should be tied to a domain.

  • Mention default examples of the ssh_user, such as ubuntu on Ubuntu.

  • Generating SSL certificates requires some technical knowledge. This may be a hiccup for inexperienced users trying to set up the SDR. However, it is questionable whether the SDR will ever be set up by such users, or whether it will always be deployed by a technical team.

  • Please state that the certificate files should in fact be named ‘foo’. As written, it seems as if you have to choose a name yourself, since foo reads like a placeholder. It may also be good to mention that the name can be changed, but that this requires the user to change the reference name in the file that refers to it as well (we did see it was defined somewhere).

  • When deploying the first Ansible playbook, please mention that the user should stop nginx if it is already installed, using: service nginx stop. Otherwise it will somehow conflict with the Galaxy configuration and display the 502 error. Incidentally, the site always displays the 502 error right after deploying, but after about ten seconds a refresh will show the Galaxy page (nginx probably takes some time).

  • The first Ansible playbook can fail the first time it is run, but a second try can do the trick.
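
Regarding the temporary 502 right after deployment mentioned above, here is a minimal Python sketch of a readiness check that polls the site until it answers 200 instead of refreshing by hand. The function names, the attempt count and the delay are assumptions chosen for illustration, and sdr.example.org is a placeholder domain; none of this comes from the SDR playbooks.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 while the backend is still starting
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(url, attempts=12, delay=5.0, fetch=http_status, sleep=time.sleep):
    """Poll url until it answers 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the backend time to come up before retrying
    return False
```

Calling `wait_until_ready("https://sdr.example.org")` after the first playbook finishes would then report whether the Galaxy page became reachable within roughly a minute. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live server.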
