Terraform script hangs at `run_ansible` #38
Comments
I think the first problem might be that the 'powertools' repository isn't defined when Ansible tries to install Slurm on the
I have identified the cause of the problem. In this case, the "update all packages" script was trying to update to a newer version of Slurm that is not yet fully packaged. I have added a fix to the Ansible package so that a new cluster should now build correctly. Often the simplest solution with CitC is to destroy and recreate the cluster from scratch to pick up the newer version. In this case, you can apply the fix by logging in to the management node and, as root, running

Will is right, though, that PowerTools has caused similar issues in the past due to case sensitivity.
Hello again. I created another cluster (on Oracle) earlier today and found that the Ansible conflicts were resolved, but the Ansible script still ran only up to this point, where it hung. This meant that the
I'll update this if I discover anything more.
The Packer run can take a long time to finish, especially on Oracle. I have increased the timeout in the latest version on
Hello,
I think this problem may be related to issue #34.
I have created three CitC clusters on Oracle OCI over the last 24 hours using the CitC docs. After running `terraform apply oracle` and SSHing into the management node, I find that the `finish` script doesn't run, giving only the following error:

```
[citc@mgmt ~]$ finish
Error: The management node has not finished its setup
Please allow it to finish before continuing.
For information about why they have not finished, check the file /root/ansible-pull.log
```
The `ansible-pull.log` is as follows:
Even after leaving the script overnight, no further progress was made.
I found that there was one failure in `ansible-pull.log` prior to this:
If I comment out the `- security_updates` role in `management.yml` in the `citc-ansible` checkout and re-run `run_ansible`, it runs until here:

I set up a cluster using the same instructions last week, and it seemed to work as normal: I could run `finish` within a few minutes of running `terraform apply`.
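When a run hangs like this, the last task header in `ansible-pull.log` is a good hint for where it stopped, and any `fatal:`/`failed:` lines show which tasks broke. Below is a minimal Python sketch of that check. It assumes the log uses Ansible's default output format (a `TASK [...]` header printed as each task starts, and `fatal: [host]: FAILED! => ...` lines on failure); the helper `scan_log` is hypothetical and not part of CitC.

```python
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Assumes Ansible's default stdout format: a "TASK [role : task name] ****"
# header is printed when each task starts.
TASK_RE = re.compile(r"TASK \[(?P<name>.*)\]")
# Failed tasks are reported as "fatal: [host]: FAILED! => ..." or, for loop
# items, "failed: [host] (item=...) => ...".
FAIL_RE = re.compile(r"\b(?:fatal|failed): \[")


def scan_log(lines: Iterable[str]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return the last task header seen and (task, failure line) pairs."""
    last_task: Optional[str] = None
    failures: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = TASK_RE.search(line)
        if match:
            # The most recent header approximates where a hung run stopped.
            last_task = match.group("name")
        elif FAIL_RE.search(line):
            failures.append((last_task or "<before first task>", line))
    return last_task, failures


def main(path: str = "/root/ansible-pull.log") -> None:
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            last_task, failures = scan_log(fh)
    except OSError as exc:
        # e.g. the log does not exist yet, or the user is not root
        print(f"Cannot read {path}: {exc}", file=sys.stderr)
        return
    print(f"Last task started: {last_task}")
    for task, line in failures:
        print(f"Failure in [{task}]: {line}")


if __name__ == "__main__":
    # An optional first argument overrides the default log path.
    main(*sys.argv[1:2])
```

Run it as root on the management node, for example `python3 scan_log.py /root/ansible-pull.log` (the script filename is arbitrary). Because Ansible prints a header only when a task starts, the "last task started" line points at the task that was still running when the log stopped growing.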