You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There is a race condition triggered in template loop
{% for host in ansible_play_hosts %}
{% if host != inventory_hostname %}
[Peer]
# {{ host }}
PublicKey = {{hostvars[host].wireguard__fact_public_key}}
...
What happens here is that in the loop, wireguard__fact_public_key - a fact that is dynamically generated on other hosts, is needed in the loop body, however, there is no wait statement to make sure that at the point of execution, this fact really is defined. One will normally notice this when inventory is geographically spread, like in my case where I have some hosts in local network and some in the cloud. Hosts in local network are typically faster and will reach loop in template before cloud hosts had a chance to set the fact.
Simple workaround is to put something like this in main.yml just before template task:
- name: Sleep for 10 seconds and continue with play
ansible.builtin.wait_for:
timeout: 10
Of course, 10 seconds wait is arbitrary and this is no real solution. Solution may involve more flexible waiting in template loop by periodically checking for fact existence before using them or, even better, removing race condition altogether so that all wireguard keys and configs are delegated to ansible controller and distributed to the clients afterwards. The second solution may raise some security concerns regarding private keys, however, one should bear in mind that if ansible controller somehow gets compromised, one probably will have much bigger problems than wireguard keys anyway.
Notice also that serializing the play won't cut it either, because if one does that, play will always fail: none of the dynamically generated facts from other hosts will be available in the template loop for obvious reason. Similarly, when executing the play with big number of hosts, say 1000, and having serial set to say 20, loop will fail too, even if all hosts are in local network, i.e. equally fast.
Explicit looping over ansible inventory is generally discouraged anyway, but I can see that in this particular case this lends itself nicely to create wireguard config in pretty elegant way. Unfortunately, elegance comes with price.
The text was updated successfully, but these errors were encountered:
Thanks for your comment. I actually never had this problem also having hosts in the Cloud and locally. But yeah, with a slow line or lots of hosts it might happen. Make the timeout for ansible.builtin.wait_for a variable would at least give people the possibility to configure it. But as you said it's not really that reliable. So checking the facts would be way nicer. If I'll find some time I'll have a look. But currently I'm pretty busy with other stuff. Also it's not that easy to reproduce. You can basically only assume that it works after the change if the role itself produces the same output 😉
Well, yes, triggering race conditions is really use-case dependent. Here it will happen as soon as you have hosts on two separate locations because of network delay which is usually at least one order of magnitude higher than delays in local network or local host used as supervisor.
I am working currently on a project that needs full mesh VPN connectivity between several locations, cloud being one of them. I have hit the issue immediately upon first role execution. So, actually, it is not that difficult to reproduce. For test purposes, you can just set serial: 1 in the playbook and run it for your use case. It will fail on the first host, because it does not have facts from other hosts. Of course, wait workaround won't help in this case to fix it. But delegating key generation to Ansible controller would. If I find some time, I might provide full patch to you. It would certainly be less work than writing my own role from scratch ;-)
There is a race condition triggered in template loop
What happens here is that in the loop,
wireguard__fact_public_key
- a fact that is dynamically generated on other hosts, is needed in the loop body, however, there is no wait statement to make sure that at the point of execution, this fact really is defined. One will normally notice this when inventory is geographically spread, like in my case where I have some hosts in local network and some in the cloud. Hosts in local network are typically faster and will reach loop in template before cloud hosts had a chance to set the fact.Simple workaround is to put something like this in main.yml just before template task:
Of course, 10 seconds wait is arbitrary and this is no real solution. Solution may involve more flexible waiting in template loop by periodically checking for fact existence before using them or, even better, removing race condition altogether so that all wireguard keys and configs are delegated to ansible controller and distributed to the clients afterwards. The second solution may raise some security concerns regarding private keys, however, one should bear in mind that if ansible controller somehow gets compromised, one probably will have much bigger problems than wireguard keys anyway.
Notice also that serializing the play won't cut it either, because if one does that, play will always fail: none of the dynamically generated facts from other hosts will be available in the template loop for obvious reason. Similarly, when executing the play with big number of hosts, say 1000, and having serial set to say 20, loop will fail too, even if all hosts are in local network, i.e. equally fast.
Explicit looping over ansible inventory is generally discouraged anyway, but I can see that in this particular case this lends itself nicely to create wireguard config in pretty elegant way. Unfortunately, elegance comes with price.
The text was updated successfully, but these errors were encountered: