cloud-init cannot access 169.254.169.254 metadata on OpenStack #2139

Open
mark-ac-garcia opened this issue Nov 21, 2024 · 3 comments
Comments

mark-ac-garcia commented Nov 21, 2024

[stable/zed]
neutron version:
$ neutron-dhcp-agent --version
neutron-dhcp-agent 21.2.1.dev40
Chart version: 0.3.29

nova version:
$ nova-api --version
26.2.2
Chart version: 0.3.27

reference: https://kubernetes.slack.com/archives/C056YSPJB7U/p1729689879392439

VXLAN project network:
Name: test-network
ID: 3caceee8-e1e3-4d89-b2a1-f392df57a5e0

Subnet:
Name: test-subnet
ID: d800ee64-c0ed-4b6b-b7fb-43cda3097bc0
allocation_pools:
  - start: 10.0.2.5
    end: 10.0.2.254
cidr: 10.0.2.0/24
dns_nameservers:
  - 8.8.8.8
  - 8.8.4.4
enable_dhcp: true
gateway_ip: 10.0.2.1
ip_version: 4
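
For completeness, the DHCP ports and agents serving this network can be listed with the stock openstack CLI; this is just the standard client, nothing specific to our deployment:

# one DHCP port per hosting agent is expected for test-network
openstack port list --network 3caceee8-e1e3-4d89-b2a1-f392df57a5e0 --device-owner network:dhcp
# and the DHCP agents that host it
openstack network agent list --agent-type dhcp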

The instances that land on one of our compute nodes (let's call it kvmB)
and use the network test-network have a connectivity issue:

[   22.005499] cloud-init[570]: ci-info: |   6   | 169.254.169.254 | 10.0.2.2 | 255.255.255.255 |    ens3   |  UGH  |
[   22.006545] cloud-init[570]: ci-info: +-------+-----------------+----------+-----------------+-----------+-------+
[   22.007620] cloud-init[570]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   22.008592] cloud-init[570]: ci-info: +-------+-------------+---------+-----------+-------+
[   22.009455] cloud-init[570]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   22.010315] cloud-init[570]: ci-info: +-------+-------------+---------+-----------+-------+
[   22.011186] cloud-init[570]: ci-info: |   1   |  fe80::/64  |    ::   |    ens3   |   U   |
[   22.012051] cloud-init[570]: ci-info: |   3   |    local    |    ::   |    ens3   |   U   |
[   22.012906] cloud-init[570]: ci-info: |   4   |  multicast  |    ::   |    ens3   |   U   |
[   22.013769] cloud-init[570]: ci-info: +-------+-------------+---------+-----------+-------+
[   52.024622] cloud-init[570]: 2024-11-20 17:19:14,774 - url_helper.py[WARNING]: Timed out waiting for addresses: http://[fe80::a9fe:a9fe%25ens3]/openstack http://169.254.169.254/openstack, exception(s) 
raised while waiting: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
[   52.040633] cloud-init[570]: 2024-11-20 17:19:14,774 - url_helper.py[ERROR]: Timed out, no response from urls: ['http://[fe80::a9fe:a9fe%25ens3]/openstack', 'http://169.254.169.254/openstack']
[   52.048416] cloud-init[570]: 2024-11-20 17:19:14,774 - util.py[WARNING]: No active metadata service found
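
For what it's worth, the failure can be reproduced by hand from inside an affected guest; a minimal check, assuming console access to the instance:

# which next hop does the instance use for the metadata address?
ip route get 169.254.169.254
# hit the metadata endpoint directly (the cloud-init log above shows a connection reset)
curl -sv --max-time 10 http://169.254.169.254/openstack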

I found out:

  • cloud-init cannot access the 169.254.169.254 metadata on OpenStack for those instances (as you can see above)
  • interface configuration entries are missing on one of the control plane nodes (see below)
root@controller1:~# ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip a s dev tapae5f7381-e1
11795: tapae5f7381-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:77:e9:30 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.4/24 brd 10.0.2.255 scope global tapae5f7381-e1
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/32 brd 169.254.169.254 scope global tapae5f7381-e1
       valid_lft forever preferred_lft forever
    inet6 fe80::a9fe:a9fe/128 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe77:e930/64 scope link
       valid_lft forever preferred_lft forever

root@controller2:~# ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip a s dev tap174eb54a-12
12148: tap174eb54a-12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:e9:02:92 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.2/24 brd 10.0.2.255 scope global tap174eb54a-12
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/32 brd 169.254.169.254 scope global tap174eb54a-12
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee9:292/64 scope link
       valid_lft forever preferred_lft forever

root@controller3:~# ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip a s dev tap0489c152-59
12951: tap0489c152-59: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:12:cc:6b brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.3/24 brd 10.0.2.255 scope global tap0489c152-59
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/32 brd 169.254.169.254 scope global tap0489c152-59
       valid_lft forever preferred_lft forever
    inet6 fe80::a9fe:a9fe/128 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe12:cc6b/64 scope link
       valid_lft forever preferred_lft forever

Did you see it? On controller2 we are missing the inet6 fe80::a9fe:a9fe/128 entry.

Also,

root@controller1:~# ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip -6 r
fe80::a9fe:a9fe dev tapae5f7381-e1 proto kernel metric 256 pref medium
fe80::/64 dev tapae5f7381-e1 proto kernel metric 256 pref medium

root@controller2:~# ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip -6 r
fe80::/64 dev tap174eb54a-12 proto kernel metric 256 pref medium

I added those inet6 entries manually and rebooted the instances:

ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip -6 route add fe80::a9fe:a9fe dev tap174eb54a-12
ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ip -6 addr add fe80::a9fe:a9fe/128 dev tap174eb54a-12

But the issue still persists.
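
Two more checks I can think of at this point, sketched with standard tooling and not yet confirmed in our deployment: whether anything is actually listening on port 80 inside the qdhcp namespace on controller2 (10.0.2.2 is the next hop the instances use), and whether toggling DHCP on the subnet makes the agent rebuild the namespace:

# on controller2: is the metadata proxy listening inside the namespace?
ip netns exec qdhcp-3caceee8-e1e3-4d89-b2a1-f392df57a5e0 ss -ltnp | grep ':80'

# force the DHCP agent to re-sync the network by toggling DHCP on test-subnet
openstack subnet set --no-dhcp d800ee64-c0ed-4b6b-b7fb-43cda3097bc0
openstack subnet set --dhcp d800ee64-c0ed-4b6b-b7fb-43cda3097bc0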

As mentioned in the Slack chat, this is probably only happening on one of the compute nodes when we use test-network.
If we migrate the instance elsewhere, there is no issue, and instances on the compute node kvmB using other networks are OK.
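
Since the problem follows kvmB plus test-network, the overlay between kvmB and the controllers is also worth a look. A rough sketch, assuming ML2/OVS with VXLAN tenant networks (in an openstack-helm deployment these commands would run inside the openvswitch pod on kvmB):

# from an instance on kvmB: can we reach the DHCP/metadata port on controller2 at all?
ping -c 3 10.0.2.2

# on kvmB: do we have VXLAN tunnel ports to all controllers?
ovs-vsctl show | grep -A 3 'type: vxlan'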

@mark-ac-garcia (Author)

Is this something already fixed in a later release, or something completely new?

I know it is a very particular case.

mnaser (Member) commented Nov 24, 2024

I think @guilhermesteinmuller has seen this sort of issue before.

@mark-ac-garcia (Author)

Hey there,
FYI: by deleting the VXLAN project network:
Name: test-network
ID: 3caceee8-e1e3-4d89-b2a1-f392df57a5e0

and recreating it with Terraform, the issue stopped happening. The only change we made was adjusting the CIDR from 10.0.2.0/24 to 10.0.20.0/24.
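
For anyone hitting the same thing, these are the kind of checks worth re-running after the recreation, assuming the new subnet keeps the name test-subnet (<new-network-id> is a placeholder):

# confirm the new CIDR and that DHCP is enabled
openstack subnet show test-subnet -c cidr -c enable_dhcp
# confirm every qdhcp namespace now carries the fe80::a9fe:a9fe/128 metadata address
ip netns exec qdhcp-<new-network-id> ip a s | grep a9fe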
