Regression: IPMI management of IPv4 static addresses has numerous behavioral issues #60

Open
Howitzer105mm opened this issue Feb 1, 2023 · 3 comments

Problem statement

Intel QA reports that their tests for managing static IPv4 addresses are
failing. Specifically, the long-established mechanism for clearing
a static IPv4 address is no longer operational.

The test consists of the following steps:

  1. The SUT begins with DHCP v4 and v6 enabled
  2. DHCPv4 is disabled
  3. Any existing static IPv4 address is explicitly cleared
  4. DHCPv4 is re-enabled
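
The last three steps map onto IPMI Set LAN Configuration Parameters
requests (NetFn 0x0c, command 0x01). The following is a minimal Python
sketch that only builds the `ipmitool raw` command lines for those
steps. The helper names are hypothetical, and channel 1 is assumed
because most of the commands in this report use it.

```python
# IPMI Transport NetFn and LAN configuration command/parameter numbers.
NETFN_TRANSPORT = 0x0C
CMD_SET_LAN_CONFIG = 0x01
PARAM_IP_ADDRESS = 3   # LAN parameter 3: IP Address
PARAM_IP_SOURCE = 4    # LAN parameter 4: IP Address Source
SRC_STATIC = 1
SRC_DHCP = 2


def set_lan_param(channel, param, data):
    """Build the argv for an `ipmitool raw` Set LAN Configuration request."""
    fields = [CMD_SET_LAN_CONFIG, channel, param, *data]
    return ["ipmitool", "raw", hex(NETFN_TRANSPORT)] + [str(f) for f in fields]


def qa_sequence(channel=1):
    """Disable DHCPv4, clear the static address, then re-enable DHCPv4."""
    return [
        set_lan_param(channel, PARAM_IP_SOURCE, [SRC_STATIC]),
        set_lan_param(channel, PARAM_IP_ADDRESS, [0, 0, 0, 0]),
        set_lan_param(channel, PARAM_IP_SOURCE, [SRC_DHCP]),
    ]


for argv in qa_sequence():
    print(" ".join(argv))
```

The sketch does not execute anything on a BMC; it only shows how the
raw bytes in the transcripts are composed.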

Duplicate the issue

The defect entry describes a short list of IPMI commands that trigger
the faulty behavior. The investigation of that case has since
expanded the set of commands issued and tested.

Remove the network config file

The time-zero state of a BMC network interface is for the
/etc/systemd/network/00-bmc-ethx.network file to be implicitly
held by the systemd-networkd system. For a BMC that has been in
use for some time that state can be replicated by deleting that
configuration file.

# rm /etc/systemd/network/00-bmc-eth0.network
# reboot

After the BMC reboots, the BMC NIC is configured from the
systemd-networkd default state, per recent phosphor-networkd
source code changes.

Get LAN address source

Determine the current state of the IPv4 stack for the BMC NIC.

# ipmitool raw 0xc 2 3 4 0 0
11 02

The BMC NIC is assigned an IP address from a DHCP server.
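
For readers unfamiliar with the raw response, here is a small Python
sketch of how the two bytes can be read. The first byte is the
parameter revision and the low nibble of the second byte is the IP
Address Source. The function name is hypothetical.

```python
# IP Address Source values (LAN configuration parameter 4, bits 3:0).
IP_SOURCES = {
    0: "unspecified",
    1: "static",
    2: "DHCP",
    3: "BIOS or system software",
    4: "other protocol",
}


def decode_ip_source(response):
    """Decode a Get LAN Configuration response for parameter 4."""
    data = bytes.fromhex(response)
    if len(data) != 2:
        raise ValueError("expected revision byte plus one data byte")
    revision, source = data
    return revision, IP_SOURCES.get(source & 0x0F, "reserved")


print(decode_ip_source("11 02"))  # response shown in the transcript above
```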

Inspect the current IPv4 state

Find out what IPv4 address has been assigned to the BMC NIC from
the local DHCPv4 server.

# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 534sec preferred_lft 534sec
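
When comparing many of these listings, it can help to reduce each one
to its `inet` entries. The Python sketch below is one way to do that.
It assumes the `ip -4 a show` layout seen in this report, and the
function name is hypothetical.

```python
import ipaddress


def parse_inet_lines(output):
    """Extract address, prefix, scope and the dynamic flag from `ip -4 a`."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "inet":
            continue  # skip the interface header and the lifetime lines
        iface = ipaddress.ip_interface(tokens[1])
        entries.append({
            "address": str(iface.ip),
            "prefix": iface.network.prefixlen,
            "scope": tokens[tokens.index("scope") + 1],
            "dynamic": "dynamic" in tokens,  # set for DHCP-assigned leases
        })
    return entries


SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 534sec preferred_lft 534sec
"""

for entry in parse_inet_lines(SAMPLE):
    print(entry)
```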

Disable DHCPv4

Use IPMI to turn off the DHCPv4 address assignment.

# ipmitool raw 0xc 1 1 4 1

The default state for phosphor-networkd is to use a de facto
systemd-networkd configuration. The default state is defined in
the systemd-networkd service. Moving from DHCP to static IPv4
addressing writes an explicit configuration to
/etc/systemd/network/00-bmc-ethx.network. The configuration of
the NIC can now be inspected directly.

Confirm the IPMI IPv4 address source

Examine the state of the BMC NIC after turning off the DHCPv4
functionality.

# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC NIC only has a self-assigned (link-local) IPv4 address. As
can be seen in the eth0.network contents:

DHCP=ipv6
IPv6AcceptRA=true

Only IPv6 dynamic assignment is active.
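
The same check can be done mechanically. The Python sketch below reads
a .network file and reports whether DHCPv4 is requested and which
Address= entries exist. It is a simplified reader, not the
systemd-networkd parser: it assumes only the values yes, true and ipv4
enable DHCPv4, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_network_file(text):
    """Collect every key of each [Section]; keys such as Address= may repeat."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current]  # register sections that have no keys
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()].append(value.strip())
    return sections


def dhcpv4_enabled(sections):
    """Assumption: only yes/true/ipv4 are treated as enabling DHCPv4."""
    values = sections["Network"]["DHCP"]
    return bool(values) and values[-1].lower() in {"yes", "true", "ipv4"}


SAMPLE = """\
[Network]
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0
"""

parsed = parse_network_file(SAMPLE)
print("DHCPv4 enabled:", dhcpv4_enabled(parsed))
print("Static addresses:", parsed["Network"]["Address"])
```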

Clear any static IPv4 address

IPMI uses the 0.0.0.0 address assignment to remove any active
IPv4 static address assignment. Performing this action should be
effectively a No Operation in the current state of the NIC.

# ipmitool raw 0xc 1 1 3 0 0 0 0
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.242.17/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

As can be seen, the operation was not a No Operation. The
network configuration file received a new entry
(Address=0.0.0.0/32), and the BMC NIC has acquired a seemingly
random IP address (192.168.242.17/32).
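
To make the expected semantics concrete, here is a hypothetical Python
model of how a Set IP Address request could be applied to the keys of
the [Network] section. It is not phosphor-networkd code. It assumes a
single static IPv4 address per channel and a /32 default prefix, and
it treats 0.0.0.0 as "remove the static address" rather than as an
address to store.

```python
import ipaddress


def set_static_ipv4(network_lines, requested, prefix=32):
    """Return the [Network] keys after an IPMI Set IP Address request."""
    # Any previous static address is replaced by the request.
    kept = [line for line in network_lines if not line.startswith("Address=")]
    if ipaddress.IPv4Address(requested).is_unspecified:
        return kept  # 0.0.0.0 clears the address; nothing is written
    return [f"Address={requested}/{prefix}"] + kept


network = ["DHCP=ipv6", "IPv6AcceptRA=true", "LinkLocalAddressing=yes"]

# With no static address present, clearing leaves the keys unchanged.
print(set_static_ipv4(network, "0.0.0.0"))

# Assigning and then clearing returns to the starting keys.
assigned = set_static_ipv4(network, "192.168.20.123")
print(assigned)
print(set_static_ipv4(assigned, "0.0.0.0"))
```

Under this model the request in this step changes nothing, which is
the No Operation described above.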

Re-enable DHCPv4

Now restore DHCPv4 address assignment.

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0 
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.242.17/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.190.241/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 578sec preferred_lft 578sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=192.168.242.17/32
Address=0.0.0.0/32
DHCP=true
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This is a very strange combination of state.

  1. The DHCP v4 address (192.168.30.130) should be the only
    IPv4 address assigned.
  2. The static address assignment 0.0.0.0 is still present.
  3. The IPMI Get LAN IP Address command returns 0.0.0.0, yet
    the NIC holds several randomly assigned addresses.

This is not desirable behavior.
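
The expectations in the list above can be written as a small check.
The Python sketch below is hypothetical. It assumes that, once DHCPv4
is re-enabled after clearing, the only acceptable IPv4 addresses are
link-local or DHCP-assigned ones, and that no Address= entry should
remain in the network file.

```python
import ipaddress


def check_after_dhcp_reenabled(addresses, configured):
    """addresses: (cidr, dynamic) pairs from `ip -4 a`.
    configured: Address= values found in the .network file."""
    problems = []
    for cidr, dynamic in addresses:
        ip = ipaddress.ip_interface(cidr).ip
        if not dynamic and not ip.is_link_local:
            problems.append(f"unexpected static address {cidr}")
    for entry in configured:
        problems.append(f"leftover Address={entry}")
    return problems


# State observed above after re-enabling DHCPv4.
OBSERVED_ADDRESSES = [
    ("192.168.242.17/32", False),
    ("192.168.190.241/32", False),
    ("192.168.30.130/24", True),
]
OBSERVED_CONFIG = ["192.168.242.17/32", "0.0.0.0/32"]

for problem in check_after_dhcp_reenabled(OBSERVED_ADDRESSES, OBSERVED_CONFIG):
    print(problem)
```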

Issue is confirmed

The issue described by the Intel QA team is confirmed. Further
testing of IPv4 address assignment reveals more undesirable
address handling artifacts.

Perform an extended sequence

Given the odd behavior above, try a more involved example.

Restart from a clean state

Restore the BMC to a "pristine" state.

# rm /etc/systemd/network/00-bmc-eth0.network
# reboot

Get LAN address source

Determine the current state of the IPv4 stack for the BMC NIC.

# ipmitool raw 0xc 2 3 4 0 0
11 02

The BMC NIC is assigned an IP address from a DHCP server.

Set LAN address source to static

# ipmitool raw 0xc 1 3 4 1

Get LAN address source

Confirm the BMC NIC is only accepting statically assigned IPv4
addresses.

# ipmitool raw 0xc 2 3 4 0 0 
11 01

The BMC NIC is only enabled to accept static addresses.

Collect the network configuration file

# cat /etc/systemd/network/00-bmc-eth0.network
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Gateway=<IPv4Address>
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

Assign a static IP address

Assign the static IPv4 address 192.168.20.123 and observe the
result.

# ipmitool raw 0xc 1 1 3 192 168 20 123
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 14 7b
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.20.123/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=192.168.20.123/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This all looks correct. DHCPv4 has been disabled, and a static
IPv4 address assigned.
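
The Get LAN IP Address response can be checked against the `ip`
output by decoding its bytes. This is a minimal Python sketch with a
hypothetical function name. The first byte is the parameter revision
and the remaining four bytes are the address.

```python
import ipaddress


def decode_ip_address(response):
    """Decode a Get LAN Configuration response for parameter 3."""
    data = bytes.fromhex(response)
    if len(data) != 5:
        raise ValueError("expected revision byte plus four address bytes")
    return data[0], ipaddress.IPv4Address(data[1:])


revision, address = decode_ip_address("11 c0 a8 14 7b")
print(hex(revision), address)
```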

Remove the assigned address

Removing the IP address via IPMI is done by assigning the
0.0.0.0 address to the NIC. This is the long-standing IPMI
method for clearing an assigned IPv4 static address.

# ipmitool raw 0xc 1 1 3 0 0 0 0
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.172.115/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The expectation is that there would no longer be a static IPv4
address present. Instead, there is a 0.0.0.0 address in the
network file. Making matters worse, there is a randomly assigned
address (192.168.172.115/32).

Reboot the BMC

Prior to re-enabling DHCPv4, see what happens when the BMC is
reset.

# ipmitool raw 6 2
# ### Wait for BMC to reboot to login prompt
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 87 27
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.241.61/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.135.39/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.24.225/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

After the BMC reboot there are two randomly assigned addresses.
Neither matches the address that was present before the reboot.

NOTE: I have witnessed the list of static addresses grow, with
one address added for each BMC reboot. This artifact did not
appear in this sequence, possibly because it was tested on a
different SUT generation. A newer generation of SUT has shown
the 1:1 relationship between reboots and additional random
IPv4 address assignments.
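
One way to track this across reboots is to diff the address sets
taken before and after each one. The Python sketch below uses the
addresses from the two listings above. The function name is
hypothetical.

```python
def diff_addresses(before, after):
    """Report which addresses appeared, vanished, or survived a reboot."""
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "kept": sorted(before & after),
    }


BEFORE_REBOOT = ["169.254.241.61/16", "192.168.172.115/32"]
AFTER_REBOOT = ["169.254.241.61/16", "192.168.135.39/32", "192.168.24.225/32"]

print(diff_addresses(BEFORE_REBOOT, AFTER_REBOOT))
```

Only the link-local address survives the reboot; the global address is
replaced by two new ones.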

AC cycle the system under test

There's unexpected behavior when the BMC reboots. What occurs
when the whole system is power cycled?

# ipmitool raw 0xc 2 1 3 0 0 
 11 c0 a8 5e 0b
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.102.167/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
    inet 192.168.94.11/32 scope global eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC is still getting a random IPv4 address assigned.

Now re-enable DHCP

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0
 11 c0 a8 00 83
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.0.131/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 568sec preferred_lft 568sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
Address=0.0.0.0/32
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

This is a very strange combination of state.

  1. The DHCP v4 address (192.168.30.130) should not be
    assigned. The network configuration file shows only ipv6
    is active (DHCP=ipv6).
  2. The static address assignment 0.0.0.0 is still present.
  3. The IPMI Get LAN IP Address command shows the randomly
    assigned address.

This is not desirable behavior.

Disable DHCPv4 again

# ipmitool raw 0xc 1 1 4 1
# ipmitool raw 0xc 2 1 3 0 0
 11 00 00 00 00
# ip -4 a show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 169.254.102.167/16 brd 169.254.255.255 scope link eth0
       valid_lft forever preferred_lft forever
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=ipv6
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The state of the network configuration file is better.

  1. The IPMI Get LAN IP Address command returns 0.0.0.0
  2. The DHCP value is correct (DHCP=ipv6)
  3. No randomly assigned address is present

Enable DHCPv4 again

Having eliminated the Address=0.0.0.0 entry in the network
configuration file, what is the state of the SUT when DHCPv4 is
re-enabled?

# ipmitool raw 0xc 1 1 4 2
# ipmitool raw 0xc 2 1 3 0 0 
 11 c0 a8 1e 82
# ip -4 a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 192.168.30.130/24 brd 192.168.30.255 scope global dynamic eth0
       valid_lft 578sec preferred_lft 578sec
# cat /etc/systemd/network/00-bmc-eth0.network 
[DHCP]
SendHostname=true
UseHostname=true
UseDomains=true
UseDNS=true
UseNTP=true
ClientIdentifier=mac
[IPv6AcceptRA]
DHCPv6Client=true
[Network]
DHCP=true
IPv6AcceptRA=true
LinkLocalAddressing=yes
[Link]
[Match]
Name=eth0

The BMC NIC is back to a clean state. The only difference now is
the /etc/systemd/network/00-bmc-eth0.network file is explicit
instead of implicit.

Conclusion

The recent changes to phosphor-networkd are a regression from
prior behavior. Several undesirable artifacts occur when using
IPMI to configure IPv4 static addresses.

The path to restoring a known good state is convoluted. A BMC
user who has assigned a static IPv4 address and then decides
they no longer want it active is not going to find the DHCPv4
enable->disable->enable sequence desirable.

sunharis commented Apr 4, 2023

Howitzer105mm commented

I am investigating these changes. There appears to be a new artifact in phosphor-networkd (p-n) that is not related to these changes. The SUT I am using has an NCSI NIC, and it is not being enumerated. I am trying to identify what is causing that issue.

Howitzer105mm commented

I am continuing to investigate this. The NCSI NIC issue was not related to these changes.

I have seen one piece of undesirable behavior related to the three commits. I am still characterizing it.
