ncm-network nmstate: error with unconfigured interface with bootproto=dhcp #1671

Closed
jouvin opened this issue Mar 13, 2024 · 39 comments

Comments

@jouvin
Contributor

jouvin commented Mar 13, 2024

We are trying to use the ncm-network nmstate variant on an EL9 system, using the ncm-network version from Jan. 10, 2024. Unfortunately, we are still fighting with what looks like an unsupported device name: we are using em (instead of eth) and that seems to be a problem. Below are the first messages from the ncm-network output. It seems to loop on these errors...

2024/03/13-12:28:18 [VERB] [WARN] ipv6 addr found but not supported
2024/03/13-12:28:18 [VERB] [INFO] nmstate_file_dump: file /etc/nmstate/vlan1901.yml has newer version scheduled.
2024/03/13-12:28:18 [VERB] [ERROR] Using static bootproto for vlan1901 and no (IPv4) ip configured
2024/03/13-12:28:19 [VERB] [INFO] Applying changes using /usr/bin/nmstatectl em1, em2, em3, em4, p2p1, p2p2, vlan1901
2024/03/13-12:30:25 [VERB] [ERROR] Error '/usr/bin/nmstatectl apply /etc/nmstate/em2.yml' output: [2024-03-13T11:28:19Z INFO  nmstate::query_apply::net_state] Created checkpoint /org/freedesktop/NetworkManager/Checkpoint/2
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:19Z INFO  nmstate::nm::query_apply::profile] Modifying connection UUID Some("ed109d98-d967-489f-9f9c-a2aad16f2d6f"), ID Some("em2"), type Some("802-3-ethernet") name Some("em2")
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:19Z INFO  nmstate::nm::query_apply::profile] Activating connection ed109d98-d967-489f-9f9c-a2aad16f2d6f: em2/802-3-ethernet
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:19Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownDevice): No suitable device found for this connection (device em1 not available because profile is not compatible with device (mismatching interface name)).
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:19Z INFO  nmstate::nm::query_apply::profile] Will retry activation 2 seconds
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:21Z INFO  nmstate::nm::query_apply::profile] Activating connection ed109d98-d967-489f-9f9c-a2aad16f2d6f: em2/802-3-ethernet
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:21Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownDevice): No suitable device found for this connection (device em1 not available because profile is not compatible with device (mismatching interface name)).
2024/03/13-12:30:25 [VERB] [2024-03-13T11:28:21Z INFO  nmstate::nm::query_apply::profile] Will retry activation 4 seconds

If it helps, our interface configuration in the profile is:

+-/system/network/interfaces
  +-em1
    $ gateway : '134.158.72.1'
    $ ip : '134.158.72.9'
    $ netmask : '255.255.254.0'
    $ set_hwaddr : true
  +-em2
    $ bootproto : 'dhcp'
    $ onboot : false
    $ set_hwaddr : true
  +-em3
    $ bootproto : 'dhcp'
    $ onboot : false
    $ set_hwaddr : true
  +-em4
    $ bootproto : 'dhcp'
    $ onboot : false
    $ set_hwaddr : true
  +-p2p1
    $ bootproto : 'dhcp'
    $ onboot : false
    $ set_hwaddr : true
  +-p2p2
    $ bootproto : 'dhcp'
    $ onboot : false
    $ set_hwaddr : true
  +-vlan1901
    $ bootproto : 'static'
    $ device : 'em1.1901'
    $ ipv6_autoconf : false
    $ ipv6addr : '2001:660:3024:100:134:158:72:9/64'
    $ onboot : true
    $ physdev : 'em1'

One thing that is probably not related to this error but may hurt is the VLAN naming scheme we use, which seems to be a potential problem according to #1660...

@jouvin jouvin assigned jouvin and aka7 and unassigned jouvin Mar 13, 2024
@aka7
Contributor

aka7 commented Mar 13, 2024

Yes, your VLAN config would not be correct without #1667.
Can you try it with this PR? Right now I suspect the vlan1901.yml file has an empty vlanid; can you check the file?

@jouvin
Contributor Author

jouvin commented Mar 13, 2024

OK, I'll do that. At first glance it was not clear it was the same problem, so I preferred to wait for your confirmation!

@jouvin
Contributor Author

jouvin commented Mar 13, 2024

@aka7 unfortunately the PR you mentioned doesn't help with this error, even though it fixes the VLAN problem. The other errors about the em devices remain.

em2.yml is:

# File generated by NCM::Component::nmstate. Do not edit
---
interfaces:
- ipv4:
    dhcp: true
    enabled: true
  mac-address: 44:a8:42:2e:2b:d5
  name: em2
  profile-name: em2
  state: up
  type: ethernet
routes:
  config:
  - next-hop-interface: em2
    state: absent

I attached the ncm-network log from the last run (with the MR added): ncm-cdispd.log

@aka7
Contributor

aka7 commented Mar 14, 2024

@jouvin the config it generated looks good to me.
What does nmcli conn show? Does the MAC address match the em1 MAC?

I'm wondering if NM has already created a connection under a different profile name. I'll try to test this locally tomorrow.

Also, can you verify that the device state reported by nmcli device looks OK?

@jouvin
Contributor Author

jouvin commented Mar 14, 2024

Here is em1.yaml:

# File generated by NCM::Component::nmstate. Do not edit
---
interfaces:
- ipv4:
    address:
    - ip: 134.158.72.9
      prefix-length: 23
    dhcp: false
    enabled: true
  mac-address: 44:a8:42:2e:2b:d3
  name: em1
  profile-name: em1
  state: up
  type: ethernet
routes:
  config:
  - next-hop-interface: em1
    state: absent
  - destination: 0.0.0.0/0
    next-hop-address: 134.158.72.1
    next-hop-interface: em1
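
As a quick sanity check on the generated file, the prefix length can be recomputed from the netmask given in the profile. This is only a sketch using Python's standard ipaddress module; the helper name netmask_to_prefix is made up for illustration and is not part of ncm-network.

```python
import ipaddress


def netmask_to_prefix(ip: str, netmask: str) -> int:
    """Return the prefix length for an address and a dotted-quad netmask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    return network.prefixlen


if __name__ == "__main__":
    # Values taken from the em1 entry of the profile shown earlier in the thread.
    print(netmask_to_prefix("134.158.72.9", "255.255.254.0"))  # prints 23
```

The result agrees with the prefix-length: 23 written to the em1 file, so the address part of the generated configuration looks consistent with the profile.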

I checked that in my last tests, CCM config and em2.yaml did not change. Other potentially useful information:

[root@psonar1 ~]# nmcli conn
Warning: nmcli (1.44.0) and NetworkManager (1.42.2) versions don't match. Restarting NetworkManager is advised.
NAME           UUID                                  TYPE      DEVICE
em1            a38a6dbf-d6ae-4c53-b5b2-689a8b6fae7a  ethernet  em1
idrac          4adf676f-3600-48c7-85c6-aa002a7d15b9  ethernet  idrac
lo             fa78c628-285b-4415-b398-50557dfae22c  loopback  lo
vlan1901       6bb73461-c291-4861-b896-4de360bca01b  vlan      vlan1901
em2            ed109d98-d967-489f-9f9c-a2aad16f2d6f  ethernet  --
em3            9196895c-2890-41b7-a2fa-4ffe10fdaffc  ethernet  --
em4            21fb25f9-3d0a-473e-8d04-19afdf14119c  ethernet  --
enp0s26u1u6u3  ed1621b7-31be-49c4-9c12-d421f01d5175  ethernet  --
p2p1           7b4ae44c-8bc5-4eb2-9fec-372541df43a9  ethernet  --
p2p2           95aeaceb-40ed-45f0-a065-ea9aea1ea632  ethernet  --
[root@psonar1 ~]# ip l show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d3 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f0
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d5 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f1
4: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d7 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f2
5: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d9 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f3
6: p2p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0e:1e:9e:57:80 brd ff:ff:ff:ff:ff:ff
7: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0e:1e:9e:57:82 brd ff:ff:ff:ff:ff:ff
8: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 74:e6:e2:fc:96:67 brd ff:ff:ff:ff:ff:ff
    altname enp0s26u1u6u3
9: vlan1901@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d3 brd ff:ff:ff:ff:ff:ff

@aka7
Contributor

aka7 commented Mar 15, 2024

@jouvin thanks for sharing the output. I can't see anything obvious.
But looking at the output of nmcli, is em1 up and working?
And I'm wondering if this is happening on all interfaces configured for DHCP?

What do you get when you manually run
nmstatectl apply /etc/nmstate/em2.yml ?

And/or when you try nmcli conn delete em2, then nmstatectl apply?

I would first see whether you can manually edit the file and get the interface up using nmstatectl apply, following https://nmstate.io/examples.html#dynamic-ip-configuration. This way we can see if I missed an option for the DHCP config. I must say we don't have any host configured with DHCP under nmstate, so I haven't tested this part on an active host.

@jouvin
Contributor Author

jouvin commented Mar 17, 2024

@aka7 Below is the output from the suggested command:

[root@psonar1 ~]# nmcli conn delete em2
Warning: nmcli (1.44.0) and NetworkManager (1.42.2) versions don't match. Restarting NetworkManager is advised.
Connection 'em2' (ed109d98-d967-489f-9f9c-a2aad16f2d6f) successfully deleted.
[root@psonar1 ~]# nmstatectl apply /etc/nmstate/em2.yml
[2024-03-17T22:20:43Z INFO  nmstate::query_apply::net_state] Created checkpoint /org/freedesktop/NetworkManager/Checkpoint/139
[2024-03-17T22:20:43Z INFO  nmstate::nm::query_apply::profile] Creating connection UUID Some("465046aa-9b89-4374-85dc-fe6ed381eb9c"), ID Some("em2"), type Some("802-3-ethernet") name Some("em2")
[2024-03-17T22:20:43Z INFO  nmstate::nm::query_apply::profile] Activating connection 465046aa-9b89-4374-85dc-fe6ed381eb9c: em2/802-3-ethernet
[2024-03-17T22:20:43Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownDevice): No suitable device found for this connection (device em1 not available because profile is not compatible with device (mismatching interface name)).
[2024-03-17T22:20:43Z INFO  nmstate::nm::query_apply::profile] Will retry activation 2 seconds
[2024-03-17T22:20:45Z INFO  nmstate::nm::query_apply::profile] Activating connection 465046aa-9b89-4374-85dc-fe6ed381eb9c: em2/802-3-ethernet
[2024-03-17T22:20:45Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownDevice): No suitable device found for this connection (device em1 not available because profile is not compatible with device (mismatching interface name)).
[2024-03-17T22:20:45Z INFO  nmstate::nm::query_apply::profile] Will retry activation 4 seconds

Whether or not I delete the connection, the output of nmstatectl is the same. I'll look at the reference you gave to see if I can find a working configuration, and report back.

@aka7
Contributor

aka7 commented Mar 18, 2024

I tried to apply a DHCP config internally here with nmstate/NM and it worked fine, although this was a single-interface VM.

I'm starting to wonder if you should look at the warning you see:
'Warning: nmcli (1.44.0) and NetworkManager (1.42.2) versions don't match'

The other thing of note is that you are trying to apply the config for em2, but the error is
device em1 not available because profile is not compatible with device (mismatching interface name)

The configs you shared appear to be in order, which is puzzling.

One more thing you can try is to delete all the current connections you see in the nmcli output and then apply again.

@jouvin
Contributor Author

jouvin commented Mar 18, 2024

@aka7 Thanks for pointing out the version mismatch; my eyes are tired and I miss these details! It was because I upgraded the machine to a more recent Yum snapshot but forgot to reboot it. This one was easy! Unfortunately, it has no impact on the other problems.

Except that after the reboot nmcli conn gives a slightly different output, with em2 (and its UUID) missing.

[root@psonar1 ~]# nmcli conn
NAME           UUID                                  TYPE      DEVICE
em1            a38a6dbf-d6ae-4c53-b5b2-689a8b6fae7a  ethernet  em1
lo             bf27c91a-ac40-4987-853d-ee14164d32aa  loopback  lo
idrac          2c14d4d1-bc81-49c9-8a15-986a256f64be  ethernet  idrac
vlan1901       6bb73461-c291-4861-b896-4de360bca01b  vlan      vlan1901
em3            9196895c-2890-41b7-a2fa-4ffe10fdaffc  ethernet  --
em4            21fb25f9-3d0a-473e-8d04-19afdf14119c  ethernet  --
enp0s26u1u6u3  ed1621b7-31be-49c4-9c12-d421f01d5175  ethernet  --
p2p1           7b4ae44c-8bc5-4eb2-9fec-372541df43a9  ethernet  --
p2p2           95aeaceb-40ed-45f0-a065-ea9aea1ea632  ethernet  --

It is weird... as is the fact that it gives an error for all the interfaces (the same as for em2) except for em1 (no message) and the VLAN attached to it, but the complaint is always about em1 because its device is not matching the interface name...

I found https://forums.centos.org/viewtopic.php?t=79957, which describes a somewhat similar issue (even if the context is different) and says that connection names in nmcli must match the kernel interface names displayed by ip link show. And clearly that is not the case:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d3 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f0
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d5 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f1
4: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d7 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f2
5: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d9 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f3
6: p2p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0e:1e:9e:57:80 brd ff:ff:ff:ff:ff:ff
7: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:0e:1e:9e:57:82 brd ff:ff:ff:ff:ff:ff
8: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 74:e6:e2:fc:96:67 brd ff:ff:ff:ff:ff:ff
    altname enp0s26u1u6u3
9: vlan1901@em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 44:a8:42:2e:2b:d3 brd ff:ff:ff:ff:ff:ff

Here em2 is present but not enp0s26u1u6u3, whose UUID is different from the one reported for em2? I could temporarily remove the VLAN to be sure that it does not contribute to the mess.

There is also a long thread, https://forums.rockylinux.org/t/unable-to-manage-activate-network-interface/7440/17, about this error in a context where it was related to the hardware, if I understood properly... Maybe I should try other hardware to see whether the problem is specific to this machine or more general.
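
The name comparison described above can be sketched as a pair of set differences. This is only an illustration: the two lists are typed in by hand from the nmcli conn and ip link show outputs in this comment, and compare_names is a hypothetical helper.

```python
def compare_names(connections, interfaces):
    """Return (connections without a kernel interface, interfaces without a connection)."""
    conn, ifaces = set(connections), set(interfaces)
    return sorted(conn - ifaces), sorted(ifaces - conn)


# NAME column of `nmcli conn` after the reboot.
NMCLI_CONNECTIONS = [
    "em1", "lo", "idrac", "vlan1901", "em3", "em4",
    "enp0s26u1u6u3", "p2p1", "p2p2",
]

# Interface names from `ip link show` (altnames are not counted as names here;
# enp0s26u1u6u3 only appears as an altname of idrac in that output).
KERNEL_INTERFACES = [
    "lo", "em1", "em2", "em3", "em4", "p2p1", "p2p2", "idrac", "vlan1901",
]

if __name__ == "__main__":
    print(compare_names(NMCLI_CONNECTIONS, KERNEL_INTERFACES))
    # prints (['enp0s26u1u6u3'], ['em2'])
```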

@jouvin
Contributor Author

jouvin commented Mar 19, 2024

@aka7 I think I identified the problem. It seems to be caused by all the "unused" interfaces (not connected, without a static configuration) that have the boot protocol (bootproto) defined as dhcp, which results, if I am right in my observations, in nmstate defining dhcp=true, enabled=true. Setting enabled=false seems to fix the issue.

I cannot say whether this should be expected (the existing configuration works perfectly with the standard ncm-network configuration module)... but at least I think I have a workaround to move forward...
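
For illustration, here is roughly what the workaround looks like when expressed on the state that ends up in em2.yml. This is a sketch on a plain Python dict mirroring the file shown earlier, not the code of the component; disable_ipv4 is a hypothetical helper and the exact set of keys nmstate needs may differ.

```python
import copy


def disable_ipv4(state: dict, name: str) -> dict:
    """Return a copy of an nmstate-style state with IPv4 disabled on `name`."""
    new_state = copy.deepcopy(state)
    for iface in new_state.get("interfaces", []):
        if iface.get("name") == name:
            # With IPv4 disabled, the dhcp flag is dropped from the section.
            iface["ipv4"] = {"enabled": False}
    return new_state


# Interface section of the em2.yml file shown earlier in the thread.
EM2_STATE = {
    "interfaces": [
        {
            "name": "em2",
            "profile-name": "em2",
            "type": "ethernet",
            "state": "up",
            "mac-address": "44:a8:42:2e:2b:d5",
            "ipv4": {"dhcp": True, "enabled": True},
        }
    ]
}

if __name__ == "__main__":
    print(disable_ipv4(EM2_STATE, "em2")["interfaces"][0]["ipv4"])
    # prints {'enabled': False}
```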

@jouvin jouvin changed the title ncm-network nmstate: error complaining about em interface name ncm-network nmstate: error with unconfigured interface with bootproto=dhcp Mar 19, 2024
@jouvin
Contributor Author

jouvin commented Mar 28, 2024

Sorry for the late follow-up, I am working on too many things in parallel. I was able to install an EL9 machine from scratch this morning and I had ncm-nmstate working properly after setting nm_manage_dns=true. It seems to be a requirement; maybe we should at least document it, or see whether we can have a different default depending on whether we use ncm-network or ncm-nmstate.

When it is set to false, a /etc/nmstate/resolv.yml is created (maybe it should not be...) where all the parameters are empty, and applying this configuration gives an error.

@aka7
Contributor

aka7 commented Mar 28, 2024 via email

@jouvin
Contributor Author

jouvin commented Mar 28, 2024

@aka7 I think there is a misunderstanding: we don't use DHCP. dhcp was set as the bootproto for unused interfaces; it has been the default in the Quattor template library since the beginning and posed no problem with the standard ncm-network. It seems to be a problem with NetworkManager/nmstate (which probably needs something more in the configuration), so I set bootproto=none on all the unused interfaces, and this allowed ncm-nmstate to run properly on an already (partially) configured system.
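
To make the change concrete, here is a small sketch of the selection, working on a plain dict that mirrors part of /system/network/interfaces from the profile shown at the top of the issue. Treating "bootproto is dhcp and onboot is false" as the definition of an unused interface is an assumption made for this example, and both helper names are hypothetical.

```python
def unused_dhcp_interfaces(interfaces: dict) -> list:
    """Names of interfaces with bootproto=dhcp that are not started at boot."""
    return sorted(
        name
        for name, cfg in interfaces.items()
        if cfg.get("bootproto") == "dhcp" and cfg.get("onboot") is False
    )


def set_bootproto_none(interfaces: dict, names: list) -> dict:
    """Return a copy of the interfaces with bootproto=none on the given names."""
    updated = {name: dict(cfg) for name, cfg in interfaces.items()}
    for name in names:
        updated[name]["bootproto"] = "none"
    return updated


# Subset of the profile from the first comment.
PROFILE = {
    "em1": {"ip": "134.158.72.9", "netmask": "255.255.254.0", "set_hwaddr": True},
    "em2": {"bootproto": "dhcp", "onboot": False, "set_hwaddr": True},
    "em3": {"bootproto": "dhcp", "onboot": False, "set_hwaddr": True},
    "vlan1901": {"bootproto": "static", "onboot": True, "physdev": "em1"},
}

if __name__ == "__main__":
    names = unused_dhcp_interfaces(PROFILE)
    print(names)  # prints ['em2', 'em3']
    print(set_bootproto_none(PROFILE, names)["em2"]["bootproto"])  # prints none
```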

I got the error on a freshly installed server, so I suspect a chicken-and-egg problem between ncm-nmstate and the resolver configuration done by other means (I didn't experience it in the tests done before). The "got" value in the error message is the configuration coming from Kickstart, I guess. As the profile had nm_manage_dns=false, I set it to true and deployed the new profile successfully, hence my ticket!

Since then, I tried to reinstall the system with the new setting, and the first run of SPMA fails with an error saying it is unable to resolve the hostname of the YUM server, so I suspect nm_manage_dns=true is the culprit, with NetworkManager trying to do clever things. When I connect to the system the host name is resolved, so I am not sure what the problem is. I'll try to reinstall with nm_manage_dns=false... It is weird that there are so many touchy details that prevent ncm-nmstate from being a drop-in replacement for ncm-network, based on our experience...

@jouvin
Contributor Author

jouvin commented Mar 28, 2024

BTW we don't use ncm-resolver, maybe we should... We rely on the configuration created by Kickstart and we have had no problem so far.

@aka7
Contributor

aka7 commented Mar 28, 2024

@jouvin understood. Yes, there are a few things (hacks, workarounds, whatever you want to call them) we had to put in the Kickstart post-install to fix the resolver issue on the first reboot, because NetworkManager has already interfered with resolv.conf by the time ncm runs.

What we did is add a drop-in file in the Kickstart post-install telling NetworkManager not to manage DNS, so it is there on the first reboot after Kickstart; then resolv.conf is left untouched, as written by the Kickstart install.

NOTE: this isn't an nmstate issue; it is a result of being forced to use NetworkManager in RHEL 9, which makes all its changes early in the boot process on the first reboot and creates the empty NM-managed resolv.conf.

@aka7
Contributor

aka7 commented Mar 28, 2024

Do you have something like this in your Kickstart post-install? If not, try this:

"/system/aii/hooks/post_install" = {
    # Write a NetworkManager drop-in so that NM does not manage resolv.conf
    append(dict(
        "module", "custom",
        "commands", list(
            dict("command", format('printf "[main]\ndns=none\n" > /etc/NetworkManager/conf.d/90-quattor.conf'))
        )
    ));
};
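
For anyone who wants to see what the hook's command produces without going through AII, the following sketch writes the same two-line drop-in. The file name and content come from the command above; write_dns_dropin is a hypothetical helper, and the demo writes into a temporary directory rather than /etc/NetworkManager/conf.d.

```python
import tempfile
from pathlib import Path

DROPIN_NAME = "90-quattor.conf"
# Same content as produced by the printf in the hook above.
DROPIN_CONTENT = "[main]\ndns=none\n"


def write_dns_dropin(conf_dir) -> Path:
    """Write the drop-in that tells NetworkManager not to manage DNS."""
    target = Path(conf_dir) / DROPIN_NAME
    target.write_text(DROPIN_CONTENT)
    return target


if __name__ == "__main__":
    # On a real host conf_dir would be /etc/NetworkManager/conf.d.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_dns_dropin(tmp)
        print(path.name)
        print(path.read_text(), end="")
```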

@jouvin
Contributor Author

jouvin commented Mar 28, 2024

@aka7 sorry if I gave the impression that I thought it was ncm-nmstate's fault... I understand that it comes from the change in workflow introduced by NetworkManager, which takes control of everything related to the network early in the boot process...

In fact my attempt to reinstall with nm_manage_dns=false results in the same problem with ncm-spma, so you are right that it is not related. Thanks for the trick, I don't have it and will try it...

@jouvin
Contributor Author

jouvin commented Mar 28, 2024

Unfortunately, after adding the suggested hook, I get the following error when running aii-shellfe:

*** Couldn't instantiate object of hook class custom (AII::custom): Can't locate object method "new" via package "AII::custom" (perhaps you forgot to load "AII::custom"?) at /usr/lib/perl/NCM/Component/ks.pm line 414.

What is the missing bit?

@jouvin
Contributor Author

jouvin commented Mar 29, 2024

@aka7 I checked that after booting, ncm-ncd --configure --all runs successfully, which confirms the boot process timing issue. But what worries me is that if I reboot the system after this successful run (and thus with the required file present in /etc/NetworkManager/conf.d), the problem remains the same: ncm-ncd run by ncm-cdispd at boot time cannot resolve the name of the YUM repository server... Have you been able to work around this?

@aka7
Contributor

aka7 commented Mar 29, 2024

@jouvin that is strange, I haven't experienced this issue. What are the contents of resolv.conf before and after the reboot? Normally, if it has been replaced by NetworkManager, it will have a comment in it. We did have to fix a few ncm dependency issues, but that is because we configure local DNS caching using metaconfig, so we make sure that this is run before ncm-resolver. I know you said you don't have this setup; however, check whether you have any ncm ordering issue, and check the contents of resolv.conf when it stops working to see if NM is changing it back.

@jouvin
Contributor Author

jouvin commented Mar 30, 2024

@aka7 I think it may be related to quattor/ncm-cdispd#56. Maybe the fact that ncm-cdispd doesn't have a proper systemd unit file results in an incorrect ordering... I may try to fix it next week, it should not be difficult.

Have you seen my previous comment about AII complaining that it has no plugin for custom?

@aka7
Contributor

aka7 commented Mar 30, 2024

@jouvin
Yes I have. I haven't seen this myself, but our internal release of AII is a bit different, and I'm not sure how far behind upstream we are or whether this is something we added internally. I can check on Tuesday to see if our internal AII code is different from upstream.

@jouvin
Contributor Author

jouvin commented Mar 30, 2024

@aka7 thanks. I can imagine it is an add-on you developed. If that is the case and you can share it, I'm interested!

@jouvin
Contributor Author

jouvin commented Apr 2, 2024

@aka7 good news. I reinstalled my test system with the new cdp-listend and ncm-cdispd that both use a systemd unit instead of the legacy init scripts and now everything works properly.

I didn't use the AII hook (as I'm missing the corresponding plugin) and I use nm_manage_dns=false. I had no problem during the initial installation or the reboot done afterwards. It seems I now have a running config! Thanks for your help!

@aka7
Contributor

aka7 commented Apr 2, 2024

@jouvin great news. I would be interested in the updated cdp-listend without chkconfig once it's merged. I will take a look to see what we added for the custom hook and whether we can put a PR through. It's very useful. I thought it was from upstream.

@jouvin
Contributor Author

jouvin commented Apr 2, 2024

@aka7 I don't think the cdp-listend changes have any influence on the boot ordering problem (in fact it was already using a systemd unit but was still also bringing in the chkconfig dependency for EL6 compatibility). What seems to make the difference is the ncm-cdispd changes. If you want to give them a try before the PR is merged, use my PR to @stdweird's PR... (the only problem is that the tests are not passing, so you need to disable/remove them to build the package until the problems with the tests are fixed).

@aka7
Contributor

aka7 commented Apr 3, 2024

@jouvin I'm a bit lost on this, which PR should I try?
Also, are we saying we don't need the chkconfig package anymore? It is still needed by /usr/lib/perl/NCM/Component/Systemd/Service/Chkconfig.pm:

[VERB] Getting output of command: /sbin/chkconfig --list
[VERB] Perl warning: Use of uninitialized value $data in concatenation (.) or string at /usr/lib/perl/NCM/Component/Systemd/Service/Chkconfig.pm line 100.

@jouvin
Contributor Author

jouvin commented Apr 3, 2024

@aka7 about the PR, you may want to try stdweird/ncm-cdispd#1, which is a PR I opened on @stdweird's PR, quattor/ncm-cdispd#61, which was addressing the missing systemd unit but not removing the chkconfig requirement.

As for chkconfig, I don't say it is not needed but it is not required anymore by cdp-listend or ncm-cdispd. On my EL9 systems it is still there because initscripts is installed and requires it. The main problem was not that those Quattor components were requiring chkconfig but the fact that they were creating /etc/init.d if they were installed before chkconfig, resulting in a conflict when chkconfig was installed.

@aka7
Contributor

aka7 commented Apr 3, 2024

@jouvin thanks, makes sense. I didn't hit this issue until recently, when I tried to build an image and use that; then spma failed trying to install chkconfig.

But looking at the generated rpm from your PR, should /etc/init.d/cdp-listend still be provided?


$ rpm -qpl target/rpm/cdp-listend/RPMS/noarch/cdp-listend-23.9.0-rc2_SNAPSHOT20240403163751.noarch.rpm
/etc/cdp-listend.conf
/etc/cron.d/check-cdp-listend.cron
/etc/init.d/cdp-listend
/etc/logrotate.d/cdp-listend
/usr/lib/systemd/system-preset/80-cdp-listend.preset
/usr/lib/systemd/system/cdp-listend.service
/usr/sbin/cdp-listend
/usr/sbin/check-cdp-listend
/usr/share/doc/cdp-listend-23.9.0-rc2-SNAPSHOT/ChangeLog
/usr/share/quattor/check-cdp-listend

@jouvin
Contributor Author

jouvin commented Apr 3, 2024

@aka7 the PR I mentioned is for ncm-cdispd. There is a similar PR for cdp-listend (by me) that removes everything chkconfig-related from the cdp-listend rpm.

@aka7
Contributor

aka7 commented Apr 11, 2024

@jouvin I guess what I'm asking is: do we still need to install /etc/init.d/cdp-listend?
If this is installed first and the chkconfig package is installed later, chkconfig fails to install because it can't create the /etc/init.d symlink:

 Error unpacking rpm package chkconfig-1.24-1.el9.x86_64
error: unpacking of archive failed on file /etc/init.d;6617f729: cpio: File from package already exists as a directory in system
error: chkconfig-1.24-1.el9.x86_64: install failed

I thought your PR also fixed this, but it doesn't.
Should installing this file also be dropped if it's not going to be used?
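
A quick way to tell whether a host is already in the state that triggers this error is to check if /etc/init.d exists as a real directory instead of a symlink. The sketch below assumes, as the message above suggests, that chkconfig expects to install /etc/init.d as a symlink; the rc.d/init.d target used in the demo is an assumption, and initd_is_real_directory is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def initd_is_real_directory(root="/") -> bool:
    """True if <root>/etc/init.d is a plain directory rather than a symlink."""
    initd = Path(root) / "etc" / "init.d"
    # is_dir() follows symlinks, so the symlink case is excluded explicitly.
    return initd.is_dir() and not initd.is_symlink()


if __name__ == "__main__":
    # Demo on throwaway trees instead of the real filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "etc" / "init.d").mkdir(parents=True)
        print(initd_is_real_directory(tmp))  # prints True

    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "etc" / "rc.d" / "init.d").mkdir(parents=True)
        os.symlink("rc.d/init.d", Path(tmp) / "etc" / "init.d")
        print(initd_is_real_directory(tmp))  # prints False
```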

@jouvin
Contributor Author

jouvin commented Apr 11, 2024

@aka7 yes, my PR is supposed to fix this problem. It worked on the few machines we installed.

@aka7
Contributor

aka7 commented Apr 12, 2024

@jouvin so I think your PR stops it pulling in the chkconfig package, but /etc/init.d still gets created. So if we still need the chkconfig package, it will fail to install. So my question is whether this should be omitted too.

@jouvin
Contributor Author

jouvin commented Apr 12, 2024

@aka7 I must double-check, but the goal of the PR is to stop creating /etc/init.d. If that's not the case, I need to fix it. I'll check the rpm I use, as I had no conflict with chkconfig, contrary to the previous situation...

@aka7
Contributor

aka7 commented Apr 15, 2024

@jouvin so from what I've noticed,
with your PR, I see the chkconfig dependency is removed but /etc/init.d/cdp-listend is still being installed,
unless I got the wrong PR?

rpm -qpl target/rpm/cdp-listend/RPMS/noarch/cdp-listend-23.9.0-rc2_SNAPSHOT20240415105908.noarch.rpm
/etc/cdp-listend.conf
/etc/cron.d/check-cdp-listend.cron
/etc/init.d/cdp-listend
/etc/logrotate.d/cdp-listend
/usr/lib/systemd/system-preset/80-cdp-listend.preset
/usr/lib/systemd/system/cdp-listend.service
/usr/sbin/cdp-listend
/usr/sbin/check-cdp-listend
/usr/share/doc/cdp-listend-23.9.0-rc2-SNAPSHOT/ChangeLog
/usr/share/quattor/check-cdp-listend
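
The same check can be done mechanically on any rpm -qpl listing. This is a small sketch that filters a listing, given as text, for paths under /etc/init.d/; legacy_init_files is a hypothetical helper and the sample is an excerpt of the listing above.

```python
def legacy_init_files(file_list: str) -> list:
    """Return the paths in an `rpm -qpl` listing that live under /etc/init.d/."""
    return [
        line.strip()
        for line in file_list.splitlines()
        if line.strip().startswith("/etc/init.d/")
    ]


# Excerpt of the package listing above.
RPM_LISTING = """\
/etc/cdp-listend.conf
/etc/cron.d/check-cdp-listend.cron
/etc/init.d/cdp-listend
/usr/lib/systemd/system/cdp-listend.service
/usr/sbin/cdp-listend
"""

if __name__ == "__main__":
    print(legacy_init_files(RPM_LISTING))  # prints ['/etc/init.d/cdp-listend']
```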

so don't you also need this?

diff --git a/pom.xml b/pom.xml
index 2e954ae..480af7b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -129,16 +129,6 @@
               <mapping>
                 <sources>
                   <source>
-                    <location>${project.build.directory}/daemon</location>
-                  </source>
-                </sources>
-                <directory>/etc/init.d</directory>
-                <filemode>755</filemode>
-                <directoryIncluded>false</directoryIncluded>
-              </mapping>
-              <mapping>
-                <sources>
-                  <source>
                     <location>${project.build.directory}/systemd/${project.artifactId}.service</location>
                   </source>
                 </sources>

@jouvin
Contributor Author

jouvin commented Apr 16, 2024

@aka7 I'm sure you got it right, I'll have a look. I think I did it in 2 steps but I may have made a mistake pushing the fixes...

@jouvin
Contributor Author

jouvin commented Apr 16, 2024

@aka7 I suggest closing this issue (or keeping it for the original subject!) and moving the discussion about cdp-listend or ncm-cdispd to the relevant PRs.

@aka7
Contributor

aka7 commented Apr 18, 2024

@jouvin yes that's fine by me.

@jouvin jouvin closed this as completed Nov 14, 2024
@jrha jrha added this to the 24.10 milestone Nov 14, 2024
@jrha
Member

jrha commented Nov 14, 2024

If I understand correctly this was resolved by quattor/template-library-core#235.
