Allow user to specify order of network interfaces inside application #74

Draft: wants to merge 2 commits into base: main

milan-zededa
Contributor

Currently, when an application has both virtual network interfaces
configured and some network devices directly assigned, EVE first
attaches the virtual interfaces in the order of their ACL IDs (a historical
workaround for the missing interface order field), followed by the directly assigned
network adapters in the order of the AppInstanceConfig.adapters
list.

To allow the user to specify the order across all application
network interfaces (both virtual and passthrough devices), we
introduce a new boolean flag enforce_network_interface_order inside
the application instance config and allow the controller to pass the order
requirements for all the application network adapters.

For backward compatibility, this is disabled by default
and the original ordering method remains in use.
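
A minimal sketch of the two ordering modes described above, in Python for illustration only (EVE itself is written in Go). The `AppInterface` type and its `acl_id` and `order` fields are hypothetical stand-ins, not the actual API fields; only the flag name comes from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppInterface:
    name: str          # label used only in this sketch
    passthrough: bool  # True for a directly assigned NIC, False for a virtual interface
    acl_id: int = 0    # hypothetical stand-in for the ACL ID used by the legacy ordering
    order: int = 0     # hypothetical stand-in for the controller-supplied order


def attach_order(interfaces, enforce_network_interface_order=False):
    """Return the interfaces in the order they would be attached."""
    if enforce_network_interface_order:
        # New behaviour: one order across virtual and passthrough interfaces.
        return sorted(interfaces, key=lambda i: i.order)
    # Legacy behaviour: virtual interfaces by ACL ID first,
    # then passthrough adapters in their original list order.
    virtual = sorted((i for i in interfaces if not i.passthrough),
                     key=lambda i: i.acl_id)
    direct = [i for i in interfaces if i.passthrough]
    return virtual + direct


example = [
    AppInterface("pt0", passthrough=True, order=1),
    AppInterface("vb", passthrough=False, acl_id=2, order=2),
    AppInterface("va", passthrough=False, acl_id=1, order=0),
    AppInterface("pt1", passthrough=True, order=3),
]
print([i.name for i in attach_order(example)])        # ['va', 'vb', 'pt0', 'pt1']
print([i.name for i in attach_order(example, True)])  # ['va', 'pt0', 'vb', 'pt1']
```

With the flag off, passthrough devices always come last; with it on, virtual and passthrough interfaces can be interleaved.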

Signed-off-by: Milan Lenco <[email protected]>
@uncleDecart
Member

Why changes in appconfig? Ordering is for physical devices, or am I wrong?

@milan-zededa
Contributor Author

Why changes in appconfig? Ordering is for physical devices, or am I wrong?

Ordering is for application interfaces (i.e. inside the app, not in EVE/host) - this includes the virtual (virtio) interfaces and directly assigned network devices.

@uncleDecart
Member

@rucoder do I remember it correctly that there're some gotchas in enforcing ordering of PCI devices, can you confirm or deny that? :D

@milan-zededa
Contributor Author

milan-zededa commented Nov 20, 2024

I'm hoping that this is as simple as setting the Addr to respect the configured interface order.
Here: https://github.com/lf-edge/eve/blob/master/pkg/pillar/hypervisor/kvm.go#L367
And here: https://github.com/lf-edge/eve/blob/master/pkg/pillar/hypervisor/kvm.go#L412
But PCI bridges probably complicate that...
(qemu/kvm is the priority, ordering for xen and kubevirt can be implemented later)
CC @rucoder

@uncleDecart
Member

Why changes in appconfig? Ordering is for physical devices, or am I wrong?

Ordering is for application interfaces (i.e. inside the app, not in EVE/host) - this includes the virtual (virtio) interfaces and directly assigned network devices.

But how do you link those? Maybe I'm lacking a bit of understanding:

I have 3 NICs in PCI slots 1, 2 and 3. I want them to be eth1, eth2 and eth3 respectively on the host. Then, in the EdgeApp, I want eth1 from the host to be eth2 inside app 1, and I want eth2 and eth3 from the host to be eth1 and eth2 inside app 2. Is that what you're trying to achieve in a nutshell? (I'm not yet taking into consideration how we do network plumbing in EVE.)

@milan-zededa
Contributor Author

Yes, but the real complication is that we need to allow specifying the order across both virtual and directly assigned interfaces. So, for example, you may have a config where (inside the app) eth1 is a virtual interface, eth2 is directly assigned, eth3 is another virtual interface, eth4 is another directly assigned NIC, etc. So somehow we need to be able to interleave virtio with passthrough devices.

@uncleDecart
Member

When we pass a device through, we do it at the PCI level, so in the guest system I see PCI 0000:00:01 or whichever, and when I pass a virtio device I create a tap interface via qemu on the guest side. For passthrough, can you specify the address to be assigned in the guest system?
And yes, PCI bridges might complicate things.

@uncleDecart
Member

I mean, this is what you describe, right?

PCI -> host iface -> guest iface # virtual interface 
PCI -> guest iface # passthrough 

@christoph-zededa
Contributor

and when I pass a virtio device I create a tap interface via qemu on the guest side. For passthrough, can you specify the address to be assigned in the guest system?

Isn't it this one https://github.com/lf-edge/eve/blob/master/pkg/pillar/hypervisor/kvm.go#L383 ?

@uncleDecart
Member

and when I pass a virtio device I create a tap interface via qemu on the guest side. For passthrough, can you specify the address to be assigned in the guest system?

Isn't it this one https://github.com/lf-edge/eve/blob/master/pkg/pillar/hypervisor/kvm.go#L383 ?

Okay, seems like the one, so basically you need a universal map which tracks those conversions and makes sure that one resource is not used twice. The end point would be the qemu/xen configuration. I mean, that's the question: can we do it on xen as well?
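
A rough sketch of such a map, in Python for illustration only. `GuestSlotMap` and its methods are hypothetical names, and the default slot range is an assumption (a PCI bus has 32 device numbers, 0 to 31; slot 0 is left out here on the assumption that it is already taken on the root bus).

```python
class GuestSlotMap:
    """Tracks which guest PCI slot each application interface gets,
    refusing to hand out the same slot or the same interface twice."""

    def __init__(self, first_slot=1, last_slot=31):
        self._first = first_slot
        self._last = last_slot
        self._iface_by_slot = {}
        self._slot_by_iface = {}

    def assign(self, iface, slot):
        if not self._first <= slot <= self._last:
            raise ValueError(f"slot {slot} is outside {self._first}..{self._last}")
        if slot in self._iface_by_slot:
            raise ValueError(f"slot {slot} is already used by {self._iface_by_slot[slot]}")
        if iface in self._slot_by_iface:
            raise ValueError(f"{iface} already has slot {self._slot_by_iface[iface]}")
        self._iface_by_slot[slot] = iface
        self._slot_by_iface[iface] = slot

    def slot_of(self, iface):
        return self._slot_by_iface[iface]

    def in_guest_order(self):
        # Interfaces sorted by the guest slot they were given.
        return [self._iface_by_slot[s] for s in sorted(self._iface_by_slot)]
```

The same bookkeeping would apply whether the slots end up in a qemu or a xen configuration; only the final rendering step differs.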

@christoph-zededa
Contributor

// NetworkAdapter are virtual adapters assigned to the application
// The order here is critical because they are presented to the VM or
// container in the order they are listed, e.g., the first NetworkAdapter
// will appear in a Linux VM as eth0. Also, the MAC address is determined
// based on the order in the list.

from https://github.com/lf-edge/eve-api/blob/main/proto/config/appconfig.proto#L126

Isn't the order already determined (at least from the point of the API)?
Or is this different because that does not include passthrough devices? So enforce_network_interface_order also means that this order is not honored anymore?

@uncleDecart
Member

But hold up, this thing is specifying the PCI address. Because the guest OS does all the naming, it might or might not be based on PCI order. We can't just tell the guest OS from the HV perspective that I want this interface to be 1, 3, 17, johnDoe, because Windows and Ubuntu are different, right?

Yes, it's determined from the API point of view, but we need to do checks on EVE IMO; the controller might not have checks, so it would be useful to have them.

@christoph-zededa
Contributor

But hold up, this thing is specifying the PCI address. Because the guest OS does all the naming, it might or might not be based on PCI order. We can't just tell the guest OS from the HV perspective that I want this interface to be 1, 3, 17, johnDoe, because Windows and Ubuntu are different, right?

Yes, it's determined from the API point of view, but we need to do checks on EVE IMO; the controller might not have checks, so it would be useful to have them.

Yes, "it might or might not be based on PCI order"; it might also be the order it is in the configuration file (see lf-edge/eve#3369 (comment)).

But in the end EVE can only give a hint to the guest OS and hope for the best.

Also this is not eve-api repo specific but rather for the PR into eve, is it?

@uncleDecart
Member

Also this is not eve-api repo specific but rather for the PR into eve, is it?

Sure, but without understanding the plan of implementation, how can you be sure of API? :)

@christoph-zededa
Contributor

Also this is not eve-api repo specific but rather for the PR into eve, is it?

Sure, but without understanding the plan of implementation, how can you be sure of API? :)

I would go even further and say without the implementation I cannot be sure of the API ;-)

@rucoder

rucoder commented Nov 20, 2024

@milan-zededa can't we just use a convention for logical labels and enforce them to have continuous numbering? It may break current applications, but IMO this is the most straightforward and least confusing way to enforce numbering in the application.

@rucoder

rucoder commented Nov 20, 2024

Regarding PCI enumeration: in theory, devices are enumerated in the order they appear on the PCI bus; however, the driver can assign a device ID according to its internal logic, e.g. by looking at the MAC address. I do not think it is possible to reliably force PCI devices to be enumerated in a required order, but it is a reasonable assumption when we have a flat PCI topology, i.e. no bridges/switches. With bridges we cannot guarantee the order.

Member

@OhmSpectator OhmSpectator left a comment

Let's see what we can achieve with the current approach, as I see it.

First of all, we can guarantee any order only in the case of a Linux VM using systemd (https://www.freedesktop.org/software/systemd/man/latest/systemd.net-naming-scheme.html). We don't know how another guest handles the enumeration. It should be explicitly stated.

Then, we can guarantee an order only among devices of the same type; otherwise, the interface prefixes and the interface naming scheme will differ, and we cannot talk about an order of devices of different types.
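
To make the dependency on the PCI location concrete, here is a simplified sketch (Python, for illustration only) of how a path-based Ethernet name can be derived from a guest PCI address under the systemd scheme linked above. `pci_path_name` is a hypothetical helper; it ignores onboard indexes, hotplug slot names and port suffixes, which the real scheme also considers.

```python
def pci_path_name(bus, slot, function=0, domain=0, multifunction=False):
    """Build a path-based 'en...' name from a guest PCI location.

    Simplified: covers only the [P<domain>]p<bus>s<slot>[f<function>] part.
    """
    name = "en"  # prefix for Ethernet; other device types use other prefixes
    if domain != 0:
        name += f"P{domain}"
    name += f"p{bus}s{slot}"
    if function != 0 or multifunction:
        name += f"f{function}"
    return name


print(pci_path_name(bus=1, slot=0))  # enp1s0
print(pci_path_name(bus=2, slot=0))  # enp2s0
```

Under this scheme the guest-visible name follows the bus and slot numbers, which is why the addresses chosen in the hypervisor configuration can influence it.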

Then, let's consider the following setup:

Devices A and B: Each is directly attached and has a dedicated PCIe root port.
Devices C and D: Both are under a bridge (multifunction device) with no root port.

Now let's check if we can guarantee an order for different cases:

Order 1: A, B, C, D

# QEMU configuration for Order 1: A, B, C, D

# Device A's Root Port
[device "pcie_root_port_a"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x1"
  multifunction = "on"
  port = "1"
  chassis = "1"
  slot = "1"
  id = "root_port_a"

# Device A
[device "device_a"]
  driver = "<driver_for_A>"
  bus = "root_port_a"
  addr = "0x0"

# Device B's Root Port
[device "pcie_root_port_b"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x2"
  port = "2"
  chassis = "2"
  slot = "2"
  id = "root_port_b"

# Device B
[device "device_b"]
  driver = "<driver_for_B>"
  bus = "root_port_b"
  addr = "0x0"

# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
  driver = "pcie-pci-bridge"
  bus = "pcie.0"
  addr = "0x3"
  chassis = "3"
  id = "bridge_cd"

# Device C
[device "device_c"]
  driver = "<driver_for_C>"
  bus = "bridge_cd"
  addr = "0x1"

# Device D
[device "device_d"]
  driver = "<driver_for_D>"
  bus = "bridge_cd"
  addr = "0x2"

and it will look like:

pcie.0 (Bus 0)
├── [0x1] Root Port A (root_port_a)
│   └── Bus 1
│       └── [0x0] Device A (device_a)
├── [0x2] Root Port B (root_port_b)
│   └── Bus 2
│       └── [0x0] Device B (device_b)
└── [0x3] PCIe to PCI Bridge (bridge_cd)
    └── Bus 3
        ├── [0x1] Device C (device_c)
        └── [0x2] Device D (device_d)

In this configuration, devices A and B are connected to the root bus (pcie.0) via dedicated PCIe root ports, with device numbers assigned to ensure they are enumerated first. The root port for device A is assigned addr = "0x1", and the root port for device B is assigned addr = "0x2". The PCIe to PCI bridge for devices C and D is connected to the root bus with addr = "0x3", ensuring it is enumerated after the root ports.

Order 2: B, A, C, D

# QEMU configuration for Order 2: B, A, C, D

# Device B's Root Port
[device "pcie_root_port_b"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x1"
  multifunction = "on"
  port = "1"
  chassis = "1"
  slot = "1"
  id = "root_port_b"

# Device B
[device "device_b"]
  driver = "<driver_for_B>"
  bus = "root_port_b"
  addr = "0x0"

# Device A's Root Port
[device "pcie_root_port_a"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x2"
  port = "2"
  chassis = "2"
  slot = "2"
  id = "root_port_a"

# Device A
[device "device_a"]
  driver = "<driver_for_A>"
  bus = "root_port_a"
  addr = "0x0"

# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
  driver = "pcie-pci-bridge"
  bus = "pcie.0"
  addr = "0x3"
  chassis = "3"
  id = "bridge_cd"

# Device C
[device "device_c"]
  driver = "<driver_for_C>"
  bus = "bridge_cd"
  addr = "0x1"

# Device D
[device "device_d"]
  driver = "<driver_for_D>"
  bus = "bridge_cd"
  addr = "0x2"

which brings us to

pcie.0 (Bus 0)
├── [0x1] Root Port B (root_port_b)
│   └── Bus 1
│       └── [0x0] Device B (device_b)
├── [0x2] Root Port A (root_port_a)
│   └── Bus 2
│       └── [0x0] Device A (device_a)
└── [0x3] PCIe to PCI Bridge (bridge_cd)
    └── Bus 3
        ├── [0x1] Device C (device_c)
        └── [0x2] Device D (device_d)

In this configuration, we swap the device numbers of the root ports for devices A and B on the root bus. The root port for device B is assigned addr = "0x1", and the root port for device A is assigned addr = "0x2". This ensures that the root port for device B is enumerated before the root port for device A. The bridge for devices C and D remains at addr = "0x3".

Order 3: C, D, A, B

# QEMU configuration for Order 3: C, D, A, B

# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
  driver = "pcie-pci-bridge"
  bus = "pcie.0"
  addr = "0x1"
  chassis = "1"
  id = "bridge_cd"

# Device C
[device "device_c"]
  driver = "<driver_for_C>"
  bus = "bridge_cd"
  addr = "0x1"

# Device D
[device "device_d"]
  driver = "<driver_for_D>"
  bus = "bridge_cd"
  addr = "0x2"

# Device A's Root Port
[device "pcie_root_port_a"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x2"
  port = "2"
  chassis = "2"
  slot = "2"
  id = "root_port_a"

# Device A
[device "device_a"]
  driver = "<driver_for_A>"
  bus = "root_port_a"
  addr = "0x0"

# Device B's Root Port
[device "pcie_root_port_b"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x3"
  port = "3"
  chassis = "3"
  slot = "3"
  id = "root_port_b"

# Device B
[device "device_b"]
  driver = "<driver_for_B>"
  bus = "root_port_b"
  addr = "0x0"

gives us

pcie.0 (Bus 0)
├── [0x1] PCIe to PCI Bridge (bridge_cd)
│   └── Bus 1
│       ├── [0x1] Device C (device_c)
│       └── [0x2] Device D (device_d)
├── [0x2] Root Port A (root_port_a)
│   └── Bus 2
│       └── [0x0] Device A (device_a)
└── [0x3] Root Port B (root_port_b)
    └── Bus 3
        └── [0x0] Device B (device_b)

In this configuration, we assign the bridge for devices C and D the lowest device number on the root bus (addr = "0x1"), ensuring it is enumerated first. The root ports for devices A and B are assigned higher device numbers (addr = "0x2" and addr = "0x3" respectively).

Order 4: D, C, A, B

# QEMU configuration for Order 4: D, C, A, B

# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
  driver = "pcie-pci-bridge"
  bus = "pcie.0"
  addr = "0x1"
  chassis = "1"
  id = "bridge_cd"

# Device D
[device "device_d"]
  driver = "<driver_for_D>"
  bus = "bridge_cd"
  addr = "0x1"

# Device C
[device "device_c"]
  driver = "<driver_for_C>"
  bus = "bridge_cd"
  addr = "0x2"

# Device A's Root Port
[device "pcie_root_port_a"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x2"
  port = "2"
  chassis = "2"
  slot = "2"
  id = "root_port_a"

# Device A
[device "device_a"]
  driver = "<driver_for_A>"
  bus = "root_port_a"
  addr = "0x0"

# Device B's Root Port
[device "pcie_root_port_b"]
  driver = "pcie-root-port"
  bus = "pcie.0"
  addr = "0x3"
  port = "3"
  chassis = "3"
  slot = "3"
  id = "root_port_b"

# Device B
[device "device_b"]
  driver = "<driver_for_B>"
  bus = "root_port_b"
  addr = "0x0"

results in this

pcie.0 (Bus 0)
├── [0x1] PCIe to PCI Bridge (bridge_cd)
│   └── Bus 1
│       ├── [0x1] Device D (device_d)
│       └── [0x2] Device C (device_c)
├── [0x2] Root Port A (root_port_a)
│   └── Bus 2
│       └── [0x0] Device A (device_a)
└── [0x3] Root Port B (root_port_b)
    └── Bus 3
        └── [0x0] Device B (device_b)

In this configuration, we aim to have device D enumerated before device C. We achieve this by swapping their device numbers on the subordinate bus of the bridge (bridge_cd). Device D is assigned addr = "0x1", and device C is assigned addr = "0x2". The bridge itself is assigned the lowest device number on the root bus (addr = "0x1"), ensuring it is encountered first during enumeration.
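
The address pattern used in Orders 1 to 4 can be sketched as a small planner (Python, for illustration only). `plan_addresses`, `parent_of` and `bridges` are hypothetical names. The sketch assumes, as above, that a device behind a dedicated root port sits at addr 0x0 and that devices behind a bridge are numbered from 0x1; it does not check whether the requested order is achievable.

```python
def plan_addresses(order, parent_of, bridges):
    """Assign root-bus addrs to parents and addrs to devices.

    order     -- device names in the desired enumeration order
    parent_of -- device name -> id of its root port or bridge
    bridges   -- set of parent ids that are bridges holding several devices
    """
    root_addr = {}   # parent id -> addr on pcie.0
    dev_addr = {}    # device name -> addr on its parent's bus
    next_child = {}  # bridge id -> last addr handed out behind that bridge
    for dev in order:
        parent = parent_of[dev]
        if parent not in root_addr:
            # Parents get increasing addrs in order of first appearance.
            root_addr[parent] = len(root_addr) + 1
        if parent in bridges:
            next_child[parent] = next_child.get(parent, 0) + 1
            dev_addr[dev] = next_child[parent]
        else:
            dev_addr[dev] = 0
    return root_addr, dev_addr


parent_of = {"A": "root_port_a", "B": "root_port_b",
             "C": "bridge_cd", "D": "bridge_cd"}
roots, devs = plan_addresses(["D", "C", "A", "B"], parent_of, {"bridge_cd"})
print(roots)  # {'bridge_cd': 1, 'root_port_a': 2, 'root_port_b': 3}
print(devs)   # {'D': 1, 'C': 2, 'A': 0, 'B': 0}
```

For Order 4 this reproduces the addr values used in the configuration above.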

Order 5: C, A, B, D

This order is not achievable under the given constraints. Devices under the same bridge (C and D) are always enumerated together and cannot be interleaved with devices on different buses without moving them to separate bridges or changing the physical connections, which is not allowed in this setup. Therefore, we cannot have device C enumerated first, then devices A and B, and then device D. The enumeration process does not support interleaving devices from different buses in this manner when they are connected under the same bridge.
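
The constraint behind Order 5 can be expressed as a contiguity check: an order is achievable only if all devices sharing a parent appear next to each other. A minimal sketch in Python (illustration only; `is_order_achievable` and `parent_of` are hypothetical names, and it assumes the guest enumerates each bus fully before moving on, as described above):

```python
def is_order_achievable(order, parent_of):
    """Return True if devices that share a parent are contiguous in `order`."""
    finished = set()  # parents whose run of devices has already ended
    current = None    # parent of the device seen most recently
    for dev in order:
        parent = parent_of[dev]
        if parent != current:
            if parent in finished:
                # This parent's devices were interrupted by another bus.
                return False
            if current is not None:
                finished.add(current)
            current = parent
    return True


parent_of = {"A": "root_port_a", "B": "root_port_b",
             "C": "bridge_cd", "D": "bridge_cd"}
print(is_order_achievable(["D", "C", "A", "B"], parent_of))  # True
print(is_order_achievable(["C", "A", "B", "D"], parent_of))  # False
```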

@milan-zededa, @christoph-zededa, @rucoder, fix me if I made a mistake somewhere.

// of application network interfaces. The controller can check
// ZInfoDevice.api_capability to verify if the configured device supports the
// API capability API_CAPABILITY_ENFORCED_NET_INTERFACE_ORDER.
bool enforce_network_interface_order = 28;
Member

Isn't it a "fixed" option? Once set, we are not going to change it, I guess? I mean, it looks like a proper part of AppInstanceConfig.FixedResources. In this case, it should go to vm.proto.

@milan-zededa
Contributor Author

@milan-zededa can't we just use a convention for logical labels and enforce them to have continuous numbering? It may break current applications, but IMO this is the most straightforward and least confusing way to enforce numbering in the application.

@rucoder Yes, this could be done as an alternative to the order field. But from what I'm reading from you and Nikolay, we will have to abandon this ordering scheme anyway and provide a different method for the guest to map a device to its configured order.

@uncleDecart
Member

What we discussed, @milan-zededa, might be useful: how about we allow the EVE user to specify the PCI address for a guest device that they're either passing through or adding as a virtual Eth bridge via a TAP iface? Or am I missing something?

@rucoder

rucoder commented Nov 20, 2024

If I paraphrase the original ask: "we want to see interfaces in the guest numbered in the same order as they appear in the application manifest". Is my understanding correct?

@milan-zededa
Contributor Author

If I paraphrase the original ask: "we want to see interfaces in the guest numbered in the same order as they appear in the application manifest". Is my understanding correct?

Yes, that is correct.

5 participants