Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VF/Switchdev abstraction #346

Merged
merged 8 commits into from
Dec 7, 2023
Merged

Conversation

jtluka
Copy link
Collaborator

@jtluka jtluka commented Nov 15, 2023

Description

This MR extends the Device class with methods and properties required to configure and use vfs of an ethernet device in switchdev mode.

User would typically configure the device as follows (see also the change to BaseSRIOVNetnsTcRecipe):

class SwitchdevRecipe(BaseRecipe):
    host1 = HostReq()
    host1.eth0 = DeviceReq(label="net1", driver=RecipeParam("driver"))

    host2 = HostReq()
    host2.eth0 = DeviceReq(label="net1", driver=RecipeParam("driver"))

    def run(self):
        host1, host2 = self.matched.host1, self.matched.host2

        # create virtual functions
        for host in [host1, host2]:
            host.eth0.eswitch_mode = "switchdev"
            host.eth0.create_vfs(1)
            vf_device = host.eth0.vf_devices[0]
            vf_rep_device = host.eth0.vf_rep_devices[0]

            host.map_device("vf_representor_eth0", {"ifname": vf_rep_device.name})

            host.run(f"ethtool -K {host.vf_representor_eth0.name} hw-tc-offload on")
            host.run(f"ethtool -K {host.eth0.name} hw-tc-offload on")

Known issues:

  • for i40e NICs, the code does not work as there are no representors by the NIC design
  • for bnxt_en NICs, the code does not work on RHEL distros due to missing patch [PATCH net-next v2] bnxt_en: Allow to set switchdev mode without existing VFs

Tests

J:8572626 J:8572625 J:8572624 J:8572623

Reviews

@olichtne @Axonis

Closes: #345 #344

@jtluka jtluka requested review from olichtne and Axonis November 15, 2023 13:46
@jtluka jtluka self-assigned this Nov 15, 2023
@jtluka jtluka force-pushed the switchdev-abstraction branch from 2f107f6 to 137004c Compare November 15, 2023 13:55
@jtluka
Copy link
Collaborator Author

jtluka commented Nov 15, 2023

Tests done through beaker job ran without issues.

I'd hold the MR until I get a feedback from Ivan whether the use case when VFs are created before setting the switchdev mode is valid and expected. If it is valid this will need to be reworked.

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 15, 2023

Just a note that here's the RFC for unfication of PF/VF/VF_REP names: https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 15, 2023

I'd hold the MR until I get a feedback from Ivan whether the use case when VFs are created before setting the switchdev mode is valid and expected. If it is valid this will need to be reworked.

Apparently it is possible to create VFs without setting the switchdev mode - this is typical scenario where tuntap is substituted with virtual function devices.

The MR should be updated and create_vfs should be updated to support both non-switchdev and switchdev scenarios.

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 15, 2023

Note, for mlx5 it is not possible to create vfs and then set the switchdev mode:

[root@wsfd-advnetlab57 ~]# echo 1 > /sys/class/net/ens5f0/device/sriov_numvfs 
[root@wsfd-advnetlab57 ~]# cat /sys/class/net/ens5f0/device/sriov_numvfs 
1
[root@wsfd-advnetlab57 ~]# devlink dev eswitch set pci/0000:81:00.0 mode switchdev
Error: mlx5_core: Inline mode is different between vports.
kernel answers: Invalid argument
[root@wsfd-advnetlab57 ~]# dmesg | tail
[  165.786118] mlx5_core 0000:81:00.2 ens5f0v0: renamed from eth0
[  165.810195] IPv6: ADDRCONF(NETDEV_UP): ens5f0v0: link is not ready
[  165.955773] mlx5_core 0000:81:00.2 ens5f0v0: Link up
[  165.962261] IPv6: ADDRCONF(NETDEV_UP): ens5f0v0: link is not ready
[  165.968699] IPv6: ADDRCONF(NETDEV_CHANGE): ens5f0v0: link becomes ready
[  264.102362] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(LEGACY), nvfs(1), active vports(2)
[  264.121696] mlx5_core 0000:81:00.2 ens5f0v0: Link down
[  265.831026] mlx5_core 0000:81:00.0: mlx5_cmd_out_err:806:(pid 2226): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x98afbb), err(-22)
[  265.846675] mlx5_core 0000:81:00.0: mlx5_cmd_dr_create_flow_table:86:(pid 2226): Failed creating dr flow_table
[  265.856671] mlx5_core 0000:81:00.0: E-Switch: Failed to create slow path FDB Table err -22

@jtluka jtluka changed the title Switchdev abstraction DRAFT: Switchdev abstraction Nov 15, 2023
@jtluka jtluka force-pushed the switchdev-abstraction branch 2 times, most recently from 8d9e324 to e6b8572 Compare November 16, 2023 17:37
@jtluka jtluka changed the title DRAFT: Switchdev abstraction VF/Switchdev abstraction Nov 20, 2023
@jtluka jtluka force-pushed the switchdev-abstraction branch from f57c2ee to 331380a Compare November 20, 2023 13:56
@Axonis
Copy link
Contributor

Axonis commented Nov 27, 2023

With all the differences between drivers/NICs, I think it makes sense to split all VF/SRIOV related actions to separate commands. What do you think?

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 27, 2023

With all the differences between drivers/NICs, I think it makes sense to split all VF/SRIOV related actions to separate commands. What do you think?

The only difference is the bnxt_en driver. The fact that you can't set switchdevice mode unless you have virtual functions created is considered a bug that has been already fixed in upstream kernel. I asked Ivan to update RHEL kernels.

I could expose more Device methods, but you would have to be more specific. Eventually these methods can be exposed with followup patches.

The example in the MR description was a bit out of date, maybe that would fit what you're asking for. I updated it, so please check.

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 28, 2023

With all the differences between drivers/NICs, I think it makes sense to split all VF/SRIOV related actions to separate commands. What do you think?

The only difference is the bnxt_en driver. The fact that you can't set switchdevice mode unless you have virtual functions created is considered a bug that has been already fixed in upstream kernel. I asked Ivan to update RHEL kernels.

I've checked with Ivan and there's already a MR for RHEL9 kernel to update the bnxt_en driver, so there's no other difference across the drivers.

@Axonis
Copy link
Contributor

Axonis commented Nov 29, 2023

With all the differences between drivers/NICs, I think it makes sense to split all VF/SRIOV related actions to separate commands. What do you think?

The only difference is the bnxt_en driver. The fact that you can't set switchdevice mode unless you have virtual functions created is considered a bug that has been already fixed in upstream kernel. I asked Ivan to update RHEL kernels.

I could expose more Device methods, but you would have to be more specific. Eventually these methods can be exposed with followup patches.

The example in the MR description was a bit out of date, maybe that would fit what you're asking for. I updated it, so please check.

I'm fine with current implementation, makes sense.

@jtluka
Copy link
Collaborator Author

jtluka commented Nov 30, 2023

Based on a discussion with LNST team I plan to update the code.

The main issue detected was with vf_devices property in following use case:

host1.phys_dev.create_vfs()
vf_device = host1.phys_dev.vf_devices[0]
host1.newns = NetNamespace("lnst") 
host1.newns.vf_eth0 = vf_device
...
vf_device = host1.phys_dev.vf_devices[0]

The last statement will not work because phys_dev is in different namespace than the vf device and would not be able to fetch it from netlink messages during rescan of available devices.

We came to conclusion that vf_devices and vf_rep_devices properties will be removed and instead the create_vfs() method will return the vf/vf_rep devices directly.

lnst/Devices/Device.py Outdated Show resolved Hide resolved
@jtluka jtluka force-pushed the switchdev-abstraction branch from 331380a to 0f83ecd Compare December 1, 2023 15:01
@jtluka
Copy link
Collaborator Author

jtluka commented Dec 4, 2023

Testing is blocked by podman dependency issue containers/podman-py#350

olichtne
olichtne previously approved these changes Dec 5, 2023
Copy link
Collaborator

@olichtne olichtne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me.

lnst/Devices/Device.py Show resolved Hide resolved
lnst/Devices/Device.py Outdated Show resolved Hide resolved
lnst/Recipes/ENRT/SRIOVNetnsGeneveTcRecipe.py Outdated Show resolved Hide resolved
lnst/Recipes/ENRT/SRIOVNetnsTcRecipe.py Outdated Show resolved Hide resolved
This extends the Device class with properties and methods required to
configure virtual functions.

Signed-off-by: Jan Tluka <[email protected]>
This extends the Device class with methods and properties that can be used
to work with Device in switchdev mode.

Signed-off-by: Jan Tluka <[email protected]>
In few setups some devices and their vfs do not have the alternate
names available in netlink messages, but the alternate name is used
directly as the device name.

To deal with these situations, simply include also the device name in
the search.

Signed-off-by: Jan Tluka <[email protected]>
Instead of returning empty string, the method should raise
DeviceFeatureNotSupported exception.

Signed-off-by: Jan Tluka <[email protected]>
…presentors

This class wraps creation and access to Device virtual functions and
their representors.

Signed-off-by: Jan Tluka <[email protected]>
Update of the recipe to use the SRIOVDevices for configuration of a
switchdev capable device.

Signed-off-by: Jan Tluka <[email protected]>
This parameter is no longer required as the device names are parsed from
netlink messages.

Signed-off-by: Jan Tluka <[email protected]>
Without the patch the device names in any device method calls would
appear as None. This is because the devices are not mapped (unless moved
to a network namespace in case of VFs).

The patch adds mapping of the devices with following pattern:
{phys_dev._id}_vf{vf_index}
{phys_dev._id}_vf_rep{vf_index}

Signed-off-by: Jan Tluka <[email protected]>
@jtluka jtluka force-pushed the switchdev-abstraction branch from 59aa3d0 to d0d1b14 Compare December 6, 2023 16:23
@jtluka
Copy link
Collaborator Author

jtluka commented Dec 6, 2023

I've cleaned up the commit history and added one more patch for SRIOVDevices class, where I noticed that device method calls printed None as the _id of vf_rep devices (but applies also to vfs if they're not moved to a namespace). I added map_device() calls for created vf/vf_reps.

I'll schedule a Beaker test job to verify the functionality and this can be merged.

@jtluka
Copy link
Collaborator Author

jtluka commented Dec 6, 2023

New test jobs: J:8640504 J:8640505 J:8640506 J:8640507

@jtluka
Copy link
Collaborator Author

jtluka commented Dec 7, 2023

New test jobs: J:8640504 J:8640505 J:8640506 J:8640507

These tests passed. Note that bnxt_en driver contains bug where the vfs need to be created before moving to switchdev mode.

@Axonis
Copy link
Contributor

Axonis commented Dec 7, 2023

Ack, no more comments at the moment.

@jtluka jtluka merged commit 306e21d into LNST-project:master Dec 7, 2023
2 of 3 checks passed
@jtluka jtluka deleted the switchdev-abstraction branch December 7, 2023 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

create an API for the switchdev VFs
3 participants