Traffic drop on OPX VM #21
Comments
@vnam1 I'll take a look. Can you please tell me what version of OPX you're running? Also show the output of 'opx-show-system-status'. Thank you.
root@leaf-2:/var/log# opx-show-version
Please ignore vboxadd.service below since we use KVM hypervisor.
root@leaf-2:/var/log# opx-show-system-status
@vnam1
I don't have KVM so I'm trying to reproduce this on ESXi.
@GarrickHe
Related to this? open-switch/opx-sai-vm#15
This should have nothing to do with the configuration, and I doubt that the logs are relevant either. It may have something to do with the connectivity from the server to the OPX VM. Would it be possible, for instance, that you have two identical IP addresses on the same L2 segment (as the OPX VM and the server)? Does the problem happen consistently? Please keep in mind that this is effectively communication between two Linux stacks, on the OPX VM and the Linux server, so there's no obvious reason for lost packets. I can think of other issues, e.g. running out of CPU power, but I can't tell without more info about your system.
Thanks for the pointers. I just want to rule out any issue with my setup. When I run tcpdump on the OPX bridge interface, this is the pattern I see:
23:26:29.578075 52:54:00:22:4e:56 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.16.2.5 tell 172.16.2.1, length 28
root@leaf-2:/home/admin# brctl show br10
On the KVM host, the OPX VM interfaces and the server interfaces have corresponding vnet interfaces, which are connected to each other via a bridge instance.
Yes, this problem is seen every time.
ARP requests coming from server to OPX are seen correctly on both e101-003-0 and br10:
On e101-003-0:
23:26:54.750065 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype ARP (0x0806), length 60: Request who-has 172.16.2.1 tell 172.16.2.5, length 46
On br10:
23:26:54.750065 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype ARP (0x0806), length 60: Request who-has 172.16.2.1 tell 172.16.2.5, length 46
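The duplicate-address question raised earlier in the thread can be checked against a saved capture. The following is a minimal Python sketch, assuming the capture was taken with `tcpdump -e` so each line carries the Ethernet source MAC in the format shown above. The function names `ip_owners` and `duplicate_ips` are illustrative and not part of OPX or tcpdump.

```python
import re
from collections import defaultdict

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# ARP request as printed by `tcpdump -e`: the Ethernet source MAC and the
# "tell" address together say which MAC claims to own that IP.
REQUEST_RE = re.compile(
    rf"(?P<mac>{MAC}) > {MAC}, ethertype ARP .*?"
    rf"who-has {IPV4}(?: \({MAC}\))? tell (?P<ip>{IPV4})",
    re.IGNORECASE,
)
# ARP reply: "Reply <ip> is-at <mac>".
REPLY_RE = re.compile(rf"Reply (?P<ip>{IPV4}) is-at (?P<mac>{MAC})", re.IGNORECASE)


def ip_owners(capture_lines):
    """Map each IP seen in ARP traffic to the set of MACs claiming it."""
    owners = defaultdict(set)
    for line in capture_lines:
        for pattern in (REQUEST_RE, REPLY_RE):
            match = pattern.search(line)
            if match:
                owners[match.group("ip")].add(match.group("mac").lower())
    return owners


def duplicate_ips(capture_lines):
    """Return only the IPs that are claimed by more than one MAC."""
    return {
        ip: sorted(macs)
        for ip, macs in ip_owners(capture_lines).items()
        if len(macs) > 1
    }


if __name__ == "__main__":
    # The two capture lines quoted in this thread.
    capture = [
        "23:26:29.578075 52:54:00:22:4e:56 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 172.16.2.5 tell 172.16.2.1, length 28",
        "23:26:54.750065 52:54:00:4b:54:33 > 52:54:00:22:4e:56, ethertype ARP (0x0806), length 60: Request who-has 172.16.2.1 tell 172.16.2.5, length 46",
    ]
    print(duplicate_ips(capture))
```

An empty result only means that no conflict appears in the lines supplied; a longer capture on each segment would be needed to rule the hypothesis out.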
@GarrickHe please advise if you've been able to repro on ESXi.
Issue not seen on ESXi + OPX 2.3.1. I will have to retest this on KVM and with OPX 3.0, and will do so after the fix for the issue below is finalized.
I have an OPX VM with a bridge, 'br10', configured:
br10 Link encap:Ethernet HWaddr 52:54:00:13:99:28
inet addr:172.16.2.1 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe13:9928/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:3570 errors:0 dropped:0 overruns:0 frame:0
TX packets:2631 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:297904 (290.9 KiB) TX bytes:177878 (173.7 KiB)
untagged ports in the bridge : e101-003-0
A server (Ubuntu VM) is connected to the bridge port e101-003-0 on OPX
server interface config:
eth1 Link encap:Ethernet HWaddr 52:54:00:ca:ef:7c
inet addr:172.16.2.5 Bcast:172.16.2.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:feca:ef7c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3233 errors:0 dropped:0 overruns:0 frame:0
TX packets:8613 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:358302 (358.3 KB) TX bytes:871479 (871.4 KB)
When I run some ping tests from server to the bridge interface address (src: 172.16.2.5, dest: 172.16.2.1)
there is packet loss (look at the missing sequence numbers below)
root@localhost:~# ping 172.16.2.1
PING 172.16.2.1 (172.16.2.1) 56(84) bytes of data.
64 bytes from 172.16.2.1: icmp_seq=1 ttl=64 time=2.13 ms
64 bytes from 172.16.2.1: icmp_seq=2 ttl=64 time=0.572 ms
64 bytes from 172.16.2.1: icmp_seq=3 ttl=64 time=0.921 ms
64 bytes from 172.16.2.1: icmp_seq=4 ttl=64 time=0.742 ms
64 bytes from 172.16.2.1: icmp_seq=5 ttl=64 time=0.680 ms
64 bytes from 172.16.2.1: icmp_seq=6 ttl=64 time=0.751 ms
64 bytes from 172.16.2.1: icmp_seq=7 ttl=64 time=0.803 ms
64 bytes from 172.16.2.1: icmp_seq=8 ttl=64 time=0.694 ms
64 bytes from 172.16.2.1: icmp_seq=9 ttl=64 time=0.770 ms
64 bytes from 172.16.2.1: icmp_seq=10 ttl=64 time=0.821 ms
64 bytes from 172.16.2.1: icmp_seq=48 ttl=64 time=2005 ms
64 bytes from 172.16.2.1: icmp_seq=47 ttl=64 time=3005 ms
64 bytes from 172.16.2.1: icmp_seq=49 ttl=64 time=1005 ms
64 bytes from 172.16.2.1: icmp_seq=50 ttl=64 time=5.34 ms
64 bytes from 172.16.2.1: icmp_seq=51 ttl=64 time=0.773 ms
64 bytes from 172.16.2.1: icmp_seq=52 ttl=64 time=0.888 ms
64 bytes from 172.16.2.1: icmp_seq=53 ttl=64 time=0.980 ms
64 bytes from 172.16.2.1: icmp_seq=54 ttl=64 time=0.698 ms
64 bytes from 172.16.2.1: icmp_seq=55 ttl=64 time=0.730 ms
64 bytes from 172.16.2.1: icmp_seq=56 ttl=64 time=0.902 ms
64 bytes from 172.16.2.1: icmp_seq=57 ttl=64 time=0.845 ms
64 bytes from 172.16.2.1: icmp_seq=58 ttl=64 time=0.795 ms
64 bytes from 172.16.2.1: icmp_seq=59 ttl=64 time=0.814 ms
64 bytes from 172.16.2.1: icmp_seq=99 ttl=64 time=8.95 ms
64 bytes from 172.16.2.1: icmp_seq=101 ttl=64 time=2.35 ms
64 bytes from 172.16.2.1: icmp_seq=102 ttl=64 time=0.860 ms
64 bytes from 172.16.2.1: icmp_seq=103 ttl=64 time=0.668 ms
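To quantify the loss rather than eyeball it, the gaps in `icmp_seq` can be extracted from saved ping output. This is a small Python sketch; the function name `missing_sequences` is illustrative, and it assumes the Linux `ping` reply format shown above. It sorts the sequence numbers first, so out-of-order replies (such as 48, 47, 49 above) are not counted as loss.

```python
import re


def missing_sequences(ping_output):
    """Return inclusive (first, last) ranges of icmp_seq values with no reply.

    Only gaps between received replies are reported; loss after the last
    reply cannot be seen from the reply lines alone.
    """
    seen = sorted({int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output)})
    gaps = []
    for prev, cur in zip(seen, seen[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# A few reply lines copied from the output above, around the first gap.
SAMPLE = """\
64 bytes from 172.16.2.1: icmp_seq=10 ttl=64 time=0.821 ms
64 bytes from 172.16.2.1: icmp_seq=48 ttl=64 time=2005 ms
64 bytes from 172.16.2.1: icmp_seq=47 ttl=64 time=3005 ms
64 bytes from 172.16.2.1: icmp_seq=49 ttl=64 time=1005 ms
"""

if __name__ == "__main__":
    for first, last in missing_sequences(SAMPLE):
        print(f"no reply for icmp_seq {first}-{last} ({last - first + 1} packets)")
```

Run against the full output above, the same function reports three gaps: 11-46, 60-98, and 100.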
I see the following syslogs continuously:
—
Jun 25 18:46:16 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port
Jun 25 18:46:16 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19
Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_FDB:Switch Id: 0], FDB Entry not found for MAC: ff:ff:ff:ff:ff:ff vlan/bridge: 0xa
Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_NEIGHBOR:Switch Id: 0], Failed to lookup port for Neighbor MAC from L2 FDB, Err: -16.
Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ROUTE:HAL-RT-NDI], Failed to add : host: 172.16.2.5 mac_addr: 00:00:00:00:00:00, state: 3, port: 48 status:0x1 NPU status:0 unit:0 rif:0x600000000000a action: Drop Err -1845428240
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port
Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Port 0x1010000000000 is not a valid logical port
Jun 25 18:46:26 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_PORT:Switch Id: 0], Attr get for port id 0x1010000000000's attr index 0 attr id 8 failed with err -19
Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_FDB:Switch Id: 0], FDB Entry not found for MAC: ff:ff:ff:ff:ff:ff vlan/bridge: 0xa
Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_NEIGHBOR:Switch Id: 0], Failed to lookup port for Neighbor MAC from L2 FDB, Err: -16.
Jun 25 18:46:31 opx23_vm opx_nas_daemon[2832]: [ROUTE:HAL-RT-NDI], Failed to add : host: 172.16.2.5 mac_addr: 00:00:00:00:00:00, state: 3, port: 48 status:0x1 NPU status:0 unit:0 rif:0x600000000000a action: Drop Err -1845428240
—
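Because most of these messages repeat every few seconds, grouping them by their bracketed tag makes the distinct failures easier to see. The sketch below is Python and assumes the `opx_nas_daemon[pid]: [TAG], message` layout of the lines above; `count_by_tag` is an illustrative name, not an OPX tool.

```python
import re
from collections import Counter

# Captures the first bracketed tag after the daemon name, for example
# "ROUTE:HAL-RT-NDI" or "ev_log_t_SAI_FDB:Switch Id: 0".
TAG_RE = re.compile(r"opx_nas_daemon\[\d+\]: \[([^\]]+)\]")


def count_by_tag(lines):
    """Count syslog lines per bracketed tag; lines without a tag are skipped."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log excerpt above.
    excerpt = [
        "Jun 25 18:46:18 opx23_vm opx_nas_daemon[2832]: [ev_log_t_SAI_FDB:Switch Id: 0], FDB Entry not found for MAC: ff:ff:ff:ff:ff:ff vlan/bridge: 0xa",
        "Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present",
        "Jun 25 18:46:21 opx23_vm opx_nas_daemon[2832]: [INTERFACE:NAS-COM-INT-GET], Get request handler not present",
    ]
    for tag, count in count_by_tag(excerpt).most_common():
        print(f"{count:4d}  {tag}")
```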
On the OPX side, the ARP entry for the server interface does not seem to have been populated:
root@leaf-2:/var/log# ip neigh
…
172.16.2.5 dev br10 INCOMPLETE
…
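When the neighbor table is long, the unresolved entries can be picked out of saved `ip neigh` output. This is a minimal Python sketch, assuming the usual `<ip> dev <ifname> [lladdr <mac>] <STATE>` layout; `unresolved_neighbors` is an illustrative name, and the second sample line is hypothetical rather than taken from this system.

```python
UNRESOLVED_STATES = {"INCOMPLETE", "FAILED"}


def unresolved_neighbors(ip_neigh_output):
    """Return (ip, device) pairs whose neighbor state shows no resolved MAC."""
    unresolved = []
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Expected shape: <ip> dev <ifname> [lladdr <mac>] <STATE>
        if len(fields) >= 4 and fields[1] == "dev" and fields[-1] in UNRESOLVED_STATES:
            unresolved.append((fields[0], fields[2]))
    return unresolved


if __name__ == "__main__":
    sample = (
        "172.16.2.5 dev br10 INCOMPLETE\n"  # entry reported in this issue
        "192.0.2.10 dev eth0 lladdr 02:00:00:00:00:01 REACHABLE\n"  # hypothetical
    )
    print(unresolved_neighbors(sample))
```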