"ubridge -e" cause a core dump on fedora workstation 36 #81

Open
kefins opened this issue May 11, 2023 · 4 comments

kefins commented May 11, 2023

Hi, guys.

I got a core dump while running "ubridge -e" on Fedora Workstation 36. Here is the detailed output.

[root@fedora ubridge]#uname -a
Linux fedora 5.17.5-300.fc36.x86_64 #1 SMP PREEMPT Thu Apr 28 15:51:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

[root@fedora ubridge]#./ubridge -e
Network device list:

Segmentation fault (core dumped)

[root@fedora ubridge]#ldd ./ubridge 
        linux-vdso.so.1 (0x00007ffed33fd000)
        libpcap.so.1 => /lib64/libpcap.so.1 (0x00007f668c9f0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f668c7ee000)
        libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007f668c7cc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f668ca54000)
        libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f668c746000)
        libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f668c722000)
Member

grossmj commented May 11, 2023

Thanks for reporting. This may be a problem with the installed libpcap or the way we call pcap_findalldevs_ex(): https://github.com/GNS3/ubridge/blob/master/src/ubridge.c#L330L354

@grossmj grossmj added the bug label May 11, 2023
@grossmj grossmj added this to the 3.0 milestone May 11, 2023
Author

kefins commented May 11, 2023

I ran the program on Fedora Workstation 36. The CYGWIN macro is not defined there, so the program calls pcap_findalldevs, and that call is what triggers the core dump.
After debugging, I found that the crash happens in the routine nlmsg_inherit in libnl, which dereferences a null pointer.

#0  nlmsg_inherit (hdr=hdr@entry=0x7fffffffdc60) at lib/msg.c:329
#1  0x00007ffff7c9c3e8 in nlmsg_alloc_simple (nlmsgtype=nlmsgtype@entry=5121, flags=flags@entry=768) at lib/msg.c:358
#2  0x00007ffff7ca0540 in nl_send_simple (sk=sk@entry=0x41b430, type=type@entry=5121, flags=flags@entry=768, buf=buf@entry=0x0, size=size@entry=0) at lib/nl.c:587
#3  0x00007ffff7d4e79d in rdmanl_get_devices (cb_func=0x7ffff7d4f300 <find_sysfs_devs_nl_cb>, data=0x7fffffffde80, nl=0x41b430) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/util/rdma_nl.c:113
#4  find_sysfs_devs_nl (tmp_sysfs_dev_list=0x7fffffffde80) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/ibdev_nl.c:200
#5  0x00007ffff7d4c362 in ibverbs_get_device_list (device_list=0x7ffff7d59010 <device_list.lto_priv>) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/init.c:560
#6  __ibv_get_device_list_1_1 (num=num@entry=0x7fffffffdee4) at /usr/src/debug/rdma-core-39.0-1.fc36.x86_64/libibverbs/device.c:74
#7  0x00007ffff7f69450 in rdmasniff_findalldevs (devlistp=0x7fffffffdf68, err_str=0x7fffffffdfd0 '#' <repeats 16 times>) at ./pcap-rdmasniff.c:437
#8  0x00007ffff7f69b02 in pcap_findalldevs (alldevsp=<optimized out>, errbuf=<optimized out>) at ./pcap.c:732
#9  0x000000000040460f in display_network_devices () at src/ubridge.c:339
#10 0x000000000040486b in main (argc=2, argv=0x7fffffffe248) at src/ubridge.c:409
(gdb) l
324     struct nl_msg *nlmsg_inherit(struct nlmsghdr *hdr)
325     {
326             struct nl_msg *nm;
327
328             nm = nlmsg_alloc();
329             if (nm && hdr) {
330                     struct nlmsghdr *new = nm->nm_nlh;
331
332                     new->nlmsg_type = hdr->nlmsg_type;
333                     new->nlmsg_flags = hdr->nlmsg_flags;

At line 332, the pointer new is null. I think this is a bug in libnl, so we should change nlmsg_alloc in lib/nl.c.
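To illustrate the failure mode above, here is a minimal sketch of a guarded copy that checks the header pointer before writing through it. The types and function names (nlmsghdr_sketch, nl_msg_sketch, alloc_without_header, inherit_guarded) are hypothetical stand-ins, not the libnl or ubridge definitions; only the field names are taken from the gdb listing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real types; field names follow the gdb listing. */
struct nlmsghdr_sketch {
    unsigned short nlmsg_type;
    unsigned short nlmsg_flags;
};

struct nl_msg_sketch {
    struct nlmsghdr_sketch *nm_nlh; /* NULL if the allocator never set it */
};

/* Mimics the failing case: a message whose header pointer is left NULL. */
struct nl_msg_sketch *alloc_without_header(void)
{
    return calloc(1, sizeof(struct nl_msg_sketch));
}

/* Mimics the expected case: a message with an allocated header. */
struct nl_msg_sketch *alloc_with_header(void)
{
    struct nl_msg_sketch *nm = calloc(1, sizeof(*nm));
    if (nm == NULL)
        return NULL;
    nm->nm_nlh = calloc(1, sizeof(*nm->nm_nlh));
    if (nm->nm_nlh == NULL) {
        free(nm);
        return NULL;
    }
    return nm;
}

void free_msg(struct nl_msg_sketch *nm)
{
    if (nm != NULL) {
        free(nm->nm_nlh);
        free(nm);
    }
}

/* Copies type and flags only when both the message and its header exist.
 * On a missing header it frees the message and returns NULL instead of
 * writing through a null pointer. */
struct nl_msg_sketch *inherit_guarded(struct nl_msg_sketch *nm,
                                      const struct nlmsghdr_sketch *hdr)
{
    if (nm == NULL)
        return NULL;
    if (nm->nm_nlh == NULL) {
        free_msg(nm);
        return NULL;
    }
    if (hdr != NULL) {
        nm->nm_nlh->nlmsg_type = hdr->nlmsg_type;
        nm->nm_nlh->nlmsg_flags = hdr->nlmsg_flags;
    }
    return nm;
}

/* Small usage check with the type and flags values from the backtrace. */
void guard_self_check(void)
{
    struct nlmsghdr_sketch hdr = { 5121, 768 };
    assert(inherit_guarded(alloc_without_header(), &hdr) == NULL);
    struct nl_msg_sketch *ok = inherit_guarded(alloc_with_header(), &hdr);
    assert(ok != NULL && ok->nm_nlh->nlmsg_type == 5121);
    free_msg(ok);
}
```

A guard like this only turns the crash into an error return. It does not explain why the header pointer was null in the first place.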

Author

kefins commented May 11, 2023

> Thanks for reporting. This may be a problem with the installed libpcap or the way we call pcap_findalldevs_ex(): https://github.com/GNS3/ubridge/blob/master/src/ubridge.c#L330L354

I finally figured it out. It is actually a linking problem: the routines in src/netlink/nl.c have the same names as routines in libnl-3, which confuses symbol resolution for libibverbs. The routines in libibverbs should call the ones in libnl-3, but because of the name clash they called the ones in src/netlink/nl.c instead, which resulted in the core dump.
So the solution I took was to rename all the routines in src/netlink/nl.c, for example by adding a ubridge_ prefix. With that change, ubridge -e lists all network devices properly.

[parkeryan@fedora gg]$ ./ubridge -e
Network device list:

  ens160 => no description
  any => Pseudo-device that captures on all interfaces
  lo => no description
  bluetooth-monitor => Bluetooth Linux Monitor
  usbmon2 => Raw USB traffic, bus number 2
  usbmon1 => Raw USB traffic, bus number 1
  usbmon0 => Raw USB traffic, all USB buses
  nflog => Linux netfilter log (NFLOG) interface
  nfqueue => Linux netfilter queue (NFQUEUE) interface
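The renaming fix described above can be sketched as follows. This is only an illustration under assumptions: the struct ubridge_nl_msg type and both function bodies are hypothetical and do not reproduce the code in src/netlink/nl.c. The point is that a function with a project-specific prefix no longer shares an exported name with a libnl-3 routine, so a shared library looking up nlmsg_alloc cannot be bound to the application's version.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical message type local to the application. */
struct ubridge_nl_msg {
    size_t size; /* payload capacity requested by the caller */
};

/* Before the rename, a function like this would have been called
 * nlmsg_alloc(), the same name as a routine exported by libnl-3.
 * The ubridge_ prefix keeps the application's symbol distinct. */
struct ubridge_nl_msg *ubridge_nlmsg_alloc(size_t size)
{
    struct ubridge_nl_msg *msg = calloc(1, sizeof(*msg));
    if (msg != NULL)
        msg->size = size;
    return msg;
}

void ubridge_nlmsg_free(struct ubridge_nl_msg *msg)
{
    free(msg);
}

/* Small usage check for the prefixed helpers. */
void prefix_self_check(void)
{
    struct ubridge_nl_msg *msg = ubridge_nlmsg_alloc(128);
    assert(msg != NULL && msg->size == 128);
    ubridge_nlmsg_free(msg);
}
```

Helpers that are used in only one source file could instead be declared static, which gives them internal linkage and keeps them out of the exported symbols entirely. Whether that applies here depends on how the routines in src/netlink/nl.c are used across files.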

Member

grossmj commented Jul 30, 2023

I am a bit confused. Where is libibverbs linked in?
