segfault on running example application #142

Open
harry-van-haaren opened this issue Nov 4, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@harry-van-haaren

Hi Capsule Community,

First-time contributor here! Just finished reading the contrib guidelines & code-of-conduct, that's how new I am :)

Describe the bug?

The pktdump example application segfaults when it starts to rx traffic; gdb indicates that a DPDK dev->data structure is NULL.

Steps to reproduce?

  1. Following example instructions for pktdump example here: https://github.com/capsule-rs/capsule/tree/410696acb2e033cabb287750b36810753b54a59e/examples/pktdump

  2. Download the DPDK 19.11.10 (LTS) tarball with wget: http://fast.dpdk.org/rel/dpdk-19.11.10.tar.xz

  3. Extract, build and install DPDK with meson tooling: meson build_gcc && cd build_gcc && meson configure -Dprefix=/usr && ninja install (Warning: this will install DPDK 19.11 system-wide).

  4. cd /examples/pktdump and then execute cargo run -- -f pktdump.toml just like in the documentation.
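
The steps above can be collected into one script. This is only a sketch under stated assumptions: it prints each command by default and executes them only when `RUN=1` is set, the `CAPSULE_DIR` variable and the `dpdk-src` directory name are not from the report, and `ninja install` may need root privileges.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps; set RUN=1 to actually execute.
set -eu

DPDK_URL="http://fast.dpdk.org/rel/dpdk-19.11.10.tar.xz"
# Assumed location of the Capsule checkout (not stated in the report).
CAPSULE_DIR="${CAPSULE_DIR:-$PWD/capsule}"

# Print a command, and execute it only when RUN=1.
step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

repro() {
  step wget "$DPDK_URL"
  step mkdir -p dpdk-src
  # --strip-components avoids depending on the tarball's top-level directory name.
  step tar -xf dpdk-19.11.10.tar.xz -C dpdk-src --strip-components=1
  step cd dpdk-src
  step meson build_gcc
  step cd build_gcc
  step meson configure -Dprefix=/usr
  # Installs DPDK 19.11 system-wide.
  step ninja install
  step cd "$CAPSULE_DIR/examples/pktdump"
  step cargo run -- -f pktdump.toml
}

repro
```

Keeping the default as a dry run makes it easy to compare the exact command sequence between two machines before anything is installed.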

Expected behavior?

Application runs without segfault

Capsule version?

git master @ 410696a

OS?

Linux (Ubuntu/Debian based)

Docker / VM / Bare?

"Baremetal"

Stack trace or error log output

Thread 1 "pktdump" received signal SIGSEGV, Segmentation fault.
rte_eth_rx_burst (port_id=4, queue_id=0, rx_pkts=0x555555ed6260, nb_pkts=32) at /usr/local/include/rte_ethdev.h:4849
4849            nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
(gdb) p dev->data
$1 = (struct rte_eth_dev_data *) 0x0
(gdb) p dev
$2 = (struct rte_eth_dev *) 0x7ffff7e5d600 <rte_eth_devices+66304>
(gdb) p dev
$3 = (struct rte_eth_dev *) 0x7ffff7e5d600 <rte_eth_devices+66304>
(gdb) p *dev
$4 = {rx_pkt_burst = 0x7ffff686b780 <eth_pcap_rx>, tx_pkt_burst = 0x7ffff686b3b0 <eth_pcap_tx>, tx_pkt_prepare = 0x0,
  rx_queue_count = 0x17ffb9240, rx_descriptor_done = 0x17e75e6c0, rx_descriptor_status = 0x7ffff68709c0 <ops>,
  tx_descriptor_status = 0x555555fdd900, data = 0x0, process_private = 0x0, dev_ops = 0x7ffff7e5d640 <rte_eth_devices+66368>,
  device = 0x0, intr_handle = 0x0, link_intr_cbs = {tqh_first = 0x0, tqh_last = 0x0}, post_rx_burst_cbs = {
    0x0 <repeats 1024 times>}, pre_tx_burst_cbs = {0x0 <repeats 1020 times>, 0x1, 0x0, 0x0, 0x0}, state = RTE_ETH_DEV_UNUSED,
  security_ctx = 0x0, reserved_64s = {0, 0, 0, 0}, reserved_ptrs = {0x0, 0x0, 0x0, 0x0}}
(gdb) p *dev->data
Cannot access memory at address 0x0

Somehow, Capsule/DPDK is not initializing dev->data correctly. I'm pretty familiar with DPDK and checked a few things in GDB. I tested this with a real hardware Ethernet NIC as well as with the net_pcap software NIC, and both show the same issue, so I'm convinced this is not a DPDK PMD init bug. Perhaps the problem is somewhere in the Rust/DPDK bindings or config?
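
One detail in the trace may be worth checking: `rte_eth_rx_burst` is compiled from `/usr/local/include/rte_ethdev.h`, while step 3 installs DPDK under `/usr`. A possible explanation, not confirmed anywhere in this thread, is that two DPDK installs coexist and headers and libraries from different versions get mixed. A minimal sketch for listing every copy of the header under a set of prefixes (the function name `find_dpdk_headers` is hypothetical):

```shell
#!/bin/sh
# List every rte_ethdev.h found under the given install prefixes.
# More than one hit suggests that several DPDK installs coexist.
find_dpdk_headers() {
  for prefix in "$@"; do
    if [ -d "$prefix/include" ]; then
      find "$prefix/include" -name rte_ethdev.h 2>/dev/null
    fi
  done
}

find_dpdk_headers /usr /usr/local
```

If more than one path is printed, comparing it with the version reported by `pkg-config --modversion libdpdk` would show which install the build is likely to pick up.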

If somebody can try to validate this same build/config, and report back working/not-working that would be very helpful.
Regards, -Harry

@harry-van-haaren harry-van-haaren added the bug Something isn't working label Nov 4, 2021
@drunkirishcoder
Contributor

Hey @harry-van-haaren, thank you for reporting the issue. The only difference I can see is that the DPDK version we tested the bare-metal setup with was 19.11.6. I wonder if something changed in the bug fixes between that version and the most recent 19.11.10. We will give it a spin and let you know what we find.

@harry-van-haaren
Author

Hey Daniel, yeah, thanks. Getting a repro would be a good way to error-check my setup here. Typically DPDK LTS releases (19.11.x to 19.11.y) are meant to be stable, with bugfix backports only and no regression potential... so I hope it's not that!

Thanks for the prompt reply. Let's see what a test there gives and identify next steps from there.

@drunkirishcoder
Contributor

I don't have access to a true bare-metal Linux box, so I did the closest equivalent I could: testing in a VirtualBox VM running Debian. I bumped the version of DPDK from 19.11.6 to 19.11.10 to match what you used. The result is that I am not able to reproduce this error. No segfault. In our controlled virtualized environment, it doesn't look like there are binding or compatibility issues with 19.11.10.

this is my VM

root@buster:/vagrant/capsule/examples/pktdump# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster
root@buster:/vagrant/capsule/examples/pktdump# uname -r
4.19.0-18-amd64

Would you be able to run the Vagrant/VirtualBox setup to validate that the DPDK binding works? Then maybe we can try to dig into why it's not working on your physical box.

@harry-van-haaren
Author

OK - thanks for reporting back - yes I can get a clean setup and see what's going on. Thanks for testing & reporting back!

@drunkirishcoder
Contributor

yeah no problem. please let us know how we can help or if there's indeed a hard to find bug somewhere.

@harry-van-haaren
Author

I cannot get the VirtualBox or Docker based images working on an Ubuntu 21.10 platform:

Vagrant: didn't manage to install docker appropriately, so when attempting to run the docker ... command, it just replied no such command "docker"

Docker (native on Linux)

docker pull getcapsule/dpdk-devbind:19.11.6
Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

@drunkirishcoder
Contributor

I've only run Vagrant on macOS. I'm not sure how well it works on a Linux distro or what type of problem you'd run into. Do you have a log of the Vagrant output? Maybe we can take a look at what's not working.

Regarding Docker on native Linux, are you able to pull any other images? That looks like a connection timeout to Docker Hub.
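
Whether the timeout is specific to this image or affects the registry as a whole can be checked without Docker. The sketch below probes the endpoint named in the error message with `curl`. The helper names are hypothetical, and the reading of the status codes is an assumption: an unauthenticated request to a reachable registry normally gets `401`, and `curl` reports `000` when no HTTP response arrived.

```shell
#!/bin/sh
# Probe the registry endpoint named in the error message.
REGISTRY_URL="https://registry-1.docker.io/v2/"

# Print the HTTP status code, or 000 if no response arrived within 10 s.
probe_registry() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$REGISTRY_URL" || true
}

# Turn a status code into a short diagnosis.
classify_status() {
  case "$1" in
    000) echo "unreachable: check proxy, DNS, or firewall settings" ;;
    200|401) echo "reachable: the registry answered" ;;
    *) echo "unexpected status $1" ;;
  esac
}
```

Running `classify_status "$(probe_registry)"` on the affected machine gives a one-line answer. If the registry is reachable with `curl` but `docker pull` still times out, the Docker daemon's own proxy configuration would be the next thing to look at.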

@zeeshanlakhani
Member

@harry-van-haaren did you finally get things to work?

@harry-van-haaren
Author

Unfortunately not. I did try the Vagrant option (as per #142 (comment)), which also did not work. However, the networking setup here is not standard, so it's not conclusive. I only ran the simple commands and didn't get to debugging it in detail, and unfortunately won't get to it in the next few weeks either.

@zeeshanlakhani
Member

@harry-van-haaren I'll try this on a bare-metal Linux box this week and see.
