
Implement lock free ring buffer with throttling [HUMBLE/IRON] #321

Merged: 3 commits merged into ros2 from ros2-implement_lock_free_ring_buffer_with_throttling on May 28, 2024

Conversation

@Samahu (Contributor) commented Apr 16, 2024

Related Issues & PRs

Problem Analysis

  • When adding multiple subscribers, or when running on a bad network or a low-performing system, LidarScan composition and processing can take longer than packet arrival. This leads to a build-up in the packets buffer, which is eventually exhausted, and the current thread_safe_ring_buffer::write_overwrite mechanism only made things worse. The solution is one of two things: either ensure that LidarScan composition + processing takes less time than adding new packets, or decouple packet addition from LidarScan consumption and, if the consumption thread can't keep up, throttle.

Summary of Changes

  • Implement lock free ring buffer with unit tests
  • Decouple packet reception from packet processing
  • Throttle the ring buffer as we reach 70% capacity
  • Drop packets only when the ring buffer has been completely exhausted (a sketch of this policy follows this list)
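Below is a minimal sketch of the producer-side policy described in the last two items, for illustration only. It shows a single-producer/single-consumer lock-free ring buffer where the writer backs off once the fill level passes roughly 70% and drops a packet only when the buffer is completely full. The names SpscRingBuffer and push_with_throttling, and the 100 µs back-off, are hypothetical and do not reflect the driver's actual LockFreeRingBuffer API.

```cpp
#include <array>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Minimal single-producer / single-consumer lock-free ring buffer.
// Note: one slot is kept empty to distinguish full from empty (capacity N - 1).
template <typename T, std::size_t N>
class SpscRingBuffer {
 public:
  // Producer side: fails only when the buffer is completely full.
  bool try_push(const T& item) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % N;
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    buf_[head] = item;
    head_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side: fails when the buffer is empty.
  bool try_pop(T& item) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
    item = buf_[tail];
    tail_.store((tail + 1) % N, std::memory_order_release);
    return true;
  }

  // Approximate fill level; good enough for a throttling heuristic.
  std::size_t size() const {
    const std::size_t head = head_.load(std::memory_order_acquire);
    const std::size_t tail = tail_.load(std::memory_order_acquire);
    return (head + N - tail) % N;
  }

 private:
  std::array<T, N> buf_{};
  std::atomic<std::size_t> head_{0};  // advanced only by the producer
  std::atomic<std::size_t> tail_{0};  // advanced only by the consumer
};

// Producer-side policy: back off once ~70% full, and drop a packet
// only when the buffer has been completely exhausted.
template <typename T, std::size_t N>
void push_with_throttling(SpscRingBuffer<T, N>& rb, const T& packet) {
  if (rb.size() >= (N * 7) / 10) {
    // Give the consumer (LidarScan composition) a chance to catch up.
    std::this_thread::sleep_for(std::chrono::microseconds(100));
  }
  if (!rb.try_push(packet)) {
    // Buffer full: the packet is dropped (could be counted or logged here).
  }
}
```

This sketch only illustrates the throttle/drop ordering described above; the actual implementation and its unit tests live in the PR's LockFreeRingBuffer.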

TODO(s):

  • Replace the ThreadSafeRingBuffer with the LockFreeRingBuffer implementation within os_sensor_nodelet.
    • Alternatively, forward the packets directly to LidarScan without a staging buffer.

Validation

  • Run the driver as usual with RVIZ and add multiple subscribers to /ouster/points
  • Verify that the problem with missing columns is eliminated or reduced.
  • Observe that the point cloud publish frequency isn't significantly affected.

@Samahu Samahu added the enhancement New feature or request label Apr 16, 2024
@Samahu Samahu self-assigned this Apr 16, 2024
@Samahu Samahu changed the base branch from master to ros2 April 16, 2024 15:01
@Samahu Samahu changed the title WIP: Implement lock free ring buffer with throttling [humble/iron] WIP: Implement lock free ring buffer with throttling [HUMBLE/IRON] Apr 16, 2024
@Samahu Samahu changed the title WIP: Implement lock free ring buffer with throttling [HUMBLE/IRON] Implement lock free ring buffer with throttling [HUMBLE/IRON] Apr 16, 2024
@TillBeemelmanns commented Apr 23, 2024

We tested both this branch and the attempt in #302 to solve the problem with missing parts of the point cloud. We set the recv buffer size to the following values

sudo sysctl -w net.core.rmem_default=26214400
sudo sysctl -w net.core.rmem_max=26214400

on the host machine and also set proc_mask:="PCL". However, in all cases we observed partially missing point clouds as soon as we added a second subscriber node to the driver. That applied to the resolutions 1024x20, 2048x10, and 4096x5; for the other resolutions (512x10, 512x20, 1024x10) we observed a significant delay in the point cloud without visible packet loss.

For all resolutions, the point cloud looks fine if only RVIZ is used for visualization (single subscriber).
The host is a high performance workstation with 64 cores and 128 GB RAM.

@marioney

We also tested this PR with an OS-1-128 Rev 7 and it fixed the missing points issue for us, running ROS Humble in Ubuntu 22.04.

We tested both 1024x10 and 1024x20 modes and the issue was not present in any of those.

@Samahu (Contributor, Author) commented Apr 24, 2024

@TillBeemelmanns @marioney thank you both for the feedback.

@TillBeemelmanns sorry that this didn't help in your situation. I still have one TODO item that I expect will help with the delay and might improve things overall. I should have this PR completed and merged by next week.

@mkiael commented May 27, 2024

Tested on OS0-128 Rev C running ROS Iron on Ubuntu 22.04. Solves issue with missing data.

Great work!

@Samahu (Contributor, Author) commented May 28, 2024

Thanks @mkiael for the feedback. I will proceed with the PR as is, since it resolves the issue for most driver users, and will address the remaining suggested improvements in a separate PR.

@Samahu Samahu added the bug Something isn't working label May 28, 2024
@Samahu Samahu marked this pull request as ready for review May 28, 2024 16:26
@Samahu Samahu merged commit 81b425a into ros2 May 28, 2024
3 checks passed
@Samahu Samahu deleted the ros2-implement_lock_free_ring_buffer_with_throttling branch May 28, 2024 16:28
@James-R-Han

Hi @Samahu!

Following up on this thread: similar to @TillBeemelmanns, after the PR I still encounter a drop of about 40% of the point cloud when I have 2 subscribers, and no drop when only RVIZ is subscribed (single subscriber). Increasing the buffer unfortunately did not solve this problem. I am also running with minimized bandwidth configurations: point_type=xyzi, proc_mask="PCL", and UDP Profile Lidar = RNG15_RFL8_NIR8.

Are there any updates regarding this problem?

@Samahu (Contributor, Author) commented Jul 30, 2024

Hi @James-R-Han, no more updates yet. Have you tried a different DDS middleware?

@James-R-Han

The issue does not occur with FastRTPS; CycloneDDS has this dropping issue. But as per other discussions, with FastRTPS we cannot hit 10 Hz for the point cloud topic.

Also I have tried the galactic branch with CycloneDDS and I encounter the same issue.

@Samahu (Contributor, Author) commented Jul 30, 2024

I do have further improvements that I want to implement which should help with the situation but can't give you an ETA yet.

Labels: bug (Something isn't working), enhancement (New feature or request)
Projects: None yet
Development
Successfully merging this pull request may close these issues:
  • Parts of point cloud missing
  • Missing lidar chunk os1-128-u
5 participants