
[Jazzy] JTC fails with - Caught exception in callback for transition 10 #1389

Closed
firesurfer opened this issue Nov 25, 2024 · 16 comments

@firesurfer
Contributor

firesurfer commented Nov 25, 2024

Describe the bug

Loading and activating a JTC fails with:

[ros2_control_node-19] [ERROR] [1732535679.279340867] [joint_trajectory_controller_top]: Caught exception in callback for transition 10
[ros2_control_node-19] [ERROR] [1732535679.279449259] [joint_trajectory_controller_top]: Original error: Resource temporarily unavailable
[ros2_control_node-19] [WARN] [1732535679.279520632] [joint_trajectory_controller_top]: Error occurred while doing error handling.
[ros2_control_node-19] [ERROR] [1732535679.279555908] [controller_manager]: After configuring, controller 'joint_trajectory_controller_top' is in state 'unconfigured' , expected inactive.

From my controllers.yaml

...
joint_trajectory_controller_top:
      type: joint_trajectory_controller/JointTrajectoryController
....

joint_trajectory_controller_top:
  ros__parameters:
    joints:
      - ur_top/shoulder_pan_joint
      - ur_top/shoulder_lift_joint
      - ur_top/elbow_joint
      - ur_top/wrist_1_joint
      - ur_top/wrist_2_joint
      - ur_top/wrist_3_joint
    command_interfaces:
      - position
    state_interfaces:
      - position
      - velocity
    state_publish_rate: 100.0
    action_monitor_rate: 20.0
    allow_partial_joints_goal: false
    constraints:
      stopped_velocity_tolerance: 0.1
      goal_time: 0.05
      ur_top/shoulder_pan_joint: { trajectory: 0.2, goal: 0.0015 }
      ur_top/shoulder_lift_joint: { trajectory: 0.2, goal: 0.0015 }
      ur_top/elbow_joint: { trajectory: 0.2, goal: 0.0015 }
      ur_top/wrist_1_joint: { trajectory: 0.2, goal: 0.0015 }
      ur_top/wrist_2_joint: { trajectory: 0.2, goal: 0.0015 }
      ur_top/wrist_3_joint: { trajectory: 0.2, goal: 0.0015 }



Everything was mostly working with a Jazzy binary installation before I updated to the latest Jazzy release; the only issue that occurred from time to time was ros-controls/ros2_control#1883.

Environment (please complete the following information):

  • Version: Jazzy, binary; controller_manager 4.20.0, JTC 4.16.0
  • UR packages are built from source, with minor changes to the ScaledJTC, but the issue also occurs with the default JTC
  • Running in a docker container on an Ubuntu 24.04 host (host networking)
  • Running with mocked hardware

EDIT:
I did a bit of additional research. It looks like "Resource temporarily unavailable" is usually issued by the operating system.
But even after reading quite a bit of code I have no clue why this error should occur in the configure method of a controller.
I am running the system with mocked hardware at the moment.

@firesurfer
Contributor Author

I just did some more testing:

  1. In some cases loading and activating a controller works, but loading the next controller does not.
  2. When trying to configure and activate a controller that previously failed, the ros2_control_node crashes with:
[ros2_control_node-19] terminate called after throwing an instance of 'St9bad_alloc'
[ros2_control_node-19]   what():  std::bad_alloc
[ros2_control_node-19] terminate called recursively

@firesurfer
Contributor Author

I am one step further:

I can configure and activate a JTC now by disabling locked memory for the controller manager:

controller_manager:
  ros__parameters:
    update_rate: 500
    lock_memory: false

@saikishor I think you added the memlock; perhaps you have some insights?

@saikishor
Member

> I am one step further:
>
> I can configure and activate a JTC now by disabling locked memory for the controller manager:
>
> controller_manager:
>   ros__parameters:
>     update_rate: 500
>     lock_memory: false
>
> @saikishor I think you added the memlock; perhaps you have some insights?

Can you share the output of ulimit -a?

@firesurfer
Contributor Author

firesurfer commented Nov 26, 2024

ulimit -a in my container:

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 111518
max locked memory           (kbytes, -l) 3577496
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1048576
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 99
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 111518
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

And on my host:

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 111518
max locked memory           (kbytes, -l) 3577496
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 99
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 111518
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

I am using rootless podman.

@saikishor
Member

The max locked memory should be unlimited. You can set that on your system; the requirements and setup steps for locked memory are documented online.
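For reference, one common way to raise the limit permanently on the host is an entry in /etc/security/limits.conf (the user name below is a placeholder; you need to log out and back in for it to take effect):

<your-user>    -    memlock    unlimited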

@saikishor
Member

saikishor commented Nov 26, 2024

Ideally, if you run your ros2_control_node as a privileged process, the mlockall call shouldn't be subject to a limit.
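In a container setup that would presumably mean granting the process CAP_IPC_LOCK, e.g. something along the lines of (the image name is a placeholder):

podman run --cap-add=IPC_LOCK --ulimit memlock=-1:-1 <your-image>

mlock(2) documents that RLIMIT_MEMLOCK places no limit on a process holding CAP_IPC_LOCK. Note that with rootless podman the capability may not be honoured for this check, so raising the host limit as described above is likely the more reliable route.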

@firesurfer
Contributor Author

I also tried --ulimit memlock=-1:-1 as an argument for podman. That improved the situation: loading some controllers now works fine.

> The max locked memory should be unlimited. You can set that on your system; the requirements and setup steps for locked memory are documented online.

I will try that tomorrow and let you know.

Just as a comment from my side:

I have a default Ubuntu 24.04 installation. If a default Ubuntu installation doesn't play well with the default parameters of ros2_control, it might be a good idea to have lock_memory default to false. (The parameter is, by the way, also still missing from the documentation: https://control.ros.org/rolling/doc/ros2_control/controller_manager/doc/userdoc.html)

@saikishor
Member

You are right, it might make sense to have it false by default for this case. Can you test tomorrow with the increased limit and see if that solves your issue?

@firesurfer
Contributor Author

@saikishor I increased my locked memory limit, not to unlimited but to 64 GB, and it seems to work fine now.

max locked memory (kbytes, -l) 64000000

I also made sure that I removed the lock_memory entry from my controllers.yaml.

@saikishor
Member

@firesurfer do you mind reviewing the PR on the ros2_control side and approving it, if it looks good to you?

@firesurfer
Contributor Author

Already at it.

What still puzzles me is the error I got there.

Original error: Resource temporarily unavailable

Which corresponds to: EAGAIN

From my understanding, the only thing that could fail due to insufficient locked memory is some memory allocation, but according to the documentation the only errno value set by malloc or calloc is ENOMEM.

@saikishor
Member

> Already at it.
>
> What still puzzles me is the error I got there.
>
> Original error: Resource temporarily unavailable
>
> Which corresponds to: EAGAIN
>
> From my understanding, the only thing that could fail due to insufficient locked memory is some memory allocation, but according to the documentation the only errno value set by malloc or calloc is ENOMEM.

Yeah, that is weird.

@firesurfer
Contributor Author

I found the explanation:
https://stackoverflow.com/questions/34165276/malloc-setting-errno-to-eagain

Apparently the EAGAIN comes from mmap, which malloc uses internally; when that mmap call fails because of the locked-memory limit, malloc returns NULL and the EAGAIN that mmap set is what ends up being reported.
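For anyone hitting this later, here is a minimal, standalone C sketch (not taken from ros2_control; the 1 MiB limit and the 64 MiB mapping are arbitrary values chosen just to force the failure) that reproduces the same errno under a small locked-memory limit:

/* build: gcc memlock_eagain.c -o memlock_eagain */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
    /* Cap locked memory at 1 MiB so the failure shows up even on a host
       with a generous default ulimit -l. */
    struct rlimit rl = { 1 << 20, 1 << 20 };
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* lock_memory: true makes the controller_manager lock current and future
       pages; MCL_FUTURE alone is enough to trigger the allocation failure. */
    if (mlockall(MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    /* Every new mapping now counts against RLIMIT_MEMLOCK. A mapping larger
       than the limit is rejected with EAGAIN, i.e. "Resource temporarily
       unavailable", the same text as in the controller log. */
    void *p = mmap(NULL, (size_t)64 * 1024 * 1024, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        printf("mmap failed: %s (errno=%d)\n", strerror(errno), errno);
        return 1;
    }

    puts("mmap succeeded");
    munmap(p, (size_t)64 * 1024 * 1024);
    return 0;
}

In the controller this presumably surfaces through some allocation during configuration that ends up in mmap, which would explain why the exception text is EAGAIN ("Resource temporarily unavailable") rather than ENOMEM.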

@qihemu

qihemu commented Dec 17, 2024

> I am one step further:
>
> I can configure and activate a JTC now by disabling locked memory for the controller manager:
>
> controller_manager:
>   ros__parameters:
>     update_rate: 500
>     lock_memory: false
>
> @saikishor I think you added the memlock; perhaps you have some insights?

Thank you so much! It's really useful and works on my system!

@TonyInvent

Thank you all. I ran into the same problem on a Raspberry Pi 5 with 8 GB RAM. Adding lock_memory: false to ros2_control.yaml under the config folder fixed the problem.

@saikishor
Member

> Thank you all. I ran into the same problem on a Raspberry Pi 5 with 8 GB RAM. Adding lock_memory: false to ros2_control.yaml under the config folder fixed the problem.

Hello @TonyInvent!

We have changed the logic in ros-controls/ros2_control#1896 so that the parameter defaults to true when the realtime system is properly set up and to false otherwise, so you won't have to deal with it anymore. Thanks for reporting that it solved your issue.
