Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix exclusive hardware control mode switching on controller failed activation #1522

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

saikishor
Copy link
Member

If the hardware supports only one active control mode at a time and when there is a failing controller activation, ros2_control is not propagating this information to the hardware, and the hardware might have the resource locked internally. This PR allows to solve such situations as explained in #1487 and #1486.

Fixes #1487
Closes #1492

@saikishor
Copy link
Member Author

@firesurfer This PR has the fix that should solve your issue. If you can test it on your hardware and let us know, it would be great.
I could reproduce the same issue with a test and I added a fix, so that the test passes, so I'm confident that this might work for you.

@saikishor saikishor added bug backport-humble This label should be used by maintainers only! Label triggers PR backport to ROS2 humble. backport-iron labels Apr 29, 2024
Copy link

codecov bot commented Apr 29, 2024

Codecov Report

Attention: Patch coverage is 84.92063% with 19 lines in your changes missing coverage. Please review.

Project coverage is 87.96%. Comparing base (ec70ae1) to head (3a19634).

Files with missing lines Patch % Lines
..._components/test_actuator_exclusive_interfaces.cpp 83.82% 10 Missing and 1 partial ⚠️
...ailed_activate/test_controller_failed_activate.cpp 66.66% 4 Missing ⚠️
controller_manager/src/controller_manager.cpp 81.81% 1 Missing and 1 partial ⚠️
...ontroller_manager/test/test_release_interfaces.cpp 92.85% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1522      +/-   ##
==========================================
- Coverage   88.01%   87.96%   -0.05%     
==========================================
  Files         121      124       +3     
  Lines       12412    12535     +123     
  Branches     1109     1117       +8     
==========================================
+ Hits        10924    11027     +103     
- Misses       1083     1099      +16     
- Partials      405      409       +4     
Flag Coverage Δ
unittests 87.96% <84.92%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ailed_activate/test_controller_failed_activate.hpp 100.00% <100.00%> (ø)
controller_manager/src/controller_manager.cpp 78.38% <81.81%> (+0.05%) ⬆️
...ontroller_manager/test/test_release_interfaces.cpp 89.28% <92.85%> (+1.78%) ⬆️
...ailed_activate/test_controller_failed_activate.cpp 66.66% <66.66%> (ø)
..._components/test_actuator_exclusive_interfaces.cpp 83.82% <83.82%> (ø)

... and 1 file with indirect coverage changes

@firesurfer
Copy link
Contributor

@saikishor Thanks a lot. Just for my understanding: If a controller fails activation now, a command mode switch is issued so that the hardware interface can release internally locked hardware?

I try to test this patch with an iron based setup.

@firesurfer
Copy link
Contributor

@saikishor I just tested your PR and for me it didn't work so far.

It could be that I did something wrong. What I did was:

  1. Clone your repository into by workspace
  2. Rebase your branch onto iron with: git rebase fix/exclusive_hw_interface_switching --onto=iron
  3. Build it with: --allow-overriding controller_interface controller_manager hardware_interface ros2_control_test_assets hardware_interface_testing
  4. Change my controller to fail activation
  on_activate()....{
      release_interfaces();
      return controller_interface::CallbackReturn::FAILURE;
  1. Source it and run it

When I activate the controller it fails as it should. But when I then try to load another controller I run into the same issue as before - the interfaces haven't been stopped.

@saikishor
Copy link
Member Author

@saikishor Thanks a lot. Just for my understanding: If a controller fails activation now, a command mode switch is issued so that the hardware interface can release internally locked hardware?

I try to test this patch with an iron based setup.

@firesurfer yes you are right! The hardware will be told to stop the interfaces that are started for the hardware that the controller needed. This should be good for the hardware component to internally release the locked control modes

@saikishor
Copy link
Member Author

@firesurfer can you share the logs here?. Can you print something in your hardware interfaces to see if you receive the stop interfaces?. Basically, if your controller fails activation, it should immediately receive to release interfaces even before activating a new one. If you go over tests, I'm testing exactly the same

@saikishor
Copy link
Member Author

@firesurfer for initial testing, maybe you can remove the installed dependencies and this might help. For me, after removing the installed dependencies, it worked

@firesurfer
Copy link
Contributor

So I removed the installed packages:
sudo dpkg --force-all --remove ros-iron-controller-interface ros-iron-controller-manager ros-iron-hardware-interface just to make sure the ones from the workspace are used.

Log while failing activation:

[ros2_control_node-14] [ERROR] [1714461945.456711669] [controller_manager]: After activation, controller 'my_controller' is in state 'inactive' (2), expected 'active' (3).

Log during activation of a ForwardController for the same joints after the other one has failed:

[ros2_control_node-14] [ERROR] [1714461952.579709743] [my_hardware]: Joint already in used: my_joint [ros2_control_node-14] [ERROR] [1714461952.579871675] [resource_manager]: Component 'my_hardware' did not accept command interfaces combination:

I added a log message in the perform_command_mode_switch and did not get any log message.

I am not sure if perhaps rebasing on Iron broke something.

@saikishor
Copy link
Member Author

So I removed the installed packages: sudo dpkg --force-all --remove ros-iron-controller-interface ros-iron-controller-manager ros-iron-hardware-interface just to make sure the ones from the workspace are used.

Log while failing activation:

[ros2_control_node-14] [ERROR] [1714461945.456711669] [controller_manager]: After activation, controller 'my_controller' is in state 'inactive' (2), expected 'active' (3).

Log during activation of a ForwardController for the same joints after the other one has failed:

[ros2_control_node-14] [ERROR] [1714461952.579709743] [my_hardware]: Joint already in used: my_joint [ros2_control_node-14] [ERROR] [1714461952.579871675] [resource_manager]: Component 'my_hardware' did not accept command interfaces combination:

I added a log message in the perform_command_mode_switch and did not get any log message.

I am not sure if perhaps rebasing on Iron broke something.

Hello @firesurfer!

It seems like you are not using the changes, because If you use the changes of this branch, your error should be like the following

[ros2_control_node-14] [ERROR] [1714461945.456711669] [controller_manager]: After activation, controller 'my_controller' is in state 'inactive' (2), expected 'active' (3). Releasing Interfaces!

RCLCPP_ERROR(
get_logger(),
"After activation, controller '%s' is in state '%s' (%d), expected '%s' (%d). Releasing "
"interfaces!",
controller->get_node()->get_name(), new_state.label().c_str(), new_state.id(),
hardware_interface::lifecycle_state_names::ACTIVE,
lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE);

In your error log, it doesn't mention about the Releasing interfaces, so please crosscheck your setup.

Thank you!

@firesurfer
Copy link
Contributor

@saikishor I made a mistake while rebasing onto iron. When I do it properly I run into a lots of conflicts. When I try to resolve them it doesn't compile anymore.

Do you have an recommendations how to easily test this with an iron setup ?

@christophfroehlich
Copy link
Contributor

christophfroehlich commented Apr 30, 2024

You should be able to compile the rolling stack (also this PR) on your iron distro directly. override should also work, no need to uninstall your binary install. (check with ros2 pkg xml ros2_control -t version)
edit: can be that you have to patch your hardware component to the new API

@saikishor
Copy link
Member Author

saikishor commented Apr 30, 2024

You should be able to compile the rolling stack (also this PR) on your iron distro directly. override should also work, no need to uninstall your binary install. (check with ros2 pkg xml ros2_control -t version) edit: can be that you have to patch your hardware component to the new API

Thank you @christophfroehlich. Yes, you can do this. In case you continue to have issues, then try to just cherry-pick the last 2 commits of this PR (77e932e and 4a4d780) onto your iron branch

@firesurfer
Copy link
Contributor

@saikishor Thanks. Cherry-picking worked :)

When I now have the controller failing I run into an endless loop of ros2control trying to activating the controller - failing - stopping the interfaces - starting them again - failing again...

[ros2_control_node-14] [ERROR] [1714469097.477339381] [controller_manager]: After activation, controller 'my_controller' is in state 'inactive' (2), expected 'active' (3). Releasing interfaces!

But I can confirm. The interfaces are stopped!.

@saikishor
Copy link
Member Author

@saikishor Thanks. Cherry-picking worked :)

When I now have the controller failing I run into an endless loop of ros2control trying to activating the controller - failing - stopping the interfaces - starting them again - failing again...

[ros2_control_node-14] [ERROR] [1714469097.477339381] [controller_manager]: After activation, controller 'my_controller' is in state 'inactive' (2), expected 'active' (3). Releasing interfaces!

But I can confirm. The interfaces are stopped!.

@firesurfer I'm glad that the fix worked for you. If that's the case, do you mind reviewing this PR and approving it?

@firesurfer
Copy link
Contributor

@saikishor shouldn't be the issue of running into an endless loop of activation -> failing... be addressed first?

@saikishor
Copy link
Member Author

saikishor commented Apr 30, 2024

@saikishor shouldn't be the issue of running into an endless loop of activation -> failing... be addressed first?

@firesurfer I don't really see how there is an endless loop. CM never reactivates or tries if it is failing. Maybe the print you have added is printing many times, because you have this component per joint and it is trying to iterate over the HW components to do the prepare and perform switch. This is the behavior inside the Resource Manager and I'm not touching that part.

EDIT: How many times does it print the ....Releasing Interfaces! error?

@saikishor
Copy link
Member Author

@firesurfer I can confirm after testing it again on our setup that the fix works and there is no looping activation from the CM side.

Thank you

@firesurfer
Copy link
Contributor

@saikishor

EDIT: How many times does it print the ....Releasing Interfaces! error?

As I am seeing the activation failure message again and again (of the controller) I would say this is not just because of the printout in the hardware_interface. I had it running for like 10-20s before killing the process.

I am wondering if that could be because I cherry-picked it on the iron branch?

@saikishor
Copy link
Member Author

saikishor commented May 2, 2024

As I am seeing the activation failure message again and again (of the controller) I would say this is not just because of the printout in the hardware_interface. I had it running for like 10-20s before killing the process.

I am wondering if that could be because I cherry-picked it on the iron branch?

That would be strange to have this in the iron branch. Can you share some logs?

EDIT: Can you also share how you are trying to activate this controller?

@firesurfer
Copy link
Contributor

@saikishor I just tested in a clean dockerized environment instead of my host installation and it seems to work fine.

I will review and approve the PR then.

@saikishor
Copy link
Member Author

@bmagyar and @saikishor I even vote that we fobid failing of controllers on activation because of keeping up with the real-time capabilities. This is the reason why we have configure in the first place. Allowing this, I forsee, as elaborated earlier, opening pandora box of funky management of stuff in the update loop.

I see your point. However, the thing of failing or not is not in our hand with user controllers, moreover, the LifeCycleNode allows this kind of behavior. So, I think it is reasonable to have this fix. We can warn the users that if this situation arises, the realtimeness is not assured anymore.

Copy link
Contributor

mergify bot commented Aug 21, 2024

This pull request is in conflict. Could you fix it @saikishor?

@saikishor saikishor force-pushed the fix/exclusive_hw_interface_switching branch from 7755df0 to fc2dfe3 Compare September 26, 2024 09:06
@saikishor saikishor force-pushed the fix/exclusive_hw_interface_switching branch from fc2dfe3 to f04b680 Compare October 8, 2024 15:48
Copy link
Contributor

@christophfroehlich christophfroehlich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reported issue seams reasonable for me to fix, which LGTM except for some copy-paste remnants in the comments.

controller_manager/test/test_release_interfaces.cpp Outdated Show resolved Hide resolved
controller_manager/test/test_release_interfaces.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

mergify bot commented Jan 4, 2025

This pull request is in conflict. Could you fix it @saikishor?

@saikishor saikishor force-pushed the fix/exclusive_hw_interface_switching branch from 5da7a31 to 381e277 Compare January 6, 2025 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-humble This label should be used by maintainers only! Label triggers PR backport to ROS2 humble. bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling of exclusive command interfaces
5 participants