Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Memory Leak in Actions Using EventsExecutor #2661

Open
wants to merge 3 commits into
base: rolling
Choose a base branch
from

Conversation

mauropasse
Copy link
Collaborator

When using ROS2 Actions with the EventsExecutor, completed goals are not automatically cleaned up because execute_check_expired_goals() is not called.

As a result, goal results remain indefinitely stored, leading to increased memory usage with each completed goal.

In contrast, the SingleThreadedExecutor manages goal expiration using the server "goal expire" timeout on wait_for_work(<timeout>).

To address this issue, I'm creating a one-shot timer on the server (only for EventsExecutor) that triggers after a goal expires, to handle cleanup.

While other solutions may exist, this PR is primarily to discuss the issue and explore options.

rclcpp_action/include/rclcpp_action/server.hpp Outdated Show resolved Hide resolved
rclcpp_action/src/server.cpp Outdated Show resolved Hide resolved
rclcpp_action/src/server.cpp Outdated Show resolved Hide resolved
@mauropasse
Copy link
Collaborator Author

mauropasse commented Nov 5, 2024

My fix is a workaround for something that might not be needed:

In ROS2 actions, the user first sends a goal request and gets an accepted/rejected response. Then they must send another request to get the result with action_client->async_get_result(handle)

  • First client/service pair: goal request and accepted response.
  • Second client/service pair: result request and result.

To support this, the server must store the result until expiration timeout, giving time for the client to request the result. The result is not cleaned when the client requests the result, it is cleaned using a timer (default expiration is 10 seconds).

We at iRobot think it may be better for the server to send the result directly to the client as soon as it's ready, avoiding the need for storage and cleanup.

This could work by replacing the second client/service pair with a simple publish-subscribe model, where the server publishes the result as soon as it’s available:

  • Action Server Publishing: The server publishes the result immediately when it’s ready.
  • Action Client Subscription: The client subscribes to the result topic and, upon receiving the result, fulfills a promise.
  • Maintained API Consistency: On the client side, this setup allows the use of result_future.get() and spin_until_future_complete() as usual, so the change in the underlying mechanism doesn’t affect the client’s overall API.

What do you think?

@fujitatomoya: can we keep the timer object and recalculate the timeout based on the current caching results?

Yes, that's a good improvement for my fix implem.

Mauro Passerino added 2 commits November 5, 2024 14:04
Signed-off-by: Mauro Passerino <[email protected]>
Signed-off-by: Mauro Passerino <[email protected]>
@mjcarroll
Copy link
Member

Perhaps something worth bringing up in Client Library working group. CC: @alsora and @jmachowinski

@jmachowinski
Copy link
Contributor

When using ROS2 Actions with the EventsExecutor, completed goals are not automatically cleaned up because execute_check_expired_goals() is not called.

Why does this not happen ?
This would mean, that the events system never emits 'EntityType::Expired'.

@jmachowinski
Copy link
Contributor

To answer my own question, I just had a look at the code, and found the hidden timer in rcl, that we are missing in the events part.
I wonder if we should refactor _recalculate_expire_timer in rcl and expose it and reuse it here. This would increase the chance, that the behaviors of both implementations do not diverge...

void
ServerBase::initialize_expire_goal_timer()
{
expire_goal_timer_ = rclcpp::create_wall_timer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a slight behavior difference to the rcl implementation here. RCL uses the clock of the node.
Perhaps use create_timer with the node clock ?

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/next-client-library-wg-meeting-friday-8th-november-2024/40457/1

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/next-client-library-wg-meeting-friday-8th-november-2024/40457/2

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/next-client-library-wg-meeting-friday-22nd-november-2024/40703/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants