Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory leak on status update #1245

Open
trws opened this issue Jul 15, 2024 · 4 comments
Open

memory leak on status update #1245

trws opened this issue Jul 15, 2024 · 4 comments
Assignees

Comments

@trws
Copy link
Member

trws commented Jul 15, 2024

The json_t * variables R_all, R_down and R_alloc in status_request_cb are only freed by json_decref on the error path. The caches in the ctx (m_r_{all,down,alloc}) are never freed at all. Found this thanks to a 10k leak heaptrack found on one of our drained resource tests, tiny in the test, but could get larger over time if there are a lot of requests to this RPC, or a lot of resource updates, or especially both.

Plan:

  • add a lifecycle management type for json_t so this can't happen by accident anymore, either a smart pointer wrapper with json_decref as the free function or an actual functional wrapper like the janssoncpp headers provide
  • add the same for flux-core with __attribute__((cleanup)) so we can be safer about all of this

This is a priority bugfix, if I can get it together today I will, and we should try to get it deployed ASAP.

@trws trws self-assigned this Jul 15, 2024
@milroy
Copy link
Member

milroy commented Jul 15, 2024

I don't think the sched-fluxion-resource.status RPC is used anymore, since I believe all the functionality is handled by core now.

The caches in the ctx (m_r_{all,down,alloc}) are never freed at all.

The member data is updated in status_request_cb whenever the values change or a configured amount of time has elapsed, but yeah, they aren't freed when Fluxion is stopped.

@trws
Copy link
Member Author

trws commented Jul 15, 2024

Mark mentioned the same thing, and I agree it shouldn't be (at least not often) but it shows up in my recent heaptrack leak report. Will check again, try to get more detail here.

@garlick
Copy link
Member

garlick commented Jul 15, 2024

It's not used by default anymore, but we can check the scheduler's view of resources by running e.g.

FLUX_RESOURCE_LIST_RPC=sched.resource-status flux resource list

which can be handy. I'd say don't get rid of it :-)

@trws
Copy link
Member Author

trws commented Jul 15, 2024

Yup, not planning to get rid of it, do think we need to fix the leak though, and rather wondering what's poking it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants