Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix memory leak caused by ThreadLocal in PipelineOptionFactory #29952

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

jrthe42
Copy link
Contributor

@jrthe42 jrthe42 commented Jan 8, 2024

This pr aims to fix a classloader leak caused by ThreadLocal in PipelineOptionFactory. The threadlocal for DESERIALIZATION_CONTEXT cannot be removed and will eventually cause meatspace oom in Flink Jobmanager, which has been reported by several users #29890, #25510.

We can remove the ThreadLocal and use DefaultDeserializationContext.copy instead.

@github-actions github-actions bot added the java label Jan 8, 2024
Copy link
Contributor

github-actions bot commented Jan 8, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@Abacn
Copy link
Contributor

Abacn commented Jan 8, 2024

Thanks for fixing this long standing issue. CC: @jinyangzhen @ashishdeole would you mind taking a look?

@Abacn
Copy link
Contributor

Abacn commented Jan 8, 2024

In #16680 there were DESERIALIZATION_CONTEXT.get() call in 4 places, here it changed to .copy() in two places, is this suffice? Have you been able to test the memory leak is fixed with this PR?

@jrthe42
Copy link
Contributor Author

jrthe42 commented Jan 9, 2024

@Abacn I've tested this with my local environment, and yes, no memory leak on Flink's ChildFirstClassloader using this fix. As for your first question, DESERIALIZATION_CONTEXT is currently only used in 2 places in PipelineOptionsFactory.

It also works fine with multithreaded contexts.

@jinyangzhen
Copy link

jinyangzhen commented Jan 9, 2024

@Abacn - confirm this PR resolve this issue.
If this PR is merged, how long do we expect to see a new version in maven repo?

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PipelineOptionFactory is used in pipeline construction time and is not performance sensitive. .copy() should be fine and it avoids racing condition

@Abacn Abacn merged commit 97ca443 into apache:master Jan 9, 2024
26 checks passed
@Abacn
Copy link
Contributor

Abacn commented Jan 9, 2024

@Abacn - confirm this PR resolve this issue. If this PR is merged, how long do we expect to see a new version in maven repo?

It will ship with Beam 2.54.0 scheduled in late Feburary (over 6 week interval).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants