Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per DoFn latency instrumentation #29592

Merged
merged 31 commits into from
Dec 13, 2023
Merged

Per DoFn latency instrumentation #29592

merged 31 commits into from
Dec 13, 2023

Conversation

clmccart
Copy link
Contributor

@clmccart clmccart commented Dec 1, 2023

Tracks the amount of time dofn processing takes for individual messages run on the StreamingDataflowWorker.

Changes:

  • modifies the DataflowExecutionStateTracker to record processing times per message per DoFn.
    • When the tracker transitions state, the time spent in state is recorded in an in-memory map (keyed by DoFn name)
    • Trackers are now identified by shardingkey-worktoken.
  • adds a DataflowExecutionStateSampler (extension of existing ExecutionStateSampler) that is responsible for managing ExecutionStateTrackers over the lifetime of the StreamingDataflowWorker.
    • When trackers get removed (ie, processing has completed for the thread / work item it was responsible for), synthesized summary statistics from the tracker are held onto inside of the sampler until the work item has been committed (at which point the in-memory map is cleared for that work item).
  • adds a message to the windmill API proto for sending dofn latency info back to windmill
    • On the heartbeats: sends the DoFn name that the currently active message is in and how long it has been there for
    • On commits: sends the mean, min, max, count, and sum of individual message processing times for each DoFn

Load testing has been performed to ensure these changes do not introduce significant changes in user worker memory usage or windmill API bytes transferred.

addresses #29602

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@clmccart clmccart marked this pull request as ready for review December 4, 2023 21:55
@clmccart
Copy link
Contributor Author

clmccart commented Dec 4, 2023

R: @m-trieu

Copy link
Contributor

github-actions bot commented Dec 4, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@m-trieu m-trieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay!

@clmccart
Copy link
Contributor Author

R: @m-trieu

Copy link
Contributor

@m-trieu m-trieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@clmccart
Copy link
Contributor Author

R: @Abacn

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left a minor comment

@Abacn Abacn merged commit 951b3b1 into apache:master Dec 13, 2023
22 checks passed
@Abacn
Copy link
Contributor

Abacn commented Jan 26, 2024

The newly added test testDoFnActiveMessageMetadataReportedOnHeartbeat is flaky: https://github.com/apache/beam/runs/20887446311

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants