[Bug]: Slowness and/or broken metrics visualization when Lineage metrics is large #32649

Closed
1 of 17 tasks
Abacn opened this issue Oct 4, 2024 · 1 comment · Fixed by #32650

Abacn commented Oct 4, 2024

What happened?

Beam Java 2.59.0 introduced Lineage metrics support for file-based IO (FileIO, TextIO, etc.).

  • When a pipeline reads from many files (e.g. using a file pattern that matches many files), the metrics-based components of the Dataflow UI break. For example, live throughput is no longer shown, the progress bar goes stale, and user counters increment incompletely.

This is due to an internal limit on the total job status response size of the Dataflow runner (gRPC limit, ~20 MB). When the response exceeds this limit, all metrics updates (counter, stringset, etc.) are dropped.

  • When a pipeline writes to many files (e.g. with a large shard count), one observes the following slowness:
Operation ongoing in step Write content to files/WriteFiles/FinalizeTempFileBundles/Finalize for at least 15m00s without outputting or completing in state process in thread pool-3-thread-2 with id 27
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$RegularSetBuilderImpl.insertInHashTable(ImmutableSet.java:780)
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$RegularSetBuilderImpl.add(ImmutableSet.java:763)
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:527)
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:478)
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:475)
  at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:549)
  at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetData.combine(StringSetData.java:58)
  at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetCell.update(StringSetCell.java:62)
  at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.metrics.StringSetCell.add(StringSetCell.java:104)
  at org.apache.beam.sdk.metrics.Metrics$DelegatingStringSet.add(Metrics.java:179)
  at org.apache.beam.sdk.metrics.Lineage.add(Lineage.java:133)

This is because the stringset metric is added in the finalize write step (after moving temp files to their final destination), which runs on a single worker. Unfortunately, the current implementation of stringSetData.addAll has O(N^2) complexity: each addition copies the existing contents into a new ImmutableSet, and this is repeated for each of the N elements.
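
To illustrate the quadratic behavior, here is a minimal sketch using only JDK collections. It is not the Beam implementation: the class names (`CopyOnAddSet`, `MutableSetAccumulator`), the `elementsCopied` counter, and the `gs://bucket/output-` paths are all hypothetical and exist only for this illustration. The copy-on-add variant mimics the pattern described above (a fresh immutable set is built on every add), while the mutable variant accumulates in place and copies once when a snapshot is needed.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StringSetAccumulation {

  /** Hypothetical stand-in for the copy-on-add pattern: every add rebuilds the whole set. */
  static final class CopyOnAddSet {
    private Set<String> data = Collections.emptySet();
    long elementsCopied = 0; // counts how many existing elements were re-copied in total

    void add(String value) {
      Set<String> next = new HashSet<>(data); // copies all data.size() existing elements
      elementsCopied += data.size();
      next.add(value);
      data = Collections.unmodifiableSet(next);
    }

    int size() {
      return data.size();
    }
  }

  /** Hypothetical alternative: accumulate in place, copy only when a snapshot is requested. */
  static final class MutableSetAccumulator {
    private final Set<String> data = new HashSet<>();

    void add(String value) {
      data.add(value); // amortized O(1), no copy of existing elements
    }

    Set<String> snapshot() {
      return Collections.unmodifiableSet(new HashSet<>(data)); // one O(N) copy
    }
  }

  /** Adds n distinct (made-up) file names and returns the total number of element copies. */
  static long copiesForCopyOnAdd(int n) {
    CopyOnAddSet set = new CopyOnAddSet();
    for (int i = 0; i < n; i++) {
      set.add("gs://bucket/output-" + i);
    }
    return set.elementsCopied; // 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
  }

  /** Adds the same n names to the mutable accumulator and returns the snapshot size. */
  static int sizeForMutable(int n) {
    MutableSetAccumulator acc = new MutableSetAccumulator();
    for (int i = 0; i < n; i++) {
      acc.add("gs://bucket/output-" + i);
    }
    return acc.snapshot().size();
  }

  public static void main(String[] args) {
    int n = 2000;
    System.out.println("copy-on-add element copies: " + copiesForCopyOnAdd(n));
    System.out.println("mutable accumulator size:   " + sizeForMutable(n));
  }
}
```

With N adds, the copy-on-add variant re-copies N(N-1)/2 elements in total, which is why a finalize step that registers a large number of output files on a single worker can stall for many minutes, whereas in-place accumulation keeps the total work linear in N.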

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Abacn commented Oct 8, 2024

Reopening to track cherry-pick PRs.
