[KafkaIO] Decouple consumer threads from harness threads #32986
Conversation
I'm interested in reviewing since I was looking at performance of this earlier. Let me know if you'd like me to take a pass in the current state or wait. Thanks! Looking forward to these improvements.
Feel free to take a peek @scwhittle! 😃 I had some inspiration on Sunday and revised the approach to thread-safe sharing of a Kafka consumer after hitting that 1 GiB/s bottleneck. The Kafka consumer's info logs are verbose enough to show that consumers on various workers are frequently assigned more than one partition. The image below is from a throughput test using 256-512 cores (n2d-standard-2) reading from 500 partitions that are filled by a producer at ~10 GiB/s; the time mark highlights the moment at which the pipeline scaled up from 256 to 512 cores. At the moment I seem to hit a cap at 2.5 GiB/s when I run this test with a shuffle step as a simple IO sink. The same workload using the unmodified source implementation on Dataflow's Runner V1 reports 2.5-3 GiB/s after scaling up. Without an IO sink, both are able to process 10 GiB/s with comparable behavior throughout the pipeline's lifetime.
Just some initial comments; I didn't get through everything.
```java
private static final Map<Long, LoadingCache<KafkaSourceDescriptor, AverageRecordSize>>
    AVG_RECORD_SIZE_CACHE = new ConcurrentHashMap<>();
private static final Map<
        Long, LoadingCache<Optional<ImmutableSet<String>>, ConsumerExecutionContext>>
```
Add a comment on the keys of the map and the loading cache.
```java
}

static final class TopicPartitionPollState implements AutoCloseable {
  private static final List<ConsumerRecord<byte[], byte[]>> CLOSED_SENTINEL = Arrays.asList();
```
nit: maybe ImmutableList.of() would make it clearer that this isn't to be modified.
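If that suggestion were adopted, the declaration might look like the sketch below (assuming Guava's ImmutableList is available, as it is elsewhere in Beam):

```java
import java.util.List;
import com.google.common.collect.ImmutableList;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Sketch of the reviewer's suggestion: an empty ImmutableList makes it
// explicit that the closed-state sentinel is never mutated; identity
// comparison (==) against the sentinel still works as before.
private static final List<ConsumerRecord<byte[], byte[]>> CLOSED_SENTINEL = ImmutableList.of();
```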
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class ConcurrentConsumer<K, V> implements AutoCloseable {
```
An overview comment would be helpful
Agreed, I've been putting off the documentation since I had a few variations of this change running in tandem.
```java
final Supplier<Metric> metric =
    this.partitionRecordsLag.getOrDefault(topicPartition, this.recordsLagMax);
try {
  return ((Number) metric.get().metricValue()).longValue();
```
Handle null explicitly instead of relying on an exception?
Agreed.
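A minimal sketch of the null-explicit variant, reusing the field names from the snippet above (the 0L fallback when no lag has been reported yet is an assumption):

```java
import com.google.common.base.Supplier;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.TopicPartition;

// Sketch: check each step for null instead of letting a
// NullPointerException or ClassCastException escape when the metric
// hasn't been registered or reported yet.
long recordsLag(final TopicPartition topicPartition) {
  final Supplier<Metric> supplier =
      this.partitionRecordsLag.getOrDefault(topicPartition, this.recordsLagMax);
  final Metric metric = supplier == null ? null : supplier.get();
  final Object value = metric == null ? null : metric.metricValue();
  if (value instanceof Number) {
    return ((Number) value).longValue();
  }
  return 0L; // Assumed fallback until the client reports lag for this partition.
}
```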
```java
}

@Nullable
OffsetAndTimestamp initialOffsetForTime(final TopicPartition topicPartition, final long time) {
```
What are the units for time? Could an Instant be used here and converted to a long internally?
It's in milliseconds; the types requested by the ConcurrentConsumer wrapper match the client library where possible: https://kafka.apache.org/38/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes(java.util.Map)
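For reference, the underlying client call takes epoch milliseconds; a small sketch of what the wrapper presumably delegates to (the helper name and variable names are illustrative, not from the PR):

```java
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// Sketch: offsetsForTimes expects epoch milliseconds and may map a
// partition to null when no record at or after the timestamp exists,
// which matches the @Nullable return above.
static OffsetAndTimestamp offsetForTime(
    final Consumer<byte[], byte[]> consumer,
    final TopicPartition topicPartition,
    final long epochMillis) {
  final Map<TopicPartition, OffsetAndTimestamp> offsets =
      consumer.offsetsForTimes(Collections.singletonMap(topicPartition, epochMillis));
  return offsets.get(topicPartition); // may be null
}
```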
```java
this.partitionRecordsLag.computeIfAbsent(
    topicPartition,
    k ->
        Suppliers.memoize(
```
Similar concern here: what if metrics aren't updated, or the metric is missing on the original call?
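One possible shape for addressing this, sketched under the assumption that a missing metric can simply be re-resolved on a later call rather than memoized forever (lookupLagMetric is a hypothetical stand-in for however the metric is located):

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.TopicPartition;

// Sketch: only cache the supplier once the metric actually exists, so a
// metric that is absent on the original call gets retried next time
// instead of a null result being memoized permanently.
private Supplier<Metric> lagMetricFor(final TopicPartition topicPartition) {
  final Supplier<Metric> cached = this.partitionRecordsLag.get(topicPartition);
  if (cached != null) {
    return cached;
  }
  final Metric resolved = lookupLagMetric(topicPartition); // hypothetical lookup
  if (resolved == null) {
    return this.recordsLagMax; // fall back until the per-partition metric appears
  }
  final Supplier<Metric> memoized = Suppliers.ofInstance(resolved);
  this.partitionRecordsLag.putIfAbsent(topicPartition, memoized);
  return memoized;
}
```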
```java
private final KafkaSourceDescriptor sourceDescriptor;
private final LoadingCache<Optional<ImmutableSet<String>>, ConcurrentConsumer<byte[], byte[]>>
    consumerExecutionContextCache;
private @MonotonicNonNull ConcurrentConsumer<byte[], byte[]> consumerExecutionContextInstance;
```
This doesn't appear to be used to avoid lookups in the cache. Should it be removed or should the logic be changed?
The context cache stores values as weak references, so this field is mainly there to retain a strong reference to a healthy context. The intent is for a context to become eligible for eviction from the cache only once the last remaining bundle processor referring to this ReadFromKafkaDoFn instance has been evicted and eventually collected; that avoids the unnecessary evictions a time- or size-based eviction policy would cause. The use of this field is a mess though, so I'll clean that up. 👍
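A sketch of the pattern being described, with the key and value types simplified for illustration (ConsumerContext is a hypothetical stand-in for ConcurrentConsumer):

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Sketch: values are held weakly, so a cache entry becomes collectable as
// soon as nothing else strongly references its context. The `pinned` field
// keeps the context this instance uses alive until the instance itself is
// collected, instead of relying on a time- or size-based eviction policy.
final class ContextHolder {
  private final LoadingCache<String, ConsumerContext> contextCache =
      CacheBuilder.newBuilder()
          .weakValues()
          .build(CacheLoader.from(ConsumerContext::new)); // hypothetical factory

  private ConsumerContext pinned; // strong reference retained by this instance

  ConsumerContext contextFor(String key) {
    ConsumerContext context = contextCache.getUnchecked(key);
    pinned = context; // pin so the weak-valued entry survives while in use
    return context;
  }

  static final class ConsumerContext {
    ConsumerContext(String key) {} // stand-in for the real context type
  }
}
```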
```java
public long estimate() {
  return memoizedBacklog.get();
  final long position =
      this.consumerExecutionContextInstance.position(this.sourceDescriptor.getTopicPartition());
```
use consumerExecutionContext instead of this.consumerExecutionContext
Allows a Kafka consumer to be used for bundled assignments. By decoupling consumers from splits and running them in separate threads, the harness thread should be blocked less by the creation of network connections or outstanding polls. The consumer thread may prefetch a batch of records while the harness thread is processing the current record batch. Multiplexing assigned TopicPartitions onto a single consumer may improve utilization of the network connection. A follow-up PR may introduce consumer pools for cases where a single consumer would become a bottleneck.

These changes to KafkaIO's SDF should meet or exceed the throughput performance of the unbounded source implementation. Attached are the throughput and backlog bytes graphs: shown on the left is KafkaIO's unbounded source on Dataflow (legacy), and on the right is KafkaIO's SDF with these changes on Dataflow (portability). The input is produced in GCP by an n2d-standard-16 machine at a rate of ~110 MiB/s to a topic with 500 partitions in a cluster hosted on Google Cloud Managed Service for Apache Kafka. The pipelines each use a pool of up to 8 n2d-standard-2 machines, and the pipeline on the left was intentionally configured to not quite catch up with its backlog. It's possible that applying these changes to the unbounded source implementation would yield a slight throughput uplift there as well.
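Conceptually, the decoupling works roughly like the sketch below. This is an illustration, not the PR's implementation: the PrefetchingReader name, the queue depth of 4, and the 1-second poll timeout are all made up for the example.

```java
import java.time.Duration;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.errors.WakeupException;

// Sketch: one dedicated thread owns the (non-thread-safe) KafkaConsumer and
// prefetches batches into a bounded queue; harness threads only take
// completed batches, so they aren't blocked by connection setup or
// in-flight polls, and the next batch is fetched while the current one is
// being processed.
final class PrefetchingReader implements AutoCloseable {
  private final Consumer<byte[], byte[]> consumer;
  private final BlockingQueue<ConsumerRecords<byte[], byte[]>> batches =
      new ArrayBlockingQueue<>(4); // bounded so prefetching can't grow without limit
  private final ExecutorService consumerThread = Executors.newSingleThreadExecutor();
  private volatile boolean closed;

  PrefetchingReader(Consumer<byte[], byte[]> consumer) {
    this.consumer = consumer;
    consumerThread.execute(this::pollLoop);
  }

  private void pollLoop() {
    try {
      while (!closed) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
        if (!records.isEmpty()) {
          batches.put(records); // applies backpressure when the harness falls behind
        }
      }
    } catch (WakeupException | InterruptedException e) {
      // close() was called; fall through and release the consumer.
    } finally {
      consumer.close();
    }
  }

  /** Called from the harness thread; blocks only until a prefetched batch is ready. */
  ConsumerRecords<byte[], byte[]> nextBatch() throws InterruptedException {
    return batches.take();
  }

  @Override
  public void close() {
    closed = true;
    consumer.wakeup(); // unblocks a poll() in progress on the consumer thread
    consumerThread.shutdownNow(); // unblocks a put() waiting on a full queue
  }
}
```

Multiplexing several assigned TopicPartitions onto one such consumer would then amortize connection setup across partitions, at the cost of the thread-safe handoff the snippets above implement.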