
Kafka Connect: significant delay in connector availability after restart #1509

Open · berezinsn opened this issue Nov 2, 2024 · 0 comments

Description

After restarting the Kafka Connect application, the Mongo sink connectors only become available (appear in the Kafka Connect UI) after a prolonged delay, ranging from 5 to 15 minutes. All connectors then become available simultaneously, and there are no subsequent issues with their operation.

Environment Details

  • Number of Connectors: 200
  • Deployment: single-instance Kafka Connect worker on Kubernetes
  • Pod Resource Limits:
    • Memory Limits: 8Gi
    • Memory Requests: 4Gi
    • Kafka Heap Options: -Xmx4096m -Xms2048m

Kafka Distributed Properties

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.flush.interval.ms=10000
max.poll.records=10
plugin.path=/opt/bitnami/kafka/plugins
cleanup.policy=compact
group.id=kafka-connect-group
config.storage.topic=kafka-connect-configs
config.storage.replication.factor=1
offset.storage.topic=kafka-connect-offsets
offset.storage.replication.factor=1
status.storage.topic=kafka-connect-status
status.storage.replication.factor=1
producer.max.request.size=104857600
errors.retry.timeout=300000
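
For context, one property that is not in our file (shown for reference only, at its default value): the 25 partitions on the offsets topic mentioned in the Observations below come from this worker-level setting, which is fixed at topic creation time.

offset.storage.partitions=25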

Observations

Based on monitoring data, there are no network delays, and CPU and memory appear sufficient, so the application does not seem to be resource-bound.
While researching this problem, a second test environment was set up. On that environment a restart completes almost instantly, and connectors become available within seconds.
Configuration settings between the environments are identical. The only difference is the size of the kafka-connect-offsets system topic, which Kafka Connect creates automatically with 25 partitions by default. On the newly created environment the topic is only a few MB in size, while on the original environment it ranges from 0.8 GB to 2.4 GB, depending on compaction timing.
Below is a screenshot with 30 days of statistics on the kafka-connect-offsets topic's size.
[Screenshot 2024-11-02 at 14:22:48: 30-day size history of the kafka-connect-offsets topic]
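
To check whether the slow startup is dominated by replaying this topic (as far as I understand, Connect reads the offsets topic end-to-end on startup before connectors are reported as available), a rough standalone measurement could look like the sketch below. This is my own check, not Connect's code; the bootstrap address is a placeholder and the client is kafka-python.

import time
from kafka import KafkaConsumer, TopicPartition

TOPIC = "kafka-connect-offsets"
consumer = KafkaConsumer(
    bootstrap_servers="kafka:9092",  # placeholder, replace with real brokers
    enable_auto_commit=False,
)
parts = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
consumer.assign(parts)
consumer.seek_to_beginning()
end = consumer.end_offsets(parts)

start, total = time.time(), 0
# Poll until every partition has been read up to its end offset,
# mirroring the read-to-end that Connect performs before startup completes.
while any(consumer.position(tp) < end[tp] for tp in parts):
    batch = consumer.poll(timeout_ms=1000)
    total += sum(len(records) for records in batch.values())
print(f"replayed {total} records in {time.time() - start:.1f}s")

If this takes minutes on the original environment but seconds on the test one, that would confirm the topic-size hypothesis.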

Kafka Connect Offsets Topic Settings

compression.type gzip
leader.replication.throttled.replicas
min.insync.replicas 1
message.downconversion.enable true
segment.jitter.ms 0
cleanup.policy compact
flush.ms 1000
follower.replication.throttled.replicas
segment.bytes 1073741824
retention.ms 604800000
flush.messages 10000
message.format.version 2.7-IV2
max.compaction.lag.ms 9223372036854775807
file.delete.delay.ms 60000
max.message.bytes 1000012
min.compaction.lag.ms 0
message.timestamp.type CreateTime
preallocate false
index.interval.bytes 4096
min.cleanable.dirty.ratio 0.5
unclean.leader.election.enable false
retention.bytes 1073741824
delete.retention.ms 86400000
segment.ms 604800000
message.timestamp.difference.max.ms 9223372036854775807
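
One thing stands out to me in these settings: as far as I know, the Kafka log cleaner never compacts the active segment, and with segment.bytes=1073741824 (1 GiB) and segment.ms=604800000 (7 days), each partition can hold up to a week of heartbeat duplicates before any of them become eligible for compaction. If that reading is correct, rolling segments more often should keep the topic much smaller. The overrides below are illustrative values I am considering, not tested recommendations:

segment.ms=3600000
min.cleanable.dirty.ratio=0.1

These can be applied per topic with kafka-configs.sh (--alter --entity-type topics --entity-name kafka-connect-offsets --add-config ...).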

Compaction Observations

Although compaction runs periodically, the first offset remains at 0, and the total number of messages continues to grow even as the size of the topic decreases. There are numerous duplicates, especially among the heartbeat messages.

Typical Heartbeat Message Structure

Key:

[
    "mongo-source",
    {
        "ns": "mongo-source"
    }
]

Value:

{
    "_id": "{\"_data\": \"\"}",
    "HEARTBEAT": "true"
}

Every 10 seconds, each of the 200 connectors sends a message like this, resulting in 1,200 messages per minute (see the back-of-the-envelope check below).
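
A back-of-the-envelope check (the per-record size is my rough guess, not a measurement):

200 connectors × 6 messages/min = 1,200 messages/min
1,200 messages/min × 60 × 24 ≈ 1.73M messages/day
1.73M messages/day × 7 days (segment.ms) ≈ 12M messages per roll window
12M messages × ~100 bytes ≈ 1.2 GB

That is the same order of magnitude as the observed 0.8-2.4 GB, which fits the theory that the growth is uncompacted heartbeats sitting in active segments. Note also that all heartbeats from a given connector share the same key, so once their segments do roll, compaction should collapse them to one record per connector per partition.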

The topic continues to grow, which leads me to suspect an issue with compaction, since the size decreases after each pass while the total message count never goes down. Is it also normal that the first offset is still equal to 0? My current understanding (please correct me if this is wrong) is that it may be: compaction preserves the offsets of surviving records and does not advance the log start offset, so a first offset of 0 and an unchanged end-minus-start offset span would not by themselves prove that compaction is broken.
[Screenshot 2024-11-02 at 15:14:29]

Summary

What strategies can we employ to reduce the restart time of the Kafka Connect application?
We suspect the delay comes from Kafka Connect replaying the high volume of messages in the offsets topic on startup; increasing resources hasn't noticeably improved restart speed.
Has anyone else faced similar challenges, or does anyone have optimization tips?

Any assistance would be greatly appreciated.
