
feat(connector): optionally limit kafka input throughput bytes #7058

Merged
8 commits merged from lz/kafka into main on Dec 26, 2022
Conversation

lmatz
Contributor

@lmatz lmatz commented Dec 26, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Add a configurable parameter to limit the source throughput in terms of bytes.

This parameter may or may not be intended for public users at the moment, so it is not going into the docs or release notes this time.

It is convenient for testing resource utilization/performance at a fixed input throughput (https://redpanda.com/blog/redpanda-vs-kafka-performance-benchmark#:~:text=3.2%20under%20workloads%20(-,up%20to%201GB/sec,-)%20that%20are%20common).
Otherwise, people always have to generate events on the fly into Kafka/Redpanda with some source generator at exactly the throughput they want, which may not be achievable and depends on many other factors.
Instead, we can pre-generate a large amount of data and then limit the source throughput stably.

Also, it may be a workaround solution for #5214.
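
To illustrate the idea (this is only a sketch; the struct and method names below are made up and do not match the code added in this PR), a bytes-per-second cap on a consume loop can be enforced by counting the bytes read in the current one-second window and sleeping once the budget is spent:

```rust
use std::time::{Duration, Instant};

/// Hypothetical byte-rate limiter, sketched for illustration only; the PR's
/// actual change lives in src/connector/src/source/kafka/source/reader.rs.
struct ByteRateLimiter {
    bytes_per_second: usize,
    window_start: Instant,
    bytes_in_window: usize,
}

impl ByteRateLimiter {
    fn new(bytes_per_second: usize) -> Self {
        Self {
            bytes_per_second,
            window_start: Instant::now(),
            bytes_in_window: 0,
        }
    }

    /// Record `n` consumed bytes; if the one-second budget is exhausted,
    /// sleep until the window ends, then start a new window.
    fn consume(&mut self, n: usize) {
        self.bytes_in_window += n;
        if self.bytes_in_window >= self.bytes_per_second {
            let elapsed = self.window_start.elapsed();
            if elapsed < Duration::from_secs(1) {
                std::thread::sleep(Duration::from_secs(1) - elapsed);
            }
            self.window_start = Instant::now();
            self.bytes_in_window = 0;
        }
    }
}

fn main() {
    // Matches the example option used later in this thread:
    // kafka.bytes.per.second = '20000'.
    let mut limiter = ByteRateLimiter::new(20_000);
    for _ in 0..100 {
        // Pretend each Kafka message payload is 1 KiB.
        limiter.consume(1024);
    }
    println!("done: average throughput capped at roughly 20 KB/s");
}
```

In the actual async source reader, the pause would presumably be a non-blocking tokio::time::sleep rather than std::thread::sleep.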

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

@codecov

codecov bot commented Dec 26, 2022

Codecov Report

Merging #7058 (c765b29) into main (569e4ff) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #7058      +/-   ##
==========================================
- Coverage   73.33%   73.32%   -0.01%     
==========================================
  Files        1045     1045              
  Lines      167653   167659       +6     
==========================================
- Hits       122940   122929      -11     
- Misses      44713    44730      +17     
Flag Coverage Δ
rust 73.32% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/connector/src/source/kafka/mod.rs 0.00% <ø> (ø)
src/connector/src/source/kafka/source/reader.rs 0.00% <0.00%> (ø)
src/batch/src/executor/group_top_n.rs 68.42% <0.00%> (-6.44%) ⬇️
src/common/src/cache.rs 97.31% <0.00%> (-0.23%) ⬇️
src/object_store/src/object/mod.rs 51.29% <0.00%> (-0.22%) ⬇️
src/meta/src/hummock/manager/mod.rs 79.43% <0.00%> (-0.06%) ⬇️
src/meta/src/hummock/mock_hummock_meta_client.rs 65.26% <0.00%> (ø)
src/stream/src/executor/aggregation/minput.rs 96.49% <0.00%> (+0.10%) ⬆️
src/storage/src/hummock/compactor/mod.rs 83.41% <0.00%> (+0.15%) ⬆️
src/meta/src/manager/cluster.rs 77.11% <0.00%> (+0.24%) ⬆️
... and 1 more


@lmatz
Contributor Author

lmatz commented Dec 26, 2022

With the source throughput limit:

```sql
create source kafkas(v1 bigint) with (
  connector = 'kafka',
  kafka.brokers = '127.0.0.1:9092',
  kafka.topic = 'sink_target',
  scan.startup.mode = 'earliest',
  kafka.bytes.per.second = '20000'
) row format json;
```

parallelism: 4

[screenshot: SCR-20221226-oux]

Without:

```sql
create source kafkas(v1 bigint) with (
  connector = 'kafka',
  kafka.brokers = '127.0.0.1:9092',
  kafka.topic = 'sink_target',
--  kafka.bytes.per.second = '20000'
  scan.startup.mode = 'earliest'
) row format json;
```

[screenshot: SCR-20221226-p6v]
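
As a side note on how the new WITH option reaches the connector, here is a hedged sketch of mapping the string options onto a properties struct with serde (assuming serde with the derive feature and serde_json as dependencies); the struct name KafkaPropertiesSketch and the exact field set are illustrative, not the real definition in src/connector/src/source/kafka/mod.rs:

```rust
use std::collections::HashMap;

use serde::Deserialize;

/// Illustrative subset of the Kafka source properties; the real struct has
/// more fields and may use different option keys.
#[derive(Debug, Deserialize)]
struct KafkaPropertiesSketch {
    #[serde(rename = "kafka.topic")]
    topic: String,
    /// Optional throughput cap in bytes per second; absent means unlimited.
    #[serde(rename = "kafka.bytes.per.second", default)]
    bytes_per_second: Option<String>,
}

fn main() {
    // WITH options arrive as string key/value pairs from CREATE SOURCE.
    let with_options: HashMap<String, String> = [
        ("kafka.topic".to_string(), "sink_target".to_string()),
        ("kafka.bytes.per.second".to_string(), "20000".to_string()),
    ]
    .into_iter()
    .collect();

    // A serde_json round-trip is just a convenient way to bind the string map
    // to the struct in this sketch.
    let value = serde_json::to_value(&with_options).unwrap();
    let props: KafkaPropertiesSketch = serde_json::from_value(value).unwrap();

    let limit: Option<u64> = props
        .bytes_per_second
        .as_deref()
        .map(|s| s.parse().expect("kafka.bytes.per.second must be an integer"));

    println!("topic = {}, byte limit = {:?}", props.topic, limit);
}
```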

@lmatz lmatz requested a review from tabVersion December 26, 2022 10:09
@lmatz lmatz changed the title feat(connector): limit kafka input throughput bytes feat(connector): optionally limit kafka input throughput bytes Dec 26, 2022
Contributor

@tabVersion tabVersion left a comment


rest LGTM

src/connector/src/source/kafka/mod.rs (review thread resolved)
@tabVersion
Contributor

It can work as a temporary fix for fine-tuning performance, but not as a user-facing option.

@mergify mergify bot merged commit bcbe772 into main Dec 26, 2022
@mergify mergify bot deleted the lz/kafka branch December 26, 2022 13:50